Re: [DISCUSS] client/server jar naming, post hbase-compat changes
> > Removing the useless unshaded client and server JAR maven artifacts
> > would free up those names, and we could create both symlinks in the
> > assembly that you suggest.

+1 for removing the useless unshaded client and server JARs.

Guanghao Zhang wrote on Thu, Mar 26, 2020 at 12:39 PM:

> > This would make downstream applications/users a little more simple --
> > not having to worry about the HBase version in use (since their concerns
> > are what version of Phoenix is being used, instead). We could even
> > introduce non-Phoenix-versioned symlinks for these jars (e.g.
> > phoenix-client.jar and phoenix-server.jar).
>
> I thought users still need to care about which HBase version is in use?
> phoenix-server-xxx-hbase-2.1.jar does not work with an HBase 2.2.x cluster
> now, does it?
Re: [DISCUSS] client/server jar naming, post hbase-compat changes
> This would make downstream applications/users a little more simple --
> not having to worry about the HBase version in use (since their concerns
> are what version of Phoenix is being used, instead). We could even
> introduce non-Phoenix-versioned symlinks for these jars (e.g.
> phoenix-client.jar and phoenix-server.jar).

I thought users still need to care about which HBase version is in use?
phoenix-server-xxx-hbase-2.1.jar does not work with an HBase 2.2.x cluster
now, does it?
[jira] [Updated] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates
[ https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5791:
-----------------------------------
    Attachment: PHOENIX-5791.4.x-HBase-1.5.001.patch

> Eliminate false invalid row detection due to concurrent updates
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-5791
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5791
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>         Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> IndexTool verification generates an expected list of index mutations from
> the data table rows and uses this list to check if index table rows are
> consistent with the data table. To do that, it follows these steps:
> # The data table rows are scanned with a raw scan. This raw scan is
> configured to read all versions of rows.
> # For each scanned row, the scanned cells are grouped into two sets: put
> and delete. The put set is the set of put cells and the delete set is the
> set of delete cells.
> # The put and delete sets for a given row are further grouped, based on
> their timestamps, into put and delete mutations such that all the cells
> in a mutation have the same timestamp.
> # The put and delete mutations are then sorted within a single list.
> Mutations in this list are sorted in ascending order of their timestamp.
> The above process assumes that for each data table update, the index
> table will be updated with the correct index row key. However, this
> assumption does not hold in the presence of concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or
> more pending updates from different batches on the same data row are
> concurrent if and only if for all of these updates the data table row
> state is read from HBase under the row lock and for none of them has the
> row lock been acquired a second time for updating the data table. In
> other words, all of them are in the first update phase concurrently. For
> concurrent updates, the first two update phases are done but the last
> update phase is skipped. This means the data table row will be updated by
> these updates but the corresponding index table rows will be left with
> the unverified status. Then, the read repair process will repair these
> unverified index rows during scans.
> Since expected index mutations are derived from the data table row after
> these concurrent mutations are applied, the expected list would not match
> the actual list of index mutations.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
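The grouping and sorting in steps 1-4 above can be made concrete with a short
sketch. The following is a minimal, self-contained Java model; the Cell and
MutationGroup types are simplified stand-ins for illustration, not the actual
HBase or Phoenix classes:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-ins for the scanned cells; not the real HBase/Phoenix types.
record Cell(byte[] row, byte[] qualifier, long timestamp, byte[] value, boolean isDelete) {}

record MutationGroup(long timestamp, boolean isDelete, List<Cell> cells) {}

public class VerificationGrouping {

    /**
     * Steps 2-4: split scanned cells into put/delete sets, group each set by
     * timestamp so all cells in a group share one timestamp, then return a
     * single list sorted by ascending timestamp.
     */
    static List<MutationGroup> groupAndSort(List<Cell> scannedCells) {
        Map<Long, List<Cell>> puts = new TreeMap<>();
        Map<Long, List<Cell>> deletes = new TreeMap<>();
        for (Cell c : scannedCells) {
            Map<Long, List<Cell>> target = c.isDelete() ? deletes : puts;
            target.computeIfAbsent(c.timestamp(), ts -> new ArrayList<>()).add(c);
        }
        List<MutationGroup> all = new ArrayList<>();
        puts.forEach((ts, cells) -> all.add(new MutationGroup(ts, false, cells)));
        deletes.forEach((ts, cells) -> all.add(new MutationGroup(ts, true, cells)));
        all.sort(Comparator.comparingLong(MutationGroup::timestamp));
        return all;
    }
}
{code}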
[jira] [Updated] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates
[ https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5791:
-----------------------------------
    Attachment: (was: PHOENIX-5791.4.x-HBase-1.5.001.patch)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [DISCUSS] Add components for sub-projects to maven
+1

Ankit Singhal wrote on Thu, Mar 26, 2020 at 10:13 AM:

> +1 Agreed, I think adding these components would be really helpful.
>
> How about adding some more, one level deeper, for the connectors, like
>
>    - flume
>    - spark
>    - hive
>    - kafka
>    - pig
>    - presto
>
> so that people who are more versed in or interested in certain components
> can keep a watch on the section of JIRAs where they want to actively
> contribute with reviews and ideas.
>
> Regards,
> Ankit Singhal
Re: [DISCUSS] Add components for sub-projects to maven
+1 Agreed, I think adding these components would be really helpful.

How about adding some more, one level deeper, for the connectors, like

   - flume
   - spark
   - hive
   - kafka
   - pig
   - presto

so that people who are more versed in or interested in certain components
can keep a watch on the section of JIRAs where they want to actively
contribute with reviews and ideas.

Regards,
Ankit Singhal
[jira] [Updated] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates
[ https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5791:
-----------------------------------
    Attachment: (was: PHOENIX-5791.4.x-HBase-1.5.001.patch)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates
[ https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5791:
-----------------------------------
    Attachment: PHOENIX-5791.4.x-HBase-1.5.001.patch

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates
[ https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5791:
-----------------------------------
    Attachment: PHOENIX-5791.4.x-HBase-1.5.001.patch

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates
[ https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5791:
-----------------------------------
    Attachment: (was: PHOENIX-5791.4.x-HBase-1.5.001.patch)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[DISCUSS] Add components for sub-projects to maven
Hi!

Phoenix has been split into sub-projects, and has adopted some projects, so
I think it is time to reflect this in Jira as well.

I propose adding the following components to the project (one per repo):

   - core
   - queryserver
   - connectors
   - tephra
   - omid

What do you think?

This is tracked in https://issues.apache.org/jira/browse/PHOENIX-5781 ; I
just wanted to get some opinions on this, hence this thread.

regards
Istvan
Re: [DISCUSS] client/server jar naming, post hbase-compat changes
Hi!

According to comments in the POMs, the phoenix-VERSION-client/server.jar
symlinks are deprecated. (The symlinks were already there, BTW; I just
updated their targets.)
I kind of agree with the deprecation, as permuting the components of the
jar name to distinguish the shaded and non-shaded versions feels
unintuitive and error-prone.

The phoenix-xx-VERSION.jars were meant to be the unshaded JARs. However,
that doesn't make sense for the client and server artifacts, as those are
just shaded views of core.

Removing the useless unshaded client and server JAR maven artifacts would
free up those names, and we could create both symlinks in the assembly
that you suggest.

This would also mean that maven wouldn't return a useless artifact for
phoenix-client and phoenix-server without classifiers, which would also be
one less unpleasant surprise for users.

So the user could use the canonical maven artifact filename, or one of the
two (or three, if we keep the deprecated old name) symlinks from the
assembly.
If she wanted to use an artifact from a Maven repository, she'd have to
specify the HBase version as a classifier to get the correct client. (This
doesn't change.)

The same naming solution (or whatever we agree on) should be extended to
PQS and the connectors as well.
[jira] [Updated] (PHOENIX-5780) Add mvn dependency:analyze to build process
[ https://issues.apache.org/jira/browse/PHOENIX-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Toth updated PHOENIX-5780:
---------------------------------
    Attachment: PHOENIX-5780.master.v2.patch

> Add mvn dependency:analyze to build process
> -------------------------------------------
>
>                 Key: PHOENIX-5780
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5780
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Major
>         Attachments: PHOENIX-5780.master.v1.patch,
> PHOENIX-5780.master.v2.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> mvn dependency:analyze has shown that the dependency definitions in
> Phoenix are in bad shape.
> Include it in the build process, so that we can keep the dependencies
> accurate and up to date.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[DISCUSS] client/server jar naming, post hbase-compat changes
Background: IstvanT has done a lot of really great work to clean up the
HBase 2.x compatibility issues for us. This lets us move away from the
HBase-version-tagged releases of Phoenix (e.g. HBase-1.3, HBase-1.4,
etc.), and keep a single branch which can build all of these.

Building master locally, I noticed the following in my tarball,
specifically the jars:

   phoenix-5.1.0-SNAPSHOT-hbase-2.2-client.jar -> phoenix-client-5.1.0-SNAPSHOT-hbase-2.2.jar
   phoenix-5.1.0-SNAPSHOT-hbase-2.2-server.jar
   phoenix-5.1.0-SNAPSHOT-server.jar
   phoenix-client-5.1.0-SNAPSHOT-hbase-2.2.jar

I think there are two things happening here. One is that the
phoenix-5.1.0-SNAPSHOT-server.jar is "empty" -- it's not the shaded
server jar, but the hbase-2.2-server.jar is the correct jar. I think
this is just a bug (you agree, Istvan?)

The other thing I notice is that it feels like Istvan was trying to
simplify some things via symlinks. My feeling was that we could take
this a step further. What if, instead of just having "hbase-x.y" named
jars, we gave symlinked jars as well? Creating something like...

   phoenix-5.1.0-SNAPSHOT-client.jar -> phoenix-client-5.1.0-SNAPSHOT-hbase-2.2-client.jar
   phoenix-client-5.1.0-SNAPSHOT-hbase-2.2-client.jar
   phoenix-5.1.0-SNAPSHOT-server.jar -> phoenix-server-5.1.0-SNAPSHOT-hbase-2.2-server.jar
   phoenix-server-5.1.0-SNAPSHOT-hbase-2.2-server.jar

This would make downstream applications/users a little more simple --
not having to worry about the HBase version in use (since their concerns
are what version of Phoenix is being used, instead). We could even
introduce non-Phoenix-versioned symlinks for these jars (e.g.
phoenix-client.jar and phoenix-server.jar). I think this also moves us a
little closer to what we used to have.

Sounds like a good idea to others?
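For illustration, the version-agnostic names boil down to plain symlink
creation at assembly time. A minimal Java sketch of the equivalent operation
follows -- purely illustrative, since the real build would do this in the
Maven assembly; the output directory is an assumption and the file names are
taken from the listing above:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SymlinkSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical assembly output directory.
        Path dir = Paths.get("target/assembly");

        // Version-agnostic name -> canonical Maven artifact file name.
        createLink(dir, "phoenix-client.jar", "phoenix-client-5.1.0-SNAPSHOT-hbase-2.2.jar");
        createLink(dir, "phoenix-server.jar", "phoenix-server-5.1.0-SNAPSHOT-hbase-2.2.jar");
    }

    static void createLink(Path dir, String link, String target) throws IOException {
        Path linkPath = dir.resolve(link);
        Files.deleteIfExists(linkPath);
        // Use a relative target so the link survives moving the assembly directory.
        Files.createSymbolicLink(linkPath, Paths.get(target));
    }
}
{code}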
[jira] [Updated] (PHOENIX-5698) Phoenix Query with RVC IN list expression generates wrong scan with non-pk ordered pks
[ https://issues.apache.org/jira/browse/PHOENIX-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyi Yan updated PHOENIX-5698:
-------------------------------
    Attachment: PHOENIX-5698-4.x.v6.patch

> Phoenix Query with RVC IN list expression generates wrong scan with non-pk
> ordered pks
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-5698
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5698
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.15.0, 4.14.3
>            Reporter: Daniel Wong
>            Assignee: Xinyi Yan
>            Priority: Major
>              Labels: DESC
>         Attachments: PHOENIX-5698-4.14-HBase-1.3.patch,
> PHOENIX-5698-4.x-HBase-1.3.patch, PHOENIX-5698-4.x.patch,
> PHOENIX-5698-4.x.v3.patch, PHOENIX-5698-4.x.v4.patch,
> PHOENIX-5698-4.x.v5.patch, PHOENIX-5698-4.x.v6.patch,
> PHOENIX-5698-master.v2.patch, PHOENIX-5698.patch
>
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> In the code below, ideally we'd expect a SINGLE ROW DELETE plan client
> side. However, this generates an incorrect scan with range
> ['tenant1 0CY005xx01Sv6o'). If the order of the RVCs is changed to row key
> order, Phoenix correctly generates a SINGLE ROW SCAN. As we provide the
> full PK, we expect either a tightly bounded range scan or a client-side
> delete. Instead we get a range scan on the composite leading edge
> TENANT_ID,KEY_PREFIX,ID1.
>
> {code:java}
> @Test
> public void testInListExpressionWithDescAgain() throws Exception {
>     String fullTableName = generateUniqueName();
>     String fullViewName = generateUniqueName();
>     String tenantView = generateUniqueName();
>     // create base table and global view using global connection
>     try (Connection conn = DriverManager.getConnection(getUrl())) {
>         conn.setAutoCommit(true);
>         Statement stmt = conn.createStatement();
>         stmt.execute("CREATE TABLE " + fullTableName + "(\n"
>             + " TENANT_ID CHAR(15) NOT NULL,\n"
>             + " KEY_PREFIX CHAR(3) NOT NULL,\n"
>             + " CONSTRAINT PK PRIMARY KEY (\n"
>             + " TENANT_ID, KEY_PREFIX)) MULTI_TENANT=TRUE");
>         stmt.execute("CREATE VIEW " + fullViewName + "(\n"
>             + " ID1 VARCHAR NOT NULL,\n"
>             + " ID2 VARCHAR NOT NULL,\n"
>             + " EVENT_DATE DATE NOT NULL,\n"
>             + " CONSTRAINT PKVIEW PRIMARY KEY\n"
>             + " (\n"
>             + " ID1, ID2 DESC, EVENT_DATE DESC\n"
>             + ")) AS SELECT * FROM " + fullTableName
>             + " WHERE KEY_PREFIX = '0CY'");
>     }
>     // create and use a tenant specific view to write data
>     try (Connection viewConn = DriverManager.getConnection(TENANT_SPECIFIC_URL1)) {
>         viewConn.setAutoCommit(true); // need autocommit for serverside deletion
>         Statement stmt = viewConn.createStatement();
>         stmt.execute("CREATE VIEW IF NOT EXISTS " + tenantView
>             + " AS SELECT * FROM " + fullViewName);
>         viewConn.createStatement().execute("UPSERT INTO " + tenantView
>             + "(ID1, ID2, EVENT_DATE) VALUES ('005xx01Sv6o', '300', 153245823)");
>         viewConn.createStatement().execute("UPSERT INTO " + tenantView
>             + "(ID1, ID2, EVENT_DATE) VALUES ('005xx01Sv6o', '400', 153245824)");
>         viewConn.createStatement().execute("UPSERT INTO " + tenantView
>             + "(ID1, ID2, EVENT_DATE) VALUES ('005xx01Sv6o', '500', 153245825)");
>         viewConn.commit();
>         ResultSet rs = stmt.executeQuery("SELECT ID1, ID2, EVENT_DATE FROM " + tenantView);
>         printResultSet(rs);
>         System.out.println("Delete Start");
>         rs = stmt.executeQuery("EXPLAIN DELETE FROM " + tenantView
>             + " WHERE (ID1, EVENT_DATE, ID2) IN (('005xx01Sv6o', 153245824, '400'),"
>             + "('005xx01Sv6o', 153245823, '300'))");
>         printResultSet(rs); // THIS SHOULD BE A SINGLE ROW SCAN
>         stmt.execute("DELETE FROM " + tenantView
>             + " WHERE (ID1, EVENT_DATE, ID2) IN (('005xx01Sv6o', 153245824, '400'),"
>             + "('005xx01Sv6o', 153245823, '300'))");
>         viewConn.commit();
>         System.out.println("Delete End");
>         rs = stmt.executeQuery("SELECT ID1, ID2, EVENT_DATE FROM " + tenantView);
>         printResultSet(rs);
>     }
> }
>
> private void printResultSet(ResultSet rs) throws SQLException {
>     StringBuilder builder = new StringBuilder();
>     while (rs.next()) {
>         for (int i = 0; i < rs.getMetaData().getColumnCount(); i++) {
>             Object col = rs.getObject(i + 1);
>             if (col == null) {
>                 builder.append("null");
>             } else {
>                 if (col instanceof Date) {
>                     DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>                     builder.append(df.format(col));
>                 } else {
>                     builder.append(col.toString());
>                 }
>             }
>             builder.append(",");
>         }
>         builder.append("\n");
>     }
>     System.out.println(builder.toString());
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (PHOENIX-5698) Phoenix Query with RVC IN list expression generates wrong scan with non-pk ordered pks
[ https://issues.apache.org/jira/browse/PHOENIX-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyi Yan updated PHOENIX-5698:
-------------------------------
    Attachment: (was: PHOENIX-5698-4.x.v6.patch)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (PHOENIX-5796) Possible query optimization when projecting uncovered columns and querying on indexed columns
[ https://issues.apache.org/jira/browse/PHOENIX-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinmay Kulkarni updated PHOENIX-5796:
--------------------------------------
    Labels: query-optimization  (was: )

> Possible query optimization when projecting uncovered columns and querying
> on indexed columns
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-5796
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5796
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Priority: Major
>              Labels: query-optimization
>         Attachments: Screen Shot 2020-03-23 at 3.25.38 PM.png, Screen Shot
> 2020-03-23 at 3.32.24 PM.png, Screen Shot 2020-03-24 at 11.51.12 AM.png
>
> Start an HBase-1.3 server with the Phoenix-4.15.0-HBase-1.3 server jar.
> Connect to it using sqlline.py, which has the Phoenix-4.15.0-HBase-1.3
> client.
> Create a base table like:
> {code:sql}
> create table t (a integer primary key, b varchar(10), c integer);
> {code}
> Create an uncovered index on top of it like:
> {code:sql}
> create index uncov_index_t on t(b);
> {code}
> Now if you issue the query:
> {code:sql}
> explain select c from t where b='abc';
> {code}
> you'd see the following explain plan:
> !Screen Shot 2020-03-23 at 3.25.38 PM.png|height=150,width=700!
> *This is a full table scan on the base table 't'*, since we cannot use the
> global index: 'c' is not a covered column in the global index.
> *However, projecting columns contained fully within the index PK is
> correctly a range scan:*
> {code:sql}
> explain select a,b from t where b='abc';
> {code}
> produces the following explain plan:
> !Screen Shot 2020-03-23 at 3.32.24 PM.png|height=150,width=700!
> In the first query, can there be an optimization to *query the index
> table, get the start and stop keys of the base table, and then issue a
> range scan/(bunch of point lookups) on the base table* instead of doing a
> full table scan on the base table like we currently do?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
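The proposed optimization can be sketched at the raw HBase client level (HBase
2.x API). This is only an illustration of the idea, not Phoenix's planner: the
table names mirror the example above, the stop row is a naive placeholder, and
extractDataRowKey stands in for Phoenix's actual index row key decoding:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexAssistedLookupSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table index = conn.getTable(TableName.valueOf("UNCOV_INDEX_T"));
             Table data = conn.getTable(TableName.valueOf("T"))) {

            // 1. Range-scan the index on the indexed value b='abc'.
            Scan scan = new Scan()
                .withStartRow(Bytes.toBytes("abc"))
                .withStopRow(Bytes.toBytes("abd")); // naive placeholder bound
            List<Get> lookups = new ArrayList<>();
            try (ResultScanner rs = index.getScanner(scan)) {
                for (Result r : rs) {
                    // 2. Recover the data-table row key from the index row key
                    //    (Phoenix encodes the data PK in the index row key).
                    lookups.add(new Get(extractDataRowKey(r.getRow())));
                }
            }
            // 3. Point lookups on the base table fetch the uncovered column c,
            //    instead of a full table scan.
            Result[] rows = data.get(lookups);
            System.out.println("fetched " + rows.length + " base table rows");
        }
    }

    static byte[] extractDataRowKey(byte[] indexRowKey) {
        // Placeholder: real decoding depends on Phoenix's index row key format.
        return indexRowKey;
    }
}
{code}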
[jira] [Updated] (PHOENIX-5795) Supporting selective queries for index rows updated concurrently
[ https://issues.apache.org/jira/browse/PHOENIX-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir OZDEMIR updated PHOENIX-5795:
-----------------------------------
    Attachment: PHOENIX-5795.4.x-HBase-1.5.002.patch

> Supporting selective queries for index rows updated concurrently
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-5795
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5795
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Critical
>         Attachments: PHOENIX-5795.4.x-HBase-1.5.001.patch,
> PHOENIX-5795.4.x-HBase-1.5.002.patch
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From the consistent indexing design (PHOENIX-5156) perspective, two or
> more pending updates from different batches on the same data row are
> concurrent if and only if for all of these updates the data table row
> state is read from HBase under the row lock and for none of them has the
> row lock been acquired a second time for updating the data table. In
> other words, all of them are in the first update phase concurrently. For
> concurrent updates, the first two update phases are done but the last
> update phase is skipped. This means the data table row will be updated by
> these updates but the corresponding index table rows will be left with
> the unverified status. Then, the read repair process will repair these
> unverified index rows during scans.
> In addition to leaving index rows unverified, concurrent updates may
> generate index rows with incorrect row keys. For example, consider an
> application that issues the first two upserts on the same row
> concurrently, where the second update does not include one or more of the
> indexed columns. When these updates arrive concurrently at
> IndexRegionObserver, the existing row state will be null for both of
> these updates. This means the index updates will be generated solely from
> the pending updates. The partial upsert with missing indexed columns will
> generate an index row by assuming the missing indexed columns have null
> values, and this assumption may not be true, as the other concurrent
> upsert may have non-null values for the indexed columns. After issuing
> the concurrent updates, if the application attempts to read the row back
> using a selective query on the index table, and this selective query maps
> to an HBase scan that does not scan these unverified rows due to the
> incorrect row keys on those rows, the application will not get the row
> content back correctly.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
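As a rough illustration of the update phases described above, here is a
self-contained toy model in Java -- not the actual IndexRegionObserver code;
the class, field, and method names are simplified assumptions, and a single
in-memory flag stands in for the index row's verified/unverified status:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the three-phase consistent-indexing write path for one data row.
public class ThreePhaseWriteModel {
    enum IndexStatus { UNVERIFIED, VERIFIED }

    private final ReentrantLock rowLock = new ReentrantLock();
    private final Set<Long> pending = ConcurrentHashMap.newKeySet();    // batches in phase 1
    private final Set<Long> concurrent = ConcurrentHashMap.newKeySet(); // overlapping batches
    private final AtomicLong ids = new AtomicLong();
    volatile IndexStatus indexRow = IndexStatus.VERIFIED;

    void write(Runnable applyDataUpdate) {
        long id = ids.incrementAndGet();

        // Phase 1: under the row lock, read the current row state and write
        // the index row with the UNVERIFIED status.
        rowLock.lock();
        try {
            pending.add(id);
            indexRow = IndexStatus.UNVERIFIED;
            if (pending.size() > 1) {        // overlapping first phases:
                concurrent.addAll(pending);  // all involved batches are concurrent
            }
        } finally {
            rowLock.unlock();
        }

        // Phase 2: acquire the row lock a second time and update the data row.
        rowLock.lock();
        try {
            applyDataUpdate.run();
            pending.remove(id);
        } finally {
            rowLock.unlock();
        }

        // Phase 3: mark the index row VERIFIED -- skipped entirely for
        // concurrent batches, leaving the index row UNVERIFIED for read
        // repair to handle during scans.
        if (!concurrent.remove(id)) {
            indexRow = IndexStatus.VERIFIED;
        }
    }
}
{code}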