[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666402#comment-15666402 ]

chenglei edited comment on PHOENIX-3451 at 11/15/16 7:42 AM:
-

[~jamestaylor], thank you for your suggestion. My considerations are as follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the ORDER BY columns must match the GROUP BY columns.

2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (i.e. GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the GROUP BY/ORDER BY expressions are not simple RowKey columns (i.e. GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), then the GROUP BY/ORDER BY certainly should not be "isOrderPreserving".

So I think my patch is OK. As the following code shows, it only needs to consider RowKeyColumnExpression. RowKeyColumnExpression is enough for checking whether the ORDER BY is "isOrderPreserving"; for any other type of Expression, the following visit method returns null, and the OrderPreservingTracker.isOrderPreserving method will return false, which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your suggestion when I made my patch. In the end I chose the current patch because it is simpler, and the modification is confined to the single OrderPreservingTracker class. FYI.
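The position mapping the visit method performs can be illustrated with a minimal, self-contained sketch. PkPositionMapper and its list-of-positions representation are hypothetical stand-ins for the tracker's GroupBy expressions, not Phoenix's actual types; a null entry plays the role of a GROUP BY expression that is not a plain RowKeyColumnExpression:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the idea above, not Phoenix's real API:
// each slot of the list holds the row-key position of the GROUP BY
// expression at that slot, or null if it is not a plain row-key column.
public class PkPositionMapper {
    /**
     * Maps a position in the GROUP BY output back to the position of the
     * underlying row-key column, or returns -1 when the GROUP BY expression
     * at that slot is not a plain row-key column (mirroring visit()
     * returning null, so isOrderPreserving becomes false).
     */
    public static int originalPkPosition(List<Integer> groupByRowKeyPositions, int pkPosition) {
        if (groupByRowKeyPositions == null || groupByRowKeyPositions.isEmpty()) {
            // No GROUP BY: the position is already the row-key position.
            return pkPosition;
        }
        Integer mapped = groupByRowKeyPositions.get(pkPosition);
        return mapped == null ? -1 : mapped;
    }

    public static void main(String[] args) {
        // GROUP BY pkCol2, pkCol1: output slot 0 maps back to row-key position 1.
        System.out.println(originalPkPosition(Arrays.asList(1, 0), 0)); // 1
        // GROUP BY pkCol1 + 1 at slot 0: not a plain row-key column.
        System.out.println(originalPkPosition(Arrays.asList((Integer) null, 1), 0)); // -1
    }
}
```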
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666402#comment-15666402 ]

chenglei commented on PHOENIX-3451:
---

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
> This may be related to PHOENIX-3452 but the behavior is different so filing
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
>     ORGANIZATION_ID CHAR(15) NOT NULL,
>     CONTAINER_ID CHAR(15) NOT NULL,
>     ENTITY_ID CHAR(15) NOT NULL,
>     SCORE DOUBLE,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         CONTAINER_ID,
>         ENTITY_ID
>     )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5 1.2
> entityId3 1.4
> The expected output would be
> entityId8 1.45
> entityId3 1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container and, when it finds that to be enough,
> it stops the scan. What it should be doing is getting 2 results for each
> container, then merging them, and then limiting again.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (PHOENIX-3482) Provide a work around for HBASE-17096
Samarth Jain created PHOENIX-3482:
-

Summary: Provide a work around for HBASE-17096
Key: PHOENIX-3482
URL: https://issues.apache.org/jira/browse/PHOENIX-3482
Project: Phoenix
Issue Type: Bug
Reporter: Samarth Jain

HBASE-17096 causes failures in UpgradeIT#testAcquiringAndReleasingUpgradeMutex. Essentially, releasing the upgrade mutex using the checkAndMutate api isn't working correctly. A simple though not ideal workaround would be to not call releaseMutex() and let the lock expire by virtue of the TTL set on the cell. The side effect is that if a client encounters an exception while executing the upgrade code, then a new client won't be able to initiate the upgrade till the TTL expires.
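The TTL-based workaround described above can be sketched generically. This does not use the HBase API at all; the cell store is simulated with a timestamped map entry, and the names (TtlMutex, tryAcquire) are illustrative only:

```java
import java.util.concurrent.ConcurrentHashMap;

// Generic sketch of "no explicit release; the lock frees itself when the
// TTL elapses". Not HBase code: the cell is simulated by a map entry
// carrying the time it was written.
public class TtlMutex {
    private final ConcurrentHashMap<String, Long> cells = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlMutex(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Acquire succeeds if no cell exists or the existing cell has expired. */
    public synchronized boolean tryAcquire(String key, long now) {
        Long writtenAt = cells.get(key);
        if (writtenAt != null && now - writtenAt < ttlMillis) {
            return false; // still held; TTL has not elapsed yet
        }
        cells.put(key, now);
        return true;
    }
    // Deliberately no release(): as in the workaround above, a crashed
    // client's lock is only freed once the TTL expires.
}
```

Passing `now` explicitly keeps the sketch deterministic; a real implementation would rely on the TTL HBase applies to the cell rather than a wall-clock comparison.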
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666301#comment-15666301 ]

Samarth Jain commented on PHOENIX-3481:
---

bq. Actually, you're better off if you can track down and change it on the server-side as otherwise we'll need to check-in a new Phoenix jar to core (assuming this is potentially a real issue rather than a test only issue).

[~jamestaylor], I looked at the changes in the HBase code between versions 0.98.17 and 0.98.23 and I see that HBASE-15245 added new checks that were not present before. Specifically, this change in HRegionServer:

{code}
+    ByteString value = regionSpecifier.getValue();
+    RegionSpecifierType type = regionSpecifier.getType();
+    switch (type) {
+      case REGION_NAME:
+        byte[] regionName = value.toByteArray();
+        String encodedRegionName = HRegionInfo.encodeRegionName(regionName);
+        return getRegionByEncodedName(regionName, encodedRegionName);
+      case ENCODED_REGION_NAME:
+        return getRegionByEncodedName(value.toStringUtf8());
+      default:
+        throw new DoNotRetryIOException(
+            "Unsupported region specifier type: " + type);
+    }
{code}

The call
{code}
getRegionByEncodedName(regionName, encodedRegionName);
{code}
throws a NotServingRegionException. This exception is thrown before the Phoenix coprocessor hook for doPostScannerOpen() could throw a StaleRegionBoundaryCacheException for the phoenix client to retry. FWIW, this change has also been made to HBase 1.3.0 as part of HBASE-15177.

> Phoenix initialization fails for HBase 0.98.21 and beyond
> -
>
> Key: PHOENIX-3481
> URL: https://issues.apache.org/jira/browse/PHOENIX-3481
> Project: Phoenix
> Issue Type: Bug
> Reporter: Samarth Jain
> Assignee: Samarth Jain
> Fix For: 4.9.0
>
> Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch, PHOENIX-3481_master_v2.patch, PHOENIX-3481_v3_0.98.patch
>
Re: [ANNOUNCE] New Apache Phoenix committer - Kevin Liew
Thank you James and PMC. I'm excited to see how Apache Phoenix will evolve, and I am grateful for the opportunity to contribute as a committer.

On 2016-11-10 11:07 (-0800), James Taylor wrote:
> On behalf of the Apache Phoenix PMC, I'm pleased to announce that Kevin
> Liew has accepted our invitation to become a committer on the Apache
> Phoenix project. He's done a great job finding and fixing many important
> Phoenix JIRAs as well as most recently implementing support for default
> column value declarations [1] in our upcoming 4.9.0 release.
>
> Welcome aboard, Kevin. Looking forward to many more contributions!
>
> Regards,
> James
>
> [1] https://issues.apache.org/jira/browse/PHOENIX-476
>
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666071#comment-15666071 ]

James Taylor commented on PHOENIX-3451:
---

I think we're on the same page. I'm just suggesting we track the original PK position in the GroupByCompiler as we have the info there already.
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665917#comment-15665917 ]

Samarth Jain commented on PHOENIX-3481:
---

It turns out upgrading to HBase 0.98.23 may not be straightforward after all. I am running into a test failure in UpgradeIT#testAcquiringAndReleasingUpgradeMutex because of a bug in the HBase checkAndMutate api. See HBASE-17096 for details.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665874#comment-15665874 ]

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:30 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; the patch actually does what you said. My patch does not change the final OrderBy's orderByExpressions; an orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in the OrderPreservingTracker.isOrderPreserving method. The final OrderBy's orderByExpressions are not created from the Info's pkPosition; the Info's pkPosition only affects OrderPreservingTracker.isOrderPreserving. In the OrderPreservingTracker.isOrderPreserving method, the pkPosition must be the position in the original RowKey columns if the SQL has a GroupBy.
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665899#comment-15665899 ]

James Taylor commented on PHOENIX-3451:
---

Sorry about that - I based my feedback on your description in your very helpful analysis. Your patch is indeed on the right track, but it makes some assumptions. For example, it would only work if a GROUP BY is done directly on a RowKeyColumnExpression (i.e. GROUP BY pkCol1, pkCol2), but not for cases like GROUP BY pkCol1 + 1, TRUNC(pkCol2). I'd recommend adding information to the GroupBy object to capture the pk position of each Info. You only need to do that here in GroupByCompiler (by adding a new setter to the GroupByBuilder):

{code}
if (isOrderPreserving || isUngroupedAggregate) {
    return new GroupBy.GroupByBuilder(this)
            .setIsOrderPreserving(isOrderPreserving)
            .setOrderPreservingColumnCount(orderPreservingColumnCount)
            .build();
}
{code}

At that point, you still have the tracker, so you can add a method like tracker.getPkPositions() which returns a list or array of int which would be the position of the primary key that gets passed into the new Builder setter method. Then you can use that information in the OrderByCompiler if an aggregation is being done.
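The suggestion above (a builder setter plus a tracker.getPkPositions() accessor) could look roughly as follows. All names here (SketchGroupBy, setPkPositions) are illustrative stand-ins, not Phoenix's real GroupByCompiler API:

```java
import java.util.Arrays;

// Sketch of carrying the tracker's original pk positions through a
// builder so a later compile phase (e.g. an order-by compiler) can
// consult them. Hypothetical names; not Phoenix's actual classes.
public class SketchGroupBy {
    private final boolean orderPreserving;
    private final int[] pkPositions; // original row-key position per GROUP BY slot

    private SketchGroupBy(boolean orderPreserving, int[] pkPositions) {
        this.orderPreserving = orderPreserving;
        this.pkPositions = pkPositions;
    }

    public boolean isOrderPreserving() { return orderPreserving; }
    public int[] getPkPositions() { return pkPositions; }

    public static class Builder {
        private boolean orderPreserving;
        private int[] pkPositions = new int[0];

        public Builder setIsOrderPreserving(boolean b) { this.orderPreserving = b; return this; }
        public Builder setPkPositions(int[] p) { this.pkPositions = p; return this; }
        public SketchGroupBy build() { return new SketchGroupBy(orderPreserving, pkPositions); }
    }

    public static void main(String[] args) {
        SketchGroupBy gb = new SketchGroupBy.Builder()
                .setIsOrderPreserving(true)
                .setPkPositions(new int[] {1, 0})
                .build();
        System.out.println(gb.isOrderPreserving() + " " + Arrays.toString(gb.getPkPositions()));
    }
}
```

The builder shape mirrors the GroupByBuilder call in the snippet above: the tracker computes the positions once, and downstream compilation only reads them.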
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665874#comment-15665874 ]

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:26 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; the patch actually does what you said. My patch does not change the final OrderBy's orderByExpressions; an orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in the OrderPreservingTracker.isOrderPreserving method. In the OrderPreservingTracker.isOrderPreserving method, the pkPosition must be the position in the original RowKey columns if the SQL has a GroupBy.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665874#comment-15665874 ]

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:25 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I indeed patched it like you said. My patch does not change the final OrderBy's orderByExpressions; an orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in the OrderPreservingTracker.isOrderPreserving method. In the OrderPreservingTracker.isOrderPreserving method, the pkPosition must be the position in the original RowKey columns if the SQL has a GroupBy.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:25 AM: - [~jamestaylor], It seems you did not look at my uploaded PHOENIX-3451.diff, I indeed patch like you said. My patch do not change the final Orderby's orderByExpressions, orderByExpression's position is still the position in GroupBy.My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving method, In OrderPreservingTracker's isOrderPreserving method,the pkPosition must be the position in original RowKey columns if the sql exists GroupBy. was (Author: comnetwork): [~jamestaylor], It seems you did not look at my uploaded PHOENIX-3451.diff, I indeed patch like you said. My patch did not change the final Orderby's orderByExpressions, orderByExpression's position is still the position in GroupBy.My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving method, In OrderPreservingTracker's isOrderPreserving method,the pkPosition must be the position in original RowKey columns if the sql exists GroupBy. > Secondary index and query using distinct: LIMIT doesn't return the first rows > - > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Attachments: PHOENIX-3451.diff > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. 
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:24 AM:
-
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did patch it the way you said. My patch did not change the final OrderBy's orderByExpressions; an orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving method; in that method, the pkPosition must be the position among the original RowKey columns when the SQL has a GroupBy.

was (Author: comnetwork):
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did patch it the way you said. My patch did not change the final OrderBy's orderByExpressions; an orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
> This may be related to PHOENIX-3452 but the behavior is different so filing it separately.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:17 AM:
-
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did patch it the way you said. My patch did not change the final OrderBy's orderByExpressions; an orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving

was (Author: comnetwork):
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did patch it the way you said.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
> This may be related to PHOENIX-3452 but the behavior is different so filing it separately.
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei commented on PHOENIX-3451:
---
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did patch it the way you said.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
> This may be related to PHOENIX-3452 but the behavior is different so filing it separately.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665868#comment-15665868 ] James Taylor commented on PHOENIX-3451: --- [~comnetwork] - good, helpful analysis, but your conclusion isn't quite right. We want to use the pk position according to the GROUP BY because the rows returned from the server which are *aggregated* rows and the sort will be done on the client. The problem appears to be that our hasEqualityConstraints isn't taking this into account. It should be determining if there's an equality constraint for the GROUP BY expressions in those positions (instead of treating those as positions in the original schema). Probably the easiest fix would be to hold on to the OrderPreservingTracker.Info in a list for the GroupByCompiler. Then in the OrderByCompiler, we could look at the List from the GroupBy and essentially index it by position, but then translate it based on OrderPreservingTracker.Info.pkPosition. That would be the correct index to use when calling hasEqualityConstraints. If there's no list (i.e. it's not an aggregation), we'd just use the pkPosition to directly index the ScanRanges as we're doing now. > Secondary index and query using distinct: LIMIT doesn't return the first rows > - > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Attachments: PHOENIX-3451.diff > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. 
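The position translation James proposes can be pictured as a small lookup step (a hypothetical simplification with illustrative names, not the actual GroupByCompiler/OrderByCompiler code): the OrderBy's pkPosition indexes the aggregated row, and a list captured from the GroupBy, holding each GROUP BY expression's original row-key position, translates it back to a schema position before hasEqualityConstraints is consulted.

```java
import java.util.*;

public class PkPositionTranslator {
    // groupByPkPositions.get(i) = original row-key position of the i-th GROUP BY
    // expression; null when the query has no aggregation. Illustrative only --
    // Phoenix would carry this via OrderPreservingTracker.Info.pkPosition.
    static int toSchemaPosition(int orderByPosition, List<Integer> groupByPkPositions) {
        if (groupByPkPositions == null) {
            // No aggregation: the position already indexes the schema directly,
            // as the OrderByCompiler does today.
            return orderByPosition;
        }
        // Aggregated rows: translate the aggregated-row position back to the
        // original row-key position before checking equality constraints.
        return groupByPkPositions.get(orderByPosition);
    }

    public static void main(String[] args) {
        // GROUP BY pkCol2, pkCol3 (schema positions 1 and 2): ORDER BY position 0
        // refers to pkCol2, i.e. schema position 1.
        System.out.println(toSchemaPosition(0, Arrays.asList(1, 2))); // 1
    }
}
```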
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665767#comment-15665767 ] James Taylor commented on PHOENIX-3481: --- Actually, you're better off if you can track down and change it on the server-side as otherwise we'll need to check-in a new Phoenix jar to core (assuming this is potentially a real issue rather than a test only issue). > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch, > PHOENIX-3481_master_v2.patch, PHOENIX-3481_v3_0.98.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3333) Support Spark 2.0
[ https://issues.apache.org/jira/browse/PHOENIX-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665762#comment-15665762 ] DEQUN commented on PHOENIX-: Also confused, can you supply more details , thanks ! :-) > Support Spark 2.0 > - > > Key: PHOENIX- > URL: https://issues.apache.org/jira/browse/PHOENIX- > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 > Environment: spark 2.0 ,phoenix 4.8.0 , os is centos 6.7 ,hadoop is > hdp 2.5 >Reporter: dalin qin > Attachments: PHOENIX--interim.patch > > > spark version is 2.0.0.2.5.0.0-1245 > As mentioned by Josh , I believe spark 2.0 changed their api so that failed > phoenix. Please come up with update version to adapt spark's change. > In [1]: df = sqlContext.read \ >...: .format("org.apache.phoenix.spark") \ >...: .option("table", "TABLE1") \ >...: .option("zkUrl", "namenode:2181:/hbase-unsecure") \ >...: .load() > --- > Py4JJavaError Traceback (most recent call last) > in () > > 1 df = sqlContext.read .format("org.apache.phoenix.spark") > .option("table", "TABLE1") .option("zkUrl", > "namenode:2181:/hbase-unsecure") .load() > /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, > path, format, schema, **options) > 151 return > self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path))) > 152 else: > --> 153 return self._df(self._jreader.load()) > 154 > 155 @since(1.4) > /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py > in __call__(self, *args) > 931 answer = self.gateway_client.send_command(command) > 932 return_value = get_return_value( > --> 933 answer, self.gateway_client, self.target_id, self.name) > 934 > 935 for temp_arg in temp_args: > /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw) > 61 def deco(*a, **kw): > 62 try: > ---> 63 return f(*a, **kw) > 64 except py4j.protocol.Py4JJavaError as e: > 65 s = e.java_exception.toString() > 
/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py > in get_return_value(answer, gateway_client, target_id, name) > 310 raise Py4JJavaError( > 311 "An error occurred while calling {0}{1}{2}.\n". > --> 312 format(target_id, ".", name), value) > 313 else: > 314 raise Py4JError( > Py4JJavaError: An error occurred while calling o43.load. > : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.getDeclaredMethod(Class.java:2128) > at > java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475) > at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:472) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295) > at > org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288) > at > 
org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2037) > at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366) > at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365) > at >
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665754#comment-15665754 ] Samarth Jain commented on PHOENIX-3481: --- bq. The intent is that NotServingRegionException is translated on the server side to StaleRegionBoundaryCacheException I see. I wasn't sure if I should make the change in the ServerUtil method itself since it will make the change across the board. But from your comment it looks like it should be. Will commit the v2 patch then (provided my local run is successful). > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch, > PHOENIX-3481_master_v2.patch, PHOENIX-3481_v3_0.98.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3333) Support Spark 2.0
[ https://issues.apache.org/jira/browse/PHOENIX-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665749#comment-15665749 ] lichenglin commented on PHOENIX-:
-
In fact, there is no need to build Spark. Just add the Spark jars to the Phoenix project.

> Support Spark 2.0
> -
>
> Key: PHOENIX-
> URL: https://issues.apache.org/jira/browse/PHOENIX-
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.8.0
> Environment: spark 2.0, phoenix 4.8.0, os is centos 6.7, hadoop is hdp 2.5
> Reporter: dalin qin
> Attachments: PHOENIX--interim.patch
>
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665743#comment-15665743 ] James Taylor commented on PHOENIX-3481: --- v2 patch was simpler. Any reason why you changed it? The intent is that NotServingRegionException is translated on the server side to StaleRegionBoundaryCacheException, but it seems that there's a new code path where this is not being done. [~rajeshbabu] is the real expert in this area of the code. My preference is v2 as it'd be good to be consistent across the board with this. > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch, > PHOENIX-3481_master_v2.patch, PHOENIX-3481_v3_0.98.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
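The server-side translation James describes can be pictured as a catch-and-rethrow at the scan boundary. A hypothetical sketch with stand-in exception types (not the actual ServerUtil code): a NotServingRegionException surfacing from a scan is rethrown as a StaleRegionBoundaryCacheException so the client knows to refresh its region-boundary cache and retry.

```java
public class ExceptionTranslationSketch {
    // Stand-in types for illustration; the real classes live in HBase and Phoenix.
    static class NotServingRegionException extends RuntimeException {}
    static class StaleRegionBoundaryCacheException extends RuntimeException {}

    static void runScan(Runnable scan) {
        try {
            scan.run();
        } catch (NotServingRegionException e) {
            // The region moved or split since the client cached its boundaries:
            // surface that as a stale-cache signal rather than a hard failure.
            throw new StaleRegionBoundaryCacheException();
        }
    }

    public static void main(String[] args) {
        try {
            runScan(() -> { throw new NotServingRegionException(); });
        } catch (StaleRegionBoundaryCacheException e) {
            System.out.println("translated"); // prints "translated"
        }
    }
}
```

The bug under discussion is a code path where a scan reaches the client without going through a translation step like this one.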
[jira] [Updated] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-3481: -- Attachment: PHOENIX-3481_v3_0.98.patch v3 patch that makes the change localized in TableResultIterator. > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch, > PHOENIX-3481_master_v2.patch, PHOENIX-3481_v3_0.98.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665671#comment-15665671 ] Samarth Jain commented on PHOENIX-3481: --- [~jamestaylor], need your keen eyes again on this v2 patch. I am running tests locally on the 0.98 branch. > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch, > PHOENIX-3481_master_v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-3481: -- Attachment: PHOENIX-3481_master_v2.patch v2 patch for master branch. Hopefully this will trigger the QA run. > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch, > PHOENIX-3481_master_v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665597#comment-15665597 ] Andrew Purtell commented on PHOENIX-3481: - Thanks [~samarth.j...@gmail.com] > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665569#comment-15665569 ] Samarth Jain commented on PHOENIX-3481: --- [~apurtell] - https://github.com/apache/hbase/commit/381fcdcfdfd3bac274090cacfeea7c132ba8dd1e looks like the change that you are looking for. [~jamestaylor], I ran into another test failure locally with my patch : SkipScanAfterManualSplitIT#testManualSplit(). {code} org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: callTimeout=120, callDuration=9000109: row ' c' on table 'T02' at region=T02,,1479169996157.dd7e5b63ff5fdc6ef21fd3e26a6400a5., hostname=localhost,54017,1479169931042, seqNum=1 at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111) at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:752) at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:696) at org.apache.phoenix.iterate.ConcatResultIterator.getIterators(ConcatResultIterator.java:50) at org.apache.phoenix.iterate.ConcatResultIterator.currentIterator(ConcatResultIterator.java:97) at org.apache.phoenix.iterate.ConcatResultIterator.next(ConcatResultIterator.java:117) at org.apache.phoenix.iterate.BaseGroupedAggregatingResultIterator.next(BaseGroupedAggregatingResultIterator.java:64) at org.apache.phoenix.iterate.UngroupedAggregatingResultIterator.next(UngroupedAggregatingResultIterator.java:39) at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:778) at org.apache.phoenix.end2end.SkipScanAfterManualSplitIT.testManualSplit(SkipScanAfterManualSplitIT.java:123) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: java.util.concurrent.ExecutionException: org.apache.phoenix.exception.PhoenixIOException: callTimeout=120, callDuration=9000109: row ' c' on table 'T02' at region=T02,,1479169996157.dd7e5b63ff5fdc6ef21fd3e26a6400a5., hostname=localhost,54017,1479169931042, 
seqNum=1 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:202) at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:747) ... 34 more Caused by: org.apache.phoenix.exception.PhoenixIOException: callTimeout=120, callDuration=9000109: row ' c' on table 'T02' at region=T02,,1479169996157.dd7e5b63ff5fdc6ef21fd3e26a6400a5., hostname=localhost,54017,1479169931042, seqNum=1 at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111) at
[jira] [Updated] (PHOENIX-3469) Incorrect sort order for DESC primary key for NULLS LAST/NULLS FIRST
[ https://issues.apache.org/jira/browse/PHOENIX-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-3469: -- Attachment: PHOENIX-3469_v3.patch Patch on top of PHOENIX-3452
> Incorrect sort order for DESC primary key for NULLS LAST/NULLS FIRST
>
> Key: PHOENIX-3469
> URL: https://issues.apache.org/jira/browse/PHOENIX-3469
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: chenglei
> Assignee: chenglei
> Attachments: PHOENIX-3469_v2.patch, PHOENIX-3469_v3.patch
>
> This problem can be reproduced as follows:
> {code:borderStyle=solid}
> CREATE TABLE DESC_TEST (
>     ORGANIZATION_ID VARCHAR,
>     CONTAINER_ID VARCHAR,
>     ENTITY_ID VARCHAR NOT NULL,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID DESC,
>         CONTAINER_ID DESC,
>         ENTITY_ID
>     ))
>
> UPSERT INTO DESC_TEST VALUES ('a', null, '11')
> UPSERT INTO DESC_TEST VALUES (null, '2', '22')
> UPSERT INTO DESC_TEST VALUES ('c', '3', '33')
> {code}
> For the following SQL:
> {code:borderStyle=solid}
> SELECT CONTAINER_ID, ORGANIZATION_ID FROM DESC_TEST ORDER BY CONTAINER_ID ASC NULLS LAST
> {code}
> the expected result is:
> {code:borderStyle=solid}
> 2, null
> 3, c
> null, a
> {code}
> but the actual result is:
> {code:borderStyle=solid}
> null, a
> 2, null
> 3, c
> {code}
> By debugging the source code, I found that ScanPlan passes the OrderByExpressions to both the ScanRegionObserver and the MergeSortTopNResultIterator, at line 100 and line 232, but the OrderByExpression's "isNullsLast" property is false, while for the SQL "ORDER BY CONTAINER_ID ASC NULLS LAST" the "isNullsLast" property should be true.
> {code:borderStyle=solid}
>  90    private ScanPlan(StatementContext context, FilterableStatement statement, TableRef table, RowProjector projector, Integer limit, Integer offset, OrderBy orderBy, ParallelIteratorFactory parallelIteratorFactory, boolean allowPageFilter, Expression dynamicFilter) throws SQLException {
> ..
>  95        boolean isOrdered = !orderBy.getOrderByExpressions().isEmpty();
>  96        if (isOrdered) { // TopN
>  97            int thresholdBytes = context.getConnection().getQueryServices().getProps().getInt(
>  98                QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
>  99            ScanRegionObserver.serializeIntoScan(context.getScan(), thresholdBytes,
> 100                limit == null ? -1 : QueryUtil.getOffsetLimit(limit, offset), orderBy.getOrderByExpressions(),
> 101                projector.getEstimatedRowByteSize());
> 102        }
> ..
> 231    } else if (isOrdered) {
> 232        scanner = new MergeSortTopNResultIterator(iterators, limit, offset, orderBy.getOrderByExpressions());
> {code}
> So the problem is caused by the OrderByCompiler: at line 144 it should not negate "isNullsLast", because "isNullsLast" should not be influenced by the SortOrder, whether DESC or ASC:
> {code:borderStyle=solid}
> 142    if (expression.getSortOrder() == SortOrder.DESC) {
> 143        isAscending = !isAscending;
> 144        isNullsLast = !isNullsLast;
> 145    }
> {code}
> I include more IT test cases in my patch.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
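The fix described above can be sketched as a tiny pure function (a hypothetical simplification for illustration only, not the actual OrderByCompiler code):

```java
public class DescOrderFlags {
    // For a DESC row-key column the stored bytes are inverted, so the
    // requested scan direction must be flipped. NULLS FIRST/LAST, however,
    // is a property of the query itself, so it must be left alone.
    // The buggy version also flipped isNullsLast here.
    static boolean[] adjustForDesc(boolean isAscending, boolean isNullsLast) {
        return new boolean[] { !isAscending, isNullsLast };
    }

    public static void main(String[] args) {
        // ORDER BY CONTAINER_ID ASC NULLS LAST on a DESC-stored column:
        boolean[] flags = adjustForDesc(true, true);
        System.out.println(flags[0] + " " + flags[1]); // false true
    }
}
```

The direction flip stays because the byte encoding is inverted on disk; the nulls placement survives untouched, which is exactly the one-line change at line 144.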
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665530#comment-15665530 ] Andrew Purtell commented on PHOENIX-3481: - I will bisect to find the change that causes it, so we know > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3452) NULLS FIRST/NULL LAST should not impact whether GROUP BY is order preserving
[ https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-3452: -- Summary: NULLS FIRST/NULL LAST should not impact whether GROUP BY is order preserving (was: Secondary index and query using distinct: ORDER BY doesn't work correctly) > NULLS FIRST/NULL LAST should not impact whether GROUP BY is order preserving > > > Key: PHOENIX-3452 > URL: https://issues.apache.org/jira/browse/PHOENIX-3452 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Attachments: PHOENIX-3452_v2.patch, PHOENIX-3452_v3.patch > > > This may be related to PHOENIX-3451 but the behavior is different so filing > it separately. > Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > CONTAINER_ID CHAR(15) NOT NULL, > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, > ENTITY_ID DESC); > UPSERT INTO test.test VALUES ('org1','container1','entityId6',1.1); > UPSERT INTO test.test VALUES ('org1','container1','entityId5',1.2); > UPSERT INTO test.test VALUES ('org1','container1','entityId4',1.3); > UPSERT INTO test.test VALUES ('org1','container1','entityId3',1.4); > UPSERT INTO test.test VALUES ('org1','container1','entityId2',1.5); > UPSERT INTO test.test VALUES ('org1','container1','entityId1',1.6); > SELECT DISTINCT entity_id, score > FROM test.test > WHERE organization_id = 'org1' > AND container_id = 'container1' > ORDER BY score DESC > Notice that the returned results are not returned in descending score order. > Instead they are returned in descending entity_id order. If I remove the > DISTINCT or remove the secondary index the result is correct. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
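The point in the PHOENIX-3452 summary, that NULLS FIRST/NULLS LAST cannot change the relative order of non-null values, can be illustrated with plain Java comparators (illustrative only, not Phoenix code; the column here is NOT NULL, so no nulls ever appear):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NullsOrderingDemo {
    // Sorts a copy of keys, placing nulls first or last as requested.
    static List<String> sorted(List<String> keys, boolean nullsLast) {
        List<String> copy = new ArrayList<>(keys);
        Comparator<String> cmp = nullsLast
                ? Comparator.nullsLast(Comparator.<String>naturalOrder())
                : Comparator.nullsFirst(Comparator.<String>naturalOrder());
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("container2", "container1", "container3");
        // With no nulls present, both orderings are identical, which is why
        // the nulls-placement choice alone should not disqualify a GROUP BY
        // over NOT NULL key columns from being order preserving.
        System.out.println(sorted(keys, true).equals(sorted(keys, false))); // true
    }
}
```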
[jira] [Updated] (PHOENIX-3469) Incorrect sort order for DESC primary key for NULLS LAST/NULLS FIRST
[ https://issues.apache.org/jira/browse/PHOENIX-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-3469: -- Summary: Incorrect sort order for DESC primary key for NULLS LAST/NULLS FIRST (was: Once a column in primary key or index is DESC, the corresponding order by NULLS LAST/NULLS FIRST may work incorrectly) > Incorrect sort order for DESC primary key for NULLS LAST/NULLS FIRST > > > Key: PHOENIX-3469 > URL: https://issues.apache.org/jira/browse/PHOENIX-3469 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3469_v2.patch > > > This problem can be reproduced as following: > {code:borderStyle=solid} >CREATE TABLE DESC_TEST ( > ORGANIZATION_ID VARCHAR, > CONTAINER_ID VARCHAR, > ENTITY_ID VARCHAR NOT NULL, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID DESC, > CONTAINER_ID DESC, > ENTITY_ID > )) > UPSERT INTO DESC_TEST VALUES ('a',null,'11') > UPSERT INTO DESC_TEST VALUES (null,'2','22') > UPSERT INTO DESC_TEST VALUES ('c','3','33') > {code} > For the following sql: > {code:borderStyle=solid} > SELECT CONTAINER_ID,ORGANIZATION_ID FROM DESC_TEST order by > CONTAINER_ID ASC NULLS LAST > {code} > the expecting result is: > {code:borderStyle=solid} > 2, null > 3,c > null, a > {code} > but the actual result is: > {code:borderStyle=solid} > null, a > 2, null > 3,c > {code} > By debuging the source code,I found the ScanPlan passes the OrderByExpression > to both the ScanRegionObserver and MergeSortTopNResultIterator in line 100 > and line 232,but the OrderByExpression 's "isNullsLast" property is false, > while the sql is "order by CONTAINER_ID ASC NULLS LAST", the "isNullsLast" > property should be true. 
> {code:borderStyle=solid} > 90private ScanPlan(StatementContext context, FilterableStatement > statement, TableRef table, RowProjector projector, Integer limit, Integer > offset, OrderBy orderBy, ParallelIteratorFactory parallelIteratorFactory, > boolean allowPageFilter, Expression dynamicFilter) throws SQLException { > .. > 95 boolean isOrdered = !orderBy.getOrderByExpressions().isEmpty(); > 96 if (isOrdered) { // TopN > 97 int thresholdBytes = > context.getConnection().getQueryServices().getProps().getInt( > 98 QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, > QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); > 99 ScanRegionObserver.serializeIntoScan(context.getScan(), > thresholdBytes, > 100 limit == null ? -1 : QueryUtil.getOffsetLimit(limit, > offset), orderBy.getOrderByExpressions(), > 101 projector.getEstimatedRowByteSize()); > 102 } > .. > 231} else if (isOrdered) { > 232scanner = new MergeSortTopNResultIterator(iterators, limit, > offset, orderBy.getOrderByExpressions()); > {code} > so the problem is caused by the OrderByCompiler, in line 144, it should not > negative the "isNullsLast",because the "isNullsLast" should not be influenced > by the SortOrder,no matter it is DESC or ASC: > {code:borderStyle=solid} > 142 if (expression.getSortOrder() == SortOrder.DESC) { > 143 isAscending = !isAscending; > 144 isNullsLast = !isNullsLast; > 145 } > {code} > I include more IT test cases in my patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665462#comment-15665462 ] James Taylor commented on PHOENIX-3481: --- +1. Thanks, [~samarthjain]. Weird that this just started happening for 0.98.21+. > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-3481: -- Attachment: PHOENIX-3481_master.patch Attaching patch for master branch to get a qa run. > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch, PHOENIX-3481_master.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-3481: -- Attachment: PHOENIX-3481.patch The reason is a race condition between the call Phoenix initiates to asynchronously modify the SYSTEM.CATALOG HBase metadata and the call that writes the Phoenix metadata for SYSTEM.CATALOG. The simple fix is to wait for the async call to finish. [~jamestaylor], please review. > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3481.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
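The "wait for the async call to finish" pattern can be sketched with java.util.concurrent (an illustrative stand-in, not the actual HBase admin API; the simulated delay and return strings are hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class WaitForAsyncDemo {
    // The race: firing the metadata change and immediately issuing dependent
    // calls. The fix: block until the async operation reports completion
    // before any dependent work proceeds.
    static String createCatalogThenUse(ExecutorService pool) throws Exception {
        Future<String> asyncAlter = pool.submit(() -> {
            Thread.sleep(50);              // simulated async metadata change
            return "SYSTEM.CATALOG ready";
        });
        String state = asyncAlter.get(10, TimeUnit.SECONDS); // the fix: wait
        return state + " -> safe to write Phoenix metadata";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(createCatalogThenUse(pool));
        pool.shutdown();
    }
}
```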
[jira] [Commented] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665378#comment-15665378 ] Samarth Jain commented on PHOENIX-3481: --- To reproduce this issue, run any IT test. The start up fails with the below error: {code} 2016-11-14 15:27:05,427 WARN [main] org.apache.hadoop.hbase.client.HTable(1725): Error calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00SYSTEM\x00CATALOG java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: callTimeout=120, callDuration=9000103: row 'SYSTEMCATALOG' on table 'SYSTEM.CATALOG' at region=SYSTEM.CATALOG,,1479166015987.0c94299fe4e5271c0b7a12aba74707d7., hostname=localhost,52826,1479165992749, seqNum=1 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1723) at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1680) at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1283) at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1263) at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1448) at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2275) at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:939) at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:211) at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:355) at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:1) at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:337) at 
org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1440) at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2410) at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:1) at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76) at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2358) at org.apache.phoenix.jdbc.PhoenixTestDriver.getConnectionQueryServices(PhoenixTestDriver.java:96) at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:147) at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:141) at org.apache.phoenix.jdbc.PhoenixTestDriver.connect(PhoenixTestDriver.java:83) at org.apache.phoenix.query.BaseTest.initAndRegisterTestDriver(BaseTest.java:680) at org.apache.phoenix.query.BaseTest.setUpTestDriver(BaseTest.java:564) at org.apache.phoenix.query.BaseTest.setUpTestDriver(BaseTest.java:557) at org.apache.phoenix.end2end.ParallelStatsDisabledIT.doSetup(ParallelStatsDisabledIT.java:36) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: java.net.SocketTimeoutException:
[jira] [Updated] (PHOENIX-3481) Phoenix initialization fails for HBase 0.98.21 and beyond
[ https://issues.apache.org/jira/browse/PHOENIX-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-3481: -- Summary: Phoenix initialization fails for HBase 0.98.21 and beyond (was: Phoenix initialization fails for 0.98.21 and beyond) > Phoenix initialization fails for HBase 0.98.21 and beyond > - > > Key: PHOENIX-3481 > URL: https://issues.apache.org/jira/browse/PHOENIX-3481 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-3481) Phoenix initialization fails for 0.98.21 and beyond
Samarth Jain created PHOENIX-3481: - Summary: Phoenix initialization fails for 0.98.21 and beyond Key: PHOENIX-3481 URL: https://issues.apache.org/jira/browse/PHOENIX-3481 Project: Phoenix Issue Type: Bug Reporter: Samarth Jain Assignee: Samarth Jain Fix For: 4.9.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Coprocessor metrics
Thanks everyone, this is really helpful. If we can leverage in HBase the work Josh/Nick already did in Avatica, that will be really good. It seems the consensus is to follow the #2 approach, and pay the price of replicating the API layer in HBase for the convenience of coprocessors, rather than tying ourselves to a third-party API + implementation. However, even if we do an Avatica release, HBase depending on the metrics API in Avatica is the same thing as HBase depending on dropwizard directly, since HBase does not "control" the Avatica API either. At this point blindly forking the code inside HBase seems like the way to go (possibly in its own module). Let me poke around, and fork the code if possible inside HBase. I'll send reviews your way. Enis On Mon, Nov 14, 2016 at 7:58 AM, Josh Elser wrote: > Yep -- see avatica-metrics[1], avatica-dropwizard-metrics3[2], and my > dropwizard-hadoop-metrics2[3] project for what Nick is referring to. > > What I ended up doing in Calcite/Avatica was a step beyond your #3, Enis. > Instead of choosing a subset of some standard metrics library to expose, I > "re-built" the actual API that I wanted to expose. At the end of the day, > the API I "built" was nearly 100% what dropwizard metrics' API was. I like > the dropwizard-metrics API; however, we wanted to avoid the strong coupling > to a single metrics implementation. > > My current feeling is that external API should never include > classes/interfaces which you don't "own". Re-building the API that already > exists is pedantic, but I think it's a really good way to pay down the > maintenance debt (whenever the next metrics library "hotness" takes off). > > If it's amenable to you, Enis, I'm happy to work with you to do whatever > decoupling of this metrics abstraction away from the "core" of Avatica > (e.g. presently, a new update of the library would also require a full > release of Avatica which is no-good for HBase). 
I think a lot of the > lifting I've done already would be reusable by you and help make a better > product at the end of the day. > > - Josh > > [1] https://github.com/apache/calcite/tree/master/avatica/metrics > [2] https://github.com/apache/calcite/tree/master/avatica/metrics-dropwizardmetrics3 > [3] https://github.com/joshelser/dropwizard-hadoop-metrics2 > > > Nick Dimiduk wrote: > >> IIRC, the plan is to get off of Hadoop Metrics2, so I am in favor of >> either >> (2) or (3). Specifically for (3), I believe there is an implementation for >> translating Dropwizard Metrics to Hadoop Metrics2, in or around Avatica >> and/or Phoenix Query Server. >> >> On Fri, Nov 11, 2016 at 3:15 PM, Enis Söztutar wrote: >> >> HBase / Phoenix devs, >>> >>> I would like to solicit early feedback on the design approach that we >>> would >>> pursue for exposing coprocessor metrics. It has implications for our >>> compatibility, so let's try to have some consensus. Added Phoenix devs as >>> well since this will affect how coprocessors can emit metrics via the region >>> server metrics bus. >>> >>> The issue is HBASE-9774 [1]. >>> >>> >>> We have a couple of options: >>> >>> (1) Expose Hadoop Metrics2 + HBase internal classes (like BaseSourceImpl, >>> MutableFastCounter, FastLongHistogram, etc). This option is the least >>> amount of work in terms of defining the API. We would mark the important >>> classes with LimitedPrivate(Coprocessor) and have the coprocessors each >>> write their metrics source classes separately. The disadvantage would be >>> that some of the internal APIs are now public and have to be evolved with >>> regard to coprocessor API compatibility. It will also make it easier to >>> break coprocessors across minor releases. >>> (2) Build a Metrics subset API in HBase to abstract away HBase metrics >>> classes and Hadoop2 metrics classes and expose this API only. The API >>> will >>> probably be limited and will be a small subset. 
HBase internals do not >>> need >>> to be changed that much, but the API has to be kept >>> LimitedPrivate(Coprocessor) with the compatibility implications. >>> (3) Expose (a limited subset of) third-party API to the coprocessors >>> (like >>> Yammer metrics) and never expose internal HBase / Hadoop implementation. >>> Build a translation layer between the yammer metrics and our Hadoop >>> metrics >>> 2 implementation so that things will still work. If we end up changing >>> the >>> implementation, existing coprocessors will not be affected. The downside >>> is >>> that whatever API that we agree to expose becomes our compatibility >>> point. >>> We cannot change that dependency version unless it is acceptable via our >>> compatibility guidelines. >>> >>> Personally, I would like to pursue option (3) especially with Yammer >>> metrics since we do not have to build yet another API endpoint. Hadoop's >>> metrics API is not the best and we do not know whether we will end up >>> changing that dependency. What do you guys think? >>>
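The shape of option (2)/(3) above, a small metrics API that the project "owns" with the backing library hidden behind it, might look roughly like this (an illustrative sketch; all names here are assumptions, not actual HBase or Avatica interfaces):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// The owned facade: coprocessors code against only these two types,
// so the backing implementation (Hadoop metrics2, dropwizard, ...)
// can be swapped without breaking coprocessor compatibility.
interface Counter {
    void increment(long n);
    long value();
}

interface MetricRegistry {
    Counter counter(String name);
}

// One possible backing implementation, kept entirely internal.
final class SimpleMetricRegistry implements MetricRegistry {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    @Override
    public Counter counter(String name) {
        final LongAdder adder = counters.computeIfAbsent(name, k -> new LongAdder());
        return new Counter() {
            @Override public void increment(long n) { adder.add(n); }
            @Override public long value() { return adder.sum(); }
        };
    }
}

public class MetricsFacadeDemo {
    public static void main(String[] args) {
        MetricRegistry registry = new SimpleMetricRegistry();
        registry.counter("coprocessor.scans").increment(2);
        registry.counter("coprocessor.scans").increment(3);
        System.out.println(registry.counter("coprocessor.scans").value()); // 5
    }
}
```

This mirrors the trade-off discussed in the thread: re-declaring the interfaces is extra work up front, but the compatibility surface is only what the project chooses to expose.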
Re: Coprocessor metrics
+1 We also need to get away from Guava's Service interface for replication endpoint plugins using this same approach (#2): HBASE-15982 On Mon, Nov 14, 2016 at 10:37 AM, Gary Helmling wrote: > > > > > > My current feeling is that external API should never include > > classes/interfaces which you don't "own". Re-building the API that > > already exists is pedantic, but I think it's a really good way to pay > > down the maintenance debt (whenever the next metrics library "hotness" > > takes off). > > > > > +1 to this. I'd be very hesitant to tie ourselves too strongly to a > specific implementation, even if it is just copying an interface. > > For coprocessors specifically, I think we can start with a limited API > exposing common metric types and evolve it from there. But starting simple > seems key. > > So #2 seems like the right approach to me. > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Coprocessor metrics
> > > My current feeling is that external API should never include > classes/interfaces which you don't "own". Re-building the API that > already exists is pedantic, but I think it's a really good way to pay > down the maintenance debt (whenever the next metrics library "hotness" > takes off). > > +1 to this. I'd be very hesitant to tie ourselves too strongly to a specific implementation, even if it is just copying an interface. For coprocessors specifically, I think we can start with a limited API exposing common metric types and evolve it from there. But starting simple seems key. So #2 seems like the right approach to me.
[jira] [Commented] (PHOENIX-3241) Convert_tz doesn't allow timestamp data type
[ https://issues.apache.org/jira/browse/PHOENIX-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664597#comment-15664597 ] Josh Elser commented on PHOENIX-3241: - Thanks, [~an...@apache.org]! [~jamestaylor], unless I hear otherwise from you, I will wait for 4.9.0 to finish before landing this. [~lhofhansl] ditto for 4.8.2 > Convert_tz doesn't allow timestamp data type > > > Key: PHOENIX-3241 > URL: https://issues.apache.org/jira/browse/PHOENIX-3241 > Project: Phoenix > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Josh Elser > Fix For: 4.10.0 > > Attachments: PHOENIX-3241.002.patch, PHOENIX-3241.003.patch, > PHOENIX-3241.patch > > > As per the documentation, we allow the TIMESTAMP data type for convert_tz, but as per > the code only the DATE data type is allowed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
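The wall-clock conversion that a CONVERT_TZ-style function performs, and which should apply equally to DATE and TIMESTAMP values, can be sketched with java.time (a sketch of the intended semantics, not Phoenix's implementation):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class ConvertTzDemo {
    // Reinterprets a wall-clock date-time from one zone into another:
    // pin the value to fromTz to get an absolute instant, then read
    // that instant back in toTz.
    static LocalDateTime convertTz(LocalDateTime ts, String fromTz, String toTz) {
        Instant instant = ts.atZone(ZoneId.of(fromTz)).toInstant();
        return LocalDateTime.ofInstant(instant, ZoneId.of(toTz));
    }

    public static void main(String[] args) {
        LocalDateTime utc = LocalDateTime.of(2016, 11, 14, 12, 0, 0);
        // Mid-November is outside DST, so Los Angeles is UTC-8:
        System.out.println(convertTz(utc, "UTC", "America/Los_Angeles")); // 2016-11-14T04:00
    }
}
```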
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 4:19 PM: - This bug is caused by the OrderByCompiler; my analysis is as follows. Take the following table, described by [~jpalmert], as an example:
{code:borderStyle=solid}
CREATE TABLE IF NOT EXISTS TEST.TEST (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    CONTAINER_ID CHAR(15) NOT NULL,
    ENTITY_ID CHAR(15) NOT NULL,
    SCORE DOUBLE,
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        CONTAINER_ID,
        ENTITY_ID
    )
)

CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID, CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);

UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
{code}
For the following query SQL:
{code:borderStyle=solid}
SELECT DISTINCT entity_id, score
FROM test.test
WHERE organization_id = 'org2'
AND container_id IN ( 'container1','container2','container3' )
ORDER BY score DESC
LIMIT 2
{code}
Phoenix would use the following index table TEST_SCORE for the query:
{code:borderStyle=solid}
CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID, CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);
{code}
Using that index is fine; the problem is that the OrderByCompiler thinks the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY, but since the where condition is "container_id IN ( 'container1','container2','container3' )", the OrderBy is obviously not OrderBy.FWD_ROW_KEY_ORDER_BY. 
When we look into OrderByCompiler's compile method, at line 123 the "score" ColumnParseNode in "ORDER BY score DESC" accepts an ExpressionCompiler visitor:
{code:borderStyle=solid}
123    expression = node.getNode().accept(compiler);
124    // Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns)
125    if (!expression.isStateless() && !compiler.isAggregate()) {
126        if (statement.isAggregate() || statement.isDistinct()) {
127            // Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x
128            if (statement.isDistinct()) {
129                throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT)
130                    .setMessage(expression.toString()).build().buildException();
131            }
132            ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString());
{code}
In ExpressionCompiler's visit method, the "score" ColumnParseNode is converted to a KeyValueColumnExpression at line 408; then at line 409 the wrapGroupByExpression method is invoked:
{code:borderStyle=solid}
393    public Expression visit(ColumnParseNode node) throws SQLException {
408        Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive());
409        Expression wrappedExpression = wrapGroupByExpression(expression);
{code}
In the wrapGroupByExpression method, because the "score" column is in groupBy.getExpressions(), which is "[ENTITY_ID, SCORE]", the KeyValueColumnExpression is replaced by a RowKeyColumnExpression at line 291; and because the index of the "score" column in "[ENTITY_ID, SCORE]" is 1, the return value of RowKeyColumnExpression's position method is 1:
{code:borderStyle=solid}
282    private Expression wrapGroupByExpression(Expression expression) {
.
286        if (aggregateFunction == null) {
287            int index = groupBy.getExpressions().indexOf(expression);
288            if (index >= 0) {
289                isAggregate = true;
290                RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index);
291                expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType());
292            }
293        }
294        return expression;
295    }
{code}
So when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, at line 108 the returned Info's pkPosition is 1:
{code:borderStyle=solid}
106    public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) {
107        if (isOrderPreserving) {
108            Info info =
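The position mismatch described above, where "score" reports position 1 within the GROUP BY key [ENTITY_ID, SCORE] even though it sits at a different slot in the index row key, can be sketched with plain lists (an illustrative simplification using column names, not the actual Phoenix expression classes):

```java
import java.util.Arrays;
import java.util.List;

public class GroupByPositionDemo {
    // A RowKeyColumnExpression produced by wrapGroupByExpression carries a
    // position *within the GROUP BY key*. Before reasoning about whether an
    // ORDER BY is order preserving over the physical row key, that position
    // must be translated back to the column's slot in the real row key.
    static int originalPkPosition(List<String> groupByKey, List<String> rowKey, int groupByPos) {
        String column = groupByKey.get(groupByPos);
        return rowKey.indexOf(column);
    }

    public static void main(String[] args) {
        List<String> groupByKey = Arrays.asList("ENTITY_ID", "SCORE");
        List<String> indexRowKey = Arrays.asList(
                "ORGANIZATION_ID", "CONTAINER_ID", "SCORE", "ENTITY_ID");
        // "score" is position 1 in the GROUP BY key but position 2 in the
        // index row key; conflating the two is the source of the bug.
        System.out.println(originalPkPosition(groupByKey, indexRowKey, 1)); // 2
    }
}
```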
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 4:18 PM: - This bug is caused by the OrderByCompiler,my analysis of this bug is as follows: take following table which describe by [~jpalmert] as a example : {code:borderStyle=solid} CREATE TABLE IF NOT EXISTS TEST.TEST ( ORGANIZATION_ID CHAR(15) NOT NULL, CONTAINER_ID CHAR(15) NOT NULL, ENTITY_ID CHAR(15) NOT NULL, SCORE DOUBLE, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, ENTITY_ID ) ) CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); {code} for the following query sql : {code:borderStyle=solid} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id = 'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY score DESC LIMIT 2 {code} the phoenix would use the following index table TEST_SCORE to do the query: {code:borderStyle=solid} CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); {code} Using that index is good,the problem is that the OrderByCompiler thinking the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY,but because we can see the where condition is "container_id IN ( 'container1','container2','container3' )", obviously OrderBy is not OrderBy.FWD_ROW_KEY_ORDER_BY. 
When we look into OrderByCompiler's compile method, in line 123, the "score" ColumnParseNode in "ORDER BY score DESC" accepts a ExpressionCompiler visitor: {code:borderStyle=solid} 123expression = node.getNode().accept(compiler); 124// Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns) 125if (!expression.isStateless() && !compiler.isAggregate()) { 126if (statement.isAggregate() || statement.isDistinct()) { 127// Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x 128if (statement.isDistinct()) { 129throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT) 130 .setMessage(expression.toString()).build().buildException(); 131} 132 ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString()); {code} In ExpressionCompiler 's visit method,the "score" ColumnParseNode is converted to a KeyValueColumnExpression in line 408,then in line 409,wrapGroupByExpression method is invoked: {code:borderStyle=solid} 393 public Expression visit(ColumnParseNode node) throws SQLException { 408 Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive()); 409 Expression wrappedExpression = wrapGroupByExpression(expression); {code} in wrapGroupByExpression method,because the "score" column is in groupBy.getExpressions(),which is "[ENTITY_ID, SCORE]",so KeyValueColumnExpression is replaced by RowKeyColumnExpression in line 291, and because the index of "score" column in "[ENTITY_ID, SCORE]" is 1,so the return value of RowKeyColumnExpression 's position method is 1 : {code:borderStyle=solid} 282 private Expression wrapGroupByExpression(Expression expression) { . 
286        if (aggregateFunction == null) {
287            int index = groupBy.getExpressions().indexOf(expression);
288            if (index >= 0) {
289                isAggregate = true;
290                RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index);
291                expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType());
292            }
293        }
294        return expression;
295    }
{code}
So when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, at line 108 the returned Info's pkPosition is 1:
{code:borderStyle=solid}
106    public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) {
107        if (isOrderPreserving) {
108            Info info =
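That pkPosition of 1 is a GROUP BY output position, not a position in the scanned row key, which is where the tracker goes wrong. The remapping idea behind the fix — an ORDER BY expression compiled against GROUP BY output position p is only order-preserving if the p-th GROUP BY key is itself a plain row-key column, in which case the position that matters is that column's original row-key position — can be sketched as follows (all names hypothetical, modeled on the visit(RowKeyColumnExpression) logic; not the actual patch):

```java
import java.util.Arrays;
import java.util.List;

class PkPositionRemap {
    // rowKeyPosOfGroupByExpr.get(i) holds the original row-key position of the
    // i-th GROUP BY key if it is a bare RowKeyColumnExpression, else null
    // (e.g. TRUNC(pkCol2) or pkCol1 + 1); null means "not order-preserving".
    static Integer originalPkPosition(int groupByOutputPos, List<Integer> rowKeyPosOfGroupByExpr) {
        if (groupByOutputPos >= rowKeyPosOfGroupByExpr.size()) return null;
        return rowKeyPosOfGroupByExpr.get(groupByOutputPos);
    }

    public static void main(String[] args) {
        // index PK: ORGANIZATION_ID=0, CONTAINER_ID=1, SCORE=2, ENTITY_ID=3;
        // the GROUP BY keys [ENTITY_ID, SCORE] sit at original positions [3, 2]
        List<Integer> groupByKeys = Arrays.asList(3, 2);
        // "ORDER BY score" compiles to GROUP BY output position 1, but the real
        // row-key slot is 2; tracking slot 1 (CONTAINER_ID's) is the bug
        System.out.println(originalPkPosition(1, groupByKeys));
    }
}
```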
[jira] [Updated] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3451: -- Attachment: PHOENIX-3451.diff
> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -----
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
> This may be related to PHOENIX-3452 but the behavior is different, so filing it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
>     ORGANIZATION_ID CHAR(15) NOT NULL,
>     CONTAINER_ID CHAR(15) NOT NULL,
>     ENTITY_ID CHAR(15) NOT NULL,
>     SCORE DOUBLE,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         CONTAINER_ID,
>         ENTITY_ID
>     )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5 1.2
> entityId3 1.4
> The expected output would be
> entityId8 1.45
> entityId3 1.4
> You will get the expected output if you remove the secondary index from the table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the ordering is not correct.
> However, the first 2 results in that ordering are still not the ones returned by the limit clause, which makes me think there are multiple issues here and is why I filed both separately. The rows being returned are the ones assigned to container1. It looks like Phoenix first gets the rows from the first container and, when it finds that to be enough, stops the scan. What it should do is get 2 results for each container, then merge them, and then apply the limit again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
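The merge-then-limit strategy the reporter describes can be sketched as a toy Java model (hypothetical names, sample data from this report; not Phoenix code): take up to LIMIT rows from each container's already score-sorted slice, merge, and apply the limit again.

```java
import java.util.*;

class MergeThenLimit {
    // Each slice is sorted by score DESC, so its first n rows are the only
    // candidates for the global top n. Rows are {entityId, score} pairs.
    static List<String> topN(List<List<Object[]>> slices, int n) {
        List<Object[]> candidates = new ArrayList<>();
        for (List<Object[]> slice : slices)
            candidates.addAll(slice.subList(0, Math.min(n, slice.size())));
        // merge: re-sort the small candidate set by score DESC
        candidates.sort((a, b) -> Double.compare((double) b[1], (double) a[1]));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, candidates.size()); i++)
            out.add((String) candidates.get(i)[0]);
        return out;
    }

    // per-container slices from the report's data, each sorted score DESC
    static List<List<Object[]>> sample() {
        return Arrays.asList(
            Arrays.asList(new Object[]{"entityId3", 1.4},  new Object[]{"entityId5", 1.2}),   // container1
            Arrays.asList(new Object[]{"entityId4", 1.3},  new Object[]{"entityId6", 1.1}),   // container2
            Arrays.asList(new Object[]{"entityId8", 1.45}, new Object[]{"entityId7", 1.35})); // container3
    }

    public static void main(String[] args) {
        System.out.println(topN(sample(), 2)); // the expected answer from the report
    }
}
```

With the sample data this yields entityId8 (1.45) and entityId3 (1.4), the output the reporter expected.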
Re: Coprocessor metrics
Yep -- see avatica-metrics[1], avatica-dropwizard-metrics3[2], and my dropwizard-hadoop-metrics2[3] project for what Nick is referring to. What I ended up doing in Calcite/Avatica was a step beyond your #3, Enis. Instead of choosing a subset of some standard metrics library to expose, I "re-built" the actual API that I wanted to expose. At the end of the day, the API I "built" was nearly 100% what dropwizard metrics' API was. I like the dropwizard-metrics API; however, we wanted to avoid the strong coupling to a single metrics implementation. My current feeling is that an external API should never include classes/interfaces which you don't "own". Re-building an API that already exists is pedantic, but I think it's a really good way to pay down the maintenance debt (whenever the next metrics library "hotness" takes off). If it's amenable to you, Enis, I'm happy to work with you to do whatever decoupling of this metrics abstraction away from the "core" of Avatica is needed (e.g. presently, a new update of the library would also require a full release of Avatica, which is no good for HBase). I think a lot of the lifting I've done already would be reusable by you and help make a better product at the end of the day. - Josh [1] https://github.com/apache/calcite/tree/master/avatica/metrics [2] https://github.com/apache/calcite/tree/master/avatica/metrics-dropwizardmetrics3 [3] https://github.com/joshelser/dropwizard-hadoop-metrics2 Nick Dimiduk wrote: IIRC, the plan is to get off of Hadoop Metrics2, so I am in favor of either (2) or (3). Specifically for (3), I believe there is an implementation for translating Dropwizard Metrics to Hadoop Metrics2 in or around Avatica and/or Phoenix Query Server. On Fri, Nov 11, 2016 at 3:15 PM, Enis Söztutar wrote: HBase / Phoenix devs, I would like to solicit early feedback on the design approach that we would pursue for exposing coprocessor metrics. It has implications for our compatibility, so let's try to reach some consensus. 
Added Phoenix devs as well, since this will affect how coprocessors can emit metrics via the region server metrics bus. The issue is HBASE-9774 [1]. We have a couple of options:

(1) Expose Hadoop Metrics2 + HBase internal classes (like BaseSourceImpl, MutableFastCounter, FastLongHistogram, etc). This option is the least amount of work in terms of defining the API. We would mark the important classes with LimitedPrivate(Coprocessor) and have the coprocessors each write their metrics source classes separately. The disadvantage is that some of the internal APIs become public and have to be evolved with regard to coprocessor API compatibility. It also makes it easier to break coprocessors across minor releases.

(2) Build a metrics subset API in HBase to abstract away the HBase metrics classes and Hadoop Metrics2 classes, and expose only this API. The API will probably be limited and a small subset. HBase internals do not need to change much, but the API has to be kept LimitedPrivate(Coprocessor), with the compatibility implications that brings.

(3) Expose (a limited subset of) a third-party API to the coprocessors (like Yammer metrics) and never expose internal HBase / Hadoop implementations. Build a translation layer between the Yammer metrics and our Hadoop Metrics2 implementation so that things still work. If we end up changing the implementation, existing coprocessors will not be affected. The downside is that whatever API we agree to expose becomes our compatibility point. We cannot change that dependency version unless it is acceptable under our compatibility guidelines.

Personally, I would like to pursue option (3), especially with Yammer metrics, since we do not have to build yet another API endpoint. Hadoop's metrics API is not the best, and we do not know whether we will end up changing that dependency. What do you guys think? [1] https://issues.apache.org/jira/browse/HBASE-9774
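The "never expose classes you don't own" idea behind options (2) and (3) can be sketched as a minimal facade. All interface names below are hypothetical, not actual HBase API; the in-memory backing stands in for whatever real registry (Hadoop Metrics2, Dropwizard/Yammer) a translation layer would delegate to.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Coprocessor-facing interfaces: no Hadoop, HBase-internal, or third-party
// types appear, so the backing implementation stays swappable.
interface CoprocessorCounter {
    void increment(long delta);
    long value();
}

interface CoprocessorMetrics {
    CoprocessorCounter counter(String name);
}

// Trivial in-memory backing; a real registry would translate these calls to
// the chosen metrics library behind the scenes.
class InMemoryMetrics implements CoprocessorMetrics {
    private final Map<String, CoprocessorCounter> counters = new HashMap<>();

    public synchronized CoprocessorCounter counter(String name) {
        return counters.computeIfAbsent(name, n -> new CoprocessorCounter() {
            private final AtomicLong v = new AtomicLong();
            public void increment(long delta) { v.addAndGet(delta); }
            public long value() { return v.get(); }
        });
    }
}
```

Swapping the implementation then changes nothing for coprocessors, which only ever compile against the two interfaces.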
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 3:21 PM: - This bug is caused by the OrderByCompiler,my analysis of this bug is as follows: take following table which describe by [~jpalmert] as a example : {code:borderStyle=solid} CREATE TABLE IF NOT EXISTS TEST.TEST ( ORGANIZATION_ID CHAR(15) NOT NULL, CONTAINER_ID CHAR(15) NOT NULL, ENTITY_ID CHAR(15) NOT NULL, SCORE DOUBLE, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, ENTITY_ID ) ) CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); {code} for the following query sql : {code:borderStyle=solid} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id = 'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY score DESC LIMIT 2 {code} the phoenix would use the following index table TEST_SCORE to do the query: {code:borderStyle=solid} CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); {code} Using that index is good,the problem is that the OrderByCompiler thinking the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY,but because we can see the where condition is "container_id IN ( 'container1','container2','container3' )", obviously OrderBy is not OrderBy.FWD_ROW_KEY_ORDER_BY. 
When we look into OrderByCompiler's compile method, in line 123, the "score" ColumnParseNode in "ORDER BY score DESC" accepts a ExpressionCompiler visitor: {code:borderStyle=solid} 123expression = node.getNode().accept(compiler); 124// Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns) 125if (!expression.isStateless() && !compiler.isAggregate()) { 126if (statement.isAggregate() || statement.isDistinct()) { 127// Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x 128if (statement.isDistinct()) { 129throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT) 130 .setMessage(expression.toString()).build().buildException(); 131} 132 ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString()); {code} In ExpressionCompiler 's visit method,the "score" ColumnParseNode is converted to a KeyValueColumnExpression in line 408,then in line 409,wrapGroupByExpression method is invoked: {code:borderStyle=solid} 393 public Expression visit(ColumnParseNode node) throws SQLException { 408 Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive()); 409 Expression wrappedExpression = wrapGroupByExpression(expression); {code} in wrapGroupByExpression method,because the "score" column is in groupBy.getExpressions(),which is "[ENTITY_ID, SCORE]",so KeyValueColumnExpression is replaced by RowKeyColumnExpression in line 291, and because the index of "score" column in "[ENTITY_ID, SCORE]" is 1,so the return value of RowKeyColumnExpression 's position method is 1 : {code:borderStyle=solid} 282 private Expression wrapGroupByExpression(Expression expression) { . 
286if (aggregateFunction == null) { 287int index = groupBy.getExpressions().indexOf(expression); 288if (index >= 0) { 289isAggregate = true; 290RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index); 291expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType()); 292} 293} 294return expression; 295} {code} so when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, in line 108,the return Info's pkPosition is 1: {code:borderStyle=solid} 106public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) { 107 if (isOrderPreserving) { 108 Info info =
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 3:19 PM: - This bug is caused by the OrderByCompiler,my analysis of this bug is as follows: take following table which describe by [~jpalmert] as a example : {code:borderStyle=solid} CREATE TABLE IF NOT EXISTS TEST.TEST ( ORGANIZATION_ID CHAR(15) NOT NULL, CONTAINER_ID CHAR(15) NOT NULL, ENTITY_ID CHAR(15) NOT NULL, SCORE DOUBLE, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, ENTITY_ID ) ) CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); {code} for the following query sql : {code:borderStyle=solid} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id = 'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY score DESC LIMIT 2 {code} the phoenix would use the following index table TEST_SCORE to do the query: {code:borderStyle=solid} CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); {code} Using that index is good,the problem is that the OrderByCompiler thinking the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY,but because we can see the where condition is "container_id IN ( 'container1','container2','container3' )", obviously OrderBy is not OrderBy.FWD_ROW_KEY_ORDER_BY. 
When we look into OrderByCompiler's compile method, in line 123, the "score" ColumnParseNode in "ORDER BY score DESC" accepts a ExpressionCompiler visitor: {code:borderStyle=solid} 123expression = node.getNode().accept(compiler); 124// Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns) 125if (!expression.isStateless() && !compiler.isAggregate()) { 126if (statement.isAggregate() || statement.isDistinct()) { 127// Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x 128if (statement.isDistinct()) { 129throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT) 130 .setMessage(expression.toString()).build().buildException(); 131} 132 ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString()); {code} In ExpressionCompiler 's visit method,the "score" ColumnParseNode is converted to a KeyValueColumnExpression in line 408,then in line 409,wrapGroupByExpression method is invoked: {code:borderStyle=solid} 393 public Expression visit(ColumnParseNode node) throws SQLException { 408 Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive()); 409 Expression wrappedExpression = wrapGroupByExpression(expression); {code} in wrapGroupByExpression method,because the "score" column is in groupBy.getExpressions(),which is "[ENTITY_ID, SCORE]",so KeyValueColumnExpression is replaced by RowKeyColumnExpression in line 291, and because the index of "score" column in "[ENTITY_ID, SCORE]" is 1,so the return value of RowKeyColumnExpression 's position method is 1 : {code:borderStyle=solid} 282 private Expression wrapGroupByExpression(Expression expression) { . 
286if (aggregateFunction == null) { 287int index = groupBy.getExpressions().indexOf(expression); 288if (index >= 0) { 289isAggregate = true; 290RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index); 291expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType()); 292} 293} 294return expression; 295} {code} so when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, in line 108,the return Info's pkPosition is 1: {code:borderStyle=solid} 106public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) { 107 if (isOrderPreserving) { 108 Info info =
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 3:13 PM: - This bug is caused by the OrderByCompiler,my analysis of this bug is as follows: take following table which describe by [~jpalmert] as a example : {code:borderStyle=solid} CREATE TABLE IF NOT EXISTS TEST.TEST ( ORGANIZATION_ID CHAR(15) NOT NULL, CONTAINER_ID CHAR(15) NOT NULL, ENTITY_ID CHAR(15) NOT NULL, SCORE DOUBLE, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, ENTITY_ID ) ) CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); {code} for the following query sql : {code:borderStyle=solid} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id = 'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY score DESC LIMIT 2 {code} the phoenix would use the following index table TEST_SCORE to do the query: {code:borderStyle=solid} CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); {code} Using that index is good,the problem is that the OrderByCompiler thinking the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY,but because we can see the where condition is "container_id IN ( 'container1','container2','container3' )", obviously OrderBy is not OrderBy.FWD_ROW_KEY_ORDER_BY. 
When we look into OrderByCompiler's compile method, in line 123, the "score" ColumnParseNode in "ORDER BY score DESC" accepts a ExpressionCompiler visitor: {code:borderStyle=solid} 123expression = node.getNode().accept(compiler); 124// Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns) 125if (!expression.isStateless() && !compiler.isAggregate()) { 126if (statement.isAggregate() || statement.isDistinct()) { 127// Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x 128if (statement.isDistinct()) { 129throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT) 130 .setMessage(expression.toString()).build().buildException(); 131} 132 ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString()); {code} In ExpressionCompiler 's visit method,the "score" ColumnParseNode is converted to a KeyValueColumnExpression in line 408,then in line 409,wrapGroupByExpression method is invoked: {code:borderStyle=solid} 393 public Expression visit(ColumnParseNode node) throws SQLException { 408 Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive()); 409 Expression wrappedExpression = wrapGroupByExpression(expression); {code} in wrapGroupByExpression method,because the "score" column is in groupBy.getExpressions(),which is "[ENTITY_ID, SCORE]",so KeyValueColumnExpression is replaced by RowKeyColumnExpression in line 291, and because the index of "score" column in "[ENTITY_ID, SCORE]" is 1,so the return value of RowKeyColumnExpression 's position method is 1 : {code:borderStyle=solid} 282 private Expression wrapGroupByExpression(Expression expression) { . 
286if (aggregateFunction == null) { 287int index = groupBy.getExpressions().indexOf(expression); 288if (index >= 0) { 289isAggregate = true; 290RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index); 291expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType()); 292} 293} 294return expression; 295} {code} so when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, in line 108,the return Info's pkPosition is 1: {code:borderStyle=solid} 106public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) { 107 if (isOrderPreserving) { 108 Info info =
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 3:11 PM: - This bug is caused by the OrderByCompiler,my analysis of this bug is as follows: take following table which describe by [~jpalmert] as a example : {code:borderStyle=solid} CREATE TABLE IF NOT EXISTS TEST.TEST ( ORGANIZATION_ID CHAR(15) NOT NULL, CONTAINER_ID CHAR(15) NOT NULL, ENTITY_ID CHAR(15) NOT NULL, SCORE DOUBLE, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, ENTITY_ID ) ) CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); {code} for the following query sql : {code:borderStyle=solid} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id = 'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY score DESC LIMIT 2 {code} the phoenix would use the following index table TEST_SCORE to do the query: {code:borderStyle=solid} CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); {code} Using that index is good,the problem is that the OrderByCompiler thinking the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY,but because we can see the where condition is "container_id IN ( 'container1','container2','container3' )", obviously OrderBy is not OrderBy.FWD_ROW_KEY_ORDER_BY. 
When we look into OrderByCompiler's compile method, in line 123, the "score" ColumnParseNode in "ORDER BY score DESC" accepts a ExpressionCompiler visitor: {code:borderStyle=solid} 123expression = node.getNode().accept(compiler); 124// Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns) 125if (!expression.isStateless() && !compiler.isAggregate()) { 126if (statement.isAggregate() || statement.isDistinct()) { 127// Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x 128if (statement.isDistinct()) { 129throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT) 130 .setMessage(expression.toString()).build().buildException(); 131} 132 ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString()); {code} In ExpressionCompiler 's visit method,the "score" ColumnParseNode is converted to a KeyValueColumnExpression in line 408,then in line 409,wrapGroupByExpression method is invoked: {code:borderStyle=solid} 393 public Expression visit(ColumnParseNode node) throws SQLException { 408 Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive()); 409 Expression wrappedExpression = wrapGroupByExpression(expression); {code} in wrapGroupByExpression method,because the "score" column is in groupBy.getExpressions(),which is "[ENTITY_ID, SCORE]",so KeyValueColumnExpression is replaced by RowKeyColumnExpression in line 291, and because the index of "score" column in "[ENTITY_ID, SCORE]" is 1,so the return value of RowKeyColumnExpression 's position method is 1 : {code:borderStyle=solid} 282 private Expression wrapGroupByExpression(Expression expression) { . 
286if (aggregateFunction == null) { 287int index = groupBy.getExpressions().indexOf(expression); 288if (index >= 0) { 289isAggregate = true; 290RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index); 291expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType()); 292} 293} 294return expression; 295} {code} so when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, in line 108,the return Info's pkPosition is 1: {code:borderStyle=solid} 106public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) { 107 if (isOrderPreserving) { 108 Info info =
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 3:00 PM: - This bug is caused by the OrderByCompiler,my analysis of this bug is as follows: take following table which describe by [~jpalmert] as a example : {code:borderStyle=solid} CREATE TABLE IF NOT EXISTS TEST.TEST ( ORGANIZATION_ID CHAR(15) NOT NULL, CONTAINER_ID CHAR(15) NOT NULL, ENTITY_ID CHAR(15) NOT NULL, SCORE DOUBLE, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, ENTITY_ID ) ) CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); {code} for the following select sql : {code:borderStyle=solid} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id = 'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY score DESC LIMIT 2 {code} the phoenix would use the following index table TEST_SCORE to do the query: {code:borderStyle=solid} CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC); {code} Using that index is good,the problem is that the OrderByCompiler think the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY,but because the where condition "container_id IN ( 'container1','container2','container3' )", obviously OrderBy is not OrderBy.FWD_ROW_KEY_ORDER_BY. 
When we look into OrderByCompiler's compile method, in line 123, the "score" ColumnParseNode in "ORDER BY score DESC" accepts a ExpressionCompiler visitor: {code:borderStyle=solid} 123expression = node.getNode().accept(compiler); 124// Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns) 125if (!expression.isStateless() && !compiler.isAggregate()) { 126if (statement.isAggregate() || statement.isDistinct()) { 127// Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x 128if (statement.isDistinct()) { 129throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT) 130 .setMessage(expression.toString()).build().buildException(); 131} 132 ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString()); {code} In ExpressionCompiler 's visit method,the "score" ColumnParseNode converts to a KeyValueColumnExpression in line 408,then in line 409,wrapGroupByExpression method is invoked: {code:borderStyle=solid} 393 public Expression visit(ColumnParseNode node) throws SQLException { 408 Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive()); 409 Expression wrappedExpression = wrapGroupByExpression(expression); {code} in wrapGroupByExpression method,because the "score" is in groupBy.getExpressions(),which is "[ENTITY_ID, SCORE]",so KeyValueColumnExpression is replaced by RowKeyColumnExpression, because the index of "score" in "[ENTITY_ID, SCORE]" is 1,so the return value of RowKeyColumnExpression 's position method is 1 : {code:borderStyle=solid} 282 private Expression wrapGroupByExpression(Expression expression) { . 
286if (aggregateFunction == null) { 287int index = groupBy.getExpressions().indexOf(expression); 288if (index >= 0) { 289isAggregate = true; 290RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index); 291expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType()); 292} 293} 294return expression; 295} {code} so when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, in line 108,the return Info's pkPosition is 1: {code:borderStyle=solid} 106public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) { 107 if (isOrderPreserving) { 108 Info info = node.accept(visitor); 109 if (info == null)
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664118#comment-15664118 ] chenglei edited comment on PHOENIX-3451 at 11/14/16 3:08 PM: - This bug is caused by the OrderByCompiler. My analysis is as follows. Take the following table, described by [~jpalmert], as an example:
{code:borderStyle=solid}
CREATE TABLE IF NOT EXISTS TEST.TEST (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    CONTAINER_ID CHAR(15) NOT NULL,
    ENTITY_ID CHAR(15) NOT NULL,
    SCORE DOUBLE,
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        CONTAINER_ID,
        ENTITY_ID
    )
)

CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID, CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);

UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
{code}
For the following query:
{code:borderStyle=solid}
SELECT DISTINCT entity_id, score
FROM test.test
WHERE organization_id = 'org2'
  AND container_id IN ('container1', 'container2', 'container3')
ORDER BY score DESC
LIMIT 2
{code}
Phoenix would use the index table TEST_SCORE to execute the query:
{code:borderStyle=solid}
CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST(ORGANIZATION_ID, CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);
{code}
Using that index is fine; the problem is that the OrderByCompiler decides the OrderBy is OrderBy.FWD_ROW_KEY_ORDER_BY. Because the where condition is "container_id IN ('container1', 'container2', 'container3')", the OrderBy is obviously not OrderBy.FWD_ROW_KEY_ORDER_BY.
When we look into OrderByCompiler's compile method, at line 123 the "score" ColumnParseNode in "ORDER BY score DESC" accepts an ExpressionCompiler visitor:
{code:borderStyle=solid}
123    expression = node.getNode().accept(compiler);
124    // Detect mix of aggregate and non aggregates (i.e. ORDER BY txns, SUM(txns)
125    if (!expression.isStateless() && !compiler.isAggregate()) {
126        if (statement.isAggregate() || statement.isDistinct()) {
127            // Detect ORDER BY not in SELECT DISTINCT: SELECT DISTINCT count(*) FROM t ORDER BY x
128            if (statement.isDistinct()) {
129                throw new SQLExceptionInfo.Builder(SQLExceptionCode.ORDER_BY_NOT_IN_SELECT_DISTINCT)
130                    .setMessage(expression.toString()).build().buildException();
131            }
132            ExpressionCompiler.throwNonAggExpressionInAggException(expression.toString());
{code}
In ExpressionCompiler's visit method, the "score" ColumnParseNode is converted to a KeyValueColumnExpression at line 408, and at line 409 the wrapGroupByExpression method is invoked:
{code:borderStyle=solid}
393    public Expression visit(ColumnParseNode node) throws SQLException {
...
408        Expression expression = ref.newColumnExpression(node.isTableNameCaseSensitive(), node.isCaseSensitive());
409        Expression wrappedExpression = wrapGroupByExpression(expression);
{code}
In wrapGroupByExpression, because the "score" column is in groupBy.getExpressions(), which is "[ENTITY_ID, SCORE]", the KeyValueColumnExpression is replaced by a RowKeyColumnExpression at line 291; and because the index of the "score" column in "[ENTITY_ID, SCORE]" is 1, the return value of that RowKeyColumnExpression's position method is 1:
{code:borderStyle=solid}
282    private Expression wrapGroupByExpression(Expression expression) {
...
286        if (aggregateFunction == null) {
287            int index = groupBy.getExpressions().indexOf(expression);
288            if (index >= 0) {
289                isAggregate = true;
290                RowKeyValueAccessor accessor = new RowKeyValueAccessor(groupBy.getKeyExpressions(), index);
291                expression = new RowKeyColumnExpression(expression, accessor, groupBy.getKeyExpressions().get(index).getDataType());
292            }
293        }
294        return expression;
295    }
{code}
So when OrderByCompiler's compile method invokes OrderPreservingTracker's track method, at line 108 the returned Info's pkPosition is 1:
{code:borderStyle=solid}
106    public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) {
107        if (isOrderPreserving) {
108            Info info = node.accept(visitor);
109            if (info == null) {
110                isOrderPreserving = false;
{code}
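To make the order-preserving rule concrete, here is a minimal, simplified sketch. This is NOT Phoenix's actual OrderPreservingTracker code; it only illustrates the prefix rule the tracker relies on: scan order can only be preserved when the tracked row-key positions of the ORDER BY expressions form a contiguous prefix (0, 1, 2, ...). In the example above, "score" resolves to pkPosition 1 within the GROUP BY key [ENTITY_ID, SCORE], so an ORDER BY on it alone does not start at position 0.

```java
import java.util.List;

// Simplified sketch (not Phoenix's implementation): an ORDER BY can only be
// order preserving when the row-key positions of its expressions form a
// contiguous prefix of the row key.
public final class OrderPreservingCheck {

    public static boolean isOrderPreserving(List<Integer> pkPositions) {
        int expected = 0;
        for (int pos : pkPositions) {
            if (pos != expected) {
                return false; // a gap or out-of-order position breaks row-key order
            }
            expected++;
        }
        return true;
    }

    public static void main(String[] args) {
        // ORDER BY entity_id, score over key [ENTITY_ID, SCORE]: positions 0, 1
        System.out.println(isOrderPreserving(List.of(0, 1))); // true
        // ORDER BY score alone resolves to position 1 only, so no prefix
        System.out.println(isOrderPreserving(List.of(1)));    // false
    }
}
```

Under this rule, a pkPosition of 1 for the lone ORDER BY expression would already rule out FWD_ROW_KEY_ORDER_BY, which is why mapping positions through the GROUP BY key correctly matters.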
[jira] [Comment Edited] (PHOENIX-3333) Support Spark 2.0
[ https://issues.apache.org/jira/browse/PHOENIX-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663791#comment-15663791 ] DEQUN edited comment on PHOENIX-3333 at 11/14/16 12:25 PM: --- I'm new to git, maven and phoenix. Could you show me how to use this patch or how to build the spark_2.0 branch?

was (Author: sd10): I'm new to git, maven and phoenix. Could you show me a tutorial on how to use this patch or how to build the spark_2.0 branch?

> Support Spark 2.0
> -
>
> Key: PHOENIX-3333
> URL: https://issues.apache.org/jira/browse/PHOENIX-3333
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.8.0
> Environment: spark 2.0, phoenix 4.8.0, os is centos 6.7, hadoop is hdp 2.5
> Reporter: dalin qin
> Attachments: PHOENIX-3333-interim.patch
>
> spark version is 2.0.0.2.5.0.0-1245
> As mentioned by Josh, I believe spark 2.0 changed its api in a way that breaks phoenix. Please come up with an updated version that adapts to spark's change.
> In [1]: df = sqlContext.read \
>    ...: .format("org.apache.phoenix.spark") \
>    ...: .option("table", "TABLE1") \
>    ...: .option("zkUrl", "namenode:2181:/hbase-unsecure") \
>    ...: .load()
> ---
> Py4JJavaError Traceback (most recent call last)
> in ()
> > 1 df = sqlContext.read .format("org.apache.phoenix.spark") .option("table", "TABLE1") .option("zkUrl", "namenode:2181:/hbase-unsecure") .load()
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, path, format, schema, **options)
> 151 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 152 else:
> --> 153 return self._df(self._jreader.load())
> 154
> 155 @since(1.4)
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934
> 935 for temp_arg in temp_args:
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
> 61 def deco(*a, **kw):
> 62 try:
> ---> 63 return f(*a, **kw)
> 64 except py4j.protocol.Py4JJavaError as e:
> 65 s = e.java_exception.toString()
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o43.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.(ObjectStreamClass.java:472)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
> at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at
[jira] [Commented] (PHOENIX-3465) order by incorrect when array column be used
[ https://issues.apache.org/jira/browse/PHOENIX-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663737#comment-15663737 ] Yuan Kang commented on PHOENIX-3465: Can someone help?

> order by incorrect when array column be used
>
> Key: PHOENIX-3465
> URL: https://issues.apache.org/jira/browse/PHOENIX-3465
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Yuan Kang
> Labels: bug
>
> I create a table like this:
> create table "TABLE_A"
> (
> task_id varchar not null,
> date varchar not null,
> dim varchar not null,
> valueArray double array,
> dimNameArray varchar array,
> constraint pk primary key (task_id, date, dim)
> ) SALT_BUCKETS = 4, COMPRESSION='SNAPPY';
> After upserting some data, when I run the query below:
> select date, sum(valueArray[16]) as val1
> from TABLE_A
> where date = '2016-11-01' and task_id = '4692' order by val1 desc limit 50;
> the result is incorrect. A similar issue was announced as fixed in 4.5.0; this issue still happens in 4.8.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3478) UPDATE STATISTICS SET syntax error
[ https://issues.apache.org/jira/browse/PHOENIX-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663468#comment-15663468 ] Ankit Singhal commented on PHOENIX-3478: You need to use quotes if you have SQL-reserved characters (like ".") in your property name (https://phoenix.apache.org/language/index.html#name). {code} UPDATE STATISTICS my_table SET "phoenix.stats.guidepost.width"=5000 {code} Let me see if I can update the documentation to represent this better. > UPDATE STATISTICS SET syntax error > -- > > Key: PHOENIX-3478 > URL: https://issues.apache.org/jira/browse/PHOENIX-3478 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.1 >Reporter: Alex Batyrshin >Priority: Minor > > According to > [documentation|https://phoenix.apache.org/language/index.html#update_statistics] > example: > bq. UPDATE STATISTICS my_table SET phoenix.stats.guidepost.width=5000 > And this is real world: > {code} > 0: jdbc:phoenix:> UPDATE STATISTICS "xxx" SET > phoenix.stats.guidepost.width=3000; > Error: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "EQ", got > "." at line 1, column 52. (state=42P00,code=604) > {code} > Looks like SET parameter should be quoted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
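The quoting rule from the comment above (names containing SQL-reserved characters such as "." must be wrapped in double quotes before use in UPDATE STATISTICS ... SET) can be sketched with a small helper. Both `quoteIfNeeded` and `updateStatistics` are hypothetical illustrations, not part of Phoenix's API:

```java
public class IdentifierQuoting {
    // Hypothetical helper: a name of only letters, digits, and underscores can
    // be used unquoted; anything else (e.g. the dots in
    // "phoenix.stats.guidepost.width") must be double-quoted, with any embedded
    // double quotes doubled.
    static String quoteIfNeeded(String name) {
        if (name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            return name;
        }
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // Builds the statement Ankit shows in the comment above.
    static String updateStatistics(String table, String prop, long value) {
        return "UPDATE STATISTICS " + table + " SET " + quoteIfNeeded(prop) + "=" + value;
    }

    public static void main(String[] args) {
        System.out.println(updateStatistics("my_table", "phoenix.stats.guidepost.width", 5000));
    }
}
```

Without the quotes, the parser stops at the first "." expecting "EQ", which is exactly the ERROR 604 the reporter saw.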
[jira] [Resolved] (PHOENIX-3478) UPDATE STATISTICS SET syntax error
[ https://issues.apache.org/jira/browse/PHOENIX-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal resolved PHOENIX-3478. Resolution: Not A Problem > UPDATE STATISTICS SET syntax error > -- > > Key: PHOENIX-3478 > URL: https://issues.apache.org/jira/browse/PHOENIX-3478 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.1 >Reporter: Alex Batyrshin >Priority: Minor > > According to > [documentation|https://phoenix.apache.org/language/index.html#update_statistics] > example: > bq. UPDATE STATISTICS my_table SET phoenix.stats.guidepost.width=5000 > And this is real world: > {code} > 0: jdbc:phoenix:> UPDATE STATISTICS "xxx" SET > phoenix.stats.guidepost.width=3000; > Error: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "EQ", got > "." at line 1, column 52. (state=42P00,code=604) > {code} > Looks like SET parameter should be quoted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3241) Convert_tz doesn't allow timestamp data type
[ https://issues.apache.org/jira/browse/PHOENIX-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated PHOENIX-3241: --- Assignee: Josh Elser (was: Ankit Singhal) > Convert_tz doesn't allow timestamp data type > > > Key: PHOENIX-3241 > URL: https://issues.apache.org/jira/browse/PHOENIX-3241 > Project: Phoenix > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Josh Elser > Fix For: 4.10.0 > > Attachments: PHOENIX-3241.002.patch, PHOENIX-3241.003.patch, > PHOENIX-3241.patch > > > As per the documentation, we allow the timestamp data type for convert_tz, but as per > the code only the DATE datatype is allowed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3241) Convert_tz doesn't allow timestamp data type
[ https://issues.apache.org/jira/browse/PHOENIX-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663426#comment-15663426 ] Ankit Singhal edited comment on PHOENIX-3241 at 11/14/16 10:39 AM: --- [~elserj], +1, adding a test case is always better. And extending credit (assigning to you) as you have spent more time on this than me. :) was (Author: an...@apache.org): [~elserj], +1, adding a test case is always better. And extending credit (assigning to you) as you have spent more time on this than me. > Convert_tz doesn't allow timestamp data type > > > Key: PHOENIX-3241 > URL: https://issues.apache.org/jira/browse/PHOENIX-3241 > Project: Phoenix > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > Attachments: PHOENIX-3241.002.patch, PHOENIX-3241.003.patch, > PHOENIX-3241.patch > > > As per the documentation, we allow the timestamp data type for convert_tz, but as per > the code only the DATE datatype is allowed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3241) Convert_tz doesn't allow timestamp data type
[ https://issues.apache.org/jira/browse/PHOENIX-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663426#comment-15663426 ] Ankit Singhal edited comment on PHOENIX-3241 at 11/14/16 10:38 AM: --- [~elserj], +1, adding a test case is always better. And extending credit (assigning to you) as you have spent more time on this than me. was (Author: an...@apache.org): [~elserj], +1, adding a test case is always better. > Convert_tz doesn't allow timestamp data type > > > Key: PHOENIX-3241 > URL: https://issues.apache.org/jira/browse/PHOENIX-3241 > Project: Phoenix > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > Attachments: PHOENIX-3241.002.patch, PHOENIX-3241.003.patch, > PHOENIX-3241.patch > > > As per the documentation, we allow the timestamp data type for convert_tz, but as per > the code only the DATE datatype is allowed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3241) Convert_tz doesn't allow timestamp data type
[ https://issues.apache.org/jira/browse/PHOENIX-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663426#comment-15663426 ] Ankit Singhal commented on PHOENIX-3241: [~elserj], +1, adding a test case is always better. > Convert_tz doesn't allow timestamp data type > > > Key: PHOENIX-3241 > URL: https://issues.apache.org/jira/browse/PHOENIX-3241 > Project: Phoenix > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > Attachments: PHOENIX-3241.002.patch, PHOENIX-3241.003.patch, > PHOENIX-3241.patch > > > As per the documentation, we allow the timestamp data type for convert_tz, but as per > the code only the DATE datatype is allowed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3461) Statistics collection broken if name space mapping enabled for SYSTEM tables
[ https://issues.apache.org/jira/browse/PHOENIX-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663418#comment-15663418 ] Ankit Singhal commented on PHOENIX-3461: bq. I think if we have any hope of namespaces not regressing, we need better test coverage. Yes [~jamestaylor], I created PHOENIX-3480 for increasing the test coverage. And I think it will be great if we can reach a level where all the tests can be run with namespaces enabled and disabled. > Statistics collection broken if name space mapping enabled for SYSTEM tables > > > Key: PHOENIX-3461 > URL: https://issues.apache.org/jira/browse/PHOENIX-3461 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Samarth Jain > Fix For: 4.9.0 > > Attachments: PHOENIX-3461-v3.patch, PHOENIX-3461_master.patch, > PHOENIX-3461_v2.patch, PHOENIX-3461_v4.patch, PHOENIX-3461_v5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-3480) Extend test coverage of namespace mapping feature to avoid regression
Ankit Singhal created PHOENIX-3480: -- Summary: Extend test coverage of namespace mapping feature to avoid regression Key: PHOENIX-3480 URL: https://issues.apache.org/jira/browse/PHOENIX-3480 Project: Phoenix Issue Type: Bug Reporter: Ankit Singhal Assignee: Ankit Singhal We should increase test coverage of the namespace mapping feature to ensure that new changes use the right set of APIs to get the physical name for user and system tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
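The "right set of APIs to get physical name" mentioned above refers to how namespace mapping changes logical-to-physical name translation: with namespace mapping enabled, a logical name like SCHEMA.TABLE maps to the HBase name SCHEMA:TABLE (schema becomes an HBase namespace), while with it disabled the dotted name is used as a flat HBase table name. A simplified sketch of that rule, assuming this `physicalName` helper (not Phoenix's actual API):

```java
public class PhysicalNameSketch {
    // Hypothetical helper mirroring what namespace mapping changes: when
    // enabled, the schema maps to an HBase namespace (':' separator); when
    // disabled, the full dotted name is a single flat HBase table name.
    static String physicalName(String schema, String table, boolean namespaceMappingEnabled) {
        if (schema == null || schema.isEmpty()) {
            return table; // no schema: same physical name either way
        }
        return namespaceMappingEnabled ? schema + ":" + table
                                       : schema + "." + table;
    }

    public static void main(String[] args) {
        // e.g. the SYSTEM tables from PHOENIX-3461, in both modes
        System.out.println(physicalName("SYSTEM", "CATALOG", true));
        System.out.println(physicalName("SYSTEM", "CATALOG", false));
    }
}
```

Code that concatenates names by hand instead of going through the shared translation path works in one mode and silently breaks in the other, which is why running the whole suite under both settings catches regressions like PHOENIX-3461.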