[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670234#comment-15670234 ]
chenglei edited comment on PHOENIX-3451 at 11/16/16 11:44 AM: -------------------------------------------------------------- [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} - /* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ - if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { - isOrderPreserving = false; - isReverse = false; - return; - } {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232 aggResultIterator = new GroupedAggregatingResultIterator( 233 new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} So if the AggregatePlan is not modified, just remove out the above code may cause problem. Maybe I can open a new JIRA to fix this problem if the JIRA does not exist,because it is irrelevant to PHOENIX-3451 was (Author: comnetwork): [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} - /* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ - if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { - isOrderPreserving = false; - isReverse = false; - return; - } {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232 aggResultIterator = new GroupedAggregatingResultIterator( 233 new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} So if the AggregatePlan is not modified, just remove out the above code may cause problem. Maybe I can open a new JIRA to fix this problem if the JIRA does not exist. > Secondary index and query using distinct: LIMIT doesn't return the first rows > ----------------------------------------------------------------------------- > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.8.0 > Reporter: Joel Palmert > Assignee: chenglei > Attachments: PHOENIX-3451_v1.patch > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. > Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > CONTAINER_ID CHAR(15) NOT NULL, > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, > ENTITY_ID DESC); > UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); > UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); > UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); > UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); > UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); > UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); > EXPLAIN > SELECT DISTINCT entity_id, score > FROM test.test > WHERE organization_id = 'org2' > AND container_id IN ( 'container1','container2','container3' ) > ORDER BY score DESC > LIMIT 2 > OUTPUT > entityId5 1.2 > entityId3 1.4 > The expected out out would be > entityId8 1.45 > entityId3 1.4 > You will get the expected output if you remove the secondary index from the > table or remove distinct from the query. > As described in PHOENIX-3452 if you run the query without the LIMIT the > ordering is not correct. However, the 2first results in that ordering is > still not the onces returned by the limit clause, which makes me think there > are multiple issues here and why I filed both separately. The rows being > returned are the ones assigned to container1. It looks like Phoenix is first > getting the rows from the first container and when it finds that to be enough > it stops the scan. What it should be doing is getting 2 results for each > container and then merge then and then limit again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)