[jira] [Comment Edited] (PHOENIX-3689) Not determinist order by with limit
[ https://issues.apache.org/jira/browse/PHOENIX-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882399#comment-15882399 ] chenglei edited comment on PHOENIX-3689 at 2/24/17 10:27 AM: - Thank you for adding the DDL, but the SQL given before does not seem to match the DDL; it would be better if you could give us a complete test case, just like PHOENIX-3578 does. Thanks. was (Author: comnetwork): Thank you for adding the DDL, but the SQL given before does not seem to match the DDL; you had better give us a complete test case, just like PHOENIX-3578 does. Thanks. > Not determinist order by with limit > --- > > Key: PHOENIX-3689 > URL: https://issues.apache.org/jira/browse/PHOENIX-3689 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.7.0 >Reporter: Arthur > > The following request does not return the last value of myTable: > select * from myTable order by myKey desc limit 1; > Adding a 'group by myKey' clause gets back the good result. > I noticed that an order by with 'limit 10' returns a merge of 10 results from > each region and not 10 results of the whole request. > So 'order by' is not determinist. It is a bug or a feature ? > Here is my DDL: > CREATE TABLE TT (dt timestamp NOT NULL, message bigint NOT NULL, id > varchar(20) NOT NULL, version varchar CONSTRAINT PK PRIMARY KEY (dt, message, > id)); > And some data with a dynamic column (I have 2 millions of similar rows sorted > by time) : > UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 > 03:31:00.3730',91,'','POUR','S_052303'); > UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 > 03:31:00.7170',91,'0001','PO','S_052303'); > UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 > 03:31:01.9030',91,'0002','POUR','S_052303'); > UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 > 03:31:02.7330',91,'0003','POUR','S_052303'); > UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 > 03:31:03.5470',91,'0004','POUR','S_052303'); > UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 > 03:31:04.7330',91,'0005','POUR','S_052305'); -- This message was sent by Atlassian JIRA (v6.3.15#6346)
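For reference, a complete and self-contained test case of the kind asked for above might look roughly like the following JDBC sketch. This is only an illustration: the connection URL {{jdbc:phoenix:localhost}}, the table name REPRO_3689, the row count, and the SALT_BUCKETS setting are assumptions, not taken from the report.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OrderByLimitRepro {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // SALT_BUCKETS spreads the rows over several buckets so the scan
            // touches more than one region, which is where the per-region merge happens.
            stmt.execute("CREATE TABLE IF NOT EXISTS REPRO_3689 ("
                    + " myKey INTEGER NOT NULL PRIMARY KEY, val VARCHAR)"
                    + " SALT_BUCKETS = 4");
            for (int i = 1; i <= 10000; i++) {
                stmt.executeUpdate("UPSERT INTO REPRO_3689 VALUES (" + i + ", 'v" + i + "')");
            }
            conn.commit();
            // Expected: exactly one row, carrying the largest key (10000).
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT myKey FROM REPRO_3689 ORDER BY myKey DESC LIMIT 1")) {
                while (rs.next()) {
                    System.out.println("key returned = " + rs.getInt(1));
                }
            }
        }
    }
}
{code}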
[jira] [Commented] (PHOENIX-3689) Not determinist order by with limit
[ https://issues.apache.org/jira/browse/PHOENIX-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881828#comment-15881828 ] chenglei commented on PHOENIX-3689: --- Could you please provide your table DDL and some sample data, so we can reproduce this and check whether it is a bug? > Not determinist order by with limit > --- > > Key: PHOENIX-3689 > URL: https://issues.apache.org/jira/browse/PHOENIX-3689 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.7.0 >Reporter: Arthur > > The following request does not return the last value of myTable: > select * from myTable order by myKey desc limit 1; > Adding a 'group by myKey' clause gets back the good result. > I noticed that an order by with 'limit 10' returns a merge of 10 results from > each region and not 10 results of the whole request. > So 'order by' is not determinist. It is a bug or a feature ? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877706#comment-15877706 ] chenglei commented on PHOENIX-3578: --- [~jamestaylor], ok ,I will try to do it. > Incorrect query results when applying inner join and orderby desc > - > > Key: PHOENIX-3578 > URL: https://issues.apache.org/jira/browse/PHOENIX-3578 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 > Environment: hbase-1.1.2 >Reporter: sungmin.cho > Attachments: PHOENIX-3578_v1.patch > > > Step to reproduce: > h4. 1. Create two tables > {noformat} > CREATE TABLE IF NOT EXISTS master ( > id integer not null, > col1 varchar, > constraint pk_master primary key(id) > ); > CREATE TABLE IF NOT EXISTS detail ( > id integer not null, > seq integer not null, > col2 varchar, > constraint pk_master primary key(id, seq) > ); > {noformat} > h4. 2. Upsert values > {noformat} > upsert into master values(1, 'A1'); > upsert into master values(2, 'A2'); > upsert into master values(3, 'A3'); > upsert into detail values(1, 1, 'B1'); > upsert into detail values(1, 2, 'B2'); > upsert into detail values(2, 1, 'B1'); > upsert into detail values(2, 2, 'B2'); > upsert into detail values(3, 1, 'B1'); > upsert into detail values(3, 2, 'B2'); > upsert into detail values(3, 3, 'B3'); > {noformat} > h4. 3. Execute query > {noformat} > select m.id, m.col1, d.seq, d.col2 > from master m, detail d > where m.id = d.id > and d.id between 1 and 2 > order by m.id desc > {noformat} > h4. (/) Expected result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 2 | A2 | 1 | B1 | > | 2 | A2 | 2 | B2 | > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} > h4. (!) Incorrect result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877651#comment-15877651 ] chenglei commented on PHOENIX-3578: --- [~maryannxue], thank you very much for the review. Yes, it is true that we cannot use a skip scan if the scan is in reverse order, but we can still use a range scan, so the dynamic join filter should still be considered when the scan is in reverse order, because the dynamic join filter may narrow down the LHS reverse scan range. I will modify my patch following your suggestion. > Incorrect query results when applying inner join and orderby desc > - > > Key: PHOENIX-3578 > URL: https://issues.apache.org/jira/browse/PHOENIX-3578 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 > Environment: hbase-1.1.2 >Reporter: sungmin.cho > Attachments: PHOENIX-3578_v1.patch > > > Step to reproduce: > h4. 1. Create two tables > {noformat} > CREATE TABLE IF NOT EXISTS master ( > id integer not null, > col1 varchar, > constraint pk_master primary key(id) > ); > CREATE TABLE IF NOT EXISTS detail ( > id integer not null, > seq integer not null, > col2 varchar, > constraint pk_master primary key(id, seq) > ); > {noformat} > h4. 2. Upsert values > {noformat} > upsert into master values(1, 'A1'); > upsert into master values(2, 'A2'); > upsert into master values(3, 'A3'); > upsert into detail values(1, 1, 'B1'); > upsert into detail values(1, 2, 'B2'); > upsert into detail values(2, 1, 'B1'); > upsert into detail values(2, 2, 'B2'); > upsert into detail values(3, 1, 'B1'); > upsert into detail values(3, 2, 'B2'); > upsert into detail values(3, 3, 'B3'); > {noformat} > h4. 3. Execute query > {noformat} > select m.id, m.col1, d.seq, d.col2 > from master m, detail d > where m.id = d.id > and d.id between 1 and 2 > order by m.id desc > {noformat} > h4. (/) Expected result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 2 | A2 | 1 | B1 | > | 2 | A2 | 2 | B2 | > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} > h4. (!) Incorrect result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
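As a rough illustration of the point above about the dynamic join filter still narrowing a reverse range scan, the following stand-alone sketch collapses the point keys pushed down by the filter into a single [min, max) range that could bound a reversed range scan. The key encoding (Phoenix ASC INTEGER 1 and 2 as \x80\x00\x00\x01 and \x80\x00\x00\x02) matches the analysis later in this thread; the helper names are illustrative assumptions, not Phoenix's actual code.
{code}
import java.util.Arrays;
import java.util.List;

public class ReverseRangeFromInList {
    // Collapse the point keys from the dynamic join filter into one [min, max+\x00)
    // range; a reverse *range* scan can still be bounded by it even when a skip
    // scan cannot be used in reverse order.
    static byte[][] rangeFromPoints(List<byte[]> points) {
        byte[] min = null, max = null;
        for (byte[] p : points) {
            if (min == null || compare(p, min) < 0) min = p;
            if (max == null || compare(p, max) > 0) max = p;
        }
        // exclusive upper bound: append a zero byte after the largest key
        byte[] upper = Arrays.copyOf(max, max.length + 1);
        return new byte[][] { min, upper };
    }

    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("\\x%02X", x));
        return sb.toString();
    }

    public static void main(String[] args) {
        List<byte[]> inList = Arrays.asList(
            new byte[] { (byte) 0x80, 0, 0, 1 },   // m.id = 1
            new byte[] { (byte) 0x80, 0, 0, 2 });  // m.id = 2
        byte[][] range = rangeFromPoints(inList);
        // prints: scan range: [\x80\x00\x00\x01, \x80\x00\x00\x02\x00)
        System.out.println("scan range: [" + hex(range[0]) + ", " + hex(range[1]) + ")");
    }
}
{code}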
[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876265#comment-15876265 ] chenglei commented on PHOENIX-3578: --- Actually the tests did run: https://builds.apache.org/job/PreCommit-PHOENIX-Build/778/console, but the failed IT tests seem to have no relation to this issue. It seems that after the upgrade to HBase 1.2.4 by PHOENIX-3659, some IT tests time out and fail. > Incorrect query results when applying inner join and orderby desc > - > > Key: PHOENIX-3578 > URL: https://issues.apache.org/jira/browse/PHOENIX-3578 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 > Environment: hbase-1.1.2 >Reporter: sungmin.cho > Attachments: PHOENIX-3578_v1.patch > > > Step to reproduce: > h4. 1. Create two tables > {noformat} > CREATE TABLE IF NOT EXISTS master ( > id integer not null, > col1 varchar, > constraint pk_master primary key(id) > ); > CREATE TABLE IF NOT EXISTS detail ( > id integer not null, > seq integer not null, > col2 varchar, > constraint pk_master primary key(id, seq) > ); > {noformat} > h4. 2. Upsert values > {noformat} > upsert into master values(1, 'A1'); > upsert into master values(2, 'A2'); > upsert into master values(3, 'A3'); > upsert into detail values(1, 1, 'B1'); > upsert into detail values(1, 2, 'B2'); > upsert into detail values(2, 1, 'B1'); > upsert into detail values(2, 2, 'B2'); > upsert into detail values(3, 1, 'B1'); > upsert into detail values(3, 2, 'B2'); > upsert into detail values(3, 3, 'B3'); > {noformat} > h4. 3. Execute query > {noformat} > select m.id, m.col1, d.seq, d.col2 > from master m, detail d > where m.id = d.id > and d.id between 1 and 2 > order by m.id desc > {noformat} > h4. (/) Expected result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 2 | A2 | 1 | B1 | > | 2 | A2 | 2 | B2 | > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} > h4. (!) Incorrect result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 10:25 AM: - This issue is caused by the join dynamic filter. From the following RHS we get that d.id is in (1,2):
{code}
select d.seq,d.col2,d.id from detail d where d.id between 1 and 2
{code}
so with the join dynamic filter, m.id is also in (1,2). Before applying the join dynamic filter, the LHS is:
{code}
select m.id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}}. After applying the join dynamic filter, the LHS turns into:
{code}
select m.id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc
{code}
Notice that the LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}}. Then {{WhereOptimizer.pushKeyExpressionsToScan}} is called to push {{m.id in (1,2)}} into the Scan, and useSkipScan is true in the following line 274 of the {{WhereOptimizer.pushKeyExpressionsToScan}} method, so the Scan will use a SkipScanFilter:
{code:borderStyle=solid}
273    stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan);
274    useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey);
{code}
Next, the {{startRow}} and {{endRow}} of the LHS's Scan are computed in the {{ScanRanges.create}} method; in the following line 112 the LHS's RowKeySchema is turned into SchemaUtil.VAR_BINARY_SCHEMA:
{code}
111    if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) {
112        schema = SchemaUtil.VAR_BINARY_SCHEMA;
113        slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114    } else {
{code}
So in the following line 135 and line 136 of the {{ScanRanges.create}} method, minKey is {{\x80\x00\x00\x01}} and maxKey is {{\x80\x00\x00\x02\x00}}; correspondingly, the Scan's startRow is {{\x80\x00\x00\x01}} and the Scan's endRow is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134    if (nBuckets == null || !isPointLookup || !useSkipScan) {
135        byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan);
136        byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan);
{code}
Finally, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} and the Scan uses a {{SkipScanFilter}}. Furthermore, because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}}, the Scan range has to be reversed. In the {{BaseScannerRegionObserver.preScannerOpen}} method, the following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow. Unfortunately, the reversed Scan range computed by {{ScanUtil.setupReverseScan}} is [\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF), so we only get the rows of the {{master}} table whose id is 1; the rows whose id is 2 are excluded.
{code}
621    public static void setupReverseScan(Scan scan) {
622        if (isReversed(scan)) {
623            byte[] newStartRow = getReversedRow(scan.getStartRow());
624            byte[] newStopRow = getReversedRow(scan.getStopRow());
625            scan.setStartRow(newStopRow);
626            scan.setStopRow(newStartRow);
627            scan.setReversed(true);
628        }
629    }
{code}
In conclusion, the following two problems cause this issue:
(1) The {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}}: it should return {{\x80\x00\x00\x02}}, not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) Even if the {{ScanUtil.getReversedRow}} method were right, there may be another problem. If I change the table data as follows:
{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');
UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}
and modify the sql as:
{noformat}
select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id and d.id in (3,5,7) order by m.id desc
{noformat}
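To see the effect of the wrongly reversed bounds concretely, the following stand-alone sketch replays the ranges quoted above and checks which encoded keys they contain. It is only an illustration under the assumptions stated in its comments (the key encodings and the forward/reversed membership rules), not Phoenix code.
{code}
import java.util.Arrays;

public class ReverseScanRangeCheck {
    // lexicographic, unsigned byte-array comparison
    static int cmp(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static byte[] bytes(int... v) {
        byte[] b = new byte[v.length];
        for (int i = 0; i < v.length; i++) b[i] = (byte) v[i];
        return b;
    }

    public static void main(String[] args) {
        byte[] key1 = bytes(0x80, 0, 0, 1);          // encoded row key of m.id = 1
        byte[] key2 = bytes(0x80, 0, 0, 2);          // encoded row key of m.id = 2

        // Forward range from ScanRanges.create: [\x80\x00\x00\x01, \x80\x00\x00\x02\x00)
        byte[] fwdStart = bytes(0x80, 0, 0, 1);
        byte[] fwdStop  = bytes(0x80, 0, 0, 2, 0);

        // Reversed bounds reported for ScanUtil.setupReverseScan:
        // start = \x80\x00\x00\x01 followed by 17 x 0xFF, stop = \x80\x00\x00\x00 followed by 16 x 0xFF
        byte[] revStart = Arrays.copyOf(bytes(0x80, 0, 0, 1), 21);
        byte[] revStop  = Arrays.copyOf(bytes(0x80, 0, 0, 0), 20);
        Arrays.fill(revStart, 4, 21, (byte) 0xFF);
        Arrays.fill(revStop, 4, 20, (byte) 0xFF);

        for (byte[] k : new byte[][] { key1, key2 }) {
            boolean inForward  = cmp(k, fwdStart) >= 0 && cmp(k, fwdStop) < 0;  // forward: start <= k < stop
            boolean inReversed = cmp(k, revStart) <= 0 && cmp(k, revStop) > 0;  // reversed: stop < k <= start
            System.out.println(Arrays.toString(k) + " forward=" + inForward + " reversed=" + inReversed);
        }
        // The output shows that key2 (m.id = 2) lies inside the forward range but
        // outside the wrongly reversed range, matching the missing row in the result.
    }
}
{code}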
[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875743#comment-15875743 ] chenglei commented on PHOENIX-3578: --- I uploaded my first patch, please help review it. Thanks. > Incorrect query results when applying inner join and orderby desc > - > > Key: PHOENIX-3578 > URL: https://issues.apache.org/jira/browse/PHOENIX-3578 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 > Environment: hbase-1.1.2 >Reporter: sungmin.cho > Attachments: PHOENIX-3578_v1.patch > > > Step to reproduce: > h4. 1. Create two tables > {noformat} > CREATE TABLE IF NOT EXISTS master ( > id integer not null, > col1 varchar, > constraint pk_master primary key(id) > ); > CREATE TABLE IF NOT EXISTS detail ( > id integer not null, > seq integer not null, > col2 varchar, > constraint pk_master primary key(id, seq) > ); > {noformat} > h4. 2. Upsert values > {noformat} > upsert into master values(1, 'A1'); > upsert into master values(2, 'A2'); > upsert into master values(3, 'A3'); > upsert into detail values(1, 1, 'B1'); > upsert into detail values(1, 2, 'B2'); > upsert into detail values(2, 1, 'B1'); > upsert into detail values(2, 2, 'B2'); > upsert into detail values(3, 1, 'B1'); > upsert into detail values(3, 2, 'B2'); > upsert into detail values(3, 3, 'B3'); > {noformat} > h4. 3. Execute query > {noformat} > select m.id, m.col1, d.seq, d.col2 > from master m, detail d > where m.id = d.id > and d.id between 1 and 2 > order by m.id desc > {noformat} > h4. (/) Expected result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 2 | A2 | 1 | B1 | > | 2 | A2 | 2 | B2 | > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} > h4. (!) Incorrect result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3578: -- Attachment: PHOENIX-3578_v1.patch > Incorrect query results when applying inner join and orderby desc > - > > Key: PHOENIX-3578 > URL: https://issues.apache.org/jira/browse/PHOENIX-3578 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 > Environment: hbase-1.1.2 >Reporter: sungmin.cho > Attachments: PHOENIX-3578_v1.patch > > > Step to reproduce: > h4. 1. Create two tables > {noformat} > CREATE TABLE IF NOT EXISTS master ( > id integer not null, > col1 varchar, > constraint pk_master primary key(id) > ); > CREATE TABLE IF NOT EXISTS detail ( > id integer not null, > seq integer not null, > col2 varchar, > constraint pk_master primary key(id, seq) > ); > {noformat} > h4. 2. Upsert values > {noformat} > upsert into master values(1, 'A1'); > upsert into master values(2, 'A2'); > upsert into master values(3, 'A3'); > upsert into detail values(1, 1, 'B1'); > upsert into detail values(1, 2, 'B2'); > upsert into detail values(2, 1, 'B1'); > upsert into detail values(2, 2, 'B2'); > upsert into detail values(3, 1, 'B1'); > upsert into detail values(3, 2, 'B2'); > upsert into detail values(3, 3, 'B3'); > {noformat} > h4. 3. Execute query > {noformat} > select m.id, m.col1, d.seq, d.col2 > from master m, detail d > where m.id = d.id > and d.id between 1 and 2 > order by m.id desc > {noformat} > h4. (/) Expected result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 2 | A2 | 1 | B1 | > | 2 | A2 | 2 | B2 | > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} > h4. (!) Incorrect result > {noformat} > +---+-++-+ > | M.ID | M.COL1 | D.SEQ | D.COL2 | > +---+-++-+ > | 1 | A1 | 1 | B1 | > | 1 | A1 | 2 | B2 | > +---+-++-+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 9:59 AM: This issue is caused by the join dynamic filter, from following RHS, we get d.id is in (1,2): {code} select d.seq,d.col2,d.id from detail d where d.id between 1 and 2 {code} so with join dynamic filter, m.id is also in (1,2). Before applying join dynamic filter,LHS is: {code} select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc {code} Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying join dynamic filter LHS turns to : {code} select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc {code} Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then {{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in (1,2)}} into Scan , and useSkipScan is true in following line 274 of {{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use SkipScanFilter: {code:borderStyle=solid} 273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan); 274useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey); {code} next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in {{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is turned to SchemaUtil.VAR_BINARY_SCHEMA: {code} 111 if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) { 112schema = SchemaUtil.VAR_BINARY_SCHEMA; 113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN; 114 } else { {code} so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow is {{\x80\x00\x00\x02\x00}}: {code:borderStyle=solid} 134if (nBuckets == null || !isPointLookup || !useSkipScan) { 135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan); 136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan); {code} \\ In summary, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses {{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In {{BaseScannerRegionObserver.preScannerOpen}} method,following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow.Unfortunately, the reversed Scan's range computed by {{ScanUtil.setupReverseScan}} method is [\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF), so we can only get the rows of {{master}} table which id are 1, the rows which id are 2 is excluded. 
{code} 621 public static void setupReverseScan(Scan scan) { 622if (isReversed(scan)) { 623byte[] newStartRow = getReversedRow(scan.getStartRow()); 624byte[] newStopRow = getReversedRow(scan.getStopRow()); 625scan.setStartRow(newStopRow); 626scan.setStopRow(newStartRow); 627scan.setReversed(true); 628} 629} {code} \\ In conclusion, following two problems causes this issue: (1) the {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}. (2) even though {{ScanUtil.getReversedRow}} method is right,there may be another problem,if I change the table data as following : {noformat} UPSERT INTO master VALUES (1, 'A1'); UPSERT INTO master VALUES (2, 'A2'); UPSERT INTO master VALUES (3, 'A3'); UPSERT INTO master VALUES (4, 'A4'); UPSERT INTO master VALUES (5, 'A5'); UPSERT INTO master VALUES (6, 'A6'); UPSERT INTO master VALUES (8, 'A8'); UPSERT INTO detail VALUES (1, 1, 'B1'); UPSERT INTO detail VALUES (2, 2, 'B2'); UPSERT INTO detail VALUES (3, 3, 'B3'); UPSERT INTO detail VALUES (4, 4, 'B4'); UPSERT INTO detail VALUES (5, 5, 'B5'); UPSERT INTO detail VALUES (6, 6, 'B6'); UPSERT INTO detail VALUES (7, 7, 'B7'); UPSERT INTO detail VALUES (8, 8, 'B8'); {noformat} and modify the sql as : {noformat} select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 9:57 AM: This issue is caused by the join dynamic filter, from following RHS, we get d.id is in (1,2): {code} select d.seq,d.col2,d.id from detail d where d.id between 1 and 2 {code} so with join dynamic filter, m.id is also in (1,2). Before applying join dynamic filter,LHS is: {code} select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc {code} Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying join dynamic filter LHS turns to : {code} select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc {code} Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then {{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in (1,2)}} into Scan , and useSkipScan is true in following line 274 of {{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use SkipScanFilter: {code:borderStyle=solid} 273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan); 274useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey); {code} next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in {{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is turned to SchemaUtil.VAR_BINARY_SCHEMA: {code} 111 if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) { 112schema = SchemaUtil.VAR_BINARY_SCHEMA; 113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN; 114 } else { {code} so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow is {{\x80\x00\x00\x02\x00}}: {code:borderStyle=solid} 134if (nBuckets == null || !isPointLookup || !useSkipScan) { 135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan); 136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan); {code} \\ In summary, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses {{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In {{BaseScannerRegionObserver.preScannerOpen}} method,following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow.Unfortunately, the reversed Scan's range computed by {{ScanUtil.setupReverseScan}} method is {{[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF)}}, so we can only get the rows of {{master}} table which id is 1, the rows which id is 2 is excluded. 
{code} 621 public static void setupReverseScan(Scan scan) { 622if (isReversed(scan)) { 623byte[] newStartRow = getReversedRow(scan.getStartRow()); 624byte[] newStopRow = getReversedRow(scan.getStopRow()); 625scan.setStartRow(newStopRow); 626scan.setStopRow(newStartRow); 627scan.setReversed(true); 628} 629} {code} \\ In conclusion, following two problems causes this issue: (1) the {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}. (2) even though {{ScanUtil.getReversedRow}} method is right,there may be another problem,if I change the table data as following : {noformat} UPSERT INTO master VALUES (1, 'A1'); UPSERT INTO master VALUES (2, 'A2'); UPSERT INTO master VALUES (3, 'A3'); UPSERT INTO master VALUES (4, 'A4'); UPSERT INTO master VALUES (5, 'A5'); UPSERT INTO master VALUES (6, 'A6'); UPSERT INTO master VALUES (8, 'A8'); UPSERT INTO detail VALUES (1, 1, 'B1'); UPSERT INTO detail VALUES (2, 2, 'B2'); UPSERT INTO detail VALUES (3, 3, 'B3'); UPSERT INTO detail VALUES (4, 4, 'B4'); UPSERT INTO detail VALUES (5, 5, 'B5'); UPSERT INTO detail VALUES (6, 6, 'B6'); UPSERT INTO detail VALUES (7, 7, 'B7'); UPSERT INTO detail VALUES (8, 8, 'B8'); {noformat} and modify the sql as : {noformat} select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 9:58 AM: This issue is caused by the join dynamic filter, from following RHS, we get d.id is in (1,2): {code} select d.seq,d.col2,d.id from detail d where d.id between 1 and 2 {code} so with join dynamic filter, m.id is also in (1,2). Before applying join dynamic filter,LHS is: {code} select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc {code} Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying join dynamic filter LHS turns to : {code} select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc {code} Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then {{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in (1,2)}} into Scan , and useSkipScan is true in following line 274 of {{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use SkipScanFilter: {code:borderStyle=solid} 273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan); 274useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey); {code} next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in {{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is turned to SchemaUtil.VAR_BINARY_SCHEMA: {code} 111 if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) { 112schema = SchemaUtil.VAR_BINARY_SCHEMA; 113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN; 114 } else { {code} so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow is {{\x80\x00\x00\x02\x00}}: {code:borderStyle=solid} 134if (nBuckets == null || !isPointLookup || !useSkipScan) { 135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan); 136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan); {code} \\ In summary, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses {{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In {{BaseScannerRegionObserver.preScannerOpen}} method,following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow.Unfortunately, the reversed Scan's range computed by {{ScanUtil.setupReverseScan}} method is [\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF), so we can only get the rows of {{master}} table which id is 1, the rows which id is 2 is excluded. 
{code} 621 public static void setupReverseScan(Scan scan) { 622if (isReversed(scan)) { 623byte[] newStartRow = getReversedRow(scan.getStartRow()); 624byte[] newStopRow = getReversedRow(scan.getStopRow()); 625scan.setStartRow(newStopRow); 626scan.setStopRow(newStartRow); 627scan.setReversed(true); 628} 629} {code} \\ In conclusion, following two problems causes this issue: (1) the {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}. (2) even though {{ScanUtil.getReversedRow}} method is right,there may be another problem,if I change the table data as following : {noformat} UPSERT INTO master VALUES (1, 'A1'); UPSERT INTO master VALUES (2, 'A2'); UPSERT INTO master VALUES (3, 'A3'); UPSERT INTO master VALUES (4, 'A4'); UPSERT INTO master VALUES (5, 'A5'); UPSERT INTO master VALUES (6, 'A6'); UPSERT INTO master VALUES (8, 'A8'); UPSERT INTO detail VALUES (1, 1, 'B1'); UPSERT INTO detail VALUES (2, 2, 'B2'); UPSERT INTO detail VALUES (3, 3, 'B3'); UPSERT INTO detail VALUES (4, 4, 'B4'); UPSERT INTO detail VALUES (5, 5, 'B5'); UPSERT INTO detail VALUES (6, 6, 'B6'); UPSERT INTO detail VALUES (7, 7, 'B7'); UPSERT INTO detail VALUES (8, 8, 'B8'); {noformat} and modify the sql as : {noformat} select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 9:56 AM: This issue is caused by the join dynamic filter, from following RHS, we get d.id is in (1,2): {code} select d.seq,d.col2,d.id from detail d where d.id between 1 and 2 {code} so with join dynamic filter, m.id is also in (1,2). Before applying join dynamic filter,LHS is: {code} select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc {code} Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying join dynamic filter LHS turns to : {code} select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc {code} Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then {{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in (1,2)}} into Scan , and useSkipScan is true in following line 274 of {{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use SkipScanFilter: {code:borderStyle=solid} 273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan); 274useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey); {code} next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in {{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is turned to SchemaUtil.VAR_BINARY_SCHEMA: {code} 111 if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) { 112schema = SchemaUtil.VAR_BINARY_SCHEMA; 113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN; 114 } else { {code} so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow is {{\x80\x00\x00\x02\x00}}: {code:borderStyle=solid} 134if (nBuckets == null || !isPointLookup || !useSkipScan) { 135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan); 136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan); {code} \\ In summary, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses {{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In {{BaseScannerRegionObserver.preScannerOpen}} method,following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow.Unfortunately, the reversed Scan's range computed by {{ScanUtil.setupReverseScan}} method is [\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF), so we can only get the rows of {{master}} table which id is 1, the rows which id is 2 is excluded. 
{code} 621 public static void setupReverseScan(Scan scan) { 622if (isReversed(scan)) { 623byte[] newStartRow = getReversedRow(scan.getStartRow()); 624byte[] newStopRow = getReversedRow(scan.getStopRow()); 625scan.setStartRow(newStopRow); 626scan.setStopRow(newStartRow); 627scan.setReversed(true); 628} 629} {code} \\ In conclusion, following two problems causes this issue: (1) the {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}. (2) even though {{ScanUtil.getReversedRow}} method is right,there may be another problem,if I change the table data as following : {noformat} UPSERT INTO master VALUES (1, 'A1'); UPSERT INTO master VALUES (2, 'A2'); UPSERT INTO master VALUES (3, 'A3'); UPSERT INTO master VALUES (4, 'A4'); UPSERT INTO master VALUES (5, 'A5'); UPSERT INTO master VALUES (6, 'A6'); UPSERT INTO master VALUES (8, 'A8'); UPSERT INTO detail VALUES (1, 1, 'B1'); UPSERT INTO detail VALUES (2, 2, 'B2'); UPSERT INTO detail VALUES (3, 3, 'B3'); UPSERT INTO detail VALUES (4, 4, 'B4'); UPSERT INTO detail VALUES (5, 5, 'B5'); UPSERT INTO detail VALUES (6, 6, 'B6'); UPSERT INTO detail VALUES (7, 7, 'B7'); UPSERT INTO detail VALUES (8, 8, 'B8'); {noformat} and modify the sql as : {noformat} select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 9:56 AM: This issue is caused by the join dynamic filter, from following RHS, we get d.id is in (1,2): {code} select d.seq,d.col2,d.id from detail d where d.id between 1 and 2 {code} so with join dynamic filter, m.id is also in (1,2). Before applying join dynamic filter,LHS is: {code} select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc {code} Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying join dynamic filter LHS turns to : {code} select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc {code} Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then {{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in (1,2)}} into Scan , and useSkipScan is true in following line 274 of {{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use SkipScanFilter: {code:borderStyle=solid} 273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan); 274useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey); {code} next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in {{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is turned to SchemaUtil.VAR_BINARY_SCHEMA: {code} 111 if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) { 112schema = SchemaUtil.VAR_BINARY_SCHEMA; 113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN; 114 } else { {code} so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow is {{\x80\x00\x00\x02\x00}}: {code:borderStyle=solid} 134if (nBuckets == null || !isPointLookup || !useSkipScan) { 135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan); 136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan); {code} In summary, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses {{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In {{BaseScannerRegionObserver.preScannerOpen}} method,following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow.Unfortunately, the reversed Scan's range computed by {{ScanUtil.setupReverseScan}} method is [\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF), so we can only get the rows of {{master}} table which id is 1, the rows which id is 2 is excluded. 
{code} 621 public static void setupReverseScan(Scan scan) { 622if (isReversed(scan)) { 623byte[] newStartRow = getReversedRow(scan.getStartRow()); 624byte[] newStopRow = getReversedRow(scan.getStopRow()); 625scan.setStartRow(newStopRow); 626scan.setStopRow(newStartRow); 627scan.setReversed(true); 628} 629} {code} \\ \\ In conclusion, following two problems causes this issue: (1) the {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}. (2) even though {{ScanUtil.getReversedRow}} method is right,there may be another problem,if I change the table data as following : {noformat} UPSERT INTO master VALUES (1, 'A1'); UPSERT INTO master VALUES (2, 'A2'); UPSERT INTO master VALUES (3, 'A3'); UPSERT INTO master VALUES (4, 'A4'); UPSERT INTO master VALUES (5, 'A5'); UPSERT INTO master VALUES (6, 'A6'); UPSERT INTO master VALUES (8, 'A8'); UPSERT INTO detail VALUES (1, 1, 'B1'); UPSERT INTO detail VALUES (2, 2, 'B2'); UPSERT INTO detail VALUES (3, 3, 'B3'); UPSERT INTO detail VALUES (4, 4, 'B4'); UPSERT INTO detail VALUES (5, 5, 'B5'); UPSERT INTO detail VALUES (6, 6, 'B6'); UPSERT INTO detail VALUES (7, 7, 'B7'); UPSERT INTO detail VALUES (8, 8, 'B8'); {noformat} and modify the sql as : {noformat} select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 9:55 AM: This issue is caused by the join dynamic filter, from following RHS, we get d.id is in (1,2): {code} select d.seq,d.col2,d.id from detail d where d.id between 1 and 2 {code} so with join dynamic filter, m.id is also in (1,2). Before applying join dynamic filter,LHS is: {code} select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc {code} Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying join dynamic filter LHS turns to : {code} select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc {code} Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then {{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in (1,2)}} into Scan , and useSkipScan is true in following line 274 of {{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use SkipScanFilter: {code:borderStyle=solid} 273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan); 274useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey); {code} next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in {{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is turned to SchemaUtil.VAR_BINARY_SCHEMA: {code} 111 if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) { 112schema = SchemaUtil.VAR_BINARY_SCHEMA; 113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN; 114 } else { {code} so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow is {{\x80\x00\x00\x02\x00}}: {code:borderStyle=solid} 134if (nBuckets == null || !isPointLookup || !useSkipScan) { 135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan); 136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan); {code} In summary, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses {{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In {{BaseScannerRegionObserver.preScannerOpen}} method,following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow.Unfortunately, the reversed Scan's range computed by {{ScanUtil.setupReverseScan}} method is [\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF), so we can only get the rows of {{master}} table which id is 1, the rows which id is 2 is excluded. 
{code} 621 public static void setupReverseScan(Scan scan) { 622if (isReversed(scan)) { 623byte[] newStartRow = getReversedRow(scan.getStartRow()); 624byte[] newStopRow = getReversedRow(scan.getStopRow()); 625scan.setStartRow(newStopRow); 626scan.setStopRow(newStartRow); 627scan.setReversed(true); 628} 629} {code} In conclusion, following two problems causes this issue: (1) the {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}. (2) even though {{ScanUtil.getReversedRow}} method is right,there may be another problem,if I change the table data as following : {noformat} UPSERT INTO master VALUES (1, 'A1'); UPSERT INTO master VALUES (2, 'A2'); UPSERT INTO master VALUES (3, 'A3'); UPSERT INTO master VALUES (4, 'A4'); UPSERT INTO master VALUES (5, 'A5'); UPSERT INTO master VALUES (6, 'A6'); UPSERT INTO master VALUES (8, 'A8'); UPSERT INTO detail VALUES (1, 1, 'B1'); UPSERT INTO detail VALUES (2, 2, 'B2'); UPSERT INTO detail VALUES (3, 3, 'B3'); UPSERT INTO detail VALUES (4, 4, 'B4'); UPSERT INTO detail VALUES (5, 5, 'B5'); UPSERT INTO detail VALUES (6, 6, 'B6'); UPSERT INTO detail VALUES (7, 7, 'B7'); UPSERT INTO detail VALUES (8, 8, 'B8'); {noformat} and modify the sql as : {noformat} select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id and
[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665 ] chenglei edited comment on PHOENIX-3578 at 2/21/17 9:47 AM: This issue is caused by the join dynamic filter, from following RHS, we get d.id is in (1,2): {code} select d.seq,d.col2,d.id from detail d where d.id between 1 and 2 {code} so with join dynamic filter, m.id is also in (1,2). Before applying join dynamic filter,LHS is: {code} select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc {code} Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying join dynamic filter LHS turns to : {code} select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by m.id desc {code} Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then {{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in (1,2)}} into Scan , and useSkipScan is true in following line 274 of {{WhereOptimizer.pushKeyExpressionsToScan}} method: {code:borderStyle=solid} 273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan); 274useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey); {code} next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in {{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is turned to SchemaUtil.VAR_BINARY_SCHEMA: {code} 111 if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) { 112schema = SchemaUtil.VAR_BINARY_SCHEMA; 113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN; 114 } else { {code} so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow is {{\x80\x00\x00\x02\x00}}: {code:borderStyle=solid} 134if (nBuckets == null || !isPointLookup || !useSkipScan) { 135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan); 136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan); {code} In summary, when we scan the LHS {{master}} table, the Scan range is {{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses {{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In {{BaseScannerRegionObserver.preScannerOpen}} method,following {{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow and endRow.Unfortunately, the reversed Scan's range computed by {{ScanUtil.setupReverseScan}} method is [\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF, \x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF), so we can only get the rows of {{master}} table which id is 1, the rows which id is 2 is excluded. 
{code} 621 public static void setupReverseScan(Scan scan) { 622if (isReversed(scan)) { 623byte[] newStartRow = getReversedRow(scan.getStartRow()); 624byte[] newStopRow = getReversedRow(scan.getStopRow()); 625scan.setStartRow(newStopRow); 626scan.setStopRow(newStartRow); 627scan.setReversed(true); 628} 629} {code} In conclusion, following two problems causes this issue: (1) the {{ScanUtil.getReversedRow}} method is not right for {{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not {{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}. (2) even though {{ScanUtil.getReversedRow}} method is right,there may be another problem,if I change the table data as following : {code} UPSERT INTO master VALUES (1, 'A1'); UPSERT INTO master VALUES (2, 'A2'); UPSERT INTO master VALUES (3, 'A3'); UPSERT INTO master VALUES (4, 'A4'); UPSERT INTO master VALUES (5, 'A5'); UPSERT INTO master VALUES (6, 'A6'); UPSERT INTO master VALUES (8, 'A8'); UPSERT INTO detail VALUES (1, 1, 'B1'); UPSERT INTO detail VALUES (2, 2, 'B2'); UPSERT INTO detail VALUES (3, 3, 'B3'); UPSERT INTO detail VALUES (4, 4, 'B4'); UPSERT INTO detail VALUES (5, 5, 'B5'); UPSERT INTO detail VALUES (6, 6, 'B6'); UPSERT INTO detail VALUES (7, 7, 'B7'); UPSERT INTO detail VALUES (8, 8, 'B8'); (code} and modify the sql as : {code} select m.id, m.col1,d.col2 from master m, detail d where m.id = d.id and d.id in (3,5,7) order by m.id desc {code}, because
[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter
[ https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3670: -- Description:

In my business system, there is the following join SQL (simplified); fact_table is a fact table, joining the dimension tables dim_table1 and dim_table2:
{code:borderStyle=solid}
select /*+ SKIP_SCAN */ sum(t.click) from fact_table t join dim_table1 d1 on t.cust_id=d1.id join dim_table2 d2 on t.cust_id=d2.id where t.date between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code}
I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For some small datasets, the sql executes quickly, but when the dataset is bigger the sql becomes very slow: when the row count of fact_table is 30 million, dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s. When I debug the SQL execution, I find RHS1 returns 5523 rows:
{code:borderStyle=solid}
select d1.id from dim_table1 d1 where d1.code = 2008
{code}
and RHS2 returns 23881 rows:
{code:borderStyle=solid}
select d2.id from dim_table2 d2 where d2.region='us'
{code}
then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method to intersect RHS1 with RHS2 for the join dynamic filter, narrowing down what fact_table.cust_id should be. Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql execution only costs 17s. After reading the code of the KeyRange.intersect method, I find the following two problems:
(1) The double loop at lines 521 and 522 is inefficient: when the keyRanges size is M and the keyRanges2 size is N, the time complexity is O(M*N), which for my example is 5523*23881:
{code:borderStyle=solid}
519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520        List<KeyRange> tmp = new ArrayList<KeyRange>();
521        for (KeyRange r1 : keyRanges) {
522            for (KeyRange r2 : keyRanges2) {
523                KeyRange r = r1.intersect(r2);
524                if (EMPTY_RANGE != r) {
525                    tmp.add(r);
526                }
527            }
528        }
{code}
(2) Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the KeyRange.coalesce method does:
{code:borderStyle=solid}
532    Collections.sort(tmp, KeyRange.COMPARATOR);
533    List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534    KeyRange r = tmp.get(0);
535    for (int i=1; i<tmp.size(); i++) {
536        if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537            tmp2.add(r);
538            r = tmp.get(i);
539        } else {
540            r = r.intersect(tmp.get(i));
541        }
542    }
{code}
and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
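To illustrate the direction such a fix could take, here is a minimal sketch of a two-pointer merge intersection over sorted range lists. This is an assumed illustration only, not the code in PHOENIX-3670_v1.patch; {{IntRange}} is a simplified stand-in for Phoenix's {{KeyRange}}, and it assumes the ranges within each input list are pairwise disjoint, as coalesced KeyRange lists are. After sorting both lists, a single linear sweep produces the intersection, replacing the O(M*N) double loop with O(M*logM)+O(N*logN) sorting plus a linear merge.
{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeIntersectSketch {

    // Illustrative stand-in for Phoenix's KeyRange: a closed interval [lower, upper].
    static final class IntRange {
        final int lower;
        final int upper;
        IntRange(int lower, int upper) { this.lower = lower; this.upper = upper; }
        @Override public String toString() { return "[" + lower + "," + upper + "]"; }
    }

    // Two-pointer merge over two sorted range lists whose ranges are pairwise disjoint
    // within each list. Every step either emits one intersection or advances one pointer,
    // so after the sorts the sweep is linear instead of the O(M*N) double loop.
    static List<IntRange> intersect(List<IntRange> a, List<IntRange> b) {
        Comparator<IntRange> byLower = Comparator.comparingInt(r -> r.lower);
        a.sort(byLower);
        b.sort(byLower);
        List<IntRange> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            IntRange r1 = a.get(i);
            IntRange r2 = b.get(j);
            int lo = Math.max(r1.lower, r2.lower);
            int hi = Math.min(r1.upper, r2.upper);
            if (lo <= hi) {
                result.add(new IntRange(lo, hi)); // non-empty overlap
            }
            // Advance whichever range ends first; the other may still overlap the next one.
            if (r1.upper < r2.upper) { i++; } else { j++; }
        }
        return result;
    }

    public static void main(String[] args) {
        List<IntRange> rhs1 = new ArrayList<>(List.of(new IntRange(1, 3), new IntRange(7, 9)));
        List<IntRange> rhs2 = new ArrayList<>(List.of(new IntRange(2, 8)));
        System.out.println(intersect(rhs1, rhs2)); // prints [[2,3], [7,8]]
    }
}
{code}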
[jira] [Comment Edited] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter
[ https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865978#comment-15865978 ] chenglei edited comment on PHOENIX-3670 at 2/14/17 3:43 PM: I uploaded my first patch; could someone help me review it? The time complexity of the KeyRange.intersect method in my patch is reduced to O(M*logM)+O(N*logN), which is faster than the current O(M*N); for my example explained above, after applying the patch the KeyRange.intersect method only costs 20ms, dramatically faster than the original 11s. I also added some unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method in my patch. was (Author: comnetwork): I uploaded my first patch; could someone help me review it? The time complexity of the KeyRange.intersect method in my patch is reduced to O(M*logM)+O(N*logN), which is faster than the current O(M*N); for my example explained above, after applying the patch the KeyRange.intersect method only costs 20ms, dramatically faster than the original 11s.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
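As a rough back-of-envelope check of the complexity claim in the comment above (these operation counts are estimates, not measurements from the patch): with M = 5523 and N = 23881, the double loop performs M*N, roughly 1.3*10^8, pairwise intersections, whereas sorting both lists and merging them costs on the order of M*log2(M) + N*log2(N), roughly 4*10^5 comparisons, plus a linear sweep. That is several hundred times less work, which is consistent with the reported drop from 11s to about 20ms.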
[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter
[ https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3670: -- Attachment: PHOENIX-3670_v1.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter
[ https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3670: -- Summary: KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter (was: KeyRange.intersect(List<KeyRange> , List<KeyRange> ) is inefficient, especially for join dynamic filter)
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter
chenglei created PHOENIX-3670: - Summary: KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter Key: PHOENIX-3670 URL: https://issues.apache.org/jira/browse/PHOENIX-3670 Project: Phoenix Issue Type: Improvement Affects Versions: 4.9.0 Reporter: chenglei
In my business system, there is the following join SQL (simplified); fact_table is a fact table, joining dimension tables dim_table1 and dim_table2:
{code:borderStyle=solid}
select /*+ SKIP_SCAN */ sum(t.click) from fact_table t join dim_table1 d1 on t.cust_id=d1.id join dim_table2 d2 on t.cust_id=d2.id where t.date between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code}
I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small dataset the sql executes quickly, but when the dataset gets bigger, the sql becomes very slow. When the row count of fact_table is 30 million, dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s.
When I debugged the SQL execution, I found RHS1 returns 5523 rows:
{code:borderStyle=solid}
   select d1.id from dim_table1 d1 where d1.code = 2008
{code}
and RHS2 returns 23881 rows:
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}
then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method to compute the intersection of RHS1 and RHS2 for the dynamic filter, narrowing down the range that fact_table.cust_id should be in.
Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql execution only costs 17s. After reading the code of the KeyRange.intersect method, I found the following two problems:
1. The double loop in line 521 and line 522 is inefficient: when the keyRanges size is M and the keyRanges2 size is N, the time complexity is O(M*N), which for my example is 5523*23881:
{code:borderStyle=solid}
519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520        List<KeyRange> tmp = new ArrayList<KeyRange>();
521        for (KeyRange r1 : keyRanges) {
522            for (KeyRange r2 : keyRanges2) {
523                KeyRange r = r1.intersect(r2);
524                if (EMPTY_RANGE != r) {
525                    tmp.add(r);
526                }
527            }
528        }
{code}
2. Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the KeyRange.coalesce method does:
{code:borderStyle=solid}
532        Collections.sort(tmp, KeyRange.COMPARATOR);
533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534        KeyRange r = tmp.get(0);
535        for (int i=1; i<tmp.size(); i++) {
536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537                tmp2.add(r);
538                r = tmp.get(i);
539            } else {
540                r = r.intersect(tmp.get(i));
541            }
542        }
{code}
and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
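For the second problem in the description (line 540 doing an intersect where a union is needed), the sketch below shows what a coalescing pass over a list already sorted by lower bound looks like, again with simplified integer ranges rather than Phoenix's KeyRange: overlapping neighbours are merged by extending the upper bound (a union), and only disjoint ranges are emitted separately.
{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.List;

public class RangeCoalesce {
    // Coalesces ranges (each int[]{lower, upper}, inclusive) that are already
    // sorted by lower bound: overlapping or touching neighbours are merged by
    // UNION (extend the upper bound), which is what the description says the
    // intersect at line 540 should have been.
    static List<int[]> coalesceSorted(List<int[]> sorted) {
        List<int[]> out = new ArrayList<>();
        int[] current = sorted.get(0).clone();
        for (int i = 1; i < sorted.size(); i++) {
            int[] next = sorted.get(i);
            if (next[0] > current[1]) {         // no overlap: emit and restart
                out.add(current);
                current = next.clone();
            } else {                            // overlap: union, not intersect
                current[1] = Math.max(current[1], next[1]);
            }
        }
        out.add(current);
        return out;
    }

    public static void main(String[] args) {
        List<int[]> sorted = List.of(new int[]{1, 4}, new int[]{3, 8}, new int[]{10, 12});
        for (int[] r : coalesceSorted(sorted)) {
            System.out.println("[" + r[0] + "," + r[1] + "]");  // prints [1,8] then [10,12]
        }
    }
}
{code}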
[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc
[ https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817695#comment-15817695 ] chenglei commented on PHOENIX-3578: --- It can also be reproduced under 4.9.0. It may be caused by the fact that the join SQL uses a SkipScanFilter after the dynamic filtering, while the sql is also OrderBy.REV_ROW_KEY_ORDER_BY.
> Incorrect query results when applying inner join and orderby desc
> -
>
> Key: PHOENIX-3578
> URL: https://issues.apache.org/jira/browse/PHOENIX-3578
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Environment: hbase-1.1.2
> Reporter: sungmin.cho
>
> Steps to reproduce:
> h4. 1. Create two tables
> {noformat}
> CREATE TABLE IF NOT EXISTS master (
>   id integer not null,
>   col1 varchar,
>   constraint pk_master primary key(id)
> );
> CREATE TABLE IF NOT EXISTS detail (
>   id integer not null,
>   seq integer not null,
>   col2 varchar,
>   constraint pk_master primary key(id, seq)
> );
> {noformat}
> h4. 2. Upsert values
> {noformat}
> upsert into master values(1, 'A1');
> upsert into master values(2, 'A2');
> upsert into master values(3, 'A3');
> upsert into detail values(1, 1, 'B1');
> upsert into detail values(1, 2, 'B2');
> upsert into detail values(2, 1, 'B1');
> upsert into detail values(2, 2, 'B2');
> upsert into detail values(3, 1, 'B1');
> upsert into detail values(3, 2, 'B2');
> upsert into detail values(3, 3, 'B3');
> {noformat}
> h4. 3. Execute query
> {noformat}
> select m.id, m.col1, d.seq, d.col2
> from master m, detail d
> where m.id = d.id
> and d.id between 1 and 2
> order by m.id desc
> {noformat}
> h4. (/) Expected result
> {noformat}
> +-------+---------+--------+---------+
> | M.ID  | M.COL1  | D.SEQ  | D.COL2  |
> +-------+---------+--------+---------+
> | 2     | A2      | 1      | B1      |
> | 2     | A2      | 2      | B2      |
> | 1     | A1      | 1      | B1      |
> | 1     | A1      | 2      | B2      |
> +-------+---------+--------+---------+
> {noformat}
> h4. (!) Incorrect result
> {noformat}
> +-------+---------+--------+---------+
> | M.ID  | M.COL1  | D.SEQ  | D.COL2  |
> +-------+---------+--------+---------+
> | 1     | A1      | 1      | B1      |
> | 1     | A1      | 2      | B2      |
> +-------+---------+--------+---------+
> {noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
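For anyone who wants to rerun the report end to end, here is a minimal, hypothetical JDBC sketch of the steps above. It assumes the Phoenix client jar is on the classpath and a Phoenix cluster is reachable through a ZooKeeper quorum on localhost; the DDL, upserts and query are taken verbatim from the report.
{code:borderStyle=solid}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Phoenix3578Repro {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string; adjust the ZooKeeper quorum as needed.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            conn.setAutoCommit(true);
            stmt.execute("CREATE TABLE IF NOT EXISTS master (id integer not null, col1 varchar, "
                    + "constraint pk_master primary key(id))");
            stmt.execute("CREATE TABLE IF NOT EXISTS detail (id integer not null, seq integer not null, "
                    + "col2 varchar, constraint pk_master primary key(id, seq))");
            String[] upserts = {
                "upsert into master values(1, 'A1')", "upsert into master values(2, 'A2')",
                "upsert into master values(3, 'A3')",
                "upsert into detail values(1, 1, 'B1')", "upsert into detail values(1, 2, 'B2')",
                "upsert into detail values(2, 1, 'B1')", "upsert into detail values(2, 2, 'B2')",
                "upsert into detail values(3, 1, 'B1')", "upsert into detail values(3, 2, 'B2')",
                "upsert into detail values(3, 3, 'B3')"
            };
            for (String sql : upserts) { stmt.execute(sql); }

            // Expected: 4 rows, ids in order 2, 2, 1, 1; the report shows only the two id=1 rows.
            try (ResultSet rs = stmt.executeQuery(
                    "select m.id, m.col1, d.seq, d.col2 from master m, detail d "
                    + "where m.id = d.id and d.id between 1 and 2 order by m.id desc")) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                    System.out.println(rs.getInt(1) + " " + rs.getString(2) + " "
                            + rs.getInt(3) + " " + rs.getString(4));
                }
                System.out.println("row count = " + rows + " (expected 4)");
            }
        }
    }
}
{code}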
[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787679#comment-15787679 ] chenglei edited comment on PHOENIX-3491 at 12/30/16 1:36 PM: - [~jamestaylor], I noticed this patch has been pushed to the master branch and included in 4.9.0. Could you please close this issue and mark it resolved?
was (Author: comnetwork): [~jamestaylor], I noticed this patch was pushed to the master branch and included in 4.9.0. Could you please close this issue and mark it resolved?
> OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.8.0
> Reporter: chenglei
> Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch, PHOENIX-3491_v2.patch
>
> For the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST (
>     ORGANIZATION_ID INTEGER NOT NULL,
>     CONTAINER_ID INTEGER NOT NULL,
>     SCORE INTEGER NOT NULL,
>     ENTITY_ID INTEGER NOT NULL,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         CONTAINER_ID,
>         SCORE,
>         ENTITY_ID
>     ));
> {code}
> If we execute explain on the following sql:
> {code:borderStyle=solid}
> SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
> {code}
> the result is:
> {code:borderStyle=solid}
> +----------------------------------------------------------------------+
> |                                 PLAN                                 |
> +----------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST            |
> |     SERVER FILTER BY FIRST KEY ONLY                                  |
> |     SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                    |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]                  |
> +----------------------------------------------------------------------+
> {code}
> From the above explain result, we can see that the ORDER BY ORGANIZATION_ID DESC, SCORE DESC is not compiled out, but obviously it should be compiled out as OrderBy.REV_ROW_KEY_ORDER_BY.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
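One way to check whether the ORDER BY has been compiled out is to inspect the EXPLAIN output programmatically. The snippet below is a hypothetical verification sketch, not part of the attached patches: it assumes a Phoenix cluster reachable at localhost and that the ORDERBY_TEST table from the description already exists, and it only looks for the CLIENT SORTED BY step quoted in the plan above.
{code:borderStyle=solid}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Phoenix3491ExplainCheck {
    public static void main(String[] args) throws Exception {
        String query = "SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST "
                + "GROUP BY ORGANIZATION_ID, SCORE "
                + "ORDER BY ORGANIZATION_ID DESC, SCORE DESC";
        // Hypothetical connection string; adjust the ZooKeeper quorum as needed.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("EXPLAIN " + query)) {
            StringBuilder plan = new StringBuilder();
            while (rs.next()) {
                plan.append(rs.getString(1)).append('\n');
            }
            System.out.println(plan);
            // If the ORDER BY is compiled out, the extra client-side sort step
            // should no longer appear in the plan.
            if (plan.toString().contains("CLIENT SORTED BY")) {
                System.out.println("ORDER BY was NOT compiled out");
            } else {
                System.out.println("ORDER BY was compiled out");
            }
        }
    }
}
{code}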
[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters
[ https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787647#comment-15787647 ] chenglei edited comment on PHOENIX-3453 at 12/30/16 1:34 PM: - I uploaded my first patch, [~jamestaylor], please help me review it, thanks. I ran all the existing unit tests and IT tests on my local machine.
was (Author: comnetwork): I uploaded my first patch, [~jamestaylor], please help me have a review, thanks. I ran the all the existing unit tests and IT tests in my local machine
> Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters
> -
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0, 4.9.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3453_v1.patch
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ��� 1.1
> If you run it in SQuirreL it results in the entity_id column getting the above error value. Notice that if you remove the secondary index or DISTINCT you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters ()
> at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787679#comment-15787679 ] chenglei commented on PHOENIX-3491: --- [~jamestaylor], I noticed this patch was pushed to the master branch and included in 4.9.0. Could you please close this issue and mark it resolved?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters
[ https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222 ] chenglei edited comment on PHOENIX-3453 at 12/30/16 1:27 PM: - I wrote the following test case so that this problem can be reproduced under 4.9.0, simplifying the original test case by removing the index table and changing the type from CHAR(15) to INTEGER, which is easier to debug:
{code:borderStyle=solid}
CREATE TABLE GROUPBY3453_INT (
    ENTITY_ID INTEGER NOT NULL,
    CONTAINER_ID INTEGER NOT NULL,
    SCORE INTEGER NOT NULL,
    CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC, CONTAINER_ID DESC, SCORE DESC)
)

UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)

select DISTINCT entity_id, score from (select entity_id, score from GROUPBY3453_INT limit 1)
{code}
The expected result is:
{code:borderStyle=solid}
1   1
{code}
but the actual result is:
{code:borderStyle=solid}
-104   1
{code}
This problem can only be reproduced when the SQL has a SubQuery. When I debugged into the source code, I found the cause of the problem is the distinct (or group by) statement in the outer query. By the following code in the GroupByCompiler.GroupBy.compile method, the "entity_id" column in GroupBy's expressions is a ProjectedColumnExpression, but in line 245, the "entity_id" column in GroupBy's keyExpressions is a CoerceExpression wrapping the ProjectedColumnExpression, which converts the ProjectedColumnExpression from PInteger to PDecimal:
{code:borderStyle=solid}
232            for (int i = expressions.size()-2; i >= 0; i--) {
233                Expression expression = expressions.get(i);
234                PDataType keyType = getGroupByDataType(expression);
235                if (keyType == expression.getDataType()) {
236                    continue;
237                }
238                // Copy expressions only when keyExpressions will be different than expressions
239                if (keyExpressions == expressions) {
240                    keyExpressions = new ArrayList<Expression>(expressions);
241                }
242                // Wrap expression in an expression that coerces the expression to the required type..
243                // This is done so that we have a way of expressing null as an empty key when more
244                // than one fixed and nullable types are used in a group by clause
245                keyExpressions.set(i, CoerceExpression.create(expression, keyType));
246            }
{code}
When I looked into the CoerceExpression.create method, in line 68 of the following code I observed that the SortOrder of the CoerceExpression is SortOrder.getDefault(), which is SortOrder.ASC, but it ought to be SortOrder.DESC, because the SortOrder of the ProjectedColumnExpression is SortOrder.DESC:
{code:borderStyle=solid}
46    public static Expression create(Expression expression, PDataType toType) throws SQLException {
47        if (toType == expression.getDataType()) {
48            return expression;
49        }
50        return new CoerceExpression(expression, toType);
51    }
..
66    //Package protected for tests
67    CoerceExpression(Expression expression, PDataType toType) {
68        this(expression, toType, SortOrder.getDefault(), null, true);
69    }
{code}
So when we get the query results, in the ClientGroupedAggregatingResultIterator.getGroupingKey method, we invoke the following PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. Notice that the actualType parameter is PInteger, the actualModifier parameter is SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So in line 842 we get the PInteger "1" from the ptr which was got from the HBase RegionServer, in line 845 we convert the PInteger "1" to the PDecimal "1", and last, in line 846, we encode the PDecimal "1" to bytes; but because the expectedModifier parameter is SortOrder.ASC, the PDecimal "1" is encoded with SortOrder.ASC. That is to say, the SortOrder of the groupBy key got from the ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.
{code:borderStyle=solid}
826    public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType actualType, Integer actualMaxLength,
827            Integer actualScale, SortOrder actualModifier, Integer desiredMaxLength, Integer desiredScale,
828            SortOrder expectedModifier) {
..
840        // Optimization for cases in which we already have the object around
841        if (o == null) {
842            o = actualType.toObject(ptr, actualType, actualModifier);
843        }
844
845        o = toObject(o, actualType);
846        byte[] b = toBytes(o, expectedModifier);
847        ptr.set(b);
848    }
{code}
Unfortunately, finally in the following PhoenixResultSet.getObject method, when we invoke the ColumnProjector.getValue method in line 524, the ColumnProjector's Expression is a RowKeyColumnExpression, which thinks the SortOrder of the bytes got from the above-mentioned ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.DESC, so it decodes the bytes with the wrong SortOrder, which is why the entity_id column comes back as -104 instead of 1.
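The attached patch is not reproduced in this digest; purely as a rough sketch of the direction this analysis points to, the excerpt below changes CoerceExpression.create so that the coercion carries the child expression's SortOrder instead of SortOrder.getDefault(). It assumes the five-argument constructor shown above is reachable from create and that Expression exposes getSortOrder(); the actual fix in PHOENIX-3453_v1.patch may differ.
{code:borderStyle=solid}
// Hypothetical excerpt of org.apache.phoenix.expression.CoerceExpression (sketch only, not the attached patch)
public static Expression create(Expression expression, PDataType toType) throws SQLException {
    if (toType == expression.getDataType()) {
        return expression;
    }
    // Carry the child's SortOrder (DESC for ENTITY_ID in the test case above)
    // into the coercion instead of defaulting to SortOrder.getDefault(), so the
    // coerced group-by key is encoded and later decoded with the same SortOrder.
    return new CoerceExpression(expression, toType, expression.getSortOrder(), null, true);
}
{code}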
[jira] [Commented] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters
[ https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787647#comment-15787647 ] chenglei commented on PHOENIX-3453: --- I uploaded my first patch, [~jamestaylor], please help me have a review, thanks.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters
[ https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3453: -- Attachment: PHOENIX-3453_v1.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222 ] chenglei edited comment on PHOENIX-3453 at 12/30/16 11:04 AM: -- I wrote the following test case to reproduce this problem under 4.9.0, simplifying the original test case by removing the index table and changing the type from CHAR(15) to INTEGER, which makes it easier to debug:
{code:borderStyle=solid}
CREATE TABLE GROUPBY3453_INT (
    ENTITY_ID INTEGER NOT NULL,
    CONTAINER_ID INTEGER NOT NULL,
    SCORE INTEGER NOT NULL,
    CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC, CONTAINER_ID DESC, SCORE DESC)
)

UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)

select DISTINCT entity_id, score from (select entity_id, score from GROUPBY3453_INT limit 1)
{code}
The expected result is:
{code:borderStyle=solid}
1   1
{code}
but the actual result is:
{code:borderStyle=solid}
-104   1
{code}
This problem can only be reproduced when the SQL has a subquery. When I debug into the source code, I find that the cause of the problem is the DISTINCT (or GROUP BY) in the outer query. In the following code in the GroupByCompiler.GroupBy.compile method, the "entity_id" column in GroupBy's expressions is a ProjectedColumnExpression, but in line 245 the "entity_id" column in GroupBy's keyExpressions is a CoerceExpression wrapping the ProjectedColumnExpression, which converts the ProjectedColumnExpression from PInteger to PDecimal:
{code:borderStyle=solid}
232    for (int i = expressions.size()-2; i >= 0; i--) {
233        Expression expression = expressions.get(i);
234        PDataType keyType = getGroupByDataType(expression);
235        if (keyType == expression.getDataType()) {
236            continue;
237        }
238        // Copy expressions only when keyExpressions will be different than expressions
239        if (keyExpressions == expressions) {
240            keyExpressions = new ArrayList(expressions);
241        }
242        // Wrap expression in an expression that coerces the expression to the required type..
243        // This is done so that we have a way of expressing null as an empty key when more
244        // than one fixed and nullable types are used in a group by clause
245        keyExpressions.set(i, CoerceExpression.create(expression, keyType));
246    }
{code}
When I look into the CoerceExpression.create method, in line 68 of the following code I observe that the SortOrder of the CoerceExpression is SortOrder.ASC, but it ought to be SortOrder.DESC, because the SortOrder of the ProjectedColumnExpression is SortOrder.DESC:
{code:borderStyle=solid}
46    public static Expression create(Expression expression, PDataType toType) throws SQLException {
47        if (toType == expression.getDataType()) {
48            return expression;
49        }
50        return new CoerceExpression(expression, toType);
51    }
..
66    //Package protected for tests
67    CoerceExpression(Expression expression, PDataType toType) {
68        this(expression, toType, SortOrder.getDefault(), null, true);
69    }
{code}
So when we fetch the query results, ClientGroupedAggregatingResultIterator.getGroupingKey invokes PDecimal.coerceBytes to get the coerced bytes of the PDecimal. Notice that the actualType parameter is PInteger, the actualModifier parameter is SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So in line 842 we get the PInteger "1" from the ptr returned by the RegionServer, in line 845 we convert the PInteger "1" to the PDecimal "1", and finally in line 846 we encode the PDecimal "1" back to bytes; because the expectedModifier parameter is SortOrder.ASC, the PDecimal "1" is encoded with SortOrder.ASC.
{code:borderStyle=solid}
826    public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType actualType, Integer actualMaxLength,
827            Integer actualScale, SortOrder actualModifier, Integer desiredMaxLength, Integer desiredScale,
828            SortOrder expectedModifier) {
..
840        // Optimization for cases in which we already have the object around
841        if (o == null) {
842            o = actualType.toObject(ptr, actualType, actualModifier);
843        }
844
845        o = toObject(o, actualType);
846        byte[] b = toBytes(o, expectedModifier);
847        ptr.set(b);
848    }
{code}
Unfortunately, in the end, in the PhoenixResultSet.getObject method, when we invoke ColumnProjector.getValue, the columnProjector's Expression is a RowKeyColumnExpression, which expects the bytes returned by the above-mentioned ClientGroupedAggregatingResultIterator.getGroupingKey method to be in SortOrder.DESC, so it decodes the ASC-encoded bytes as if they were DESC-encoded and produces the corrupted value (-104 in the test case above, and the illegal CHAR data in the original report).
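To make the mechanics above easier to follow, here is a minimal standalone Java sketch (not Phoenix code; the 4-byte integer layout and the invert-all-bytes DESC rule are simplified assumptions rather than the exact PInteger/PDecimal formats) of why bytes written with one SortOrder and read back with the other turn into garbage:
{code:borderStyle=solid}
public class SortOrderMismatchSketch {

    // ASC encoding: flip the sign bit so unsigned byte comparison matches numeric order.
    static byte[] encodeAsc(int v) {
        int flipped = v ^ 0x80000000;
        return new byte[] {
                (byte) (flipped >>> 24), (byte) (flipped >>> 16),
                (byte) (flipped >>> 8), (byte) flipped };
    }

    // DESC encoding: invert every byte of the ASC encoding (simplified assumption).
    static byte[] invert(byte[] b) {
        byte[] r = new byte[b.length];
        for (int i = 0; i < b.length; i++) {
            r[i] = (byte) ~b[i];
        }
        return r;
    }

    // Decode bytes that are assumed to be ASC-encoded.
    static int decodeAsc(byte[] b) {
        int v = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return v ^ 0x80000000;
    }

    public static void main(String[] args) {
        // ENTITY_ID = 1 stored in a DESC primary key column.
        byte[] descEncoded = invert(encodeAsc(1));
        // Correct read path: undo the DESC inversion before decoding.
        System.out.println(decodeAsc(invert(descEncoded))); // prints 1
        // Buggy read path: treat the bytes as ASC-encoded, as the coercion above effectively does.
        System.out.println(decodeAsc(descEncoded));         // prints -2 (garbage, analogous to the -104 above)
    }
}
{code}
The correct read path has to undo the DESC inversion (or, equivalently, the CoerceExpression has to carry the child expression's SortOrder), which is exactly the mismatch described above.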
[jira] [Commented] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters
[ https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222 ] chenglei commented on PHOENIX-3453: --- I wrote the following test case to reproduce this problem under 4.9.0, simplifying the original test case by removing the index table and changing the type from CHAR(15) to INTEGER, which makes it easier to debug: {code:borderStyle=solid} CREATE TABLE GROUPBY3453_INT ( ENTITY_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID DESC,SCORE DESC) ) UPSERT INTO GROUPBY3453_INT VALUES (1,1,1) select DISTINCT entity_id, score from ( select entity_id, score from GROUPBY3453_INT limit 1) {code} The expected result is: {code:borderStyle=solid} 1 1 {code} but the actual result is: {code:borderStyle=solid} -104 1 {code} This problem can only be reproduced when the SQL has a subquery. > Secondary index and query using distinct: Outer query results in ERROR 201 > (22000): Illegal data. CHAR types may only contain single byte characters > > > Key: PHOENIX-3453 > URL: https://issues.apache.org/jira/browse/PHOENIX-3453 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > > Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID > DESC); > UPSERT INTO test.test VALUES ('entity1',1.1); > SELECT DISTINCT entity_id, score > FROM( > SELECT entity_id, score > FROM test.test > LIMIT 25 > ); > Output (in SQuirreL) > ��� 1.1 > If you run it in SQuirreL it results in the entity_id column getting the > above error value. Notice that if you remove the secondary index or DISTINCT > you get the correct result. > I've also run the query through the Phoenix java api. Then I get the > following exception: > Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types > may only contain single byte characters () > at > org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454) > at > org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145) > at > org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291) > at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121) > at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997) > at > org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75) > at > org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608) > at > org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678453#comment-15678453 ] chenglei edited comment on PHOENIX-3491 at 11/19/16 3:11 AM: - [~jamestaylor],because I have some time difference with you,sorry for late response,yes ,I ran the all the existing unit tests and IT tests in my local machine,seems the https://builds.apache.org/job/PreCommit-PHOENIX-Build/ is hanging. was (Author: comnetwork): [~jamestaylor],because I have time difference with you,sorry for late response,yes ,I ran the all the existing unit tests and IT tests in my local machine,seems the https://builds.apache.org/job/PreCommit-PHOENIX-Build/ is hanging. > OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy > is reverse > > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3491_v1.patch, PHOENIX-3491_v2.patch > > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678453#comment-15678453 ] chenglei commented on PHOENIX-3491: --- [~jamestaylor],because I have time difference with you,sorry for late response,yes ,I ran the all the existing unit tests and IT tests in my local machine,seems the https://builds.apache.org/job/PreCommit-PHOENIX-Build/ is hanging. > OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy > is reverse > > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3491_v1.patch, PHOENIX-3491_v2.patch > > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508 ] chenglei edited comment on PHOENIX-3491 at 11/18/16 1:17 PM: - The OrderBy could not be compiled out because of the following code in OrderPreservingTracker:
{code:borderStyle=solid}
123        /*
124         * When a GROUP BY is not order preserving, we cannot do a reverse
125         * scan to eliminate the ORDER BY since our server-side scan is not
126         * ordered in that case.
127         */
128        if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
129            isOrderPreserving = false;
130            isReverse = false;
131            return;
132        }
{code}
The above code and its comment seem very strange. In fact, if the GroupBy is not orderPreserving, AggregatePlan sorts the aggregated keys on the client side after getting the results from the RegionServer, but AggregatePlan always uses ASC order to sort the aggregated keys, as the following code shows, so if we just remove the above code, the query result will be wrong when the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY:
{code:borderStyle=solid}
137    public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException {
138        Expression expression = RowKeyExpression.INSTANCE;
139        OrderByExpression orderByExpression = new OrderByExpression(expression, false, true);
140        int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141        return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold);
142    }
{code}
Therefore, we can modify AggregatePlan so that it can sort the aggregated keys either ASC or DESC:
{code:borderStyle=solid}
    public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException {
        Expression expression = RowKeyExpression.INSTANCE;
        boolean isNullsLast = false;
        boolean isAscending = true;
        if (this.orderBy == OrderBy.REV_ROW_KEY_ORDER_BY) {
            isNullsLast = true; //which is needed for the whole rowKey.
            isAscending = false;
        }
        OrderByExpression orderByExpression = new OrderByExpression(expression, isNullsLast, isAscending);
        int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
        return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold);
    }
{code}
After modifying AggregatePlan, it seems we can remove the code in OrderPreservingTracker. I include a lot of unit tests and IT tests in my patch to check it, covering ASC/DESC, salted tables, multi-region tables and NULLS FIRST/LAST.

was (Author: comnetwork): The OrderBy could not be compiled out because the following code in OrderPreservingTracker : {code:borderStyle=solid} 123/* 124 * When a GROUP BY is not order preserving, we cannot do a reverse 125 * scan to eliminate the ORDER BY since our server-side scan is not 126 * ordered in that case.
127 */ 128if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { 129isOrderPreserving = false; 130isReverse = false; 131return; 132} {code} The above code and its comment seems very strange, in fact,if GroupBy is not orderPreserving, AggregatePlan would sort the aggregated Keys after geting results from RegionServer at the client side, but AggregatePlan always uses ASC order to sort the aggregated Keys, just as the following code,so if we just remove above code, the query result will be error when OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY : {code:borderStyle=solid} 137 public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { 138Expression expression = RowKeyExpression.INSTANCE; 139OrderByExpression orderByExpression = new OrderByExpression(expression, false, true); 140int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); 141return new
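As a plain-Java illustration of the idea behind the proposed AggregatePlan change (a rough sketch with byte[] stand-ins for the aggregated row keys, not the actual patch), the client-side sort simply has to follow the ORDER BY direction instead of always being ascending:
{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ClientSideAggregateSortSketch {

    // Unsigned lexicographic comparison, the ordering HBase uses for row keys.
    static final Comparator<byte[]> ASC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return a.length - b.length;
    };

    // The analogue of passing isAscending=false to OrderByExpression when the
    // ORDER BY is OrderBy.REV_ROW_KEY_ORDER_BY: sort the aggregated keys descending.
    static List<byte[]> sortAggregatedKeys(List<byte[]> aggregatedKeys, boolean reverseRowKeyOrderBy) {
        List<byte[]> sorted = new ArrayList<>(aggregatedKeys);
        sorted.sort(reverseRowKeyOrderBy ? ASC.reversed() : ASC);
        return sorted;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        keys.add(new byte[] { 2 });
        keys.add(new byte[] { 1 });
        keys.add(new byte[] { 3 });
        for (byte[] k : sortAggregatedKeys(keys, true)) {
            System.out.println(k[0]); // prints 3, 2, 1 : reverse row-key order without a reverse scan
        }
    }
}
{code}
Flipping the comparator direction on the client is all that REV_ROW_KEY_ORDER_BY needs here, which is why the comment above argues the restriction in OrderPreservingTracker can be removed once AggregatePlan honors the direction.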
[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Summary: OrderBy should be compiled out if GroupBy is not orderPreserving and OrderBy is reverse (was: OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse) > OrderBy should be compiled out if GroupBy is not orderPreserving and OrderBy > is reverse > --- > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3491_v1.patch > > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Summary: OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse (was: OrderBy should be compiled out if GroupBy is not orderPreserving and OrderBy is reverse) > OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy > is reverse > > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3491_v1.patch > > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508 ] chenglei edited comment on PHOENIX-3491 at 11/18/16 12:07 PM: -- The OrderBy could not be compiled out because the following code in OrderPreservingTracker : {code:borderStyle=solid} 123/* 124 * When a GROUP BY is not order preserving, we cannot do a reverse 125 * scan to eliminate the ORDER BY since our server-side scan is not 126 * ordered in that case. 127 */ 128if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { 129isOrderPreserving = false; 130isReverse = false; 131return; 132} {code} The above code and its comment seems very strange, in fact,if GroupBy is not orderPreserving, AggregatePlan would sort the aggregated Keys after geting results from RegionServer at the client side, but AggregatePlan always uses ASC order to sort the aggregated Keys, just as the following code,so if we just remove above code, the query result will be error when OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY : {code:borderStyle=solid} 137 public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { 138Expression expression = RowKeyExpression.INSTANCE; 139OrderByExpression orderByExpression = new OrderByExpression(expression, false, true); 140int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); 141return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold); 142 } {code} Therefore,we can modify the AggregatePlan to make it can sort the aggregated Keys ASC or DESC: {code:borderStyle=solid} public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { Expression expression = RowKeyExpression.INSTANCE; boolean isNullsLast=false; boolean isAscending=true; if(this.orderBy==OrderBy.REV_ROW_KEY_ORDER_BY) { isNullsLast=true; //which is needed for the whole rowKey. isAscending=false; } OrderByExpression orderByExpression = new OrderByExpression(expression, isNullsLast, isAscending); int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold); } {code} After modifying AggregatePlan ,It seems we can remove the code in OrderPreservingTracker, I include a lot of test cases in my patch to prove it, such as ASC/DESC, Salted Table and NULLS FIRST/LAST. was (Author: comnetwork): The OrderBy could not be compiled out because the following code in OrderPreservingTracker : {code:borderStyle=solid} 123/* 124 * When a GROUP BY is not order preserving, we cannot do a reverse 125 * scan to eliminate the ORDER BY since our server-side scan is not 126 * ordered in that case. 
127 */ 128if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { 129isOrderPreserving = false; 130isReverse = false; 131return; 132} {code} The above code and its comment seems very strange, in fact,if GroupBy is not orderPreserving, AggregatePlan would sort the aggregated Keys after geting results from RegionServer at the client side, but AggregatePlan always uses ASC order to sort the aggregated Keys, just as the following code,so if we just remove above code, the query result will be error when OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY : {code:borderStyle=solid} 137 public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { 138Expression expression = RowKeyExpression.INSTANCE; 139OrderByExpression orderByExpression = new OrderByExpression(expression, false, true); 140int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); 141return new OrderedResultIterator(scanner,
[jira] [Commented] (PHOENIX-3451) Incorrect determination of preservation of order for an aggregate query leads to incorrect query results
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676576#comment-15676576 ] chenglei commented on PHOENIX-3451: --- [~jamestaylor], I filed a JIRA PHOENIX-3491 to remove the following code you mentioned, and uploaded my patch, please help me review,thank you. {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} > Incorrect determination of preservation of order for an aggregate query leads > to incorrect query results > > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Fix For: 4.9.0, 4.8.2 > > Attachments: PHOENIX-3451_v1.patch, PHOENIX-3451_v2.patch > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. > Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > CONTAINER_ID CHAR(15) NOT NULL, > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, > ENTITY_ID DESC); > UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); > UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); > UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); > UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); > UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); > UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); > EXPLAIN > SELECT DISTINCT entity_id, score > FROM test.test > WHERE organization_id = 'org2' > AND container_id IN ( 'container1','container2','container3' ) > ORDER BY score DESC > LIMIT 2 > OUTPUT > entityId51.2 > entityId31.4 > The expected out out would be > entityId81.45 > entityId31.4 > You will get the expected output if you remove the secondary index from the > table or remove distinct from the query. > As described in PHOENIX-3452 if you run the query without the LIMIT the > ordering is not correct. However, the 2first results in that ordering is > still not the onces returned by the limit clause, which makes me think there > are multiple issues here and why I filed both separately. The rows being > returned are the ones assigned to container1. It looks like Phoenix is first > getting the rows from the first container and when it finds that to be enough > it stops the scan. What it should be doing is getting 2 results for each > container and then merge then and then limit again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
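The last paragraph of the issue description above spells out the behaviour the reporter expects: fetch the top rows for each container, merge them, and only then apply the limit. Below is a rough standalone sketch of that merge-then-limit idea (plain Java with a hypothetical Row type, not Phoenix's actual iterators):
{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MergeThenLimitSketch {

    // Hypothetical row: (entityId, score), standing in for one row returned by a container scan.
    static class Row {
        final String entityId;
        final double score;
        Row(String entityId, double score) { this.entityId = entityId; this.score = score; }
    }

    // Merge the per-container candidates first, sort by score DESC, and only then apply the limit,
    // instead of stopping the scan once a single container has produced 'limit' rows.
    static List<Row> topN(List<List<Row>> perContainerTopRows, int limit) {
        List<Row> merged = new ArrayList<>();
        for (List<Row> containerRows : perContainerTopRows) {
            merged.addAll(containerRows);
        }
        merged.sort(Comparator.comparingDouble((Row r) -> r.score).reversed());
        return merged.subList(0, Math.min(limit, merged.size()));
    }

    public static void main(String[] args) {
        // Sample data from the issue (org2), two candidates per container.
        List<List<Row>> perContainer = Arrays.asList(
                Arrays.asList(new Row("entityId3", 1.4), new Row("entityId5", 1.2)),    // container1
                Arrays.asList(new Row("entityId4", 1.3), new Row("entityId6", 1.1)),    // container2
                Arrays.asList(new Row("entityId8", 1.45), new Row("entityId7", 1.35))); // container3
        for (Row r : topN(perContainer, 2)) {
            System.out.println(r.entityId + " " + r.score); // entityId8 1.45, then entityId3 1.4
        }
    }
}
{code}
Run against the sample data from the issue, this returns entityId8 (1.45) and entityId3 (1.4), the rows the reporter expected.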
[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Attachment: (was: PHOENIX-3491_v1.patch) > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3491_v1.patch > > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Attachment: PHOENIX-3491_v1.patch > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3491_v1.patch > > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Attachment: PHOENIX-3491_v1.patch > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > Attachments: PHOENIX-3491_v1.patch > > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508 ] chenglei edited comment on PHOENIX-3491 at 11/18/16 11:24 AM: -- The OrderBy could not be compiled out because the following code in OrderPreservingTracker : {code:borderStyle=solid} 123/* 124 * When a GROUP BY is not order preserving, we cannot do a reverse 125 * scan to eliminate the ORDER BY since our server-side scan is not 126 * ordered in that case. 127 */ 128if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { 129isOrderPreserving = false; 130isReverse = false; 131return; 132} {code} In fact,if GroupBy is not orderPreserving, AggregatePlan would sort the aggregated Keys after geting results from RegionServer at the client side, but AggregatePlan always uses ASC order to sort the aggregated Keys, just as the following code,so if we just remove above code, the query result will be error when the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY : {code:borderStyle=solid} 137 public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { 138Expression expression = RowKeyExpression.INSTANCE; 139OrderByExpression orderByExpression = new OrderByExpression(expression, false, true); 140int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); 141return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold); 142 } {code} Therefore,we can modify the AggregatePlan to make it can sort the aggregated Keys ASC or DESC: {code:borderStyle=solid} public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { Expression expression = RowKeyExpression.INSTANCE; boolean isNullsLast=false; boolean isAscending=true; if(this.orderBy==OrderBy.REV_ROW_KEY_ORDER_BY) { isNullsLast=true; //which is needed for the whole rowKey. isAscending=false; } OrderByExpression orderByExpression = new OrderByExpression(expression, isNullsLast, isAscending); int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold); } {code} was (Author: comnetwork): The OrderBy could not be compiled out because the following code in OrderPreservingTracker : {code:borderStyle=solid} /* * When a GROUP BY is not order preserving, we cannot do a reverse * scan to eliminate the ORDER BY since our server-side scan is not * ordered in that case. 
         */
        if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
            isOrderPreserving = false;
            isReverse = false;
            return;
        }
{code}
In fact, if the GroupBy is not orderPreserving, AggregatePlan sorts the aggregated keys on the client side after getting the results from the RegionServers, but it always sorts them in ASC order, as the following code shows. So if we simply remove the code above, the query result will be wrong when the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY:
{code:borderStyle=solid}
    public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException {
        Expression expression = RowKeyExpression.INSTANCE;
        OrderByExpression orderByExpression = new OrderByExpression(expression, false, true);
        int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
        return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold);
    }
{code}
Therefore, we can modify AggregatePlan so that it can sort the aggregated keys either ASC or DESC:
{code:borderStyle=solid}
    public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan)
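To make concrete why the direction of that client-side sort matters once the scan is reverse, here is a small standalone sketch. It is illustrative only, not Phoenix code: the two integer chunks stand in for the aggregated keys coming back from the two regions of the split-on-4 reproduction discussed in the PHOENIX-3451 comments further down, and the merge loop mimics a descending merge such as the one MergeSortRowKeyResultIterator performs for a reverse order by.
{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ReverseMergeSketch {
    // Descending k-way merge: repeatedly pick the largest head element among the chunks.
    static List<Integer> mergeDescending(List<List<Integer>> chunks) {
        List<Integer> out = new ArrayList<>();
        int[] pos = new int[chunks.size()];
        while (true) {
            int best = -1;
            for (int i = 0; i < chunks.size(); i++) {
                if (pos[i] < chunks.get(i).size()
                        && (best < 0 || chunks.get(i).get(pos[i]) > chunks.get(best).get(pos[best]))) {
                    best = i;
                }
            }
            if (best < 0) return out;
            out.add(chunks.get(best).get(pos[best]++));
        }
    }

    public static void main(String[] args) {
        // Aggregated keys produced by the two regions (table split on 4).
        List<Integer> region1 = List.of(1, 2, 3);
        List<Integer> region2 = List.of(4, 5, 6);

        // Each chunk sorted ASC on the client (current behaviour), then merged descending:
        // the merge drains region2 first and prints [4, 5, 6, 1, 2, 3] -- the wrong order.
        System.out.println(mergeDescending(List.of(region1, region2)));

        // Each chunk sorted DESC on the client (proposed behaviour), then merged descending:
        // prints [6, 5, 4, 3, 2, 1] -- the expected ORDER BY ... DESC result.
        List<Integer> r1Desc = region1.stream().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        List<Integer> r2Desc = region2.stream().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        System.out.println(mergeDescending(List.of(r1Desc, r2Desc)));
    }
}
{code}
The point of the sketch is only that a descending merge yields a globally descending result when, and only when, each per-region chunk is itself sorted descending, which is what the AggregatePlan change above arranges.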
[jira] [Commented] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508 ] chenglei commented on PHOENIX-3491: --- The OrderBy could not be compiled out because the following code in OrderPreservingTracker : {code:borderStyle=solid} /* * When a GROUP BY is not order preserving, we cannot do a reverse * scan to eliminate the ORDER BY since our server-side scan is not * ordered in that case. */ if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { isOrderPreserving = false; isReverse = false; return; } {code} In fact,if GroupBy is not orderPreserving, AggregatePlan would sort the aggregated Keys after geting results from RegionServer at the client side, but AggregatePlan always uses ASC order to sort the aggregated Keys, just as the following code,so if we just remove above code, the query result will be error when the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY : {code:borderStyle=solid} 137 public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { 138Expression expression = RowKeyExpression.INSTANCE; 139OrderByExpression orderByExpression = new OrderByExpression(expression, false, true); 140int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); 141return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold); 142 } {code} Therefore,we can modify the AggregatePlan to make it can sort the aggregated Keys ASC or DESC: {code:borderStyle=solid} public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws SQLException { Expression expression = RowKeyExpression.INSTANCE; boolean isNullsLast=false; boolean isAscending=true; if(this.orderBy==OrderBy.REV_ROW_KEY_ORDER_BY) { isNullsLast=true; //which is needed for the whole rowKey. 
isAscending=false; } OrderByExpression orderByExpression = new OrderByExpression(expression, isNullsLast, isAscending); int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES); return new OrderedResultIterator(scanner, Collections.singletonList(orderByExpression), threshold); } {code} > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID >
[jira] [Assigned] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei reassigned PHOENIX-3491: - Assignee: chenglei > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei >Assignee: chenglei > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Description: for the following table: {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )); {code} If we execute explain on the following sql: {code:borderStyle=solid} SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} the result is : {code:borderStyle=solid} --+ | PLAN | +--+ | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST| | SERVER FILTER BY FIRST KEY ONLY | | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | | CLIENT MERGE SORT| | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | +--+ {code} from the above explain result, we can see that the ORDER BY ORGANIZATION_ID DESC, SCORE DESC is not compiled out,but obviously it should be compiled out as OrderBy.REV_ROW_KEY_ORDER_BY. was: for the following table: {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )); {code} If we execute explain on the following sql: {code:borderStyle=solid} SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} the result is : {code:borderStyle=solid} --+ | PLAN | +--+ | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST| | SERVER FILTER BY FIRST KEY ONLY | | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | | CLIENT MERGE SORT| | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | +--+ {code} from the above explain result, we can see that the ORDER BY ORGANIZATION_ID DESC, SCORE DESC is not compiled out,but obviously it should be compiled out as OrderBy.REV_ROW_KEY_ORDER_BY. > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | >
[jira] [Commented] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672486#comment-15672486 ] chenglei commented on PHOENIX-3491: --- Just from PHOENIX-3451,but is irrelevant to PHOENIX-3451,so open a new JIRA > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672486#comment-15672486 ] chenglei edited comment on PHOENIX-3491 at 11/17/16 2:51 AM: - Just from PHOENIX-3451,but is irrelevant to PHOENIX-3451,so open a new JIRA. was (Author: comnetwork): Just from PHOENIX-3451,but is irrelevant to PHOENIX-3451,so open a new JIRA > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obviously it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Summary: OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse (was: OrderBy should be compiled out when GroupBy is not OrderPreserving but OrderBy is reverse) > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | > +--+ > | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST| > | SERVER FILTER BY FIRST KEY ONLY | > | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | > | CLIENT MERGE SORT| > | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | > +--+ > {code} > from the above explain result, we can see that the ORDER BY ORGANIZATION_ID > DESC, SCORE DESC is not compiled out,but obvious it should be compiled out > as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse
[ https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3491: -- Description: for the following table: {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )); {code} If we execute explain on the following sql: {code:borderStyle=solid} SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} the result is : {code:borderStyle=solid} --+ | PLAN | +--+ | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST| | SERVER FILTER BY FIRST KEY ONLY | | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | | CLIENT MERGE SORT| | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | +--+ {code} from the above explain result, we can see that the ORDER BY ORGANIZATION_ID DESC, SCORE DESC is not compiled out,but obviously it should be compiled out as OrderBy.REV_ROW_KEY_ORDER_BY. was: for the following table: {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )); {code} If we execute explain on the following sql: {code:borderStyle=solid} SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} the result is : {code:borderStyle=solid} --+ | PLAN | +--+ | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST| | SERVER FILTER BY FIRST KEY ONLY | | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | | CLIENT MERGE SORT| | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | +--+ {code} from the above explain result, we can see that the ORDER BY ORGANIZATION_ID DESC, SCORE DESC is not compiled out,but obvious it should be compiled out as OrderBy.REV_ROW_KEY_ORDER_BY. > OrderBy should be compiled out when GroupBy is not orderPreserving but > OrderBy is reverse > - > > Key: PHOENIX-3491 > URL: https://issues.apache.org/jira/browse/PHOENIX-3491 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: chenglei > > for the following table: > {code:borderStyle=solid} > CREATE TABLE ORDERBY_TEST ( > ORGANIZATION_ID INTEGER NOT NULL, > CONTAINER_ID INTEGER NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID INTEGER NOT NULL, >CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > SCORE, > ENTITY_ID > )); > {code} > > If we execute explain on the following sql: > {code:borderStyle=solid} > SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, > SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC > {code} > the result is : > {code:borderStyle=solid} > --+ > | PLAN | >
[jira] [Created] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not OrderPreserving but OrderBy is reverse
chenglei created PHOENIX-3491: - Summary: OrderBy should be compiled out when GroupBy is not OrderPreserving but OrderBy is reverse Key: PHOENIX-3491 URL: https://issues.apache.org/jira/browse/PHOENIX-3491 Project: Phoenix Issue Type: Improvement Affects Versions: 4.8.0 Reporter: chenglei for the following table: {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )); {code} If we execute explain on the following sql: {code:borderStyle=solid} SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} the result is : {code:borderStyle=solid} --+ | PLAN | +--+ | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST| | SERVER FILTER BY FIRST KEY ONLY | | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE] | | CLIENT MERGE SORT| | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC] | +--+ {code} from the above explain result, we can see that the ORDER BY ORGANIZATION_ID DESC, SCORE DESC is not compiled out,but obvious it should be compiled out as OrderBy.REV_ROW_KEY_ORDER_BY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
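A rough way to check whether the ORDER BY in the description above actually gets compiled out is to run EXPLAIN over JDBC and look for the CLIENT SORTED BY step shown in the plan. Below is a minimal sketch, assuming a local connection URL (jdbc:phoenix:localhost) and the ORDERBY_TEST table from the description; the exact plan wording can vary between versions.
{code:borderStyle=solid}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ExplainOrderByCheck {
    public static void main(String[] args) throws SQLException {
        // Assumed connection URL; adjust to the actual cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "EXPLAIN SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST "
                 + "GROUP BY ORGANIZATION_ID, SCORE "
                 + "ORDER BY ORGANIZATION_ID DESC, SCORE DESC")) {
            StringBuilder plan = new StringBuilder();
            while (rs.next()) {
                plan.append(rs.getString(1)).append('\n');
            }
            System.out.println(plan);
            // Before the improvement the plan contains a client re-sort step
            // (CLIENT SORTED BY ...); once the ORDER BY is compiled out as
            // OrderBy.REV_ROW_KEY_ORDER_BY, that step should no longer appear.
            boolean clientSort = plan.toString().contains("CLIENT SORTED BY");
            System.out.println("ORDER BY compiled out: " + !clientSort);
        }
    }
}
{code}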
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670770#comment-15670770 ] chenglei commented on PHOENIX-3451: --- Thank you for feedback ,[~jamestaylor], yes, if those lines are added back, it seems Ok, I will test a bit more, and I will open a new JIRA for the optimization problem. > Secondary index and query using distinct: LIMIT doesn't return the first rows > - > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Attachments: PHOENIX-3451_v1.patch > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. > Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > CONTAINER_ID CHAR(15) NOT NULL, > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, > ENTITY_ID DESC); > UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); > UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); > UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); > UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); > UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); > UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); > EXPLAIN > SELECT DISTINCT entity_id, score > FROM test.test > WHERE organization_id = 'org2' > AND container_id IN ( 'container1','container2','container3' ) > ORDER BY score DESC > LIMIT 2 > OUTPUT > entityId51.2 > entityId31.4 > The expected out out would be > entityId81.45 > entityId31.4 > You will get the expected output if you remove the secondary index from the > table or remove distinct from the query. > As described in PHOENIX-3452 if you run the query without the LIMIT the > ordering is not correct. However, the 2first results in that ordering is > still not the onces returned by the limit clause, which makes me think there > are multiple issues here and why I filed both separately. The rows being > returned are the ones assigned to container1. It looks like Phoenix is first > getting the rows from the first container and when it finds that to be enough > it stops the scan. What it should be doing is getting 2 results for each > container and then merge then and then limit again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
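For completeness, the reproduction in the PHOENIX-3451 description above can be driven from plain JDBC. A minimal sketch follows, assuming a local connection URL (jdbc:phoenix:localhost) and that the TEST.TEST table, the TEST_SCORE index, and the sample rows have already been created as shown.
{code:borderStyle=solid}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DistinctLimitRepro {
    public static void main(String[] args) throws SQLException {
        // Assumed connection URL; adjust to the actual cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT DISTINCT entity_id, score FROM TEST.TEST "
                 + "WHERE organization_id = 'org2' "
                 + "AND container_id IN ('container1','container2','container3') "
                 + "ORDER BY score DESC LIMIT 2")) {
            // Expected per the description: entityId8 1.45, then entityId3 1.4.
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
{code}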
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 2:49 PM: - [~jamestaylor], I have a problem when I testing your patch: Why did you remove out the following lines,or may be because your another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sort the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} Indeed the above code which you removed in your patch should be removed,but if the AggregatePlan is not modified, just remove out the above code may cause problem. It is actually a optimization and is irrelevant to PHOENIX-3451,maybe I can open a new JIRA if the JIRA does not exist. was (Author: comnetwork): [~jamestaylor], I have a problem when I tested your patch: Why did you remove out the following lines,or may be because your another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. 
- */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 2:49 PM: - [~jamestaylor], I have a problem when I testing your patch: Why did you remove out the following lines,or maybe because your another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sort the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} Indeed the above code which you removed in your patch should be removed,but if the AggregatePlan is not modified, just remove out the above code may cause problem. It is actually a optimization and is irrelevant to PHOENIX-3451,maybe I can open a new JIRA if the JIRA does not exist. was (Author: comnetwork): [~jamestaylor], I have a problem when I testing your patch: Why did you remove out the following lines,or may be because your another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. 
- */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 2:48 PM: - [~jamestaylor], I have a problem when I tested your patch: Why did you remove out the following lines,or may be because your another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sort the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} Indeed the above code which you removed in your patch should be removed,but if the AggregatePlan is not modified, just remove out the above code may cause problem. It is actually a optimization and is irrelevant to PHOENIX-3451,maybe I can open a new JIRA if the JIRA does not exist. was (Author: comnetwork): [~jamestaylor], I have a problem when I tested your patch: Why did you remove out the following lines,or may be because your another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. 
- */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 2:46 PM: - [~jamestaylor], I have a problem when I tested your patch: Why did you remove out the following lines,or may be because your another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST GROUP BY ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sort the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} Indeed the above code which you removed in your patch should be removed,but if the AggregatePlan is not modified, just remove out the above code may cause problem. It is actually a optimization and is irrelevant to PHOENIX-3451,maybe I can open a new JIRA if the JIRA does not exist. was (Author: comnetwork): [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. 
- */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 2:38 PM: - [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} So indeed the above code should be removed,but if the AggregatePlan is not modified, just remove out the above code may cause problem, this actually is a optimization and is irrelevant to PHOENIX-3451,maybe I can open a new JIRA if the JIRA does not exist. was (Author: comnetwork): [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. 
- */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 11:44 AM: -- [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} So if the AggregatePlan is not modified, just remove out the above code may cause problem. Maybe I can open a new JIRA to fix this problem if the JIRA does not exist,because it is irrelevant to PHOENIX-3451 was (Author: comnetwork): [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. 
- */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 11:41 AM: -- [~jamestaylor], I have a problem with your patch: Why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} So if the AggregatePlan is not modified, just remove out the above code may cause problem. Maybe I can open a new JIRA to fix this problem if the JIRA does not exist. was (Author: comnetwork): [~jamestaylor],I have a problem with your patch: why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. 
- */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234 ] chenglei commented on PHOENIX-3451: --- [~jamestaylor],I have a problem with your patch: why did you remove out the following lines,or may be you want to fix another almost ready JIRA? {code:borderStyle=solid} -/* - * When a GROUP BY is not order preserving, we cannot do a reverse - * scan to eliminate the ORDER BY since our server-side scan is not - * ordered in that case. - */ -if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) { -isOrderPreserving = false; -isReverse = false; -return; -} {code} It seems for current master branch, removing theses lines may cause some problem , which can be reproduced as follows : {code:borderStyle=solid} CREATE TABLE ORDERBY_TEST ( ORGANIZATION_ID INTEGER NOT NULL, CONTAINER_ID INTEGER NOT NULL, SCORE INTEGER NOT NULL, ENTITY_ID INTEGER NOT NULL, CONSTRAINT TEST_PK PRIMARY KEY ( ORGANIZATION_ID, CONTAINER_ID, SCORE, ENTITY_ID )) split on(4); UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1); UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2); UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3); UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4); UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5); UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6); SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC {code} expecting results are: {code:borderStyle=solid} 6,6 5,5 4,4 3,3 2,2 1,1 {code} but the actual results are: {code:borderStyle=solid} 4,4 5,5 6,6 1,1 2,2 3,3 {code} The problem is caused by the AggregatePlan, when the above code was removed, the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after geting results from RegionServer at the client side, which is a ASC order, the sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final result is incorrect. {code:borderStyle=solid} 232aggResultIterator = new GroupedAggregatingResultIterator( 233new MergeSortRowKeyResultIterator(iterators, 0, this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators); {code} So if the AggregatePlan is not modified, just remove out the above code may cause problem. Maybe I can open a new JIRA to fix this problem if the JIRA does not exist. > Secondary index and query using distinct: LIMIT doesn't return the first rows > - > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Attachments: PHOENIX-3451_v1.patch > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. 
> Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > CONTAINER_ID CHAR(15) NOT NULL, > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, > ENTITY_ID DESC); > UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); > UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); > UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); > UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); > UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); > UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); > EXPLAIN > SELECT DISTINCT entity_id, score > FROM test.test > WHERE organization_id = 'org2' > AND container_id IN ( 'container1','container2','container3' ) > ORDER BY
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669252#comment-15669252 ] chenglei edited comment on PHOENIX-3451 at 11/16/16 3:39 AM: - [~jamestaylor], thank your explanation for the patch, I had just make a new patch just like you, but your patch is more comprehensive than me, I think your patch is good, and I will do more tests for some details of your patch. was (Author: comnetwork): [~jamestaylor], thank your explanation, I had just make a new patch just like you, but your patch is more comprehensive than me, I think your patch is good, and I will do more tests for some details of your patch. > Secondary index and query using distinct: LIMIT doesn't return the first rows > - > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Attachments: PHOENIX-3451_v1.patch > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. > Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > CONTAINER_ID CHAR(15) NOT NULL, > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, > ENTITY_ID DESC); > UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); > UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); > UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); > UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); > UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); > UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); > EXPLAIN > SELECT DISTINCT entity_id, score > FROM test.test > WHERE organization_id = 'org2' > AND container_id IN ( 'container1','container2','container3' ) > ORDER BY score DESC > LIMIT 2 > OUTPUT > entityId51.2 > entityId31.4 > The expected out out would be > entityId81.45 > entityId31.4 > You will get the expected output if you remove the secondary index from the > table or remove distinct from the query. > As described in PHOENIX-3452 if you run the query without the LIMIT the > ordering is not correct. However, the 2first results in that ordering is > still not the onces returned by the limit clause, which makes me think there > are multiple issues here and why I filed both separately. The rows being > returned are the ones assigned to container1. It looks like Phoenix is first > getting the rows from the first container and when it finds that to be enough > it stops the scan. What it should be doing is getting 2 results for each > container and then merge then and then limit again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669252#comment-15669252 ] chenglei commented on PHOENIX-3451: --- [~jamestaylor], thank your explanation, I had just make a new patch just like you, but your patch is more comprehensive than me, I think your patch is good, and I will do more tests for some details of your patch. > Secondary index and query using distinct: LIMIT doesn't return the first rows > - > > Key: PHOENIX-3451 > URL: https://issues.apache.org/jira/browse/PHOENIX-3451 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Joel Palmert >Assignee: chenglei > Attachments: PHOENIX-3451_v1.patch > > > This may be related to PHOENIX-3452 but the behavior is different so filing > it separately. > Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > CONTAINER_ID CHAR(15) NOT NULL, > ENTITY_ID CHAR(15) NOT NULL, > SCORE DOUBLE, > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > CONTAINER_ID, > ENTITY_ID > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000; > CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, > ENTITY_ID DESC); > UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1); > UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2); > UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3); > UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4); > UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35); > UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45); > EXPLAIN > SELECT DISTINCT entity_id, score > FROM test.test > WHERE organization_id = 'org2' > AND container_id IN ( 'container1','container2','container3' ) > ORDER BY score DESC > LIMIT 2 > OUTPUT > entityId51.2 > entityId31.4 > The expected out out would be > entityId81.45 > entityId31.4 > You will get the expected output if you remove the secondary index from the > table or remove distinct from the query. > As described in PHOENIX-3452 if you run the query without the LIMIT the > ordering is not correct. However, the 2first results in that ordering is > still not the onces returned by the limit clause, which makes me think there > are multiple issues here and why I filed both separately. The rows being > returned are the ones assigned to container1. It looks like Phoenix is first > getting the rows from the first container and when it finds that to be enough > it stops the scan. What it should be doing is getting 2 results for each > container and then merge then and then limit again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667149#comment-15667149 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 4:16 PM:
[~jamestaylor], thank you for your suggestion. I am sorry, my patch is indeed not good; I will modify it according to your suggestion.

was (Author: comnetwork): @James Taylor, thank you for your suggestion, I am sorry my patch indeed is not good,I will modify according your suggestion.
[jira] [Updated] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3451:
Attachment: (was: PHOENIX-3451.diff)
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667149#comment-15667149 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 1:37 PM:
@James Taylor, thank you for your suggestion. I am sorry, my patch is indeed not good; I will modify it according to your suggestion.

was (Author: comnetwork): @James Taylor, thank you for your suggestion, my patch indeed is not good,I will modify according your suggestion.
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667149#comment-15667149 ] chenglei commented on PHOENIX-3451:
@James Taylor, thank you for your suggestion. My patch is indeed not good; I will modify it according to your suggestion.
[jira] [Issue Comment Deleted] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenglei updated PHOENIX-3451:
Comment: was deleted
(was: [~jamestaylor].thank your for your suggestion, my considerations are as follows:
1. If GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)", the OrderBy columns must match the GroupBy columns.
2. Only when all GROUP BY/Order BY expressions are simple RowKey Columns (i.e. GROUP BY pkCol1, pkCol2 or OrderBy BY pkCol1, pkCol2) , we have necessary to go further to check if the GROUP BY/Order BY is "isOrderPreserving". If GROUP BY/Order BY expressions are not simple RowKey Columns(i.e.GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), surely the GROUP BY/Order BY should not be "isOrderPreserving". Take the following SQL as a example,the GROUP BY and ORDER BY are certainly not "isOrderPreserving" :
select pkCol1 + 1,TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2) order by pkCol1 + 1, TRUNC(pkCol2)
So I think my patch is Ok, just as the following code explained, it just needs to only conside the RowKeyColumnExpression. RowKeyColumnExpression is enough for checking if the Order BY is "isOrderPreserving",for other type of Expression, the following visit method return null, and the OrderPreservingTracker.isOrderPreserving method will return false,which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered the modification as same as your suggestion when I made my patch, finally I select current patch because it is more simpler ,and the modification is just restricted in the single OrderPreservingTracker class,FYI.)
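The deleted comment above argues that only plain row-key column references need to be tracked: any computed expression such as pkCol1 + 1 or TRUNC(pkCol2) already rules out an order-preserving ORDER BY. A minimal standalone sketch of that idea is shown below; it is not the real OrderPreservingTracker, and the method name and inputs are invented for illustration.

{code:borderStyle=solid}
import java.util.Arrays;
import java.util.List;

// Sketch: an ORDER BY can only be compiled away when every ORDER BY item is a plain
// row-key column and those columns form, in order, a prefix of the primary key.
// Anything else (an expression, a gap, or an out-of-order column) is not order-preserving,
// mirroring the visit() method above returning null for non-RowKeyColumnExpression nodes.
public class OrderPreservingSketch {

    /**
     * @param orderByPkPositions for each ORDER BY item, the position of the referenced
     *        row-key column, or null when the item is not a plain row-key column reference.
     */
    static boolean isOrderPreserving(List<Integer> orderByPkPositions) {
        int expected = 0;
        for (Integer pkPosition : orderByPkPositions) {
            if (pkPosition == null || pkPosition != expected) {
                return false; // expression, gap, or out-of-order column
            }
            expected++;
        }
        return true;
    }

    public static void main(String[] args) {
        // ORDER BY pkCol1, pkCol2 -> positions 0,1 -> order preserved
        System.out.println(isOrderPreserving(List.of(0, 1)));                        // true
        // ORDER BY pkCol2 -> position 1 without 0 -> not preserved
        System.out.println(isOrderPreserving(List.of(1)));                           // false
        // ORDER BY pkCol1 + 1 -> not a plain row-key column -> not preserved
        System.out.println(isOrderPreserving(Arrays.asList((Integer) null)));        // false
    }
}
{code}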
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 8:02 AM:
[~jamestaylor], thank you for your suggestion; my considerations are as follows:
1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (i.e. GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) do we need to go further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the GROUP BY/ORDER BY expressions are not simple RowKey columns (i.e. GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), then the GROUP BY/ORDER BY certainly should not be "isOrderPreserving". Take the following SQL as an example; its GROUP BY and ORDER BY are certainly not "isOrderPreserving":
select pkCol1 + 1, TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2) order by pkCol1 + 1, TRUNC(pkCol2)
So I think my patch is OK; as the following code explains, it only needs to consider RowKeyColumnExpression. RowKeyColumnExpression is enough for checking whether the ORDER BY is "isOrderPreserving"; for any other type of Expression, the following visit method returns null, and the OrderPreservingTracker.isOrderPreserving method will return false, which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered a modification along the lines of your suggestion when I made my patch; in the end I chose the current patch because it is simpler, and the change is restricted to the single OrderPreservingTracker class, FYI.

was (Author: comnetwork): [~jamestaylor].thank your for your suggestion, my considerations are as follows: 1. If GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)", the OrderBy columns must match the GroupBy columns. 2. Only when all GROUP BY/Order BY expressions are simple RowKey Columns (i.e. GROUP BY pkCol1, pkCol2 or OrderBy BY pkCol1, pkCol2) , we have necessary to go further to check if the GROUP BY/Order BY is "isOrderPreserving". If GROUP BY/Order BY expressions are not simple RowKey Columns(i.e.GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), surely the GROUP BY/Order BY should not be "isOrderPreserving", take the following SQL as a example,the GROUP BY and ORDER BY are certainly not "isOrderPreserving" : select pkCol1 + 1,TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2) order by pkCol1 + 1, TRUNC(pkCol2) So I think my patch is Ok, just as the following code explained, it just needs to only conside the RowKeyColumnExpression. RowKeyColumnExpression is enough for checking if the Order BY is "isOrderPreserving",for other type of Expression, the following visit method return null, and the OrderPreservingTracker.isOrderPreserving method will return false,which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered the modification as same as your suggestion when I made my patch, finally I select current patch because it is more simpler ,and the modification is just restricted in the single OrderPreservingTracker class,FYI.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 8:01 AM:
[~jamestaylor], thank you for your suggestion; my considerations are as follows:
1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (i.e. GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) do we need to go further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the GROUP BY/ORDER BY expressions are not simple RowKey columns (i.e. GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), then the GROUP BY/ORDER BY certainly should not be "isOrderPreserving". Take the following SQL as an example; its GROUP BY and ORDER BY are certainly not "isOrderPreserving":
select pkCol1 + 1, TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2) order by pkCol1 + 1, TRUNC(pkCol2)
So I think my patch is OK; as the following code explains, it only needs to consider RowKeyColumnExpression. RowKeyColumnExpression is enough for checking whether the ORDER BY is "isOrderPreserving"; for any other type of Expression, the following visit method returns null, and the OrderPreservingTracker.isOrderPreserving method will return false, which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered a modification along the lines of your suggestion when I made my patch; in the end I chose the current patch because it is simpler, and the change is restricted to the single OrderPreservingTracker class, FYI.

was (Author: comnetwork): [~jamestaylor].thank your for your suggestion, my considerations as follows: 1. If GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)", the OrderBy columns must match the GroupBy columns. 2. Only when all GROUP BY/Order BY expressions are simple RowKey Columns (i.e. GROUP BY pkCol1, pkCol2 or OrderBy BY pkCol1, pkCol2) , we have necessary to go further to check if the GROUP BY/Order BY is "isOrderPreserving". If GROUP BY/Order BY expressions are not simple RowKey Columns(i.e.GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), surely the GROUP BY/Order BY should not be "isOrderPreserving". So I think my patch is Ok, just as the following code explained, it just needs to only conside the RowKeyColumnExpression. RowKeyColumnExpression is enough for checking if the Order BY is "isOrderPreserving",for other type of Expression, the following visit method return null, and the OrderPreservingTracker.isOrderPreserving method will return false,which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered the modification as same as your suggestion when I made my patch, finally I select current patch because it is more simpler ,and the modification is just restricted in the single OrderPreservingTracker class,FYI.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 7:42 AM:
[~jamestaylor], thank you for your suggestion; my considerations are as follows:
1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (i.e. GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) do we need to go further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the GROUP BY/ORDER BY expressions are not simple RowKey columns (i.e. GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), then the GROUP BY/ORDER BY certainly should not be "isOrderPreserving".
So I think my patch is OK; as the following code explains, it only needs to consider RowKeyColumnExpression. RowKeyColumnExpression is enough for checking whether the ORDER BY is "isOrderPreserving"; for any other type of Expression, the following visit method returns null, and the OrderPreservingTracker.isOrderPreserving method will return false, which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered a modification along the lines of your suggestion when I made my patch; in the end I chose the current patch because it is simpler, and the change is restricted to the single OrderPreservingTracker class, FYI.

was (Author: comnetwork): [~jamestaylor].thank your for your suggestion, my considerations as follows: 1. If GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)", the OrderBy columns must match the GroupBy columns. 2. Only when all GROUP BY/Order BY expressions are simple RowKey Columns (i.e. GROUP BY pkCol1, pkCol2 or OrderBy BY pkCol1, pkCol2) , we have necessary to go further to check if the GROUP BY/Order BY is "isOrderPreserving". If GROUP BY/Order BY expressions are not simple RowKey Columns(i.e.GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), surely the GROUP BY/Order BY should not be "isOrderPreserving". So I think my patch is Ok, just as the following code explained, it just need to only conside the RowKeyColumnExpression. RowKeyColumnExpression is enough for checking if the Order BY is "isOrderPreserving",for other type of Expression, the following visit method return null, and the OrderPreservingTracker.isOrderPreserving method will return false,which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered the modification as same as your suggestion when I made my patch, finally I select current patch because it is more simpler ,and the modification is just restricted in the single OrderPreservingTracker class,FYI.
[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402 ] chenglei commented on PHOENIX-3451:
[~jamestaylor], thank you for your suggestion; my considerations are as follows:
1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be "ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (i.e. GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) do we need to go further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the GROUP BY/ORDER BY expressions are not simple RowKey columns (i.e. GROUP BY pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), then the GROUP BY/ORDER BY certainly should not be "isOrderPreserving".
So I think my patch is OK; as the following code explains, it only needs to consider RowKeyColumnExpression. RowKeyColumnExpression is enough for checking whether the ORDER BY is "isOrderPreserving"; for any other type of Expression, the following visit method returns null, and the OrderPreservingTracker.isOrderPreserving method will return false, which is as expected.
{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int orginalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(orginalPkPosition);
}
{code}
By the way, I had already considered a modification along the lines of your suggestion when I made my patch; in the end I chose the current patch because it is simpler, and the change is restricted to the single OrderPreservingTracker class, FYI.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:30 AM:
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; the patch actually does what you said. My patch does not change the final OrderBy's orderByExpressions; each orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in the OrderPreservingTracker.isOrderPreserving method; the final OrderBy's orderByExpressions are not created from the Info's pkPosition, and the Info's pkPosition only affects the OrderPreservingTracker.isOrderPreserving method. In OrderPreservingTracker.isOrderPreserving, the pkPosition must be the position among the original RowKey columns if the SQL has a GroupBy.

was (Author: comnetwork): [~jamestaylor], It seems you did not look at my uploaded PHOENIX-3451.diff, the patch actually did like you said. My patch do not change the final Orderby's orderByExpressions, orderByExpression's position is still the position in GroupBy.My patch only changes the Info's pkPosition used in OrderPreservingTracker.isOrderPreserving method, In OrderPreservingTracker. isOrderPreserving method,the pkPosition must be the position in original RowKey columns if the SQL has GroupBy.
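To illustrate the pkPosition point discussed in this thread (again only a sketch under assumed inputs, not Phoenix internals): with a GROUP BY, an ORDER BY expression's position refers to a GROUP BY output column, so it has to be translated back to the underlying row-key position before the order-preserving check can compare it against the primary key order.

{code:borderStyle=solid}
import java.util.Arrays;
import java.util.List;

// Sketch: translate an ORDER BY position, expressed against the GROUP BY output,
// back to the original row-key position of the underlying table.
public class GroupByPositionRemapSketch {

    /**
     * @param groupByToPkPosition for each GROUP BY output column, the original row-key
     *        position it selects, or -1 when it is a computed expression (hypothetical encoding).
     * @param orderByOutputPosition position of the ORDER BY item within the GROUP BY output.
     * @return the original row-key position, or -1 when order cannot be preserved.
     */
    static int remapToRowKeyPosition(List<Integer> groupByToPkPosition, int orderByOutputPosition) {
        if (orderByOutputPosition < 0 || orderByOutputPosition >= groupByToPkPosition.size()) {
            return -1;
        }
        return groupByToPkPosition.get(orderByOutputPosition);
    }

    public static void main(String[] args) {
        // GROUP BY pkCol2, pkCol3: output column 0 -> pk position 1, output column 1 -> pk position 2.
        List<Integer> groupByOnPkCols = Arrays.asList(1, 2);
        System.out.println(remapToRowKeyPosition(groupByOnPkCols, 0)); // 1 (pkCol2)
        System.out.println(remapToRowKeyPosition(groupByOnPkCols, 1)); // 2 (pkCol3)

        // GROUP BY pkCol1 + 1: output column 0 is an expression, not a row-key column.
        List<Integer> groupByOnExpression = Arrays.asList(-1);
        System.out.println(remapToRowKeyPosition(groupByOnExpression, 0)); // -1: not preserved
    }
}
{code}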
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:26 AM:
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; the patch actually does what you said. My patch does not change the final OrderBy's orderByExpressions; each orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in the OrderPreservingTracker.isOrderPreserving method; in that method, the pkPosition must be the position among the original RowKey columns if the SQL has a GroupBy.

was (Author: comnetwork): [~jamestaylor], It seems you did not look at my uploaded PHOENIX-3451.diff, I indeed patch like you said. My patch do not change the final Orderby's orderByExpressions, orderByExpression's position is still the position in GroupBy.My patch only changes the Info's pkPosition used in OrderPreservingTracker.isOrderPreserving method, In OrderPreservingTracker. isOrderPreserving method,the pkPosition must be the position in original RowKey columns if the SQL has GroupBy.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:25 AM:
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did in fact patch it the way you said. My patch does not change the final OrderBy's orderByExpressions; each orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in the OrderPreservingTracker.isOrderPreserving method; in that method, the pkPosition must be the position among the original RowKey columns if the SQL has a GroupBy.

was (Author: comnetwork): [~jamestaylor], It seems you did not look at my uploaded PHOENIX-3451.diff, I indeed patch like you said. My patch do not change the final Orderby's orderByExpressions, orderByExpression's position is still the position in GroupBy.My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving method, In OrderPreservingTracker's isOrderPreserving method,the pkPosition must be the position in original RowKey columns if the sql exists GroupBy.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:25 AM:
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did in fact patch it the way you said. My patch does not change the final OrderBy's orderByExpressions; each orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving method; in that method, the pkPosition must be the position among the original RowKey columns if the SQL has a GroupBy.

was (Author: comnetwork): [~jamestaylor], It seems you did not look at my uploaded PHOENIX-3451.diff, I indeed patch like you said. My patch did not change the final Orderby's orderByExpressions, orderByExpression's position is still the position in GroupBy.My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving method, In OrderPreservingTracker's isOrderPreserving method,the pkPosition must be the position in original RowKey columns if the sql exists GroupBy.
[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows
[ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874 ] chenglei edited comment on PHOENIX-3451 at 11/15/16 3:24 AM:
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I did in fact patch it the way you said. My patch did not change the final OrderBy's orderByExpressions; each orderByExpression's position is still its position in the GroupBy. My patch only changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving method; in that method, the pkPosition must be the position among the original RowKey columns if the SQL has a GroupBy.

was (Author: comnetwork): [~jamestaylor], It seems you did not look at my uploaded PHOENIX-3451.diff, I indeed patch like you said. My patch did not change the final Orderby's orderByExpressions, orderByExpression's position is still the position in GroupBy.My patch is only change the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving