[jira] [Comment Edited] (PHOENIX-3689) Not determinist order by with limit

2017-02-24 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882399#comment-15882399
 ] 

chenglei edited comment on PHOENIX-3689 at 2/24/17 10:27 AM:
-

Thank you for adding the DDL, but it seems that the SQL given earlier does not 
match the DDL. It may be better to give us a complete test case, just like 
PHOENIX-3578 does. Thanks.
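
A complete, self-contained test case might look like the following sketch 
(hedged: the table name, row count, and connection URL are illustrative 
assumptions, not taken from this issue):
{code}
// Hedged sketch of a self-contained repro, in the spirit of the PHOENIX-3578
// report; assumes the Phoenix JDBC driver on the classpath and a quorum at localhost.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OrderByLimitRepro {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS T (K INTEGER PRIMARY KEY, V VARCHAR)");
            for (int i = 1; i <= 1000; i++) {
                stmt.execute("UPSERT INTO T VALUES (" + i + ", 'v" + i + "')");
            }
            conn.commit();
            // Expected: the single largest key (1000), no matter how many
            // regions the table is split across.
            ResultSet rs = stmt.executeQuery("SELECT K FROM T ORDER BY K DESC LIMIT 1");
            rs.next();
            System.out.println("got " + rs.getInt(1) + ", expected 1000");
        }
    }
}
{code}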



> Not determinist order by with limit
> ---
>
> Key: PHOENIX-3689
> URL: https://issues.apache.org/jira/browse/PHOENIX-3689
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
>Reporter: Arthur
>
> The following query does not return the last value of myTable:
> select * from myTable order by myKey desc limit 1;
> Adding a 'group by myKey' clause brings back the correct result.
> I noticed that an order by with 'limit 10' returns a merge of 10 results from 
> each region, and not the 10 results of the whole query.
> So 'order by' is not deterministic. Is it a bug or a feature?
> Here is my DDL:
> CREATE TABLE TT (dt timestamp NOT NULL, message bigint NOT NULL, id 
> varchar(20) NOT NULL, version varchar CONSTRAINT PK PRIMARY KEY (dt, message, 
> id));
> And some data with a dynamic column (I have 2 million similar rows sorted 
> by time):
> UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
> 03:31:00.3730',91,'','POUR','S_052303');
> UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
> 03:31:00.7170',91,'0001','PO','S_052303');
> UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
> 03:31:01.9030',91,'0002','POUR','S_052303');
> UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
> 03:31:02.7330',91,'0003','POUR','S_052303');
> UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
> 03:31:03.5470',91,'0004','POUR','S_052303');
> UPSERT INTO TT (dt, message, id, version, seg varchar) VALUES ('2013-12-03 
> 03:31:04.7330',91,'0005','POUR','S_052305');







[jira] [Commented] (PHOENIX-3689) Not determinist order by with limit

2017-02-23 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881828#comment-15881828
 ] 

chenglei commented on PHOENIX-3689:
---

Could you please provide your table DDL and some sample data, so we can 
reproduce this and check whether it is a bug?






[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877706#comment-15877706
 ] 

chenglei commented on PHOENIX-3578:
---

[~jamestaylor], OK, I will try to do it.

> Incorrect query results when applying inner join and orderby desc
> -
>
> Key: PHOENIX-3578
> URL: https://issues.apache.org/jira/browse/PHOENIX-3578
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
> Environment: hbase-1.1.2
>Reporter: sungmin.cho
> Attachments: PHOENIX-3578_v1.patch
>
>
> Steps to reproduce:
> h4. 1. Create two tables
> {noformat}
> CREATE TABLE IF NOT EXISTS master (
>   id integer not null,
>   col1 varchar,
>   constraint pk_master primary key(id)
> );
> CREATE TABLE IF NOT EXISTS detail (
>   id integer not null,
>   seq integer not null,
>   col2 varchar,
>   constraint pk_master primary key(id, seq)
> );
> {noformat}
> h4. 2. Upsert values
> {noformat}
> upsert into master values(1, 'A1');
> upsert into master values(2, 'A2');
> upsert into master values(3, 'A3');
> upsert into detail values(1, 1, 'B1');
> upsert into detail values(1, 2, 'B2');
> upsert into detail values(2, 1, 'B1');
> upsert into detail values(2, 2, 'B2');
> upsert into detail values(3, 1, 'B1');
> upsert into detail values(3, 2, 'B2');
> upsert into detail values(3, 3, 'B3');
> {noformat}
> h4. 3. Execute query
> {noformat}
> select m.id, m.col1, d.seq, d.col2
> from master m, detail d
> where m.id = d.id
>   and d.id between 1 and 2
> order by m.id desc
> {noformat}
> h4. (/) Expected result
> {noformat}
> +-------+---------+--------+---------+
> | M.ID  | M.COL1  | D.SEQ  | D.COL2  |
> +-------+---------+--------+---------+
> | 2     | A2      | 1      | B1      |
> | 2     | A2      | 2      | B2      |
> | 1     | A1      | 1      | B1      |
> | 1     | A1      | 2      | B2      |
> +-------+---------+--------+---------+
> {noformat}
> h4. (!) Incorrect result
> {noformat}
> +-------+---------+--------+---------+
> | M.ID  | M.COL1  | D.SEQ  | D.COL2  |
> +-------+---------+--------+---------+
> | 1     | A1      | 1      | B1      |
> | 1     | A1      | 2      | B2      |
> +-------+---------+--------+---------+
> {noformat}





[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877651#comment-15877651
 ] 

chenglei commented on PHOENIX-3578:
---

[~maryannxue], thank you very much for the review. Yes, it is true that we 
cannot use a skip scan if the scan is in reverse order, but we can use a range 
scan in that case, so the dynamic join filter should still be considered when 
the scan is in reverse order, because the dynamic join filter may narrow down 
the LHS reverse scan range.
I will modify my patch following your suggestion.
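
For what it's worth, a reverse range scan at the HBase 1.x client level looks 
like the following sketch (row keys are illustrative; the point is only that a 
plain range scan, unlike a skip scan, can run in reverse):
{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ReverseRangeScanSketch {
    // Builds a reverse range scan over ["row-1", "row-9"); no SkipScanFilter.
    // In a reversed Scan, startRow is the high end and stopRow the low end.
    public static Scan reverseRange() {
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("row-9")); // inclusive upper bound
        scan.setStopRow(Bytes.toBytes("row-1"));  // exclusive lower bound
        scan.setReversed(true);                   // rows return in descending key order
        return scan;
    }
}
{code}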






[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876265#comment-15876265
 ] 

chenglei commented on PHOENIX-3578:
---

Actually the tests did run: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/778/console, but the 
failed IT tests seem to have no relation to this issue. It seems that after the 
upgrade to HBase 1.2.4 in PHOENIX-3659, some IT tests time out and fail.






[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 10:25 AM:
-

This issue is caused by the join dynamic filter. From the following RHS we get 
that d.id is in (1,2):
{code}
  select d.seq, d.col2, d.id from detail d where d.id between 1 and 2
{code}
so with the join dynamic filter, m.id is also in (1,2). Before applying the 
join dynamic filter, the LHS is:
{code}
  select m.id, m.col1, d.seq, d.col2 from master m order by m.id desc
{code}
Obviously, the LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}}. After 
applying the join dynamic filter, the LHS turns into:
{code}
  select m.id, m.col1, d.seq, d.col2 from master m where m.id in (1,2) order by m.id desc
{code}
Notice that the LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}}. Then 
{{WhereOptimizer.pushKeyExpressionsToScan}} is called to push {{m.id in (1,2)}} 
into the Scan, and useSkipScan becomes true at the following line 274 of the 
{{WhereOptimizer.pushKeyExpressionsToScan}} method, so the Scan will use a 
SkipScanFilter:
{code:borderStyle=solid}
273    stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan);
274    useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRanges.size() > 1 || hasRangeKey);
{code}
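
For {{m.id in (1,2)}}, here is a worked substitution of those two lines (the 
flag values are assumptions that match this query: two point key ranges, 
nothing unbounded, no SKIP_SCAN/RANGE_SCAN hint):
{code}
// Hedged worked example: inlining the values that m.id IN (1,2) produces.
public class UseSkipScanWorkedExample {
    public static void main(String[] args) {
        boolean hasUnboundedRange = false, hasRangeKey = false;  // two point lookups
        boolean forcedSkipScan = false, forcedRangeScan = false; // no scan hints
        int keyRangesSize = 2;                                   // one range per IN value
        boolean stopExtracting = false, useSkipScan = false;
        stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || (hasRangeKey && forcedRangeScan);
        useSkipScan |= !stopExtracting && !forcedRangeScan && (keyRangesSize > 1 || hasRangeKey);
        System.out.println(useSkipScan); // true: the Scan gets a SkipScanFilter
    }
}
{code}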

In the next step, the {{startRow}} and {{endRow}} of the LHS's Scan are 
computed in the {{ScanRanges.create}} method; at the following line 112, the 
LHS's RowKeySchema is changed to {{SchemaUtil.VAR_BINARY_SCHEMA}}:

{code}
111    if (keys.size() > 1 || SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false,
               schema.getField(schema.getFieldCount()-1)) == QueryConstants.DESC_SEPARATOR_BYTE) {
112        schema = SchemaUtil.VAR_BINARY_SCHEMA;
113        slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114    } else {
{code}

So at the following lines 135 and 136 of the {{ScanRanges.create}} method, 
minKey is {{\x80\x00\x00\x01}} and maxKey is {{\x80\x00\x00\x02\x00}}; 
correspondingly, the Scan's startRow is {{\x80\x00\x00\x01}} and the Scan's 
endRow is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134    if (nBuckets == null || !isPointLookup || !useSkipScan) {
135        byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan);
136        byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan);
{code}
\\
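Those key bytes follow from Phoenix's ASC INTEGER row-key encoding (big-endian 
with the sign bit flipped, so signed order matches unsigned byte order); the 
trailing {{\x00}} on maxKey makes the upper bound exclusive just past id=2. A 
small sketch of that encoding in plain Java:
{code}
public class IntKeyEncodingSketch {
    // Sketch of the ASC INTEGER encoding: flip the sign bit, emit big-endian bytes.
    static byte[] encode(int v) {
        int flipped = v ^ 0x80000000;
        return new byte[] { (byte) (flipped >>> 24), (byte) (flipped >>> 16),
                            (byte) (flipped >>> 8),  (byte) flipped };
    }
    public static void main(String[] args) {
        // encode(1) -> 80 00 00 01  (the Scan's startRow)
        // encode(2) -> 80 00 00 02  (append 00 to get the exclusive endRow)
        for (int v : new int[] {1, 2}) {
            for (byte b : encode(v)) System.out.printf("%02X ", b);
            System.out.println();
        }
    }
}
{code}
\\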
Finally, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01, \x80\x00\x00\x02\x00)}} and the Scan uses a 
{{SkipScanFilter}}. Furthermore, because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_BY}}, the Scan range must be reversed. In the 
{{BaseScannerRegionObserver.preScannerOpen}} method, the following 
{{ScanUtil.setupReverseScan}} method is called to swap the Scan's startRow 
and endRow. Unfortunately, the reversed Scan range computed by the 
{{ScanUtil.setupReverseScan}} method is 
[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF),
so we can only get the rows of the {{master}} table whose id is 1; the rows 
whose id is 2 are excluded.

{code}
621    public static void setupReverseScan(Scan scan) {
622        if (isReversed(scan)) {
623            byte[] newStartRow = getReversedRow(scan.getStartRow());
624            byte[] newStopRow = getReversedRow(scan.getStopRow());
625            scan.setStartRow(newStopRow);
626            scan.setStopRow(newStartRow);
627            scan.setReversed(true);
628        }
629    }
{code}
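
A hedged sketch of the fix idea for point (1) below (not the committed patch): 
when the exclusive upper bound ends in {{\x00}}, its reversed inclusive start 
is that bound with the trailing {{\x00}} dropped, because {{key + \x00}} is 
the immediate successor of {{key}}:
{code}
public class ReversedRowSketch {
    // Illustrative only: handles the trailing-\x00 case called out below.
    static byte[] reversedUpperBound(byte[] stopRow) {
        int len = stopRow.length;
        if (len > 0 && stopRow[len - 1] == 0x00) {
            byte[] trimmed = new byte[len - 1];
            System.arraycopy(stopRow, 0, trimmed, 0, len - 1);
            return trimmed; // e.g. \x80\x00\x00\x02\x00 -> \x80\x00\x00\x02
        }
        return stopRow; // other key shapes still need the existing padding logic
    }
}
{code}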

\\
In conclusion, the following two problems cause this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}}: it should return {{\x80\x00\x00\x02}}, not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}} 
(see the sketch above).
(2) even if the {{ScanUtil.getReversedRow}} method were right, there may be 
another problem. If I change the table data as follows:

{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}

and modify the sql as:
{noformat}
   select m.id, m.col1, d.col2 from master m, detail d where m.id = d.id and d.id in (3,5,7) order by m.id desc
{noformat}
because


[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875743#comment-15875743
 ] 

chenglei commented on PHOENIX-3578:
---

I uploaded my first patch, please help review it. Thanks.






[jira] [Updated] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3578:
--
Attachment: PHOENIX-3578_v1.patch






[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 9:59 AM:


This issue is caused by the join dynamic filter, from following RHS, we get 
d.id is in (1,2):
{code} 
  select d.seq,d.col2,d.id from detail d  where d.id between 1 and 2
{code} 
so with join dynamic filter, m.id is also in (1,2). Before applying join 
dynamic filter,LHS is:
{code}
  select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying 
join dynamic filter LHS turns to :
{code} 
select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by 
m.id desc
{code} 
Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then 
{{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in 
(1,2)}} into Scan , and useSkipScan is true in following line 274 of 
{{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use 
SkipScanFilter: 
{code:borderStyle=solid}
273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || 
(hasRangeKey && forcedRangeScan);
274useSkipScan |= !stopExtracting && !forcedRangeScan && 
(keyRanges.size() > 1 || hasRangeKey);
{code} 

next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in 
{{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is 
turned to SchemaUtil.VAR_BINARY_SCHEMA: 

{code} 
111  if (keys.size() > 1 || 
SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, 
schema.getField(schema.getFieldCount()-1)) == 
QueryConstants.DESC_SEPARATOR_BYTE) {
112schema = SchemaUtil.VAR_BINARY_SCHEMA;
113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114   } else { 
{code}

so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is 
 {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and 
correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow 
is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134if (nBuckets == null || !isPointLookup || !useSkipScan) {
135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, 
slotSpan);
136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, 
slotSpan);
{code}
\\
In summary, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses 
{{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In 
{{BaseScannerRegionObserver.preScannerOpen}} method,following 
{{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow 
and endRow.Unfortunately, the reversed Scan's range computed by  
{{ScanUtil.setupReverseScan}} method is 
[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF),
 so we can only get the rows of {{master}} table which id are 1, the rows which 
id are 2 is excluded.

{code} 
621  public static void setupReverseScan(Scan scan) {
622if (isReversed(scan)) {
623byte[] newStartRow = getReversedRow(scan.getStartRow());
624byte[] newStopRow = getReversedRow(scan.getStopRow());
625scan.setStartRow(newStopRow);
626scan.setStopRow(newStartRow);
627scan.setReversed(true);
628}
629}  
{code}

\\
In conclusion, following two problems causes this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) even though {{ScanUtil.getReversedRow}} method is right,there may be 
another problem,if I change the table data as following :

{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}

and modify the sql as :
{noformat}
   select m.id, m.col1,d.col2 from master m, detail d  where m.id = d.id  

[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 9:57 AM:


This issue is caused by the join dynamic filter, from following RHS, we get 
d.id is in (1,2):
{code} 
  select d.seq,d.col2,d.id from detail d  where d.id between 1 and 2
{code} 
so with join dynamic filter, m.id is also in (1,2). Before applying join 
dynamic filter,LHS is:
{code}
  select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying 
join dynamic filter LHS turns to :
{code} 
select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by 
m.id desc
{code} 
Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then 
{{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in 
(1,2)}} into Scan , and useSkipScan is true in following line 274 of 
{{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use 
SkipScanFilter: 
{code:borderStyle=solid}
273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || 
(hasRangeKey && forcedRangeScan);
274useSkipScan |= !stopExtracting && !forcedRangeScan && 
(keyRanges.size() > 1 || hasRangeKey);
{code} 

next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in 
{{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is 
turned to SchemaUtil.VAR_BINARY_SCHEMA: 

{code} 
111  if (keys.size() > 1 || 
SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, 
schema.getField(schema.getFieldCount()-1)) == 
QueryConstants.DESC_SEPARATOR_BYTE) {
112schema = SchemaUtil.VAR_BINARY_SCHEMA;
113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114   } else { 
{code}

so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is 
 {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and 
correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow 
is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134if (nBuckets == null || !isPointLookup || !useSkipScan) {
135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, 
slotSpan);
136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, 
slotSpan);
{code}
\\
In summary, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses 
{{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In 
{{BaseScannerRegionObserver.preScannerOpen}} method,following 
{{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow 
and endRow.Unfortunately, the reversed Scan's range computed by  
{{ScanUtil.setupReverseScan}} method is 
{{[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF)}},
 so we can only get the rows of {{master}} table which id is 1, the rows which 
id is 2 is excluded.

{code} 
621  public static void setupReverseScan(Scan scan) {
622if (isReversed(scan)) {
623byte[] newStartRow = getReversedRow(scan.getStartRow());
624byte[] newStopRow = getReversedRow(scan.getStopRow());
625scan.setStartRow(newStopRow);
626scan.setStopRow(newStartRow);
627scan.setReversed(true);
628}
629}  
{code}

\\
In conclusion, following two problems causes this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) even though {{ScanUtil.getReversedRow}} method is right,there may be 
another problem,if I change the table data as following :

{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}

and modify the sql as :
{noformat}
   select m.id, m.col1,d.col2 from master m, detail d  where m.id = d.id 

[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 9:58 AM:


This issue is caused by the join dynamic filter, from following RHS, we get 
d.id is in (1,2):
{code} 
  select d.seq,d.col2,d.id from detail d  where d.id between 1 and 2
{code} 
so with join dynamic filter, m.id is also in (1,2). Before applying join 
dynamic filter,LHS is:
{code}
  select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying 
join dynamic filter LHS turns to :
{code} 
select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by 
m.id desc
{code} 
Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then 
{{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in 
(1,2)}} into Scan , and useSkipScan is true in following line 274 of 
{{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use 
SkipScanFilter: 
{code:borderStyle=solid}
273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || 
(hasRangeKey && forcedRangeScan);
274useSkipScan |= !stopExtracting && !forcedRangeScan && 
(keyRanges.size() > 1 || hasRangeKey);
{code} 

next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in 
{{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is 
turned to SchemaUtil.VAR_BINARY_SCHEMA: 

{code} 
111  if (keys.size() > 1 || 
SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, 
schema.getField(schema.getFieldCount()-1)) == 
QueryConstants.DESC_SEPARATOR_BYTE) {
112schema = SchemaUtil.VAR_BINARY_SCHEMA;
113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114   } else { 
{code}

so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is 
 {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and 
correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow 
is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134if (nBuckets == null || !isPointLookup || !useSkipScan) {
135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, 
slotSpan);
136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, 
slotSpan);
{code}
\\
In summary, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses 
{{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In 
{{BaseScannerRegionObserver.preScannerOpen}} method,following 
{{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow 
and endRow.Unfortunately, the reversed Scan's range computed by  
{{ScanUtil.setupReverseScan}} method is 
[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF),
 so we can only get the rows of {{master}} table which id is 1, the rows which 
id is 2 is excluded.

{code} 
621  public static void setupReverseScan(Scan scan) {
622if (isReversed(scan)) {
623byte[] newStartRow = getReversedRow(scan.getStartRow());
624byte[] newStopRow = getReversedRow(scan.getStopRow());
625scan.setStartRow(newStopRow);
626scan.setStopRow(newStartRow);
627scan.setReversed(true);
628}
629}  
{code}

\\
In conclusion, following two problems causes this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) even though {{ScanUtil.getReversedRow}} method is right,there may be 
another problem,if I change the table data as following :

{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}

and modify the sql as :
{noformat}
   select m.id, m.col1,d.col2 from master m, detail d  where m.id = d.id  

[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 9:56 AM:


This issue is caused by the join dynamic filter, from following RHS, we get 
d.id is in (1,2):
{code} 
  select d.seq,d.col2,d.id from detail d  where d.id between 1 and 2
{code} 
so with join dynamic filter, m.id is also in (1,2). Before applying join 
dynamic filter,LHS is:
{code}
  select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying 
join dynamic filter LHS turns to :
{code} 
select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by 
m.id desc
{code} 
Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then 
{{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in 
(1,2)}} into Scan , and useSkipScan is true in following line 274 of 
{{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use 
SkipScanFilter: 
{code:borderStyle=solid}
273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || 
(hasRangeKey && forcedRangeScan);
274useSkipScan |= !stopExtracting && !forcedRangeScan && 
(keyRanges.size() > 1 || hasRangeKey);
{code} 

next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in 
{{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is 
turned to SchemaUtil.VAR_BINARY_SCHEMA: 

{code} 
111  if (keys.size() > 1 || 
SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, 
schema.getField(schema.getFieldCount()-1)) == 
QueryConstants.DESC_SEPARATOR_BYTE) {
112schema = SchemaUtil.VAR_BINARY_SCHEMA;
113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114   } else { 
{code}

so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is 
 {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and 
correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow 
is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134if (nBuckets == null || !isPointLookup || !useSkipScan) {
135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, 
slotSpan);
136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, 
slotSpan);
{code}
\\
In summary, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses 
{{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In 
{{BaseScannerRegionObserver.preScannerOpen}} method,following 
{{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow 
and endRow.Unfortunately, the reversed Scan's range computed by  
{{ScanUtil.setupReverseScan}} method is 
[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF),
 so we can only get the rows of {{master}} table which id is 1, the rows which 
id is 2 is excluded.

{code} 
621  public static void setupReverseScan(Scan scan) {
622if (isReversed(scan)) {
623byte[] newStartRow = getReversedRow(scan.getStartRow());
624byte[] newStopRow = getReversedRow(scan.getStopRow());
625scan.setStartRow(newStopRow);
626scan.setStopRow(newStartRow);
627scan.setReversed(true);
628}
629}  
{code}

\\
In conclusion, following two problems causes this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) even though {{ScanUtil.getReversedRow}} method is right,there may be 
another problem,if I change the table data as following :

{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}

and modify the sql as :
{noformat}
   select m.id, m.col1,d.col2 from master m, detail d  where m.id = d.id  

[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 9:56 AM:


This issue is caused by the join dynamic filter, from following RHS, we get 
d.id is in (1,2):
{code} 
  select d.seq,d.col2,d.id from detail d  where d.id between 1 and 2
{code} 
so with join dynamic filter, m.id is also in (1,2). Before applying join 
dynamic filter,LHS is:
{code}
  select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying 
join dynamic filter LHS turns to :
{code} 
select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by 
m.id desc
{code} 
Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then 
{{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in 
(1,2)}} into Scan , and useSkipScan is true in following line 274 of 
{{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use 
SkipScanFilter: 
{code:borderStyle=solid}
273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || 
(hasRangeKey && forcedRangeScan);
274useSkipScan |= !stopExtracting && !forcedRangeScan && 
(keyRanges.size() > 1 || hasRangeKey);
{code} 

next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in 
{{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is 
turned to SchemaUtil.VAR_BINARY_SCHEMA: 

{code} 
111  if (keys.size() > 1 || 
SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, 
schema.getField(schema.getFieldCount()-1)) == 
QueryConstants.DESC_SEPARATOR_BYTE) {
112schema = SchemaUtil.VAR_BINARY_SCHEMA;
113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114   } else { 
{code}

so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is 
 {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and 
correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow 
is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134if (nBuckets == null || !isPointLookup || !useSkipScan) {
135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, 
slotSpan);
136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, 
slotSpan);
{code}

In summary, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses 
{{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In 
{{BaseScannerRegionObserver.preScannerOpen}} method,following 
{{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow 
and endRow.Unfortunately, the reversed Scan's range computed by  
{{ScanUtil.setupReverseScan}} method is 
[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF),
 so we can only get the rows of {{master}} table which id is 1, the rows which 
id is 2 is excluded.

{code} 
621  public static void setupReverseScan(Scan scan) {
622if (isReversed(scan)) {
623byte[] newStartRow = getReversedRow(scan.getStartRow());
624byte[] newStopRow = getReversedRow(scan.getStopRow());
625scan.setStartRow(newStopRow);
626scan.setStopRow(newStartRow);
627scan.setReversed(true);
628}
629}  
{code}

\\
\\
In conclusion, following two problems causes this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) even though {{ScanUtil.getReversedRow}} method is right,there may be 
another problem,if I change the table data as following :

{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}

and modify the sql as :
{noformat}
   select m.id, m.col1,d.col2 from master m, detail d  where m.id = d.id  

[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 9:55 AM:


This issue is caused by the join dynamic filter, from following RHS, we get 
d.id is in (1,2):
{code} 
  select d.seq,d.col2,d.id from detail d  where d.id between 1 and 2
{code} 
so with join dynamic filter, m.id is also in (1,2). Before applying join 
dynamic filter,LHS is:
{code}
  select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying 
join dynamic filter LHS turns to :
{code} 
select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by 
m.id desc
{code} 
Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then 
{{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in 
(1,2)}} into Scan , and useSkipScan is true in following line 274 of 
{{WhereOptimizer.pushKeyExpressionsToScan}} method,so the Scan would use 
SkipScanFilter: 
{code:borderStyle=solid}
273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || 
(hasRangeKey && forcedRangeScan);
274useSkipScan |= !stopExtracting && !forcedRangeScan && 
(keyRanges.size() > 1 || hasRangeKey);
{code} 

next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in 
{{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is 
turned to SchemaUtil.VAR_BINARY_SCHEMA: 

{code} 
111  if (keys.size() > 1 || 
SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, 
schema.getField(schema.getFieldCount()-1)) == 
QueryConstants.DESC_SEPARATOR_BYTE) {
112schema = SchemaUtil.VAR_BINARY_SCHEMA;
113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114   } else { 
{code}

so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is 
 {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and 
correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow 
is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134if (nBuckets == null || !isPointLookup || !useSkipScan) {
135byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, 
slotSpan);
136byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, 
slotSpan);
{code}

In summary, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} ,and the Scan uses 
{{SkipScanFilter}}.Furthermore,because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_By}},so the Scan range should be reversed.In 
{{BaseScannerRegionObserver.preScannerOpen}} method,following 
{{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow 
and endRow.Unfortunately, the reversed Scan's range computed by  
{{ScanUtil.setupReverseScan}} method is 
[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF),
 so we can only get the rows of {{master}} table which id is 1, the rows which 
id is 2 is excluded.

{code} 
621  public static void setupReverseScan(Scan scan) {
622if (isReversed(scan)) {
623byte[] newStartRow = getReversedRow(scan.getStartRow());
624byte[] newStopRow = getReversedRow(scan.getStopRow());
625scan.setStartRow(newStopRow);
626scan.setStopRow(newStartRow);
627scan.setReversed(true);
628}
629}  
{code}



In conclusion, following two problems causes this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}},which should return {{\x80\x00\x00\x02}},not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) even though {{ScanUtil.getReversedRow}} method is right,there may be 
another problem,if I change the table data as following :

{noformat}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{noformat}

and modify the sql as :
{noformat}
   select m.id, m.col1,d.col2 from master m, detail d  where m.id = d.id  and 

[jira] [Comment Edited] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-02-21 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875665#comment-15875665
 ] 

chenglei edited comment on PHOENIX-3578 at 2/21/17 9:47 AM:


This issue is caused by the join dynamic filter, from following RHS, we get 
d.id is in (1,2):
{code} 
  select d.seq,d.col2,d.id from detail d  where d.id between 1 and 2
{code} 
so with join dynamic filter, m.id is also in (1,2). Before applying join 
dynamic filter,LHS is:
{code}
  select m,id,m.col1,d.seq,d.col2 from master m order by m.id desc
{code}
Obviously, LHS's OrderBy is {{OrderBy.REV_ROW_KEY_ORDER_BY}},after applying 
join dynamic filter LHS turns to :
{code} 
select m,id,m.col1,d.seq,d.col2 from master m where m.id in (1,2) order by 
m.id desc
{code} 
Notice LHS's OrderBy is still {{OrderBy.REV_ROW_KEY_ORDER_BY}} now,then 
{{WhereOptimizer.pushKeyExpressionsToScan}} was called to push {{m.id in 
(1,2)}} into Scan , and useSkipScan is true in following line 274 of 
{{WhereOptimizer.pushKeyExpressionsToScan}} method: 
{code:borderStyle=solid}
273stopExtracting |= (hasUnboundedRange && !forcedSkipScan) || 
(hasRangeKey && forcedRangeScan);
274useSkipScan |= !stopExtracting && !forcedRangeScan && 
(keyRanges.size() > 1 || hasRangeKey);
{code} 

next step the {{startRow}} and {{endRow}} of LHS's Scan was computed in 
{{ScanRanges.create}} method, in following line 112 the LHS's RowKeySchema is 
turned to SchemaUtil.VAR_BINARY_SCHEMA: 

{code} 
111  if (keys.size() > 1 || 
SchemaUtil.getSeparatorByte(schema.rowKeyOrderOptimizable(), false, 
schema.getField(schema.getFieldCount()-1)) == 
QueryConstants.DESC_SEPARATOR_BYTE) {
112schema = SchemaUtil.VAR_BINARY_SCHEMA;
113slotSpan = ScanUtil.SINGLE_COLUMN_SLOT_SPAN;
114   } else { 
{code}

so in following line 135 and line 136 of {{ScanRanges.create}} method,minKey is 
 {{\x80\x00\x00\x01}},and maxKey is {{\x80\x00\x00\x02\x00}}, and 
correspondingly,the Scan's startRow is {{\x80\x00\x00\x01}}, and Scan's endRow 
is {{\x80\x00\x00\x02\x00}}:
{code:borderStyle=solid}
134        if (nBuckets == null || !isPointLookup || !useSkipScan) {
135            byte[] minKey = ScanUtil.getMinKey(schema, sortedRanges, slotSpan);
136            byte[] maxKey = ScanUtil.getMaxKey(schema, sortedRanges, slotSpan);
{code}
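
As an aside, the byte values above come from Phoenix's order-preserving 
serialization of INTEGER: the sign bit of the 4-byte big-endian representation 
is flipped so that signed values sort correctly as unsigned bytes. A minimal 
illustration (a hypothetical helper for this comment, not the actual PInteger 
implementation):
{code:borderStyle=solid}
// Hypothetical sketch of the sort-order-preserving INTEGER encoding:
// flipping the sign bit maps 1 to \x80\x00\x00\x01 and 2 to \x80\x00\x00\x02.
static byte[] encodeInt(int v) {
    int flipped = v ^ 0x80000000; // flip the sign bit
    return new byte[] {
        (byte) (flipped >>> 24), (byte) (flipped >>> 16),
        (byte) (flipped >>> 8),  (byte) flipped };
}
{code}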

In summary, when we scan the LHS {{master}} table, the Scan range is 
{{[\x80\x00\x00\x01,\x80\x00\x00\x02\x00)}} and the Scan uses a 
{{SkipScanFilter}}. Furthermore, because the LHS's OrderBy is 
{{OrderBy.REV_ROW_KEY_ORDER_BY}}, the Scan range must be reversed. In the 
{{BaseScannerRegionObserver.preScannerOpen}} method, the following 
{{ScanUtil.setupReverseScan}} method is called to reverse the Scan's startRow 
and endRow. Unfortunately, the reversed Scan range computed by the 
{{ScanUtil.setupReverseScan}} method is 
[\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF,
\x80\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF),
so we can only get the rows of the {{master}} table whose id is 1; the rows 
whose id is 2 are excluded.

{code} 
621    public static void setupReverseScan(Scan scan) {
622        if (isReversed(scan)) {
623            byte[] newStartRow = getReversedRow(scan.getStartRow());
624            byte[] newStopRow = getReversedRow(scan.getStopRow());
625            scan.setStartRow(newStopRow);
626            scan.setStopRow(newStartRow);
627            scan.setReversed(true);
628        }
629    }
{code}
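
As the conclusion below notes, the problem is exactly the handling of keys 
ending in a zero byte. For illustration, here is a minimal sketch (not the 
actual Phoenix fix) of computing the row immediately preceding an exclusive 
scan boundary, assuming a non-empty key and that rows compare as unsigned byte 
arrays of bounded length:
{code:borderStyle=solid}
import java.util.Arrays;

public class ReversedRowSketch {
    // Returns the largest row strictly below key, padded up to maxLength.
    static byte[] previousRow(byte[] key, int maxLength) {
        if (key[key.length - 1] == 0x00) {
            // ...\x00 sorts immediately after its prefix, so the predecessor
            // of \x80\x00\x00\x02\x00 is simply \x80\x00\x00\x02.
            return Arrays.copyOf(key, key.length - 1);
        }
        // Otherwise decrement the last byte and pad with \xFF: the largest
        // key below \x80\x00\x00\x01 is \x80\x00\x00\x00\xFF...\xFF.
        byte[] prev = Arrays.copyOf(key, maxLength);
        prev[key.length - 1]--;
        Arrays.fill(prev, key.length, maxLength, (byte) 0xFF);
        return prev;
    }
}
{code}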

In conclusion, the following two problems cause this issue:
(1) the {{ScanUtil.getReversedRow}} method is not right for 
{{\x80\x00\x00\x02\x00}}: it should return {{\x80\x00\x00\x02}}, not 
{{\x80\x00\x00\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF}}.
(2) even if the {{ScanUtil.getReversedRow}} method were right, there may be 
another problem. If I change the table data as follows:

{code}
UPSERT INTO master VALUES (1, 'A1');
UPSERT INTO master VALUES (2, 'A2');
UPSERT INTO master VALUES (3, 'A3');
UPSERT INTO master VALUES (4, 'A4');
UPSERT INTO master VALUES (5, 'A5');
UPSERT INTO master VALUES (6, 'A6');
UPSERT INTO master VALUES (8, 'A8');

UPSERT INTO detail VALUES (1, 1, 'B1');
UPSERT INTO detail VALUES (2, 2, 'B2');
UPSERT INTO detail VALUES (3, 3, 'B3');
UPSERT INTO detail VALUES (4, 4, 'B4');
UPSERT INTO detail VALUES (5, 5, 'B5');
UPSERT INTO detail VALUES (6, 6, 'B6');
UPSERT INTO detail VALUES (7, 7, 'B7');
UPSERT INTO detail VALUES (8, 8, 'B8');
{code}

and modify the sql as:
{code}
   select m.id, m.col1,d.col2 from master m, detail d  where m.id = d.id  and 
d.id in (3,5,7) order by m.id desc
{code}
because 

[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Description: 
In my business system, there is the following join SQL (simplified); 
fact_table is a fact table, joining the dimension tables dim_table1 and 
dim_table2: 

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For small 
datasets the sql executes quickly, but when the dataset gets bigger the sql 
becomes very slow: when the row count of fact_table is 30 million, dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s.

When I debug the SQL execution, I find that RHS1 returns 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 returns 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
method to intersect RHS1 with RHS2 for the join dynamic filter, narrowing down 
what fact_table.cust_id should be. 

Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql 
execution only costs 17s. After reading the code of the KeyRange.intersect 
method, I find the following two problems:

(1) The double loop at lines 521 and 522 is inefficient: when the keyRanges 
size is M and the keyRanges2 size is N, the time complexity is O(M*N), which 
for my example is 5523*23881 (a merge-based alternative is sketched after 
this list): 

{code:borderStyle=solid} 
519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520        List<KeyRange> tmp = new ArrayList<KeyRange>();
521        for (KeyRange r1 : keyRanges) {
522            for (KeyRange r2 : keyRanges2) {
523                KeyRange r = r1.intersect(r2);
524                if (EMPTY_RANGE != r) {
525                    tmp.add(r);
526                }
527            }
528        }
{code}  

(2) line 540 should be r = r.union(tmp.get( i )), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532        Collections.sort(tmp, KeyRange.COMPARATOR);
533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534        KeyRange r = tmp.get(0);
535        for (int i=1; i<tmp.size(); i++) {
536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537                tmp2.add(r);
538                r = tmp.get(i);
539            } else {
540                r = r.intersect(tmp.get(i));
541            }
542        }
{code}
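
For illustration, here is a minimal sketch of such a merge-based intersection, 
under stated assumptions: Range is a simplified stand-in for KeyRange with 
inclusive int bounds instead of byte[] keys, and both input lists are sorted 
by lower bound and contain pairwise-disjoint ranges (Java 16+ for the record 
syntax). After the O(M*logM)+O(N*logN) sorts, the merge itself is O(M+N), 
instead of the roughly 1.3*10^8 pairwise intersect calls that 5523*23881 
implies:
{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.List;

public class RangeIntersectSketch {
    // Simplified stand-in for KeyRange: [lower, upper], both inclusive.
    record Range(int lower, int upper) {}

    static List<Range> intersectSorted(List<Range> a, List<Range> b) {
        List<Range> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            Range x = a.get(i), y = b.get(j);
            int lo = Math.max(x.lower(), y.lower());
            int hi = Math.min(x.upper(), y.upper());
            if (lo <= hi) {
                out.add(new Range(lo, hi)); // keep the overlapping part
            }
            // Retire whichever range ends first; it cannot overlap anything
            // later in the other (sorted, disjoint) list.
            if (x.upper() < y.upper()) { i++; } else { j++; }
        }
        return out;
    }
}
{code}
For example, intersecting the sorted lists [1..3],[7..9] and [2..8] yields 
[2..3] and [7..8].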

[jira] [Comment Edited] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865978#comment-15865978
 ] 

chenglei edited comment on PHOENIX-3670 at 2/14/17 3:43 PM:


I uploaded my first patch; could someone help review it? The time complexity 
of the KeyRange.intersect method in my patch is reduced to 
O(M*logM)+O(N*logN), which is faster than the current O(M*N); for my example 
explained above, after applying the patch the KeyRange.intersect method only 
costs 20ms, dramatically faster than the original 11s. I also added some unit 
tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method in my 
patch.
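
A rough back-of-the-envelope check of that claim (illustrative arithmetic 
only; constants and memory effects are ignored):
{code:borderStyle=solid}
long m = 5523, n = 23881;   // RHS1 and RHS2 sizes from the description
long pairwise = m * n;      // = 131,894,763 KeyRange.intersect calls
long merged = (long) (m * (Math.log(m) / Math.log(2))
            + n * (Math.log(n) / Math.log(2)));
// merged is roughly 416,000 comparison steps, about 300x fewer operations
{code}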


was (Author: comnetwork):
I uploaded my first patch; could someone help review it? The time complexity 
of the KeyRange.intersect method in my patch is reduced to 
O(M*logM)+O(N*logN), which is faster than the current O(M*N); for my example 
explained above, after applying the patch the KeyRange.intersect method only 
costs 20ms, dramatically faster than the original 11s.

> KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, 
> especially for join dynamic filter
> -----------------------------------------------------------------------
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is the following join SQL (simplified); 
> fact_table is a fact table, joining the dimension tables dim_table1 and 
> dim_table2: 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For small 
> datasets the sql executes quickly, but when the dataset gets bigger the sql 
> becomes very slow. When the row count of fact_table is 30 million, 
> dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query costs 17s.
> When I debug the SQL execution, I find that RHS1 returns 5523 rows:
> {code:borderStyle=solid} 
>    select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 returns 23881 rows: 
> {code:borderStyle=solid}
>    select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
> method to intersect RHS1 with RHS2 for the dynamic filter, narrowing down 
> what fact_table.cust_id should be. 
> Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql 
> execution only costs 17s. After reading the code of the KeyRange.intersect 
> method, I find the following two problems:
> 1. The double loop at lines 521 and 522 is inefficient: when the keyRanges 
> size is M and the keyRanges2 size is N, the time complexity is O(M*N), which 
> for my example is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> 2. line 540 should be r = r.union(tmp.get( i )), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> and it seems that there are no unit tests for the 
> KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Attachment: PHOENIX-3670_v1.patch

> KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is the following join SQL (simplified); 
> fact_table is a fact table, joining the dimension tables dim_table1 and 
> dim_table2: 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id=d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
> dataset, the SQL executes quickly, but when the dataset gets bigger, the SQL 
> becomes very slow. When the row count of fact_table is 30 
> million, dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query costs 17s.
> When I debugged the SQL execution, I found RHS1 returns 5523 rows:
> {code:borderStyle=solid} 
>    select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 returns 23881 rows: 
> {code:borderStyle=solid}
>    select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
> method to compute RHS1 intersecting RHS2 for the dynamic filter, narrowing down 
> what fact_table.cust_id should be. 
> Surprisingly, the KeyRange.intersect method alone costs 11s, although the whole 
> SQL execution only costs 17s. After reading the code of the KeyRange.intersect 
> method, I found the following two problems:
> 1. The double loop in line 521 and line 522 is inefficient: when keyRanges has 
> size M and keyRanges2 has size N, the time complexity is O(M*N), which for my 
> example is 5523*23881: 
> {code:borderStyle=solid} 
> 519 public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520     List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521     for (KeyRange r1 : keyRanges) {
> 522         for (KeyRange r2 : keyRanges2) {
> 523             KeyRange r = r1.intersect(r2);
> 524             if (EMPTY_RANGE != r) {
> 525                 tmp.add(r);
> 526             }
> 527         }
> 528     }
> {code}  
> 2. Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532     Collections.sort(tmp, KeyRange.COMPARATOR);
> 533     List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534     KeyRange r = tmp.get(0);
> 535     for (int i = 1; i < tmp.size(); i++) {
> 536         if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537             tmp2.add(r);
> 538             r = tmp.get(i);
> 539         } else {
> 540             r = r.intersect(tmp.get(i));
> 541         }
> 542     }
> {code}
> and it seems that there are no unit tests for the 
> KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Description: 
In my business system, there is the following join SQL (simplified); fact_table 
is a fact table, joining the dimension tables dim_table1 and dim_table2: 

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id=d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
dataset, the SQL executes quickly, but when the dataset gets bigger, the SQL 
becomes very slow. When the row count of fact_table is 30 million, dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s.

When I debugged the SQL execution, I found RHS1 returns 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 returns 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
method to compute RHS1 intersecting RHS2 for the dynamic filter, narrowing down 
what fact_table.cust_id should be. 

Surprisingly, the KeyRange.intersect method alone costs 11s, although the whole 
SQL execution only costs 17s. After reading the code of the KeyRange.intersect 
method, I found the following two problems:

1. The double loop in line 521 and line 522 is inefficient: when keyRanges has 
size M and keyRanges2 has size N, the time complexity is O(M*N), which for my 
example is 5523*23881: 

{code:borderStyle=solid} 
519 public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520     List<KeyRange> tmp = new ArrayList<KeyRange>();
521     for (KeyRange r1 : keyRanges) {
522         for (KeyRange r2 : keyRanges2) {
523             KeyRange r = r1.intersect(r2);
524             if (EMPTY_RANGE != r) {
525                 tmp.add(r);
526             }
527         }
528     }
{code}  

2. Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532     Collections.sort(tmp, KeyRange.COMPARATOR);
533     List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534     KeyRange r = tmp.get(0);
535     for (int i = 1; i < tmp.size(); i++) {
536         if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537             tmp2.add(r);
538             r = tmp.get(i);
539         } else {
540             r = r.intersect(tmp.get(i));
541         }
542     }
{code}

and it seems that there are no unit tests for the 
KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.
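
To make the line-540 problem above concrete, here is a tiny, hypothetical worked 
example over plain long bounds (Phoenix's KeyRange works on byte-array bounds) 
showing what each choice produces when two sorted ranges overlap:

{code:borderStyle=solid}
// Miniature of the coalesce step over [lower,upper] pairs, showing why
// line 540 must union (grow) the accumulated range rather than intersect it.
public class UnionVsIntersect {
    public static void main(String[] args) {
        long[] r = {1, 5};    // current accumulated range
        long[] next = {3, 8}; // overlaps r, so the two should coalesce
        // intersect keeps only the overlap [3,5], silently dropping 1-2 and 6-8
        long[] wrong = {Math.max(r[0], next[0]), Math.min(r[1], next[1])};
        // union covers both ranges, giving the correct coalesced range [1,8]
        long[] right = {Math.min(r[0], next[0]), Math.max(r[1], next[1])};
        System.out.println("intersect -> [" + wrong[0] + "," + wrong[1] + "]");
        System.out.println("union     -> [" + right[0] + "," + right[1] + "]");
    }
}
{code}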

[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Description: 
In my business system, there is the following join SQL (simplified); fact_table 
is a fact table, joining the dimension tables dim_table1 and dim_table2: 

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id=d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
dataset, the SQL executes quickly, but when the dataset gets bigger, the SQL 
becomes very slow. When the row count of fact_table is 30 million, dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s.

When I debugged the SQL execution, I found RHS1 returns 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 returns 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
method to compute RHS1 intersecting RHS2 for the dynamic filter, narrowing down 
what fact_table.cust_id should be. 

Surprisingly, the KeyRange.intersect method alone costs 11s, although the whole 
SQL execution only costs 17s. After reading the code of the KeyRange.intersect 
method, I found the following two problems:

1. The double loop in line 521 and line 522 is inefficient: when keyRanges has 
size M and keyRanges2 has size N, the time complexity is O(M*N), which for my 
example is 5523*23881: 

{code:borderStyle=solid} 
519 public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520     List<KeyRange> tmp = new ArrayList<KeyRange>();
521     for (KeyRange r1 : keyRanges) {
522         for (KeyRange r2 : keyRanges2) {
523             KeyRange r = r1.intersect(r2);
524             if (EMPTY_RANGE != r) {
525                 tmp.add(r);
526             }
527         }
528     }
{code}  

2. Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532     Collections.sort(tmp, KeyRange.COMPARATOR);
533     List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534     KeyRange r = tmp.get(0);
535     for (int i = 1; i < tmp.size(); i++) {
536         if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537             tmp2.add(r);
538             r = tmp.get(i);
539         } else {
540             r = r.intersect(tmp.get(i));
541         }
542     }
{code}

and it seems that there are no unit tests for the 
KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.

[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Summary: KeyRange.intersect(List<KeyRange>, List<KeyRange>) is 
inefficient, especially for join dynamic filter  (was: 
KeyRange.intersect(List<KeyRange> , List<KeyRange> ) is inefficient,especially 
for join dynamic filter)

> KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
>
> In my business system, there is the following join SQL (simplified); 
> fact_table is a fact table, joining the dimension tables dim_table1 and 
> dim_table2: 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id=d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
> dataset, the SQL executes quickly, but when the dataset gets bigger, the SQL 
> becomes very slow. When the row count of fact_table is 30 
> million, dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query costs 17s.
> When I debugged the SQL execution, I found RHS1 returns 5523 rows:
> {code:borderStyle=solid} 
>    select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 returns 23881 rows: 
> {code:borderStyle=solid}
>    select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
> method to compute RHS1 intersecting RHS2 for the dynamic filter, narrowing down 
> what fact_table.cust_id should be. 
> Surprisingly, the KeyRange.intersect method alone costs 11s, although the whole 
> SQL execution only costs 17s. After reading the code of the KeyRange.intersect 
> method, I found the following two problems:
> 1. The double loop in line 521 and line 522 is inefficient: when keyRanges has 
> size M and keyRanges2 has size N, the time complexity is O(M*N), which for my 
> example is 5523*23881: 
> {code:borderStyle=solid} 
> 519 public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520     List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521     for (KeyRange r1 : keyRanges) {
> 522         for (KeyRange r2 : keyRanges2) {
> 523             KeyRange r = r1.intersect(r2);
> 524             if (EMPTY_RANGE != r) {
> 525                 tmp.add(r);
> 526             }
> 527         }
> 528     }
> {code}  
> 2. Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532     Collections.sort(tmp, KeyRange.COMPARATOR);
> 533     List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534     KeyRange r = tmp.get(0);
> 535     for (int i = 1; i < tmp.size(); i++) {
> 536         if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537             tmp2.add(r);
> 538             r = tmp.get(i);
> 539         } else {
> 540             r = r.intersect(tmp.get(i));
> 541         }
> 542     }
> {code}
> and it seems that there are no unit tests for the 
> KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3670) KeyRange.intersect(List<KeyRange> , List<KeyRange> ) is inefficient, especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)
chenglei created PHOENIX-3670:
-

 Summary: KeyRange.intersect(List<KeyRange> , List<KeyRange> ) is 
inefficient, especially for join dynamic filter
 Key: PHOENIX-3670
 URL: https://issues.apache.org/jira/browse/PHOENIX-3670
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.9.0
Reporter: chenglei


In my business system, there is the following join SQL (simplified); fact_table 
is a fact table, joining the dimension tables dim_table1 and dim_table2: 

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id=d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
dataset, the SQL executes quickly, but when the dataset gets bigger, the SQL 
becomes very slow. When the row count of fact_table is 30 million, dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s.

When I debugged the SQL execution, I found RHS1 returns 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 returns 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
method to compute RHS1 intersecting RHS2 for the dynamic filter, narrowing down 
what fact_table.cust_id should be. 

Surprisingly, the KeyRange.intersect method alone costs 11s, although the whole 
SQL execution only costs 17s. After reading the code of the KeyRange.intersect 
method, I found the following two problems:

1. The double loop in line 521 and line 522 is inefficient: when keyRanges has 
size M and keyRanges2 has size N, the time complexity is O(M*N), which for my 
example is 5523*23881: 

{code:borderStyle=solid} 
519 public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520     List<KeyRange> tmp = new ArrayList<KeyRange>();
521     for (KeyRange r1 : keyRanges) {
522         for (KeyRange r2 : keyRanges2) {
523             KeyRange r = r1.intersect(r2);
524             if (EMPTY_RANGE != r) {
525                 tmp.add(r);
526             }
527         }
528     }
{code}  

2. Line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532     Collections.sort(tmp, KeyRange.COMPARATOR);
533     List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534     KeyRange r = tmp.get(0);
535     for (int i = 1; i < tmp.size(); i++) {
536         if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537             tmp2.add(r);
538             r = tmp.get(i);
539         } else {
540             r = r.intersect(tmp.get(i));
541         }
542     }
{code}

and it seems that there are no unit tests for the 
KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.

[jira] [Commented] (PHOENIX-3578) Incorrect query results when applying inner join and orderby desc

2017-01-11 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817695#comment-15817695
 ] 

chenglei commented on PHOENIX-3578:
---

It can also be reproduced under 4.9.0. It may be caused by the fact that the 
join SQL uses a SkipScanFilter after dynamic filtering while the SQL is also 
OrderBy.REV_ROW_KEY_ORDER_BY.

> Incorrect query results when applying inner join and orderby desc
> -
>
> Key: PHOENIX-3578
> URL: https://issues.apache.org/jira/browse/PHOENIX-3578
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
> Environment: hbase-1.1.2
>Reporter: sungmin.cho
>
> Steps to reproduce:
> h4. 1. Create two tables
> {noformat}
> CREATE TABLE IF NOT EXISTS master (
>   id integer not null,
>   col1 varchar,
>   constraint pk_master primary key(id)
> );
> CREATE TABLE IF NOT EXISTS detail (
>   id integer not null,
>   seq integer not null,
>   col2 varchar,
>   constraint pk_master primary key(id, seq)
> );
> {noformat}
> h4. 2. Upsert values
> {noformat}
> upsert into master values(1, 'A1');
> upsert into master values(2, 'A2');
> upsert into master values(3, 'A3');
> upsert into detail values(1, 1, 'B1');
> upsert into detail values(1, 2, 'B2');
> upsert into detail values(2, 1, 'B1');
> upsert into detail values(2, 2, 'B2');
> upsert into detail values(3, 1, 'B1');
> upsert into detail values(3, 2, 'B2');
> upsert into detail values(3, 3, 'B3');
> {noformat}
> h4. 3. Execute query
> {noformat}
> select m.id, m.col1, d.seq, d.col2
> from master m, detail d
> where m.id = d.id
>   and d.id between 1 and 2
> order by m.id desc
> {noformat}
> h4. (/) Expected result
> {noformat}
> +-------+---------+--------+---------+
> | M.ID  | M.COL1  | D.SEQ  | D.COL2  |
> +-------+---------+--------+---------+
> | 2     | A2      | 1      | B1      |
> | 2     | A2      | 2      | B2      |
> | 1     | A1      | 1      | B1      |
> | 1     | A1      | 2      | B2      |
> +-------+---------+--------+---------+
> {noformat}
> h4. (!) Incorrect result
> {noformat}
> +-------+---------+--------+---------+
> | M.ID  | M.COL1  | D.SEQ  | D.COL2  |
> +-------+---------+--------+---------+
> | 1     | A1      | 1      | B1      |
> | 1     | A1      | 2      | B2      |
> +-------+---------+--------+---------+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787679#comment-15787679
 ] 

chenglei edited comment on PHOENIX-3491 at 12/30/16 1:36 PM:
-

[~jamestaylor], I noticed this patch had been pushed to the master branch and 
included in 4.9.0. Could you please close this issue and mark it resolved?


was (Author: comnetwork):
[~jamestaylor],I noticed this patch was pushed to master branch and included in 
4.9.0,Could you please close this issue and mark it resolved ?

> OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy 
> is reverse
> 
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch, PHOENIX-3491_v2.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is:
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled out 
> as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787647#comment-15787647
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 1:34 PM:
-

I uploaded my first patch, [~jamestaylor], please help me have a review, thanks. 
I ran all the existing unit tests and IT tests on my local machine.


was (Author: comnetwork):
I uploaded my first patch,[~jamestaylor],please help me have a review,thanks.I 
ran the all the existing unit tests and IT tests in my local machine

> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787647#comment-15787647
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 1:34 PM:
-

I uploaded my first patch, [~jamestaylor], please help me have a review, thanks. 
I ran all the existing unit tests and IT tests on my local machine.


was (Author: comnetwork):
I uploaded my first patch,[~jamestaylor],please help me have a review,thanks.

> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787679#comment-15787679
 ] 

chenglei commented on PHOENIX-3491:
---

[~jamestaylor], I noticed this patch was pushed to the master branch and 
included in 4.9.0. Could you please close this issue and mark it resolved?

> OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy 
> is reverse
> 
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch, PHOENIX-3491_v2.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is:
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled out 
> as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 1:27 PM:
-

I wrote the following test case so that this problem can be reproduced under 
4.9.0, simplifying the original test case by removing the index table and 
changing the type from CHAR(15) to Integer, which is easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
        ENTITY_ID INTEGER NOT NULL,
        CONTAINER_ID INTEGER NOT NULL,
        SCORE INTEGER NOT NULL,
        CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC, CONTAINER_ID DESC, SCORE DESC)
  )
 
  UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)

  select DISTINCT entity_id, score from ( select entity_id, score from GROUPBY3453_INT limit 1)
{code} 

the expected result is: 
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL has a subquery.

When I debugged into the source code, I found that the cause of the problem is 
the distinct (or group by) statement in the outer query. In the following code 
from the GroupByCompiler.GroupBy.compile method, the "entity" column in 
GroupBy's expressions is a ProjectedColumnExpression, but in line 245 the 
"entity" column in GroupBy's keyExpressions is a CoerceExpression wrapping the 
ProjectedColumnExpression, which converts the ProjectedColumnExpression from 
PInteger to PDecimal:
{code:borderStyle=solid} 
232    for (int i = expressions.size()-2; i >= 0; i--) {
233        Expression expression = expressions.get(i);
234        PDataType keyType = getGroupByDataType(expression);
235        if (keyType == expression.getDataType()) {
236            continue;
237        }
238        // Copy expressions only when keyExpressions will be different than expressions
239        if (keyExpressions == expressions) {
240            keyExpressions = new ArrayList<Expression>(expressions);
241        }
242        // Wrap expression in an expression that coerces the expression to the required type..
243        // This is done so that we have a way of expressing null as an empty key when more
244        // than one fixed and nullable types are used in a group by clause
245        keyExpressions.set(i, CoerceExpression.create(expression, keyType));
246    }
{code} 

When I looked into the CoerceExpression.create method, in line 68 of the 
following code I observed that the SortOrder of the CoerceExpression is 
SortOrder.getDefault(), which is SortOrder.ASC, but it ought to be 
SortOrder.DESC, because the SortOrder of the ProjectedColumnExpression is 
SortOrder.DESC:

{code:borderStyle=solid} 
46     public static Expression create(Expression expression, PDataType toType) throws SQLException {
47         if (toType == expression.getDataType()) {
48             return expression;
49         }
50         return new CoerceExpression(expression, toType);
51     }
       ..
66     //Package protected for tests
67     CoerceExpression(Expression expression, PDataType toType) {
68         this(expression, toType, SortOrder.getDefault(), null, true);
69     }
{code} 
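
As an aside, a natural direction for a fix would be to propagate the wrapped 
expression's SortOrder instead of the default. This is only a sketch, under the 
assumptions that the five-argument constructor delegated to above is accessible 
here and that Expression exposes getSortOrder(); it is not necessarily the 
committed patch:

{code:borderStyle=solid}
// hypothetical sketch: keep the child's sort order when wrapping, so a
// DESC-sorted ProjectedColumnExpression stays DESC after the coercion
public static Expression create(Expression expression, PDataType toType) throws SQLException {
    if (toType == expression.getDataType()) {
        return expression;
    }
    return new CoerceExpression(expression, toType, expression.getSortOrder(), null, true);
}
{code}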

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
following PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. 
Notice that the actualType parameter is PInteger, the actualModifier parameter 
is SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So in 
line 842 we get the PInteger "1" from the ptr fetched from the HBase 
RegionServer, in line 845 we convert the PInteger "1" to the PDecimal "1", and 
last, in line 846, we encode the PDecimal "1" to bytes; because the 
expectedModifier parameter is SortOrder.ASC, the PDecimal "1" is encoded with 
SortOrder.ASC. That is to say, the SortOrder of the group-by key obtained from 
the ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.

{code:borderStyle=solid}
826    public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType actualType, Integer actualMaxLength,
827            Integer actualScale, SortOrder actualModifier, Integer desiredMaxLength, Integer desiredScale,
828            SortOrder expectedModifier) {
       ..
840        // Optimization for cases in which we already have the object around
841        if (o == null) {
842            o = actualType.toObject(ptr, actualType, actualModifier);
843        }
844
845        o = toObject(o, actualType);
846        byte[] b = toBytes(o, expectedModifier);
847        ptr.set(b);
848    }
{code}
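
For intuition, here is a small, self-contained sketch (not Phoenix's actual 
PDataType code) of why such a SortOrder mismatch corrupts the value: DESC 
encoding is a byte-wise inversion, so bytes written under one order and read 
under the other decode to a completely different number:

{code:borderStyle=solid}
import java.nio.ByteBuffer;

public class SortOrderMismatch {
    // ASC encoding of an int for memcmp ordering: flip the sign bit
    static byte[] encodeAsc(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ 0x80000000).array();
    }
    // DESC encoding = invert every byte of the ASC encoding
    static byte[] invert(byte[] b) {
        byte[] r = b.clone();
        for (int i = 0; i < r.length; i++) r[i] = (byte) ~r[i];
        return r;
    }
    static int decodeAsc(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ 0x80000000;
    }
    public static void main(String[] args) {
        byte[] desc = invert(encodeAsc(1));  // bytes as stored for a DESC column
        System.out.println(decodeAsc(desc)); // reading them as ASC prints -2, not 1
    }
}
{code}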

Unfortunately, finally, in the following PhoenixResultSet.getObject method, when 
we invoke the ColumnProjector.getValue method in line 524, the ColumnProjector's 

[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 1:24 PM:
-

I wrote the following test case so that this problem can be reproduced under 
4.9.0, simplifying the original test case by removing the index table and 
changing the type from CHAR(15) to Integer, which is easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
        ENTITY_ID INTEGER NOT NULL,
        CONTAINER_ID INTEGER NOT NULL,
        SCORE INTEGER NOT NULL,
        CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC, CONTAINER_ID DESC, SCORE DESC)
  )
 
  UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)

  select DISTINCT entity_id, score from ( select entity_id, score from GROUPBY3453_INT limit 1)
{code} 

the expected result is: 
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL has a subquery.

When I debugged into the source code, I found that the cause of the problem is 
the distinct (or group by) statement in the outer query. In the following code 
from the GroupByCompiler.GroupBy.compile method, the "entity" column in 
GroupBy's expressions is a ProjectedColumnExpression, but in line 245 the 
"entity" column in GroupBy's keyExpressions is a CoerceExpression wrapping the 
ProjectedColumnExpression, which converts the ProjectedColumnExpression from 
PInteger to PDecimal:
{code:borderStyle=solid} 
232    for (int i = expressions.size()-2; i >= 0; i--) {
233        Expression expression = expressions.get(i);
234        PDataType keyType = getGroupByDataType(expression);
235        if (keyType == expression.getDataType()) {
236            continue;
237        }
238        // Copy expressions only when keyExpressions will be different than expressions
239        if (keyExpressions == expressions) {
240            keyExpressions = new ArrayList<Expression>(expressions);
241        }
242        // Wrap expression in an expression that coerces the expression to the required type..
243        // This is done so that we have a way of expressing null as an empty key when more
244        // than one fixed and nullable types are used in a group by clause
245        keyExpressions.set(i, CoerceExpression.create(expression, keyType));
246    }
{code} 

When I looked into the CoerceExpression.create method, in line 68 of the 
following code I observed that the SortOrder of the CoerceExpression is 
SortOrder.getDefault(), which is SortOrder.ASC, but it ought to be 
SortOrder.DESC, because the SortOrder of the ProjectedColumnExpression is 
SortOrder.DESC:

{code:borderStyle=solid} 
46     public static Expression create(Expression expression, PDataType toType) throws SQLException {
47         if (toType == expression.getDataType()) {
48             return expression;
49         }
50         return new CoerceExpression(expression, toType);
51     }
       ..
66     //Package protected for tests
67     CoerceExpression(Expression expression, PDataType toType) {
68         this(expression, toType, SortOrder.getDefault(), null, true);
69     }
{code} 

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
following PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. 
Notice that the actualType parameter is PInteger, the actualModifier parameter 
is SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So in 
line 842 we get the PInteger "1" from the ptr fetched from the HBase 
RegionServer, in line 845 we convert the PInteger "1" to the PDecimal "1", and 
last, in line 846, we encode the PDecimal "1" to bytes; because the 
expectedModifier parameter is SortOrder.ASC, the PDecimal "1" is encoded with 
SortOrder.ASC. That is to say, the SortOrder of the group-by key obtained from 
the ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.

{code:borderStyle=solid}
826    public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType actualType, Integer actualMaxLength,
827            Integer actualScale, SortOrder actualModifier, Integer desiredMaxLength, Integer desiredScale,
828            SortOrder expectedModifier) {
       ..
840        // Optimization for cases in which we already have the object around
841        if (o == null) {
842            o = actualType.toObject(ptr, actualType, actualModifier);
843        }
844
845        o = toObject(o, actualType);
846        byte[] b = toBytes(o, expectedModifier);
847        ptr.set(b);
848    }
{code}

Unfortunately, finally, in the following PhoenixResultSet.getObject method, when 
we invoke the ColumnProjector.getValue method in line 524, the ColumnProjector's 

[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 1:22 PM:
-

I wrote the following test case so that this problem can be reproduced under 
4.9.0, simplifying the original test case by removing the index table and 
changing the type from CHAR(15) to Integer, which is easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
        ENTITY_ID INTEGER NOT NULL,
        CONTAINER_ID INTEGER NOT NULL,
        SCORE INTEGER NOT NULL,
        CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC, CONTAINER_ID DESC, SCORE DESC)
  )
 
  UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)

  select DISTINCT entity_id, score from ( select entity_id, score from GROUPBY3453_INT limit 1)
{code} 

the expected result is: 
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL has a subquery.

When I debugged into the source code, I found that the cause of the problem is 
the distinct (or group by) statement in the outer query. In the following code 
from the GroupByCompiler.GroupBy.compile method, the "entity" column in 
GroupBy's expressions is a ProjectedColumnExpression, but in line 245 the 
"entity" column in GroupBy's keyExpressions is a CoerceExpression wrapping the 
ProjectedColumnExpression, which converts the ProjectedColumnExpression from 
PInteger to PDecimal:
{code:borderStyle=solid} 
232    for (int i = expressions.size()-2; i >= 0; i--) {
233        Expression expression = expressions.get(i);
234        PDataType keyType = getGroupByDataType(expression);
235        if (keyType == expression.getDataType()) {
236            continue;
237        }
238        // Copy expressions only when keyExpressions will be different than expressions
239        if (keyExpressions == expressions) {
240            keyExpressions = new ArrayList<Expression>(expressions);
241        }
242        // Wrap expression in an expression that coerces the expression to the required type..
243        // This is done so that we have a way of expressing null as an empty key when more
244        // than one fixed and nullable types are used in a group by clause
245        keyExpressions.set(i, CoerceExpression.create(expression, keyType));
246    }
{code} 

When I looked into the CoerceExpression.create method, in line 68 of the 
following code I observed that the SortOrder of the CoerceExpression is 
SortOrder.ASC, but it ought to be SortOrder.DESC, because the SortOrder of the 
ProjectedColumnExpression is SortOrder.DESC:

{code:borderStyle=solid} 
46     public static Expression create(Expression expression, PDataType toType) throws SQLException {
47         if (toType == expression.getDataType()) {
48             return expression;
49         }
50         return new CoerceExpression(expression, toType);
51     }
       ..
66     //Package protected for tests
67     CoerceExpression(Expression expression, PDataType toType) {
68         this(expression, toType, SortOrder.getDefault(), null, true);
69     }
{code} 

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
following PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. 
Notice that the actualType parameter is PInteger, the actualModifier parameter 
is SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So in 
line 842 we get the PInteger "1" from the ptr fetched from the HBase 
RegionServer, in line 845 we convert the PInteger "1" to the PDecimal "1", and 
last, in line 846, we encode the PDecimal "1" to bytes; because the 
expectedModifier parameter is SortOrder.ASC, the PDecimal "1" is encoded with 
SortOrder.ASC. That is to say, the SortOrder of the group-by key obtained from 
the ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.

{code:borderStyle=solid}
826    public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType actualType, Integer actualMaxLength,
827            Integer actualScale, SortOrder actualModifier, Integer desiredMaxLength, Integer desiredScale,
828            SortOrder expectedModifier) {
       ..
840        // Optimization for cases in which we already have the object around
841        if (o == null) {
842            o = actualType.toObject(ptr, actualType, actualModifier);
843        }
844
845        o = toObject(o, actualType);
846        byte[] b = toBytes(o, expectedModifier);
847        ptr.set(b);
848    }
{code}

Unfortunately, finally, in the following PhoenixResultSet.getObject method, when 
we invoke the ColumnProjector.getValue method in line 524, the ColumnProjector's 
Expression is 

[jira] [Commented] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787647#comment-15787647
 ] 

chenglei commented on PHOENIX-3453:
---

I uploaded my first patch, [~jamestaylor], please help me have a review, thanks.

> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3453:
--
Attachment: PHOENIX-3453_v1.patch

> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 11:18 AM:
--

I wrote the following test case so that this problem can be reproduced under 
4.9.0, simplifying the original test case by removing the index table and 
changing the type from CHAR(15) to Integer, which is easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
        ENTITY_ID INTEGER NOT NULL,
        CONTAINER_ID INTEGER NOT NULL,
        SCORE INTEGER NOT NULL,
        CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC, CONTAINER_ID DESC, SCORE DESC)
  )
 
  UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)

  select DISTINCT entity_id, score from ( select entity_id, score from GROUPBY3453_INT limit 1)
{code} 

the expected result is: 
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL has a subquery.

When I debugged into the source code, I found that the cause of the problem is 
the distinct (or group by) statement in the outer query. In the following code 
from the GroupByCompiler.GroupBy.compile method, the "entity" column in 
GroupBy's expressions is a ProjectedColumnExpression, but in line 245 the 
"entity" column in GroupBy's keyExpressions is a CoerceExpression wrapping the 
ProjectedColumnExpression, which would convert the ProjectedColumnExpression 
from PInteger to PDecimal:
{code:borderStyle=solid} 
232    for (int i = expressions.size()-2; i >= 0; i--) {
233        Expression expression = expressions.get(i);
234        PDataType keyType = getGroupByDataType(expression);
235        if (keyType == expression.getDataType()) {
236            continue;
237        }
238        // Copy expressions only when keyExpressions will be different than expressions
239        if (keyExpressions == expressions) {
240            keyExpressions = new ArrayList<Expression>(expressions);
241        }
242        // Wrap expression in an expression that coerces the expression to the required type..
243        // This is done so that we have a way of expressing null as an empty key when more
244        // than one fixed and nullable types are used in a group by clause
245        keyExpressions.set(i, CoerceExpression.create(expression, keyType));
246    }
{code} 

When I looked into the CoerceExpression.create method, in line 68 of the 
following code I observed that the SortOrder of the CoerceExpression is 
SortOrder.ASC, but it ought to be SortOrder.DESC, because the SortOrder of the 
ProjectedColumnExpression is SortOrder.DESC:

{code:borderStyle=solid} 
46     public static Expression create(Expression expression, PDataType toType) throws SQLException {
47         if (toType == expression.getDataType()) {
48             return expression;
49         }
50         return new CoerceExpression(expression, toType);
51     }
       ..
66     //Package protected for tests
67     CoerceExpression(Expression expression, PDataType toType) {
68         this(expression, toType, SortOrder.getDefault(), null, true);
69     }
{code} 

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
following PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. 
Notice that the actualType parameter is PInteger, the actualModifier parameter 
is SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So in 
line 842 we get the PInteger "1" from the ptr fetched from the HBase 
RegionServer, in line 845 we convert the PInteger "1" to the PDecimal "1", and 
last, in line 846, we encode the PDecimal "1" to bytes; because the 
expectedModifier parameter is SortOrder.ASC, the PDecimal "1" is encoded with 
SortOrder.ASC. That is to say, the SortOrder of the group-by key obtained from 
the ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.

{code:borderStyle=solid}
826    public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType actualType, Integer actualMaxLength,
827            Integer actualScale, SortOrder actualModifier, Integer desiredMaxLength, Integer desiredScale,
828            SortOrder expectedModifier) {
       ..
840        // Optimization for cases in which we already have the object around
841        if (o == null) {
842            o = actualType.toObject(ptr, actualType, actualModifier);
843        }
844

845        o = toObject(o, actualType);
846        byte[] b = toBytes(o, expectedModifier);
847        ptr.set(b);
848    }
{code}

Unfortunately, finally, in the following PhoenixResultSet.getObject method, when 
we invoke the ColumnProjector.getValue method in line 524, the ColumnProjector's 
Expression is 

[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 11:15 AM:
--

I wrote the following test case so that this problem can be reproduced under 
4.9.0, simplifying the original test case by removing the index table and 
changing the type from CHAR(15) to Integer, which is easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
        ENTITY_ID INTEGER NOT NULL,
        CONTAINER_ID INTEGER NOT NULL,
        SCORE INTEGER NOT NULL,
        CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC, CONTAINER_ID DESC, SCORE DESC)
  )
 
  UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)

  select DISTINCT entity_id, score from ( select entity_id, score from GROUPBY3453_INT limit 1)
{code} 

the expected result is: 
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL has a subquery.

When I debugged into the source code, I found that the cause of the problem is 
the distinct (or group by) statement in the outer query. In the following code 
from the GroupByCompiler.GroupBy.compile method, the "entity" column in 
GroupBy's expressions is a ProjectedColumnExpression, but in line 245 the 
"entity" column in GroupBy's keyExpressions is a CoerceExpression wrapping the 
ProjectedColumnExpression, which would convert the ProjectedColumnExpression 
from PInteger to PDecimal:
{code:borderStyle=solid} 
232    for (int i = expressions.size()-2; i >= 0; i--) {
233        Expression expression = expressions.get(i);
234        PDataType keyType = getGroupByDataType(expression);
235        if (keyType == expression.getDataType()) {
236            continue;
237        }
238        // Copy expressions only when keyExpressions will be different than expressions
239        if (keyExpressions == expressions) {
240            keyExpressions = new ArrayList<Expression>(expressions);
241        }
242        // Wrap expression in an expression that coerces the expression to the required type..
243        // This is done so that we have a way of expressing null as an empty key when more
244        // than one fixed and nullable types are used in a group by clause
245        keyExpressions.set(i, CoerceExpression.create(expression, keyType));
246    }
{code} 

When I looked into the CoerceExpression.create method, in line 68 of the 
following code I observed that the SortOrder of the CoerceExpression is 
SortOrder.ASC, but it ought to be SortOrder.DESC, because the SortOrder of the 
ProjectedColumnExpression is SortOrder.DESC:

{code:borderStyle=solid} 
46     public static Expression create(Expression expression, PDataType toType) throws SQLException {
47         if (toType == expression.getDataType()) {
48             return expression;
49         }
50         return new CoerceExpression(expression, toType);
51     }
       ..
66     //Package protected for tests
67     CoerceExpression(Expression expression, PDataType toType) {
68         this(expression, toType, SortOrder.getDefault(), null, true);
69     }
{code} 

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
following PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. 
Notice that the actualType parameter is PInteger, the actualModifier parameter 
is SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So in 
line 842 we get the PInteger "1" from the ptr fetched from the HBase 
RegionServer, in line 845 we convert the PInteger "1" to the PDecimal "1", and 
last, in line 846, we encode the PDecimal "1" to bytes; because the 
expectedModifier parameter is SortOrder.ASC, the PDecimal "1" is encoded with 
SortOrder.ASC. That is to say, the SortOrder of the group-by key obtained from 
the ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.

{code:borderStyle=solid}
826    public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType actualType, Integer actualMaxLength,
827            Integer actualScale, SortOrder actualModifier, Integer desiredMaxLength, Integer desiredScale,
828            SortOrder expectedModifier) {
       ..
840        // Optimization for cases in which we already have the object around
841        if (o == null) {
842            o = actualType.toObject(ptr, actualType, actualModifier);
843        }
844

845        o = toObject(o, actualType);
846        byte[] b = toBytes(o, expectedModifier);
847        ptr.set(b);
848    }
{code}

Unfortunately, finally, in the PhoenixResultSet.getObject method, when we invoke 
the ColumnProjector.getValue method, the ColumnProjector's Expression is a 
RowKeyColumnExpression, which thinks 

[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 11:07 AM:
--

I wrote following test case to make this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and change the 
type from CHAR(15) to Integer,  which is more easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.

When I debug into the source code, I find the cause of the problem is the 
DISTINCT (or GROUP BY) statement in the outer query. In the following code from 
the GroupByCompiler.GroupBy.compile method, the "entity_id" column in GroupBy's 
expressions is a ProjectedColumnExpression, but at line 245 the "entity_id" 
column in GroupBy's keyExpressions becomes a CoerceExpression wrapping the 
ProjectedColumnExpression, which converts it from PInteger to PDecimal:
{code:borderStyle=solid} 
232   for (int i = expressions.size()-2; i >= 0; i--) {
233Expression expression = expressions.get(i);
234PDataType keyType = getGroupByDataType(expression);
235if (keyType == expression.getDataType()) {
236continue;
237}
238// Copy expressions only when keyExpressions will be 
different than expressions
239if (keyExpressions == expressions) {
240keyExpressions = new ArrayList(expressions);
241}
242// Wrap expression in an expression that coerces the 
expression to the required type..
243// This is done so that we have a way of expressing null as 
an empty key when more
244// than one fixed and nullable types are used in a group by 
clause
245keyExpressions.set(i, CoerceExpression.create(expression, 
keyType));
246}
{code} 
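
As an aside, the reason the compiler coerces the key to PDecimal at all is 
given by the comment at lines 243-244 above: a fixed-width type such as 
PInteger always occupies its full width in the key, so null cannot be 
expressed as an empty key component, whereas a variable-length type such as 
PDecimal can contribute zero bytes. A minimal illustration of the idea in 
plain Java (illustrative only, not Phoenix source; the class name and byte 
values are made up for the example):

{code:borderStyle=solid}
// Illustrative sketch, not Phoenix source: why the group-by key is coerced
// to a variable-length type. A fixed-width INTEGER component always occupies
// 4 bytes, so no byte pattern is left over to mean NULL; a variable-length
// component can simply contribute zero bytes to the key.
public class NullKeyIllustration {
    public static void main(String[] args) {
        byte[] fixedWidthOne = {(byte) 0x80, 0x00, 0x00, 0x01}; // INTEGER 1: always 4 bytes
        byte[] variableLengthNull = new byte[0];                // NULL: empty key component
        System.out.println(fixedWidthOne.length);      // 4 -- no way to encode "absent"
        System.out.println(variableLengthNull.length); // 0 -- null as an empty key
    }
}
{code}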

When I look into the CoerceExpression.create method, at line 68 of the following 
code I observe that the SortOrder of the CoerceExpression is SortOrder.ASC, but 
it ought to be SortOrder.DESC, because the SortOrder of the 
ProjectedColumnExpression is SortOrder.DESC:

{code:borderStyle=solid} 
46 public static Expression create(Expression expression, PDataType toType) 
throws  SQLException {
47if (toType == expression.getDataType()) {
48return expression;
49}
50return new CoerceExpression(expression, toType);
51}
  ..
66   //Package protected for tests
67CoerceExpression(Expression expression, PDataType toType) {
68this(expression, toType, SortOrder.getDefault(), null, true);
69}
{code} 
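
A plausible direction for a fix (a hedged sketch only, not the actual 
committed patch; it assumes the five-argument constructor used at line 68 is 
accessible from create) is to propagate the child expression's SortOrder 
instead of SortOrder.getDefault():

{code:borderStyle=solid}
// Sketch of a possible fix, not the committed patch: pass the child
// expression's SortOrder through, so a DESC-encoded column remains DESC
// after coercion. Expression extends PDatum, which exposes getSortOrder().
public static Expression create(Expression expression, PDataType toType) throws SQLException {
    if (toType == expression.getDataType()) {
        return expression;
    }
    return new CoerceExpression(expression, toType, expression.getSortOrder(), null, true);
}
{code}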

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. Notice 
that the actualType parameter is PInteger, the actualModifier parameter is 
SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So at line 
842 we get the PInteger "1" from the ptr fetched from the RegionServer, at line 
845 we convert the PInteger "1" to the PDecimal "1", and finally at line 846 we 
encode the PDecimal "1" to bytes. Because the expectedModifier parameter is 
SortOrder.ASC, the PDecimal "1" is encoded with SortOrder.ASC.

{code:borderStyle=solid}
826 public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType 
actualType, Integer actualMaxLength,
827Integer actualScale, SortOrder actualModifier, Integer 
desiredMaxLength, Integer desiredScale,
828SortOrder expectedModifier) {
   
..

840// Optimization for cases in which we already have the object around
841if (o == null) {
842o = actualType.toObject(ptr, actualType, actualModifier);
843}
844
845o = toObject(o, actualType);
846byte[] b = toBytes(o, expectedModifier);
847ptr.set(b);
848}
{code}
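
To see concretely why such a sort-order mismatch corrupts a value, here is a 
self-contained demonstration that uses no Phoenix classes and assumes only the 
documented encoding rules (an ASC integer is big-endian with the sign bit 
flipped; DESC encoding is the one's complement of the ASC bytes):

{code:borderStyle=solid}
import java.nio.ByteBuffer;

// Self-contained demonstration (no Phoenix classes, hypothetical class name):
// decoding ASC-encoded bytes as if they were DESC-encoded yields a wrong
// value, which is exactly the class of bug described above.
public class SortOrderMismatchDemo {

    // ASC encoding of an int: big-endian with the sign bit flipped.
    static byte[] encodeAsc(int v) {
        byte[] b = ByteBuffer.allocate(4).putInt(v).array();
        b[0] ^= 0x80;
        return b;
    }

    // DESC decoding: invert every byte (one's complement), then decode as ASC.
    static int decodeDesc(byte[] b) {
        byte[] inverted = new byte[b.length];
        for (int i = 0; i < b.length; i++) {
            inverted[i] = (byte) ~b[i];
        }
        inverted[0] ^= 0x80;
        return ByteBuffer.wrap(inverted).getInt();
    }

    public static void main(String[] args) {
        byte[] ascBytes = encodeAsc(1);           // encoded with SortOrder.ASC
        System.out.println(decodeDesc(ascBytes)); // prints -2, not 1
    }
}
{code}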

Unfortunately, finally in the PhoenixResultSet.getObject method, when we invoke 
the ColumnProjector.getValue method, the columnProjector's Expression is a 
RowKeyColumnExpression, which thinks the SortOrder of the bytes returned from 
the above-mentioned ClientGroupedAggregatingResultIterator.getGroupingKey 
method is SortOrder.DESC, so it decodes those ASC-encoded bytes as DESC and 
produces the wrong value.

[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 11:03 AM:
--

I wrote the following test case so this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and changing the 
type from CHAR(15) to INTEGER, which makes it easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.

When I debug into the source code, I find the cause of the problem is the 
DISTINCT (or GROUP BY) statement in the outer query. In the following code from 
the GroupByCompiler.GroupBy.compile method, the "entity_id" column in GroupBy's 
expressions is a ProjectedColumnExpression, but at line 245 the "entity_id" 
column in GroupBy's keyExpressions becomes a CoerceExpression wrapping the 
ProjectedColumnExpression, which converts it from PInteger to PDecimal:
{code:borderStyle=solid} 
232   for (int i = expressions.size()-2; i >= 0; i--) {
233Expression expression = expressions.get(i);
234PDataType keyType = getGroupByDataType(expression);
235if (keyType == expression.getDataType()) {
236continue;
237}
238// Copy expressions only when keyExpressions will be 
different than expressions
239if (keyExpressions == expressions) {
240keyExpressions = new ArrayList(expressions);
241}
242// Wrap expression in an expression that coerces the 
expression to the required type..
243// This is done so that we have a way of expressing null as 
an empty key when more
244// than one fixed and nullable types are used in a group by 
clause
245keyExpressions.set(i, CoerceExpression.create(expression, 
keyType));
246}
{code} 

When I look into the CoerceExpression.create method, at line 68 of the following 
code I observe that the SortOrder of the CoerceExpression is SortOrder.ASC, but 
it ought to be SortOrder.DESC, because the SortOrder of the 
ProjectedColumnExpression is SortOrder.DESC:

{code:borderStyle=solid} 
46 public static Expression create(Expression expression, PDataType toType) 
throws  SQLException {
47if (toType == expression.getDataType()) {
48return expression;
49}
50return new CoerceExpression(expression, toType);
51}
  ..
66   //Package protected for tests
67CoerceExpression(Expression expression, PDataType toType) {
68this(expression, toType, SortOrder.getDefault(), null, true);
69}
{code} 

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. Notice 
that the actualType parameter is PInteger, the actualModifier parameter is 
SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So at line 
842 we get the PInteger "1" from the ptr fetched from the RegionServer, at line 
845 we convert the PInteger "1" to the PDecimal "1", and finally at line 846 we 
encode the PDecimal "1" to bytes. Because the expectedModifier parameter is 
SortOrder.ASC, the PDecimal "1" is encoded with SortOrder.ASC.

{code:borderStyle=solid}
826 public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType 
actualType, Integer actualMaxLength,
827Integer actualScale, SortOrder actualModifier, Integer 
desiredMaxLength, Integer desiredScale,
828SortOrder expectedModifier) {
   
..

840// Optimization for cases in which we already have the object around
841if (o == null) {
842o = actualType.toObject(ptr, actualType, actualModifier);
843}
844
845o = toObject(o, actualType);
846byte[] b = toBytes(o, expectedModifier);
847ptr.set(b);
848}
{code}

Unfortunately, finally in the PhoenixResultSet.getObject method, when we invoke 
the ColumnProjector.getValue method, the columnProjector's Expression is a 
RowKeyColumnExpression, which expects the SortOrder of the bytes returned from 
the above-mentioned ClientGroupedAggregatingResultIterator.getGroupingKey 
method to be SortOrder.DESC, so it decodes those ASC-encoded bytes as DESC and 
produces the wrong value.

[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 11:04 AM:
--

I wrote the following test case so this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and changing the 
type from CHAR(15) to INTEGER, which makes it easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.

When I debug into the source code, I find the cause of the problem is the 
DISTINCT (or GROUP BY) statement in the outer query. In the following code from 
the GroupByCompiler.GroupBy.compile method, the "entity_id" column in GroupBy's 
expressions is a ProjectedColumnExpression, but at line 245 the "entity_id" 
column in GroupBy's keyExpressions becomes a CoerceExpression wrapping the 
ProjectedColumnExpression, which converts it from PInteger to PDecimal:
{code:borderStyle=solid} 
232   for (int i = expressions.size()-2; i >= 0; i--) {
233Expression expression = expressions.get(i);
234PDataType keyType = getGroupByDataType(expression);
235if (keyType == expression.getDataType()) {
236continue;
237}
238// Copy expressions only when keyExpressions will be 
different than expressions
239if (keyExpressions == expressions) {
240keyExpressions = new ArrayList(expressions);
241}
242// Wrap expression in an expression that coerces the 
expression to the required type..
243// This is done so that we have a way of expressing null as 
an empty key when more
244// than one fixed and nullable types are used in a group by 
clause
245keyExpressions.set(i, CoerceExpression.create(expression, 
keyType));
246}
{code} 

When I look into the CoerceExpression.create method, at line 68 of the following 
code I observe that the SortOrder of the CoerceExpression is SortOrder.ASC, but 
it ought to be SortOrder.DESC, because the SortOrder of the 
ProjectedColumnExpression is SortOrder.DESC:

{code:borderStyle=solid} 
46 public static Expression create(Expression expression, PDataType toType) 
throws  SQLException {
47if (toType == expression.getDataType()) {
48return expression;
49}
50return new CoerceExpression(expression, toType);
51}
  ..
66   //Package protected for tests
67CoerceExpression(Expression expression, PDataType toType) {
68this(expression, toType, SortOrder.getDefault(), null, true);
69}
{code} 

So when we get the query results, in the 
ClientGroupedAggregatingResultIterator.getGroupingKey method we invoke the 
PDecimal.coerceBytes method to get the coerced bytes of the PDecimal. Notice 
that the actualType parameter is PInteger, the actualModifier parameter is 
SortOrder.DESC, and the expectedModifier parameter is SortOrder.ASC. So at line 
842 we get the PInteger "1" from the ptr fetched from the RegionServer, at line 
845 we convert the PInteger "1" to the PDecimal "1", and finally at line 846 we 
encode the PDecimal "1" to bytes. Because the expectedModifier parameter is 
SortOrder.ASC, the PDecimal "1" is encoded with SortOrder.ASC.

{code:borderStyle=solid}
826 public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType 
actualType, Integer actualMaxLength,
827Integer actualScale, SortOrder actualModifier, Integer 
desiredMaxLength, Integer desiredScale,
828SortOrder expectedModifier) {
   
..

840// Optimization for cases in which we already have the object around
841if (o == null) {
842o = actualType.toObject(ptr, actualType, actualModifier);
843}
844
845o = toObject(o, actualType);
846byte[] b = toBytes(o, expectedModifier);
847ptr.set(b);
848}
{code}

Unfortunately, finally in the PhoenixResultSet.getObject method, when we invoke 
the ColumnProjector.getValue method, the columnProjector's Expression is a 
RowKeyColumnExpression, which expects the SortOrder of the bytes returned from 
the above-mentioned ClientGroupedAggregatingResultIterator.getGroupingKey 
method to be SortOrder.DESC, so it decodes those ASC-encoded bytes as DESC and 
produces the wrong value.

[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 9:09 AM:
-

I wrote the following test case so this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and changing the 
type from CHAR(15) to INTEGER, which makes it easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.

When I debugged into the source code, I found the cause of the problem is the 
DISTINCT (or GROUP BY) statement in the outer query. In the following code from 
the GroupByCompiler.GroupBy.compile method, the "entity_id" column in GroupBy's 
expressions is a ProjectedColumnExpression, but at line 245 the "entity_id" 
column in GroupBy's keyExpressions is replaced by a CoerceExpression, which 
converts the "entity_id" column from PInteger to PDecimal:
{code:borderStyle=solid} 
232   for (int i = expressions.size()-2; i >= 0; i--) {
233Expression expression = expressions.get(i);
234PDataType keyType = getGroupByDataType(expression);
235if (keyType == expression.getDataType()) {
236continue;
237}
238// Copy expressions only when keyExpressions will be 
different than expressions
239if (keyExpressions == expressions) {
240keyExpressions = new ArrayList(expressions);
241}
242// Wrap expression in an expression that coerces the 
expression to the required type..
243// This is done so that we have a way of expressing null as 
an empty key when more
244// than one fixed and nullable types are used in a group by 
clause
245keyExpressions.set(i, CoerceExpression.create(expression, 
keyType));
246}
{code} 


was (Author: comnetwork):
I wrote the following test case so this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and changing the 
type from CHAR(15) to INTEGER, which makes it easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.


> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> 

[jira] [Commented] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei commented on PHOENIX-3453:
---

I wrote the following test case so this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and changing the 
type from CHAR(15) to INTEGER, which makes it easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.


> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2016-12-30 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787222#comment-15787222
 ] 

chenglei edited comment on PHOENIX-3453 at 12/30/16 8:40 AM:
-

I wrote the following test case so this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and changing the 
type from CHAR(15) to INTEGER, which makes it easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.



was (Author: comnetwork):
I wrote the following test case so this problem can be reproduced under 4.9.0, 
simplifying the original test case by removing the index table and changing the 
type from CHAR(15) to INTEGER, which makes it easier to debug:

{code:borderStyle=solid} 
  CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID 
DESC,SCORE DESC)
  )
 
  UPSERT INTO  GROUPBY3453_INT  VALUES (1,1,1)
  select DISTINCT entity_id, score from ( select entity_id, score from 
GROUPBY3453_INT limit 1)
{code} 

the expected result is:
{code:borderStyle=solid} 
   1  1
{code} 

but the actual result is:
{code:borderStyle=solid}
  -104  1
{code} 

This problem can only be reproduced when the SQL contains a subquery.


> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678453#comment-15678453
 ] 

chenglei edited comment on PHOENIX-3491 at 11/19/16 3:11 AM:
-

[~jamestaylor], because of the time difference with you, sorry for the late 
response. Yes, I ran all the existing unit tests and IT tests on my local 
machine; it seems https://builds.apache.org/job/PreCommit-PHOENIX-Build/ is 
hanging.


was (Author: comnetwork):
[~jamestaylor], because of the time difference with you, sorry for the late 
response. Yes, I ran all the existing unit tests and IT tests on my local 
machine; it seems https://builds.apache.org/job/PreCommit-PHOENIX-Build/ is 
hanging.

> OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy 
> is reverse
> 
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch, PHOENIX-3491_v2.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> --+
> | PLAN |
> +--+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST|
> | SERVER FILTER BY FIRST KEY ONLY  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT|
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]  |
> +--+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not  compiled out,but obviously it should be compiled out 
>  as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678453#comment-15678453
 ] 

chenglei commented on PHOENIX-3491:
---

[~jamestaylor], because of the time difference with you, sorry for the late 
response. Yes, I ran all the existing unit tests and IT tests on my local 
machine; it seems https://builds.apache.org/job/PreCommit-PHOENIX-Build/ is 
hanging.

> OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy 
> is reverse
> 
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch, PHOENIX-3491_v2.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> --+
> | PLAN |
> +--+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST|
> | SERVER FILTER BY FIRST KEY ONLY  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT|
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]  |
> +--+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not  compiled out,but obviously it should be compiled out 
>  as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508
 ] 

chenglei edited comment on PHOENIX-3491 at 11/18/16 1:17 PM:
-

The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
142   }
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated keys 
ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
Expression expression = RowKeyExpression.INSTANCE;
boolean isNullsLast=false;
boolean isAscending=true;
if(this.orderBy==OrderBy.REV_ROW_KEY_ORDER_BY) {
isNullsLast=true; //which is needed for the whole rowKey.
isAscending=false;
}
OrderByExpression orderByExpression = new 
OrderByExpression(expression, isNullsLast, isAscending);
int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
}
{code} 

After modifying AggregatePlan, it seems we can remove that code from 
OrderPreservingTracker. I include many unit tests and IT tests in my patch to 
check it, covering ASC/DESC, salted tables, multi-region tables, and 
NULLS FIRST/LAST.
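
As a quick end-to-end sanity check of the reverse-ordered aggregate, one can 
run the query from the issue description through plain JDBC and verify the 
rows come back in descending key order (a hedged sketch: the connection URL is 
a placeholder and the ORDERBY_TEST table is assumed to be populated):

{code:borderStyle=solid}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged JDBC sketch (placeholder URL, hypothetical class name): checks that
// a reverse ORDER BY over a non-order-preserving GROUP BY returns rows in
// descending order of the leading ORDER BY column.
public class ReverseGroupByCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST " +
                 "GROUP BY ORGANIZATION_ID, SCORE " +
                 "ORDER BY ORGANIZATION_ID DESC, SCORE DESC")) {
            int prevOrg = Integer.MAX_VALUE;
            while (rs.next()) {
                int org = rs.getInt(1);
                // Rows must be non-increasing on the leading ORDER BY column.
                if (org > prevOrg) {
                    throw new AssertionError("rows are not in DESC order");
                }
                prevOrg = org;
            }
        }
    }
}
{code}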

 




was (Author: comnetwork):
The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new 

[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508
 ] 

chenglei edited comment on PHOENIX-3491 at 11/18/16 1:16 PM:
-

The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
142   }
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated keys 
ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
Expression expression = RowKeyExpression.INSTANCE;
boolean isNullsLast=false;
boolean isAscending=true;
if(this.orderBy==OrderBy.REV_ROW_KEY_ORDER_BY) {
isNullsLast=true; //which is needed for the whole rowKey.
isAscending=false;
}
OrderByExpression orderByExpression = new 
OrderByExpression(expression, isNullsLast, isAscending);
int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
}
{code} 

After modifying AggregatePlan, it seems we can remove that code from 
OrderPreservingTracker. I include many unit tests and IT tests in my patch to 
check it, covering ASC/DESC, salted tables, multi-region tables, and 
NULLS FIRST/LAST.

 




was (Author: comnetwork):
The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new 

[jira] [Comment Edited] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508
 ] 

chenglei edited comment on PHOENIX-3491 at 11/18/16 1:15 PM:
-

The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
142   }
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated keys 
ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
Expression expression = RowKeyExpression.INSTANCE;
boolean isNullsLast=false;
boolean isAscending=true;
if(this.orderBy==OrderBy.REV_ROW_KEY_ORDER_BY) {
isNullsLast=true; //which is needed for the whole rowKey.
isAscending=false;
}
OrderByExpression orderByExpression = new 
OrderByExpression(expression, isNullsLast, isAscending);
int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
}
{code} 

After modifying AggregatePlan, it seems we can remove that code from 
OrderPreservingTracker. I include many unit tests and IT tests in my patch to 
check it, covering ASC/DESC, salted tables, multi-region tables, and 
NULLS FIRST/LAST.

 




was (Author: comnetwork):
The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new 

[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Summary: OrderBy should be compiled out if GroupBy is not orderPreserving 
and OrderBy is reverse  (was: OrderBy should be compiled out when GroupBy is 
not orderPreserving but OrderBy is reverse)

> OrderBy should be compiled out if GroupBy is not orderPreserving and OrderBy 
> is reverse
> ---
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> --+
> | PLAN |
> +--+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST|
> | SERVER FILTER BY FIRST KEY ONLY  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT|
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]  |
> +--+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not  compiled out,but obviously it should be compiled out 
>  as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3491) OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Summary: OrderBy can not be compiled out if GroupBy is not orderPreserving 
and OrderBy is reverse  (was: OrderBy should be compiled out if GroupBy is not 
orderPreserving and OrderBy is reverse)

> OrderBy can not be compiled out if GroupBy is not orderPreserving and OrderBy 
> is reverse
> 
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> --+
> | PLAN |
> +--+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST|
> | SERVER FILTER BY FIRST KEY ONLY  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT|
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]  |
> +--+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not  compiled out,but obviously it should be compiled out 
>  as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508
 ] 

chenglei edited comment on PHOENIX-3491 at 11/18/16 12:07 PM:
--

The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
142   }
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated keys 
ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
Expression expression = RowKeyExpression.INSTANCE;
boolean isNullsLast=false;
boolean isAscending=true;
if(this.orderBy==OrderBy.REV_ROW_KEY_ORDER_BY) {
isNullsLast=true; //which is needed for the whole rowKey.
isAscending=false;
}
OrderByExpression orderByExpression = new 
OrderByExpression(expression, isNullsLast, isAscending);
int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
return new OrderedResultIterator(scanner, 
Collections.singletonList(orderByExpression), threshold);
}
{code} 

After modifying AggregatePlan, it seems we can remove that code from 
OrderPreservingTracker. I include many test cases in my patch to prove it, 
covering ASC/DESC, salted tables, and NULLS FIRST/LAST.

 




was (Author: comnetwork):
The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
123/*
124 * When a GROUP BY is not order preserving, we 
cannot do a reverse
125 * scan to eliminate the ORDER BY since our 
server-side scan is not
126 * ordered in that case.
127 */
128if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
129isOrderPreserving = false;
130isReverse = false;
131return;
132}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServer, but AggregatePlan always uses 
ASC order to sort the aggregated keys, as the following code shows. So if we 
just remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
137 public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) throws 
SQLException {
138Expression expression = RowKeyExpression.INSTANCE;
139OrderByExpression orderByExpression = new 
OrderByExpression(expression, false, true);
140int threshold = 
services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
141return new OrderedResultIterator(scanner, 

[jira] [Commented] (PHOENIX-3451) Incorrect determination of preservation of order for an aggregate query leads to incorrect query results

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676576#comment-15676576
 ] 

chenglei commented on PHOENIX-3451:
---

[~jamestaylor], I filed JIRA PHOENIX-3491 to remove the following code you 
mentioned, and uploaded my patch; please help me review it, thank you.
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot 
do a reverse
- * scan to eliminate the ORDER BY since our 
server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && 
!groupBy.isOrderPreserving()) {
-isOrderPreserving = false;
-isReverse = false;
-return;
-}
{code} 

> Incorrect determination of preservation of order for an aggregate query leads 
> to incorrect query results
> 
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Fix For: 4.9.0, 4.8.2
>
> Attachments: PHOENIX-3451_v1.patch, PHOENIX-3451_v2.patch
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5   1.2
> entityId3   1.4
> The expected output would be
> entityId8   1.45
> entityId3   1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the limit clause, which makes me think there 
> are multiple issues here, and that is why I filed both separately. The rows 
> being returned are the ones assigned to container1. It looks like Phoenix is 
> first getting the rows from the first container and, when it finds that to be 
> enough, it stops the scan. What it should be doing is getting 2 results for 
> each container, then merging them, and then limiting again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Attachment: (was: PHOENIX-3491_v1.patch)

> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> --+
> | PLAN |
> +--+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST|
> | SERVER FILTER BY FIRST KEY ONLY  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT|
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]  |
> +--+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not  compiled out,but obviously it should be compiled out 
>  as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Attachment: PHOENIX-3491_v1.patch

> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
> out as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Attachment: PHOENIX-3491_v1.patch

> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
> Attachments: PHOENIX-3491_v1.patch
>
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
> out as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508
 ] 

chenglei edited comment on PHOENIX-3491 at 11/18/16 11:54 AM:
--

The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
/*
 * When a GROUP BY is not order preserving, we cannot do a reverse
 * scan to eliminate the ORDER BY since our server-side scan is not
 * ordered in that case.
 */
if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
    isOrderPreserving = false;
    isReverse = false;
    return;
}
{code} 

The above code and its comment seem very strange. In fact, if GroupBy is not 
orderPreserving, AggregatePlan sorts the aggregated keys on the client side 
after getting the results from the RegionServers, but it always sorts the 
aggregated keys in ASC order, as the following code shows. So if we just 
remove the above code, the query result will be wrong when OrderBy is 
OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    OrderByExpression orderByExpression = new OrderByExpression(expression, false, true);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated 
keys either ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    boolean isNullsLast = false;
    boolean isAscending = true;
    if (this.orderBy == OrderBy.REV_ROW_KEY_ORDER_BY) {
        isNullsLast = true; // which is needed for the whole rowKey.
        isAscending = false;
    }
    OrderByExpression orderByExpression =
            new OrderByExpression(expression, isNullsLast, isAscending);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 

After modifying AggregatePlan, it seems we can remove that code from 
OrderPreservingTracker. I included a lot of test cases to prove it, covering 
ASC/DESC, salted tables and NULLS FIRST/LAST; one such check is sketched 
below.
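As an illustration, here is a minimal sketch of one such check, assuming a
standard Phoenix JDBC connection (the URL and the pre-created ORDERBY_TEST
table are assumptions; this is not the actual test from the patch):

{code:borderStyle=solid}
import java.sql.*;

// Hypothetical check: with the AggregatePlan change, the reverse-scan
// GROUP BY must return rows in DESC row-key order even though the
// GroupBy itself is not order preserving.
public class ReverseGroupByOrderCheck {
    public static void main(String[] args) throws SQLException {
        try (Connection conn =
                DriverManager.getConnection("jdbc:phoenix:localhost")) {
            ResultSet rs = conn.createStatement().executeQuery(
                    "SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST "
                    + "GROUP BY ORGANIZATION_ID, SCORE "
                    + "ORDER BY ORGANIZATION_ID DESC, SCORE DESC");
            int previous = Integer.MAX_VALUE;
            while (rs.next()) {
                int org = rs.getInt(1);
                // Rows must arrive in non-increasing ORGANIZATION_ID order.
                if (org > previous) {
                    throw new AssertionError(
                            "out of order: " + org + " after " + previous);
                }
                previous = org;
            }
        }
    }
}
{code}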

 




was (Author: comnetwork):
The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
/*
 * When a GROUP BY is not order preserving, we cannot do a reverse
 * scan to eliminate the ORDER BY since our server-side scan is not
 * ordered in that case.
 */
if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
    isOrderPreserving = false;
    isReverse = false;
    return;
}
{code} 

In fact, if GroupBy is not orderPreserving, AggregatePlan sorts the aggregated 
keys on the client side after getting the results from the RegionServers, but 
it always sorts the aggregated keys in ASC order, as the following code shows. 
So if we just remove the above code, the query result will be wrong when the 
OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    OrderByExpression orderByExpression = new OrderByExpression(expression, false, true);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 


[jira] [Comment Edited] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508
 ] 

chenglei edited comment on PHOENIX-3491 at 11/18/16 11:24 AM:
--

The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
/*
 * When a GROUP BY is not order preserving, we cannot do a reverse
 * scan to eliminate the ORDER BY since our server-side scan is not
 * ordered in that case.
 */
if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
    isOrderPreserving = false;
    isReverse = false;
    return;
}
{code} 

In fact, if GroupBy is not orderPreserving, AggregatePlan sorts the aggregated 
keys on the client side after getting the results from the RegionServers, but 
it always sorts the aggregated keys in ASC order, as the following code shows. 
So if we just remove the above code, the query result will be wrong when the 
OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    OrderByExpression orderByExpression = new OrderByExpression(expression, false, true);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated 
keys either ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    boolean isNullsLast = false;
    boolean isAscending = true;
    if (this.orderBy == OrderBy.REV_ROW_KEY_ORDER_BY) {
        isNullsLast = true; // which is needed for the whole rowKey.
        isAscending = false;
    }
    OrderByExpression orderByExpression =
            new OrderByExpression(expression, isNullsLast, isAscending);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 






was (Author: comnetwork):
The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
/*
 * When a GROUP BY is not order preserving, we cannot do a reverse
 * scan to eliminate the ORDER BY since our server-side scan is not
 * ordered in that case.
 */
if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
    isOrderPreserving = false;
    isReverse = false;
    return;
}
{code} 

In fact, if GroupBy is not orderPreserving, AggregatePlan sorts the aggregated 
keys on the client side after getting the results from the RegionServers, but 
it always sorts the aggregated keys in ASC order, as the following code shows. 
So if we just remove the above code, the query result will be wrong when the 
OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    OrderByExpression orderByExpression = new OrderByExpression(expression, false, true);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated 
keys either ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, 
ResultIterator scanner, Scan scan, String tableName, QueryPlan plan) 

[jira] [Commented] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676508#comment-15676508
 ] 

chenglei commented on PHOENIX-3491:
---

The OrderBy could not be compiled out because of the following code in 
OrderPreservingTracker:

{code:borderStyle=solid} 
/*
 * When a GROUP BY is not order preserving, we cannot do a reverse
 * scan to eliminate the ORDER BY since our server-side scan is not
 * ordered in that case.
 */
if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
    isOrderPreserving = false;
    isReverse = false;
    return;
}
{code} 

In fact, if GroupBy is not orderPreserving, AggregatePlan sorts the aggregated 
keys on the client side after getting the results from the RegionServers, but 
it always sorts the aggregated keys in ASC order, as the following code shows. 
So if we just remove the above code, the query result will be wrong when the 
OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY:

{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    OrderByExpression orderByExpression = new OrderByExpression(expression, false, true);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 

Therefore, we can modify AggregatePlan so that it can sort the aggregated 
keys either ASC or DESC:
{code:borderStyle=solid} 
public PeekingResultIterator newIterator(StatementContext context, ResultIterator scanner,
        Scan scan, String tableName, QueryPlan plan) throws SQLException {
    Expression expression = RowKeyExpression.INSTANCE;
    boolean isNullsLast = false;
    boolean isAscending = true;
    if (this.orderBy == OrderBy.REV_ROW_KEY_ORDER_BY) {
        isNullsLast = true; // which is needed for the whole rowKey.
        isAscending = false;
    }
    OrderByExpression orderByExpression =
            new OrderByExpression(expression, isNullsLast, isAscending);
    int threshold = services.getProps().getInt(QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB,
            QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
    return new OrderedResultIterator(scanner,
            Collections.singletonList(orderByExpression), threshold);
}
{code} 





> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> 

[jira] [Assigned] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-17 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei reassigned PHOENIX-3491:
-

Assignee: chenglei

> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>Assignee: chenglei
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
> out as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-16 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Description: 
for the following table:
{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
));
 {code}   
  
If we execute explain on the following sql: 
{code:borderStyle=solid}   
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
{code}

the result is :
{code:borderStyle=solid} 
+------------------------------------------------------------------+
| PLAN                                                             |
+------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER ORDERBY_TEST        |
| SERVER FILTER BY FIRST KEY ONLY                                  |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
| CLIENT MERGE SORT                                                |
| CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
+------------------------------------------------------------------+
{code} 

from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
out as OrderBy.REV_ROW_KEY_ORDER_BY.

  was:
for the following table:
{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
));
 {code}   
  
If we execute explain on the following sql: 
{code:borderStyle=solid}   
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
{code}

the result is :
{code:borderStyle=solid} 
+------------------------------------------------------------------+
| PLAN                                                             |
+------------------------------------------------------------------+
| CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST        |
| SERVER FILTER BY FIRST KEY ONLY                                  |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
| CLIENT MERGE SORT                                                |
| CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
+------------------------------------------------------------------+
{code} 

from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
out as OrderBy.REV_ROW_KEY_ORDER_BY.


> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> --+
> | PLAN |
> 

[jira] [Commented] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672486#comment-15672486
 ] 

chenglei commented on PHOENIX-3491:
---

This came out of PHOENIX-3451 but is irrelevant to it, so I opened a new JIRA

> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
> out as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672486#comment-15672486
 ] 

chenglei edited comment on PHOENIX-3491 at 11/17/16 2:51 AM:
-

This came out of PHOENIX-3451 but is irrelevant to it, so I opened a new JIRA.


was (Author: comnetwork):
This came out of PHOENIX-3451 but is irrelevant to it, so I opened a new JIRA

> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
> out as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-16 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Summary: OrderBy should be compiled out when GroupBy is not orderPreserving 
but OrderBy is reverse  (was: OrderBy should be compiled out when GroupBy is 
not OrderPreserving but OrderBy is reverse)

> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> +------------------------------------------------------------------+
> | PLAN                                                             |
> +------------------------------------------------------------------+
> | CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST        |
> | SERVER FILTER BY FIRST KEY ONLY                                  |
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
> | CLIENT MERGE SORT                                                |
> | CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
> +------------------------------------------------------------------+
> {code} 
> from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
> DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
> out as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not orderPreserving but OrderBy is reverse

2016-11-16 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3491:
--
Description: 
for the following table:
{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
));
 {code}   
  
If we execute explain on the following sql: 
{code:borderStyle=solid}   
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
{code}

the result is :
{code:borderStyle=solid} 
+------------------------------------------------------------------+
| PLAN                                                             |
+------------------------------------------------------------------+
| CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST        |
| SERVER FILTER BY FIRST KEY ONLY                                  |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
| CLIENT MERGE SORT                                                |
| CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
+------------------------------------------------------------------+
{code} 

from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
out as OrderBy.REV_ROW_KEY_ORDER_BY.

  was:
for the following table:
{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
));
 {code}   
  
If we execute explain on the following sql: 
{code:borderStyle=solid}   
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
{code}

the result is :
{code:borderStyle=solid} 
+------------------------------------------------------------------+
| PLAN                                                             |
+------------------------------------------------------------------+
| CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST        |
| SERVER FILTER BY FIRST KEY ONLY                                  |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
| CLIENT MERGE SORT                                                |
| CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
+------------------------------------------------------------------+
{code} 

from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
DESC, SCORE DESC is not  compiled out,but obvious it should be compiled out  as 
OrderBy.REV_ROW_KEY_ORDER_BY.


> OrderBy should be compiled out when GroupBy is not orderPreserving but 
> OrderBy is reverse
> -
>
> Key: PHOENIX-3491
> URL: https://issues.apache.org/jira/browse/PHOENIX-3491
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: chenglei
>
> for the following table:
> {code:borderStyle=solid}
> CREATE TABLE ORDERBY_TEST ( 
> ORGANIZATION_ID INTEGER NOT NULL,
> CONTAINER_ID INTEGER NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID INTEGER NOT NULL, 
>CONSTRAINT TEST_PK PRIMARY KEY ( 
> ORGANIZATION_ID,
> CONTAINER_ID,
> SCORE,
> ENTITY_ID
> ));
>  {code}   
>   
> If we execute explain on the following sql: 
> {code:borderStyle=solid}   
> SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
> SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
> {code}
> the result is :
> {code:borderStyle=solid} 
> --+
> | PLAN |
> 

[jira] [Created] (PHOENIX-3491) OrderBy should be compiled out when GroupBy is not OrderPreserving but OrderBy is reverse

2016-11-16 Thread chenglei (JIRA)
chenglei created PHOENIX-3491:
-

 Summary: OrderBy should be compiled out when GroupBy is not 
OrderPreserving but OrderBy is reverse
 Key: PHOENIX-3491
 URL: https://issues.apache.org/jira/browse/PHOENIX-3491
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.8.0
Reporter: chenglei


for the following table:
{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
));
 {code}   
  
If we execute explain on the following sql: 
{code:borderStyle=solid}   
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC 
{code}

the result is :
{code:borderStyle=solid} 
+------------------------------------------------------------------+
| PLAN                                                             |
+------------------------------------------------------------------+
| CLIENT 2-CHUNK PARALLEL 2-WAY FULL SCAN OVER ORDERBY_TEST        |
| SERVER FILTER BY FIRST KEY ONLY                                  |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [ORGANIZATION_ID, SCORE]  |
| CLIENT MERGE SORT                                                |
| CLIENT SORTED BY [ORGANIZATION_ID DESC, SCORE DESC]              |
+------------------------------------------------------------------+
{code} 

from the above explain result, we can see that the ORDER BY ORGANIZATION_ID 
DESC, SCORE DESC is not compiled out, but obviously it should be compiled 
out as OrderBy.REV_ROW_KEY_ORDER_BY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670770#comment-15670770
 ] 

chenglei commented on PHOENIX-3451:
---

Thank you for the feedback, [~jamestaylor]. Yes, if those lines are added 
back it seems OK. I will test a bit more, and I will open a new JIRA for the 
optimization problem.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451_v1.patch
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the LIMIT clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix first 
> gets the rows from the first container and stops the scan once it finds 
> that to be enough. What it should be doing is getting 2 results for each 
> container, merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 2:49 PM:
-

[~jamestaylor], I ran into a problem when testing your patch: why did you 
remove the following lines? Or is it maybe because of another of your 
almost-ready JIRAs?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);

SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
   
{code}

expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. When the above code is removed, 
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, 
and because the GroupBy's "isOrderPreserving" is false, AggregatePlan still 
sorts the aggregated key [ORGANIZATION_ID, SCORE] on the client side after 
getting the results from the RegionServers, even though the Scan is reverse. 
That client-side sort is always ASC, so the sorted per-region results are 
[1,1  2,2  3,3] and [4,4  5,5  6,6]; after executing the following code the 
merged result is [4,4  5,5  6,6  1,1  2,2  3,3], and because the OrderBy was 
compiled out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

Indeed, the code you removed in your patch should be removed, but if 
AggregatePlan is not modified, removing it alone may cause problems. It is 
actually an optimization and is irrelevant to PHOENIX-3451; maybe I can open 
a new JIRA if one does not already exist. A small sketch of the wrong merge 
is shown below.
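A minimal sketch of the wrong merge, in plain Java with hypothetical names
(not the actual Phoenix iterators): each client-side chunk is sorted ASC, so
a DESC merge over the chunk heads just drains the higher chunk first instead
of producing a globally DESC order.

{code:borderStyle=solid}
import java.util.*;

public class ReverseMergeSketch {
    // Merge chunks by always taking the largest head, as a reverse (DESC)
    // merge-sort over per-chunk iterators would.
    static List<Integer> mergeDesc(List<Deque<Integer>> chunks) {
        List<Integer> out = new ArrayList<>();
        while (true) {
            Deque<Integer> best = null;
            for (Deque<Integer> c : chunks) {
                if (!c.isEmpty()
                        && (best == null || c.peekFirst() > best.peekFirst())) {
                    best = c;
                }
            }
            if (best == null) return out;
            out.add(best.pollFirst());
        }
    }

    public static void main(String[] args) {
        // Each region's aggregated keys, sorted ASC on the client side.
        Deque<Integer> region1 = new ArrayDeque<>(Arrays.asList(1, 2, 3));
        Deque<Integer> region2 = new ArrayDeque<>(Arrays.asList(4, 5, 6));

        // Heads are 1 and 4, so region2 is drained first, then region1:
        // prints [4, 5, 6, 1, 2, 3] instead of the expected [6, 5, 4, 3, 2, 1].
        System.out.println(mergeDesc(Arrays.asList(region1, region2)));

        // If each chunk had been sorted DESC first (the AggregatePlan fix
        // proposed above), the same merge would yield the correct order.
    }
}
{code}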



was (Author: comnetwork):
[~jamestaylor], I ran into a problem when I tested your patch: why did you 
remove the following lines? Or is it maybe because of another of your 
almost-ready JIRAs?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 2:49 PM:
-

[~jamestaylor], I ran into a problem when testing your patch: why did you 
remove the following lines? Or is it maybe because of another of your 
almost-ready JIRAs?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);

SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
   
{code}

expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. When the above code is removed, 
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, 
and because the GroupBy's "isOrderPreserving" is false, AggregatePlan still 
sorts the aggregated key [ORGANIZATION_ID, SCORE] on the client side after 
getting the results from the RegionServers, even though the Scan is reverse. 
That client-side sort is always ASC, so the sorted per-region results are 
[1,1  2,2  3,3] and [4,4  5,5  6,6]; after executing the following code the 
merged result is [4,4  5,5  6,6  1,1  2,2  3,3], and because the OrderBy was 
compiled out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

Indeed, the code you removed in your patch should be removed, but if 
AggregatePlan is not modified, removing it alone may cause problems. It is 
actually an optimization and is irrelevant to PHOENIX-3451; maybe I can open 
a new JIRA if one does not already exist.



was (Author: comnetwork):
[~jamestaylor], I ran into a problem when testing your patch: why did you 
remove the following lines? Or is it maybe because of another of your 
almost-ready JIRAs?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 2:48 PM:
-

[~jamestaylor], I ran into a problem when I tested your patch: why did you 
remove the following lines? Or is it maybe because of another of your 
almost-ready JIRAs?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);

SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY ORGANIZATION_ID, 
SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
   
{code}

expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. When the above code is removed, 
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, 
and because the GroupBy's "isOrderPreserving" is false, AggregatePlan still 
sorts the aggregated key [ORGANIZATION_ID, SCORE] on the client side after 
getting the results from the RegionServers, even though the Scan is reverse. 
That client-side sort is always ASC, so the sorted per-region results are 
[1,1  2,2  3,3] and [4,4  5,5  6,6]; after executing the following code the 
merged result is [4,4  5,5  6,6  1,1  2,2  3,3], and because the OrderBy was 
compiled out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

Indeed, the code you removed in your patch should be removed, but if 
AggregatePlan is not modified, removing it alone may cause problems. It is 
actually an optimization and is irrelevant to PHOENIX-3451; maybe I can open 
a new JIRA if one does not already exist.



was (Author: comnetwork):
[~jamestaylor], I ran into a problem when I tested your patch: why did you 
remove the following lines? Or is it maybe because of another of your 
almost-ready JIRAs?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 2:46 PM:
-

[~jamestaylor], I ran into a problem when I tested your patch: why did you 
remove the following lines? Or is it maybe because of another of your 
almost-ready JIRAs?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);

   SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  GROUP BY 
ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
   
{code}

expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. When the above code is removed, 
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, 
and because the GroupBy's "isOrderPreserving" is false, AggregatePlan still 
sorts the aggregated key [ORGANIZATION_ID, SCORE] on the client side after 
getting the results from the RegionServers, even though the Scan is reverse. 
That client-side sort is always ASC, so the sorted per-region results are 
[1,1  2,2  3,3] and [4,4  5,5  6,6]; after executing the following code the 
merged result is [4,4  5,5  6,6  1,1  2,2  3,3], and because the OrderBy was 
compiled out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

Indeed, the code you removed in your patch should be removed, but if 
AggregatePlan is not modified, removing it alone may cause problems. It is 
actually an optimization and is irrelevant to PHOENIX-3451; maybe I can open 
a new JIRA if one does not already exist.



was (Author: comnetwork):
[~jamestaylor], I have a problem with your patch:
Why did you remove the following lines? Or maybe you want to fix another 
almost-ready JIRA?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 2:38 PM:
-

[~jamestaylor], I have a problem with your patch:
Why did you remove the following lines? Or maybe you want to fix another 
almost-ready JIRA?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  group by 
ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
   
{code}

expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. When the above code is removed, 
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, 
and because the GroupBy's "isOrderPreserving" is false, AggregatePlan still 
sorts the aggregated key [ORGANIZATION_ID, SCORE] on the client side after 
getting the results from the RegionServers, even though the Scan is reverse. 
That client-side sort is always ASC, so the sorted per-region results are 
[1,1  2,2  3,3] and [4,4  5,5  6,6]; after executing the following code the 
merged result is [4,4  5,5  6,6  1,1  2,2  3,3], and because the OrderBy was 
compiled out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

So indeed the above code should be removed, but if AggregatePlan is not 
modified, removing it alone may cause problems. This is actually an 
optimization and is irrelevant to PHOENIX-3451; maybe I can open a new JIRA 
if one does not already exist.



was (Author: comnetwork):
[~jamestaylor], I have a problem with your patch:
Why did you remove the following lines? Or maybe you want to fix another 
almost-ready JIRA?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 11:44 AM:
--

[~jamestaylor], I have a problem with your patch:
Why did you remove the following lines? Or maybe you want to fix another 
almost-ready JIRA?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST ( 
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL, 
   CONSTRAINT TEST_PK PRIMARY KEY ( 
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
 
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST  group by 
ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
   
{code}

expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. When the above code is removed, 
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, 
and because the GroupBy's "isOrderPreserving" is false, AggregatePlan still 
sorts the aggregated key [ORGANIZATION_ID, SCORE] on the client side after 
getting the results from the RegionServers, even though the Scan is reverse. 
That client-side sort is always ASC, so the sorted per-region results are 
[1,1  2,2  3,3] and [4,4  5,5  6,6]; after executing the following code the 
merged result is [4,4  5,5  6,6  1,1  2,2  3,3], and because the OrderBy was 
compiled out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

So if AggregatePlan is not modified, just removing the above code may cause 
problems. Maybe I can open a new JIRA to fix this problem if one does not 
already exist, because it is irrelevant to PHOENIX-3451.



was (Author: comnetwork):
[~jamestaylor], I have a problem with your patch:
Why did you remove the following lines? Or maybe you want to fix another 
almost-ready JIRA?
{code:borderStyle=solid} 
-/*
- * When a GROUP BY is not order preserving, we cannot do a reverse
- * scan to eliminate the ORDER BY since our server-side scan is not
- * ordered in that case.
- */
-if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-    isOrderPreserving = false;
-    isReverse = false;
-    return;
-}
{code} 

It seems that on the current master branch, removing these lines may cause a 
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST (
    ORGANIZATION_ID INTEGER NOT NULL,
    CONTAINER_ID INTEGER NOT NULL,
    SCORE INTEGER NOT NULL,
    ENTITY_ID INTEGER NOT NULL,
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        CONTAINER_ID,
        SCORE,
        ENTITY_ID
    )) SPLIT ON (4);

UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 11:41 AM:
--

[~jamestaylor], I have a question about your patch:
Why did you remove the following lines? Or perhaps you intend to fix another
nearly-ready JIRA?
{code:borderStyle=solid}
-        /*
-         * When a GROUP BY is not order preserving, we cannot do a reverse
-         * scan to eliminate the ORDER BY since our server-side scan is not
-         * ordered in that case.
-         */
-        if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-            isOrderPreserving = false;
-            isReverse = false;
-            return;
-        }
{code}

It seems that on the current master branch, removing these lines may cause a
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST (
    ORGANIZATION_ID INTEGER NOT NULL,
    CONTAINER_ID INTEGER NOT NULL,
    SCORE INTEGER NOT NULL,
    ENTITY_ID INTEGER NOT NULL,
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        CONTAINER_ID,
        SCORE,
        ENTITY_ID
    )) SPLIT ON (4);

UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);

SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST
GROUP BY ORGANIZATION_ID, SCORE
ORDER BY ORGANIZATION_ID DESC, SCORE DESC;
{code}

The expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. Once the above code is removed,
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY,
but because the GroupBy's "isOrderPreserving" is false, the AggregatePlan
still sorts the aggregated key [ORGANIZATION_ID, SCORE] in ASC order on the
client side after getting results from the RegionServers, even though the
Scan is reversed. The sorted per-region results are [1,1  2,2  3,3] and
[4,4  5,5  6,6]; after executing the following code, the merged result is
[4,4  5,5  6,6  1,1  2,2  3,3], and because the ORDER BY has been compiled
out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

So if the AggregatePlan is not modified, simply removing the above code may
cause problems. Maybe I can open a new JIRA to fix this problem if one does
not already exist.



was (Author: comnetwork):
[~jamestaylor], I have a question about your patch:
Why did you remove the following lines? Or perhaps you intend to fix another
nearly-ready JIRA?
{code:borderStyle=solid}
-        /*
-         * When a GROUP BY is not order preserving, we cannot do a reverse
-         * scan to eliminate the ORDER BY since our server-side scan is not
-         * ordered in that case.
-         */
-        if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-            isOrderPreserving = false;
-            isReverse = false;
-            return;
-        }
{code}

It seems that on the current master branch, removing these lines may cause a
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST (
    ORGANIZATION_ID INTEGER NOT NULL,
    CONTAINER_ID INTEGER NOT NULL,
    SCORE INTEGER NOT NULL,
    ENTITY_ID INTEGER NOT NULL,
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        CONTAINER_ID,
        SCORE,
        ENTITY_ID
    )) SPLIT ON (4);

UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);

[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670234#comment-15670234
 ] 

chenglei commented on PHOENIX-3451:
---

[~jamestaylor], I have a question about your patch:
Why did you remove the following lines? Or perhaps you intend to fix another
nearly-ready JIRA?
{code:borderStyle=solid}
-        /*
-         * When a GROUP BY is not order preserving, we cannot do a reverse
-         * scan to eliminate the ORDER BY since our server-side scan is not
-         * ordered in that case.
-         */
-        if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-            isOrderPreserving = false;
-            isReverse = false;
-            return;
-        }
{code}

It seems that on the current master branch, removing these lines may cause a
problem, which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST (
    ORGANIZATION_ID INTEGER NOT NULL,
    CONTAINER_ID INTEGER NOT NULL,
    SCORE INTEGER NOT NULL,
    ENTITY_ID INTEGER NOT NULL,
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        CONTAINER_ID,
        SCORE,
        ENTITY_ID
    )) SPLIT ON (4);

UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);

SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST
GROUP BY ORGANIZATION_ID, SCORE
ORDER BY ORGANIZATION_ID DESC, SCORE DESC;
{code}

The expected results are:
{code:borderStyle=solid}
  6,6
  5,5
  4,4
  3,3
  2,2
  1,1
{code}

but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}

The problem is caused by the AggregatePlan. Once the above code is removed,
the OrderByCompiler decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY,
but because the GroupBy's "isOrderPreserving" is false, the AggregatePlan
still sorts the aggregated key [ORGANIZATION_ID, SCORE] in ASC order on the
client side after getting results from the RegionServers, even though the
Scan is reversed. The sorted per-region results are [1,1  2,2  3,3] and
[4,4  5,5  6,6]; after executing the following code, the merged result is
[4,4  5,5  6,6  1,1  2,2  3,3], and because the ORDER BY has been compiled
out (as OrderBy.REV_ROW_KEY_ORDER_BY), the final result is incorrect.

{code:borderStyle=solid}
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY), aggregators);
{code}

So if the AggregatePlan is not modified, simply removing the above code may
cause problems. Maybe I can open a new JIRA to fix this problem if one does
not already exist.


> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451_v1.patch
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669252#comment-15669252
 ] 

chenglei edited comment on PHOENIX-3451 at 11/16/16 3:39 AM:
-

[~jamestaylor], thank you for your explanation of the patch. I had just made
a new patch along the same lines, but yours is more comprehensive than mine.
I think your patch is good, and I will run more tests on some of its details.


was (Author: comnetwork):
[~jamestaylor], thank you for your explanation. I had just made a new patch
along the same lines, but yours is more comprehensive than mine. I think your
patch is good, and I will run more tests on some of its details.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451_v1.patch
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.
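
For illustration only, here is a minimal plain-Java sketch (hypothetical
names; not Phoenix code) of the merge-then-limit behavior the reporter
describes in the last paragraph above:

{code:borderStyle=solid}
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the expected top-K semantics: take LIMIT rows from
// each container, merge them by score DESC, then apply LIMIT again.
public class TopKMergeSketch {
    static final class Row {
        final String entityId;
        final double score;
        Row(String entityId, double score) { this.entityId = entityId; this.score = score; }
    }

    // Top k rows of one container, ordered by score DESC.
    static List<Row> topK(List<Row> rows, int k) {
        return rows.stream()
                .sorted(Comparator.comparingDouble((Row r) -> r.score).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> c1 = Arrays.asList(new Row("entityId5", 1.2), new Row("entityId3", 1.4));
        List<Row> c2 = Arrays.asList(new Row("entityId6", 1.1), new Row("entityId4", 1.3));
        List<Row> c3 = Arrays.asList(new Row("entityId7", 1.35), new Row("entityId8", 1.45));
        int limit = 2;

        // Merge each container's top-2 and limit again: prints
        // entityId8 (1.45) then entityId3 (1.4), the expected output.
        List<Row> merged = Stream.of(c1, c2, c3)
                .flatMap(c -> topK(c, limit).stream())
                .collect(Collectors.toList());
        topK(merged, limit).forEach(r -> System.out.println(r.entityId + "\t" + r.score));
    }
}
{code}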



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669252#comment-15669252
 ] 

chenglei commented on PHOENIX-3451:
---

[~jamestaylor], thank you for your explanation. I had just made a new patch
along the same lines, but yours is more comprehensive than mine. I think your
patch is good, and I will run more tests on some of its details.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451_v1.patch
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667149#comment-15667149
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 4:16 PM:
-

[~jamestaylor], thank you for your suggestion. I am sorry, my patch is indeed
not good; I will modify it according to your suggestion.


was (Author: comnetwork):
@James Taylor, thank you for your suggestion. I am sorry, my patch is indeed
not good; I will modify it according to your suggestion.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3451:
--
Attachment: (was: PHOENIX-3451.diff)

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667149#comment-15667149
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 1:37 PM:
-

@James Taylor, thank you for your suggestion. I am sorry, my patch is indeed
not good; I will modify it according to your suggestion.


was (Author: comnetwork):
@James Taylor, thank you for your suggestion. My patch is indeed not good; I
will modify it according to your suggestion.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667149#comment-15667149
 ] 

chenglei commented on PHOENIX-3451:
---

@James Taylor, thank you for your suggestion. My patch is indeed not good; I
will modify it according to your suggestion.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3451:
--
Comment: was deleted

(was: [~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving".
Take the following SQL as an example; its GROUP BY and ORDER BY are certainly
not "isOrderPreserving":

select pkCol1 + 1, TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2)
order by pkCol1 + 1, TRUNC(pkCol2)

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.)
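
To illustrate the remapping that the visit method above performs, here is a
small stand-alone sketch (hypothetical positions and names; a simplification,
not the real classes):

{code:borderStyle=solid}
import java.util.Arrays;

// Hypothetical illustration: after a GROUP BY, ORDER BY expressions refer to
// positions in the aggregated row key; to reason about order preservation
// they must be mapped back to positions in the original primary key.
public class PositionRemapSketch {
    public static void main(String[] args) {
        // Suppose the PK is (pkCol0, pkCol1, pkCol2) and the query groups by
        // pkCol1, pkCol0: aggregated position 0 is original position 1, and
        // aggregated position 1 is original position 0.
        int[] groupByToOriginalPk = { 1, 0 };

        // An ORDER BY over the aggregated key arrives as positions 0, 1.
        int[] orderByAggPositions = { 0, 1 };
        int[] remapped = Arrays.stream(orderByAggPositions)
                .map(p -> groupByToOriginalPk[p])
                .toArray();
        // Prints [1, 0]: not the original row-key positions in order, so
        // this ORDER BY cannot be treated as order preserving.
        System.out.println(Arrays.toString(remapped));
    }
}
{code}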

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 8:02 AM:
-

[~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving".
Take the following SQL as an example; its GROUP BY and ORDER BY are certainly
not "isOrderPreserving":

select pkCol1 + 1, TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2)
order by pkCol1 + 1, TRUNC(pkCol2)

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.
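
A simplified stand-alone sketch of why the remapped positions matter
(hypothetical; the real check in OrderPreservingTracker handles more cases):

{code:borderStyle=solid}
// Simplified stand-in for the order-preserving test: the remapped original
// PK positions must line up as 0, 1, 2, ... from the start of the row key
// for the server-side order to be reusable.
public class PrefixCheckSketch {
    static boolean isOrderPreserving(int[] originalPkPositions) {
        for (int i = 0; i < originalPkPositions.length; i++) {
            if (originalPkPositions[i] != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOrderPreserving(new int[] { 0, 1 }));  // true
        System.out.println(isOrderPreserving(new int[] { 1, 0 }));  // false
    }
}
{code}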


was (Author: comnetwork):
[~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving". Take the following
SQL as an example; its GROUP BY and ORDER BY are certainly not
"isOrderPreserving":

select pkCol1 + 1, TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2)
order by pkCol1 + 1, TRUNC(pkCol2)

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> 

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-15 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 8:01 AM:
-

[~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving". Take the following
SQL as an example; its GROUP BY and ORDER BY are certainly not
"isOrderPreserving":

select pkCol1 + 1, TRUNC(pkCol2) from table group by pkCol1 + 1, TRUNC(pkCol2)
order by pkCol1 + 1, TRUNC(pkCol2)

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.


was (Author: comnetwork):
[~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving".

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>  

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 7:42 AM:
-

[~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving".

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.


was (Author: comnetwork):
[~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving".

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS 

[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666402#comment-15666402
 ] 

chenglei commented on PHOENIX-3451:
---

[~jamestaylor], thank you for your suggestion; my considerations are as
follows:

1. If the GroupBy is "GROUP BY pkCol1 + 1, TRUNC(pkCol2)", the OrderBy must be
"ORDER BY pkCol1 + 1" or "ORDER BY TRUNC(pkCol2)"; the OrderBy columns must
match the GroupBy columns.
2. Only when all GROUP BY/ORDER BY expressions are simple RowKey columns (e.g.
GROUP BY pkCol1, pkCol2 or ORDER BY pkCol1, pkCol2) is it necessary to go
further and check whether the GROUP BY/ORDER BY is "isOrderPreserving". If the
GROUP BY/ORDER BY expressions are not simple RowKey columns (e.g. GROUP BY
pkCol1 + 1, TRUNC(pkCol2) or ORDER BY pkCol1 + 1, TRUNC(pkCol2)), the GROUP
BY/ORDER BY certainly should not be "isOrderPreserving".

So I think my patch is OK. As the following code explains, it only needs to
consider RowKeyColumnExpression: that is enough for checking whether the
ORDER BY is "isOrderPreserving". For any other type of Expression, the visit
method below returns null, and the OrderPreservingTracker.isOrderPreserving
method will return false, which is as expected.

{code:borderStyle=solid}
@Override
public Info visit(RowKeyColumnExpression node) {
    if (groupBy == null || groupBy.isEmpty()) {
        return new Info(node.getPosition());
    }
    int pkPosition = node.getPosition();
    assert pkPosition < groupBy.getExpressions().size();
    Expression groupByExpression = groupBy.getExpressions().get(pkPosition);
    if (!(groupByExpression instanceof RowKeyColumnExpression)) {
        return null;
    }
    int originalPkPosition = ((RowKeyColumnExpression) groupByExpression).getPosition();
    return new Info(originalPkPosition);
}
{code}

By the way, I had already considered a modification along the lines of your
suggestion when I made my patch; in the end I chose the current patch because
it is simpler, and the change is restricted to the single
OrderPreservingTracker class, FYI.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:30 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff;
the patch actually does what you said.

My patch does not change the final OrderBy's orderByExpressions; each
orderByExpression's position is still its position in the GroupBy. My patch
only changes the Info's pkPosition used in the
OrderPreservingTracker.isOrderPreserving method. The final OrderBy's
orderByExpressions are not created from the Info's pkPosition; the Info's
pkPosition only affects the OrderPreservingTracker.isOrderPreserving method,
where the pkPosition must be the position in the original RowKey columns if
the SQL has a GroupBy.




was (Author: comnetwork):
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff;
the patch actually does what you said.

My patch does not change the final OrderBy's orderByExpressions; each
orderByExpression's position is still its position in the GroupBy. My patch
only changes the Info's pkPosition used in the
OrderPreservingTracker.isOrderPreserving method, where the pkPosition must be
the position in the original RowKey columns if the SQL has a GroupBy.



> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:26 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff;
the patch actually does what you said.

My patch does not change the final OrderBy's orderByExpressions; each
orderByExpression's position is still its position in the GroupBy. My patch
only changes the Info's pkPosition used in the
OrderPreservingTracker.isOrderPreserving method, where the pkPosition must be
the position in the original RowKey columns if the SQL has a GroupBy.




was (Author: comnetwork):
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I
did patch it as you said.

My patch does not change the final OrderBy's orderByExpressions; each
orderByExpression's position is still its position in the GroupBy. My patch
only changes the Info's pkPosition used in the
OrderPreservingTracker.isOrderPreserving method, where the pkPosition must be
the position in the original RowKey columns if the SQL has a GroupBy.



> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the limit clause, which makes me think there
> are multiple issues here and is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container, and when it finds that to be
> enough it stops the scan. What it should be doing is getting 2 results for
> each container, then merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:25 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I 
did patch it as you suggested.

My patch does not change the final OrderBy's orderByExpressions; each 
orderByExpression's position is still its position in the GroupBy. My patch only 
changes the Info's pkPosition used in the OrderPreservingTracker.isOrderPreserving 
method: there, the pkPosition must be the position in the original RowKey 
columns if the SQL has a GroupBy.




was (Author: comnetwork):
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I 
did patch it as you suggested.

My patch does not change the final OrderBy's orderByExpressions; each 
orderByExpression's position is still its position in the GroupBy. My patch only 
changes the Info's pkPosition used in OrderPreservingTracker's 
isOrderPreserving method: there, the pkPosition must be the position in the 
original RowKey columns if the SQL has a GroupBy.



> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT, the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the LIMIT clause, which makes me think there 
> are multiple issues here and is why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix first 
> gets the rows from the first container and stops the scan once it decides 
> it has enough. What it should be doing is getting 2 results for each 
> container, merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:25 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I 
did patch it as you suggested.

My patch does not change the final OrderBy's orderByExpressions; each 
orderByExpression's position is still its position in the GroupBy. My patch only 
changes the Info's pkPosition used in OrderPreservingTracker's 
isOrderPreserving method: there, the pkPosition must be the position in the 
original RowKey columns if the SQL has a GroupBy.




was (Author: comnetwork):
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I 
did patch it as you suggested.

My patch did not change the final OrderBy's orderByExpressions; each 
orderByExpression's position is still its position in the GroupBy. My patch only 
changes the Info's pkPosition used in OrderPreservingTracker's 
isOrderPreserving method: there, the pkPosition must be the position in the 
original RowKey columns if the SQL has a GroupBy.



> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT, the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the LIMIT clause, which makes me think there 
> are multiple issues here and is why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix first 
> gets the rows from the first container and stops the scan once it decides 
> it has enough. What it should be doing is getting 2 results for each 
> container, merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665874#comment-15665874
 ] 

chenglei edited comment on PHOENIX-3451 at 11/15/16 3:24 AM:
-

[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I 
did patch it as you suggested.

My patch did not change the final OrderBy's orderByExpressions; each 
orderByExpression's position is still its position in the GroupBy. My patch only 
changes the Info's pkPosition used in OrderPreservingTracker's 
isOrderPreserving method: there, the pkPosition must be the position in the 
original RowKey columns if the SQL has a GroupBy.




was (Author: comnetwork):
[~jamestaylor], it seems you did not look at my uploaded PHOENIX-3451.diff; I 
did patch it as you suggested.

My patch did not change the final OrderBy's orderByExpressions; each 
orderByExpression's position is still its position in the GroupBy. My patch only 
changes the Info's pkPosition used in OrderPreservingTracker's isOrderPreserving



> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451.diff
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT, the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the LIMIT clause, which makes me think there 
> are multiple issues here and is why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix first 
> gets the rows from the first container and stops the scan once it decides 
> it has enough. What it should be doing is getting 2 results for each 
> container, merging them, and then applying the limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

