[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving

2018-04-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442221#comment-16442221
 ] 

chenglei commented on PHOENIX-4690:
---

Uploaded the second patch, which renames a method and adds comments. Because 
the end of the {{AggregateIT.java}} file in 4.x-HBase-1.2 has one more newline 
than in the other branches, there is a separate patch for 4.x-HBase-1.2.

> GroupBy expressions should follow the order of PK Columns if GroupBy is 
> orderPreserving
> ---
>
> Key: PHOENIX-4690
> URL: https://issues.apache.org/jira/browse/PHOENIX-4690
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.14.0, 4.13.2
> Reporter: chenglei
> Assignee: chenglei
> Priority: Critical
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4690_4.x-HBase-1.2_v2.patch, 
> PHOENIX-4690_v1.patch, PHOENIX-4690_v2.patch
>
>
> Given a table:
> {code}
> create table test (
>     pk1 integer not null,
>     pk2 integer not null,
>     v integer,
>     CONSTRAINT TEST_PK PRIMARY KEY (pk1, pk2))
> {code}
> and some data:
> {code}
> +------+------+-----+
> | PK1  | PK2  |  V  |
> +------+------+-----+
> | 1    | 8    | 10  |
> | 1    | 9    | 11  |
> | 2    | 3    | 13  |
> | 2    | 7    | 15  |
> | 3    | 2    | 17  |
> +------+------+-----+
> {code}
> for the following SQL:
> {code}
> select pk2,pk1,count(v) from test group by pk2,pk1 order by pk2,pk1
> {code}
> the expected result is:
> {code}
> +------+------+-----------+
> | PK2  | PK1  | COUNT(V)  |
> +------+------+-----------+
> | 2    | 3    | 1         |
> | 3    | 2    | 1         |
> | 7    | 2    | 1         |
> | 8    | 1    | 1         |
> | 9    | 1    | 1         |
> +------+------+-----------+
> {code}
> but the actual result is:
> {code}
> +------+------+-----------+
> | PK2  | PK1  | COUNT(V)  |
> +------+------+-----------+
> | 8    | 1    | 1         |
> | 9    | 1    | 1         |
> | 3    | 2    | 1         |
> | 7    | 2    | 1         |
> | 2    | 3    | 1         |
> +------+------+-----------+
> {code}
> The problem is caused by {{GroupBy.compile}}. In line 154, for 
> {{group by pk2,pk1}}, {{OrderPreservingTracker.isOrderPreserving}} returns 
> true because the tracker reorders {{pk2,pk1}} to {{pk1,pk2}}, following the 
> order of the PK columns {{pk1,pk2}}. But in line 158, the new GroupBy is 
> built with GroupBy.expressions still set to {{pk2,pk1}}, not the actual 
> {{pk1,pk2}}:
> {code}
> 141  public GroupBy compile(StatementContext context, TupleProjector tupleProjector) throws SQLException {
> 142      boolean isOrderPreserving = this.isOrderPreserving;
> 143      int orderPreservingColumnCount = 0;
> 144      if (isOrderPreserving) {
> 145          OrderPreservingTracker tracker = new OrderPreservingTracker(context, GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, expressions.size(), tupleProjector);
> 146          for (int i = 0; i < expressions.size(); i++) {
> 147              Expression expression = expressions.get(i);
> 148              tracker.track(expression);
> 149          }
> 150
> 151          // This is true if the GROUP BY is composed of only PK columns. We further check here that
> 152          // there are no "gaps" in the PK columns positions used (i.e. we start with the first PK
> 153          // column and use each subsequent one in PK order).
> 154          isOrderPreserving = tracker.isOrderPreserving();
> 155          orderPreservingColumnCount = tracker.getOrderPreservingColumnCount();
> 156      }
> 157      if (isOrderPreserving || isUngroupedAggregate) {
> 158          return new GroupBy.GroupByBuilder(this).setIsOrderPreserving(isOrderPreserving).setOrderPreservingColumnCount(orderPreservingColumnCount).build();
> 159      }
> {code}
> Then, when {{order by pk2,pk1}} is compiled in {{OrderByCompiler.compile}}, 
> {{GroupBy.isOrderPreserving}} is true and {{order by pk2,pk1}} is consistent 
> with the GroupBy.expressions {{pk2,pk1}} produced by the {{GroupBy.compile}} 
> method above, so {{OrderByCompiler.compile}} returns 
> {{OrderBy.FWD_ROW_KEY_ORDER_BY}}, i.e. the rows are returned in row-key 
> order without an additional sort.
> But the grouping is actually executed as {{group by pk1,pk2}}, so 
> {{order by pk2,pk1}} still has to be executed after the {{group by pk1,pk2}}.
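
The mismatch described above can be sketched in plain Java. This is only an illustration under simplifying assumptions, not Phoenix code: the class {{GroupByOrderSketch}} and the helpers {{reorderByPk}}, {{sortCanBeSkipped}} and {{sortByPk2ThenPk1}} are hypothetical names, columns are modelled as strings, and "ORDER BY can be skipped" is reduced to a simple prefix check against the order in which groups are emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupByOrderSketch {
    // Hypothetical stand-in for the PK column order of the table: (pk1, pk2).
    static final List<String> PK_ORDER = Arrays.asList("pk1", "pk2");

    // Reorders GROUP BY column names so that they follow the PK column order.
    static List<String> reorderByPk(List<String> groupBy) {
        List<String> reordered = new ArrayList<>(groupBy);
        reordered.sort(Comparator.comparingInt(PK_ORDER::indexOf));
        return reordered;
    }

    // Simplified check: the ORDER BY needs no extra sort only if it is a
    // prefix of the order in which the groups are actually emitted.
    static boolean sortCanBeSkipped(List<String> orderBy, List<String> emittedOrder) {
        return orderBy.size() <= emittedOrder.size()
                && emittedOrder.subList(0, orderBy.size()).equals(orderBy);
    }

    // Sorts {pk1, pk2} rows by pk2 first, then pk1, as "order by pk2,pk1" requires.
    static int[][] sortByPk2ThenPk1(int[][] rows) {
        int[][] sorted = rows.clone();
        Arrays.sort(sorted,
                Comparator.<int[]>comparingInt(r -> r[1]).thenComparingInt(r -> r[0]));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> groupBy = Arrays.asList("pk2", "pk1");
        List<String> orderBy = Arrays.asList("pk2", "pk1");

        // Buggy comparison: ORDER BY is checked against the stale list pk2,pk1.
        System.out.println(sortCanBeSkipped(orderBy, groupBy)); // true

        // Comparison against the order the groups are really emitted in: pk1,pk2.
        System.out.println(sortCanBeSkipped(orderBy, reorderByPk(groupBy))); // false

        // Groups as {pk1, pk2} in row-key order, i.e. the "actual result" above.
        int[][] rowKeyOrder = {{1, 8}, {1, 9}, {2, 3}, {2, 7}, {3, 2}};
        // Applying the sort yields the order of the "expected result" above.
        System.out.println(Arrays.deepToString(sortByPk2ThenPk1(rowKeyOrder)));
    }
}
```

With the stale list the check wrongly reports that the sort can be skipped. Once the GROUP BY expressions follow the PK order, the same check reports that {{order by pk2,pk1}} still has to be applied.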



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving

2018-04-18 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442223#comment-16442223
 ] 

chenglei commented on PHOENIX-4690:
---

Pushed to the master, 4.x-HBase-1.3, 4.x-HBase-1.2, 4.x-HBase-1.1, 
4.x-HBase-0.98, and 5.x-HBase-2.0 branches.



[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving

2018-04-16 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440461#comment-16440461
 ] 

chenglei commented on PHOENIX-4690:
---

[~jamestaylor], thank you for the review.
bq. Orthogonal to this, but it'd be good if we had a way to know if it's better 
to have the GROUP BY be order preserving followed by a client-side sort, or for 
the GROUP BY to be non order preserving with a merge sort on the client. I 
think we need histograms to make that determination.



[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving

2018-04-16 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440355#comment-16440355
 ] 

James Taylor commented on PHOENIX-4690:
---

+1. Nice explanation and great work, [~comnetwork]. Orthogonal to this, but 
it'd be good if we had a way to know if it's better to have the GROUP BY be 
order preserving followed by a client-side sort, or for the GROUP BY to be non 
order preserving with a merge sort on the client. I think we need histograms to 
make that determination.
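
The two client-side strategies being compared can be sketched roughly as follows. This is a simplified illustration, not Phoenix code: {{ClientSortStrategiesSketch}}, {{fullSort}} and {{mergeSorted}} are hypothetical names, each inner list stands for the group keys returned by one parallel scan, and the keys are reduced to the {{pk2}} values of the sample data (the split into two scans is assumed).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ClientSortStrategiesSketch {
    // Strategy 1: results arrive in row-key order, so the client
    // collects everything and performs a full sort.
    static List<Integer> fullSort(List<List<Integer>> chunks) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            all.addAll(chunk);
        }
        all.sort(Comparator.naturalOrder());
        return all;
    }

    // Strategy 2: every chunk is already sorted by the requested order,
    // so the client only has to do a k-way merge of the chunk heads.
    static List<Integer> mergeSorted(List<List<Integer>> sortedChunks) {
        // Queue entries are {chunk index, position within chunk}.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                Comparator.comparingInt((int[] e) -> sortedChunks.get(e[0]).get(e[1])));
        for (int i = 0; i < sortedChunks.size(); i++) {
            if (!sortedChunks.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<Integer> chunk = sortedChunks.get(head[0]);
            merged.add(chunk.get(head[1]));
            if (head[1] + 1 < chunk.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // pk2 values in row-key order, split across two assumed scans.
        List<List<Integer>> rowKeyOrdered = Arrays.asList(
                Arrays.asList(8, 9), Arrays.asList(3, 7, 2));
        // The same values when each scan returns them sorted by pk2.
        List<List<Integer>> sortedPerScan = Arrays.asList(
                Arrays.asList(8, 9), Arrays.asList(2, 3, 7));
        System.out.println(fullSort(rowKeyOrdered));
        System.out.println(mergeSorted(sortedPerScan));
    }
}
```

Both strategies produce the same ordered result. Which one is cheaper depends on how the data is distributed, which is why statistics such as histograms would be needed to choose between them.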



[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving

2018-04-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438290#comment-16438290
 ] 

chenglei commented on PHOENIX-4690:
---

I uploaded the first patch. [~jamestaylor], please help review it.



