[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
[ https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442221#comment-16442221 ]

chenglei commented on PHOENIX-4690:
---

Uploaded the second patch, which renames a method and adds comments. Because the end of the {{AggregateIT.java}} file in 4.x-HBase-1.2 has one more newline than in the other branches, there is a separate patch for 4.x-HBase-1.2.

> GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
> ---
>
> Key: PHOENIX-4690
> URL: https://issues.apache.org/jira/browse/PHOENIX-4690
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.14.0, 4.13.2
> Reporter: chenglei
> Assignee: chenglei
> Priority: Critical
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4690_4.x-HBase-1.2_v2.patch, PHOENIX-4690_v1.patch, PHOENIX-4690_v2.patch
>
>
> Given a table:
> {code}
> create table test (
>     pk1 integer not null,
>     pk2 integer not null,
>     v integer,
>     CONSTRAINT TEST_PK PRIMARY KEY (pk1,pk2))
> {code}
> and some data:
> {code}
> +------+------+-----+
> | PK1  | PK2  | V   |
> +------+------+-----+
> | 1    | 8    | 10  |
> | 1    | 9    | 11  |
> | 2    | 3    | 13  |
> | 2    | 7    | 15  |
> | 3    | 2    | 17  |
> +------+------+-----+
> {code}
> for the following SQL:
> {code}
> select pk2,pk1,count(v) from test group by pk2,pk1 order by pk2,pk1
> {code}
> the expected result is:
> {code}
> +------+------+-----------+
> | PK2  | PK1  | COUNT(V)  |
> +------+------+-----------+
> | 2    | 3    | 1         |
> | 3    | 2    | 1         |
> | 7    | 2    | 1         |
> | 8    | 1    | 1         |
> | 9    | 1    | 1         |
> +------+------+-----------+
> {code}
> but the actual result is:
> {code}
> +------+------+-----------+
> | PK2  | PK1  | COUNT(V)  |
> +------+------+-----------+
> | 8    | 1    | 1         |
> | 9    | 1    | 1         |
> | 3    | 2    | 1         |
> | 7    | 2    | 1         |
> | 2    | 3    | 1         |
> +------+------+-----------+
> {code}
> The problem is caused by {{GroupBy.compile}}. In line 154, for {{group by pk2,pk1}},
> {{OrderPreservingTracker.isOrderPreserving}} returns true because the tracker reorders
> {{pk2,pk1}} to {{pk1,pk2}}, following the order of the PK columns {{pk1,pk2}}.
> But in line 158, the GroupBy.expressions of the new GroupBy is still {{pk2,pk1}},
> not the actual {{pk1,pk2}}:
> {code}
> 141    public GroupBy compile(StatementContext context, TupleProjector tupleProjector) throws SQLException {
> 142        boolean isOrderPreserving = this.isOrderPreserving;
> 143        int orderPreservingColumnCount = 0;
> 144        if (isOrderPreserving) {
> 145            OrderPreservingTracker tracker = new OrderPreservingTracker(context, GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, expressions.size(), tupleProjector);
> 146            for (int i = 0; i < expressions.size(); i++) {
> 147                Expression expression = expressions.get(i);
> 148                tracker.track(expression);
> 149            }
> 150
> 151            // This is true if the GROUP BY is composed of only PK columns. We further check here that
> 152            // there are no "gaps" in the PK columns positions used (i.e. we start with the first PK
> 153            // column and use each subsequent one in PK order).
> 154            isOrderPreserving = tracker.isOrderPreserving();
> 155            orderPreservingColumnCount = tracker.getOrderPreservingColumnCount();
> 156        }
> 157        if (isOrderPreserving || isUngroupedAggregate) {
> 158            return new GroupBy.GroupByBuilder(this).setIsOrderPreserving(isOrderPreserving).setOrderPreservingColumnCount(orderPreservingColumnCount).build();
> 159        }
> {code}
> Then, when we compile {{order by pk2,pk1}} in {{OrderByCompiler.compile}},
> {{GroupBy.isOrderPreserving}} is true and {{order by pk2,pk1}} is consistent with the
> GroupBy.expressions {{group by pk2,pk1}} created in the above {{GroupBy.compile}} method,
> so the result of {{OrderByCompiler.compile}} is {{OrderBy.FWD_ROW_KEY_ORDER_BY}}.
> But in fact, because the actual GroupBy.expressions is {{group by pk1,pk2}},
> we need to execute {{order by pk2,pk1}} after the {{group by pk1,pk2}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
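The ordering mismatch described in the issue can be illustrated without Phoenix. The following is only a minimal in-memory sketch under stated assumptions, not Phoenix code: the class {{GroupByOrderSketch}} and its methods are hypothetical names, rows are plain {{int[]}} triples, and the "scan" is a loop over the five rows from the issue in row-key order. It shows that an order-preserving aggregation emits groups in {{pk1,pk2}} order, so {{order by pk2,pk1}} still needs an explicit sort afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Phoenix.
public class GroupByOrderSketch {

    // Rows as {pk1, pk2, v}, already in row-key order (pk1, pk2), as in the issue.
    static final int[][] ROWS = {
        {1, 8, 10}, {1, 9, 11}, {2, 3, 13}, {2, 7, 15}, {3, 2, 17}
    };

    // Simulates an order-preserving GROUP BY: rows are scanned in row-key order
    // and one {pk2, pk1, count} group is emitted per run of equal (pk1, pk2).
    // Every v in the sample data is non-null, so count(v) equals the row count.
    static List<int[]> groupInRowKeyOrder(int[][] rows) {
        List<int[]> groups = new ArrayList<>();
        for (int[] row : rows) {
            int[] last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last != null && last[0] == row[1] && last[1] == row[0]) {
                last[2]++;                                  // same group as the previous row
            } else {
                groups.add(new int[] {row[1], row[0], 1});  // new group, projected as pk2, pk1
            }
        }
        return groups;
    }

    // The explicit sort that ORDER BY pk2, pk1 requires after the aggregation.
    static List<int[]> orderByPk2Pk1(List<int[]> groups) {
        List<int[]> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.<int[]>comparingInt(g -> g[0]).thenComparingInt(g -> g[1]));
        return sorted;
    }

    static String format(List<int[]> groups) {
        StringBuilder sb = new StringBuilder();
        for (int[] g : groups) {
            sb.append(Arrays.toString(g));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> grouped = groupInRowKeyOrder(ROWS);
        // Groups come out in (pk1, pk2) order: the "actual result" from the issue.
        System.out.println("no sort:   " + format(grouped));
        // Sorting by (pk2, pk1) gives the "expected result" from the issue.
        System.out.println("with sort: " + format(orderByPk2Pk1(grouped)));
    }
}
```

In this sketch, skipping the sort corresponds to the compiler concluding {{OrderBy.FWD_ROW_KEY_ORDER_BY}} from the unreordered {{pk2,pk1}} expressions, while the row-key scan really delivers groups in {{pk1,pk2}} order.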
[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
[ https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442223#comment-16442223 ]

chenglei commented on PHOENIX-4690:
---

Pushed to master, 4.x-HBase-1.3, 4.x-HBase-1.2, 4.x-HBase-1.1, 4.x-HBase-0.98, and 5.x-HBase-2.0 branch
[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
[ https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440461#comment-16440461 ]

chenglei commented on PHOENIX-4690:
---

[~jamestaylor], thank you for the review.

bq. Orthogonal to this, but it'd be good if we had a way to know if it's better to have the GROUP BY be order preserving followed by a client-side sort, or for the GROUP BY to be non order preserving with a merge sort on the client. I think we need histograms to make that determination.
[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
[ https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440355#comment-16440355 ]

James Taylor commented on PHOENIX-4690:
---

+1. Nice explanation and great work, [~comnetwork]. Orthogonal to this, but it'd be good if we had a way to know if it's better to have the GROUP BY be order preserving followed by a client-side sort, or for the GROUP BY to be non order preserving with a merge sort on the client. I think we need histograms to make that determination.
[jira] [Commented] (PHOENIX-4690) GroupBy expressions should follow the order of PK Columns if GroupBy is orderPreserving
[ https://issues.apache.org/jira/browse/PHOENIX-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438290#comment-16438290 ]

chenglei commented on PHOENIX-4690:
---

I uploaded the first patch. [~jamestaylor], please help me review it.