Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 )
Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
......................................................................


Patch Set 7:

(1 comment)

> Patch Set 7:
>
> (1 comment)

Unless I misunderstood your example, we can't sort on partition keys:

> create table foo (s string) partitioned by (a int, b int) sort by (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition column: 'a'

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
Unless I misunderstood your example, we can't sort on partition keys:

> default> create table foo (s string) partitioned by (a int, b int) sort by (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition column: 'a'

If you go through and examine every AggregationNode we create in DistributedPlanner, we call either setIntermediateTuple or setEndsMultiPhase on each of them, so this change should make no functional difference to DistributedPlanner. The only case where it should have any impact is when SingleNodePlanner creates a single AggregationNode and there's nothing to push down to.

When I look at a similar case, where I've created 3 partitions:

> create table foo (c int, d int) partitioned by (a int, b int) sort by (c, d);

and run

> select distinct a, b from foo limit 2

the plans are unchanged with and without this patch:

Operator             #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------
F02:ROOT                  1      1  323.181us  323.181us                      4.01 MB        4.00 MB
04:EXCHANGE               1      1   12.331us   12.331us      2           2  24.00 KB       16.00 KB  UNPARTITIONED
F01:EXCHANGE SENDER       3      3   34.995us   43.647us                     14.25 KB       48.00 KB
03:AGGREGATE              3      3  407.997us  513.288us      3           2   2.08 MB       10.00 MB  FINALIZE
02:EXCHANGE               3      3   10.555us   14.924us      3           2  16.00 KB       16.00 KB  HASH(a,b)
F00:EXCHANGE SENDER       3      3   51.210us   60.198us                     46.69 KB      144.00 KB
01:AGGREGATE              3      3  157.104us  183.497us      3           2   2.03 MB       10.00 MB  STREAMING
00:SCAN HDFS              3      3    1.612ms    1.962ms      3           2  32.00 KB       32.00 MB  default.foo

Operator             #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------
F02:ROOT                  1      1   64.470us   64.470us                      4.01 MB        4.00 MB
04:EXCHANGE               1      1    9.816us    9.816us      2           2  24.00 KB       16.00 KB  UNPARTITIONED
F01:EXCHANGE SENDER       3      3   22.339us   23.764us                     14.28 KB       48.00 KB
03:AGGREGATE              3      3  290.786us  323.453us      3           2   2.08 MB       10.00 MB  FINALIZE
02:EXCHANGE               3      3   16.727us   31.736us      3           2  24.00 KB       16.00 KB  HASH(a,b)
F00:EXCHANGE SENDER       3      3   62.719us   69.838us                     46.69 KB      144.00 KB
01:AGGREGATE              3      3  179.794us  196.761us      3           2   2.03 MB       10.00 MB  STREAMING
00:SCAN HDFS              3      3   40.368ms   42.799ms      3           2  32.00 KB       32.00 MB  default.foo

Same for SingleNodePlanner.
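The argument about the guard at line 786 can be made concrete with a toy model. This is a hypothetical sketch, not Impala code: `ToyAggNode`, `existingCondition` and `guardChangesOutcome` are invented for illustration. It assumes, as the comment implies, that a node counts as multi-phase exactly when an intermediate tuple was set or it ends a multi-phase aggregation, and that a lone SingleNodePlanner aggregation sets neither. The rest of the real condition at line 786 is not shown in this thread, so it is stubbed out as always true.

```java
import java.util.List;

/**
 * Hypothetical toy model of the "&& isMultiPhase()" guard argument.
 * ToyAggNode is NOT Impala's AggregationNode; it only tracks the two facts
 * the comment mentions (intermediate tuple set, ends a multi-phase agg).
 */
public class MultiPhaseGuardSketch {

  /** Assumed shape of an aggregation node, for this illustration only. */
  record ToyAggNode(String label, boolean hasIntermediateTuple, boolean endsMultiPhase) {
    // Assumption: a node is multi-phase if either flag is set.
    boolean isMultiPhase() {
      return hasIntermediateTuple || endsMultiPhase;
    }
  }

  /** Stub for the rest of the real condition at line 786 (not shown in the review). */
  static boolean existingCondition(ToyAggNode node) {
    return true;
  }

  /** The same condition with the added guard. */
  static boolean guardedCondition(ToyAggNode node) {
    return existingCondition(node) && node.isMultiPhase();
  }

  /** True if adding the guard changes the outcome for this node. */
  static boolean guardChangesOutcome(ToyAggNode node) {
    return existingCondition(node) != guardedCondition(node);
  }

  public static void main(String[] args) {
    List<ToyAggNode> nodes = List.of(
        // DistributedPlanner nodes: each has one of the two flags set.
        new ToyAggNode("DistributedPlanner, intermediate tuple set", true, false),
        new ToyAggNode("DistributedPlanner, ends multi-phase", false, true),
        // Assumed: a lone SingleNodePlanner aggregation sets neither flag.
        new ToyAggNode("SingleNodePlanner, single AggregationNode", false, false));
    for (ToyAggNode node : nodes) {
      System.out.println(node.label() + ": guard changes outcome = "
          + guardChangesOutcome(node));
    }
  }
}
```

Under these assumptions only the lone SingleNodePlanner aggregation sees a different result from the guarded condition; every DistributedPlanner node already satisfies isMultiPhase(), so the extra guard is a no-op there. Whether that difference changes an actual plan depends on the rest of the real condition.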
Same results with:

> select distinct c, d from foo limit 2

-- 
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:10:16 +0000
Gerrit-HasComments: Yes