Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
Unless I misunderstood your example, we can't sort on partition keys:

> default> create table foo (s string) partitioned by (a int, b int) sort by (a, b);
Query: create table foo (s string) partitioned by (a int, b int) sort by (a, b)
ERROR: AnalysisException: SORT BY column list must not contain partition column: 'a'

If you examine every AggregationNode we create in DistributedPlanner, each one
calls either setIntermediateTuple or setEndsMultiPhase, so this check should
make no functional difference for DistributedPlanner. The only case where it
should have any impact is when SingleNodePlanner creates a single
AggregationNode and there's nothing to push the limit down to.

When I try a similar case,

> create table foo (c int, d int) partitioned by (a int, b int) sort by (c, d);

where I've created 3 partitions, and run

> select distinct a, b from foo limit 2

the plans are unchanged with and without this patch:

    Operator             #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
    -------------------------------------------------------------------------------------------------------------------
    F02:ROOT                  1      1  323.181us  323.181us                      4.01 MB        4.00 MB
    04:EXCHANGE               1      1   12.331us   12.331us      2           2  24.00 KB       16.00 KB  UNPARTITIONED
    F01:EXCHANGE SENDER       3      3   34.995us   43.647us                     14.25 KB       48.00 KB
    03:AGGREGATE              3      3  407.997us  513.288us      3           2   2.08 MB       10.00 MB  FINALIZE
    02:EXCHANGE               3      3   10.555us   14.924us      3           2  16.00 KB       16.00 KB  HASH(a,b)
    F00:EXCHANGE SENDER       3      3   51.210us   60.198us                     46.69 KB      144.00 KB
    01:AGGREGATE              3      3  157.104us  183.497us      3           2   2.03 MB       10.00 MB  STREAMING
    00:SCAN HDFS              3      3    1.612ms    1.962ms      3           2  32.00 KB       32.00 MB  default.foo

    Operator             #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
    -------------------------------------------------------------------------------------------------------------------
    F02:ROOT                  1      1   64.470us   64.470us                      4.01 MB        4.00 MB
    04:EXCHANGE               1      1    9.816us    9.816us      2           2  24.00 KB       16.00 KB  UNPARTITIONED
    F01:EXCHANGE SENDER       3      3   22.339us   23.764us                     14.28 KB       48.00 KB
    03:AGGREGATE              3      3  290.786us  323.453us      3           2   2.08 MB       10.00 MB  FINALIZE
    02:EXCHANGE               3      3   16.727us   31.736us      3           2  24.00 KB       16.00 KB  HASH(a,b)
    F00:EXCHANGE SENDER       3      3   62.719us   69.838us                     46.69 KB      144.00 KB
    01:AGGREGATE              3      3  179.794us  196.761us      3           2   2.03 MB       10.00 MB  STREAMING
    00:SCAN HDFS              3      3   40.368ms   42.799ms      3           2  32.00 KB       32.00 MB  default.foo

The same holds for SingleNodePlanner, and I get the same results with

> select distinct c, d from foo limit 2
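
For anyone wanting to repeat the comparison, a minimal reproduction sketch
follows. The comment only says that 3 partitions were created, so the partition
key values and row values here are assumptions chosen for illustration. Using
NUM_NODES=1 to exercise the single-node plan is also an assumption about how
the SingleNodePlanner case was checked.

```sql
-- Table definition taken from the comment above.
create table foo (c int, d int) partitioned by (a int, b int) sort by (c, d);

-- Assumed data: one row in each of three partitions (values are illustrative).
insert into foo partition (a=1, b=1) values (10, 100);
insert into foo partition (a=2, b=2) values (20, 200);
insert into foo partition (a=3, b=3) values (30, 300);

-- Distributed plan: compare the output with and without the patch.
explain select distinct a, b from foo limit 2;
explain select distinct c, d from foo limit 2;

-- Assumption: NUM_NODES=1 makes Impala run the query as a single-node plan.
set num_nodes=1;
explain select distinct a, b from foo limit 2;
explain select distinct c, d from foo limit 2;
```

Running the same statements on builds with and without the patch and diffing
the output should show whether the AGGREGATE nodes or their limits differ.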



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:10:16 +0000
Gerrit-HasComments: Yes
