David Rorke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@72
PS7, Line 72:   private static final double AGG_EXPR_COST_COEFFICIENT = 0.07;
            :   // Cost per input row with single grouping expr with single agg 
expr
            :   private static final double 
AGG_INPUT_SINGLE_GROUP_COST_COEFFICIENT = 0.2925;
            :   // Cost per output row produced with single grouping expr and 
single agg expr
            :   private static final double 
AGG_OUTPUT_SINGLE_GROUP_COST_COEFFICIENT = 2.6072;
            :   // Cost per input row with multiple grouping exprs and single 
agg expr
            :   private static final double 
AGG_INPUT_MULTI_GROUP_COST_COEFFICIENT = 1.3741;
            :   // Cost per output row produced with multiple grouping exprs 
and single agg expr
            :   private static final double 
AGG_OUTPUT_MULTI_GROUP_COST_COEFFICIENT = 4.5285;
> nit: Personally, I prefer related constant names to have common prefix, so
Done


http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@582
PS7, Line 582: fragment_.getNumInstances()
> I'm not sure about using getNumInstances() here. traverseEffectiveParalleli
As discussed offline, using the Analyzer min and max parallelism might not be 
better than what we currently have. Agree we could potentially come up with a 
better initial estimate here but needs some careful thought because we don't 
want to lose the useful info that the current code already captures (e.g in 
HdfsScanNode.computeNumNodes()).

For now I just changed this to call a wrapper getNumInstancesForCosting() that 
currently just calls getNumInstances() but we can try to improve this in the 
future.


http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
File fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java@116
PS7, Line 116: fragment_.getNumInstances()
> Another chicken-and-egg problem of using getNumInstances() before traverseE
See reply in AggregationNode.java


http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@633
PS7, Line 633: getNumInstances
> Another chicken-and-egg problem of using getNumInstances() before traverseE
See reply in AggregationNode.java


http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test:

http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@132
PS7, Line 132:    HDFS partitions=1824/1824 files=1824 size=199.18MB
             :    stored statistics:
             :      table: rows=2.88M size=199.18MB
> nit: unnecessary change. Small bytes variability that is not asserted in Pl
Agree, but may keep the (harmless) change just to minimize any noise in diffs.



--
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 7
Gerrit-Owner: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Apr 2024 03:39:21 +0000
Gerrit-HasComments: Yes

Reply via email to