David Rorke has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 )
Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator ...................................................................... Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@72 PS7, Line 72: private static final double AGG_EXPR_COST_COEFFICIENT = 0.07; : // Cost per input row with single grouping expr with single agg expr : private static final double AGG_INPUT_SINGLE_GROUP_COST_COEFFICIENT = 0.2925; : // Cost per output row produced with single grouping expr and single agg expr : private static final double AGG_OUTPUT_SINGLE_GROUP_COST_COEFFICIENT = 2.6072; : // Cost per input row with multiple grouping exprs and single agg expr : private static final double AGG_INPUT_MULTI_GROUP_COST_COEFFICIENT = 1.3741; : // Cost per output row produced with multiple grouping exprs and single agg expr : private static final double AGG_OUTPUT_MULTI_GROUP_COST_COEFFICIENT = 4.5285; > nit: Personally, I prefer related constant names to have common prefix, so Done http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@582 PS7, Line 582: fragment_.getNumInstances() > I'm not sure about using getNumInstances() here. traverseEffectiveParalleli As discussed offline, using the Analyzer min and max parallelism might not be better than what we currently have. Agree we could potentially come up with a better initial estimate here but needs some careful thought because we don't want to lose the useful info that the current code already captures (e.g in HdfsScanNode.computeNumNodes()). For now I just changed this to call a wrapper getNumInstancesForCosting() that currently just calls getNumInstances() but we can try to improve this in the future. http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java File fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java@116 PS7, Line 116: fragment_.getNumInstances() > Another chicken-and-egg problem of using getNumInstances() before traverseE See reply in AggregationNode.java http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@633 PS7, Line 633: getNumInstances > Another chicken-and-egg problem of using getNumInstances() before traverseE See reply in AggregationNode.java http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test: http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@132 PS7, Line 132: HDFS partitions=1824/1824 files=1824 size=199.18MB : stored statistics: : table: rows=2.88M size=199.18MB > nit: unnecessary change. Small bytes variability that is not asserted in Pl Agree, but may keep the (harmless) change just to minimize any noise in diffs. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 7 Gerrit-Owner: David Rorke <dro...@cloudera.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: David Rorke <dro...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Wed, 17 Apr 2024 03:39:21 +0000 Gerrit-HasComments: Yes