Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 )
Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator ...................................................................... Patch Set 13: (15 comments) http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@582 PS7, Line 582: .getNumInstancesForCosting( > As discussed offline, using the Analyzer min and max parallelism might not Ack http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30 PS13, Line 30: public BaseProcessingCost( : long cardinality, float exprsCost, float materializationCost) { : // TODO: materializationCost accommodate ProcessingCost where row width should be : // factor in. Currently, ProcessingCost of ScanNode, ExchangeNode, and DataStreamSink : // has row width factored in through materialization parameter here. Investigate if : // other operator need to have its row width factored in as well and whether we should : // have specific 'rowWidth' parameter here. Is this comment still valid, or do they need update? Looks like this constructor can be removed simplified by removing materializationCost parameter. http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@77 PS13, Line 77: if (cardinality_ != 0L) { output.append(" cardinality=").append(cardinality_); } : output.append(" total-cost=").append(totalCost_); nit: Print total-cost first before cardinality since the latter is now optional. http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@117 PS13, Line 117: LOG.trace("Total CPU cost estimate: " + totalCost : + ", Output Card: " + outputCardinality + ", Output Size: " + outputSize); nit: Please log which branch from above that totalCost is calculated from. http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@230 PS13, Line 230: public static double getAvgDeserializedRowSize Can this and getAvgSerializedRowSize() turned into non-static class method? I notice getAvgSerializedRowSize() is used in DistributedPlanner.java, but not getAvgDeserializedRowSize(). http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@277 PS13, Line 277: LOG.trace("Total CPU cost estimate: " + totalCost : + ", Input Card: " + inputCardinality + ", Input Size: " + inputSize); nit: Please log which branch from above that totalCost is calculated from. http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@342 PS13, Line 342: double totalProbeCost = : (getProbeCardinalityForCosting() * COST_COEFFICIENT_PROBE_INPUT) : + (getCardinality() * COST_COEFFICIENT_HASH_JOIN_OUTPUT); Is conjunct evaluation not counted anymore towards probing cost? http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2226 PS13, Line 2226: getAvgRowSize() - getRowPadSize() nit: create and use getAvgRowSizeWithoutPad(). http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@162 PS13, Line 162: inputNode.getAvgRowSize() - inputNode.getRowPadSize() nit: Create and use getAvgRowSizeWithoutPad() in PlanNode.java (see my comment there). http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@179 PS13, Line 179: LOG.trace("HdfsTableSink insert CPU cost estimate: " + totalCost + ", Cardinality: " : + cardinality + ", Estimated Bytes Inserted: " + estBytesInserted); nit: Please log which branch from above that totalCost is calculated from. http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java File fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java@117 PS13, Line 117: buildCardinality / fragment_.getNumInstancesForCosting() Does this require Math.ceil()? http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@251 PS13, Line 251: public float getAvgRowSize() { return avgRowSize_; } : public float getRowPadSize() { return rowPadSize_; } Consider add the following and use it when relevant: public float getAvgRowSizeWithoutPad() { return Math.max(0.0, avgRowSize_ - rowPadSize_); } http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@88 PS13, Line 88: public static ProcessingCost basicCost( : String label, long cardinality, float exprsCost, float materializationCost) { Looks like only ScanNode.java call this method. Consider removing this and replace the one in ScanNode.java to call computeValidBaseCost(double totalCost) instead. http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@98 PS13, Line 98: computeValidBaseCost(cardinality, exprsCost, 0) Consider replacing this with computeValidBaseCost(double totalCost). http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/UnionNode.java@175 PS13, Line 175: LOG.trace("Union CPU cost estimate: " + totalCost : + ", estBytesMaterialized: " + estBytesMaterialized : + ", totalMaterializedCardinality: " + totalMaterializedCardinality); nit: log resultExprLists_.size(). -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 13 Gerrit-Owner: David Rorke <dro...@cloudera.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: David Rorke <dro...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Wed, 17 Apr 2024 14:43:20 +0000 Gerrit-HasComments: Yes