Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
......................................................................


Patch Set 13:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@582
PS7, Line 582: .getNumInstancesForCosting(
> As discussed offline, using the Analyzer min and max parallelism might not
Ack


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30
PS13, Line 30: public BaseProcessingCost(
             :       long cardinality, float exprsCost, float 
materializationCost) {
             :     // TODO: materializationCost accommodate ProcessingCost 
where row width should be
             :     // factor in. Currently, ProcessingCost of ScanNode, 
ExchangeNode, and DataStreamSink
             :     // has row width factored in through materialization 
parameter here. Investigate if
             :     // other operator need to have its row width factored in as 
well and whether we should
             :     // have specific 'rowWidth' parameter here.
Is this comment still valid, or do they need update?
Looks like this constructor can be removed simplified by removing 
materializationCost parameter.


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@77
PS13, Line 77: if (cardinality_ != 0L) { output.append(" 
cardinality=").append(cardinality_); }
             :     output.append(" total-cost=").append(totalCost_);
nit: Print total-cost first before cardinality since the latter is now optional.


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@117
PS13, Line 117: LOG.trace("Total CPU cost estimate: " + totalCost
              :           + ", Output Card: " + outputCardinality + ", Output 
Size: " + outputSize);
nit: Please log which branch from above that totalCost is calculated from.


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@230
PS13, Line 230: public static double getAvgDeserializedRowSize
Can this and getAvgSerializedRowSize() turned into non-static class method?
I notice getAvgSerializedRowSize() is used in DistributedPlanner.java, but not 
getAvgDeserializedRowSize().


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@277
PS13, Line 277: LOG.trace("Total CPU cost estimate: " + totalCost
              :           + ", Input Card: " + inputCardinality + ", Input 
Size: " + inputSize);
nit: Please log which branch from above that totalCost is calculated from.


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@342
PS13, Line 342: double totalProbeCost =
              :         (getProbeCardinalityForCosting() * 
COST_COEFFICIENT_PROBE_INPUT)
              :         + (getCardinality() * 
COST_COEFFICIENT_HASH_JOIN_OUTPUT);
Is conjunct evaluation not counted anymore towards probing cost?


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2226
PS13, Line 2226: getAvgRowSize() - getRowPadSize()
nit: create and use getAvgRowSizeWithoutPad().


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@162
PS13, Line 162: inputNode.getAvgRowSize() - inputNode.getRowPadSize()
nit: Create and use getAvgRowSizeWithoutPad() in PlanNode.java (see my comment 
there).


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@179
PS13, Line 179: LOG.trace("HdfsTableSink insert CPU cost estimate: " + 
totalCost + ", Cardinality: "
              :           + cardinality + ", Estimated Bytes Inserted: " + 
estBytesInserted);
nit: Please log which branch from above that totalCost is calculated from.


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
File fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java@117
PS13, Line 117: buildCardinality / fragment_.getNumInstancesForCosting()
Does this require Math.ceil()?


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@251
PS13, Line 251: public float getAvgRowSize() { return avgRowSize_; }
              :   public float getRowPadSize() { return rowPadSize_; }
Consider add the following and use it when relevant:

public float getAvgRowSizeWithoutPad() { return Math.max(0.0, avgRowSize_ - 
rowPadSize_); }


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@88
PS13, Line 88: public static ProcessingCost basicCost(
             :       String label, long cardinality, float exprsCost, float 
materializationCost) {
Looks like only ScanNode.java call this method. Consider removing this and 
replace the one in ScanNode.java to call computeValidBaseCost(double totalCost) 
instead.


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@98
PS13, Line 98: computeValidBaseCost(cardinality, exprsCost, 0)
Consider replacing this with computeValidBaseCost(double totalCost).


http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/13/fe/src/main/java/org/apache/impala/planner/UnionNode.java@175
PS13, Line 175: LOG.trace("Union CPU cost estimate: " + totalCost
              :           + ", estBytesMaterialized: " + estBytesMaterialized
              :           + ", totalMaterializedCardinality: " + 
totalMaterializedCardinality);
nit: log resultExprLists_.size().



-- 
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 13
Gerrit-Owner: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Apr 2024 14:43:20 +0000
Gerrit-HasComments: Yes

Reply via email to