Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 48:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: The costing algorithm splits a query fragment into several 
segments
              : divided by blocking PlanNode/DataSink boundary. Each fragment 
segment is
              : a subtree of PlanNodes/DataSink in the fragment with a DataSink 
or
> A list implies the linear structure among blocking segments. Not sure it ca
That is actually a good point, thank you.
Within a plan fragment, there are two possible branching points: Join and Union.
For Join, the build side is on a separate fragment, which keeps the structure
linear within the fragment. For Union, it is more complicated. I'll think more
about this.
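
To make the list-versus-tree concern concrete, here is a minimal sketch. The
SegmentNode class and isLinear() are hypothetical names for illustration only,
not part of the patch, and the Union shape shown is an assumed case where two
blocking children sit in the same fragment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; SegmentNode is NOT an Impala class.
class SegmentTreeSketch {
  static final class SegmentNode {
    final String label;
    final List<SegmentNode> children = new ArrayList<>();

    SegmentNode(String label) { this.label = label; }

    // Adds a child segment and returns this node so calls can be chained.
    SegmentNode addChild(SegmentNode child) {
      children.add(child);
      return this;
    }
  }

  // Segments fit in a plain list only if no segment has more than one child.
  static boolean isLinear(SegmentNode root) {
    if (root.children.size() > 1) return false;
    for (SegmentNode child : root.children) {
      if (!isLinear(child)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Join-like case: the build side lives in another fragment, so one chain.
    SegmentNode chain = new SegmentNode("sink")
        .addChild(new SegmentNode("agg").addChild(new SegmentNode("scan")));
    // Assumed Union case: two blocking children inside the same fragment.
    SegmentNode union = new SegmentNode("sink")
        .addChild(new SegmentNode("union")
            .addChild(new SegmentNode("agg-a"))
            .addChild(new SegmentNode("agg-b")));
    System.out.println(isLinear(chain));
    System.out.println(isLinear(union));
  }
}
```

If the Union case can really produce this shape, a list would not be enough to
represent the segments and a tree would be needed.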


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154
PS47, Line 154: 100]
> Yes, AGGREGATE of 12 is blocking.
Treating DataSink the same as a PlanNode matters. When comparing the
produce-consume rate with the fragment above it, we take the Output Processing
cost (segment 4 here) of the Producer fragment and compare it against the
Consumer fragment cost.

In this example, the whole of F03 seems slow because of its 3 blocking
operators. However, above TOP-N, row transmission is fast since nothing but
DataStreamSink is active, serializing and transmitting RowBatches. Simply
merging the cost of DataStreamSink into the preceding segment will cause the
Consumer fragment (the fragment above F03) to think that F03 is slow and
transmits rows little by little at a steady rate. This can lead the Consumer
fragment to mistakenly lower its parallelism, thinking that it can consume rows
faster than the Producer below it can send them. In reality, F03 may spend a
long time up to the completion of TOP-N and then quickly transmit all N rows
upward.
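
A toy illustration of the difference, with made-up costs. The class, the method
names, and the "consumer looks faster" rule are assumptions for illustration,
not the code in the patch.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the Impala code base.
class SegmentCostSketch {
  // Cost of the last segment only: the work that is active while rows are sent.
  static long outputProcessingCost(List<Long> segmentCosts) {
    return segmentCosts.get(segmentCosts.size() - 1);
  }

  // Cost seen by the consumer if every segment were merged into one number.
  static long mergedCost(List<Long> segmentCosts) {
    long total = 0;
    for (long cost : segmentCosts) total += cost;
    return total;
  }

  // Toy rule: the consumer is a candidate for lower parallelism when it is
  // cheaper than the producer cost it is compared against.
  static boolean consumerLooksFaster(long producerCost, long consumerCost) {
    return consumerCost < producerCost;
  }

  public static void main(String[] args) {
    // Made-up costs for four segments of F03; the last one is DataStreamSink.
    List<Long> f03 = List.of(900L, 400L, 300L, 20L);
    long consumerCost = 50L;
    // Merged: the consumer looks faster, so its parallelism might be lowered.
    System.out.println(consumerLooksFaster(mergedCost(f03), consumerCost));
    // Separate sink segment: the consumer does not look faster.
    System.out.println(consumerLooksFaster(outputProcessingCost(f03), consumerCost));
  }
}
```

With the sink kept as its own segment, the comparison uses only the cost that
is active while rows are actually being sent.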

The importance of this equal treatment is more apparent in the relationship
between Pre Aggregation and Final Aggregation fragments, for example the F00
below this F03:

12:AGGREGATE [FINALIZE]
|
11:EXCHANGE [HASH(i_class)]
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
05:AGGREGATE [STREAMING]

05:AGGREGATE may be slow to pre-aggregate. But once it completes, the row
transmission by the DataStreamSink of F00 is fast. Merging the DataStreamSink
cost of F00 into 05:AGGREGATE can mistakenly lower the parallelism of F03.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263:     return deferredBatchQueueSize;
> Still doesn't seem right to divide cost by row batch size. The compute cost
I intended this to be a serialization/deserialization cost per row.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Feb 2023 21:23:35 +0000
Gerrit-HasComments: Yes
