Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )
Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................

Patch Set 48:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: The costing algorithm splits a query fragment into several segments
              : divided by blocking PlanNode/DataSink boundary. Each fragment segment is
              : a subtree of PlanNodes/DataSink in the fragment with a DataSink or
> A list implies the linear structure among blocking segments. Not sure it ca

That is actually a good point, thank you.
Within a plan fragment, there are two possible branching points: Join and Union. For Join, the build is in a separate fragment, which makes the structure linear within the fragment. For Union, it is more complicated. I'll think more about this.

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154
PS47, Line 154: 100]
> Yes, AGGREGATE of 12 is blocking.

Treating DataSink the same as a PlanNode is significant. When comparing the produce-consume rate with the fragment above it, we take the Output Processing cost (segment 4 here) of the producer fragment and compare it against the consumer fragment's cost.

In this example, the whole of F03 seems slow because of its three blocking operators. However, above TOP-N, row transmission is fast, since nothing but DataStreamSink is active, serializing and transmitting RowBatches. Simply merging the cost of DataStreamSink will cause the consumer fragment (the fragment above F03) to think that F03 is slow and transmits rows little by little at a steady rate. This can lead the consumer fragment to mistakenly lower its parallelism, thinking that it can consume rows faster than the producer below it can send them. In reality, F03 may spend a long time completing TOP-N and then quickly transmit all N rows above.
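A minimal sketch of the argument above, not the code in this patch. It assumes a strictly linear chain of nodes (so it does not address the Union case raised earlier), and every cost number is made up. `Node` and `split_into_segments` are hypothetical names, and the name of F03's second blocking operator is a placeholder because it is not spelled out in this comment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    cost: int       # made-up processing cost units
    blocking: bool  # True if the node consumes all input before emitting rows


def split_into_segments(nodes: List[Node]) -> List[int]:
    """Sum costs bottom-up, closing a segment at each blocking node."""
    segments: List[int] = []
    running = 0
    for node in nodes:
        running += node.cost
        if node.blocking:
            segments.append(running)
            running = 0
    # Non-blocking nodes after the last blocking node form the output segment.
    if running or not segments:
        segments.append(running)
    return segments


# F03 listed bottom-up. All costs are assumed for illustration only.
f03 = [
    Node("11:EXCHANGE", 10, False),
    Node("12:AGGREGATE", 500, True),
    Node("blocking node (placeholder)", 300, True),
    Node("TOP-N", 200, True),
    Node("DataStreamSink", 5, False),
]

segments = split_into_segments(f03)
output_cost = segments[-1]   # cost of the output segment (DataStreamSink only)
merged_cost = sum(segments)  # cost seen if everything were merged together
print(segments, output_cost, merged_cost)
```

With these assumed numbers the output segment costs 5 while the merged fragment costs 1015, so a consumer comparing against the merged figure would see the producer as roughly 200 times slower at sending rows than it is once TOP-N completes.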
The importance of this equal treatment is more apparent in the relationship between the Pre Aggregation and Final Aggregation fragments, for example the F00 below this F03:

12:AGGREGATE [FINALIZE]
|
11:EXCHANGE [HASH(i_class)]
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
05:AGGREGATE [STREAMING]

05:AGGREGATE may be slow to pre-aggregate, but once it completes, row transmission by the DataStreamSink of F00 is fast. Merging the DataStreamSink cost of F00 into 05:AGGREGATE can mistakenly lower the parallelism of F03.

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263: return deferredBatchQueueSize;
> Still doesn't seem right to divide cost by row batch size. The compute cost

I intended this to be a serialization/deserialization cost per row.

--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Feb 2023 21:23:35 +0000
Gerrit-HasComments: Yes