Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )
Change subject: IMPALA-11604 Planner changes for CPU usage ...................................................................... Patch Set 47: (5 comments) http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@258 PS46, Line 258: 03, F04]. : : A CoreRequirement > The problem is with the sum((F05+F02), (F06+F01)) used for HJ04, as it assu On the contrary, I'm thinking the max() in the formula is intended to signify that the probe (subtree rooted at F00) can not run in parallel with the builds (2 subtrees rooted at F05 and F6). The two build fragments, however, can run in parallel (in MT_DOP, they are in separate fragment from the HJ node), hence the sum(). The two build subtrees need 8 cores in total. The probe subtree need 12 cores (dictated by the 00:SCAN that still follow MT_DOP rules). Since probe and builds can not run at the same time, the max(12, 8) is taken as the CoreRequirement for subtree rooted at F00. http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138 PS47, Line 138: A query fragment has a processingCosts_ field, a list of ProcessingCost : associated with the fragment. Each element of processingCosts_ may : solely come from a single PlanNode or sum of adjacent PlanNode costs. > This para probably should be reworded as follows and moved after the para e I will rearrange this paragraph and clarify a bit more about "segment" mentioned here. Now I realize that in the example I added below, the last segment that contains only DataStreamSink that is not a blocking DataSink, therefore it does not really fit the definition of "blocking segment" written here. I should probably just call it "segment" instead of "blocking segment". I don't see the need to add new field totalFragmentProcessingCost_ and blockingSegmentProcessingCosts_ map in the code, since processingCosts_ list is sufficient for iteration in PlanFragment.tryLowerParallelism(). But I understand the confusion of what the list trying to do. Maybe renaming it to segmentProcessingCosts_ will be clearer? Please let me know if it is coherent enough after my rewrite. http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154 PS47, Line 154: 34550429, 2159270, 23752870, 1 > There exist two blocking segments in F03: To me, it should be: There exist four segments in F03: 1. 12(11) 2. 06 3. 08(07) 4. F03sink Therefore we have PC(segment 1) = 2109+34548320 PC(segment 2) = 2159270 PC(segment 3) = 23751970+900 PC(segment 4) = 1 http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@258 PS47, Line 258: (F05, F02), (F06, F01) > These two do not follow the definition of a blocking subtree in that both r F06 and F05 has JoinBuildSink (a special kind of DataSink that is blocking), hence these fragements are blocking. I will update line 252 to mention blocking DataSink as well. http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@269 PS47, Line 269: successor > not clear. You mean previous? Right, "previous" sounds better. Will update this. -- To view, visit http://gerrit.cloudera.org:8080/19033 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a Gerrit-Change-Number: 19033 Gerrit-PatchSet: 47 Gerrit-Owner: Qifan Chen <qfc...@hotmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Tue, 14 Feb 2023 17:43:37 +0000 Gerrit-HasComments: Yes