Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 47:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@258
PS46, Line 258: 03, F04].
              :
              : A CoreRequirement
> The problem is with the sum((F05+F02), (F06+F01)) used for HJ04, as it assu
On the contrary, I'm thinking the max() in the formula is intended to signify 
that the probe (subtree rooted at F00) can not run in parallel with the builds 
(2 subtrees rooted at F05 and F6). The two build fragments, however, can run in 
parallel (in MT_DOP, they are in separate fragment from the HJ node), hence the 
sum().

The two build subtrees need 8 cores in total.
The probe subtree need 12 cores (dictated by the 00:SCAN that still follow 
MT_DOP rules).
Since probe and builds can not run at the same time, the max(12, 8) is taken as 
the CoreRequirement for subtree rooted at F00.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: A query fragment has a processingCosts_ field, a list of 
ProcessingCost
              : associated with the fragment. Each element of processingCosts_ 
may
              : solely come from a single PlanNode or sum of adjacent PlanNode 
costs.
> This para probably should be reworded as follows and moved after the para e
I will rearrange this paragraph and clarify a bit more about "segment" 
mentioned here.
Now I realize that in the example I added below, the last segment that contains 
only DataStreamSink that is not a blocking DataSink, therefore it does not 
really fit the definition of "blocking segment" written here. I should probably 
just call it "segment" instead of "blocking segment".

I don't see the need to add new field totalFragmentProcessingCost_ and 
blockingSegmentProcessingCosts_ map in the code, since processingCosts_ list is 
sufficient for iteration in PlanFragment.tryLowerParallelism(). But I 
understand the confusion of what the list trying to do. Maybe renaming it to 
segmentProcessingCosts_ will be clearer? Please let me know if it is coherent 
enough after my rewrite.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154
PS47, Line 154: 34550429, 2159270, 23752870, 1
> There exist two blocking segments in F03:
To me, it should be:

There exist four segments in F03:

1. 12(11)
2. 06
3. 08(07)
4. F03sink

Therefore we have
PC(segment 1) = 2109+34548320
PC(segment 2) = 2159270
PC(segment 3) = 23751970+900
PC(segment 4) = 1


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@258
PS47, Line 258: (F05, F02), (F06, F01)
> These two do not follow the definition of a blocking subtree in that both r
F06 and F05 has JoinBuildSink (a special kind of DataSink that is blocking), 
hence these fragements are blocking.
I will update line 252 to mention blocking DataSink as well.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@269
PS47, Line 269: successor
> not clear. You mean previous?
Right, "previous" sounds better. Will update this.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 47
Gerrit-Owner: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Feb 2023 17:43:37 +0000
Gerrit-HasComments: Yes

Reply via email to