Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 48:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213: Effective parallelism of a query is the maximum upper bound of 
CPU core
> We can rework this to be used as a starting count. However, I do think that
The upper bound for each fragment should be the number of threads, or something
close to it. We shouldn't cap it any lower unless we see specific operators
that can't scale linearly, and in that case the operator costing can tighten
the bound further.
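
A minimal sketch of that bounding rule, for illustration only. The class name,
method name, and the convention that a limit of 0 means "scales linearly" are
assumptions, not code from the patch:

```java
// Hypothetical sketch: the fragment's upper bound starts at the thread count
// and is lowered only by operators that report a scaling limit.
final class FragmentParallelismSketch {
  /**
   * @param numThreads threads available to the fragment (assumed >= 1)
   * @param operatorScalingLimits per-operator limits; 0 or negative means the
   *        operator is assumed to scale linearly and imposes no limit
   */
  static int maxParallelism(int numThreads, int[] operatorScalingLimits) {
    int bound = numThreads;
    for (int limit : operatorScalingLimits) {
      if (limit > 0) bound = Math.min(bound, limit);
    }
    return Math.max(1, bound);
  }

  public static void main(String[] args) {
    // No operator limits: the bound stays at the thread count.
    System.out.println(maxParallelism(16, new int[] {0, 0}));
    // One operator that stops scaling past 4 instances tightens the bound.
    System.out.println(maxParallelism(16, new int[] {0, 4}));
  }
}
```

Under these assumptions, no cap is applied unless an operator explicitly asks
for one.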


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@552
PS43, Line 552:       
*state->GetFragmentScheduleState(fragment_state->exchange_input_fragments[0]);
> Not always. This is the correct assignment if IsExceedMaxFsWriter return fa
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210:
> Renamed this to min_input_rows_per_thread. It is now relied on number of in
I would rather keep this as a cost, since the cost of a row is a highly
variable metric.
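
To illustrate the difference, here is a rough sketch comparing a row-based
threshold with a cost-based one. All names and numbers are hypothetical
(including minCostPerThread and the per-row costs), chosen only to show how
two inputs with the same row count can call for different thread counts:

```java
// Hypothetical sketch: row-based vs. cost-based per-thread thresholds.
final class ThreadThresholdSketch {
  // Ceiling division for non-negative a and positive b.
  static long ceilDiv(long a, long b) {
    return (a + b - 1) / b;
  }

  // Thread count when the threshold is expressed in input rows.
  static int threadsByRows(long rows, long minRowsPerThread, int maxThreads) {
    long wanted = Math.max(1L, ceilDiv(rows, minRowsPerThread));
    return (int) Math.min(wanted, (long) maxThreads);
  }

  // Thread count when the threshold is expressed in cost units.
  static int threadsByCost(
      long rows, long costPerRow, long minCostPerThread, int maxThreads) {
    long wanted = Math.max(1L, ceilDiv(rows * costPerRow, minCostPerThread));
    return (int) Math.min(wanted, (long) maxThreads);
  }

  public static void main(String[] args) {
    long rows = 1_000_000L;
    // The row-based threshold cannot tell cheap rows from expensive ones.
    System.out.println(threadsByRows(rows, 1_000_000L, 16));
    // The cost-based threshold separates them (assumed costs of 1 and 20).
    System.out.println(threadsByCost(rows, 1L, 1_000_000L, 16));
    System.out.println(threadsByCost(rows, 20L, 1_000_000L, 16));
  }
}
```

With these made-up inputs the row-based rule assigns one thread in both cases,
while the cost-based rule assigns one thread to the cheap rows and hits the
16-thread cap for the expensive ones.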


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263:     return deferredBatchQueueSize;
> Changed the estimate cost per row to 1 / row batch size.
It still doesn't seem right to divide the cost by the row batch size. The
compute cost per row should be fairly constant. Are you trying to express
network bandwidth and latency? Latency can probably be assumed to be amortized
over the row batch and ignored, while the bandwidth cost will be constant per
row. We would need some factor connecting cost units to wall time for
bandwidth calculations.
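
A rough sketch of the model described above, under stated assumptions: compute
cost is linear in rows and independent of batch size, latency is ignored as
amortized, and a conversion factor (here the hypothetical costUnitsPerSecond)
links transfer time to cost units. None of these names come from the patch:

```java
// Hypothetical sketch of an exchange cost model with constant per-row cost.
final class ExchangeCostSketch {
  // Compute cost grows with the row count only; batch size is not an input.
  static double computeCost(long rows, double costPerRow) {
    return rows * costPerRow;
  }

  // Wall time to move the rows at a given bandwidth; latency is ignored.
  static double transferSeconds(
      long rows, long bytesPerRow, double bytesPerSecond) {
    return (double) rows * bytesPerRow / bytesPerSecond;
  }

  // Transfer time converted to cost units via an assumed conversion factor.
  static double transferCost(long rows, long bytesPerRow,
      double bytesPerSecond, double costUnitsPerSecond) {
    return transferSeconds(rows, bytesPerRow, bytesPerSecond)
        * costUnitsPerSecond;
  }

  public static void main(String[] args) {
    long rows = 1_000_000L;
    System.out.println(computeCost(rows, 1.0));
    System.out.println(transferSeconds(rows, 100L, 1e8));
    System.out.println(transferCost(rows, 100L, 1e8, 5e6));
  }
}
```

The point of the sketch is that both terms scale with the number of rows, so
the per-row cost stays the same whatever the row batch size is.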



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Feb 2023 19:56:35 +0000
Gerrit-HasComments: Yes
