Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )
Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................

Patch Set 48:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213: Effective parallelism of a query is the maximum upper bound of CPU core
> We can rework this to be used as a starting count. However, I do think that
The upper bound for each fragment should be the number of threads or something close to that. We shouldn't cap it otherwise unless we are seeing specific operators that can't scale linearly and in that case the operator costing can bound further.

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@552
PS43, Line 552: *state->GetFragmentScheduleState(fragment_state->exchange_input_fragments[0]);
> Not always. This is the correct assignment if IsExceedMaxFsWriter return fa
Done

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210:
> Renamed this to min_input_rows_per_thread. It is now relied on number of in
Would rather keep this as cost as cost of a row is a highly variable metric.

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263: return deferredBatchQueueSize;
> Changed the estimate cost per row to 1 / row batch size.
Still doesn't seem right to divide cost by row batch size. The compute cost per row should be fairly constant.
Are you trying to express the network bandwidth and latency? Latency can probably be assumed to be amortized by the row batch and ignored, while bandwidth cost will be constant per row. We would need some factor to connect cost units to wall time for bandwidth calculations.

--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Feb 2023 19:56:35 +0000
Gerrit-HasComments: Yes