Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )
Change subject: IMPALA-11604 Planner changes for CPU usage ...................................................................... Patch Set 44: (19 comments) http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@209 PS43, Line 209: > Should use THREADS instead of INSTANCES in naming if this is sizing local t Done http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213 PS43, Line 213: backend-gflag-util.cc. > We can rework this to be used as a starting count. However, I do think that Done http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@244 PS43, Line 244: > Add Done http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@270 PS43, Line 270: if (ContainsUnionNode(fragment_state->fragment.plan) > Add comment/TODO to explain why these cases are skipped. Done http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc@1071 PS43, Line 1071: if (result != StringParser::PARSE_SUCCESS || max_num < 1 || max_num > 128) { > Change INSTANCES to THREADS Done http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc@1077 PS43, Line 1077: break; > 64 may be tool low of a limit. 96 and 128 core CPUs are becoming more commo Done http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@203 PS43, Line 203: "Valid value is > 0.0 and <= 10.0. Default value is 1."); > The '(' sign before 0.0 meant to be an exclusion, while ']' sign after 10.0 Done http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@205 PS43, Line 205: DEFINE_bool_hidden(processing_cost_equal_expr_weight, false, > This default to true because I have not really explore if evaluation costin I don't see major changes from setting this to false. Changed the default to false. http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210 PS43, Line 210: // TODO: Benchmark and tune this config with an optimal value. > The end goal of this flag it to cap maximum number of instances by requirin Renamed this to min_input_rows_per_thread. It is now relied on number of input rows (number of rows coming in from exchange + scan + join build) rather than ProcessingCost unit. http://gerrit.cloudera.org:8080/#/c/19033/38/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/19033/38/common/thrift/Frontend.thrift@769 PS38, Line 769: // The optional max_mem_limit to determine which executor group set to run for a query. > This comment seems out of place? Restored the old comment in newer patch set. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85 PS43, Line 85: > Make this a local variable instead. Done http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java File fe/src/main/java/org/apache/impala/planner/CoreRequirement.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@59 PS43, Line 59: total_ = counts_.stream().mapToInt(v -> v).sum(); > Will replace it with regular stream. Done http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@89 PS43, Line 89: protected static CoreRequirement sum(List<CoreRequirement> cores) { > Will change to protected. Done http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263 PS43, Line 263: return deferredBatchQueueSize; > Any formula suggestion to replace this? If it is a fixed per-row cost, then Changed the estimate cost per row to 1 / row batch size. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@273 PS43, Line 273: st.setNumRowToProdu > This seems to be flawed for case of preaggregation streaming node. The impl Done http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@692 PS43, Line 692: && explainLevel.ordinal() >= TExplainLevel.VERBOSE.ordinal()) { : // Print processing cost. > This is now redundant with fragment's input cardinality and the sizing is b Done http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java@415 PS43, Line 415: * within it. The maximum cost between subtree is then selected as maximal processing > This is printed to LOG to help with my initial research. We can delete this Removed. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java@350 PS43, Line 350: ExprUtil.computeExprsTotalCost(conjuncts_), rowMaterializationCost(queryOptions)); > Min work computation and max threads/instance should override here. Removed. This whole method is unused. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java@369 PS43, Line 369: * tuple. > This should be more constant per row. Changed the estimate cost per row to 1 / row batch size. -- To view, visit http://gerrit.cloudera.org:8080/19033 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a Gerrit-Change-Number: 19033 Gerrit-PatchSet: 44 Gerrit-Owner: Qifan Chen <qfc...@hotmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Tue, 07 Feb 2023 23:44:25 +0000 Gerrit-HasComments: Yes