Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 44:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@209
PS43, Line 209:
> Should use THREADS instead of INSTANCES in naming if this is sizing local t
Done


http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213: backend-gflag-util.cc.
> We can rework this to be used as a starting count. However, I do think that
Done


http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@244
PS43, Line 244:
> Add
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@270
PS43, Line 270:     if (ContainsUnionNode(fragment_state->fragment.plan)
> Add comment/TODO to explain why these cases are skipped.
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc@1071
PS43, Line 1071:         if (result != StringParser::PARSE_SUCCESS || max_num < 
1 || max_num > 128) {
> Change INSTANCES to THREADS
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc@1077
PS43, Line 1077:         break;
> 64 may be tool low of a limit. 96 and 128 core CPUs are becoming more commo
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@203
PS43, Line 203:     "Valid value is > 0.0 and <= 10.0. Default value is 1.");
> The '(' sign before 0.0 meant to be an exclusion, while ']' sign after 10.0
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@205
PS43, Line 205: DEFINE_bool_hidden(processing_cost_equal_expr_weight, false,
> This default to true because I have not really explore if evaluation costin
I don't see major changes from setting this to false. Changed the default to 
false.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210: // TODO: Benchmark and tune this config with an optimal value.
> The end goal of this flag it to cap maximum number of instances by requirin
Renamed this to min_input_rows_per_thread. It is now relied on number of input 
rows (number of rows coming in from exchange + scan + join build) rather than 
ProcessingCost unit.


http://gerrit.cloudera.org:8080/#/c/19033/38/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/38/common/thrift/Frontend.thrift@769
PS38, Line 769:   // The optional max_mem_limit to determine which executor 
group set to run for a query.
> This comment seems out of place?
Restored the old comment in newer patch set.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS43, Line 85:
> Make this a local variable instead.
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java
File fe/src/main/java/org/apache/impala/planner/CoreRequirement.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@59
PS43, Line 59:       total_ = counts_.stream().mapToInt(v -> v).sum();
> Will replace it with regular stream.
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@89
PS43, Line 89:   protected static CoreRequirement sum(List<CoreRequirement> 
cores) {
> Will change to protected.
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263:     return deferredBatchQueueSize;
> Any formula suggestion to replace this? If it is a fixed per-row cost, then
Changed the estimate cost per row to 1 / row batch size.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@273
PS43, Line 273: st.setNumRowToProdu
> This seems to be flawed for case of preaggregation streaming node. The impl
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@692
PS43, Line 692:   && explainLevel.ordinal() >= TExplainLevel.VERBOSE.ordinal()) 
{
              :       // Print processing cost.
> This is now redundant with fragment's input cardinality and the sizing is b
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java@415
PS43, Line 415:    * within it. The maximum cost between subtree is then 
selected as maximal processing
> This is printed to LOG to help with my initial research. We can delete this
Removed.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java@350
PS43, Line 350:         ExprUtil.computeExprsTotalCost(conjuncts_), 
rowMaterializationCost(queryOptions));
> Min work computation and max threads/instance should override here.
Removed. This whole method is unused.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java@369
PS43, Line 369:    * tuple.
> This should be more constant per row.
Changed the estimate cost per row to 1 / row batch size.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 44
Gerrit-Owner: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qfc...@hotmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Feb 2023 23:44:25 +0000
Gerrit-HasComments: Yes

Reply via email to