Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3748: minimum buffer requirements in planner
......................................................................


Patch Set 13:

(24 comments)

Addressed most of the comments. I need to investigate why we're getting 
incorrect values for some of the parallel plans.

http://gerrit.cloudera.org:8080/#/c/5847/13/be/src/scheduling/query-schedule.cc
File be/src/scheduling/query-schedule.cc:

Line 212
> do we still need this?
It wasn't reachable as a result of other changes since we now compute estimates 
for all queries, rather than omitting estimates in a fairly arbitrary way.


http://gerrit.cloudera.org:8080/#/c/5847/13/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

Line 396:   // Add info strings consumed by CM: Estimated mem and tables 
missing stats.
> update
It wasn't clear that it made sense to include the reservation in this comment, 
since we we don't have anything consuming it.


Line 402:   if (query_exec_request.__isset.per_host_min_reservation_bytes) {
> can we ditch the _bytes everywhere and declare it's in bytes?
Done. It's already documented in the thrift file that it's bytes.


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

Line 252:     long perInstanceMinBufferBytes = 2 * SPILLABLE_BUFFER_BYTES;
> is this covered by a test case?
We have a planner test in resource-requirements.test.


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 103:   private final static int MAX_THREAD_TOKENS_PER_CORE = 3;
> no idea where that's from, but it seems wrong.
It's the default value of the --num_threads_per_core command-line option, which 
is multiplied by the number of cores to get the number of thread tokens. Agree 
there are some problems with this approach so thought it was best to leave 
as-is and document it.

Added to the comment to document the relationship between this and the flag.


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

Line 80:     long numPartitions =
> rename to perinstance-
Done


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 268:     // TODO: add an estimate
> remove todo (we already have a todo to remove the estimates)
Done


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
File fe/src/main/java/org/apache/impala/planner/KuduTableSink.java:

Line 60:     if (explainLevel.ordinal() >= TExplainLevel.EXTENDED.ordinal()) {
> this if block is in every data sink's explainstring. move it into a functio
Refactored to use the same pattern as PlanNode, where 
DataSink.getExplainString() calls a protected method to get the 
subclass-specific explain string.

We also don't need the number of hosts/instances, since it's already reported 
in the fragment header, so I removed that.


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java:

Line 85:     long minBufferBytes = 0L;
> minReservationBytes
Done


Line 93:       node.computeResourceProfile(queryOptions);
> rather than calling this here, call it after creating plan fragments.
Moved to PlanFragment.finalize() - that seemed like a logical place.


Line 126:       sink.computeResourceProfile(queryOptions);
> same here
Done


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

Line 202:   public int getNumNodes() {
> do we still need this?
It's still used in a couple of places - e.g. costing shuffle vs broadcast.


Line 228:     * Estimates the per-fragment-instance number of distinct values 
of exprs based on the
> not super-important, but: should it be "per-fragment instance number"?
Reworded the sentence to avoid weird hyphenation.


Line 297:       str.append(String.format("%s:PLAN FRAGMENT [%s]", 
fragmentId_.toString(),
> also call getfragmentheaderstring() here
Done. I guess the format below wasn't consistent with the pre-existing format, 
so I fixed that too.


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 276:       TQueryOptions queryOptions, TExplainLevel detailLevel) {
> tqueryoptions already contains the explain level. either pass in mt_dop exp
Passing in detailLevel seems to be a deliberate thing so that the diagnostics 
in some error messages is independent of the explain_level provided by the 
client.


Line 621:    * Compute resources consumed when executing this PlanNode.
> also mention that you're populating resourceProfile_
Done


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 344
> this disappeared (but the comment for it didn't)
Oops


Line 361:     long perHostMinReservationBytes = -1;
> remove -bytes
Done


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

Line 33:   private final long memBytesEstimate_;
> memEstimateBytes_
Done


PS13, Line 38: minBufferBytes
Fixed these names.


http://gerrit.cloudera.org:8080/#/c/5847/13/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

Line 241:     long perInstanceMinBufferBytes = 3 * SPILLABLE_BUFFER_BYTES;
> -reservation instead of bufferbytes
Done


http://gerrit.cloudera.org:8080/#/c/5847/13/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test:

Line 21: ---- PARALLELPLANS
> you should set mt_dop explicitly if you want to test this.
The resource values are different - it's testing to make sure that the values 
are extrapolated by the number of instances.

Using the PARALLELPLANS infrastructure seemed more convenient than reinventing 
a different way to set mt_dop.

I could add single-node plans easily enough - do you think those are 
interesting? I suppose it would be good to have the coverage.


Line 504: Per-Host Resource Reservation: Memory=0B
> what happened here?
Looks like a bug. I'll have to investigate what's going on.


Line 871: Per-Host Resource Reservation: Memory=0B
> ?
Same here - looks like a bug.


-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to