Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13307 )

Change subject: IMPALA-8536: Add Scalable Pool Configuration to Admission 
Controller.
......................................................................


Patch Set 13:

(32 comments)

Mostly "dotting the i's", but had some questions around the admission 
controller logic

http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/runtime/exec-env.cc@214
PS13, Line 214:           request_pool_service_.get(), metrics_.get(), 
configured_backend_address_));
nit: continue indent at 4 spaces


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc
File be/src/scheduling/admission-controller-test.cc:

http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc@50
PS13, Line 50: long
nit: int64_t


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc@98
PS13, Line 98:       // say that every host has 200MB, enough that this isn't a 
problem.
This comment might not be true if the caller passed a different value


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc@236
PS13, Line 236:   auto fair_flag = ScopedFlagSetter<string>::Make(
If you use a FlagSaver in the test class, this can be simplified to just 
writing to the flags. See 
https://github.com/apache/impala/blob/master/be/src/runtime/io/data-cache-test.cc#L120
 for an example


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@30
PS13, Line 30: #include "cluster-membership-mgr.h"
Add directory, sort into list below


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@395
PS13, Line 395: Update
nit: Updates (here and below)


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@660
PS13, Line 660: with the given schedule
Stale comment?


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@660
PS13, Line 660: Return
Returns (see surrounding code and style guide)


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@667
PS13, Line 667: with the given
              :   /// schedule.
Stale comment?


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@677
PS13, Line 677: at once
"at once" seems a bit murky, but I understand that there's no "per time 
interval" limitation here. I actually suspect that the whole max_to_dequeue 
limitation has no effect and we might want to remove it altogether. Thoughts?


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@694
PS13, Line 694: can run be queued
typo


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@694
PS13, Line 694: with the
              :   /// given schedule.
stale comment?


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@422
PS13, Line 422: (
other places have a space before the opening (


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@492
PS13, Line 492:                  schedule, pool_cfg, cluster_size, 
not_admitted_reason)) {
nit: The indent looks strange, wrap the whole block in {} if the "if" condition 
doesn't fit into a single line?


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@587
PS13, Line 587:               PrintBytes(max_mem), 
GetMaxMemForPoolDescription(pool_cfg, cluster_size));
nit: continue at 4 spaces (see above)


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1037
PS13, Line 1037:                    *schedule, pool_config, cluster_size, true, 
&not_admitted_reason)) {
nit: continue indent at 4 spaces


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1072
PS13, Line 1072:                 GetMaxRequestsForPoolDescription(pool_config, 
cluster_size),
nit: fix indent


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1083
PS13, Line 1083:     DCHECK(queue_size_ratio <= 1.0);
I suspect that this DCHECK could fail if there's a local inrush of queries 
before the stats get updated. Should we change the max() in the line before to 
max(local_queued, agg_queued)?


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1084
PS13, Line 1084:     // TODO: Floating point arithmetic may result in dequeuing 
one less request than
I think this comment might be wrong/stale: if local_num_queued == 
agg_num_queued, then queue_size_ratio == 1.0 and there shouldn't be any 
floating point weirdness. I stared at this for a while but couldn't figure out 
how the comment can be correct.


http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1443
PS13, Line 1443: configured cluster-wide
"configured statically" (the other case is also a cluster-wide number)?


http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/ImpalaInternalService.thrift@692
PS13, Line 692: number of executors
Isn't this the total number of backends, e.g. dedicated coordinators would also 
be included?


http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/ImpalaInternalService.thrift@695
PS13, Line 695:
nit: no space before the :


http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json@83
PS13, Line 83: number of executors
"number of backends" (see comment in ImpalaInternalService.thrift)


http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json@88
PS13, Line 88: NONE
Is there a benefit/reason to use NONE over UNIT for counts?


http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json@120
PS13, Line 120: max_running_queries_derived
All other metrics (in this part of the file) use dashes as a separator. Should 
we follow that pattern here?


http://gerrit.cloudera.org:8080/#/c/13307/13/fe/src/main/java/org/apache/impala/util/RequestPoolService.java
File fe/src/main/java/org/apache/impala/util/RequestPoolService.java:

http://gerrit.cloudera.org:8080/#/c/13307/13/fe/src/main/java/org/apache/impala/util/RequestPoolService.java@411
PS13, Line 411:               + "max_queued={},  queue_timeout_ms={}, 
default_query_options={},"
nit: whitespace consistent at the start or end of the strings?


http://gerrit.cloudera.org:8080/#/c/13307/13/fe/src/main/java/org/apache/impala/util/RequestPoolService.java@459
PS13, Line 459:   private double getPoolConfigValue(
Overloading these by the type of defaultValue looks like it could introduce 
very subtle bugs, e.g. by forgetting the ".0". Should we name it 
getPoolConfigDoubleValue instead?


http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@965
PS13, Line 965: 85M
Can you add a comment to the test that explains why the memory sizes change 
throughout the test?


http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@1024
PS13, Line 1024:   def check_admission_by_counts(self, expected_num_impalads):
Prepend with __ (or remove from previous method)?


http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@1045
PS13, Line 1045: 100
does this assume that all queries get to admission control within 100ms? Should 
we bump this a bit higher (e.g. 1s) to make it less racy/flaky on less 
predictable test environments (cloud, asan)?


http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@1122
PS13, Line 1122:                  if "Query Status:" in line]
nit: indent


http://gerrit.cloudera.org:8080/#/c/13307/13/tests/webserver/test_web_pages.py
File tests/webserver/test_web_pages.py:

http://gerrit.cloudera.org:8080/#/c/13307/13/tests/webserver/test_web_pages.py@508
PS13, Line 508:     assert pool_config['max_running_queries_derived']
This tripped me up, maybe add "> 0" for these three?



--
To view, visit http://gerrit.cloudera.org:8080/13307
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If47508728124076f3b9200c27cffc989f7a4f188
Gerrit-Change-Number: 13307
Gerrit-PatchSet: 13
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Jun 2019 23:34:14 +0000
Gerrit-HasComments: Yes

Reply via email to