Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13307 )
Change subject: IMPALA-8536: Add Scalable Pool Configuration to Admission Controller. ...................................................................... Patch Set 13: (32 comments) Mostly "dotting the i's", but had some questions around the admission controller logic http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/runtime/exec-env.cc@214 PS13, Line 214: request_pool_service_.get(), metrics_.get(), configured_backend_address_)); nit: continue indent at 4 spaces http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc File be/src/scheduling/admission-controller-test.cc: http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc@50 PS13, Line 50: long nit: int64_t http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc@98 PS13, Line 98: // say that every host has 200MB, enough that this isn't a problem. This comment might not be true if the caller passed a different value http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller-test.cc@236 PS13, Line 236: auto fair_flag = ScopedFlagSetter<string>::Make( If you use a FlagSaver in the test class, this can be simplified to just writing to the flags. See https://github.com/apache/impala/blob/master/be/src/runtime/io/data-cache-test.cc#L120 for an example http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@30 PS13, Line 30: #include "cluster-membership-mgr.h" Add directory, sort into list below http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@395 PS13, Line 395: Update nit: Updates (here and below) http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@660 PS13, Line 660: with the given schedule Stale comment? http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@660 PS13, Line 660: Return Returns (see surrounding code and style guide) http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@667 PS13, Line 667: with the given : /// schedule. Stale comment? http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@677 PS13, Line 677: at once "at once" seems a bit murky, but I understand that there's no "per time interval" limitation here. I actually suspect that the whole max_to_dequeue limitation has no effect and we might want to remove it altogether. Thoughts? http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@694 PS13, Line 694: can run be queued typo http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.h@694 PS13, Line 694: with the : /// given schedule. stale comment? http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@422 PS13, Line 422: ( other places have a space before the opening ( http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@492 PS13, Line 492: schedule, pool_cfg, cluster_size, not_admitted_reason)) { nit: The indent looks strange, wrap the whole block in {} if the "if" condition doesn't fit into a single line? http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@587 PS13, Line 587: PrintBytes(max_mem), GetMaxMemForPoolDescription(pool_cfg, cluster_size)); nit: continue at 4 spaces (see above) http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1037 PS13, Line 1037: *schedule, pool_config, cluster_size, true, ¬_admitted_reason)) { nit: continue indent at 4 spaces http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1072 PS13, Line 1072: GetMaxRequestsForPoolDescription(pool_config, cluster_size), nit: fix indent http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1083 PS13, Line 1083: DCHECK(queue_size_ratio <= 1.0); I suspect that this DCHECK could fail if there's a local inrush of queries before the stats get updated. Should we change the max() in the line before to max(local_queued, agg_queued)? http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1084 PS13, Line 1084: // TODO: Floating point arithmetic may result in dequeuing one less request than I think this comment might be wrong/stale: if local_num_queued == agg_num_queued, then queue_size_ratio == 1.0 and there shouldn't be any floating point weirdness. I stared at this for a while but couldn't figure out how the comment can be correct. http://gerrit.cloudera.org:8080/#/c/13307/13/be/src/scheduling/admission-controller.cc@1443 PS13, Line 1443: configured cluster-wide "configured statically" (the other case is also a cluster-wide number)? http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/ImpalaInternalService.thrift@692 PS13, Line 692: number of executors Isn't this the total number of backends, e.g. dedicated coordinators would also be included? http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/ImpalaInternalService.thrift@695 PS13, Line 695: nit: no space before the : http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json@83 PS13, Line 83: number of executors "number of backends" (see comment in ImpalaInternalService.thrift) http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json@88 PS13, Line 88: NONE Is there a benefit/reason to use NONE over UNIT for counts? http://gerrit.cloudera.org:8080/#/c/13307/13/common/thrift/metrics.json@120 PS13, Line 120: max_running_queries_derived All other metrics (in this part of the file) use dashes as a separator. Should we follow that pattern here? http://gerrit.cloudera.org:8080/#/c/13307/13/fe/src/main/java/org/apache/impala/util/RequestPoolService.java File fe/src/main/java/org/apache/impala/util/RequestPoolService.java: http://gerrit.cloudera.org:8080/#/c/13307/13/fe/src/main/java/org/apache/impala/util/RequestPoolService.java@411 PS13, Line 411: + "max_queued={}, queue_timeout_ms={}, default_query_options={}," nit: whitespace consistent at the start or end of the strings? http://gerrit.cloudera.org:8080/#/c/13307/13/fe/src/main/java/org/apache/impala/util/RequestPoolService.java@459 PS13, Line 459: private double getPoolConfigValue( Overloading these by the type of defaultValue looks like it could introduce very subtle bugs, e.g. by forgetting the ".0". Should we name it getPoolConfigDoubleValue instead? http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@965 PS13, Line 965: 85M Can you add a comment to the test that explains why the memory sizes change throughout the test? http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@1024 PS13, Line 1024: def check_admission_by_counts(self, expected_num_impalads): Prepend with __ (or remove from previous method)? http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@1045 PS13, Line 1045: 100 does this assume that all queries get to admission control within 100ms? Should we bump this a bit higher (e.g. 1s) to make it less racy/flaky on less predictable test environments (cloud, asan)? http://gerrit.cloudera.org:8080/#/c/13307/13/tests/custom_cluster/test_admission_controller.py@1122 PS13, Line 1122: if "Query Status:" in line] nit: indent http://gerrit.cloudera.org:8080/#/c/13307/13/tests/webserver/test_web_pages.py File tests/webserver/test_web_pages.py: http://gerrit.cloudera.org:8080/#/c/13307/13/tests/webserver/test_web_pages.py@508 PS13, Line 508: assert pool_config['max_running_queries_derived'] This tripped me up, maybe add "> 0" for these three? -- To view, visit http://gerrit.cloudera.org:8080/13307 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If47508728124076f3b9200c27cffc989f7a4f188 Gerrit-Change-Number: 13307 Gerrit-PatchSet: 13 Gerrit-Owner: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 14 Jun 2019 23:34:14 +0000 Gerrit-HasComments: Yes