Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11157 )

Change subject: IMPALA-7349: Add Admission control support for automatically 
setting mem_limit With this patch the mem_limit of a query is automatically set 
based on the mem_limit set in the query options and the mem_estimate calculated 
by the planner. This mem_limit wil
......................................................................


Patch Set 6:

(42 comments)

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/mem-tracker.h
File be/src/runtime/mem-tracker.h:

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/mem-tracker.h@126
PS6, Line 126: const
Unnecessary const qualifier when passing by value


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-exec-mgr.h@57
PS6, Line 57: const
Unnecessary const qualifier


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-exec-mgr.h@75
PS6, Line 75: const
Unnecessary const qualifier


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.h@251
PS6, Line 251:   /// Create QueryState w/ refcnt of 0.
Maybe worth documenting that 'mem_limit' applies to the query MemTracker? I 
think it's overkill to document the mem_limit arg on all functions functions 
that just forward it to this constructor though.


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.h@254
PS6, Line 254: const
Unnecessary const qualifier


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.cc@80
PS6, Line 80:   query_mem_tracker_ = MemTracker::CreateQueryMemTracker(
Much cleaner than before


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@89
PS6, Line 89: /// The memory required for admission for a request is specified 
as the query option
This comment needs update.


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@188
PS6, Line 188: ///
Is it worth summarizing the configuration storing, i.e. what the config files 
are, how they're validated, how updates are propagated?


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@191
PS6, Line 191: /// TODO: Remove less important debug logging after more cluster 
testing. Should have a
While we're here, maybe remove this TODO?


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@252
PS6, Line 252: limit_
Not sure what variable this is referring to?


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@256
PS6, Line 256:   /// The per host mem admitted only for the queries admitted 
locally.
Add a blank line between members with comments


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@260
PS6, Line 260:   class PoolStats {
Is this a good chance to move this class to a separate file? Maybe as a 
follow-on... I feel like admission-controller.cc is getting a bit overwhelimng.


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@93
PS6, Line 93: y-mem-limit
I'm not 100% on this mem_limit naming - it might be describing the current 
mechanism more than the concepts. I wonder if {max,min}-query-per-host-mem 
would capture the concept.

The problem with mem_limit is that it doesn't really suggest that the query is 
guaranteed that amount of memory.


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@115
PS6, Line 115: const string PROFILE_INFO_KEY_ADMITTED_MEM = "Memory Admitted";
It's a little ambiguous based on the name whether this is per-host memory or 
cluster memory - can we make the name less ambiguous?


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@351
PS6, Line 351: per_node_mem_needed
per_host_mem_limit for consistency?

Also maybe DCHECK that it's >= 0 since ComputePerHostMemLimit() can return -1 
in some cases (but shouldn't here).


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@391
PS6, Line 391:   const TQueryOptions& query_opts = schedule.query_options();
Should this be a separate function? It seems like it is a self-contained 
calculation?


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@417
PS6, Line 417:  over the mem limit
This is a little misleading since it's not necessarily a mem_limit. Maybe 
"There are not enough memory resources available for the query"


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@543
PS6, Line 543:     int64_t cluster_mem_estimate = per_host_mem_estimate * 
schedule.GetNumBackends();
Removing the indirection here was a good idea - more readable


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@1042
PS6, Line 1042:
              :   if(
missing whitespace


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@1104
PS6, Line 1104:   if (!has_query_option || 
!pool_cfg.strict_min_max_query_mem_limit) {
Is it worth DCHECKING in this branch that min_query_mem_limit <= 
max_query_mem_limit? It's enforced elsewhere but nice to check invariants.


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@1105
PS6, Line 1105:     if (pool_cfg.min_query_mem_limit > 0)
Need braces around conditional body.


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@141
PS6, Line 141: /// or is computed by Scheduler and the admission controller.
Probably worth elaborating a bit - the actual schedule is computed by the 
Scheduler but the admission control updates the amount of memory admitted


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@219
PS6, Line 219: const
unnecessary const - we're returning by value


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@221
PS6, Line 221: const
unnecessary const - we're returning by value


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@223
PS6, Line 223: const
unnecessary const - we're returning by value


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@291
PS6, Line 291:   /// Set by the admission controller on successful admission.
I'm still not sold on adding these to the schedule. I liked the the schedule 
was previously an immutable output of the scheduler. I guess the idea is that 
these values are the outcome of the decisions made by the whole scheduling/AC 
process?


http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@293
PS6, Line 293:   int64_t per_backend_mem_limit_ = 0;
What's the concurrency story for these variables? It's not thread-safe but we 
expect that all reads to it happen in a later phase than the write, right?


http://gerrit.cloudera.org:8080/#/c/11157/6/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/11157/6/common/thrift/ImpalaInternalService.thrift@545
PS6, Line 545:   // query mem tracker to enforce the memory limit. required in 
V1
It's a bit weird that we're playing lip service to protocol versioning in some 
of these comments.

I think staying consistent with other members here is fine though.


http://gerrit.cloudera.org:8080/#/c/11157/6/fe/src/test/resources/fair-scheduler-mem-limit-test.xml
File fe/src/test/resources/fair-scheduler-mem-limit-test.xml:

PS6:
Maybe rename files to mem-limit-test-fair-scheduler.xml so that the related 
files are grouped together when ordering by name?


http://gerrit.cloudera.org:8080/#/c/11157/6/fe/src/test/resources/fair-scheduler-mem-limit-test.xml@6
PS6, Line 6:       <maxResources>2500 mb, 2 vcores</maxResources>
Maybe remove the vcores since it's not needed for the test?


http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test
File 
testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test:

PS6:
Do we have full code coverage for the branches added to admission-controller.cc 
with these tests? Might be worth running test_admission_controller.py with a 
code coverage build to understand. Joe McDonnell has C++ code coverage working.


http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test@1
PS6, Line 1: ====
Can you add a test that runs the same query with and without num_nodes=1 and 
checks that the "Memory Admitted" is correctly the cluster-wide value.


http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test@5
PS6, Line 5: # check if mem_admitted is same as mem_estimate
Maybe worth mentioning that the query runs on a single node?


http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test@11
PS6, Line 11: row_regex: .*.*
Not sure what this is checking for?


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@177
PS6, Line 177:   """Creates a copy of the config files and uses those to start 
impala."""
Doesn't it just return the argument list? Should the function be renamed 
accordingly, e.g. with an _args() suffix like above?


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@179
PS6, Line 179:   resources_dir = os.path.join(impalad_home, "fe", "src", 
"test", "resources")
Seems like we should factor out these common things like impalad_home and 
resources_dir into top-level constants


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@350
PS6, Line 350:       
self.__check_query_options(client.get_runtime_profile(handle),
Thanks for fixing these style issues


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@697
PS6, Line 697:     # Set num_nodes to 1 since its easier to see one-to-one 
mapping of per_host and
Oh i see.. this makes sense for targeted tests but we should also have some 
testing with num_nodes != 1.


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@720
PS6, Line 720:     sleep(1)
Can we poll the query state or similar instead of sleeping?


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@725
PS6, Line 725:     # Change config to be invalid.
Maybe the config file modification should be a helper function? Actually maybe 
it's even worth factoring out the config file manipulation into a separate 
helper module? I'm thinking that any chance to reduce the amount of code in 
this file will make it more readable


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@760
PS6, Line 760:     """Helper method that constantly sends a query to the pool 
that will be rejected but
Maybe explicitly mention how pool_name, metric_str and target_val fit into this.


http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@781
PS6, Line 781:   def __write_xml_to_file(self, xml_root, file_name):
Writing to the file is non-atomic, so it seems like the Impalad could see a 
partially-written file, right? We might be better off writing to a temporary 
then doing a move so that it's atomic.



--
To view, visit http://gerrit.cloudera.org:8080/11157
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifec00141651982f5975803c2165b7d7a10ebeaa6
Gerrit-Change-Number: 11157
Gerrit-PatchSet: 6
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Aug 2018 16:06:32 +0000
Gerrit-HasComments: Yes

Reply via email to