Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11157 )
Change subject: IMPALA-7349: Add Admission control support for automatically setting mem_limit With this patch the mem_limit of a query is automatically set based on the mem_limit set in the query options and the mem_estimate calculated by the planner. This mem_limit wil ...................................................................... Patch Set 6: (42 comments) http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/mem-tracker.h@126 PS6, Line 126: const Unnecessary const qualifier when passing by value http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-exec-mgr.h@57 PS6, Line 57: const Unnecessary const qualifier http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-exec-mgr.h@75 PS6, Line 75: const Unnecessary const qualifier http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.h@251 PS6, Line 251: /// Create QueryState w/ refcnt of 0. Maybe worth documenting that 'mem_limit' applies to the query MemTracker? I think it's overkill to document the mem_limit arg on all functions functions that just forward it to this constructor though. http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.h@254 PS6, Line 254: const Unnecessary const qualifier http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/runtime/query-state.cc@80 PS6, Line 80: query_mem_tracker_ = MemTracker::CreateQueryMemTracker( Much cleaner than before http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@89 PS6, Line 89: /// The memory required for admission for a request is specified as the query option This comment needs update. http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@188 PS6, Line 188: /// Is it worth summarizing the configuration storing, i.e. what the config files are, how they're validated, how updates are propagated? http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@191 PS6, Line 191: /// TODO: Remove less important debug logging after more cluster testing. Should have a While we're here, maybe remove this TODO? http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@252 PS6, Line 252: limit_ Not sure what variable this is referring to? http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@256 PS6, Line 256: /// The per host mem admitted only for the queries admitted locally. Add a blank line between members with comments http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.h@260 PS6, Line 260: class PoolStats { Is this a good chance to move this class to a separate file? Maybe as a follow-on... I feel like admission-controller.cc is getting a bit overwhelimng. http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@93 PS6, Line 93: y-mem-limit I'm not 100% on this mem_limit naming - it might be describing the current mechanism more than the concepts. I wonder if {max,min}-query-per-host-mem would capture the concept. The problem with mem_limit is that it doesn't really suggest that the query is guaranteed that amount of memory. http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@115 PS6, Line 115: const string PROFILE_INFO_KEY_ADMITTED_MEM = "Memory Admitted"; It's a little ambiguous based on the name whether this is per-host memory or cluster memory - can we make the name less ambiguous? http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@351 PS6, Line 351: per_node_mem_needed per_host_mem_limit for consistency? Also maybe DCHECK that it's >= 0 since ComputePerHostMemLimit() can return -1 in some cases (but shouldn't here). http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@391 PS6, Line 391: const TQueryOptions& query_opts = schedule.query_options(); Should this be a separate function? It seems like it is a self-contained calculation? http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@417 PS6, Line 417: over the mem limit This is a little misleading since it's not necessarily a mem_limit. Maybe "There are not enough memory resources available for the query" http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@543 PS6, Line 543: int64_t cluster_mem_estimate = per_host_mem_estimate * schedule.GetNumBackends(); Removing the indirection here was a good idea - more readable http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@1042 PS6, Line 1042: : if( missing whitespace http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@1104 PS6, Line 1104: if (!has_query_option || !pool_cfg.strict_min_max_query_mem_limit) { Is it worth DCHECKING in this branch that min_query_mem_limit <= max_query_mem_limit? It's enforced elsewhere but nice to check invariants. http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/admission-controller.cc@1105 PS6, Line 1105: if (pool_cfg.min_query_mem_limit > 0) Need braces around conditional body. http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@141 PS6, Line 141: /// or is computed by Scheduler and the admission controller. Probably worth elaborating a bit - the actual schedule is computed by the Scheduler but the admission control updates the amount of memory admitted http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@219 PS6, Line 219: const unnecessary const - we're returning by value http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@221 PS6, Line 221: const unnecessary const - we're returning by value http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@223 PS6, Line 223: const unnecessary const - we're returning by value http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@291 PS6, Line 291: /// Set by the admission controller on successful admission. I'm still not sold on adding these to the schedule. I liked the the schedule was previously an immutable output of the scheduler. I guess the idea is that these values are the outcome of the decisions made by the whole scheduling/AC process? http://gerrit.cloudera.org:8080/#/c/11157/6/be/src/scheduling/query-schedule.h@293 PS6, Line 293: int64_t per_backend_mem_limit_ = 0; What's the concurrency story for these variables? It's not thread-safe but we expect that all reads to it happen in a later phase than the write, right? http://gerrit.cloudera.org:8080/#/c/11157/6/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/11157/6/common/thrift/ImpalaInternalService.thrift@545 PS6, Line 545: // query mem tracker to enforce the memory limit. required in V1 It's a bit weird that we're playing lip service to protocol versioning in some of these comments. I think staying consistent with other members here is fine though. http://gerrit.cloudera.org:8080/#/c/11157/6/fe/src/test/resources/fair-scheduler-mem-limit-test.xml File fe/src/test/resources/fair-scheduler-mem-limit-test.xml: PS6: Maybe rename files to mem-limit-test-fair-scheduler.xml so that the related files are grouped together when ordering by name? http://gerrit.cloudera.org:8080/#/c/11157/6/fe/src/test/resources/fair-scheduler-mem-limit-test.xml@6 PS6, Line 6: <maxResources>2500 mb, 2 vcores</maxResources> Maybe remove the vcores since it's not needed for the test? http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test File testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test: PS6: Do we have full code coverage for the branches added to admission-controller.cc with these tests? Might be worth running test_admission_controller.py with a code coverage build to understand. Joe McDonnell has C++ code coverage working. http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test@1 PS6, Line 1: ==== Can you add a test that runs the same query with and without num_nodes=1 and checks that the "Memory Admitted" is correctly the cluster-wide value. http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test@5 PS6, Line 5: # check if mem_admitted is same as mem_estimate Maybe worth mentioning that the query runs on a single node? http://gerrit.cloudera.org:8080/#/c/11157/6/testdata/workloads/functional-query/queries/QueryTest/admission-max-min-mem-limits.test@11 PS6, Line 11: row_regex: .*.* Not sure what this is checking for? http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@177 PS6, Line 177: """Creates a copy of the config files and uses those to start impala.""" Doesn't it just return the argument list? Should the function be renamed accordingly, e.g. with an _args() suffix like above? http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@179 PS6, Line 179: resources_dir = os.path.join(impalad_home, "fe", "src", "test", "resources") Seems like we should factor out these common things like impalad_home and resources_dir into top-level constants http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@350 PS6, Line 350: self.__check_query_options(client.get_runtime_profile(handle), Thanks for fixing these style issues http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@697 PS6, Line 697: # Set num_nodes to 1 since its easier to see one-to-one mapping of per_host and Oh i see.. this makes sense for targeted tests but we should also have some testing with num_nodes != 1. http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@720 PS6, Line 720: sleep(1) Can we poll the query state or similar instead of sleeping? http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@725 PS6, Line 725: # Change config to be invalid. Maybe the config file modification should be a helper function? Actually maybe it's even worth factoring out the config file manipulation into a separate helper module? I'm thinking that any chance to reduce the amount of code in this file will make it more readable http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@760 PS6, Line 760: """Helper method that constantly sends a query to the pool that will be rejected but Maybe explicitly mention how pool_name, metric_str and target_val fit into this. http://gerrit.cloudera.org:8080/#/c/11157/6/tests/custom_cluster/test_admission_controller.py@781 PS6, Line 781: def __write_xml_to_file(self, xml_root, file_name): Writing to the file is non-atomic, so it seems like the Impalad could see a partially-written file, right? We might be better off writing to a temporary then doing a move so that it's atomic. -- To view, visit http://gerrit.cloudera.org:8080/11157 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifec00141651982f5975803c2165b7d7a10ebeaa6 Gerrit-Change-Number: 11157 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 15 Aug 2018 16:06:32 +0000 Gerrit-HasComments: Yes