Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 )
Change subject: IMPALA-8858: Add observability of idle executor groups ...................................................................... Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/14103/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14103/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-8858: Add observability of idle executor groups I think it would be good to also expose the total number of healthy executor groups per pool while you're at it. This would help to determine overall service readiness (e.g. some users may want to wait until at least one healthy group is available). http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.h@408 PS3, Line 408: seperated nit: typo http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.cc@83 PS3, Line 83: "admission-controller.executor-groups.total-healthy-idle"; Do we actually need these per resource pool? In the future we might have multiple pools and we already have the association between groups and pools through the name-prefix. http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.cc@1852 PS3, Line 1852: vector<std::string> GetIdleExecutorGroupNames( This should probably call GetExecutorGroupsForPool() (see previous comment) http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.h@136 PS3, Line 136: Metric Remove the word "Metric", it's an implementation detail what the receiver of the callback does with it. http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.h@221 PS3, Line 221: Updating nit lowercase http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@342 PS3, Line 342: UpdateFrontendExecutorMembership(*new_backend_map, *new_executor_groups); I think it would be cleaner to call all callbacks from a single spot. That will also allow us to move them into a list and loop over them, if we add even more. http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@346 PS3, Line 346: UpdateMetrics(new_state); Should we call UpdateMetrics() from SetState()? http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@551 PS3, Line 551: SnapshotPtr new_state){ : const BackendIdMap& current_backends = new_state->current_backends; : const ExecutorGroups& executor_groups = new_state->executor_groups; What was the reason for this change? http://gerrit.cloudera.org:8080/#/c/14103/3/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/14103/3/common/thrift/metrics.json@2485 PS3, Line 2485: seperated nit: typo http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py@341 PS3, Line 341: self._add_executor_group(group_name_suffixes[0], 1, max_concurrent_queries=1) Add num_executors parameter to be explicit, maybe also spell out min_size= here and below? I think it would be good to express the intent in a comment, too, in particular why you're creating them with different sizes. http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py@346 PS3, Line 346: self.coordinator.service.get_metric_value( : "admission-controller.executor-groups.total-healthy-idle").split(",") A helper like self._get_idle_groups() might make this more readable. http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py@362 PS3, Line 362: # Splitting on an empty string with a delimiter returns [''] This quirk could then be hidden inside the helper, too -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Comment-Date: Fri, 23 Aug 2019 16:36:09 +0000 Gerrit-HasComments: Yes