[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/14103/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14103/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-8858: Add observability of idle executor groups I think it would be good to also expose the total number of healthy executor groups per pool while you're at it. This would help to determine overall service readiness (e.g. some users may want to wait until at least one healthy group is available). http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.h@408 PS3, Line 408: seperated nit: typo http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.cc@83 PS3, Line 83: "admission-controller.executor-groups.total-healthy-idle"; Do we actually need these per resource pool? In the future we might have multiple pools and we already have the association between groups and pools through the name-prefix. http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/admission-controller.cc@1852 PS3, Line 1852: vector GetIdleExecutorGroupNames( This should probably call GetExecutorGroupsForPool() (see previous comment) http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.h@136 PS3, Line 136: Metric Remove the word "Metric", it's an implementation detail what the receiver of the callback does with it. http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.h@221 PS3, Line 221: Updating nit lowercase http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@342 PS3, Line 342: UpdateFrontendExecutorMembership(*new_backend_map, *new_executor_groups); I think it would be cleaner to call all callbacks from a single spot. That will also allow us to move them into a list and loop over them, if we add even more. http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@346 PS3, Line 346: UpdateMetrics(new_state); Should we call UpdateMetrics() from SetState()? http://gerrit.cloudera.org:8080/#/c/14103/3/be/src/scheduling/cluster-membership-mgr.cc@551 PS3, Line 551: SnapshotPtr new_state){ : const BackendIdMap& current_backends = new_state->current_backends; : const ExecutorGroups& executor_groups = new_state->executor_groups; What was the reason for this change? http://gerrit.cloudera.org:8080/#/c/14103/3/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/14103/3/common/thrift/metrics.json@2485 PS3, Line 2485: seperated nit: typo http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py@341 PS3, Line 341: self._add_executor_group(group_name_suffixes[0], 1, max_concurrent_queries=1) Add num_executors parameter to be explicit, maybe also spell out min_size= here and below? I think it would be good to express the intent in a comment, too, in particular why you're creating them with different sizes. http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py@346 PS3, Line 346: self.coordinator.service.get_metric_value( : "admission-controller.executor-groups.total-healthy-idle").split(",") A helper like self._get_idle_groups() might make this more readable. http://gerrit.cloudera.org:8080/#/c/14103/3/tests/custom_cluster/test_executor_groups.py@362 PS3, Line 362: # Splitting on an empty string with a delimiter returns [''] This quirk could then be hidden inside the helper, too -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 23 Aug
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4325/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 21 Aug 2019 21:05:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h@43 PS1, Line 43: /// The ClusterMembershipMgr manages the local backend's membership in the cluster. It has > I think there's a case for making it admission-controller.executor-groups.t Done -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 21 Aug 2019 20:48:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Hello Andrew Sherman, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14103 to look at the new patch set (#3). Change subject: IMPALA-8858: Add observability of idle executor groups .. IMPALA-8858: Add observability of idle executor groups This patch adds a metric that displays a comma seperated list of executor group names that are in a healthy state and are idle according to the coordinator (no queries admitted locally by the coordinator are running on them). It gets updated when either the cluster membership changes or a query gets admitted or released by the admission controller. Testing: Added a custom cluster test. Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M common/thrift/metrics.json M tests/custom_cluster/test_executor_groups.py 6 files changed, 155 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/14103/3 -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 2: (1 comment) A thing I thought of last night http://gerrit.cloudera.org:8080/#/c/14103/2/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/14103/2/be/src/scheduling/cluster-membership-mgr.cc@606 PS2, Line 606: idle_groups_->SetValue(idle_groups_str); I think this means that when there are no idle groups then this will be "" empty string? I'm worried that in all the stages we have to go through, impala->metrics_prometheus->prometheus->autoscaler that somewhere we will lose a metric with value "". Would it be horrible to have the value be "none" in this case? -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 21 Aug 2019 15:50:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4316/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 20 Aug 2019 23:28:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h@43 PS1, Line 43: /// This struct stores per-host statistics which are used during admission, by the cluster > I agree (sply about HostStats feeling out of place) but currently the metri I think there's a case for making it admission-controller.executor-groups.total-healthy-idle since "idle" is also an AC concept. -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 20 Aug 2019 23:00:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 2: (1 comment) > This looks like the right approach. > Would there be any value by having a test in cluster-membership-mgr-test.cc > ? I thought about that, but since the metric update method relies on both admissionController and the clusterMembershipManager, I felt an e2e test would suit it better. The e2e test covers all cases so it should provide enough coverage. http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h@43 PS1, Line 43: /// This struct stores per-host statistics which are used during admission, by the cluster > I understand why this is here but I think semantically it belongs in the ad I agree (sply about HostStats feeling out of place) but currently the metrics in admission controller are all connected to resource pools (admission-controller.*.). This metric (cluster-membership.executor-groups.total-healthy-idle) which is tied to backends and executors groups fitted better inside the cluster membership manager. -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 20 Aug 2019 22:51:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Hello Andrew Sherman, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14103 to look at the new patch set (#2). Change subject: IMPALA-8858: Add observability of idle executor groups .. IMPALA-8858: Add observability of idle executor groups This patch adds a metric that displays a comma seperated list of executor group names that are in a healthy state and are idle according to the coordinator (no queries admitted locally by the coordinator are running on them). It gets updated when either the cluster membership changes or a query gets admitted or released by the admission controller. Testing: Added a custom cluster test. Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M common/thrift/metrics.json M tests/custom_cluster/test_executor_groups.py 6 files changed, 151 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/14103/2 -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/14103/1/be/src/scheduling/cluster-membership-mgr.h@43 PS1, Line 43: /// This struct stores per-host statistics which are used during admission, by the cluster I understand why this is here but I think semantically it belongs in the admission controller file. Similarly, the concept of an executor group being "idle" should move there. Have you tried adding it there and using a callback to trigger an update when the cluster membership changes? -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 20 Aug 2019 16:57:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 1: This looks like the right approach. Would there be any value by having a test in cluster-membership-mgr-test.cc ? -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 19 Aug 2019 23:49:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14103 ) Change subject: IMPALA-8858: Add observability of idle executor groups .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/4304/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 19 Aug 2019 20:54:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8858: Add observability of idle executor groups
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14103 Change subject: IMPALA-8858: Add observability of idle executor groups .. IMPALA-8858: Add observability of idle executor groups This patch adds a metric that displays a comma seperated list of executor group names that are in a healthy state and are idle according to the coordinator (no queries admitted locally by the coordinator are running on them). It gets updated when either the cluster membership changes or a query gets admitted or released by the admission controller. Testing: Added a custom cluster test. Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 --- M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/cluster-membership-mgr.cc M be/src/scheduling/cluster-membership-mgr.h M common/thrift/metrics.json M tests/custom_cluster/test_executor_groups.py 6 files changed, 151 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/14103/1 -- To view, visit http://gerrit.cloudera.org:8080/14103 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I58cde8699c33af8b87273437e9d8bf6371a34539 Gerrit-Change-Number: 14103 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig