Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13207 )
Change subject: IMPALA-8460: Simplify cluster membership management
......................................................................


Patch Set 5:

(20 comments)

Thanks for the review. Please see my inline comments and PS6.

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc@454
PS5, Line 454: // Register the ImpalaServer with the cluster membership manager
             : cluster_membership_mgr_->SetLocalBeDescFn([server]() {
             :   return server->GetLocalBackendDescriptor();
             : });
             : cluster_membership_mgr_->SetUpdateLocalServerFn(
             :     [server](const ClusterMembershipMgr::BackendAddressSet& current_backends) {
             :       server->CancelQueriesOnFailedBackends(current_backends);
             :     });
> just thinking out aloud: should we reset the callback functions during tear
These callbacks rely only on the server still being alive, not the ExecEnv, so the right thing to do would be to delete them when we unregister the ImpalaServer from the ExecEnv (which we currently don't). I added a DCHECK to make sure that we don't use this method to reset the ImpalaServer in the future. Since the ExecEnv destructor will destroy the ClusterMembershipMgr, I think we should be good without resetting its state explicitly.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@54
PS5, Line 54: /// Clients can also register callbacks to receive notifications of changes to the cluster
            : /// membership.
> this makes it sound like there is a generic way of registering callbacks li
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@68
PS5, Line 68: pool
> nit: executor pool. Is there a description of "executor pools" anywhere?
Replaced it with executor groups and added a brief description, which we can expand in the future.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@102
PS5, Line 102: then
> nit: when
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@112
PS5, Line 112: subscription
> nit: subscription.
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@117
PS5, Line 117: /// Registers a callback to provide the local backend descriptor.
             : void SetLocalBeDescFn(BackendDescriptorPtrFn fn);
             :
             : /// Registers a callback to notify the local ImpalaServer of changes in the cluster
             : /// membership. This callback will only be called when backends are deleted from the
             : /// membership.
             : void SetUpdateLocalServerFn(UpdateLocalServerFn fn);
             :
             : /// Registers a callback to notify the local Frontend of changes in the cluster
             : /// membership.
             : void SetUpdateFrontendFn(UpdateFrontendFn fn);
> I know we discussed this offline, but it might be worth documenting in this
The two clients have different needs: the ImpalaServer only needs to learn about deleted backends, whereas the Frontend needs the full list. In the future, the Frontend will also need the executor group sizes (but not the memberships, because groups will likely only be useful for remote read scenarios). Additionally, the callbacks have different signatures. Having a generic callback interface would require some sort of filtering of events (added, updated, deleted) and more complexity on the client side without a clear benefit. If the list of client classes expands, we can revisit this decision.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@165
PS5, Line 165: in
> nit: extra word
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@167
PS5, Line 167: / occur in 'current_backends'
> nit: outdated comment?
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@184
PS5, Line 184: May be NULL if the set of
             : /// backends is fixed.
> maybe mention that this is only true in tests
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@88
PS5, Line 88: update.is_delta && update.topic_entries.empty()
> just curious, when can we receive an empty delta
That's how the statestore polls its clients for updates: on every update interval it sends an empty delta, and the clients reply with their own updates. It cannot send an empty snapshot, because that would delete the whole state, and sending the full membership each time is too much overhead.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@127
PS5, Line 127: const vector<string>& groups = DEFAULT_EXECUTOR_GROUPS;
             : for (const string& group : groups) {
             :   (*new_executor_groups)[group].RemoveExecutor(be_desc);
             : }
> This kinda implies that an executor can be added to multiple groups. I thin
I added a comment and TODO to the class header; let me know if you'd like me to elaborate more.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@255
PS5, Line 255: auto
> nit: const auto&
Done. The const is redundant, but I added it for readability.
(See https://gerrit.cloudera.org/#/c/12065/8/be/src/exec/parquet/parquet-common.cc@95)


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@263
PS5, Line 263: const BackendIdMap::value_type&
> nit: const auto&
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@297
PS5, Line 297: auto
> nit: const auto&
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h
File be/src/scheduling/executor-group.h:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h@41
PS5, Line 41: The caller must make sure that the
            : /// executor group does not get updated while they hold the reference.
> would it make sense to document what is the expected usage pattern for this
I removed this comment and added an explanation to the class comment.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h@49
PS5, Line 49: Note that during tests this can include multiple
            : /// executors per ip address.
> any reason we are specifying this here? the class description already impli
There's an ambiguity between this method, GetAllExecutorIps(), and NumExecutors() that is only relevant during tests. I also added a comment to NumExecutors() and to the class itself to point it out.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h@a334
PS5, Line 334:
> we can add something like this back when we introduce exec groups.
Yeah, we'd probably want to add metrics for total backends and executors not in shutdown. Then the Python tests could use those instead of the /backends debug page. Should we file a follow-up Jira (might be a good ramp-up task) or fix it here?


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h@145
PS5, Line 145: the global executor
> nit: maybe call this an "executor group"?
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc@439
PS5, Line 439: // TODO(lars)
> marking this in case you missed it
Thanks, I think this is actually a good place to keep it (easier to test).


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc@654
PS5, Line 654: if (membership_snapshot->local_be_desc.get() == nullptr) {
             :   return Status("Local backend has not yet started");
             : }
> can this happen? as you can only submit queries once the impala server is r
There's a race between the ImpalaServer starting up and the statestore topic update triggering the registration of the local backend descriptor. The upside is a looser coupling between the classes. The alternative would be to explicitly call from the ImpalaServer into the ClusterMembershipMgr on startup to add the local backend descriptor, but then we'd have two paths concurrently updating the local snapshot, which would require more careful synchronization. I clarified the error message and added a comment.


--
To view, visit http://gerrit.cloudera.org:8080/13207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Gerrit-Change-Number: 13207
Gerrit-PatchSet: 5
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 May 2019 20:44:08 +0000
Gerrit-HasComments: Yes