Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13207 )
Change subject: IMPALA-8460: Simplify cluster membership management ...................................................................... Patch Set 7: (3 comments) Had a few comments, still need to complete review http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc@454 PS5, Line 454: impala_server_ = server; : // Register the ImpalaServer with the cluster membership manager : cluster_membership_mgr_->SetLocalBeDescFn([server]() { : return server->GetLocalBackendDescriptor(); : }); : cluster_membership_mgr_->SetUpdateLocalServerFn( : [server](const ClusterMembershipMgr::BackendAddressSet& current_backends) { : > Thanks the DCHECK would definitely help. I was also just thinking of a situ We shouldn't be running ~ExecEnv outside of backend tests AFAIK. http://gerrit.cloudera.org:8080/#/c/13207/6/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/13207/6/be/src/scheduling/cluster-membership-mgr.h@125 PS6, Line 125: /// membership. This callback will only be called when backends are deleted from the Unfortunately I don't think these Set*Fn() functions are really thread-safe, and it looks like Init() registers ClusterMembershipMgr with the statestore subscriber, so it's possible that a statestore update could race with these functions getting updated. It's probably unlikely to lead to a bug so long as these functions are always set from non-NULL to NULL, but it hard to reason about given that std::function isn't just a simple pointer, so the the reader could see a "torn" state. I think it would also trip TSAN, etc. I think we could solve it two ways: * Require that all the callbacks are registered before Init() is called. * Add some synchronisations around the function pointers. I don't think any of this is on a critical path, so I wouldn't be too concerned about a simple SpinLock. I think the first would be cleaner, although I understand there are some initialisation order issues between ExecEnv and ImpalaServer. Maybe we can defer initialisation of ClusterMembershipMgr until after SetImpalaServer() is called or something like that? http://gerrit.cloudera.org:8080/#/c/13207/6/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13207/6/be/src/scheduling/cluster-membership-mgr.cc@54 PS6, Line 54: void ClusterMembershipMgr::SetLocalBeDescFn(BackendDescriptorPtrFn fn) { Maybe DCHECK that it's non-NULL here and in the other functions below. -- To view, visit http://gerrit.cloudera.org:8080/13207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3 Gerrit-Change-Number: 13207 Gerrit-PatchSet: 7 Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 08 May 2019 20:51:40 +0000 Gerrit-HasComments: Yes