Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13207 )
Change subject: IMPALA-8460: Simplify cluster membership management
......................................................................


Patch Set 5:

(20 comments)

Thanks for the review. Please see my inline comments and PS6.

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/runtime/exec-env.cc@454
PS5, Line 454: // Register the ImpalaServer with the cluster membership manager
             : cluster_membership_mgr_->SetLocalBeDescFn([server]() {
             :   return server->GetLocalBackendDescriptor();
             : });
             : cluster_membership_mgr_->SetUpdateLocalServerFn(
             :     [server](const ClusterMembershipMgr::BackendAddressSet& current_backends) {
             :       server->CancelQueriesOnFailedBackends(current_backends);
             :     });
> just thinking out aloud: should we reset the callback functions during tear
These callbacks rely only on the server still being alive, not the ExecEnv, so the right thing to do would be to delete them when we unregister the ImpalaServer from the ExecEnv (which we currently don't). I added a DCHECK to make sure that we don't use this method to reset the ImpalaServer in the future. Since the ExecEnv destructor will destroy the ClusterMembershipMgr, I think we should be good without resetting its state explicitly.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@54
PS5, Line 54: /// Clients can also register callbacks to receive notifications of changes to the cluster
            : /// membership.
> this makes it sound like there is a generic way of registering callbacks li
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@68
PS5, Line 68: pool
> nit: executor pool. Is there a description of "executor pools" anywhere?
Replaced it with executor groups and added a brief description, which we can expand in the future.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@102
PS5, Line 102: then
> nit: when
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@112
PS5, Line 112: subscription
> nit: subscription.
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@117
PS5, Line 117: /// Registers a callback to provide the local backend descriptor.
             : void SetLocalBeDescFn(BackendDescriptorPtrFn fn);
             :
             : /// Registers a callback to notify the local ImpalaServer of changes in the cluster
             : /// membership. This callback will only be called when backends are deleted from the
             : /// membership.
             : void SetUpdateLocalServerFn(UpdateLocalServerFn fn);
             :
             : /// Registers a callback to notify the local Frontend of changes in the cluster
             : /// membership.
             : void SetUpdateFrontendFn(UpdateFrontendFn fn);
> I know we discussed this offline, but it might be worth documenting in this
The two clients have different needs: the ImpalaServer only needs to learn about deleted backends, whereas the Frontend needs the full list. In the future, the Frontend will also need the executor group sizes (but not the memberships, because groups will likely only be useful for remote read scenarios). Additionally, the callbacks have different signatures. Having a generic callback interface would require some sort of filtering of events (added, updated, deleted) and more complexity on the client side without a clear benefit. If the list of client classes expands, we can revisit this decision.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@165
PS5, Line 165: in
> nit: extra word
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@167
PS5, Line 167: / occur in 'current_backends'
> nit: outdated comment?
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.h@184
PS5, Line 184: May be NULL if the set of
             : /// backends is fixed.
> maybe mention that this is only true in tests
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@88
PS5, Line 88: update.is_delta && update.topic_entries.empty()
> just curious, when can we receive an empty delta
That's how the statestore polls its clients for updates: on every update interval it sends an empty delta, and the clients reply with their own updates. It cannot send an empty snapshot, because that would delete the whole state, and sending the full membership each time is too much overhead.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@127
PS5, Line 127: const vector<string>& groups = DEFAULT_EXECUTOR_GROUPS;
             : for (const string& group : groups) {
             :   (*new_executor_groups)[group].RemoveExecutor(be_desc);
             : }
> This kinda implies that an executor can be added to multiple groups. I thin
I added a comment and TODO to the class header; let me know if you'd like me to elaborate more.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@255
PS5, Line 255: auto
> nit: const auto&
Done. The const is redundant, but I added it for readability.
(See https://gerrit.cloudera.org/#/c/12065/8/be/src/exec/parquet/parquet-common.cc@95)


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@263
PS5, Line 263: const BackendIdMap::value_type&
> nit: const auto&
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/cluster-membership-mgr.cc@297
PS5, Line 297: auto
> nit: const auto&
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h
File be/src/scheduling/executor-group.h:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h@41
PS5, Line 41: The caller must make sure that the
            : /// executor group does not get updated while they hold the reference.
> would it make sense to document what is the expected usage pattern for this
I removed this comment and added an explanation to the class comment.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/executor-group.h@49
PS5, Line 49: Note that during tests this can include multiple
            : /// executors per ip address.
> any reason we are specifying this here? the class description already impli
There's an ambiguity between this method, GetAllExecutorIps(), and NumExecutors() that is only relevant during tests. I also added a comment to NumExecutors() and to the class itself to point it out.


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h@a334
PS5, Line 334:
> we can add something like this back when we introduce exec groups.
Yeah, we'd probably want to add metrics for total backends and executors not in shutdown. Then the Python tests could use those instead of the /backends debug page. Should we file a follow-up Jira (might be a good ramp-up task) or fix it here?


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.h@145
PS5, Line 145: the global executor
> nit: maybe call this an "executor group"?
Done


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc@439
PS5, Line 439: // TODO(lars)
> marking this in case you missed it
Thanks, I think this is actually a good place to keep it (easier to test).


http://gerrit.cloudera.org:8080/#/c/13207/5/be/src/scheduling/scheduler.cc@654
PS5, Line 654: if (membership_snapshot->local_be_desc.get() == nullptr) {
             :   return Status("Local backend has not yet started");
             : }
> can this happen? as you can only submit queries once the impala server is r
There's a race between the ImpalaServer starting up and the statestore topic update triggering the registration of the local backend descriptor. The upside is a looser coupling between the classes. The alternative would be to explicitly call from the ImpalaServer into the ClusterMembershipMgr on startup to add the local backend descriptor, but then we'd have two paths concurrently updating the local snapshot, which would require more careful synchronization. I clarified the error message and added a comment.


--
To view, visit http://gerrit.cloudera.org:8080/13207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Gerrit-Change-Number: 13207
Gerrit-PatchSet: 5
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 May 2019 20:44:08 +0000
Gerrit-HasComments: Yes