Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13207 )

Change subject: IMPALA-8460: Simplify cluster membership management
......................................................................


Patch Set 3:

(25 comments)

I did an initial quick pass, focused mainly on the interfaces and overall code 
structure. I haven't made a concerted effort to think about correctness and/or 
look for bugs yet but I thought it was worth flushing out the feedback at this 
point.

http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.h@96
PS2, Line 96:   /// A callback to notify the local ImpalaServer of changes to 
the cluster membership.
Should we document that no locks are held during by the calling thread of the 
callback? That's the trickiest problem I've generally seen with callbacks and 
it's not an issue here.


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@94
PS2, Line 94: not
nit: no


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@98
PS2, Line 98:   VLOG(3) << "Processing statestore update";
I feel like a lot of these messages could actually be VLOG(2) or even VLOG(1), 
since they're relatively rare and mean *something* happened, like a cluster 
membership change or this impalad registering itself. Don't need to fix it 
right now, but worth thinking about if it would help debugging in future.


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@108
PS2, Line 108:     // Make a copy. Currently no other code would modify the 
state but we take the lock as
We're assuming that only one thread executes the below code at a time, right, 
otherwise they'd race to update the sate. Can we use a utility to verify this - 
I believe DFAKE_SCOPED_LOCK_THREAD_LOCKED from gutil does this, but I'm just 
going on Todd's advice.

Ok to defer but would be nice to clarify invariants as part of refactor.


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@113
PS2, Line 113:   if (local_be_desc.get() != nullptr) {
Conditional fits on one line?


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@145
PS2, Line 145:       VLOG(2) << "Error deserializing membership topic item with 
key: " << item.key;
This seems like something we should definitely log at INFO or probably even 
WARNING level, since something went wrong!


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@152
PS2, Line 152:       VLOG(1) << "Ignoring subscription request with empty IP 
address from subscriber: "
Same here


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@201
PS2, Line 201: desccriptor
typo


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@242
PS2, Line 242:     LOG(WARNING) << "Failed to serialize Impala backend 
descriptor for statestore topic:"
It seems like this really should be a LOG(FATAL) or a CHECK since we never 
would expect this to happen and an Impalad that can't register itself with the 
SS is useless.

OK to ignore to avoid scope creep of refactor.


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@252
PS2, Line 252:   BackendAddressSet current_backend_set;
This is O(n^2) where n in the cluster size, in some sense because we 
potentially do O(n) work for each of O(n) registrations. I think this is 
probably OK though right? Because generally the membership changes won't be 
continual and this is fairly cheap.

I do feel like this is less error prone than trying to only send deltas, so I 
think I'm in favour.


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@282
PS2, Line 282:     const BackendDescriptorPtr& local_be_desc){
nit: whitespace


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@283
PS2, Line 283:   if (local_be_desc.get() == nullptr) return false;
I don't personally like the redundant .get() here but it is a matter of taste :)


http://gerrit.cloudera.org:8080/#/c/13207/2/be/src/scheduling/cluster-membership-mgr.cc@312
PS2, Line 312:             << " is shutting down";
return false here and on l308?


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h
File be/src/scheduling/executor-group.h:

http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@38
PS3, Line 38:   ExecutorGroup(const std::vector<TNetworkAddress>& executors);
Explicit constructors?


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@41
PS3, Line 41:   typedef std::list<TBackendDescriptor> ExecutorList;
Any reason this shouldn't be a vector? That's generally a better default. It 
doesn't look like we need stable iterators into a list, I don't think and I 
don't see anywhere where the asymptotic complexity is better.


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@43
PS3, Line 43:   /// Return the list of executors on a particular host. The 
caller must make sure that
What's the thread safety story for the returned list?


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@47
PS3, Line 47:   void GetAllExecutorIps(std::vector<IpAddr>* ip_addresses) const;
Can you document whether these append to the vectors or replace the contents. 
And if it replaces the contents, why not just return a new vector?


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@49
PS3, Line 49:   void AddExecutor(const TBackendDescriptor& be_desc);
I think these could do with a basic description. There may be some details 
worth documenting, e.g. what happens if the executor already exists in the 
group.


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@62
PS3, Line 62:   /// lifetime of this ExecutorGroup.
What happens if the executor is removed with RemoveExecutor()?


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@65
PS3, Line 65:   const HashRing* GetHashRing() const { return 
&executor_ip_hash_ring_; }
Maybe worth documenting lifetime/thread safety of this?


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.h@71
PS3, Line 71:   typedef boost::unordered_map<IpAddr, ExecutorList> ExecutorMap;
Maybe a good time to switch these to std::unordered_map?


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.cc
File be/src/scheduling/executor-group.cc:

http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.cc@54
PS3, Line 54:   ip_addresses->reserve(NumExecutors());
This isn't right if ip_addresses already has entries. Not sure if this happens 
in practice - if it doesn't though we should just return the vector I think.


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/executor-group.cc@109
PS3, Line 109: LIKELY
LIKELY annotation probably isn't necessary. Ok to leave.


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

PS3:
It's nice that the scheduler has become more focused on scheduling and less 
stateful.


http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13207/3/be/src/service/impala-server.cc@a1744
PS3, Line 1744:
:) Nice to see this file get slightly simpler



--
To view, visit http://gerrit.cloudera.org:8080/13207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Gerrit-Change-Number: 13207
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 17:26:49 +0000
Gerrit-HasComments: Yes

Reply via email to