Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13207 )

Change subject: IMPALA-8460: Simplify cluster membership management
......................................................................


Patch Set 11:

(18 comments)

Thanks for the comments. Please see my inline replies and PS11. Please also 
have a brief look at this comment: 
https://gerrit.cloudera.org/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@67

http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc
File be/src/scheduling/cluster-membership-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@35
PS10, Line 35:
> Maybe it's worth mentioning that it's effectively simulating the statestore
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@48
PS10, Line 48: ///    cluster.
> I don't know about the pros/cons of Kudu's RNG but we have some test utilit
Thx. I used kudu::Random because I had used it in the trace_ratio change and I 
wasn't aware of our own utils. I agree that we should keep it consistent and 
updated the code.


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@64
PS10, Line 64:   /// The list of backends_ that owns all backends.
> It might be helpful to document the states that a backend can be in (and ho
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@86
PS10, Line 86:
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@86
PS10, Line 86:
> poll?
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@168
PS10, Line 168:     // Poll to obtain topic update
> I was trying to think if there were any interesting cases where different C
I agree with your thinking. Independent of the order, processing of updates in 
each CMM happens sequentially and they don't share state that depends on any 
racy behavior.

The only detail that could depend on the order in which a backend receives is 
multiple backends running on a single host which end up in a different order 
inside a coordinator's executor group. However, that's only relevant to the 
scheduling and having multiple backends on single host is not supported in 
production deployments.


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@174
PS10, Line 174: topi
> sent
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@178
PS10, Line 178:
> ?
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@180
PS10, Line 180:   /// ClusterMembershipMgr to make the change take effect. The 
resulting update from the
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@202
PS10, Line 202:     ASSERT_TRUE(is_running || is_quiescing);
> I'm not sure what "driving the interface with the statestore" really means.
Rephrased it, I wanted to say "this is how you use UpdateMembership() as if 
you're the statestore"


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@358
PS10, Line 358:
> const or constexpr
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@359
PS10, Line 359: T_EQ(
> These are constants, right, so "const double P_ADD"
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@363
PS10, Line 363:   }
> empty comment. Actually would be good to document that these are cumulative
Done


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@384
PS10, Line 384: /// removed from the cluster after having been quiesced before, 
or removed from the
> I feel like we should also be deleting non-quiescing backends, to simulate
That's what I meant by "killing" a backend below (L391). I added a comment to 
the counter in L367.


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@388
PS10, Line 388:   // TODO: Parameterize this test and run with several 
parameter sets
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@67
PS8, Line 67:   typedef std::shared_ptr<const TBackendDescriptor> 
BackendDescriptorPtr;
> Would BackendDescriptorSPtr work for you? I'd like to keep it from getting
Any preference here?


http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@79
PS8, Line 79:   // implicitly-defined default and copy constructors.
> I think both - i think they're pretty tightly coupled anyway. Implicit copy
That makes sense, thanks for the explanation. What do you think about the 
documentation in PS10?


http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc@299
PS8, Line 299:     LOG(WARNING) << "Error updating frontend membership 
snapshot: " << status.GetDetail();
> What change was this referring to?
The error here was really in L301, we need to compare 'local_be_desc' to its 
version inside state.current_backends, not against state.local_be_desc. It is 
fixed in PS10.



--
To view, visit http://gerrit.cloudera.org:8080/13207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Gerrit-Change-Number: 13207
Gerrit-PatchSet: 11
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 May 2019 22:26:48 +0000
Gerrit-HasComments: Yes

Reply via email to