Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13207 )

Change subject: IMPALA-8460: Simplify cluster membership management
......................................................................


Patch Set 10:

(15 comments)

Thanks for your improvements - I think this is good. I really like the test, 
it'll be valuable in future.

http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc
File be/src/scheduling/cluster-membership-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@35
PS10, Line 35: /// handling code in 4 subsequently more sophisticated ways:
Maybe it's worth mentioning that it's effectively simulating the statestore 
except updates are done serially


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@48
PS10, Line 48:     rng_.Reset(seed);
I don't know about the pros/cons of Kudu's RNG but we have some test utilities 
that handle seeding and allow overriding with an env variable - 
RandTestUtil::SeedRng(). I'd be inclined to keep things consistent within our 
test code.


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@64
PS10, Line 64:   /// Backends that are created but offline, i.e. they have no 
ClusterMembershipMgr and
It might be helpful to document the states that a backend can be in (and how 
that's represented in terms of membership in these lists).


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@86
PS10, Line 86: pull
poll?


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@168
PS10, Line 168:     // Broadcast to all other backends
I was trying to think if there were any interesting cases where different CMMs 
saw deltas with different timing, e.g. two executors quiesce simultaneously so 
don't see the other's quiescence before they mark themselves as quiesced. The 
test framework I think is simulating serial updates but the statestore polls 
for updates in parallel.

I think this is unimportant since state of other backends doesn't affect the 
deltas pushed out in any way and each key is single writer, but just wanted to 
double check that you agreed with my thinking.


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@174
PS10, Line 174: send
sent


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@178
PS10, Line 178:     // TODO Make these a function
?


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@202
PS10, Line 202: /// statestore directly.
I'm not sure what "driving the interface with the statestore" really means.


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@358
PS10, Line 358:
const or constexpr


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@359
PS10, Line 359: p_add
These are constants, right, so "const double P_ADD"


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@363
PS10, Line 363:   //
empty comment. Actually would be good to document that these are cumulative 
counts.


http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@384
PS10, Line 384:     if (p < p_delete && !quiescing_.empty()) {
I feel like we should also be deleting non-quiescing backends, to simulate 
non-graceful failures. I don't think it adds much complexity (famous last 
words).


http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@79
PS8, Line 79:   // implicitly-defined default and copy constructors.
> I'm not sure I'm following your thoughts behind documenting it. Should we d
I think both - i think they're pretty tightly coupled anyway. Implicit copy 
constructors for complex structs are a bit of a code smell - I'd be suspicious 
if I came across this code and think that maybe it was accidentally copying 
things. I can also see devs accidentally breaking copy construction if they 
didn't realise it was meant to be used that way.


http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc@188
PS8, Line 188: ckend explicit
> I went with 1. Do you think we should expose the fact that we rely on a hos
I guess I don't see a reason why we need to change the interface, so this is 
fine.


http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc@299
PS8, Line 299:     LOG(WARNING) << "Error updating frontend membership 
snapshot: " << status.GetDetail();
> Done
What change was this referring to?



--
To view, visit http://gerrit.cloudera.org:8080/13207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Gerrit-Change-Number: 13207
Gerrit-PatchSet: 10
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 May 2019 01:06:30 +0000
Gerrit-HasComments: Yes

Reply via email to