Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13207 )
Change subject: IMPALA-8460: Simplify cluster membership management ...................................................................... Patch Set 10: (15 comments) Thanks for your improvements - I think this is good. I really like the test, it'll be valuable in future. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc File be/src/scheduling/cluster-membership-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@35 PS10, Line 35: /// handling code in 4 subsequently more sophisticated ways: Maybe it's worth mentioning that it's effectively simulating the statestore except updates are done serially http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@48 PS10, Line 48: rng_.Reset(seed); I don't know about the pros/cons of Kudu's RNG but we have some test utilities that handle seeding and allow overriding with an env variable - RandTestUtil::SeedRng(). I'd be inclined to keep things consistent within our test code. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@64 PS10, Line 64: /// Backends that are created but offline, i.e. they have no ClusterMembershipMgr and It might be helpful to document the states that a backend can be in (and how that's represented in terms of membership in these lists). http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@86 PS10, Line 86: pull poll? http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@168 PS10, Line 168: // Broadcast to all other backends I was trying to think if there were any interesting cases where different CMMs saw deltas with different timing, e.g. two executors quiesce simultaneously so don't see the other's quiescence before they mark themselves as quiesced. The test framework I think is simulating serial updates but the statestore polls for updates in parallel. I think this is unimportant since state of other backends doesn't affect the deltas pushed out in any way and each key is single writer, but just wanted to double check that you agreed with my thinking. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@174 PS10, Line 174: send sent http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@178 PS10, Line 178: // TODO Make these a function ? http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@202 PS10, Line 202: /// statestore directly. I'm not sure what "driving the interface with the statestore" really means. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@358 PS10, Line 358: const or constexpr http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@359 PS10, Line 359: p_add These are constants, right, so "const double P_ADD" http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@363 PS10, Line 363: // empty comment. Actually would be good to document that these are cumulative counts. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@384 PS10, Line 384: if (p < p_delete && !quiescing_.empty()) { I feel like we should also be deleting non-quiescing backends, to simulate non-graceful failures. I don't think it adds much complexity (famous last words). http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@79 PS8, Line 79: // implicitly-defined default and copy constructors. > I'm not sure I'm following your thoughts behind documenting it. Should we d I think both - i think they're pretty tightly coupled anyway. Implicit copy constructors for complex structs are a bit of a code smell - I'd be suspicious if I came across this code and think that maybe it was accidentally copying things. I can also see devs accidentally breaking copy construction if they didn't realise it was meant to be used that way. http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc@188 PS8, Line 188: ckend explicit > I went with 1. Do you think we should expose the fact that we rely on a hos I guess I don't see a reason why we need to change the interface, so this is fine. http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc@299 PS8, Line 299: LOG(WARNING) << "Error updating frontend membership snapshot: " << status.GetDetail(); > Done What change was this referring to? -- To view, visit http://gerrit.cloudera.org:8080/13207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3 Gerrit-Change-Number: 13207 Gerrit-PatchSet: 10 Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 17 May 2019 01:06:30 +0000 Gerrit-HasComments: Yes