Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13207 )
Change subject: IMPALA-8460: Simplify cluster membership management ...................................................................... Patch Set 11: (18 comments) Thanks for the comments. Please see my inline replies and PS11. Please also have a brief look at this comment: https://gerrit.cloudera.org/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@67 http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc File be/src/scheduling/cluster-membership-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@35 PS10, Line 35: > Maybe it's worth mentioning that it's effectively simulating the statestore Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@48 PS10, Line 48: /// cluster. > I don't know about the pros/cons of Kudu's RNG but we have some test utilit Thx. I used kudu::Random because I had used it in the trace_ratio change and I wasn't aware of our own utils. I agree that we should keep it consistent and updated the code. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@64 PS10, Line 64: /// The list of backends_ that owns all backends. > It might be helpful to document the states that a backend can be in (and ho Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@86 PS10, Line 86: > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@86 PS10, Line 86: > poll? Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@168 PS10, Line 168: // Poll to obtain topic update > I was trying to think if there were any interesting cases where different C I agree with your thinking. Independent of the order, processing of updates in each CMM happens sequentially and they don't share state that depends on any racy behavior. The only detail that could depend on the order in which a backend receives is multiple backends running on a single host which end up in a different order inside a coordinator's executor group. However, that's only relevant to the scheduling and having multiple backends on single host is not supported in production deployments. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@174 PS10, Line 174: topi > sent Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@178 PS10, Line 178: > ? Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@180 PS10, Line 180: /// ClusterMembershipMgr to make the change take effect. The resulting update from the > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@202 PS10, Line 202: ASSERT_TRUE(is_running || is_quiescing); > I'm not sure what "driving the interface with the statestore" really means. Rephrased it, I wanted to say "this is how you use UpdateMembership() as if you're the statestore" http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@358 PS10, Line 358: > const or constexpr Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@359 PS10, Line 359: T_EQ( > These are constants, right, so "const double P_ADD" Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@363 PS10, Line 363: } > empty comment. Actually would be good to document that these are cumulative Done http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@384 PS10, Line 384: /// removed from the cluster after having been quiesced before, or removed from the > I feel like we should also be deleting non-quiescing backends, to simulate That's what I meant by "killing" a backend below (L391). I added a comment to the counter in L367. http://gerrit.cloudera.org:8080/#/c/13207/10/be/src/scheduling/cluster-membership-mgr-test.cc@388 PS10, Line 388: // TODO: Parameterize this test and run with several parameter sets > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h File be/src/scheduling/cluster-membership-mgr.h: http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@67 PS8, Line 67: typedef std::shared_ptr<const TBackendDescriptor> BackendDescriptorPtr; > Would BackendDescriptorSPtr work for you? I'd like to keep it from getting Any preference here? http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.h@79 PS8, Line 79: // implicitly-defined default and copy constructors. > I think both - i think they're pretty tightly coupled anyway. Implicit copy That makes sense, thanks for the explanation. What do you think about the documentation in PS10? http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/13207/8/be/src/scheduling/cluster-membership-mgr.cc@299 PS8, Line 299: LOG(WARNING) << "Error updating frontend membership snapshot: " << status.GetDetail(); > What change was this referring to? The error here was really in L301, we need to compare 'local_be_desc' to its version inside state.current_backends, not against state.local_be_desc. It is fixed in PS10. -- To view, visit http://gerrit.cloudera.org:8080/13207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3 Gerrit-Change-Number: 13207 Gerrit-PatchSet: 11 Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 17 May 2019 22:26:48 +0000 Gerrit-HasComments: Yes