Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8138 )
Change subject: WIP [consensus] adding/removing NON_VOTER members ...................................................................... Patch Set 7: (17 comments) Looks pretty good. I agree we don't need to include CLI tools into this same patch - it's already 500+ lines, no need to make it bigger :) http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/leader_election.cc File src/kudu/consensus/leader_election.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/leader_election.cc@166 PS7, Line 166: // Only voters should start leader elections. should this be a DCHECK instead of just a warning? wouldn't it indicate a bug if we tried to start a leader-election as a non-voter? seems like we already have similar checks in RaftConsensus::StartElection which is where it might be more appropriate to double-check that we are an appropriate role. (we already check for NON_PARTICIPANT) http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/leader_election.cc@189 PS7, Line 189: Voters nit: "Voter UUIDs" reads better for me http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/quorum_util-test.cc File src/kudu/consensus/quorum_util-test.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/quorum_util-test.cc@33 PS7, Line 33: The membership type : // is optional: if not specified, the membership type is set to VOTER it doesn't look optional to me http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/quorum_util-test.cc@173 PS7, Line 173: nmi_peer_uuid what does 'nmi' stand for? http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc@405 PS7, Line 405: if (PREDICT_FALSE(active_role != RaftPeerPB::LEADER && isn't this condition guaranteed to be true because of the if statement just above? http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc@405 PS7, Line 405: active_role != RaftPeerPB::LEADER && : active_role != RaftPeerPB::FOLLOWER what about adding a utility function like DoesRoleVote(active_role) in quorum_util so that this is centalized? http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc@410 PS7, Line 410: DisableFailureDetector(); I'm not sure this line is necessary. Shouldn't we have never enabled it in the first place, in which case the only reason we'd get to StartElection is from a remote request or from some kind of delayed callback across a term switch? I'm curious: If you remove this line do any tests fail? http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3286 PS7, Line 3286: tablet cluster I think just 'tablet' is clearer http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3293 PS7, Line 3293: it nit: its http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3296 PS7, Line 3296: // * Tablet leader is established and it reports about the newly added replica isn't this already done as soon as the config change is committed? the completion of the tablet copy should have no bearing on the leader election or reporting of the config change to the master. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3299 PS7, Line 3299: and 'a new one' http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3300 PS7, Line 3300: tablet cluster is available not clear what this means. You mean that the 'updated leader is available at the master'? http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3301 PS7, Line 3301: is hosted by contains http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3378 PS7, Line 3378: GetTabletCopyTgtSessionsCount let's not abbreviate 'Target' here http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3410 PS7, Line 3410: ASSERT_OK(LeaderStepDown(leader, tablet_id, kTimeout)); add a comment for this block here explaining why you're asking it to step down? (it's just verifying that it can re-elect?) http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3423 PS7, Line 3423: // Ensure that the replicas converge. curious: does this test infrastructure also poll the NON_VOTER replica to see that it is up to date? http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3507 PS7, Line 3507: // Remove the newly added replica. can we also assert here that the removed replica gets TOMBSTONED? -- To view, visit http://gerrit.cloudera.org:8080/8138 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2662d45ad9bb6a4bf325d4202c2ee619ffad02b7 Gerrit-Change-Number: 8138 Gerrit-PatchSet: 7 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Fri, 06 Oct 2017 01:02:48 +0000 Gerrit-HasComments: Yes