Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8138 )
Change subject: WIP [consensus] adding/removing NON_VOTER members ...................................................................... Patch Set 7: (19 comments) http://gerrit.cloudera.org:8080/#/c/8138/6/src/kudu/consensus/leader_election.cc File src/kudu/consensus/leader_election.cc: http://gerrit.cloudera.org:8080/#/c/8138/6/src/kudu/consensus/leader_election.cc@167 PS6, Line 167: LOG_WITH_PREFIX(WARNING) << Substitute( > I thought about that, but decided that would be too much: if due do some bu Oh, I take it back. It seemed to me the RequestVotePB parameter arrives from some other peer, but that's not the case: RequestVotePB is generated by current peer and passed as a parameter into this constructor. So, after some consideration I think DCHECK() would be the best option here. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/leader_election.cc File src/kudu/consensus/leader_election.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/leader_election.cc@166 PS7, Line 166: // Only voters should start leader elections. > should this be a DCHECK instead of just a warning? wouldn't it indicate a b Right, Mike pointed at that already, but it seemed to me that this LeaderElection() constructor was called not only to start an election, but also to process VoteRequestPB when such a request arrives from a peer. I found it was wrong, so I'm adding that DCHECK() here. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/leader_election.cc@189 PS7, Line 189: Voters > nit: "Voter UUIDs" reads better for me Done http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/quorum_util-test.cc File src/kudu/consensus/quorum_util-test.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/quorum_util-test.cc@33 PS7, Line 33: The membership type : // is optional: if not specified, the membership type is set to VOTER > it doesn't look optional to me Yep, the comment is stale. Fixed. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/quorum_util-test.cc@173 PS7, Line 173: nmi_peer_uuid > what does 'nmi' stand for? nmi -- 'no member_type info'. agreed, that's not expressive one, will update. http://gerrit.cloudera.org:8080/#/c/8138/6/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: http://gerrit.cloudera.org:8080/#/c/8138/6/src/kudu/consensus/raft_consensus.cc@410 PS6, Line 410: DisableFailureDetector(); > I'm not sure about that. Ah, regarding DisableFailureDetector() I think you are right -- that might be better. The second part of my comment was about returning IllegalState() status, actually. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc@405 PS7, Line 405: if (PREDICT_FALSE(active_role != RaftPeerPB::LEADER && > isn't this condition guaranteed to be true because of the if statement just Right, but it's about voter-related role in general. I replaced that with IsVoterRole() to be less confusing. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc@405 PS7, Line 405: active_role != RaftPeerPB::LEADER && : active_role != RaftPeerPB::FOLLOWER > what about adding a utility function like DoesRoleVote(active_role) in quor That's a good idea. Done. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/consensus/raft_consensus.cc@410 PS7, Line 410: DisableFailureDetector(); > I'm not sure this line is necessary. Shouldn't we have never enabled it in Since we start the failure detection upon start of every replica, it's necessary to do that somewhere. I didn't find the place where things are done in a centralized way upon configuration change. OK, I need to introduce that for generic configuration change. No tests failed when removing this, but that's expected since in that case StartElection() return an error and in the log there are multiple messages like 'failed to trigger leader election: Illegal state: only voting members can start elections'. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3286 PS7, Line 3286: tablet cluster > I think just 'tablet' is clearer Done http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3293 PS7, Line 3293: it > nit: its Done http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3296 PS7, Line 3296: // * Tablet leader is established and it reports about the newly added replica > isn't this already done as soon as the config change is committed? the comp Right, committing Raft configuration update means the leader has been already established. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3299 PS7, Line 3299: and > 'a new one' Done http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3300 PS7, Line 3300: tablet cluster is available > not clear what this means. You mean that the 'updated leader is available a as the sentence explains, that means it's possible to insert data into corresponding table. Updated. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3301 PS7, Line 3301: is hosted by > contains Done http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3378 PS7, Line 3378: GetTabletCopyTgtSessionsCount > let's not abbreviate 'Target' here Done http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3410 PS7, Line 3410: ASSERT_OK(LeaderStepDown(leader, tablet_id, kTimeout)); > add a comment for this block here explaining why you're asking it to step d Done. Yep, that's about checking that the update cluster is able to elect a leader. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3423 PS7, Line 3423: // Ensure that the replicas converge. > curious: does this test infrastructure also poll the NON_VOTER replica to s Along with other verification steps, ClusterVerifier employs VerifyCommittedOpIdsMatch() which works with local files under tablet servers to verify that OpIds for all existent tablets match. In that sense, NON_VOTER replicas are covered as well, yes. I added corresponding comment. http://gerrit.cloudera.org:8080/#/c/8138/7/src/kudu/integration-tests/raft_consensus-itest.cc@3507 PS7, Line 3507: // Remove the newly added replica. > can we also assert here that the removed replica gets TOMBSTONED? That's a good idea, thanks! -- To view, visit http://gerrit.cloudera.org:8080/8138 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2662d45ad9bb6a4bf325d4202c2ee619ffad02b7 Gerrit-Change-Number: 8138 Gerrit-PatchSet: 7 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Fri, 06 Oct 2017 23:51:10 +0000 Gerrit-HasComments: Yes