Hello Fengling Wang, Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11251

to look at the new patch set (#8).

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................

KUDU-2245 Graceful leadership transfer

This patch implements graceful leadership transfer, as described in the
original Raft thesis. It has the following steps:

1. An admin client sends a request to the tablet leader for it to
   transfer leadership. The client can indicate a specific voter that it
   wants to become the leader, or it can allow the current leader to
   choose its successor.
2. The leader receives the request and begins a leader transfer period.
   During a leader transfer period, the leader does not accept writes or
   config change requests. This allows followers to catch up to the
   leader. A background timer expires the transfer period after one
   election timeout, since clients should be able to ride over
   interruptions in service lasting at least that long.
3. During the transfer period, the leader continues to update peers.
   When it receives a response from a peer, it checks whether that peer
   is a voter and fully caught up to the leader's log. If it is, and it
   is also the designated successor (when one was provided), the leader
   signals the peer to start an election, which it should win. If no
   eligible successor appears, the transfer period expires and the
   leader resumes normal operation.

This is an improvement over the current leader step down method, which
causes the leader to simply relinquish leadership and snooze its
election timer for an extra long period, so that another voter will
likely become leader. Leadership transfer should usually be much faster,
and it allows the client to select the new leader from among the current
voters. However, note that it does not provide strictly better
guarantees: it is still possible that leadership will not be
transferred.
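The leader-side rules in steps 2 and 3 can be modeled as a small state machine. The sketch below is a toy model, not Kudu's RaftConsensus code: `LeaderTransferSketch`, `PeerStatus`, `begin_transfer_period`, `append_write`, and `on_peer_response` are hypothetical names, and the choice to signal at most one peer per transfer period is an assumption made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PeerStatus:
    """Hypothetical view of a peer taken from its latest update response."""
    uuid: str
    is_voter: bool
    last_received_index: int  # highest log index the peer has acknowledged


class LeaderTransferSketch:
    """Toy model of a leader's transfer period (hypothetical, not Kudu's API)."""

    def __init__(self, election_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.election_timeout_s = election_timeout_s
        self.clock = clock
        self.last_log_index = 0
        self._deadline: Optional[float] = None
        self._successor: Optional[str] = None
        self._signaled = False

    def begin_transfer_period(self, successor: Optional[str] = None) -> None:
        # Step 2: the period lasts one election timeout from now.
        self._deadline = self.clock() + self.election_timeout_s
        self._successor = successor
        self._signaled = False

    def in_transfer_period(self) -> bool:
        # The period expires on its own if no eligible successor shows up.
        if self._deadline is not None and self.clock() >= self._deadline:
            self._deadline = None
            self._successor = None
        return self._deadline is not None

    def append_write(self) -> bool:
        # Writes are refused during the period so followers can catch up.
        if self.in_transfer_period():
            return False
        self.last_log_index += 1
        return True

    def on_peer_response(self, peer: PeerStatus) -> bool:
        """Step 3: True means 'tell this peer to start an election now'."""
        # Assumption: at most one peer is signaled per transfer period.
        if not self.in_transfer_period() or self._signaled:
            return False
        if not peer.is_voter:
            return False
        if peer.last_received_index < self.last_log_index:
            return False  # not fully caught up to the leader's log
        if self._successor is not None and peer.uuid != self._successor:
            return False  # the client asked for a specific successor
        self._signaled = True
        return True
```

Refusing writes during the period is what lets a follower's acknowledged index actually reach the leader's last log index; on a busy tablet the log tail could otherwise stay ahead of every peer, and no successor would ever qualify.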
I ran TestRepeatLeaderStepDown and TestGracefulLeaderStepDown 1000 times
and 200 times each, in debug and TSAN modes, with 4 stress threads, and
saw no failures.

Still WIP because I want to:
* Run some dist-test loops of the rebalancer tests, which now use
  graceful leadership transfer.
* Add a test or two for bad cases where the leadership transfer period
  should expire.
* Quantify how much faster leadership transfer is than abrupt stepdown,
  at least in a lab environment.

Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_replica_util.cc
M src/kudu/tools/tool_replica_util.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 815 insertions(+), 55 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/11251/8
--
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>