Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1778. Fix LMP mismatch behavior after a replica restarts
......................................................................

KUDU-1778. Fix LMP mismatch behavior after a replica restarts

This fixes an issue seen in a stress test after a cluster restart. Both
replicas had an LMP mismatch with the leader, and the tablet was unable
to make progress. The issue turned out to be that the followers were
returning 0 as their committed index, and the leader then tried to fall
back to index 1. That index had already been GCed, and thus the leader
was unable to send any operations to the followers.

I tested this patch in the same stress test environment and the issue
didn't reproduce.

This also includes a test which failed without the fix. I looped the
new test 500 times and it passed.

Change-Id: I8f1332d605f7f846a01923b3ab92f12d73462bba
Reviewed-on: http://gerrit.cloudera.org:8080/5309
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/log_verifier.cc
M src/kudu/integration-tests/log_verifier.h
M src/kudu/integration-tests/raft_consensus-itest.cc
8 files changed, 155 insertions(+), 34 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/5309

Gerrit-MessageType: merged
Gerrit-Change-Id: I8f1332d605f7f846a01923b3ab92f12d73462bba
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
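
For readers unfamiliar with the failure mode in the commit message, here is a rough sketch of why a reported committed index of 0 can stall replication. It is a simplified model, not code from this patch: `LeaderLog`, `FallbackIndex`, and `CanSendFrom` are hypothetical names. The rule "fall back to committed index + 1" is inferred only from the message's statement that the leader tried to fall back to index 1. The remedy the patch applies is not described in the message, so it is not modeled here.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the leader's log after garbage collection.
// These are illustrative names, not Kudu's actual classes.
struct LeaderLog {
  int64_t first_retained_index;  // entries below this index have been GCed
  int64_t last_index;            // newest entry in the leader's log
};

// Assumed fallback rule: after a mismatch with the follower's log, resume
// sending from the entry right after the follower's reported committed index.
int64_t FallbackIndex(int64_t follower_committed_index) {
  return follower_committed_index + 1;
}

// The leader can only send operations that are still present in its log.
bool CanSendFrom(const LeaderLog& log, int64_t index) {
  return index >= log.first_retained_index && index <= log.last_index;
}
```

With a log that retains only entries 100 through 250, a follower reporting 0 yields a fallback index of 1, which `CanSendFrom` rejects, so no operations can be sent. A follower reporting 120 would yield 121, which is still readable.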
