Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6299 to look at the new patch set (#7). Change subject: [consensus] add lock coverage for peer queue ...................................................................... [consensus] add lock coverage for peer queue A Peer or PeerQueue would lock without taking into account the presence of a peer calling back after TabletCopy and updating its members, which doesn't abide by external locks (like the queue lock). This would result in races. A PeerMessageQueue operation would lock with the assumption that its access to a peer was uninterrupted by other threads. This assumption is broken in cases where a PeerProxy object has a callback to access a peer (e.g. proxy calls StartTabletCopy with a callback that processes the response and updates the peer) from a separate thread (as is the case with the RpcHeartBeater). A Peer operation like SendNextRequest would lock under this assumption, only to potentially be interrupted by the same callback. This callback was not completely locked and would race with the send. This patch adds lock coverage to these cases, both seen in RaftConsensusITest.TestConfigChangeUnderLoad. TSAN errors and descriptions can be found here: https://github.com/andrwng/kudu/blob/89119693414e50280e33b170ee26de3047e0911e/docs/race_diagnostics/race_descriptions.txt Full TSAN logs can be found here: https://github.com/andrwng/kudu/blob/89119693414e50280e33b170ee26de3047e0911e/docs/race_diagnostics/race_type_1.txt https://github.com/andrwng/kudu/blob/89119693414e50280e33b170ee26de3047e0911e/docs/race_diagnostics/race_type_2.txt Dist-test results before and after: Before: Note that for these builds, status 1 is likely a data race. http://dist-test.cloudera.org/job?job_id=awong.1489029805.26963 http://dist-test.cloudera.org/job?job_id=awong.1489086157.18460 After: http://dist-test.cloudera.org/job?job_id=awong.1489096815.24680 Change-Id: I0e3f9603c471441d9b926cac3377fc6a5a2a1514 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h 3 files changed, 65 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/6299/7 -- To view, visit http://gerrit.cloudera.org:8080/6299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0e3f9603c471441d9b926cac3377fc6a5a2a1514 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org>