Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6299

to look at the new patch set (#7).

Change subject: [consensus] add lock coverage for peer queue
......................................................................

[consensus] add lock coverage for peer queue

A Peer or PeerQueue would lock without taking into account the
presence of a peer calling back after TabletCopy and updating its
members, which doesn't abide by external locks (like the queue lock).
This would result in races.

A PeerMessageQueue operation would lock with the assumption that its
access to a peer was uninterrupted by other threads. This assumption
is broken in cases where a PeerProxy object has a callback to access a
peer (e.g. proxy calls StartTabletCopy with a callback that processes
the response and updates the peer) from a separate thread (as is the
case with the RpcHeartBeater).

A Peer operation like SendNextRequest would lock under this assumption,
only to potentially be interrupted by the same callback. This callback
was not completely locked and would race with the send.

This patch adds lock coverage to these cases, both seen in
RaftConsensusITest.TestConfigChangeUnderLoad.

TSAN errors and descriptions can be found here:
https://github.com/andrwng/kudu/blob/89119693414e50280e33b170ee26de3047e0911e/docs/race_diagnostics/race_descriptions.txt

Full TSAN logs can be found here:
https://github.com/andrwng/kudu/blob/89119693414e50280e33b170ee26de3047e0911e/docs/race_diagnostics/race_type_1.txt
https://github.com/andrwng/kudu/blob/89119693414e50280e33b170ee26de3047e0911e/docs/race_diagnostics/race_type_2.txt

Dist-test results before and after:
Before: Note that for these builds, status 1 is likely a data race.
http://dist-test.cloudera.org/job?job_id=awong.1489029805.26963
http://dist-test.cloudera.org/job?job_id=awong.1489086157.18460

After:
http://dist-test.cloudera.org/job?job_id=awong.1489096815.24680

Change-Id: I0e3f9603c471441d9b926cac3377fc6a5a2a1514
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
3 files changed, 65 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/6299/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e3f9603c471441d9b926cac3377fc6a5a2a1514
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to