Hello Mike Percy,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2479

to look at the new patch set (#2).

Change subject: client_failover-itest: fix flakiness with opid mismatches
......................................................................

client_failover-itest: fix flakiness with opid mismatches

This fixes a common source of flakiness, particularly in TSAN builds. The issue
was that we were assuming that, if the TestWorkload wrote N batches, that would
correspond exactly to N log operations on the server side. That actually isn't
the case -- there are some interleavings in which the client 'Batcher' can
split a single Flush call into multiple RPCs, and we don't make any strong
guarantees that a Flush is atomic, even though it is almost all the time.

The fix is simple: switch to single-row batches, which can't be split up
by the client.
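
To illustrate the counting argument, here is a toy model. It is not Kudu
code: CountLogOps and its "splits" input are hypothetical names used only
for this sketch, and it assumes each RPC becomes exactly one server-side log
operation. It shows why N multi-row flushes can yield more than N
operations, while N single-row flushes always yield exactly N.

```cpp
#include <cstddef>
#include <vector>

// Toy model (hypothetical, not the Kudu client): each entry in
// `splits_per_flush` is the number of extra cuts the client happens to make
// in one Flush. Each resulting RPC is assumed to map to one log operation.
std::size_t CountLogOps(std::size_t rows_per_batch,
                        const std::vector<std::size_t>& splits_per_flush) {
  std::size_t log_ops = 0;
  for (std::size_t splits : splits_per_flush) {
    // A batch of R rows can be cut into at most R pieces, so a
    // single-row batch always goes out as exactly one RPC.
    std::size_t rpcs = splits + 1;
    if (rpcs > rows_per_batch) rpcs = rows_per_batch;
    log_ops += rpcs;
  }
  return log_ops;
}
```

In this model, three flushes with 0, 1, and 3 attempted cuts produce 7
operations when each batch holds four rows, but exactly 3 when each batch
holds one row.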

On an earlier revision of this patch, I was able to run the
DeleteLeaderWhileScanning tests 5000 times in TSAN with only a few failures[1].
I ran 1000 on the latest revision[2]. The remaining failures seem to be an
unrelated data race on RaftConsensus shutdown.

[1] http://dist-test.cloudera.org/job?job_id=todd.1457413245.29963
[2] http://dist-test.cloudera.org/job?job_id=todd.1457464975.21171

Change-Id: Ib3df1b3f5b0903f069a5e7ae3ba2a64c1c52a427
---
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/test_workload.h
2 files changed, 10 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/2479/2
-- 
To view, visit http://gerrit.cloudera.org:8080/2479
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3df1b3f5b0903f069a5e7ae3ba2a64c1c52a427
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
