Todd Lipcon has submitted this change and it was merged.

Change subject: client_failover-itest: fix flakiness with opid mismatches
......................................................................

client_failover-itest: fix flakiness with opid mismatches

This fixes a common source of flakiness, particularly in TSAN builds.
The issue was that we were assuming that, if the TestWorkload wrote N
batches, that would correspond to exactly N log operations on the
server side. That actually isn't the case -- there are some
interleavings in which the client 'Batcher' can split a single Flush
call into multiple RPCs, and we don't make any strong guarantees that a
Flush is atomic, even though it is almost all the time.

The fix is simple: switch to single-row batches, which can't be split
up in the client.

On an earlier revision of this patch, I was able to run the
DeleteLeaderWhileScanning tests 5000 times in TSAN with only a few
failures[1]. I ran 1000 on the latest revision[2]. The remaining
failures seem to be an unrelated data race on RaftConsensus shutdown.

[1] http://dist-test.cloudera.org/job?job_id=todd.1457413245.29963
[2] http://dist-test.cloudera.org/job?job_id=todd.1457464975.21171

Change-Id: Ib3df1b3f5b0903f069a5e7ae3ba2a64c1c52a427
Reviewed-on: http://gerrit.cloudera.org:8080/2479
Reviewed-by: Mike Percy <[email protected]>
Tested-by: Todd Lipcon <[email protected]>
---
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/test_workload.h
2 files changed, 10 insertions(+), 0 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Todd Lipcon: Verified

--
To view, visit http://gerrit.cloudera.org:8080/2479
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib3df1b3f5b0903f069a5e7ae3ba2a64c1c52a427
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
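
Illustrative note on the commit message's reasoning: the sketch below is not Kudu code. It is a small standalone model, under the assumption that each RPC becomes exactly one server-side log operation and that a flush of r rows can be sent as anywhere from 1 to r RPCs (a single row cannot be split). The names LogOpRange, PossibleLogOps and LogOpCountIsExact are hypothetical and exist only for this example. It shows why the "N batches means N log operations" assumption only holds for single-row batches.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not part of Kudu: the range of server-side log
// operations that a given number of client flushes could produce.
struct LogOpRange {
  std::int64_t min_ops;  // every flush goes out as a single RPC
  std::int64_t max_ops;  // every flush is split into one RPC per row
};

// Assumption: one RPC == one log operation, and a flush of
// rows_per_flush rows may be split into 1..rows_per_flush RPCs.
LogOpRange PossibleLogOps(std::int64_t num_flushes,
                          std::int64_t rows_per_flush) {
  assert(num_flushes >= 0 && rows_per_flush >= 1);
  return LogOpRange{num_flushes, num_flushes * rows_per_flush};
}

// True when the test can predict the log operation count exactly,
// i.e. when no interleaving can change the number of RPCs sent.
bool LogOpCountIsExact(std::int64_t num_flushes,
                       std::int64_t rows_per_flush) {
  const LogOpRange range = PossibleLogOps(num_flushes, rows_per_flush);
  return range.min_ops == range.max_ops;
}
```

In this model, 100 flushes of 50 rows each could yield anywhere from 100 to 5000 log operations, while 100 single-row flushes always yield exactly 100, which is the property the test relies on after the fix.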
