[
https://issues.apache.org/jira/browse/KUDU-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938266#comment-16938266
]
Adar Dembo commented on KUDU-2958:
--
Good eye; I agree that the second SleepFor was probably keeping the test
passing.
That said, I'm not so sure why. With one tserver down, the second round of row
insertion must by necessity update both remaining replicas, so by the time we
call CountRowsFromClient, all live replicas should have all 200 rows. Maybe
it's sufficient for the FOLLOWER replica to make the rows durable in its WAL
and not necessarily apply them to the MRS, and then we choose to scan the
FOLLOWER? That'd explain the failure.
Anyway, the purpose of the test appears to be to verify that we can write after
killing the leader, so a LEADER_ONLY scan following the second batch of inserts
should be fine.
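
To make the guess above concrete, here is a minimal toy model in C++. It is
not Kudu code: ToyReplica, ToySelection, ToyCountRows and
ToyStateAfterSecondBatch are hypothetical names made up for illustration, and
the model simply assumes the behavior speculated about above (a follower can
have rows durable in its WAL before applying them, and a FIRST_REPLICA-style
scan may land on that follower).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a tablet replica (hypothetical, not a Kudu class).
struct ToyReplica {
  bool is_leader = false;
  std::size_t wal_rows = 0;      // rows replicated and durable in the WAL
  std::size_t applied_rows = 0;  // rows applied and therefore visible to scans

  void Replicate(std::size_t n) { wal_rows += n; }
  void ApplyPending() { applied_rows = wal_rows; }
};

// Hypothetical analogue of the client's replica selection modes.
enum class ToySelection { kLeaderOnly, kFirstReplica };

// Counts the rows visible on whichever replica the selection mode picks.
// Assumption: "first replica" is modeled as the first element of the vector.
std::size_t ToyCountRows(const std::vector<ToyReplica>& replicas,
                         ToySelection selection) {
  assert(!replicas.empty());
  if (selection == ToySelection::kFirstReplica) {
    return replicas.front().applied_rows;
  }
  for (const ToyReplica& r : replicas) {
    if (r.is_leader) return r.applied_rows;
  }
  return 0;  // no leader in this toy state
}

// State right after the second batch of 100 rows: both live replicas have all
// 200 rows in their WALs, but only the new leader has applied the second batch.
std::vector<ToyReplica> ToyStateAfterSecondBatch() {
  ToyReplica follower;
  follower.Replicate(100);
  follower.ApplyPending();
  follower.Replicate(100);  // durable, not yet applied

  ToyReplica leader;
  leader.is_leader = true;
  leader.Replicate(100);
  leader.ApplyPending();
  leader.Replicate(100);
  leader.ApplyPending();

  return {follower, leader};
}
```

In this model a scan that lands on the follower sees 100 rows, matching the
failure in the report, while a leader-only scan sees all 200. Whether Kudu
actually behaves this way is exactly the open question above.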
> ClientTest.TestReplicatedTabletWritesWithLeaderElection is flaky
>
>
> Key: KUDU-2958
> URL: https://issues.apache.org/jira/browse/KUDU-2958
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Alexey Serbin
> Priority: Major
> Attachments: client-test.5.txt.xz
>
>
> The {{TestReplicatedTabletWritesWithLeaderElection}} scenario of the
> {{client-test}} is flaky. From time to time, in the ASAN build configuration,
> it fails with the following error:
> {noformat}
> I0924 20:26:19.869351 14037 client-test.cc:4304] Counting rows...
>
> src/kudu/client/client-test.cc:4308: Failure
> Expected: 2 * kNumRowsToWrite
>
> Which is: 200
>
> To be equal to: CountRowsFromClient(table.get(), KuduClient::FIRST_REPLICA,
> KuduScanner::READ_LATEST, kNoBound, kNoBound)
> Which is: 100
> {noformat}
> It seems there is an implicit assumption in the test about fast propagation
> of Raft transactions to follower replicas.
> I attached the full log of the failed test scenario.