[ https://issues.apache.org/jira/browse/KAFKA-12851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387433#comment-17387433 ]
Jose Armando Garcia Sancio commented on KAFKA-12851:
----------------------------------------------------

The issue is with the test and not with the Raft implementation.

At a high level, the test runs with 5 voters until it reaches a high watermark of at least 10. At this point it partitions the network so that 3 nodes cannot send requests to the other 2 nodes and vice versa.

With the seed {{137014923570865933}}, right before the partition the nodes have the following state:
{code:java}
Node(id=0, hw=14, logEndOffset=25)
Node(id=1, hw=10, logEndOffset=14)
Node(id=2, hw=10, logEndOffset=14)
Node(id=3, hw=10, logEndOffset=22)
Node(id=4, hw=10, logEndOffset=14)
Node(id=5, hw=10, logEndOffset=14)
Node(id=6, hw=6, logEndOffset=18){code}
Nodes 5 and 6 are observers and do not participate in the quorum or in computing the high watermark.

Notice that two voters (0 and 3) already have a log end offset (LEO) greater than 20. The test now partitions the network so that nodes 0 and 1 can send requests to each other and nodes 2, 3 and 4 can send requests to each other. That means that only node 1 needs to reach offset 20 before the election timeout for leader 0 to advance the high watermark.
{code:java}
Node(id=0, hw=22, logEndOffset=34)
Node(id=1, hw=18, logEndOffset=22)
Node(id=2, hw=14, logEndOffset=18)
Node(id=3, hw=10, logEndOffset=25)
Node(id=4, hw=14, logEndOffset=18)
Node(id=5, hw=18, logEndOffset=25)
Node(id=6, hw=18, logEndOffset=29){code}
I think that the best way to fix the test is, after the partition, to wait for the high watermark to reach a value much larger than the LEO before the partition. In the trace above it would be a value much greater than 25.

> Flaky Test RaftEventSimulationTest.canMakeProgressIfMajorityIsReachable
> -----------------------------------------------------------------------
>
> Key: KAFKA-12851
> URL: https://issues.apache.org/jira/browse/KAFKA-12851
> Project: Kafka
> Issue Type: Bug
> Components: core, kraft
> Reporter: A. Sophie Blee-Goldman
> Assignee: Jose Armando Garcia Sancio
> Priority: Blocker
> Labels: kip-500
> Fix For: 3.0.0
>
> Attachments: Capture.PNG
>
>
> Failed twice on a [PR build|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10755/6/testReport/]
> h3. Stacktrace
> org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
> at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
> at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
> at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:35)
> at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:162)
> at org.apache.kafka.raft.RaftEventSimulationTest.canMakeProgressIfMajorityIsReachable(RaftEventSimulationTest.java:263)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
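
The comment's reasoning about why leader 0 can still advance the high watermark from inside the minority partition can be illustrated with a small sketch. This is a simplified model and not Kafka's actual code: the class {{HighWatermarkSketch}} and the method {{highWatermark}} are hypothetical names, and it assumes the high watermark is simply the largest offset reached by a majority of voters, using the log end offsets from the first trace as a stand-in for the fetch offsets the leader has recorded.

```java
import java.util.Arrays;

// Simplified model for illustration only; names are hypothetical, not Kafka's API.
final class HighWatermarkSketch {

    // Returns the largest offset that a strict majority of voters have reached.
    static long highWatermark(long[] voterLogEndOffsets) {
        long[] sorted = voterLogEndOffsets.clone();
        Arrays.sort(sorted); // ascending order
        int majority = sorted.length / 2 + 1;
        // The majority-th largest offset is present on at least `majority` voters.
        return sorted[sorted.length - majority];
    }

    public static void main(String[] args) {
        // Voters 0..4 right before the partition (observers 5 and 6 are excluded).
        long[] beforePartition = {25, 14, 14, 22, 14};
        System.out.println(highWatermark(beforePartition)); // prints 14

        // Only node 1 catches up to offset 20; nodes 0 and 3 were already past it.
        long[] node1CaughtUp = {25, 20, 14, 22, 14};
        System.out.println(highWatermark(node1CaughtUp)); // prints 20
    }
}
```

Under these assumptions, a target that is already covered by the pre-partition log end offsets of two voters needs only one more voter to be satisfied, which is why waiting for a value well above the largest pre-partition LEO (25 in the trace) is a stricter check.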