[ 
https://issues.apache.org/jira/browse/KAFKA-12851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387433#comment-17387433
 ] 

Jose Armando Garcia Sancio commented on KAFKA-12851:
----------------------------------------------------

The issue is with the test and not with the Raft implementation. At a high 
level, the test runs with 5 voters until it reaches a high watermark of at 
least 10. At this point it partitions the network so that 3 nodes cannot send 
requests to the other 2 nodes and vice versa. With the seed 
{{137014923570865933}}, the nodes have the following state right before the 
partition:
{code:java}
Node(id=0, hw=14, logEndOffset=25)
Node(id=1, hw=10, logEndOffset=14)
Node(id=2, hw=10, logEndOffset=14)
Node(id=3, hw=10, logEndOffset=22)
Node(id=4, hw=10, logEndOffset=14)
Node(id=5, hw=10, logEndOffset=14)
Node(id=6, hw=6, logEndOffset=18){code}
Nodes 5 and 6 are observers and do not participate in the quorum or in the 
high watermark calculation. Notice that two nodes (0 and 3) have an LEO 
greater than 20.

The test now partitions the network so that nodes 0 and 1 can send requests to 
each other and nodes 2, 3 and 4 can send requests to each other. Because nodes 
0 and 3 are already past offset 20, only node 1 needs to reach offset 20 
before the election timeout for the leader, node 0, to advance the high 
watermark.
{code:java}
Node(id=0, hw=22, logEndOffset=34)
Node(id=1, hw=18, logEndOffset=22)
Node(id=2, hw=14, logEndOffset=18)
Node(id=3, hw=10, logEndOffset=25)
Node(id=4, hw=14, logEndOffset=18)
Node(id=5, hw=18, logEndOffset=25)
Node(id=6, hw=18, logEndOffset=29){code}
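The high watermark values in the two traces can be reproduced with a small sketch. It assumes that the high watermark is the largest offset known to be replicated on a majority of voters, and that after the partition leader 0 still remembers the last offsets it saw for the unreachable voters 2, 3 and 4 (14, 22 and 14). The class and method names are hypothetical and are not part of the Kafka code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HighWatermarkSketch {
    // Hypothetical helper: returns the largest offset that a majority of
    // voters are known to have replicated (observers are excluded).
    static long highWatermark(List<Long> voterEndOffsets) {
        List<Long> sorted = new ArrayList<>(voterEndOffsets);
        sorted.sort(Collections.reverseOrder());
        int majority = sorted.size() / 2 + 1;
        return sorted.get(majority - 1);
    }

    public static void main(String[] args) {
        // Log end offsets of voters 0..4 right before the partition.
        List<Long> before = List.of(25L, 14L, 14L, 22L, 14L);
        System.out.println(highWatermark(before)); // prints 14

        // Assumed view of leader 0 after the partition: nodes 0 and 1 have
        // advanced, nodes 2, 3 and 4 are frozen at their last known offsets.
        List<Long> after = List.of(34L, 22L, 14L, 22L, 14L);
        System.out.println(highWatermark(after)); // prints 22
    }
}
```

Under these assumptions the minority side reports a high watermark of 22 without hearing from any node in the majority side, which matches {{hw=22}} on node 0 in the trace.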
 

I think that the best way to fix the test is to wait, after the partition, for 
the high watermark to reach a value much larger than the LEO before the 
partition. In the trace above it would be a value much greater than 25.
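One possible way to pick such a value is sketched below. The helper name and the factor of 2 are assumptions made for illustration only; the comment does not say how much larger the target should be. The idea is that a minority leader can only count offsets that unreachable voters had before the partition, so a target above the largest pre-partition LEO should only be reachable by the majority side.

```java
import java.util.Collections;
import java.util.List;

public class PartitionTargetSketch {
    // Hypothetical helper: choose a high watermark target that none of the
    // pre-partition log end offsets can satisfy. The factor 2 is arbitrary.
    static long targetHighWatermark(List<Long> logEndOffsetsBeforePartition) {
        return 2 * Collections.max(logEndOffsetsBeforePartition);
    }

    public static void main(String[] args) {
        // Log end offsets of nodes 0..6 right before the partition.
        List<Long> leoBefore = List.of(25L, 14L, 14L, 22L, 14L, 14L, 18L);
        long target = targetHighWatermark(leoBefore);
        System.out.println(target); // prints 50

        // The minority leader reached hw=22 in the trace, below the target.
        System.out.println(22L < target); // prints true
    }
}
```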

> Flaky Test RaftEventSimulationTest.canMakeProgressIfMajorityIsReachable
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-12851
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12851
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, kraft
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Jose Armando Garcia Sancio
>            Priority: Blocker
>              Labels: kip-500
>             Fix For: 3.0.0
>
>         Attachments: Capture.PNG
>
>
> Failed twice on a [PR 
> build|https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-10755/6/testReport/]
> h3. Stacktrace
> org.opentest4j.AssertionFailedError: expected: <true> but was: <false> at 
> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) at 
> org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40) at 
> org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:35) at 
> org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:162) at 
> org.apache.kafka.raft.RaftEventSimulationTest.canMakeProgressIfMajorityIsReachable(RaftEventSimulationTest.java:263)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
