[ 
https://issues.apache.org/jira/browse/IGNITE-24929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946162#comment-17946162
 ] 

Mirza Aliev edited comment on IGNITE-24929 at 4/21/25 3:47 PM:
---------------------------------------------------------------

I've spent some time investigating this issue. Here is what I've found so far: 

* It is quite hard to reproduce the problem locally (~1 failure out of 500 runs).
* I've attached logs from the locally failed run.  
[^localRunWithFailuretestPrimaryReplicaDirectUpdateForExplicitTxn.log] 
* The problem is in the part where we assert the number of blocked 
{{AppendEntriesRequest}} messages with data. We expect only two messages to be 
blocked (one for each follower), but in the failed run we got more than 50 
blocked messages.
* In a normal run the blocked {{AppendEntriesRequest}} has 
{{committedIndex=5}}, but in the failed run the blocked 
{{AppendEntriesRequest}} messages have {{committedIndex=4}}. Those messages 
also carry a non-null {{HeapByteBuffer}}, so they contain data.
* There is no spontaneous leader change when the test fails.
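
To compare a normal run with a failed one, it may help to group the blocked requests that carry data by their {{committedIndex}}. The snippet below is only a rough sketch of that idea: {{BlockedAppendEntries}} and {{dataRequestsByCommittedIndex}} are hypothetical names, the record is a simplified stand-in rather than the real Ignite message class, and the sample values are made up to mirror the expected "one data request per follower" case.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BlockedMessageSummary {
    /** Hypothetical, simplified view of a blocked AppendEntriesRequest (not the Ignite class). */
    record BlockedAppendEntries(String follower, long committedIndex, boolean hasData) {}

    /** Counts the blocked requests that carry data, grouped by committedIndex. */
    static Map<Long, Long> dataRequestsByCommittedIndex(List<BlockedAppendEntries> blocked) {
        return blocked.stream()
                .filter(BlockedAppendEntries::hasData)
                .collect(Collectors.groupingBy(
                        BlockedAppendEntries::committedIndex, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample mirroring the expected case: one data request per follower.
        List<BlockedAppendEntries> normalRun = List.of(
                new BlockedAppendEntries("follower-1", 5, true),
                new BlockedAppendEntries("follower-2", 5, true),
                new BlockedAppendEntries("follower-1", 5, false)); // request without data

        System.out.println(dataRequestsByCommittedIndex(normalRun)); // prints {5=2}
    }
}
```

Applied to the messages from the failed run, a summary like this would show how many data requests are blocked at {{committedIndex=4}} and whether any appear at other indexes.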


We need to investigate further and figure out which data we are blocking at 
{{committedIndex=4}}.


> ItTxDistributedTestThreeNodesThreeReplicasCollocated.testPrimaryReplicaDirectUpdateForExplicitTxn
>  is flaky
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-24929
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24929
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>         Attachments: 
> localRunWithFailuretestPrimaryReplicaDirectUpdateForExplicitTxn.log
>
>
> {code:java}
> org.opentest4j.AssertionFailedError: Failed to wait for blocked messages ==> expected: <true> but was: <false>
>   at app//org.apache.ignite.distributed.ItTxDistributedTestThreeNodesThreeReplicas.testPrimaryReplicaDirectUpdateForExplicitTxn(ItTxDistributedTestThreeNodesThreeReplicas.java:97)
>   at java.base@17.0.6/java.lang.reflect.Method.invoke(Method.java:568)
>   at java.base@17.0.6/java.util.ArrayList.forEach(ArrayList.java:1511)
>   at java.base@17.0.6/java.util.ArrayList.forEach(ArrayList.java:1511)
> {code}
> [TC link|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/8983553]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
