[https://issues.apache.org/jira/browse/IGNITE-24929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946162#comment-17946162]
Mirza Aliev edited comment on IGNITE-24929 at 4/21/25 3:47 PM:
---------------------------------------------------------------
I've spent some time investigating this issue. Here is what I have so far:
* The problem is hard to reproduce locally (about 1 failure out of 500 runs).
* I've attached logs from a locally failed run:
[^localRunWithFailuretestPrimaryReplicaDirectUpdateForExplicitTxn.log]
* The problem is in the step where we assert the number of blocked
{{AppendEntriesRequest}} messages that carry data. We expect exactly two blocked
messages (one per follower), but in the failed run there were more than 50
blocked messages.
* In a normal run, the blocked {{AppendEntriesRequest}} has
{{committedIndex=5}}, but in the failed run the blocked
{{AppendEntriesRequest}} messages have {{committedIndex=4}}. These messages also
carry a non-null {{HeapByteBuffer}}, so they do contain data.
* There is no spontaneous leader change when the test fails.
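The expectation described above (one blocked data-carrying {{AppendEntriesRequest}} per follower) can be expressed as a small check. This is a minimal sketch, not the test's actual code: the {{AppendEntriesRequest}} record below is a hypothetical stand-in holding only the fields mentioned in this comment, and {{carriesData}} / {{oneBlockedPerFollower}} are illustrative helpers.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BlockedAppendEntriesCheck {
    // Hypothetical stand-in for the Raft message (not the real Ignite class):
    // only the target follower, committedIndex and the data buffer.
    record AppendEntriesRequest(String follower, long committedIndex, ByteBuffer data) {}

    // Blocking predicate (assumption): block only requests that carry a data buffer.
    static boolean carriesData(AppendEntriesRequest req) {
        return req.data() != null;
    }

    // Expected state: exactly one blocked data-carrying request per follower.
    static boolean oneBlockedPerFollower(List<AppendEntriesRequest> blocked, int followers) {
        Map<String, Long> perFollower = blocked.stream()
                .collect(Collectors.groupingBy(AppendEntriesRequest::follower, Collectors.counting()));
        return blocked.size() == followers
                && perFollower.size() == followers
                && perFollower.values().stream().allMatch(c -> c == 1L);
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.allocate(16); // heap buffer standing in for log data

        // Normal-run shape: one data message per follower; the data-less message is not blocked.
        List<AppendEntriesRequest> sent = List.of(
                new AppendEntriesRequest("follower-1", 5, payload),
                new AppendEntriesRequest("follower-2", 5, payload),
                new AppendEntriesRequest("follower-1", 5, null));
        List<AppendEntriesRequest> blocked = sent.stream()
                .filter(BlockedAppendEntriesCheck::carriesData)
                .toList();
        System.out.println(oneBlockedPerFollower(blocked, 2)); // true

        // Failed-run shape: many data-carrying messages blocked at committedIndex=4.
        List<AppendEntriesRequest> failed = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            failed.add(new AppendEntriesRequest("follower-1", 4, payload));
            failed.add(new AppendEntriesRequest("follower-2", 4, payload));
        }
        System.out.println(oneBlockedPerFollower(failed, 2)); // false
    }
}
```

With more than 50 blocked messages, a check of this shape can never pass, which matches the "Failed to wait for blocked messages" assertion in the description.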
We need to investigate further to determine which data is being blocked at
{{committedIndex=4}}.
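One way to narrow down which data is blocked is to summarize the blocked messages by {{committedIndex}} and by the log position of their entries. In Raft, an AppendEntries request identifies where its entries belong via the index of the preceding entry ({{prevLogIndex}} in the Raft paper). The sketch below is hypothetical: the {{Blocked}} record and its field names are assumptions for illustration, not Ignite's API.

```java
import java.util.List;
import java.util.TreeMap;

public class BlockedMessageSummary {
    // Hypothetical view of a blocked AppendEntriesRequest; field names are illustrative.
    record Blocked(String follower, long committedIndex, long prevLogIndex) {}

    // committedIndex -> (prevLogIndex -> number of blocked messages), both sorted ascending.
    static TreeMap<Long, TreeMap<Long, Integer>> summarize(List<Blocked> blocked) {
        TreeMap<Long, TreeMap<Long, Integer>> summary = new TreeMap<>();
        for (Blocked b : blocked) {
            summary.computeIfAbsent(b.committedIndex(), k -> new TreeMap<>())
                    .merge(b.prevLogIndex(), 1, Integer::sum);
        }
        return summary;
    }

    public static void main(String[] args) {
        List<Blocked> blocked = List.of(
                new Blocked("follower-1", 4, 4),
                new Blocked("follower-2", 4, 4),
                new Blocked("follower-1", 4, 5));
        System.out.println(summarize(blocked)); // {4={4=2, 5=1}}
    }
}
```

If most blocked messages in a failed run share the same {{prevLogIndex}}, they target the same log range; a spread of indices would instead point to different entries being replicated while {{committedIndex}} stays at 4.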
> ItTxDistributedTestThreeNodesThreeReplicasCollocated.testPrimaryReplicaDirectUpdateForExplicitTxn is flaky
> ----------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-24929
> URL: https://issues.apache.org/jira/browse/IGNITE-24929
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexander Lapin
> Priority: Major
> Labels: ignite-3
> Attachments:
> localRunWithFailuretestPrimaryReplicaDirectUpdateForExplicitTxn.log
>
>
> {code:java}
> org.opentest4j.AssertionFailedError: Failed to wait for blocked messages ==> expected: <true> but was: <false>
>     at app//org.apache.ignite.distributed.ItTxDistributedTestThreeNodesThreeReplicas.testPrimaryReplicaDirectUpdateForExplicitTxn(ItTxDistributedTestThreeNodesThreeReplicas.java:97)
>     at java.base@…/java.lang.reflect.Method.invoke(Method.java:568)
>     at java.base@…/java.util.ArrayList.forEach(ArrayList.java:1511)
>     at java.base@…/java.util.ArrayList.forEach(ArrayList.java:1511)
> {code}
> [TC link|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/8983553]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)