[
https://issues.apache.org/jira/browse/IGNITE-25539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955826#comment-17955826
]
Sergey Chugunov edited comment on IGNITE-25539 at 6/3/25 7:46 AM:
------------------------------------------------------------------
The same fix is applicable to two other tests:
*{{testPendingMessagesOverflow}}* and *{{testCustomMessageInSingletonCluster}}*
The test failures were caused by a race in the test code:
{code:java}
startGrid("listener"); // line 1
sentEnsuredMsgs.clear(); // line 2
receivedEnsuredMsgs.clear(); // line 3{code}
When the new node started at line 1 joins an existing cluster, the coordinator
generates a *{{CacheAffinityChangeMessage}}* and sends it across the ring.
This message is added to the *{{sentEnsuredMsgs}}* collection on the coordinator
(almost immediately) and to the *{{receivedEnsuredMsgs}}* collection on the
listener node (with some delay, as it has to travel across the whole ring).
However, both collections are cleared in the runner thread without any pause
(lines 2 and 3), which creates a race condition: if the
*{{CacheAffinityChangeMessage}}* is delayed a bit more, it is added to
*{{receivedEnsuredMsgs}}* AFTER the collection has been cleared, and the test
assertion subsequently fails.
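The race described above can be reproduced without Ignite. The following is a minimal sketch and not the actual patch: the class name, the method name, the latch and the simulated ring delay are all hypothetical. It only illustrates one possible way to avoid the race, which is to wait until the delayed message has been recorded before clearing the collections.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone model of the race; no Ignite classes are used.
class ClearAfterDeliverySketch {
    static final String MSG = "CacheAffinityChangeMessage";

    /**
     * Simulates a node join followed by clearing both collections.
     * Returns true if both collections are still empty once the
     * simulated ring has finished delivering the message.
     */
    static boolean collectionsEmptyAfterJoin(boolean waitForDelivery, long ringDelayMs) {
        Set<String> sentEnsuredMsgs = ConcurrentHashMap.newKeySet();
        Set<String> receivedEnsuredMsgs = ConcurrentHashMap.newKeySet();
        CountDownLatch delivered = new CountDownLatch(1);

        // The coordinator records the message almost immediately.
        sentEnsuredMsgs.add(MSG);

        // The listener records it only after the message has crossed the ring.
        Thread ring = new Thread(() -> {
            try {
                Thread.sleep(ringDelayMs);
                receivedEnsuredMsgs.add(MSG);
                delivered.countDown();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ring.start();

        try {
            // Assumed mitigation: block until the message has actually arrived.
            if (waitForDelivery && !delivered.await(5, TimeUnit.SECONDS))
                throw new IllegalStateException("Message was not delivered in time");

            sentEnsuredMsgs.clear();     // corresponds to line 2
            receivedEnsuredMsgs.clear(); // corresponds to line 3

            ring.join(); // let the simulated ring finish before inspecting state
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return sentEnsuredMsgs.isEmpty() && receivedEnsuredMsgs.isEmpty();
    }

    public static void main(String[] args) {
        // Clearing right away usually leaves a stale message behind.
        System.out.println("clear immediately:    " + collectionsEmptyAfterJoin(false, 100));
        // Clearing after delivery leaves both collections empty.
        System.out.println("clear after delivery: " + collectionsEmptyAfterJoin(true, 100));
    }
}
```

In the real test the wait would have to be expressed in terms of the test's own state (for example, polling until the message shows up in {{receivedEnsuredMsgs}}) rather than a latch owned by the simulated sender.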
was (Author: sergeychugunov):
The same fix is applicable to two other tests:
*{{testPendingMessagesOverflow}}* and *{{testCustomMessageInSingletonCluster}}*
> TcpDiscoveryPendingMessageDeliveryTest is flaky on TC
> -----------------------------------------------------
>
> Key: IGNITE-25539
> URL: https://issues.apache.org/jira/browse/IGNITE-25539
> Project: Ignite
> Issue Type: Bug
> Reporter: Sergey Chugunov
> Assignee: Sergey Chugunov
> Priority: Major
> Fix For: 2.18
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Test testDeliveryAllFailedMessagesInCorrectOrder is
> [flaky|https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-4230688419866011807&tab=testDetails]
> on TC with a high failure rate.
> Failures are reproducible locally at a much lower failure rate.
> It seems from the logs that the discovery ring does not collapse in the way
> the test expects; some investigation is needed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)