[ 
https://issues.apache.org/jira/browse/IGNITE-24876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948195#comment-17948195
 ] 

Roman Puchkovskiy commented on IGNITE-24876:
--------------------------------------------

An example of a scenario that caused the reordering:
 # There is an established channel with the corresponding NettySender
 # Thread T1 tries to send message M1; it obtains the sender object and is 
then descheduled by the OS
 # The underlying TCP connection is closed for some reason
 # Activity of other threads initiates the creation of another channel in the 
same logical connection. The handshake exchange proceeds, and 
HandshakeFinishMessage is already in the queue of the event loop (which is 
shared by the closed and the new channels)
 # Thread T1 resumes and calls sender.send() on the old sender (which wraps the 
closed channel); M1 is put into the event loop's queue
 # T1 then sends M2; it obtains a NettySender future that is not completed yet 
(as the handshake has not finished yet), so M2 is put into the queue attached 
to the sender's future (this queue is separate from the event loop's queue), 
where messages wait for the sender to become available
 # Event loop's thread N1 handles HandshakeFinishMessage from its queue; this 
finishes the handshake, a new NettySender is created, and the sender future is 
completed with it
 # As a result, M2's send gets the sender and calls sender.send(); because this 
still happens on N1, it bypasses the event loop's queue and writes the message 
directly to the channel
 # Only then does N1 process M1's send and write it to the channel

Steps 8 and 9 produce the reordering: M2 is written to the channel before M1.

The fix is that at step 8 we always perform writes to the channel via the 
event loop's queue, even if we are already on the event loop thread.
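
The scenario and the fix can be reproduced in miniature with plain 
java.util.concurrent classes. This is only a sketch under assumptions, not the 
Ignite code: a single-thread executor stands in for the Netty event loop, a 
list stands in for the channel, and the names WriteOrderingSketch, simulate 
and alwaysEnqueue are hypothetical. The event loop is parked at first so that 
its queue can be filled in the same order as in steps 4 to 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class WriteOrderingSketch {
    /** Replays steps 4-9 and returns the order in which messages reach the "channel". */
    static List<String> simulate(boolean alwaysEnqueue) {
        // Stand-in for the event loop shared by the old and the new channel.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        AtomicReference<Thread> loopThread = new AtomicReference<>();
        List<String> channel = new ArrayList<>(); // written only by the loop thread
        CountDownLatch gate = new CountDownLatch(1);
        CountDownLatch written = new CountDownLatch(2);
        CompletableFuture<String> senderFuture = new CompletableFuture<>();

        // Park the loop thread so that the queue below is filled deterministically.
        eventLoop.execute(() -> {
            loopThread.set(Thread.currentThread());
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Step 4: the handshake finish is already in the event loop's queue.
        eventLoop.execute(() -> senderFuture.complete("new sender"));

        // Step 5: M1's write is put into the event loop's queue.
        eventLoop.execute(() -> {
            channel.add("M1");
            written.countDown();
        });

        // Step 6: M2 waits on the sender future, outside the event loop's queue.
        senderFuture.thenAccept(sender -> {
            Runnable write = () -> {
                channel.add("M2");
                written.countDown();
            };
            boolean inEventLoop = Thread.currentThread() == loopThread.get();
            if (inEventLoop && !alwaysEnqueue) {
                write.run();              // step 8 before the fix: direct write
            } else {
                eventLoop.execute(write); // the fix: always go through the queue
            }
        });

        gate.countDown(); // steps 7-9 now run on the loop thread
        try {
            written.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdown();
        }
        return List.copyOf(channel);
    }

    public static void main(String[] args) {
        System.out.println("direct write allowed: " + simulate(false));
        System.out.println("always enqueue:       " + simulate(true));
    }
}
```

With the direct write allowed, the sketch yields [M2, M1], which mirrors the 
reordering at steps 8 and 9; with alwaysEnqueue set, M2's write lands in the 
queue behind M1's and the result is [M1, M2].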

> ItScaleCubeNetworkMessagingTest.messagesQueuedOnFullyClosedOldChannelGetDeliveredAfterReconnection([2]
>  true) is flaky
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-24876
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24876
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>         Attachments: _Integration_Tests_Module_Network_33162.log.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> java.lang.AssertionError: Expected: <[trailblazer, first, second]> but: was 
> <[trailblazer, second, first]>
> java.lang.AssertionError:
> Expected: <[trailblazer, first, second]>
> but: was <[trailblazer, second, first]>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at 
> org.apache.ignite.internal.network.scalecube.ItScaleCubeNetworkMessagingTest.messagesQueuedOnFullyClosedOldChannelGetDeliveredAfterReconnection(ItScaleCubeNetworkMessagingTest.java:779)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
