[
https://issues.apache.org/jira/browse/IGNITE-24876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948195#comment-17948195
]
Roman Puchkovskiy commented on IGNITE-24876:
--------------------------------------------
An example of a scenario that caused the reordering:
# There is an established channel with the corresponding NettySender
# Thread T1 tries to send message M1; it obtains the sender object and is then
suspended by the OS
# The underlying TCP connection is closed for some reason
# Activity of other threads initiates creation of another channel in the same
logical connection. The handshake exchange happens, and HandshakeFinishMessage
is already in the queue of the event loop (which is shared by the closed and
the new channels)
# Thread T1 resumes and calls sender.send() on the old sender (which wraps the
closed channel); M1 is put on the event loop's queue
# T1 then sends M2; it obtains a NettySender future that is not yet completed
(the handshake has not finished), so M2 is placed in the queue attached to the
sender's future (distinct from the event loop's queue), where messages wait
for the sender to become available
# The event loop's thread N1 handles HandshakeFinishMessage from its queue;
this finishes the handshake, a new NettySender is created, and the sender
future is completed with it
# As a result, M2's send gets the sender and calls sender.send(); because the
call runs on N1 itself, it bypasses the event loop's queue and writes the
message directly to the channel
# Only then, N1 processes M1's send and writes it to the channel
In steps 8 and 9, M2 is written to the channel before M1, which is the
reordering.
The fix is that in step 8 we always write to the channel via the event loop's
queue, even if we are already on the event loop thread.
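The scenario and the fix can be sketched with a deterministic, single-threaded
simulation. This is only an illustration, not the actual Ignite or Netty code:
the class ReorderingSketch, the method run() and the flag alwaysEnqueue are
hypothetical names, the event loop is modelled as a plain task queue, and a
single list stands in for "the channel" (the sketch ignores which physical
channel M1 ends up on).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ReorderingSketch {
    /**
     * Simulates steps 4-9 and returns the order in which messages are written.
     * alwaysEnqueue = false models the old behaviour (inline write when already
     * on the event loop thread); true models the fix.
     */
    static List<String> run(boolean alwaysEnqueue) {
        Queue<Runnable> eventLoopQueue = new ArrayDeque<>();  // tasks for N1
        List<Runnable> waitingForSender = new ArrayList<>();  // queue on the sender's future
        List<String> channel = new ArrayList<>();             // observed write order

        // Step 4: HandshakeFinishMessage is already in the event loop's queue.
        eventLoopQueue.add(() -> {
            // Step 7: the handshake finishes and the sender future completes,
            // so the waiting sends run right here, on N1.
            for (Runnable pendingSend : waitingForSender) {
                pendingSend.run();
            }
        });

        // Step 5: T1 calls send() on the old sender; M1's write is enqueued.
        eventLoopQueue.add(() -> channel.add("M1"));

        // Step 6: M2 waits for the sender future to complete.
        waitingForSender.add(() -> {
            // Step 8: this code runs on N1.
            if (alwaysEnqueue) {
                eventLoopQueue.add(() -> channel.add("M2")); // fix: go through the queue
            } else {
                channel.add("M2");                           // old: direct write
            }
        });

        // N1 drains its queue in FIFO order (steps 7-9).
        Runnable task;
        while ((task = eventLoopQueue.poll()) != null) {
            task.run();
        }
        return channel;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [M2, M1]
        System.out.println(run(true));  // [M1, M2]
    }
}
```

With the direct write, M2 overtakes M1 because M1's write task is still sitting
behind the handshake task. With the write always enqueued, M2's write lands
behind M1's in the same FIFO queue, so the send order is preserved.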
> ItScaleCubeNetworkMessagingTest.messagesQueuedOnFullyClosedOldChannelGetDeliveredAfterReconnection([2] true) is flaky
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-24876
> URL: https://issues.apache.org/jira/browse/IGNITE-24876
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Attachments: _Integration_Tests_Module_Network_33162.log.zip
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> java.lang.AssertionError:
> Expected: <[trailblazer, first, second]>
> but: was <[trailblazer, second, first]>
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
> at org.apache.ignite.internal.network.scalecube.ItScaleCubeNetworkMessagingTest.messagesQueuedOnFullyClosedOldChannelGetDeliveredAfterReconnection(ItScaleCubeNetworkMessagingTest.java:779)