[jira] [Updated] (IGNITE-28337) TcpDiscoveryNodeAddedMessage may be serialized from mutated state in client message worker

Aleksandr Chesnokov (Jira) Thu, 16 Apr 2026 00:33:46 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Aleksandr Chesnokov updated IGNITE-28337:
-----------------------------------------
    Description: 
There is a race condition in TCP discovery when a server sends discovery 
messages to Ignite client nodes.

In ServerImpl#sendMessageToClients, most discovery messages are serialized 
before being enqueued to the ClientMessageWorker. However, 
TcpDiscoveryNodeAddedMessage is handled differently: the message object itself 
is placed into the queue, while msgBytes remains null. Later, in 
ClientMessageWorker#writeToSocket, the worker detects msgBytes == null and 
performs serialization in the client worker thread.

This approach is unsafe because TcpDiscoveryNodeAddedMessage is mutable and can 
be modified concurrently by the ring message worker: 
ServerImpl#prepareNodeAddedMessage edits fields such as topology, topology 
history, and pending messages.
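
The difference between the two paths can be sketched in a few lines of Java. This is only an illustration, not Ignite code: NodeAddedMsg, marshal and unmarshal are hypothetical stand-ins, and plain Java serialization is used in place of the discovery marshaller. The mutation is also done sequentially to keep the output deterministic. In the real race the mutation can happen while serialization is in progress, which is harder to reproduce and can corrupt the stream rather than just change its contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    /** Hypothetical stand-in for a mutable discovery message. */
    static class NodeAddedMsg implements Serializable {
        private static final long serialVersionUID = 0L;

        List<String> topology = new ArrayList<>();
    }

    /** Serializes an object with plain Java serialization (an assumption, not the Ignite marshaller). */
    static byte[] marshal(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }

        return bos.toByteArray();
    }

    /** Reads a message back from its serialized form. */
    static NodeAddedMsg unmarshal(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (NodeAddedMsg)in.readObject();
        }
    }

    /** Deferred path: the live object is queued and serialized only after the producer mutated it. */
    static int deferredTopologySize() throws Exception {
        NodeAddedMsg msg = new NodeAddedMsg();
        msg.topology.add("server-1");

        NodeAddedMsg queued = msg;      // The queue holds a reference to the same object.

        msg.topology.add("client-1");   // Later mutation by the producer.

        return unmarshal(marshal(queued)).topology.size();
    }

    /** Eager path: bytes are produced before enqueueing, so later mutations are not visible. */
    static int eagerTopologySize() throws Exception {
        NodeAddedMsg msg = new NodeAddedMsg();
        msg.topology.add("server-1");

        byte[] queued = marshal(msg);   // Snapshot taken at enqueue time.

        msg.topology.add("client-1");   // Later mutation by the producer.

        return unmarshal(queued).topology.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("deferred sees " + deferredTopologySize() + " node(s)"); // deferred sees 2 node(s)
        System.out.println("eager sees " + eagerTopologySize() + " node(s)");       // eager sees 1 node(s)
    }
}
```

In this sketch the eager variant is immune to later changes because the queue holds an immutable byte array instead of a reference to the shared message object.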

As a result, TestMetricUpdateFailure#test is flaky and fails with errors such as:
 * Invalid message type
 * ClassCastException (e.g., TcpDiscoveryCheckFailedMessage cannot be cast to 
DiscoveryDataPacket)
 * Client join timeout

See 
[https://ci2.ignite.apache.org/test/3305509330615033947?currentProjectId=IgniteTests24Java8&branch=&expandedTest=build%3A%28id%3A8949981%29%2Cid%3A2000000291]

The test reproduces the issue because it starts one server node and 20 client 
nodes concurrently, which is a good stress scenario for this part of the code.

  was:
See ServerImpl.RingMessageWorker#sendMessageToClients

As a result, TestMetricUpdateFailure#test is flaky

See 
[https://ci2.ignite.apache.org/test/3305509330615033947?currentProjectId=IgniteTests24Java8&branch=&expandedTest=build%3A%28id%3A8949981%29%2Cid%3A2000000291]


> TcpDiscoveryNodeAddedMessage may be serialized from mutated state in client 
> message worker
> ------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-28337
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28337
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Aleksandr Chesnokov
>            Assignee: Aleksandr Chesnokov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> There is a race condition in TCP discovery when a server sends discovery 
> messages to Ignite client nodes.
> In ServerImpl#sendMessageToClients, most discovery messages are serialized 
> before being enqueued to the ClientMessageWorker. However, 
> TcpDiscoveryNodeAddedMessage is handled differently: the message object 
> itself is placed into the queue, while msgBytes remains null. Later, in 
> ClientMessageWorker#writeToSocket, the worker detects msgBytes == null and 
> performs serialization in the client worker thread.
> This approach is unsafe because TcpDiscoveryNodeAddedMessage is mutable and 
> can be modified concurrently by the ring message worker: 
> ServerImpl#prepareNodeAddedMessage edits fields such as topology, topology 
> history, and pending messages.
> As a result, TestMetricUpdateFailure#test is flaky and fails with errors such 
> as:
>  * Invalid message type
>  * ClassCastException (e.g., TcpDiscoveryCheckFailedMessage cannot be cast to 
> DiscoveryDataPacket)
>  * Client join timeout
> See 
> [https://ci2.ignite.apache.org/test/3305509330615033947?currentProjectId=IgniteTests24Java8&branch=&expandedTest=build%3A%28id%3A8949981%29%2Cid%3A2000000291]
> The test reproduces the issue because it starts one server node and 20 client 
> nodes concurrently, which is a good stress scenario for this part of the code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
