[jira] [Commented] (IGNITE-28369) Ignite Service may not be redeployed if several nodes leave the cluster

Mikhail Petrov (Jira) Fri, 08 May 2026 23:43:16 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079635#comment-18079635
 ]


Mikhail Petrov commented on IGNITE-28369:
-----------------------------------------

[~NSAmelchev] Thank you for the review.

> Ignite Service may not be redeployed if several nodes leave the cluster
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-28369
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28369
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Assignee: Mikhail Petrov
>            Priority: Major
>              Labels: ise
>             Fix For: 2.19
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We need to fix the flaky 
> org.apache.ignite.client.ReliabilityTest#testServiceProxyFailover test. 
> See 
> https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=4795807857625973920&tab=testDetails
>  for more details.
> The org.apache.ignite.client.ReliabilityTest#testServiceProxyFailover test 
> serves as a reproducer for the problem described below.
> To increase the test failure rate, place U.sleep(10) in the 
> GridNioServer.AbstractNioClientWorker#bodyInternal worker loop.
> Steps that lead to the described problem:
> 1. Consider a 3-node cluster with a singleton SERVICE deployed on node 1.
> 2. Node 1 leaves the cluster, triggering a distributed service redeployment 
> process.
> 3. The service is reassigned to node 2.
> 4. While the coordinator waits for all nodes to reply with single messages, 
> node 2 leaves the cluster.
> 5. The coordinator receives the event that node 2 has left the cluster and 
> stops waiting for its single message.
> 6. The coordinator combines the received single messages into a full 
> message that contains no information about the SERVICE or its topology, and 
> sends it across the cluster.
> 7. The service topology is set to empty on all cluster nodes.
> 8. A second service redeployment process is triggered by node 2 leaving. 
> However, at this point we do not attempt to redeploy the SERVICE because 
> node 2 is not part of the current service topology. Therefore, nothing 
> happens and the service becomes unavailable.
> Even if we fix step 8 and the service is eventually redeployed, there is a 
> period of time during which the service topology is unknown. Currently, all 
> calls during this period result in an error, which is unexpected for the 
> user.
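
The failure sequence in the steps above can be modelled with a small standalone sketch. This is a toy model, not Ignite code: the class ServiceRedeploySketch and the methods buildFullMessage and redeployTriggered are hypothetical names introduced only for illustration, and the maps stand in for the single and full deployment messages. It assumes node 3 is the only node whose single message reaches the coordinator and that it hosts no services.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the described scenario. All names are hypothetical, not Ignite API. */
public class ServiceRedeploySketch {
    /**
     * Merges the single messages that actually reached the coordinator
     * (node id -> services deployed on that node) into a full message
     * (service name -> ids of nodes hosting it).
     */
    public static Map<String, Set<Integer>> buildFullMessage(Map<Integer, Set<String>> receivedSingleMsgs) {
        Map<String, Set<Integer>> full = new HashMap<>();

        for (Map.Entry<Integer, Set<String>> e : receivedSingleMsgs.entrySet()) {
            for (String svc : e.getValue())
                full.computeIfAbsent(svc, k -> new TreeSet<>()).add(e.getKey());
        }

        return full;
    }

    /**
     * Models the behaviour described in the issue: a node leaving triggers
     * redeployment only if that node is part of the known service topology.
     */
    public static boolean redeployTriggered(Map<String, Set<Integer>> topology, String svc, int leftNode) {
        return topology.getOrDefault(svc, Collections.emptySet()).contains(leftNode);
    }

    public static void main(String[] args) {
        // Node 1 has left; SERVICE was reassigned to node 2, but node 2 left
        // before its single message arrived. Only node 3 replied, with no services.
        Map<Integer, Set<String>> received = new HashMap<>();
        received.put(3, Collections.emptySet());

        // The full message carries no entry for SERVICE, so its topology becomes empty.
        Map<String, Set<Integer>> topology = buildFullMessage(received);
        System.out.println("Service topology after full message: " + topology);

        // Node 2 is not in the (empty) topology, so its departure triggers nothing.
        System.out.println("Redeploy after node 2 left: " + redeployTriggered(topology, "SERVICE", 2));
    }
}
```

In this model the full message is empty because the only node that could report SERVICE never delivered its single message, and the membership check then suppresses the second redeployment.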



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
