[ 
https://issues.apache.org/jira/browse/IGNITE-26396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-26396:
---------------------------------------
    Description: 
h3. Motivation
Since the acknowledgment batching ticket (IGNITE-25945) was closed, completion 
of the future returned by the messaging service no longer guarantees that the 
message has been delivered. As a result, the node can begin restarting before 
the other side has received the success response.
{code}
 private void handleResetClusterMessage(ResetClusterMessage message, ClusterNode sender, long correlationId) {
     restartExecutor.execute(() -> {
         storage.saveResetClusterMessage(message);

         messagingService.respond(sender, successResponseMessage(), correlationId)
                 .thenRunAsync(() -> {
                     if (!thisNodeName.equals(sender.name())) {
                         restarter.initiateRestart();
                     }
                 }, restartExecutor)
                 .whenComplete((res, ex) -> {
                     if (ex != null) {
                         LOG.error("Error when handling a ResetClusterMessage", ex);
                     }
                 });
     });
 }
{code}

h3. Definition of done
The initiator must explicitly acknowledge that it has received the success 
response.
The node must wait for that acknowledgment before it starts the restart process.
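
One possible shape of the handshake from the definition of done, as a minimal Java sketch rather than the actual fix. PendingAcks, respondThenRestart, the ack timeout, and the idea that the initiator's acknowledgment reuses the original correlationId are all assumptions made for illustration; sendResponse stands in for messagingService.respond(...) and restart for restarter.initiateRestart(). None of these helper names come from the Ignite code base.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical sketch: restart only after the initiator acknowledges the success response. */
class AckBeforeRestartSketch {
    /** Tracks acknowledgments the initiator has yet to send, keyed by correlation ID (assumed key). */
    static final class PendingAcks {
        private final Map<Long, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

        /** Registers interest in an ack; called before the response is sent so a fast ack is not lost. */
        CompletableFuture<Void> expect(long correlationId) {
            return pending.computeIfAbsent(correlationId, id -> new CompletableFuture<>());
        }

        /** Called by a (hypothetical) handler of the initiator's ack message. Returns false for unknown IDs. */
        boolean onAck(long correlationId) {
            CompletableFuture<Void> future = pending.remove(correlationId);
            return future != null && future.complete(null);
        }

        /** Drops bookkeeping for a correlation ID, for example after a timeout. */
        void forget(long correlationId) {
            pending.remove(correlationId);
        }
    }

    /** Sends the response, waits for the explicit ack (bounded by a timeout), then runs the restart action. */
    static CompletableFuture<Void> respondThenRestart(
            PendingAcks acks,
            long correlationId,
            Supplier<CompletableFuture<Void>> sendResponse,
            Runnable restart,
            long ackTimeoutMillis
    ) {
        CompletableFuture<Void> ack = acks.expect(correlationId);

        return sendResponse.get()
                // Completion of the send is not treated as delivery; wait for the ack instead.
                .thenCompose(unused -> ack.orTimeout(ackTimeoutMillis, TimeUnit.MILLISECONDS))
                .thenRun(restart)
                .whenComplete((res, ex) -> acks.forget(correlationId));
    }

    public static void main(String[] args) {
        PendingAcks acks = new PendingAcks();
        AtomicBoolean restarted = new AtomicBoolean();

        CompletableFuture<Void> done = respondThenRestart(
                acks, 42L, () -> CompletableFuture.completedFuture(null), () -> restarted.set(true), 1_000);

        System.out.println("restarted before ack: " + restarted.get()); // prints false
        acks.onAck(42L); // simulates the initiator's explicit acknowledgment arriving
        done.join();
        System.out.println("restarted after ack: " + restarted.get()); // prints true
    }
}
```

In this sketch a missing acknowledgment makes the returned future complete exceptionally after the timeout, and the restart is skipped. Whether the node should restart anyway in that case is a design decision the ticket leaves open.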

  was:
h3. Motivation
After a ticket about acknowledgment batching  was closed, we do not guarantee a 
message will be delivered when the service future is completed. So the 
restarting process can start, but the other side does not receive a successful 
response yet.
{code}
 private void handleResetClusterMessage(ResetClusterMessage message, ClusterNode sender, long correlationId) {
     restartExecutor.execute(() -> {
         storage.saveResetClusterMessage(message);

         messagingService.respond(sender, successResponseMessage(), correlationId)
                 .thenRunAsync(() -> {
                     if (!thisNodeName.equals(sender.name())) {
                         restarter.initiateRestart();
                     }
                 }, restartExecutor)
                 .whenComplete((res, ex) -> {
                     if (ex != null) {
                         LOG.error("Error when handling a ResetClusterMessage", ex);
                     }
                 });
     });
 }
{code}

h3. Definition of done.
The initiator side has to answer that the successful message is received 
explicitly.
And the node should wait for the message before the restart process starts.


> A node restart might not send a message to the initiator.
> ---------------------------------------------------------
>
>                 Key: IGNITE-26396
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26396
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> Since the acknowledgment batching ticket (IGNITE-25945) was closed, 
> completion of the future returned by the messaging service no longer 
> guarantees that the message has been delivered. As a result, the node can 
> begin restarting before the other side has received the success response.
> {code}
>  private void handleResetClusterMessage(ResetClusterMessage message, ClusterNode sender, long correlationId) {
>      restartExecutor.execute(() -> {
>          storage.saveResetClusterMessage(message);
>          messagingService.respond(sender, successResponseMessage(), correlationId)
>                  .thenRunAsync(() -> {
>                      if (!thisNodeName.equals(sender.name())) {
>                          restarter.initiateRestart();
>                      }
>                  }, restartExecutor)
>                  .whenComplete((res, ex) -> {
>                      if (ex != null) {
>                          LOG.error("Error when handling a ResetClusterMessage", ex);
>                      }
>                  });
>      });
>  }
> {code}
> h3. Definition of done
> The initiator must explicitly acknowledge that it has received the success 
> response.
> The node must wait for that acknowledgment before it starts the restart 
> process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
