[ https://issues.apache.org/jira/browse/IGNITE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Goncharuk updated IGNITE-11394:
--------------------------------------
    Fix Version/s: 2.8

> Infinite No next node in topology messages during node restart scenario
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-11394
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11394
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Goncharuk
>            Assignee: Alexey Goncharuk
>            Priority: Major
>             Fix For: 2.8
>
>
> I observe a situation with the following symptoms during cyclic node
> restarts:
> - A node joining the cluster sends a join request, receives
> NodeAddedMessage, and awaits NodeAddFinishedMessage
> - The node receives a metrics update message; the message is in the queue
> - The whole cluster is restarted and a new ring is formed
> - The node re-sends the join request, and it is successfully processed by
> the ring
> - The node added message is received by the joining node
> - The node detects that it cannot send messages (the set of failed nodes
> contains all remote nodes in the ring)
> - Since there was already a metrics update message in the queue, the node
> attempts to re-add the message to the queue. Since the metrics update message
> is a high-priority message, it is added to the head of the queue, and the node
> gets stuck in an infinite loop
> I suggest dropping the metrics update message in {{sendMessageAcrossRing}} if
> we see the {{No next node in topology}} situation.
> Another question is why we don't pass the collection of failed nodes to the
> {{ring.hasRemoteNodes()}} method.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
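
The loop described in the issue can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Ignite's discovery code: the class `MetricsRequeueSketch`, the `Msg` type, the `run` method, and the message names are all hypothetical, and the "No next node in topology" condition is assumed to hold for the whole run. It shows why re-adding a high-priority message to the head of the queue starves every other message, and how dropping it, as the issue proposes, lets the queue drain.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MetricsRequeueSketch {
    /** Hypothetical message model: only the priority flag matters here. */
    static final class Msg {
        final String name;
        final boolean highPriority;

        Msg(String name, boolean highPriority) {
            this.name = name;
            this.highPriority = highPriority;
        }
    }

    /**
     * Runs a simplified worker loop while no next node is available and
     * returns how many times the metrics update message was processed.
     * maxIterations bounds the simulation so the stuck case still terminates.
     */
    static int run(boolean dropMetricsUpdate, int maxIterations) {
        Deque<Msg> queue = new ArrayDeque<>();
        queue.addLast(new Msg("metrics-update", true));
        queue.addLast(new Msg("node-add-finished", false));

        int metricsSeen = 0;

        for (int i = 0; i < maxIterations && !queue.isEmpty(); i++) {
            Msg msg = queue.pollFirst();

            if (msg.highPriority) {
                metricsSeen++;

                // Assumed state: no next node in topology, so sending fails.
                if (dropMetricsUpdate)
                    continue; // proposed behavior: discard the message

                queue.addFirst(msg); // observed behavior: back to the head
            }
            // Ordinary messages are simply consumed in this sketch.
        }

        return metricsSeen;
    }

    public static void main(String[] args) {
        System.out.println("re-queue: metrics update processed " + run(false, 1000) + " times");
        System.out.println("drop:     metrics update processed " + run(true, 1000) + " times");
    }
}
```

With re-queueing, the metrics update message is processed on every one of the 1000 iterations and the second message is never reached; with dropping, it is processed once and the queue empties on the next iteration.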