[jira] [Updated] (IGNITE-11394) Infinite No next node in topology messages during node restart scenario

Alexey Goncharuk (JIRA) Mon, 25 Feb 2019 03:54:11 -0800


     [ https://issues.apache.org/jira/browse/IGNITE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexey Goncharuk updated IGNITE-11394:
--------------------------------------
    Fix Version/s: 2.8

> Infinite No next node in topology messages during node restart scenario
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-11394
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11394
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Goncharuk
>            Assignee: Alexey Goncharuk
>            Priority: Major
>             Fix For: 2.8
>
>
> I observe a situation with the following symptoms during a cyclic restart
> of nodes:
>  - A joining node sends a join request, receives a NodeAddedMessage and
> awaits a NodeAddFinishedMessage.
>  - The node receives a metrics update message, which stays in the queue.
>  - The whole cluster is restarted and a new ring is formed.
>  - The node re-sends the join request, and the new ring processes it
> successfully.
>  - The joining node receives the NodeAddedMessage.
>  - The node detects that it cannot send messages (the collection of failed
> nodes contains all remote nodes of the ring).
>  - Since a metrics update message is already in the queue, the node attempts
> to re-add it to the queue. Because the metrics update message is a
> high-priority message, it is added to the head of the queue, and the node
> gets stuck in an infinite loop.
> I suggest dropping the metrics update message in {{sendMessageAcrossRing}} if
> we hit the {{No next node in topology}} situation.
> Another question is why we don't pass the collection of failed nodes to the
> {{ring.hasRemoteNodes()}} method.
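To make the reported loop concrete, here is a minimal Java sketch, assuming a much simplified message worker in which high-priority messages are inserted at the head of a deque and every remote ring node is already failed. The class `NoNextNodeSketch`, the helpers `enqueue` and `run`, the message type strings and the three-argument `hasRemoteNodes` are all hypothetical illustrations; this is not Ignite's `TcpDiscoverySpi` code and not the actual fix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, heavily simplified model of the joining node's discovery message
 * worker. None of these names come from Ignite's TcpDiscoverySpi.
 */
class NoNextNodeSketch {

    /** A queued discovery message; high-priority messages are placed at the head. */
    static final class Msg {
        final String type;
        final boolean highPriority;

        Msg(String type, boolean highPriority) {
            this.type = type;
            this.highPriority = highPriority;
        }
    }

    /** Enqueues a message, putting high-priority ones at the head as the issue describes. */
    static void enqueue(Deque<Msg> queue, Msg msg) {
        if (msg.highPriority)
            queue.addFirst(msg);
        else
            queue.addLast(msg);
    }

    /**
     * Runs at most maxIterations worker iterations while no next node is available
     * (every remote ring node is failed) and returns the types of the polled messages.
     */
    static List<String> run(boolean dropMetricsOnNoNextNode, int maxIterations) {
        Deque<Msg> queue = new ArrayDeque<>();
        enqueue(queue, new Msg("MetricsUpdate", true));
        enqueue(queue, new Msg("NodeAddFinished", false));

        List<String> polled = new ArrayList<>();
        for (int i = 0; i < maxIterations && !queue.isEmpty(); i++) {
            Msg msg = queue.pollFirst();
            polled.add(msg.type);

            // Simplification: only the metrics update path is modeled; any other
            // message is treated as handled and leaves the queue.
            if (!msg.type.equals("MetricsUpdate"))
                continue;

            if (dropMetricsOnNoNextNode)
                continue; // Suggested fix: drop the metrics update instead of re-queuing it.

            // Reported behavior: re-adding a high-priority message puts it back at the
            // head, so the same message is polled again on every iteration.
            enqueue(queue, msg);
        }
        return polled;
    }

    /** Hypothetical hasRemoteNodes variant that ignores nodes in the failed set. */
    static boolean hasRemoteNodes(List<String> ring, String localNode, Set<String> failed) {
        for (String node : ring) {
            if (!node.equals(localNode) && !failed.contains(node))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(run(false, 5)); // [MetricsUpdate, MetricsUpdate, MetricsUpdate, MetricsUpdate, MetricsUpdate]
        System.out.println(run(true, 5));  // [MetricsUpdate, NodeAddFinished]
        System.out.println(hasRemoteNodes(List.of("A", "B", "C"), "A", Set.of("B", "C"))); // false
    }
}
```

Without the drop, the metrics update is polled on every iteration and the message behind it is never reached; the `maxIterations` cap exists only so the sketch terminates. With the drop, the queue drains after two polls. The `hasRemoteNodes` variant illustrates the second question: once failed nodes are excluded, the node can tell up front that it has no one to forward to.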



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
