[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster
[ https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772892#comment-16772892 ]

Dmitriy Govorukhin commented on IGNITE-11348:
---------------------------------------------

[~sergey-chugunov] LGTM, merged to master.

> Ping node procedure may fail when another node leaves the cluster
> -----------------------------------------------------------------
>
>                 Key: IGNITE-11348
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11348
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Critical
>             Fix For: 2.8
>
>
> The additional pinging of a node on join, implemented in IGNITE-5569, may
> fail incorrectly, which shuts down the joining node.
> The reason is that if another node on the same host, bound to the same
> discovery port as the joining node, has left the cluster right before the
> joining node joins, the socket used for pinging gets closed.
> As a result, the pinging node considers the joining node "unreachable" and
> fails it with the JOIN_IMPOSSIBLE error code.
> Workaround: simply restart the node that failed on join.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster
[ https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772853#comment-16772853 ]

Ignite TC Bot commented on IGNITE-11348:
----------------------------------------

{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3116051&buildTypeId=IgniteTests24Java8_RunAll]
[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster
[ https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772734#comment-16772734 ]

Sergey Chugunov commented on IGNITE-11348:
------------------------------------------

[~dpavlov], the whole sequence of events leading to the issue is as follows:
# the _leaving node_, sitting on the *host0:port0* discovery address, leaves the cluster (the address becomes free);
# the _new node_ binds to the same *host0:port0* address and sends a join request;
# the _old node_ receives the join request and starts pinging the _new node_;
# the NODE_LEFT event for the _leaving node_ arrives at the _old node_; as part of handling NODE_LEFT, the socket for the ongoing ping is closed (incorrectly, as this ping has nothing to do with the _leaving node_).

To avoid this situation I add nodeID to the ping future and check it before closing the socket on NODE_LEFT. The ID makes it possible to distinguish the ping request to the _new node_ even though the _new node_ and the _leaving node_ have the same discovery address.
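The idea described in this comment can be illustrated with a minimal, self-contained sketch. This is not the actual Ignite patch: the class and method names (PingSketch, PingFuture, PingTracker, onNodeLeft) are hypothetical, the port number is only an example value, and the "socket" is simulated with a flag. The sketch only shows why keying the close decision on the node ID, rather than on the discovery address alone, keeps an unrelated NODE_LEFT from killing an ongoing ping.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class PingSketch {
    /** Hypothetical stand-in for an in-flight ping; it remembers which node it targets. */
    static final class PingFuture {
        final UUID targetNodeId;
        private final AtomicBoolean sockClosed = new AtomicBoolean();

        PingFuture(UUID targetNodeId) {
            this.targetNodeId = targetNodeId;
        }

        /** Simulates closing the socket used by the ping. */
        void closeSocket() {
            sockClosed.set(true);
        }

        boolean socketClosed() {
            return sockClosed.get();
        }
    }

    /** Hypothetical tracker of in-flight pings, keyed by discovery address. */
    static final class PingTracker {
        private final Map<InetSocketAddress, PingFuture> pings = new ConcurrentHashMap<>();

        PingFuture startPing(InetSocketAddress addr, UUID nodeId) {
            PingFuture fut = new PingFuture(nodeId);
            pings.put(addr, fut);
            return fut;
        }

        /** Problematic variant: matches by address only, so it may close an unrelated ping. */
        void onNodeLeftByAddressOnly(InetSocketAddress addr) {
            PingFuture fut = pings.get(addr);
            if (fut != null)
                fut.closeSocket();
        }

        /** Variant with the ID check: closes the socket only if the ping targets the node that left. */
        void onNodeLeft(InetSocketAddress addr, UUID leftNodeId) {
            PingFuture fut = pings.get(addr);
            if (fut != null && fut.targetNodeId.equals(leftNodeId))
                fut.closeSocket();
        }
    }

    public static void main(String[] args) {
        // Both nodes use the same discovery address (example values).
        InetSocketAddress addr = InetSocketAddress.createUnresolved("host0", 47500);
        UUID leavingNodeId = UUID.randomUUID();
        UUID newNodeId = UUID.randomUUID();

        PingTracker tracker = new PingTracker();
        PingFuture ping = tracker.startPing(addr, newNodeId);

        // NODE_LEFT for the leaving node arrives while the new node is being pinged.
        tracker.onNodeLeft(addr, leavingNodeId);
        System.out.println("Ping socket closed after unrelated NODE_LEFT: " + ping.socketClosed());
    }
}
```

With the address-only variant (onNodeLeftByAddressOnly) the same scenario would close the ping socket, which corresponds to the failure described in the issue.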
[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster
[ https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772723#comment-16772723 ] Dmitriy Pavlov commented on IGNITE-11348: - I will merge if peer review passes. I've tried to take a brief look but didn't understand why these changes fix the problem.
[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster
[ https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771916#comment-16771916 ]

Ignite TC Bot commented on IGNITE-11348:
----------------------------------------

{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3116051&buildTypeId=IgniteTests24Java8_RunAll]