[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster

2019-02-20 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772892#comment-16772892
 ] 

Dmitriy Govorukhin commented on IGNITE-11348:
-

[~sergey-chugunov] LGTM, merged to master.

> Ping node procedure may fail when another node leaves the cluster
> -
>
> Key: IGNITE-11348
> URL: https://issues.apache.org/jira/browse/IGNITE-11348
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Chugunov
>Assignee: Sergey Chugunov
>Priority: Critical
> Fix For: 2.8
>
>
> The additional ping of a joining node, implemented in IGNITE-5569, may fail 
> incorrectly and cause the joining node to shut down.
> This happens when another node on the same host, bound to the same discovery 
> port as the joining node, leaves the cluster right before the new node joins: 
> the socket used for pinging gets closed.
> As a result, the pinging node considers the joining node "unreachable" and 
> fails it with the JOIN_IMPOSSIBLE error code.
> Workaround: restart the node that failed to join.
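
The workaround can be automated with a small retry loop. This is a minimal sketch, not part of Ignite: `JoinRetry` and `startWithRetry` are hypothetical names, and it assumes that a rejected join surfaces as an unchecked exception from the start call.

```java
import java.util.function.Supplier;

/** Hypothetical helper automating the workaround: start the node again if joining fails. */
final class JoinRetry {
    private JoinRetry() {}

    /**
     * Calls {@code start} up to {@code maxAttempts} times and returns the first successful result.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T startWithRetry(Supplier<T> start, int maxAttempts) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return start.get();
            }
            catch (RuntimeException e) {
                // Assumption: a failed join (e.g. JOIN_IMPOSSIBLE) is reported as an unchecked exception.
                last = e;
            }
        }
        throw last; // non-null here because at least one attempt was made
    }
}
```

In Ignite, `Ignition.start(IgniteConfiguration)` throws the unchecked `IgniteException`, so a call could look like `JoinRetry.startWithRetry(() -> Ignition.start(cfg), 3)`. A blind retry also repeats genuine configuration errors, so it is only a stopgap until the fix is available.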



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster

2019-02-20 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772853#comment-16772853
 ] 

Ignite TC Bot commented on IGNITE-11348:


{panel:title=-- Run :: All: No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3116051&buildTypeId=IgniteTests24Java8_RunAll]


[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster

2019-02-19 Thread Sergey Chugunov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772734#comment-16772734
 ] 

Sergey Chugunov commented on IGNITE-11348:
--

[~dpavlov], the whole sequence of events leading to the issue is as follows:
# _leaving node_ at the *host0:port0* discovery address leaves the cluster (the address becomes free);
# _new node_ binds to the same *host0:port0* address and sends a join request;
# _old node_ receives the join request and starts pinging _new node_;
# the NODE_LEFT event for _leaving node_ arrives at _old node_; while handling NODE_LEFT, _old node_ closes the socket of the ongoing ping (incorrectly, as this ping has nothing to do with _leaving node_).

To avoid this, I've added the node ID to the ping future and check it before closing 
the socket on NODE_LEFT. The ID makes it possible to tell the ping request to _new node_ 
apart, even though _new node_ and _leaving node_ share the same discovery address.
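
The idea of the fix can be sketched as follows. This is a simplified model, not Ignite's actual discovery code: `PingFuture`, `PingRegistry`, `onNodeLeftByAddressOnly` and `onNodeLeft` are hypothetical names, and closing the socket is simulated with a flag. It contrasts the address-only cleanup that caused the bug with a cleanup that also compares node IDs.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-flight ping; records WHICH node is pinged, not only WHERE. */
final class PingFuture {
    final UUID targetNodeId; // ID of the pinged node: the piece of state the fix relies on
    private volatile boolean socketClosed;

    PingFuture(UUID targetNodeId) {
        this.targetNodeId = targetNodeId;
    }

    /** Stands in for closing the real ping socket. */
    void closeSocket() { socketClosed = true; }

    boolean isSocketClosed() { return socketClosed; }
}

/** Hypothetical registry of in-flight pings, keyed by discovery address. */
final class PingRegistry {
    private final Map<InetSocketAddress, PingFuture> pings = new ConcurrentHashMap<>();

    PingFuture startPing(UUID nodeId, InetSocketAddress addr) {
        PingFuture fut = new PingFuture(nodeId);
        pings.put(addr, fut);
        return fut;
    }

    /** Behaviour described in the bug: any ping to the left node's address is closed. */
    void onNodeLeftByAddressOnly(UUID leftNodeId, InetSocketAddress leftAddr) {
        PingFuture fut = pings.remove(leftAddr);
        if (fut != null)
            fut.closeSocket();
    }

    /** Fixed behaviour: close the ping only if it targets the node that actually left. */
    void onNodeLeft(UUID leftNodeId, InetSocketAddress leftAddr) {
        PingFuture fut = pings.get(leftAddr);
        // remove(key, value) is atomic, so a future replaced concurrently is left alone
        if (fut != null && fut.targetNodeId.equals(leftNodeId) && pings.remove(leftAddr, fut))
            fut.closeSocket();
    }
}
```

With a leaving node and a new node sharing one address (for example `127.0.0.1:47500`), the address-only handler closes the new node's ping, while `onNodeLeft` keeps it open and closes only a ping whose target ID matches the node that left.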


[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster

2019-02-19 Thread Dmitriy Pavlov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772723#comment-16772723
 ] 

Dmitriy Pavlov commented on IGNITE-11348:
-

I will merge if peer review passes.

I took a brief look but didn't understand why these changes fix the problem.


[jira] [Commented] (IGNITE-11348) Ping node procedure may fail when another node leaves the cluster

2019-02-19 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771916#comment-16771916
 ] 

Ignite TC Bot commented on IGNITE-11348:


{panel:title=-- Run :: All: No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3116051&buildTypeId=IgniteTests24Java8_RunAll]
