[ https://issues.apache.org/jira/browse/IGNITE-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Steshin updated IGNITE-13980:
--------------------------------------
    Description: 
Suggestion: remove the duplicated ‘ping’ and make the code simpler.

To ensure that a node has not failed, TcpDiscoverySpi has a robust ping 
(TcpDiscoveryConnectionCheckMessage) and the backward connection check. But 
there is also a status check message (TcpDiscoveryStatusCheckMessage), which 
looks outdated. This message was introduced in the first versions of the 
discovery, when cluster stability and message delivery were still under 
development.

Currently, TcpDiscoveryStatusCheckMessage is actually launched only 
occasionally, at cluster start, and does not happen later because of the ping. 
The ping updates the time of the last received message, which is why the 
status check is not raised.
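The following is a minimal Java sketch of why the status check stays silent. It assumes a simplified model in which a status check is raised only when no message has arrived for longer than a timeout. The class StatusCheckTimer, its methods, the 500 ms ping period and the 2 s timeout are hypothetical and are not taken from TcpDiscoverySpi.

```java
// Hypothetical, simplified model; not Ignite's TcpDiscoverySpi code.
public class StatusCheckTimer {
    private final long statusCheckTimeoutMs;
    private long lastMsgReceivedTime;

    public StatusCheckTimer(long statusCheckTimeoutMs, long startTime) {
        this.statusCheckTimeoutMs = statusCheckTimeoutMs;
        this.lastMsgReceivedTime = startTime;
    }

    /** Any incoming message, including a connection-check ping, refreshes the timestamp. */
    public void onMessageReceived(long now) {
        lastMsgReceivedTime = now;
    }

    /** A status check is raised only if nothing has been received for longer than the timeout. */
    public boolean shouldRaiseStatusCheck(long now) {
        return now - lastMsgReceivedTime > statusCheckTimeoutMs;
    }

    public static void main(String[] args) {
        long pingIntervalMs = 500; // assumed ping period
        StatusCheckTimer timer = new StatusCheckTimer(2_000, 0);
        int raised = 0;

        // Simulate one minute in 100 ms steps. Pings arrive every 500 ms, so the
        // gap since the last received message never exceeds the 2 s timeout.
        for (long now = 0; now <= 60_000; now += 100) {
            if (now % pingIntervalMs == 0)
                timer.onMessageReceived(now);
            if (timer.shouldRaiseStatusCheck(now))
                raised++;
        }
        System.out.println("status checks raised with regular pings: " + raised);

        // Without any incoming messages, the check fires once the timeout elapses.
        StatusCheckTimer silent = new StatusCheckTimer(2_000, 0);
        System.out.println("raised after 2.5 s of silence: " + silent.shouldRaiseStatusCheck(2_500));
    }
}
```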

It is possible that a node loses all incoming connections but keeps its 
connection to the next node. In this case, the node gets removed from the ring 
by its follower, but it cannot recognize the failure because it still 
successfully sends messages to the next node. Instead of the complex processing 
of TcpDiscoveryStatusCheckMessage, it seems enough to answer such a message 
with 'OK, but you are not in the ring'. Every other node sees the failure of 
the malfunctioning node and can notify it about this in the message response.
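Here is a minimal Java sketch of the proposed reply. It assumes a simplified model in which each node tracks the set of node IDs it currently considers to be in the ring. The class RingMembershipCheck, the Response enum and both methods are hypothetical illustrations, not Ignite's actual message handling.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical model of the 'OK, but you are not in the ring' reply; not Ignite's API.
public class RingMembershipCheck {
    enum Response { OK, OK_NOT_IN_RING }

    /** Receiver side: the message is accepted, but a removed sender is told it is no longer in the ring. */
    static Response onMessage(Set<UUID> ringNodes, UUID senderId) {
        return ringNodes.contains(senderId) ? Response.OK : Response.OK_NOT_IN_RING;
    }

    /** Sender side: a successful send is not enough; the response reveals that this node was removed. */
    static boolean senderRecognizesFailure(Response res) {
        return res == Response.OK_NOT_IN_RING;
    }

    public static void main(String[] args) {
        UUID a = UUID.randomUUID(), b = UUID.randomUUID(), c = UUID.randomUUID();

        // Node c's view of the ring after b, having lost all incoming
        // connections, was removed from the ring by its follower.
        Set<UUID> ringSeenByC = new HashSet<>(Set.of(a, b, c));
        ringSeenByC.remove(b);

        // b can still send to its next node c, and learns about its removal from the reply.
        Response res = onMessage(ringSeenByC, b);
        System.out.println(res + ", b recognizes failure: " + senderRecognizesFailure(res));
    }
}
```

In this sketch the receiver still answers OK, so the sender's send succeeds and ring traffic is not disturbed. The sender learns that it has failed only from the extra flag in the response.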

The ticket has been additionally verified with the integration discovery test: 
https://github.com/apache/ignite/pull/8716

We can keep TcpDiscoveryStatusCheckMessage for backward compatibility with 
older versions of Ignite. The subtask (IGNITE-14053) suggests removing 
TcpDiscoveryStatusCheckMessage completely.


  was:
Suggestion: remove the duplicated ‘ping’ and make the code simpler.

To ensure that a node has not failed, TcpDiscoverySpi has a robust ping 
(TcpDiscoveryConnectionCheckMessage) and the backward connection check. But 
there is also a status check message (TcpDiscoveryStatusCheckMessage), which 
looks outdated. This message was introduced in the first versions of the 
discovery, when cluster stability and message delivery were still under 
development.

Currently, TcpDiscoveryStatusCheckMessage is actually launched only 
occasionally, at cluster start, and does not happen later because of the ping. 
The ping updates the time of the last received message, which is why the 
status check is not raised.

It is possible that a node loses all incoming connections but keeps its 
connection to the next node. In this case, the node gets removed from the ring 
by its follower, but it cannot recognize the failure because it still 
successfully sends messages to the next node. Instead of the complex processing 
of TcpDiscoveryStatusCheckMessage, it seems enough to answer such a message 
with 'OK, but you are not in the ring'. Every other node sees the failure of 
the malfunctioning node and can notify it about this in the message response.

We can keep TcpDiscoveryStatusCheckMessage for backward compatibility with 
older versions of Ignite. The subtask (IGNITE-14053) suggests removing 
TcpDiscoveryStatusCheckMessage completely.



> Remove duplicated ping: processing and raising StatusCheckMessage.
> ------------------------------------------------------------------
>
>                 Key: IGNITE-13980
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13980
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
