[jira] [Commented] (IGNITE-4499) TcpDiscoverySpi is not reliable in some network split scenarios.
[ https://issues.apache.org/jira/browse/IGNITE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828296#comment-15828296 ]

Andrey Gura commented on IGNITE-4499:

Fixed.

Current solution: the unreachable node is kicked out of the topology (forcibly failed). At the moment this applies only to TCP connections, not shmem.

{{TcpCommunicationSpi}} fails a node if it cannot connect to the remote node (server or client) and all retries have failed. A server node can fail either a server or a client node. Client nodes can fail only other client nodes. This is implemented in the {{createTcpClient()}} method.

{{TcpDiscoveryNodeFailedMessage}} is handled by {{TcpDiscoverySpi}} in a special manner in the case of a forcible node failure. No node handles this message unless it has been verified by the coordinator. This avoids topology crashes in cases where, for example, two nodes try to kick each other out (changes in the {{ServerImpl}} class).

A client node can now receive {{TcpDiscoveryNodeFailedMessage}} in the case of a forcible failure. In this case the client reconnects after a delay.

> TcpDiscoverySpi is not reliable in some network split scenarios.
>
> Key: IGNITE-4499
> URL: https://issues.apache.org/jira/browse/IGNITE-4499
> Project: Ignite
> Issue Type: Bug
> Components: general
> Affects Versions: 1.6
> Reporter: Alexei Scherbakov
> Assignee: Andrey Gura
> Fix For: 2.0
>
> There is a possible caveat in the current discovery implementation, which uses a ring of nodes.
> Imagine a grid consisting of nodes A, B, C and D.
> Let them form the ring:
> A-B-C-D-A
> If network connectivity issues arise between nodes A-C and B-D, the discovery SPI will never know about it and will continue to assume the topology is valid.
> On the other side, TcpCommunicationSpi will try to run transactions on this topology and will never succeed.
> We must drop nodes from the topology on communication SPI errors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
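The scenario from the issue description, together with the "who may fail whom" rule from the comment above, can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: it does not use or reproduce any Ignite API, and the class `RingSplitSketch` and its methods (`link`, `ringLooksHealthy`, `unreachablePairs`, `mayForciblyFail`) are hypothetical names invented for illustration. It assumes discovery checks only links between ring neighbours, while communication may need any pair of nodes to be connected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model only; not Ignite code. All names here are hypothetical. */
public class RingSplitSketch {
    /** Canonical name for the undirected link between two nodes, e.g. "A-C". */
    public static String link(String a, String b) {
        return a.compareTo(b) < 0 ? a + "-" + b : b + "-" + a;
    }

    /** Assumption: a ring-based check only looks at links between ring neighbours. */
    public static boolean ringLooksHealthy(List<String> ring, Set<String> brokenLinks) {
        for (int i = 0; i < ring.size(); i++) {
            String next = ring.get((i + 1) % ring.size());
            if (brokenLinks.contains(link(ring.get(i), next)))
                return false;
        }
        return true;
    }

    /** Pairs of nodes that cannot talk to each other directly. */
    public static List<String> unreachablePairs(List<String> nodes, Set<String> brokenLinks) {
        List<String> res = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            for (int j = i + 1; j < nodes.size(); j++) {
                String l = link(nodes.get(i), nodes.get(j));
                if (brokenLinks.contains(l))
                    res.add(l);
            }
        }
        return res;
    }

    /** Rule from the comment: servers may fail servers or clients; clients may fail only clients. */
    public static boolean mayForciblyFail(boolean initiatorIsClient, boolean targetIsClient) {
        return !initiatorIsClient || targetIsClient;
    }

    public static void main(String[] args) {
        List<String> ring = Arrays.asList("A", "B", "C", "D");
        Set<String> broken = new HashSet<>(Arrays.asList(link("A", "C"), link("B", "D")));

        // The ring A-B-C-D-A contains neither A-C nor B-D, so the ring check passes.
        System.out.println("ring looks healthy: " + ringLooksHealthy(ring, broken));
        // Direct communication between these pairs is nevertheless impossible.
        System.out.println("unreachable pairs: " + unreachablePairs(ring, broken));
    }
}
```

In this model the ring check reports a healthy topology even though A-C and B-D cannot communicate, which is the gap the issue describes. The fix reported in the comment closes it from the communication side, by failing the unreachable node once all connection retries are exhausted.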
[jira] [Commented] (IGNITE-4499) TcpDiscoverySpi is not reliable in some network split scenarios.
[ https://issues.apache.org/jira/browse/IGNITE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780538#comment-15780538 ]

Vyacheslav Daradur commented on IGNITE-4499:

I think this will be decided in IGNITE-4501.
[jira] [Commented] (IGNITE-4499) TcpDiscoverySpi is not reliable in some network split scenarios.
[ https://issues.apache.org/jira/browse/IGNITE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780169#comment-15780169 ]

Alexei Scherbakov commented on IGNITE-4499:

Sadly, I have no reproducer at the moment.