[ https://issues.apache.org/jira/browse/IGNITE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Steshin updated IGNITE-27746:
--------------------------------------
Description:
Consider:
* The Multi-DC feature is on.
* A corner node from DC1 can't send a message to its next node in DC2.
* DC2 is unavailable.
* No node of DC1 can connect to any node in DC2.
To prevent sequential node failures in DC1, we need to extend the connection
recovery mechanics. We need to know whether DC2 is completely unavailable. If
so, we switch to DC/brain split but keep the nodes of DC1 online. To achieve
this, we might ping DC2's nodes from the edge node while it performs the normal
connection recovery, under the same connection recovery timeout. If the
recovery fails and no ping to DC2 succeeds, we consider DC1 to be working
separately from DC2.
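
The idea above can be illustrated with a minimal Java sketch. It is not the
Ignite implementation: the class DcRecoverySketch, the Outcome enum, the
recoverWithParallelPing method, and the recovery/ping callbacks are all
hypothetical names chosen for illustration. The sketch assumes that the normal
recovery and the pings of DC2's nodes run concurrently and share one deadline.
What happens when the recovery fails but some DC2 node still answers a ping is
not stated in this issue, so the sketch only reports that case and leaves the
handling open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

/** Hypothetical sketch: connection recovery plus parallel pings under one timeout. */
final class DcRecoverySketch {
    /** Possible results; DC2_REACHABLE handling is an assumption left to the caller. */
    enum Outcome { RECOVERED, DC2_REACHABLE, DC2_UNAVAILABLE }

    static Outcome recoverWithParallelPing(Callable<Boolean> recovery, List<String> dc2Addrs,
                                           Predicate<String> ping, long timeoutMs) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // One deadline shared by the recovery and every ping.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

            // Start the normal recovery and all pings at the same time.
            Future<Boolean> rec = pool.submit(recovery);
            List<Future<Boolean>> pings = new ArrayList<>();
            for (String addr : dc2Addrs) {
                Callable<Boolean> task = () -> ping.test(addr);
                pings.add(pool.submit(task));
            }

            if (await(rec, deadline))
                return Outcome.RECOVERED;

            // Recovery failed or timed out: check whether any DC2 node answered.
            for (Future<Boolean> p : pings) {
                if (await(p, deadline))
                    return Outcome.DC2_REACHABLE;
            }
            // No ping succeeded: treat DC1 as working separately from DC2.
            return Outcome.DC2_UNAVAILABLE;
        } finally {
            pool.shutdownNow(); // Interrupt whatever is still running.
        }
    }

    /** Waits for the task until the shared deadline; errors and timeouts count as failure. */
    private static boolean await(Future<Boolean> f, long deadlineNanos) {
        try {
            long left = Math.max(0L, deadlineNanos - System.nanoTime());
            return Boolean.TRUE.equals(f.get(left, TimeUnit.NANOSECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Recovery fails and no DC2 node answers, so DC1 keeps working on its own.
        Outcome o = recoverWithParallelPing(() -> false,
            List.of("dc2-node-1", "dc2-node-2"), addr -> false, 500);
        System.out.println(o); // prints DC2_UNAVAILABLE
    }
}
```

Because the pings start together with the recovery attempt, the total wait
stays bounded by the single connection recovery timeout instead of growing
with the number of DC2 nodes.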
was:
Consider:
* The Multi-DC feature is on.
* A corner node from DC1 can't send a message to its next node in DC2.
* DC2 is unavailable.
* No node of DC1 can connect.
To prevent sequential node failures in DC1, we need to extend the connection
recovery mechanics. We need to know whether DC2 is completely unavailable. If
so, we switch to DC/brain split but keep the nodes of DC1 online. To achieve
this, we might ping DC2's nodes from the edge node while it performs the normal
connection recovery, under the same connection recovery timeout. If the
recovery fails and no ping to DC2 succeeds, we consider DC1 to be working
separately from DC2.
> MDC. Implement parallel ping of DC2's nodes with the connection recovery.
> -------------------------------------------------------------------------
>
> Key: IGNITE-27746
> URL: https://issues.apache.org/jira/browse/IGNITE-27746
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Vladimir Steshin
> Priority: Major
> Labels: ise
>
> Consider:
> * The Multi-DC feature is on.
> * A corner node from DC1 can't send a message to its next node in DC2.
> * DC2 is unavailable.
> * No node of DC1 can connect to any node in DC2.
> To prevent sequential node failures in DC1, we need to extend the connection
> recovery mechanics. We need to know whether DC2 is completely unavailable. If
> so, we switch to DC/brain split but keep the nodes of DC1 online. To achieve
> this, we might ping DC2's nodes from the edge node while it performs the
> normal connection recovery, under the same connection recovery timeout. If
> the recovery fails and no ping to DC2 succeeds, we consider DC1 to be working
> separately from DC2.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)