[ 
https://issues.apache.org/jira/browse/IGNITE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikita Amelchev reassigned IGNITE-26986:
----------------------------------------

    Assignee: Nikita Amelchev

> Multi-datacenter awareness for connection recovery mechanism
> ------------------------------------------------------------
>
>                 Key: IGNITE-26986
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26986
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Sergey Chugunov
>            Assignee: Nikita Amelchev
>            Priority: Major
>              Labels: IEP-140, ise
>             Fix For: 2.18
>
>
> The connection recovery mechanism developed in IGNITE-7163 improves topology 
> resilience against brief network instability. However, it can cause the 
> whole cluster to go down if a cross-DC network partition happens in a 
> multi-datacenter environment.
> This is because connection recovery forces a node to segment from the 
> topology when it cannot restore the connection to its next node within the 
> specified timeout.
> If a node sits at the edge of its datacenter and several of its next nodes 
> are in the remote DC, all of the edge node's attempts to find an alive next 
> node fail because of the partition. If the connection recovery timeout is 
> not large enough, the edge node considers itself segmented and stops.
> The previous node of the newly failed one then becomes the edge node, and 
> the process repeats.
> In this case, the connection recovery mechanism forces the whole cluster to 
> shut down instead of improving its stability.
> Therefore, it should be aware of multi-datacenter environments and adjust 
> its behavior accordingly.
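The cascade described above can be reproduced with a small toy model. This is a hypothetical sketch, not Ignite code, and it rests on two stated assumptions: every connection attempt takes the same time, so the connection recovery timeout becomes an "attempt budget" (how many next nodes an edge node can try), and during a cross-DC partition each DC keeps the remote nodes in its own topology view, because topology updates cannot cross the partition. The class CrossDcCascade and the names survivors, survivorsIn, and attemptBudget are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical toy model, not Ignite code. Assumptions:
// - every connection attempt takes the same time, so the connection recovery
//   timeout becomes an "attempt budget" (how many next nodes can be tried);
// - during a cross-DC partition, each DC keeps the remote nodes in its view,
//   because topology updates cannot cross the partition.
public class CrossDcCascade {

    /** Total number of nodes, over all DCs, that survive the partition. */
    static int survivors(String[] dcOf, int attemptBudget) {
        int total = 0;
        for (String dc : new LinkedHashSet<>(Arrays.asList(dcOf)))
            total += survivorsIn(dc, dcOf, attemptBudget);
        return total;
    }

    /** Simulates the cascade inside one DC; node i sits at ring position i. */
    static int survivorsIn(String localDc, String[] dcOf, int attemptBudget) {
        List<Integer> view = new ArrayList<>();
        for (int i = 0; i < dcOf.length; i++)
            view.add(i);

        boolean changed = true;
        while (changed) {
            List<Integer> segmented = new ArrayList<>();
            int n = view.size();
            for (int pos = 0; pos < n; pos++) {
                int node = view.get(pos);
                if (!dcOf[node].equals(localDc))
                    continue; // Remote nodes are simulated in their own DC's pass.
                int tries = Math.min(attemptBudget, n - 1);
                if (tries == 0)
                    continue; // Alone in the view: nothing to connect to.
                boolean reached = false;
                // Try the next nodes in ring order; only same-DC nodes answer.
                for (int k = 1; k <= tries && !reached; k++)
                    reached = dcOf[view.get((pos + k) % n)].equals(localDc);
                if (!reached)
                    segmented.add(node); // Budget exhausted: the node segments and stops.
            }
            // Segmented nodes leave the local view; the previous node becomes the new edge.
            view.removeAll(segmented);
            changed = !segmented.isEmpty();
        }

        int left = 0;
        for (int node : view)
            if (dcOf[node].equals(localDc))
                left++;
        return left;
    }

    public static void main(String[] args) {
        // Ring 0..7: nodes 0-3 in DC "A", nodes 4-7 in DC "B".
        String[] dcOf = {"A", "A", "A", "A", "B", "B", "B", "B"};
        System.out.println(survivors(dcOf, 3)); // 0: every node segments
        System.out.println(survivors(dcOf, 5)); // 8: edges reach past the 4 remote nodes
    }
}
```

In this model an edge node survives only if its budget exceeds the number of consecutive remote nodes that follow it in the ring; with a budget of 3 or 4 and four remote nodes, each DC loses its edge node round after round until no node is left on either side.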



--
This message was sent by Atlassian Jira
(v8.20.10#820010)