[ 
https://issues.apache.org/jira/browse/GEODE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9680:
--------------------------------
    Summary: Newly Started/Restarted Locators are Susceptible to Split-Brains  
(was: Newly Started Locators are Susceptible to Split-Brains)

> Newly Started/Restarted Locators are Susceptible to Split-Brains
> ----------------------------------------------------------------
>
>                 Key: GEODE-9680
>                 URL: https://issues.apache.org/jira/browse/GEODE-9680
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.15.0
>            Reporter: Bill Burcham
>            Priority: Major
>              Labels: needsTriage
>
> Geode is built on the assumption that views progress linearly in a sequence. 
> If that sequence ever forks into two or more parallel lines then we have a 
> "split brain".
> In a split brain condition, each of the parallel views is independent. It's 
> as if you have more than one system running concurrently. It's possible e.g. 
> for some clients to connect to members of one view and other clients to 
> connect to members of another view. Updates to members in one view are not 
> seen by members of a parallel view.
> Geode views are produced by a coordinator. As long as only a single 
> coordinator is running, there is no possibility of a split brain. Split brain 
> arises when more than one coordinator is producing views at the same time.
> Each Geode member (peer) is started with the {{locators}} configuration 
> parameter. That parameter specifies locator(s) to use to find the (already 
> running!) coordinator (member) to join with.
> When a locator (member) starts, it goes through this sequence to find the 
> coordinator:
> # it first tries to find the coordinator through one of the (other) 
> configured locators
> # if it can't contact any of those, it tries contacting non-locator (cache 
> server) members it has retrieved from the "view persistence" ({{.dat}}) file
> If it hasn't found a coordinator to join with, then the locator may _become_ 
> a coordinator.
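
The startup sequence described above can be sketched roughly as follows. This is a minimal illustration, not Geode's implementation: the class, enum, and method names are hypothetical, addresses are plain strings, and the `probe` function stands in for whatever network call asks a peer for the current coordinator.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Illustrative only: these names are hypothetical and are not Geode APIs.
final class LocatorStartupSketch {

  enum Outcome { FOUND_COORDINATOR, MAY_BECOME_COORDINATOR }

  /**
   * @param configuredLocators the other locators named in the locators parameter
   * @param persistedMembers   non-locator members read from the view persistence .dat file
   * @param probe              assumed stand-in for a network call; returns the coordinator's
   *                           address if the contacted peer answers, otherwise empty
   */
  static Outcome findCoordinator(
      List<String> configuredLocators,
      List<String> persistedMembers,
      Function<String, Optional<String>> probe) {
    // Step 1: try the other configured locators.
    // Step 2: only if step 1 fails, try the members recorded in the .dat file.
    if (anyAnswers(configuredLocators, probe) || anyAnswers(persistedMembers, probe)) {
      return Outcome.FOUND_COORDINATOR;
    }
    // Nobody answered: in today's behavior the locator may become a coordinator itself.
    return Outcome.MAY_BECOME_COORDINATOR;
  }

  private static boolean anyAnswers(
      List<String> addresses, Function<String, Optional<String>> probe) {
    for (String address : addresses) {
      if (probe.apply(address).isPresent()) {
        return true;
      }
    }
    return false;
  }
}
```

Note that the sketch cannot distinguish "no coordinator exists" from "a coordinator exists but is unreachable"; both end in `MAY_BECOME_COORDINATOR`, which is the ambiguity this ticket is about.
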
> Sometimes this is ok. If no other coordinator is currently running then this 
> behavior is fine. An example is when an [administrator is starting up a brand 
> new 
> cluster|https://geode.apache.org/docs/guide/114/configuring/running/running_the_locator.html].
>  In that case we want the very first locator we start to become the 
> coordinator.
> But there are a number of situations where there may already be another 
> coordinator running but it cannot be reached:
> * if the administrator/operator is starting up a brand new cluster with 
> multiple locators and…
> ** maybe Geode is running in a managed environment like Kubernetes and the 
> locators' hostnames are not (yet) resolvable in DNS
> ** maybe there is a network partition between the starting locators so they 
> can't communicate
> ** maybe the existing locators or coordinator are running very slowly or the 
> network is degraded. This is effectively the same as the network partition 
> just mentioned
> * if a cluster is already running and the administrator/operator wants to 
> scale it up by starting/adding a new locator, Geode is susceptible to that 
> same network partition issue
> * if a cluster is already running and the administrator/operator needs to 
> restart a locator, e.g. for a rolling upgrade, if none of the locators in the 
> {{locators}} configuration parameter are reachable (maybe they are not 
> running, or maybe there is a network partition) and…
> ** if the "view persistence" {{.dat}} file is missing or deleted
> ** or if the current set of running Geode members has evolved so far that the 
> coordinates (host+port) in the {{.dat}} file are completely out of date
> In each of those cases, the newly starting locator will become a coordinator 
> and will start producing views. Now we'll have the old coordinator producing 
> views at the same time as the new one.
> *When this ticket is complete*, Geode will offer a locator startup mode (via 
> TBD configuration parameter, {{LocatorLauncher}} startup parameter) that 
> prevents that locator from becoming a coordinator. With that mode, it will be 
> possible for an administrator to avoid many of the problematic scenarios 
> mentioned above, while retaining the ability to start a first locator which 
> is allowed to become a coordinator.
> For purposes of discussion we'll call the startup mode that allows the 
> locator to become a coordinator "seed" mode, and we'll call the new startup 
> mode that prevents the locator from becoming a coordinator before first 
> joining, "join-only" mode.
> To start a brand new cluster, the first locator is started in "seed" mode. 
> After that, all subsequent locators are started in "join-only" mode. If 
> network partitions occur, the newly started nodes will exit with a failure 
> status, but will not become coordinators.
> To add a locator to a running cluster, it will be started in "join-only" 
> mode. It will similarly either join with an existing coordinator or exit with 
> a failure status, thereby avoiding split brains.
> When restarting a locator, e.g. during a rolling upgrade, it will be 
> restarted in "join-only" mode. If a network partition is encountered, or the 
> {{.dat}} file is missing or stale, the locator will exit with a failure 
> status and split brain will be avoided.
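
The proposed difference between the two modes can be summarized in a small decision function. This is a sketch under assumptions: the configuration parameter is still TBD in the ticket, so the `Mode` and `Action` enums and the `decide` method below are hypothetical names chosen for illustration only.

```java
// Illustrative only: the ticket leaves the real parameter TBD; these names are hypothetical.
final class StartupModeSketch {

  enum Mode { SEED, JOIN_ONLY }

  enum Action { JOIN, BECOME_COORDINATOR, EXIT_WITH_FAILURE }

  /**
   * @param mode             the startup mode the operator chose for this locator
   * @param coordinatorFound whether discovery reached an existing coordinator
   */
  static Action decide(Mode mode, boolean coordinatorFound) {
    if (coordinatorFound) {
      // Either mode joins an existing coordinator when one is reachable.
      return Action.JOIN;
    }
    // No coordinator reachable: only "seed" mode may start producing views.
    return mode == Mode.SEED ? Action.BECOME_COORDINATOR : Action.EXIT_WITH_FAILURE;
  }
}
```

Under this sketch, a "join-only" locator never becomes a coordinator, so a network partition or a missing or stale `.dat` file leads to a failed startup rather than a second coordinator.
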



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
