[ https://issues.apache.org/jira/browse/GEODE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bill Burcham updated GEODE-9680:
--------------------------------
Summary: Newly Started/Restarted Locators are Susceptible to Split-Brains (was: Newly Started Locators are Susceptible to Split-Brains)

Newly Started/Restarted Locators are Susceptible to Split-Brains
----------------------------------------------------------------

Key: GEODE-9680
URL: https://issues.apache.org/jira/browse/GEODE-9680
Project: Geode
Issue Type: Bug
Components: membership
Affects Versions: 1.15.0
Reporter: Bill Burcham
Priority: Major
Labels: needsTriage

Geode is built on the assumption that views progress linearly in a sequence. If that sequence ever forks into two or more parallel lines, then we have a "split brain".

In a split brain condition, each of the parallel views is independent. It's as if you have more than one system running concurrently. It's possible, e.g., for some clients to connect to members of one view and other clients to connect to members of another view. Updates to members in one view are not seen by members of a parallel view.

Geode views are produced by a coordinator. As long as only a single coordinator is running, there is no possibility of a split brain. Split brain arises when more than one coordinator is producing views at the same time.

Each Geode member (peer) is started with the {{locators}} configuration parameter. That parameter specifies the locator(s) to use to find the (already running!) coordinator (member) to join with.

When a locator (member) starts, it goes through this sequence to find the coordinator:
# it first tries to find the coordinator through one of the (other) configured locators
# if it can't contact any of those, it tries contacting non-locator (cache server) members it has retrieved from the "view persistence" ({{.dat}}) file

If it hasn't found a coordinator to join with, then the locator may _become_ a coordinator.

Sometimes this is ok.
If no other coordinator is currently running, then this behavior is fine. An example is when an [administrator is starting up a brand new cluster|https://geode.apache.org/docs/guide/114/configuring/running/running_the_locator.html]. In that case we want the very first locator we start to become the coordinator.

But there are a number of situations where another coordinator may already be running but cannot be reached:
* if the administrator/operator is starting up a brand new cluster with multiple locators and…
** maybe Geode is running in a managed environment like Kubernetes and the locators' hostnames are not (yet) resolvable in DNS
** maybe there is a network partition between the starting locators so they can't communicate
** maybe the existing locators or coordinator are running very slowly or the network is degraded. This is effectively the same as the network partition just mentioned
* if a cluster is already running and the administrator/operator wants to scale it up by starting/adding a new locator, Geode is susceptible to that same network partition issue
* if a cluster is already running and the administrator/operator needs to restart a locator, e.g. for a rolling upgrade, if none of the locators in the {{locators}} configuration parameter are reachable (maybe they are not running, or maybe there is a network partition) and…
** if the "view persistence" {{.dat}} file is missing or deleted
** or if the current set of running Geode members has evolved so far that the coordinates (host+port) in the {{.dat}} file are completely out of date

In each of those cases, the newly starting locator will become a coordinator and will start producing views. Now we'll have the old coordinator producing views at the same time as the new one.
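The discovery sequence described above, and why every scenario in that list ends with a second coordinator, can be sketched in Java. This is a minimal sketch under stated assumptions: {{LocatorStartupSketch}}, {{Outcome}}, {{start}} and {{askForCoordinator}} are hypothetical names, not Geode APIs, and network reachability is modeled as a function that returns the coordinator a contacted member reports, or empty when that member cannot be reached. The {{record}} requires Java 16 or later.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical model of the locator startup sequence described in GEODE-9680.
 * None of these names are Geode APIs; they only illustrate the control flow.
 */
final class LocatorStartupSketch {

    /** Which coordinator the starting locator ends up following (possibly itself). */
    record Outcome(boolean becameCoordinator, String coordinator) {}

    /**
     * @param configuredLocators entries from the "locators" parameter, e.g. "host[port]"
     * @param persistedMembers   cache servers recovered from the .dat file (empty if missing)
     * @param askForCoordinator  contacts a member; returns the coordinator it reports,
     *                           or empty if the member could not be reached (assumption)
     * @param self               this locator's own address
     */
    static Outcome start(List<String> configuredLocators,
                         List<String> persistedMembers,
                         Function<String, Optional<String>> askForCoordinator,
                         String self) {
        // Step 1: try the (other) configured locators.
        for (String locator : configuredLocators) {
            Optional<String> coordinator = askForCoordinator.apply(locator);
            if (coordinator.isPresent()) {
                return new Outcome(false, coordinator.get());
            }
        }
        // Step 2: try non-locator members remembered in the view persistence file.
        for (String member : persistedMembers) {
            Optional<String> coordinator = askForCoordinator.apply(member);
            if (coordinator.isPresent()) {
                return new Outcome(false, coordinator.get());
            }
        }
        // Nobody answered, so the locator promotes itself. If a coordinator is
        // actually running behind a partition, two coordinators now exist.
        return new Outcome(true, self);
    }

    public static void main(String[] args) {
        // The running coordinator is locator-a, but a partition makes every probe fail,
        // and the .dat file only lists a cache server that is no longer reachable.
        Function<String, Optional<String>> partitioned = member -> Optional.empty();
        Outcome outcome = start(List.of("locator-a[10334]"),
                                List.of("server-old[40404]"),
                                partitioned,
                                "locator-b[10334]");
        // prints Outcome[becameCoordinator=true, coordinator=locator-b[10334]]
        System.out.println(outcome);
    }
}
```

Injecting reachability as a function keeps the sketch deterministic: each scenario in the list above (unresolvable DNS, a partition, a missing or stale {{.dat}} file) reduces to {{askForCoordinator}} returning empty for every candidate, at which point the starting locator promotes itself even though locator-a is still producing views.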
*When this ticket is complete*, Geode will offer a locator startup mode (via a TBD configuration parameter or {{LocatorLauncher}} startup parameter) that prevents that locator from becoming a coordinator. With that mode, it will be possible for an administrator to avoid many of the problematic scenarios mentioned above, while retaining the ability to start a first locator that is allowed to become a coordinator.

For purposes of discussion we'll call the startup mode that allows the locator to become a coordinator "seed" mode, and we'll call the new startup mode that prevents the locator from becoming a coordinator before first joining "join-only" mode.

To start a brand new cluster, the first locator is started in "seed" mode. After that, all subsequent locators are started in "join-only" mode. If network partitions occur, the newly started nodes will exit with a failure status, but will not become coordinators.

To add a locator to a running cluster, it will be started in "join-only" mode. It will similarly either join an existing coordinator or exit with a failure status, thereby avoiding split brains.

When restarting a locator, e.g. during a rolling upgrade, it will be restarted in "join-only" mode. If a network partition is encountered, or the {{.dat}} file is missing or stale, the locator will exit with a failure status and split brain will be avoided.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)