[ https://issues.apache.org/jira/browse/GEODE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236486#comment-17236486 ]

Dan Smith commented on GEODE-8739:
----------------------------------

I think I see the problem. If we fall into *findCoordinatorFromView*, the 
locator chooses *itself* as the best coordinator.

{code}
if (localAddress.preferredForCoordinator()) {
  // it's possible that all other potential coordinators are gone
  // and this new member must become the coordinator
  bestGuessCoordinator = localAddress;
}
{code}

Even though it got a response from the other locator, it ignores that response because it already 
tried that locator once and it was not the coordinator at the time:
{code}
if (!localAddress.equals(suggestedCoordinator)
    && !state.alreadyTried.contains(suggestedCoordinator)) {
{code}

The regular findCoordinator logic doesn't seem to do this; it's only in 
findCoordinatorFromView. It looks like we only get into findCoordinatorFromView 
if we recovered a view from a .dat file.
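
To make the interaction concrete, here is a minimal, self-contained sketch of how the two checks 
quoted above could combine. This is not the actual GMSJoinLeave code; the Member and State classes 
and the chooseCoordinator method are hypothetical stand-ins. The idea is that a locator which 
recovered a stale view, already has its peer in alreadyTried from an earlier probe, and is itself 
preferredForCoordinator keeps itself as the best guess even after the peer responds:
{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the real membership classes.
public class FindCoordinatorSketch {

  static class Member {
    final String name;
    final boolean preferredForCoordinator;
    Member(String name, boolean preferredForCoordinator) {
      this.name = name;
      this.preferredForCoordinator = preferredForCoordinator;
    }
    boolean preferredForCoordinator() { return preferredForCoordinator; }
    @Override public String toString() { return name; }
  }

  static class State {
    final Set<Member> alreadyTried = new HashSet<>();
  }

  // Simplified version of the findCoordinatorFromView decision: start by
  // assuming this member is the best coordinator, then only accept a peer's
  // suggestion if that peer has not already been tried.
  static Member chooseCoordinator(Member localAddress, Member suggestedCoordinator, State state) {
    Member bestGuessCoordinator = null;
    if (localAddress.preferredForCoordinator()) {
      // it's possible that all other potential coordinators are gone
      // and this new member must become the coordinator
      bestGuessCoordinator = localAddress;
    }
    if (!localAddress.equals(suggestedCoordinator)
        && !state.alreadyTried.contains(suggestedCoordinator)) {
      bestGuessCoordinator = suggestedCoordinator;
    }
    return bestGuessCoordinator;
  }

  public static void main(String[] args) {
    Member locator0 = new Member("locator-0", true);
    Member locator1 = new Member("locator-1", true);

    // locator-0 recovered a stale view from its .dat file and already probed
    // locator-1 once, back when locator-1 was not yet a coordinator.
    State state = new State();
    state.alreadyTried.add(locator1);

    // Later, locator-1 responds and suggests itself as coordinator, but the
    // alreadyTried check discards the suggestion and locator-0 keeps itself.
    System.out.println(chooseCoordinator(locator0, locator1, state)); // prints locator-0
  }
}
{code}
If the same sequence plays out on the other locator at the same time, each one ends up 
coordinating its own view, which would match the split brain reported here.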

> Split brain when locators exhaust join attempts on non-existent servers
> -----------------------------------------------------------------------
>
>                 Key: GEODE-8739
>                 URL: https://issues.apache.org/jira/browse/GEODE-8739
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Jason Huynh
>            Priority: Major
>         Attachments: exportedLogs_locator-0.zip, exportedLogs_locator-1.zip
>
>
> The hypothesis: "if there is a locator view .dat file with several 
> non-existent servers then the locators will waste all of their join attempts 
> on the servers instead of finding each other"
> The scenario: a test/user attempts to recreate a cluster with existing .dat and 
> persistent files.  The locators are spun up in parallel and, from the analysis, 
> it looks like they are able to communicate with each other, but they end up 
> forming their own distributed systems.


