[ https://issues.apache.org/jira/browse/GEODE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236486#comment-17236486 ]
Dan Smith commented on GEODE-8739:
----------------------------------

I think I see the problem. If we fall into *findCoordinatorFromView*, the locator chooses *itself* as the best coordinator.

{code}
if (localAddress.preferredForCoordinator()) {
  // it's possible that all other potential coordinators are gone
  // and this new member must become the coordinator
  bestGuessCoordinator = localAddress;
}
{code}

Even though it got a response from the other locator, it ignores that response because it already tried that locator once and it was not the coordinator at the time:

{noformat}
if (!localAddress.equals(suggestedCoordinator)
    && !state.alreadyTried.contains(suggestedCoordinator)) {
{noformat}

The regular findCoordinator logic doesn't seem to do this; it happens only in findCoordinatorFromView. It looks like we only get into findCoordinatorFromView if we recovered a view from a .dat file.

> Split brain when locators exhaust join attempts on non existant servers
> -----------------------------------------------------------------------
>
> Key: GEODE-8739
> URL: https://issues.apache.org/jira/browse/GEODE-8739
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Jason Huynh
> Priority: Major
> Attachments: exportedLogs_locator-0.zip, exportedLogs_locator-1.zip
>
>
> The hypothesis: "if there is a locator view .dat file with several
> non-existent servers then locators will waste all of their join attempts
> on the servers instead of finding each other"
> The scenario is that a test/user attempts to recreate a cluster with existing
> .dat and persistent files. The locators are spun up in parallel and, from the
> analysis, it looks like they are able to communicate with each other, but then
> end up forming their own ds.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
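
To make the interaction between the two conditions in Dan Smith's comment easier to follow, here is a minimal, self-contained Java sketch. It is not Geode's implementation: only the two conditions quoted in the comment come from the source, while the class `CoordinatorSketch`, the method `chooseWithAlreadyTriedFilter`, the string addresses `locator-0` and `locator-1`, and the loop over suggestions are assumptions made for illustration. It shows how a locator that prefers itself and has already tried its peer could end up picking itself, even when the peer is suggested as coordinator.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the selection step described in the comment.
// Only the two conditions are taken from the quoted snippets; every name below
// other than localAddress, bestGuessCoordinator, suggestedCoordinator and
// alreadyTried is illustrative.
class CoordinatorSketch {

  // Starts from the local address when it is preferred for coordinator, then
  // accepts a suggestion only if it is not the local address and has not
  // already been tried.
  static String chooseWithAlreadyTriedFilter(
      String localAddress,
      boolean localPreferredForCoordinator,
      List<String> suggestedCoordinators,
      Set<String> alreadyTried) {
    String bestGuessCoordinator = null;
    if (localPreferredForCoordinator) {
      bestGuessCoordinator = localAddress;
    }
    for (String suggestedCoordinator : suggestedCoordinators) {
      if (!localAddress.equals(suggestedCoordinator)
          && !alreadyTried.contains(suggestedCoordinator)) {
        bestGuessCoordinator = suggestedCoordinator;
      }
    }
    return bestGuessCoordinator;
  }

  public static void main(String[] args) {
    // locator-0 tried locator-1 earlier, before locator-1 was a coordinator.
    Set<String> triedByLocator0 = new HashSet<>(Arrays.asList("locator-1"));
    // The symmetric situation on the other locator.
    Set<String> triedByLocator1 = new HashSet<>(Arrays.asList("locator-0"));

    String choiceOf0 = chooseWithAlreadyTriedFilter(
        "locator-0", true, Arrays.asList("locator-1"), triedByLocator0);
    String choiceOf1 = chooseWithAlreadyTriedFilter(
        "locator-1", true, Arrays.asList("locator-0"), triedByLocator1);

    // Each locator keeps itself as the best guess, which matches the
    // split-brain outcome reported in the issue.
    System.out.println("locator-0 chooses " + choiceOf0);
    System.out.println("locator-1 chooses " + choiceOf1);
  }
}
```

In this sketch, passing an empty `alreadyTried` set makes each locator accept the other's suggestion instead, which is consistent with the observation that the problem depends on the peer having been tried earlier.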