[ 
https://issues.apache.org/jira/browse/GEODE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236482#comment-17236482
 ] 

Dan Smith commented on GEODE-8739:
----------------------------------

I think these are the most interesting lines from the log files, 
where each locator decides that it should be the coordinator.

What's weird is that in the "Discovery state" message, each one has the same 
list of registrants, and the same view. But they have different possible 
coordinators.

{noformat}
gemfirecluster-sample-locator-0.log: [info 2020/11/17 12:22:12.973 GMT <main> 
tid=0x1] using findCoordinatorFromView

gemfirecluster-sample-locator-0.log: [info 2020/11/17 12:22:12.974 GMT <main> 
tid=0x1] searching for coordinator in findCoordinatorFromView

gemfirecluster-sample-locator-0.log: [info 2020/11/17 12:22:12.974 GMT <main> 
tid=0x1] sending FindCoordinatorRequests to 
[192.168.68.28(gemfirecluster-sample-server-0:1)<v2>:41000, 
192.168.149.18(gemfirecluster-sample-server-1:1)<v2>:41000, 
192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000]

gemfirecluster-sample-locator-0.log: [info 2020/11/17 12:22:15.975 GMT <main> 
tid=0x1] findCoordinatorFromView processing 
FindCoordinatorResponse(coordinator=192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000;
 senderId=192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000)

gemfirecluster-sample-locator-0.log: [info 2020/11/17 12:22:15.976 GMT <main> 
tid=0x1] Discovery state after looking for membership coordinator is 
locatorsContacted=2; findInViewResponses=0; 
alreadyTried=[192.168.149.18(gemfirecluster-sample-server-1:1)<v2>:41000, 
192.168.68.28(gemfirecluster-sample-server-0:1)<v2>:41000, 
192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000]; 
registrants=[192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000,
 192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000]; 
possibleCoordinator=192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000;
 viewId=-1; hasContactedAJoinedLocator=false; 
view=View[192.168.149.10(gemfirecluster-sample-locator-0:1:locator)<ec><v0>:41000|-1]
 members: [192.168.68.28(gemfirecluster-sample-server-0:1)<v2>:41000{lead}, 
192.168.149.18(gemfirecluster-sample-server-1:1)<v2>:41000]; responses=[]

gemfirecluster-sample-locator-0.log: [info 2020/11/17 12:22:15.976 GMT <main> 
tid=0x1] found possible coordinator 
192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000

gemfirecluster-sample-locator-0.log: [info 2020/11/17 12:22:15.976 GMT <main> 
tid=0x1] This member is becoming the membership coordinator with address 
192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000



gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:16.000 GMT <main> 
tid=0x1] using findCoordinatorFromView

gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:16.001 GMT <main> 
tid=0x1] searching for coordinator in findCoordinatorFromView

gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:16.002 GMT <main> 
tid=0x1] sending FindCoordinatorRequests to 
[192.168.68.28(gemfirecluster-sample-server-0:1)<v2>:41000, 
192.168.149.18(gemfirecluster-sample-server-1:1)<v2>:41000, 
192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000]

gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:19.003 GMT <main> 
tid=0x1] findCoordinatorFromView processing 
FindCoordinatorResponse(coordinator=192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000;
 senderId=192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000)

gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:19.004 GMT <main> 
tid=0x1] findCoordinatorFromView's best guess is now 
192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000

gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:19.005 GMT <main> 
tid=0x1] Discovery state after looking for membership coordinator is 
locatorsContacted=2; findInViewResponses=0; 
alreadyTried=[192.168.149.18(gemfirecluster-sample-server-1:1)<v2>:41000, 
192.168.68.28(gemfirecluster-sample-server-0:1)<v2>:41000]; 
registrants=[192.168.64.210(gemfirecluster-sample-locator-0:1:locator)<ec>:41000,
 192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000]; 
possibleCoordinator=192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000;
 viewId=-1; hasContactedAJoinedLocator=false; 
view=View[192.168.149.10(gemfirecluster-sample-locator-0:1:locator)<ec><v0>:41000|-1]
 members: [192.168.68.28(gemfirecluster-sample-server-0:1)<v2>:41000{lead}, 
192.168.149.18(gemfirecluster-sample-server-1:1)<v2>:41000]; responses=[]

gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:19.005 GMT <main> 
tid=0x1] found possible coordinator 
192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000

gemfirecluster-sample-locator-1.log: [info 2020/11/17 12:22:19.005 GMT <main> 
tid=0x1] This member is becoming the membership coordinator with address 
192.168.149.63(gemfirecluster-sample-locator-1:1:locator)<ec>:41000

{noformat}
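
To make the comparison of the two "Discovery state" messages less error-prone than reading them by eye, here is a minimal sketch that splits each message into its key=value fields and reports the fields that differ. It is not part of Geode: the class name DiscoveryStateDiff and its methods are hypothetical, the parsing assumes the fields are separated by ';' as in the excerpt above, and the sample strings in main are shortened stand-ins (loc0, loc1, srv0, srv1) rather than real log output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a Geode class: compares two "Discovery state" log messages.
class DiscoveryStateDiff {

    // Assumption: the message is a list of key=value fields separated by ';'.
    static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Undo e-mail/log line wrapping by collapsing every whitespace run to one space.
        String flat = message.replaceAll("\\s+", " ").trim();
        for (String part : flat.split(";")) {
            String field = part.trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // no key=value pair in this fragment
            }
            // Drop any preamble before the key, e.g. "Discovery state ... is locatorsContacted".
            String key = field.substring(0, eq);
            key = key.substring(key.lastIndexOf(' ') + 1);
            fields.put(key, field.substring(eq + 1).trim());
        }
        return fields;
    }

    // Returns the names of the fields whose values differ, or that exist on only one side.
    static Set<String> differingKeys(Map<String, String> a, Map<String, String> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Set<String> differing = new TreeSet<>();
        for (String key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                differing.add(key);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Shortened, made-up stand-ins for the two messages; paste the real ones to compare them.
        String fromLocator0 = "Discovery state after looking for membership coordinator is "
            + "locatorsContacted=2; alreadyTried=[srv1, srv0, loc1]; registrants=[loc0, loc1]; "
            + "possibleCoordinator=loc0; viewId=-1";
        String fromLocator1 = "Discovery state after looking for membership coordinator is "
            + "locatorsContacted=2; alreadyTried=[srv1, srv0]; registrants=[loc0, loc1]; "
            + "possibleCoordinator=loc1; viewId=-1";

        Map<String, String> a = parse(fromLocator0);
        Map<String, String> b = parse(fromLocator1);
        for (String key : differingKeys(a, b)) {
            System.out.println(key + ": " + a.get(key) + " vs " + b.get(key));
        }
    }
}
```

Run against the two real messages, a diff like this should confirm whether registrants and view really are identical and show every field that is not, instead of only the one that stands out when reading the logs.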

> Split brain when locators exhaust join attempts on non-existent servers
> -----------------------------------------------------------------------
>
>                 Key: GEODE-8739
>                 URL: https://issues.apache.org/jira/browse/GEODE-8739
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Jason Huynh
>            Priority: Major
>         Attachments: exportedLogs_locator-0.zip, exportedLogs_locator-1.zip
>
>
> The hypothesis: "if there is a locator view .dat file with several 
> non-existent servers then the locators will waste all of their join attempts 
> on the servers instead of finding each other"
> The scenario is that a test or user attempts to recreate a cluster with 
> existing .dat and persistent files. The locators are spun up in parallel and, 
> from the analysis, it looks like they are able to communicate with each other 
> but then each ends up forming its own distributed system (ds).



