[
https://issues.apache.org/jira/browse/GEODE-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494855#comment-17494855
]
Barrett Oglesby edited comment on GEODE-9910 at 2/19/22, 12:42 AM:
-------------------------------------------------------------------
Here is some analysis of this issue.
h3. Server Addresses
node 1:
membership: 10.196.55.141(15661)<ec><v0>:42000
locator: 10.196.55.141:10335
node 2:
membership: 10.196.55.142(19002)<ec><v1>:42000
locator: 10.196.55.142:10335
h3. Node2 Initial Disconnect
node2 lost connectivity with node1 and removed it:
{noformat}
2021-11-28 04:03:45,084 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure
Detection thread 9] [] Availability check failed for member
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:03:45,084 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure
Detection thread 9] [] Requesting removal of suspect member
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:03:45,085 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure
Detection thread 9] [] This member is becoming the membership coordinator with
address 10.196.55.142(19002)<ec><v1>:42000
{noformat}
It then realized that quorum had been lost (node1 was coordinator with
weight=15; node2 was not coordinator with weight=10):
{noformat}
2021-11-28 04:03:45,091 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] View Creator thread is starting
2021-11-28 04:03:45,091 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] 10.196.55.141(15661)<ec><v0>:42000 had a weight
of 15
2021-11-28 04:03:45,092 WARN
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] total weight lost in this view change is 15 of 25.
Quorum has been lost!
2021-11-28 04:03:45,092 FATAL
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] Possible loss of quorum due to the loss of 1 cache
processes: [10.196.55.141(15661)<ec><v0>:42000]
{noformat}
And disconnected itself from the distributed system:
{noformat}
2021-11-28 04:03:46,093 FATAL
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] Membership service failure: Exiting due to possible
network partition event due to loss of 1 cache processes:
[10.196.55.141(15661)<ec><v0>:42000]
org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
Exiting due to possible network partition event due to loss of 1 cache
processes: [10.196.55.141(15661)<ec><v0>:42000]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1787)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.access$1300(GMSJoinLeave.java:80)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.prepareAndSendView(GMSJoinLeave.java:2588)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.sendInitialView(GMSJoinLeave.java:2204)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2286)
[geode-membership-1.14.0.jar:?]
{noformat}
It stopped its locator:
{noformat}
2021-11-28 04:03:46,794 INFO
[org.apache.geode.distributed.internal.InternalLocator]-[ReconnectThread] []
Distribution Locator on
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142 is
stopped{noformat}
h3. Node2 Reconnect Attempt 1
{noformat}
2021-11-28 04:04:46,800 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Attempting to reconnect to the distributed system. This is attempt #1.
{noformat}
The first reconnect attempt failed to get quorum (it needed a weight of 13 but
is 10):
{noformat}
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] performing a quorum check to see if location services can be started early
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] beginning quorum check with GMSQuorumChecker on view
View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}, 10.196.55.142(19002)<ec><v1>:42000]
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: sending request to 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: sending request to 10.196.55.142(19002)<ec><v1>:42000
2021-11-28 04:04:46,812 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] received ping-pong response from /10.196.55.142<v1>:42000
2021-11-28 04:04:46,812 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: mapped address to member ID 10.196.55.142(19002)<ec><v1>:42000
2021-11-28 04:04:46,812 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: waiting up to 15000ms to receive a quorum of responses
...
2021-11-28 04:05:01,821 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: timeout waiting for responses. 1 responses received
2021-11-28 04:05:01,822 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: contacted 1 processes with 10 member weight units. Threshold
for a quorum is 13
{noformat}
Since quorum failed, the locator did not restart:
{noformat}
2021-11-28 04:05:01,822 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] quorum check failed - not allowing location services to start early
{noformat}
It then was able to send a *FindCoordinatorRequest* and receive a
*FindCoordinatorResponse* from 10.196.55.141:10335 (node1's locator). It then
tries to join the distributed system which fails. It does this 4 times, but
isn't able to join the distributed system:
{noformat}
2021-11-28 04:05:02,078 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:06,946 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=1, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:06,948 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:10,015 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:10,015 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0; alreadyTried=[]; registrants=[];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=1;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}]; responses=[]
2021-11-28 04:05:10,016 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
*2021-11-28 04:05:10,016 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Attempting to join the distributed system through coordinator
10.196.55.141(15661)<ec><v0>:42000 using address 10.196.55.142(19002)<ec>:42000
2021-11-28 04:05:22,017 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:23,050 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:23,052 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:26,111 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:26,111 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0;
alreadyTried=[10.196.55.141(15661)<ec><v0>:42000]; registrants=[];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:05:26,113 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:26,113 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:05:38,113 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:39,145 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:39,147 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:42,207 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:42,207 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0;
alreadyTried=[10.196.55.141(15661)<ec><v0>:42000];
registrants=[10.196.55.142(19002)<ec>:42000];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:05:42,209 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:42,209 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:05:54,210 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:55,242 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:55,244 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:58,303 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:58,303 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0;
alreadyTried=[10.196.55.141(15661)<ec><v0>:42000];
registrants=[10.196.55.142(19002)<ec>:42000];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:05:58,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:58,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
{noformat}
* This message is coming from {*}GMSJoinLeave.attemptToJoin{*}. This fails
since the ReconnectThread continues on.
node1-vm.log contains these messages from
*GMSLocator.processFindCoordinatorRequest* corresponding to the
*FindCoordinatorRequests* received from node2:
{noformat}
2021-11-28 04:05:06,898 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from view is
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:23,045 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from view is
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:39,141 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from registrations is
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:55,238 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from registrations is
10.196.55.141(15661)<ec><v0>:42000
{noformat}
Since, node2 was unable to join the distributed system, it gave up:
{noformat}
2021-11-28 04:06:10,309 WARN
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Caught SystemConnectException in reconnect
org.apache.geode.SystemConnectException: Problem starting up membership services
at
org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:186)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:464)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:497)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1811)
~[geode-membership-1.14.0.jar:?]
Caused by:
org.apache.geode.distributed.internal.membership.api.MemberStartupException:
Unable to join the distributed system in 68228ms
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.join(GMSJoinLeave.java:411)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.join(GMSMembership.java:533)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.access$1200(GMSMembership.java:72)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.joinDistributedSystem(GMSMembership.java:1752)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:242)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1642)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
~[geode-core-1.14.0.jar:?]
{noformat}
Node2 Reconnect Attempt 2
{noformat}
2021-11-28 04:07:10,310 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Attempting to reconnect to the distributed system. This is attempt
#2.{noformat}
The second reconnect attempt succeeded in getting quorum (this means node2 was
able to send a ping message to node1 and receive a pong message back:
{noformat}
2021-11-28 04:07:10,319 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] performing a quorum check to see if location services can be started early
2021-11-28 04:07:10,319 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] beginning quorum check with GMSQuorumChecker on view
View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}, 10.196.55.142(19002)<ec><v1>:42000]
2021-11-28 04:07:10,319 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: sending request to 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:07:10,320 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: waiting up to 15000ms to receive a quorum of responses
2021-11-28 04:07:10,320 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[unicast
receiver,vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2-26663] []
received ping-pong response from /10.196.55.141<v0>:42000
2021-11-28 04:07:10,321 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[unicast
receiver,vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2-26663] []
quorum check: mapped address to member ID 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:07:10,820 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: received responses from all members that were in the old
distributed system
{noformat}
It then started the locator:
{noformat}
2021-11-28 04:07:10,820 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Quorum check passed - allowing location services to start early
2021-11-28 04:07:10,879 INFO
[org.apache.geode.distributed.internal.InternalLocator]-[ReconnectThread] []
Starting peer location for Distribution Locator on
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142
2021-11-28 04:07:10,879 INFO
[org.apache.geode.distributed.internal.tcpserver.TcpServer]-[ReconnectThread]
[] Locator was created at Sun Nov 28 04:07:10 UTC 2021
2021-11-28 04:07:10,879 INFO
[org.apache.geode.distributed.internal.tcpserver.TcpServer]-[ReconnectThread]
[] Listening on port 10335 bound on address
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142
{noformat}
And recovered state from node1's locator (this means node2 was able to send a
*GetViewRequest* to node1 and receive a *GetViewResponse* back):
{noformat}
2021-11-28 04:07:10,880 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] GemFire peer location service starting. Other locators:
10.196.55.142[10335],10.196.55.141[10335],10.196.55.133[10335] Locators
preferred as coordinators: true Network partition detection enabled: true
View persistence file: /locator10335view.dat
2021-11-28 04:07:10,898 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] Peer locator attempting to recover from HostAndPort[/10.196.55.141:10335]
2021-11-28 04:07:10,965 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] Peer locator recovered initial membership of
View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]
2021-11-28 04:07:10,966 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] Peer locator recovered state from HostAndPort[/10.196.55.141:10335]
{noformat}
It then had the same behavior with joining the distributed system as reconnect
attempt 1:
{noformat}
2021-11-28 04:07:11,153 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=-100, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.142(19002)<ec>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|-100] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.142:10335]
2021-11-28 04:07:11,224 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000,
10.196.55.142(19002)<ec>:42000], senderId=10.196.55.141(15661)<ec><v0>:42000,
network partition detection enabled=true, locators preferred as
coordinators=true, view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:07:11,226 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:07:14,303 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:07:14,304 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=2; findInViewResponses=0; alreadyTried=[]; registrants=[];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:07:14,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:07:14,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Attempting to join the distributed system through coordinator
10.196.55.141(15661)<ec><v0>:42000 using address 10.196.55.142(19002)<ec>:42000
2021-11-28 04:07:29,409 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:07:44,516 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:07:59,617 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
{noformat}
Since, it was unable to join the distributed system, it gave up:
{noformat}
2021-11-28 04:08:11,622 WARN
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Caught SystemConnectException in reconnect
{noformat}
*However, the locator was not stopped.*
h3. Node2 Reconnect Attempt 3
{noformat}
2021-11-28 04:09:11,623 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Attempting to reconnect to the distributed system. This is attempt #3.
{noformat}
The third reconnect attempt also succeeded in getting quorum:
{noformat}
2021-11-28 04:09:11,631 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] performing a quorum check to see if location services can be started early
2021-11-28 04:09:11,631 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Quorum check passed - allowing location services to start early
{noformat}
Then, the exception occurred trying to start the locator since it was still
running from reconnect attempt 2:
{noformat}
2021-11-28 04:09:11,632 WARN
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Exception occurred while trying to connect the system during reconnect
java.lang.IllegalStateException: A locator can not be created because one
already exists in this JVM.
at
org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
~[geode-core-1.14.0.jar:?]
{noformat}
I'm not sure why the servers were able to exchange some messages (e.g.
{*}FindCoordinatorRequest/Response{*}, {*}GetViewRequest/Response{*}) but not
others (e.g. {*}JoinRequestMessage/JoinResponseMessage{*}), but that could be
just the state of the servers at the time.
In any event, if a reconnect attempt fails, any started locator should be
stopped.
was (Author: barry.oglesby):
Here is some analysis of this issue.
h3. Server Addresses
node 1:
membership: 10.196.55.141(15661)<ec><v0>:42000
locator: 10.196.55.141:10335
node 2:
membership: 10.196.55.142(19002)<ec><v1>:42000
locator: 10.196.55.142:10335
h3. Node2 Initial Disconnect
node2 lost connectivity with node1 and removed it:
{noformat}
2021-11-28 04:03:45,084 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure
Detection thread 9] [] Availability check failed for member
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:03:45,084 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure
Detection thread 9] [] Requesting removal of suspect member
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:03:45,085 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode Failure
Detection thread 9] [] This member is becoming the membership coordinator with
address 10.196.55.142(19002)<ec><v1>:42000
{noformat}
It then realized that quorum had been lost (node1 was coordinator with
weight=15; node2 was not coordinator with weight=10):
{noformat}
2021-11-28 04:03:45,091 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] View Creator thread is starting
2021-11-28 04:03:45,091 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] 10.196.55.141(15661)<ec><v0>:42000 had a weight
of 15
2021-11-28 04:03:45,092 WARN
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] total weight lost in this view change is 15 of 25.
Quorum has been lost!
2021-11-28 04:03:45,092 FATAL
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] Possible loss of quorum due to the loss of 1 cache
processes: [10.196.55.141(15661)<ec><v0>:42000]
{noformat}
And disconnected itself from the distributed system:
{noformat}
2021-11-28 04:03:46,093 FATAL
[org.apache.geode.distributed.internal.membership.gms.Services]-[Geode
Membership View Creator] [] Membership service failure: Exiting due to possible
network partition event due to loss of 1 cache processes:
[10.196.55.141(15661)<ec><v0>:42000]
org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
Exiting due to possible network partition event due to loss of 1 cache
processes: [10.196.55.141(15661)<ec><v0>:42000]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1787)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.access$1300(GMSJoinLeave.java:80)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.prepareAndSendView(GMSJoinLeave.java:2588)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.sendInitialView(GMSJoinLeave.java:2204)
[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2286)
[geode-membership-1.14.0.jar:?]
{noformat}
It stopped its locator:
{noformat}
2021-11-28 04:03:46,794 INFO
[org.apache.geode.distributed.internal.InternalLocator]-[ReconnectThread] []
Distribution Locator on
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142 is
stopped{noformat}
h3. Node2 Reconnect Attempt 1
{noformat}
2021-11-28 04:04:46,800 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Attempting to reconnect to the distributed system. This is attempt #1.
{noformat}
The first reconnect attempt failed to get quorum (it needed a weight of 13 but
is 10):
{noformat}
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] performing a quorum check to see if location services can be started early
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] beginning quorum check with GMSQuorumChecker on view
View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}, 10.196.55.142(19002)<ec><v1>:42000]
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: sending request to 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:04:46,810 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: sending request to 10.196.55.142(19002)<ec><v1>:42000
2021-11-28 04:04:46,812 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] received ping-pong response from /10.196.55.142<v1>:42000
2021-11-28 04:04:46,812 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: mapped address to member ID 10.196.55.142(19002)<ec><v1>:42000
2021-11-28 04:04:46,812 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: waiting up to 15000ms to receive a quorum of responses
...
2021-11-28 04:05:01,821 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: timeout waiting for responses. 1 responses received
2021-11-28 04:05:01,822 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: contacted 1 processes with 10 member weight units. Threshold
for a quorum is 13
{noformat}
Since quorum failed, the locator did not restart:
{noformat}
2021-11-28 04:05:01,822 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] quorum check failed - not allowing location services to start early
{noformat}
It then was able to send a *FindCoordinatorRequest* and receive a
*FindCoordinatorResponse* from 10.196.55.141:10335 (node1's locator). It then
tries to join the distributed system which fails. It does this 4 times, but
isn't able to join the distributed system:
{noformat}
2021-11-28 04:05:02,078 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:06,946 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=1, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:06,948 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:10,015 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:10,015 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0; alreadyTried=[]; registrants=[];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=1;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}]; responses=[]
2021-11-28 04:05:10,016 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
*2021-11-28 04:05:10,016 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Attempting to join the distributed system through coordinator
10.196.55.141(15661)<ec><v0>:42000 using address 10.196.55.142(19002)<ec>:42000
2021-11-28 04:05:22,017 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:23,050 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:23,052 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:26,111 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:26,111 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0;
alreadyTried=[10.196.55.141(15661)<ec><v0>:42000]; registrants=[];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:05:26,113 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:26,113 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:05:38,113 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:39,145 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:39,147 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:42,207 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:42,207 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0;
alreadyTried=[10.196.55.141(15661)<ec><v0>:42000];
registrants=[10.196.55.142(19002)<ec>:42000];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:05:42,209 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:42,209 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:05:54,210 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Unable to contact locator HostAndPort[/10.196.55.142:10335]:
java.net.ConnectException: Connection refused (Connection refused)
2021-11-28 04:05:55,242 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.141(15661)<ec><v0>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:05:55,244 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:05:58,303 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:05:58,303 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=1; findInViewResponses=0;
alreadyTried=[10.196.55.141(15661)<ec><v0>:42000];
registrants=[10.196.55.142(19002)<ec>:42000];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:05:58,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:58,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
{noformat}
* This message is coming from {*}GMSJoinLeave.attemptToJoin{*}. This fails
since the ReconnectThread continues on.
node1-vm.log contains these messages from
*GMSLocator.processFindCoordinatorRequest* corresponding to the
*FindCoordinatorRequests* received from node2:
{noformat}
2021-11-28 04:05:06,898 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from view is
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:23,045 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from view is
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:39,141 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from registrations is
10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:05:55,238 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[locator
request thread 1] [] Peer locator: coordinator from registrations is
10.196.55.141(15661)<ec><v0>:42000
{noformat}
Since, node2 was unable to join the distributed system, it gave up:
{noformat}
2021-11-28 04:06:10,309 WARN
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Caught SystemConnectException in reconnect
org.apache.geode.SystemConnectException: Problem starting up membership services
at
org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:186)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:464)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:497)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1811)
~[geode-membership-1.14.0.jar:?]
Caused by:
org.apache.geode.distributed.internal.membership.api.MemberStartupException:
Unable to join the distributed system in 68228ms
at
org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.join(GMSJoinLeave.java:411)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.join(GMSMembership.java:533)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.access$1200(GMSMembership.java:72)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.joinDistributedSystem(GMSMembership.java:1752)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:242)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.membership.gms.GMSMembership.start(GMSMembership.java:1642)
~[geode-membership-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:171)
~[geode-core-1.14.0.jar:?]
{noformat}
Node2 Reconnect Attempt 2
{noformat}
2021-11-28 04:07:10,310 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Attempting to reconnect to the distributed system. This is attempt
#2.{noformat}
The second reconnect attempt succeeded in getting quorum (this means node2 was
able to send a ping message to node1 and receive a pong message back:
{noformat}
2021-11-28 04:07:10,319 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] performing a quorum check to see if location services can be started early
2021-11-28 04:07:10,319 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] beginning quorum check with GMSQuorumChecker on view
View[10.196.55.141(15661)<ec><v0>:42000|1] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}, 10.196.55.142(19002)<ec><v1>:42000]
2021-11-28 04:07:10,319 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: sending request to 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:07:10,320 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: waiting up to 15000ms to receive a quorum of responses
2021-11-28 04:07:10,320 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[unicast
receiver,vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2-26663] []
received ping-pong response from /10.196.55.141<v0>:42000
2021-11-28 04:07:10,321 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[unicast
receiver,vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2-26663] []
quorum check: mapped address to member ID 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:07:10,820 INFO
[org.apache.geode.distributed.internal.membership.gms.messenger.GMSQuorumChecker]-[ReconnectThread]
[] quorum check: received responses from all members that were in the old
distributed system
{noformat}
It then started the locator:
{noformat}
2021-11-28 04:07:10,820 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Quorum check passed - allowing location services to start early
2021-11-28 04:07:10,879 INFO
[org.apache.geode.distributed.internal.InternalLocator]-[ReconnectThread] []
Starting peer location for Distribution Locator on
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142
2021-11-28 04:07:10,879 INFO
[org.apache.geode.distributed.internal.tcpserver.TcpServer]-[ReconnectThread]
[] Locator was created at Sun Nov 28 04:07:10 UTC 2021
2021-11-28 04:07:10,879 INFO
[org.apache.geode.distributed.internal.tcpserver.TcpServer]-[ReconnectThread]
[] Listening on port 10335 bound on address
vmw-hcs-248e71fd-dd76-4111-ba82-379151aabbb7-3000-1-node-2/10.196.55.142
{noformat}
And recovered state from node1's locator (this means node2 was able to send a
*GetViewRequest* to node1 and receive a *GetViewResponse* back):
{noformat}
2021-11-28 04:07:10,880 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] GemFire peer location service starting. Other locators:
10.196.55.142[10335],10.196.55.141[10335],10.196.55.133[10335] Locators
preferred as coordinators: true Network partition detection enabled: true
View persistence file: /locator10335view.dat
2021-11-28 04:07:10,898 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] Peer locator attempting to recover from HostAndPort[/10.196.55.141:10335]
2021-11-28 04:07:10,965 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] Peer locator recovered initial membership of
View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]
2021-11-28 04:07:10,966 INFO
[org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator]-[ReconnectThread]
[] Peer locator recovered state from HostAndPort[/10.196.55.141:10335]
{noformat}
It then had the same behavior with joining the distributed system as reconnect
attempt 1:
{noformat}
2021-11-28 04:07:11,153 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=-100, registrants=[10.196.55.142(19002)<ec>:42000],
senderId=10.196.55.142(19002)<ec>:42000, network partition detection
enabled=true, locators preferred as coordinators=true,
view=View[10.196.55.141(15661)<ec><v0>:42000|-100] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.142:10335]
2021-11-28 04:07:11,224 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] received
FindCoordinatorResponse(coordinator=10.196.55.141(15661)<ec><v0>:42000,
fromView=true, viewId=2, registrants=[10.196.55.142(19002)<ec>:42000,
10.196.55.142(19002)<ec>:42000], senderId=10.196.55.141(15661)<ec><v0>:42000,
network partition detection enabled=true, locators preferred as
coordinators=true, view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]) from locator
HostAndPort[/10.196.55.141:10335]
2021-11-28 04:07:11,226 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Locator's address indicates it is part of a distributed system so I will
not become membership coordinator on this attempt to join
2021-11-28 04:07:14,303 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] findCoordinator chose 10.196.55.141(15661)<ec><v0>:42000 out of these
possible coordinators: [10.196.55.141(15661)<ec><v0>:42000]
2021-11-28 04:07:14,304 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Discovery state after looking for membership coordinator is
locatorsContacted=2; findInViewResponses=0; alreadyTried=[]; registrants=[];
possibleCoordinator=10.196.55.141(15661)<ec><v0>:42000; viewId=2;
hasContactedAJoinedLocator=true;
view=View[10.196.55.141(15661)<ec><v0>:42000|2] members:
[10.196.55.141(15661)<ec><v0>:42000{lead}] crashed:
[10.196.55.142(19002)<ec><v1>:42000]; responses=[]
2021-11-28 04:07:14,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] found possible coordinator 10.196.55.141(15661)<ec><v0>:42000
2021-11-28 04:07:14,305 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Attempting to join the distributed system through coordinator
10.196.55.141(15661)<ec><v0>:42000 using address 10.196.55.142(19002)<ec>:42000
2021-11-28 04:07:29,409 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:07:44,516 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
2021-11-28 04:07:59,617 INFO
[org.apache.geode.distributed.internal.membership.gms.Services]-[ReconnectThread]
[] Probable coordinator is still 10.196.55.141(15661)<ec><v0>:42000 - waiting
for a join-response
{noformat}
Since, it was unable to join the distributed system, it gave up:
{noformat}
2021-11-28 04:08:11,622 WARN
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Caught SystemConnectException in reconnect
{noformat}
*However, the locator was not stopped.*
h3. Node2 Reconnect Attempt 3
{noformat}
2021-11-28 04:09:11,623 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Attempting to reconnect to the distributed system. This is attempt #3.
{noformat}
The third reconnect attempt also succeeded in getting quorum:
{noformat}
2021-11-28 04:09:11,631 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] performing a quorum check to see if location services can be started early
2021-11-28 04:09:11,631 INFO
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Quorum check passed - allowing location services to start early
{noformat}
Then, the exception occurred trying to start the locator since it was still
running from reconnect attempt 2:
{noformat}
2021-11-28 04:09:11,632 WARN
[org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
[] Exception occurred while trying to connect the system during reconnect
java.lang.IllegalStateException: A locator can not be created because one
already exists in this JVM.
at
org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
~[geode-core-1.14.0.jar:?]
at
org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
~[geode-core-1.14.0.jar:?]
{noformat}
I'm not sure why the servers were able to exchange some messages (e.g.
{*}FindCoordinatorRequest/Response{*}, {*}GetViewRequest/Response{*}) but not
others (e.g. {*}JoinRequestMessage/JoinResponseMessage{*}), but that could be
just the state of the servers at the time.
In any event, if a reconnect attempt fails, any started locator should be
stopped.
> Failure to auto-reconnect upon network partition
> ------------------------------------------------
>
> Key: GEODE-9910
> URL: https://issues.apache.org/jira/browse/GEODE-9910
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.14.0
> Reporter: Surya Mudundi
> Assignee: Barrett Oglesby
> Priority: Major
> Labels: GeodeOperationAPI, blocks-1.15.0, needsTriage
> Attachments: geode-logs.zip
>
>
> Two node cluster with embedded locators failed to auto-reconnect when node-1
> experienced network outage for couple of minutes and when node-1 recovered
> from the outage, node-2 failed to auto-reconnect.
> node-2 tried to re-connect to node-1 as:
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
> [] Attempting to reconnect to the distributed system. This is attempt #1.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
> [] Attempting to reconnect to the distributed system. This is attempt #2.
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
> [] Attempting to reconnect to the distributed system. This is attempt #3.
> Finally reported below error after 3 attempts as:
> INFO
> [org.apache.geode.logging.internal.LoggingProviderLoader]-[ReconnectThread]
> [] Using org.apache.geode.logging.internal.SimpleLoggingProvider for service
> org.apache.geode.logging.internal.spi.LoggingProvider
> INFO [org.apache.geode.internal.InternalDataSerializer]-[ReconnectThread] []
> initializing InternalDataSerializer with 0 services
> INFO
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
> [] performing a quorum check to see if location services can be started early
> INFO
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
> [] Quorum check passed - allowing location services to start early
> WARN
> [org.apache.geode.distributed.internal.InternalDistributedSystem]-[ReconnectThread]
> [] Exception occurred while trying to connect the system during reconnect
> java.lang.IllegalStateException: A locator can not be created because one
> already exists in this JVM.
> at
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:298)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalLocator.createLocator(InternalLocator.java:273)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.startInitLocator(InternalDistributedSystem.java:916)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:768)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3034)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2605)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2326)
> ~[geode-core-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1187)
> ~[geode-membership-1.14.0.jar:?]
> at
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:1811)
> ~[geode-membership-1.14.0.jar:?]
> at java.lang.Thread.run(Thread.java:829) [?:?]
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)