[ https://issues.apache.org/jira/browse/GEODE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751433#comment-16751433 ]
ASF subversion and git services commented on GEODE-6309:
--------------------------------------------------------
Commit 607026fc99b4b044df0352e8e7e5f1d373e57b92 in geode's branch
refs/heads/feature/GEODE-6309 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=607026f ]
GEODE-6309 ClusterConfigLocatorRestartDUnitTest fails to spin up a new server
This modifies auto-reconnect to lengthen the time a Locator will attempt
to join from 24 seconds to 60 seconds and to prevent the Locator from
creating its own cluster (which would cause a split-brain). During an
auto-reconnect attempt the location service does not start up until a
quorum of the old cluster can be contacted. That means some process
still in the cluster exists and should have taken over the role of
membership coordinator, so the Locator needs to join using that
coordinator and not create its own cluster.
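The join rule described above can be sketched roughly as follows. This is an illustration only and not the code from the commit: the class, the enum, and the method name are hypothetical, and the assumption that a freshly started Locator may still form a new cluster once its join window expires is a simplification.

```java
import java.time.Duration;

/** Hypothetical sketch; these names do not come from the Geode code base. */
final class ReconnectJoinPolicy {
  /** Join window after the change, as quoted in the commit message. */
  static final Duration JOIN_TIMEOUT = Duration.ofSeconds(60);

  enum Action { JOIN_EXISTING, KEEP_TRYING, FAIL_ATTEMPT, CREATE_CLUSTER }

  /**
   * Decides what a Locator does next while trying to join.
   *
   * @param reconnecting true if this is an auto-reconnect attempt
   * @param coordinatorFound true if a live membership coordinator responded
   * @param elapsed time spent trying to join so far
   */
  static Action decide(boolean reconnecting, boolean coordinatorFound, Duration elapsed) {
    if (coordinatorFound) {
      return Action.JOIN_EXISTING;
    }
    if (elapsed.compareTo(JOIN_TIMEOUT) < 0) {
      return Action.KEEP_TRYING;
    }
    // Assumption: only a fresh start may form a new cluster. A reconnecting
    // Locator never does, because part of the old cluster is known to be alive.
    return reconnecting ? Action.FAIL_ATTEMPT : Action.CREATE_CLUSTER;
  }
}
```

Under this sketch a reconnecting Locator that has waited 30 seconds without finding a coordinator keeps trying, whereas the previous 24-second window would already have expired.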
This also corrects the handling of the old membership view in
GMSLocator. The restarted location service was incorrectly using this
old view as an authority on who had the role of coordinator but it
should only be used as a hint. This is done by putting the view into
the recoveredView variable and assigning it an invalid viewID.
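As a rough illustration of "hint, not authority", the sketch below marks a recovered view with an invalid ID so that its coordinator field is never trusted, while its member list can still be used to decide whom to contact. The `View` class, the value `-1`, and the method names are assumptions made for this example and are not GMSLocator's actual types.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; not the GMSLocator implementation. */
final class RecoveredViewSketch {
  /** Assumed sentinel meaning "this view was recovered, not installed". */
  static final int INVALID_VIEW_ID = -1;

  static final class View {
    final int viewId;
    final String coordinator;
    final List<String> members;

    View(int viewId, String coordinator, List<String> members) {
      this.viewId = viewId;
      this.coordinator = coordinator;
      this.members = members;
    }

    boolean isAuthoritative() {
      return viewId != INVALID_VIEW_ID;
    }
  }

  /** Copies an old view and tags it as a hint by giving it the invalid ID. */
  static View asRecoveredHint(View old) {
    return new View(INVALID_VIEW_ID, old.coordinator, old.members);
  }

  /** Returns a coordinator only when the view is authoritative. */
  static Optional<String> trustedCoordinator(View view) {
    if (view != null && view.isAuthoritative()) {
      return Optional.of(view.coordinator);
    }
    return Optional.empty();
  }
}
```

With this shape, a restarted location service can still probe the members listed in the recovered view, but it has to learn the current coordinator from a live process.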
In real applications this bug isn't likely to be encountered because the
first auto-reconnect attempt doesn't take place for a minute. The
ClusterStartupRule modifies this default to start reconnecting in 5
seconds, which wasn't giving the cluster enough time to react to the
loss of the old Locator and assign a new membership coordinator.
With these changes the test passes even if the default is reduced to 1
second.
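For reference, a shortened reconnect delay might be configured as in the sketch below. It assumes the delay in question is the one controlled by Geode's `max-wait-time-reconnect` property (milliseconds, default 60000); the helper class and method are hypothetical and only build a plain `java.util.Properties` object.

```java
import java.util.Properties;

/** Hypothetical helper; only the property name is taken from Geode. */
final class ReconnectDelaySketch {
  // Assumption: this is the property the test rule lowers to 5 seconds.
  static final String MAX_WAIT_TIME_RECONNECT = "max-wait-time-reconnect";

  /** Builds member properties with the given reconnect delay in milliseconds. */
  static Properties withReconnectDelay(long millis) {
    Properties props = new Properties();
    props.setProperty(MAX_WAIT_TIME_RECONNECT, Long.toString(millis));
    return props;
  }
}
```

Calling `withReconnectDelay(5000L)` would correspond to the 5-second setting mentioned above, and `withReconnectDelay(1000L)` to the 1-second case.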
Finally, the test was incorrectly using internal APIs to detect whether
the Locator had successfully reconnected. I fixed some of that but
opened GEODE-6312 to track the problem that stopping the old Locator did
not actually stop its cluster configuration service.
> ClusterConfigLocatorRestartDUnitTest fails to spin up a new server
> ------------------------------------------------------------------
>
> Key: GEODE-6309
> URL: https://issues.apache.org/jira/browse/GEODE-6309
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Bruce Schuchardt
> Assignee: Bruce Schuchardt
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> One of this class's tests starts a locator and two servers, then
> force-disconnects the locator and one of the servers and waits for the
> locator to reconnect. After that it starts a third server and expects it to
> join the cluster, but this failed to happen in CI run 316:
> {noformat}
> > Task :geode-core:distributedTest
> org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest > serverRestartsAfterLocatorReconnects FAILED
> org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.test.dunit.rules.ClusterStartupRule$$Lambda$46/1297938526.call in VM 3 running on Host 74139c18c4e4 with 5 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:533)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:390)
> at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:239)
> at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:232)
> at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:218)
> at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartsAfterLocatorReconnects(ClusterConfigLocatorRestartDUnitTest.java:71)
> Caused by:
> org.apache.geode.SystemConnectException: Unable to join the distributed system in 60032ms
> {noformat}
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/316]
> SHA: 654dc3bac3e50e66f33385bdbc38c88750061aa9
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)