[ https://issues.apache.org/jira/browse/GEODE-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219145#comment-17219145 ]
Jinmei Liao commented on GEODE-8589: ------------------------------------ Same problem also happens on the secondMembershipPausesForLocatorWaitTime test: {quote} org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest > secondMembershipPausesForLocatorWaitTime FAILED org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a lambda expression in org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest that uses org.apache.geode.distributed.internal.membership.api.Membership Expected size:<2> but was:<1> in: <[172.17.0.29(1:locator)<ec><v0>:41003]> within 5 minutes. at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119) at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31) at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679) at org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest.secondMembershipPausesForLocatorWaitTime(MembershipIntegrationTest.java:270) Caused by: java.lang.AssertionError: Expected size:<2> but was:<1> in: <[172.17.0.29(1:locator)<ec><v0>:41003]> at org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest.lambda$secondMembershipPausesForLocatorWaitTime$8(MembershipIntegrationTest.java:271) {quote} > make locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted test > deterministic > ----------------------------------------------------------------------------------- > > Key: GEODE-8589 > URL: https://issues.apache.org/jira/browse/GEODE-8589 > Project: Geode > Issue Type: Improvement > Components: membership > Reporter: Bill Burcham > Priority: Major > Labels: membership, pull-request-available > > A recent commit added a new test: > {{MembershipIntegrationTest.locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}} > That's a good test. But it would be better if it ran faster and was less > susceptible to timing issues. The problem is that the logic we are trying to > test, {{GMSJoinLeave.leave()}} uses {{System.currentTimeMillis()}} to get the > current time and it uses {{Thread.sleep()}} to sleep: > {code:java} > long now = System.currentTimeMillis(); > ... > if (state.joinedMembersContacted <= 0 && ((now >= > locatorGiveUpTime && > tries >= minimumRetriesBeforeBecomingCoordinator) || > state.locatorsContacted >= locators.size())) { > ... > if (System.currentTimeMillis() > giveupTime) { > break; > } > ... > Thread.sleep(retrySleep); > ... > giveupTime = System.currentTimeMillis() + timeout; > ... > if (!this.isJoined) { > logger.debug("giving up attempting to join the distributed system > after " > + (System.currentTimeMillis() - startTime) + "ms"); > } > ... > if (!this.isJoined && state.hasContactedAJoinedLocator) { > throw new MemberStartupException("Unable to join the distributed > system in " > + (System.currentTimeMillis() - startTime) + "ms"); > } > {code} > The opportunity is to _inject_ objects that handle these two functions (time > keeping and sleeping). This will enable us to artifically control these in > our test! Sleeping doesn't have to take any time at all. And we can make time > pass as quickly as we want to. > For an example of how this can be done, see [PR > #5422|https://github.com/apache/geode/pull/5422] for GEODE-6950. > This particular test > ({{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}) is a > little bit more involved than that one in that more objects are involved. In > addition to {{GMSJoinLeave}}, other classes involved in the test also need > current time and need to sleep. > When this ticket is complete: > * functional interfaces {{MillisecondProvider}} and {{Sleeper}}, currently > defined inside {{PrimaryHandler}} will be moved higher in the (internal) > hierarchy for use by other membership classes > * {{GMSJoinLeave}} will take a {{MillisecondProvider}} and {{Sleeper}} in its > constructor and will delegate to those instead of calling > {{System.currentTimeMillis()}} and {{Thread.sleep()}} directly > * TBD other classes may require injection of {{MillisecondProvider}} and > {{Sleeper}} > * TBD other changes may be necessary in cases where collaborating classes > currently spin up threads or synchronize between threads e.g. calling > {{wait()}} > * {{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}} will no > longer employ {{Thread.sleep()}} nor will it call > {{CompletableFuture.get(long timeout, TimeUnit unit)}}—instead it will > operate deterministically (ideally in the same thread but not necessarily) > with respect to the unit under test. -- This message was sent by Atlassian Jira (v8.3.4#803005)