[ 
https://issues.apache.org/jira/browse/GEODE-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219145#comment-17219145
 ] 

Jinmei Liao commented on GEODE-8589:
------------------------------------

Same problem also happens on the secondMembershipPausesForLocatorWaitTime test:
{quote}
org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest 
> secondMembershipPausesForLocatorWaitTime FAILED
    org.awaitility.core.ConditionTimeoutException: Assertion condition defined 
as a lambda expression in 
org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest 
that uses org.apache.geode.distributed.internal.membership.api.Membership 
    Expected size:<2> but was:<1> in:
    <[172.17.0.29(1:locator)<ec><v0>:41003]> within 5 minutes.
        at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
        at 
org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
        at 
org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
        at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
        at 
org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
        at 
org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest.secondMembershipPausesForLocatorWaitTime(MembershipIntegrationTest.java:270)

        Caused by:
        java.lang.AssertionError: 
        Expected size:<2> but was:<1> in:
        <[172.17.0.29(1:locator)<ec><v0>:41003]>
            at 
org.apache.geode.distributed.internal.membership.gms.MembershipIntegrationTest.lambda$secondMembershipPausesForLocatorWaitTime$8(MembershipIntegrationTest.java:271)

{quote}


> make locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted test 
> deterministic
> -----------------------------------------------------------------------------------
>
>                 Key: GEODE-8589
>                 URL: https://issues.apache.org/jira/browse/GEODE-8589
>             Project: Geode
>          Issue Type: Improvement
>          Components: membership
>            Reporter: Bill Burcham
>            Priority: Major
>              Labels: membership, pull-request-available
>
> A recent commit added a new test: 
> {{MembershipIntegrationTest.locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}
> That's a good test. But it would be better if it ran faster and was less 
> susceptible to timing issues. The problem is that the logic we are trying to 
> test, {{GMSJoinLeave.leave()}} uses {{System.currentTimeMillis()}} to get the 
> current time and it uses {{Thread.sleep()}} to sleep:
> {code:java}
>         long now = System.currentTimeMillis();
> ...
>             if (state.joinedMembersContacted <= 0 && ((now >= 
> locatorGiveUpTime &&
>                 tries >= minimumRetriesBeforeBecomingCoordinator) ||
>                 state.locatorsContacted >= locators.size())) {
> ...
>             if (System.currentTimeMillis() > giveupTime) {
>               break;
>             }
> ...
>               Thread.sleep(retrySleep);
> ...
>               giveupTime = System.currentTimeMillis() + timeout;
> ...
>       if (!this.isJoined) {
>         logger.debug("giving up attempting to join the distributed system 
> after "
>             + (System.currentTimeMillis() - startTime) + "ms");
>       }
> ...
>       if (!this.isJoined && state.hasContactedAJoinedLocator) {
>         throw new MemberStartupException("Unable to join the distributed 
> system in "
>             + (System.currentTimeMillis() - startTime) + "ms");
>       }
> {code}
> The opportunity is to _inject_ objects that handle these two functions (time 
> keeping and sleeping). This will enable us to artifically control these in 
> our test! Sleeping doesn't have to take any time at all. And we can make time 
> pass as quickly as we want to.
> For an example of how this can be done, see [PR 
> #5422|https://github.com/apache/geode/pull/5422] for GEODE-6950.
> This particular test 
> ({{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}) is a 
> little bit more involved than that one in that more objects are involved. In 
> addition to {{GMSJoinLeave}}, other classes involved in the test also need 
> current time and need to sleep.
> When this ticket is complete:
> * functional interfaces {{MillisecondProvider}} and {{Sleeper}}, currently 
> defined inside {{PrimaryHandler}} will be moved higher in the (internal) 
> hierarchy for use by other membership classes
> * {{GMSJoinLeave}} will take a {{MillisecondProvider}} and {{Sleeper}} in its 
> constructor and will delegate to those instead of calling 
> {{System.currentTimeMillis()}} and {{Thread.sleep()}} directly
> * TBD other classes may require injection of {{MillisecondProvider}} and 
> {{Sleeper}}
> * TBD other changes may be necessary in cases where collaborating classes 
> currently spin up threads or synchronize between threads e.g. calling 
> {{wait()}}
> * {{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}} will no 
> longer employ {{Thread.sleep()}} nor will it call 
> {{CompletableFuture.get(long timeout, TimeUnit unit)}}—instead it will 
> operate deterministically (ideally in the same thread but not necessarily) 
> with respect to the unit under test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to