LdapTimeoutTest.java failed due to timeout on DeadServerNoTimeoutTest is incorrect

Pavel Rappo Tue, 10 Sep 2019 07:25:51 -0700

> On 6 Sep 2019, at 20:02, Martin Buchholz <[email protected]> wrote:
> 
> Martin's rules for test methods:
> - never have more than 100 millisecond total expected runtime
> - never have more than 12 millisecond blocking time for any single operation
> - yet be able to survive a one-time 5 second thread suspension at any time


No doubt these are reputable numbers however magic they seem. I expect these 
make sense in a highly focused code, which j.u.c might be an example of. LDAP, 
on the other hand, does everything but the kitchen sink. So much so that I 
cannot guarantee any of those metrics to hold.

The latter one is of particular interest to me. Could you maybe elaborate on 
what "a rare 5-second thread suspension" is? (Not that it will make me change 
the code, but out of pure curiosity.)

> adjusting wait time looks like an anti-pattern to me.  You can't stop a rare 
> 5-second thread suspension at line 387.
> 
>  385         final long waitTime = hiMillis -
>  386                 NANOSECONDS.toMillis(System.nanoTime() - startNanos); // 
> (2) adjust wait time
>  387 
>  388         try {
>  389             T r = task.get(waitTime, MILLISECONDS); // (3) wait for the 
> task to complete
> 

I agree it might make the code look more complicated, but it doesn't deserve 
the name "anti-pattern". Is it a bad solution? Should I expect it to backfire? 
The reason for using that is to make sure the code accommodates for a given 
timeout once, not many times.

Yes, I agree I can't stop a thread suspension at line 387 or anywhere else for 
that matter, but I shouldn't account for it either. Maybe just once -- in the 
value of the timeout.

Again, many thanks for looking at it!

Thoughts?

> On Fri, Sep 6, 2019 at 9:33 AM Pavel Rappo <[email protected]> wrote:
> > On 6 Sep 2019, at 15:59, Martin Buchholz <[email protected]> wrote:
> > 
> > I took another look at LdapTimeoutTest.java.
> 
> Many thanks!
> 
> > I was surprised to see RIGHT_MARGIN.  In jsr166 we succeed in having one 
> > timeout that is long enough to "never happen".  I'm still advocating the 10 
> > second value.
> > 
> > I was surprised to see LEFT_MARGIN.  We just fixed Thread.sleep, so we have 
> > no known problems with JDK methods returning early - you can trust timed 
> > get!
> > You start measuring, by calling nanoTime, before you start the activity you 
> > are measuring, so there should be no need for LEFT_MARGIN.
> 
> You raised many good points. Let me try to address them.
> 
> 1. RIGHT_MARGIN is not used for checking that the activity has stuck 
> infinitely (assertIncompletion). INFINITY_MILLIS is used for that. 
> RIGHT_MARGIN is used for checking that the activity takes some predefined 
> amount of time (roughly).
> 
> 2. As for the numeric value of INFINITY_MILLIS, the reason I chose 20 seconds 
> is twofold. Firstly, the code under test is subject to different timeouts. 
> Every timeout value should be distinctive. Otherwise, how would I 
> differentiate between them? For example, if I chose INFINITY_MILLIS to be 10 
> seconds how would I know if the code is stuck due to the read timeout of 10 
> seconds or the "infinite timeout"? Secondly, I must take into account slow 
> machines. Really slow virtual machines. Hence, minimal timeouts 
> (read/connect) have a magnitude of seconds and tens of seconds and the 
> "rightmost", infinite timeout, is 20 seconds.
> 
> 3. LEFT_MARGIN might no longer be needed due to the fact that no timed 
> methods return early (actually there is a comment about it inside those two 
> assert methods).
> 
> > We have some fresh thread-awaiting code here:
> > http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/tck/JSR166TestCase.java?view=markup#l1443
> 
> Interesting.
> 
> > Instead of communicating startTime from the test thread back to the main 
> > thread, I would do my loMillis checking in the test thread, and hiMillis 
> > checking in the main thread, like e.g. compare with a fresh test method 
> > testTimedOffer
> > http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/tck/ArrayBlockingQueueTest.java?view=markup#l394
> 
> Understood. However, that might be a matter of taste. I prefer to have all 
> the calculations and error handling in one place. Unless there's a good 
> reason I wouldn't change it.
> 
> > Timeouts should be adjusted via Utils.adjustTimeout
> 
> That makes perfect sense. I never knew this method existed. Thanks!
> 
> > On Fri, Sep 6, 2019 at 4:31 AM Pavel Rappo <[email protected]> wrote:
> > Martin, thanks for having a look at it.
> > 
> > I'd appreciate if you could have a look at the timeout measuring mechanics 
> > in assertCompletion/assertIncompletion specifically, maybe to spot 
> > something that is grossly inadequate.
> > 
> > I tried to accommodate some usual suspects of timeout measurements 
> > failures. I understand that since we're not working with real-time systems, 
> > my attempts to build bullet-proof measurement mechanics are futile.
> > 
> > -Pavel
> > 
> > > On 30 Aug 2019, at 18:19, Martin Buchholz <[email protected]> wrote:
> > > 
> > > Not really a review, but:
> > > 
> > > For many years we've been using 10 seconds (scaled by timeout factor) as 
> > > a duration long enough that a timeout is a real failure.
> > > Which is close to your own 20 seconds.  Of course, no value is surely 
> > > safe.
> > > 
> > > Probably, parallel testing infrastructure for timeouts should be a test 
> > > library method.  I do something similar in JSR166TestCase
> > > 
> > >     /**
> > >      * Runs all the given actions in parallel, failing if any fail.
> > >      * Useful for running multiple variants of tests that are
> > >      * necessarily individually slow because they must block.
> > >      */
> > >     void testInParallel(Action ... actions) {
> > >         ExecutorService pool = Executors.newCachedThreadPool();
> > >         try (PoolCleaner cleaner = cleaner(pool)) {
> > > 
> > > On Fri, Aug 30, 2019 at 6:23 AM Daniel Fuchs <[email protected]> 
> > > wrote:
> > > On 30/08/2019 13:54, Pavel Rappo wrote:
> > > > Updated,
> > > > 
> > > >      http://cr.openjdk.java.net/~prappo/8151678/webrev.01/
> > > > 
> > > 
> > > Changes look good!
> > > 
> > > best regards,
> > > 
> > > -- daniel
> > 
>

Re: RFR [14] 8151678: com/sun/jndi/ldap/LdapTimeoutTest.java failed due to timeout on DeadServerNoTimeoutTest is incorrect

Reply via email to