On Tue, Sep 10, 2019 at 7:25 AM Pavel Rappo <pavel.ra...@oracle.com> wrote:
> > On 6 Sep 2019, at 20:02, Martin Buchholz <marti...@google.com> wrote:
> >
> > Martin's rules for test methods:
> > - never have more than 100 millisecond total expected runtime
> > - never have more than 12 millisecond blocking time for any single operation
> > - yet be able to survive a one-time 5 second thread suspension at any time
>
> No doubt these are reputable numbers however magic they seem. I expect
> these make sense in a highly focused code, which j.u.c might be an example
> of. LDAP, on the other hand, does everything but the kitchen sink. So much
> so that I cannot guarantee any of those metrics to hold.
>
> The latter one is of particular interest to me. Could you maybe elaborate
> on what "a rare 5-second thread suspension" is? (Not that it will make me
> change the code, but out of pure curiosity.)
>

Suppose you are waiting for some other thread to complete some trivial
observable operation, like counting down a latch. At some point you want to
time out and report failure. How long before spurious failures disappear
entirely? We used to use 250ms and see rare failures - too small! After
switching to 10 seconds (scaled by timeout factor) spurious failures due to
thread suspension have disappeared in practice. The longest thread
suspension I ever actually observed was 4 seconds. Of course, no
guarantees - higher priority processes can always hog the CPU.

If you are waiting for a less trivial operation, you may need a higher
value than our magic 10 seconds.

An alternative is to wait "forever" and rely on jtreg's own timeout handler
to kick in and fail the test for you.

    public void await(Semaphore semaphore) {
        boolean timedOut = false;
        try {
            timedOut = !semaphore.tryAcquire(LONG_DELAY_MS, MILLISECONDS);
        } catch (Throwable fail) {
            threadUnexpectedException(fail);
        }
        if (timedOut)
            fail("timed out waiting for Semaphore for "
                 + (LONG_DELAY_MS/1000) + " sec");
    }
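
For a test that does not have the helper fields used above, the same pattern can be sketched standalone for the latch case. This is only an illustration under stated assumptions: the class name ScaledAwait, the constant BASE_TIMEOUT_MS and the method names are hypothetical, and it assumes the harness exposes its timeout factor through the test.timeout.factor system property (falling back to 1 when the property is absent or malformed).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; all names here are illustrative. */
final class ScaledAwait {
    /** Base timeout: the 10 seconds discussed above, before scaling. */
    static final long BASE_TIMEOUT_MS = 10_000;

    /**
     * Reads the timeout factor. Assumption: the harness passes it as the
     * "test.timeout.factor" system property; otherwise 1.0 is used.
     */
    static double timeoutFactor() {
        String s = System.getProperty("test.timeout.factor");
        if (s == null) return 1.0;
        try {
            double f = Double.parseDouble(s);
            return (f > 0) ? f : 1.0;   // also rejects NaN, zero and negatives
        } catch (NumberFormatException e) {
            return 1.0;
        }
    }

    /** Base timeout multiplied by the timeout factor, in milliseconds. */
    static long scaledTimeoutMillis() {
        return Math.round(BASE_TIMEOUT_MS * timeoutFactor());
    }

    /** Waits for the latch to reach zero; throws AssertionError on timeout. */
    static void await(CountDownLatch latch) {
        long timeoutMs = scaledTimeoutMillis();
        boolean done;
        try {
            done = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve interrupt status
            throw new AssertionError("interrupted while waiting for latch", e);
        }
        if (!done)
            throw new AssertionError(
                "timed out waiting for latch after " + timeoutMs + " ms");
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(1);
        new Thread(latch::countDown).start();   // the "trivial operation"
        await(latch);                           // returns once counted down
        System.out.println("latch released");
    }
}
```

The generous timeout costs nothing in the passing case, because await returns as soon as the latch is counted down; the full 10 seconds is only spent when something is actually broken.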