Milan, Thanks for looking into this. I think you should ask a question on the expected timing semantics and guarantees on net-dev (with maybe a cc to nio-dev). As for our test. I agree with you that we should simply work a possibility of early returns into the check.
... /* The acceptable variation of early returns from timed socket operations. */ private static final long PREMATURE_RETURN = adjustTimeout(100); ... long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos); if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) { String msg = String.format( "elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s", timeout, timeout, TOLERANCE, PREMATURE_RETURN); throw new RuntimeException(msg); } ... Thoughts? -Pavel > On 24 Sep 2019, at 09:12, Milan Mimica <milan.mim...@gmail.com> wrote: > > Hi Pavel > > Wow, I find this awesome. I don't have a Windows machine to play with, > but I think I may have found something. > The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix > it uses poll(2), on Windows it uses select(2). Regarding timeouts, > poll() has "wait at least" semantics and overruns by design[1], while > select() on windows has "waits at most" semantics, or how they put > it[2]: "specifies the maximum time that select should wait before > returning.". It returns early by design! Is this a known thing? I > don't think there is much one can do here. It probably makes no sense > to loop it and wait for time remainder. > Java's soTimeout does not specify[3] should it wait at least or at > most the specified timeout, so it's fine I guess. Old, "plain" socket > impl are not much different. > > If the above is correct, should I just add a tolerance for the lower bound? > > [1] http://man7.org/linux/man-pages/man2/poll.2.html > [2] > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select > [3] > https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int- > > On Mon, 23 Sep 2019 at 16:15, Pavel Rappo <pavel.ra...@oracle.com> wrote: >> >> Milan, >> >> I'm observing the latest version (.04) of this test failing quite frequently >> (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test >> passes fine on macOS and Linux. Here's the typical output I see in the logs: >> >> java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000 >> java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000 >> java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000 >> java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000 >> ... >> >> Now, there might be many reasons for that. One of which would be that the >> DnsClient code is buggy. The other reason would be that the accuracy >> guaranteed by Windows implementation of `read` is not what we would expect. >> Would you be able to investigate that? >> >> P.S. The good news is that the CSR has been approved: >> >> https://bugs.openjdk.java.net/browse/JDK-8230965 >> >>> On 23 Sep 2019, at 14:20, Milan Mimica <milan.mim...@gmail.com> wrote: >>> >>> Got it. Thanks Pavel! >>> >>> >>> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo <pavel.ra...@oracle.com> wrote: >>>> >>>> Milan, >>>> >>>> How do you check which tests are run? That's what I see in the >>>> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr >>>> file after I have run the test locally on my machine: >>>> >>>> ----------messages:(5/233)---------- >>>> command: main TcpTimeout >>>> reason: User specified action: run main TcpTimeout >>>> Mode: othervm >>>> Additional options from @modules: --add-modules java.base --add-exports >>>> java.base/sun.security.util=ALL-UNNAMED >>>> elapsed time (seconds): 1.751 >>>> >>>> ... >>>> >>>> ----------messages:(5/313)---------- >>>> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 >>>> reason: User specified action: run main TcpTimeout >>>> -Dcom.sun.jndi.dns.timeout.initial=5000 >>>> Mode: othervm >>>> Additional options from @modules: --add-modules java.base --add-exports >>>> java.base/sun.security.util=ALL-UNNAMED >>>> elapsed time (seconds): 5.498 >>>> >>>> ------------------------------------ >>>> >>>> Which is consistent with what I would expect given the timeout values. >>>> >>>> The following output does not tell the full story, just the name of the >>>> test: >>>> >>>> ============================== >>>> Test summary >>>> ============================== >>>> TEST TOTAL PASS FAIL ERROR >>>> jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java >>>> 1 1 0 0 >>>> ============================== >>>> TEST SUCCESS >>>> >>>> -Pavel >>>> >>>>> On 20 Sep 2019, at 15:42, Milan Mimica <milan.mim...@gmail.com> wrote: >>>>> >>>>> Pavel, >>>>> >>>>> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/ >>>>> I don't see the test is run twice when I execute "make test >>>>> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am >>>>> I missing something? >>>>> >>> >>> >>> -- >>> Milan Mimica >> > > > -- > Milan Mimica