Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, I satisfied myself by running the final version of the test some 8k times and then pushed the change. Thanks for your patience and persistence. I saw your question on the net-dev and nio-dev mailing lists. Thanks. -Pavel > On 24 Sep 2019, at 13:41, Milan Mimica wrote: > > Pavel, > > Deal. Handling early returns too: > http://cr.openjdk.java.net/~mmimica/8228580/webrev.05/ > I will ask there about socket timeout semantics. > > On Tue, 24 Sep 2019 at 12:51, Pavel Rappo wrote: >> >> Milan, >> >> Thanks for looking into this. I think you should ask a question on the >> expected timing semantics and guarantees on net-dev (with maybe a cc to >> nio-dev). >> As for our test. I agree with you that we should simply work a possibility >> of early returns into the check. >> >> ... >> >> /* The acceptable variation of early returns from timed socket operations. */ >> private static final long PREMATURE_RETURN = adjustTimeout(100); >> >> ... >> >> long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos); >> if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) { >>String msg = String.format( >>"elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s", >>timeout, timeout, TOLERANCE, PREMATURE_RETURN); >>throw new RuntimeException(msg); >> } >> >> ... >> >> Thoughts? >> >> -Pavel >> >>> On 24 Sep 2019, at 09:12, Milan Mimica wrote: >>> >>> Hi Pavel >>> >>> Wow, I find this awesome. I don't have a Windows machine to play with, >>> but I think I may have found something. >>> The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix >>> it uses poll(2), on Windows it uses select(2). Regarding timeouts, >>> poll() has "wait at least" semantics and overruns by design[1], while >>> select() on windows has "waits at most" semantics, or how they put >>> it[2]: "specifies the maximum time that select should wait before >>> returning.". It returns early by design! Is this a known thing? I >>> don't think there is much one can do here. It probably makes no sense >>> to loop it and wait for time remainder. >>> Java's soTimeout does not specify[3] should it wait at least or at >>> most the specified timeout, so it's fine I guess. Old, "plain" socket >>> impl are not much different. >>> >>> If the above is correct, should I just add a tolerance for the lower bound? >>> >>> [1] http://man7.org/linux/man-pages/man2/poll.2.html >>> [2] >>> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select >>> [3] >>> https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int- >>> >>> On Mon, 23 Sep 2019 at 16:15, Pavel Rappo wrote: Milan, I'm observing the latest version (.04) of this test failing quite frequently (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test passes fine on macOS and Linux. Here's the typical output I see in the logs: java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000 java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000 java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000 java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000 ... Now, there might be many reasons for that. One of which would be that the DnsClient code is buggy. The other reason would be that the accuracy guaranteed by Windows implementation of `read` is not what we would expect. Would you be able to investigate that? P.S. The good news is that the CSR has been approved: https://bugs.openjdk.java.net/browse/JDK-8230965 > On 23 Sep 2019, at 14:20, Milan Mimica wrote: > > Got it. Thanks Pavel! > > > On Mon, 23 Sep 2019 at 13:37, Pavel Rappo wrote: >> >> Milan, >> >> How do you check which tests are run? That's what I see in the >> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr >> file after I have run the test locally on my machine: >> >> --messages:(5/233)-- >> command: main TcpTimeout >> reason: User specified action: run main TcpTimeout >> Mode: othervm >> Additional options from @modules: --add-modules java.base --add-exports >> java.base/sun.security.util=ALL-UNNAMED >> elapsed time (seconds): 1.751 >> >> ... >> >> --messages:(5/313)-- >> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 >> reason: User specified action: run main TcpTimeout >> -Dcom.sun.jndi.dns.timeout.initial=5000 >> Mode: othervm >> Additional options from @modules: --add-modules java.base --add-exports >> java.base/sun.security.util=ALL-UNNAMED >> elapsed time (seconds): 5.498 >> >> >> >>
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 24 Sep 2019, at 13:41, Milan Mimica wrote: > > Pavel, > > Deal. Handling early returns too: > http://cr.openjdk.java.net/~mmimica/8228580/webrev.05/ LGTM -Chris
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Pavel, Deal. Handling early returns too: http://cr.openjdk.java.net/~mmimica/8228580/webrev.05/ I will ask there about socket timeout semantics. On Tue, 24 Sep 2019 at 12:51, Pavel Rappo wrote: > > Milan, > > Thanks for looking into this. I think you should ask a question on the > expected timing semantics and guarantees on net-dev (with maybe a cc to > nio-dev). > As for our test. I agree with you that we should simply work a possibility of > early returns into the check. > > ... > > /* The acceptable variation of early returns from timed socket operations. */ > private static final long PREMATURE_RETURN = adjustTimeout(100); > > ... > > long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos); > if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) { > String msg = String.format( > "elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s", > timeout, timeout, TOLERANCE, PREMATURE_RETURN); > throw new RuntimeException(msg); > } > > ... > > Thoughts? > > -Pavel > > > On 24 Sep 2019, at 09:12, Milan Mimica wrote: > > > > Hi Pavel > > > > Wow, I find this awesome. I don't have a Windows machine to play with, > > but I think I may have found something. > > The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix > > it uses poll(2), on Windows it uses select(2). Regarding timeouts, > > poll() has "wait at least" semantics and overruns by design[1], while > > select() on windows has "waits at most" semantics, or how they put > > it[2]: "specifies the maximum time that select should wait before > > returning.". It returns early by design! Is this a known thing? I > > don't think there is much one can do here. It probably makes no sense > > to loop it and wait for time remainder. > > Java's soTimeout does not specify[3] should it wait at least or at > > most the specified timeout, so it's fine I guess. Old, "plain" socket > > impl are not much different. > > > > If the above is correct, should I just add a tolerance for the lower bound? > > > > [1] http://man7.org/linux/man-pages/man2/poll.2.html > > [2] > > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select > > [3] > > https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int- > > > > On Mon, 23 Sep 2019 at 16:15, Pavel Rappo wrote: > >> > >> Milan, > >> > >> I'm observing the latest version (.04) of this test failing quite > >> frequently (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) > >> machines. The test passes fine on macOS and Linux. Here's the typical > >> output I see in the logs: > >> > >>java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000 > >>java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000 > >>java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000 > >>java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000 > >>... > >> > >> Now, there might be many reasons for that. One of which would be that the > >> DnsClient code is buggy. The other reason would be that the accuracy > >> guaranteed by Windows implementation of `read` is not what we would > >> expect. Would you be able to investigate that? > >> > >> P.S. The good news is that the CSR has been approved: > >> > >>https://bugs.openjdk.java.net/browse/JDK-8230965 > >> > >>> On 23 Sep 2019, at 14:20, Milan Mimica wrote: > >>> > >>> Got it. Thanks Pavel! > >>> > >>> > >>> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo wrote: > > Milan, > > How do you check which tests are run? That's what I see in the > /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr > file after I have run the test locally on my machine: > > --messages:(5/233)-- > command: main TcpTimeout > reason: User specified action: run main TcpTimeout > Mode: othervm > Additional options from @modules: --add-modules java.base --add-exports > java.base/sun.security.util=ALL-UNNAMED > elapsed time (seconds): 1.751 > > ... > > --messages:(5/313)-- > command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 > reason: User specified action: run main TcpTimeout > -Dcom.sun.jndi.dns.timeout.initial=5000 > Mode: othervm > Additional options from @modules: --add-modules java.base --add-exports > java.base/sun.security.util=ALL-UNNAMED > elapsed time (seconds): 5.498 > > > > Which is consistent with what I would expect given the timeout values. > > The following output does not tell the full story, just the name of the > test: > > == > Test summary > == > TEST TOTAL PASS
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, Thanks for looking into this. I think you should ask a question on the expected timing semantics and guarantees on net-dev (with maybe a cc to nio-dev). As for our test. I agree with you that we should simply work a possibility of early returns into the check. ... /* The acceptable variation of early returns from timed socket operations. */ private static final long PREMATURE_RETURN = adjustTimeout(100); ... long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos); if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) { String msg = String.format( "elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s", timeout, timeout, TOLERANCE, PREMATURE_RETURN); throw new RuntimeException(msg); } ... Thoughts? -Pavel > On 24 Sep 2019, at 09:12, Milan Mimica wrote: > > Hi Pavel > > Wow, I find this awesome. I don't have a Windows machine to play with, > but I think I may have found something. > The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix > it uses poll(2), on Windows it uses select(2). Regarding timeouts, > poll() has "wait at least" semantics and overruns by design[1], while > select() on windows has "waits at most" semantics, or how they put > it[2]: "specifies the maximum time that select should wait before > returning.". It returns early by design! Is this a known thing? I > don't think there is much one can do here. It probably makes no sense > to loop it and wait for time remainder. > Java's soTimeout does not specify[3] should it wait at least or at > most the specified timeout, so it's fine I guess. Old, "plain" socket > impl are not much different. > > If the above is correct, should I just add a tolerance for the lower bound? > > [1] http://man7.org/linux/man-pages/man2/poll.2.html > [2] > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select > [3] > https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int- > > On Mon, 23 Sep 2019 at 16:15, Pavel Rappo wrote: >> >> Milan, >> >> I'm observing the latest version (.04) of this test failing quite frequently >> (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test >> passes fine on macOS and Linux. Here's the typical output I see in the logs: >> >>java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000 >>java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000 >>java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000 >>java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000 >>... >> >> Now, there might be many reasons for that. One of which would be that the >> DnsClient code is buggy. The other reason would be that the accuracy >> guaranteed by Windows implementation of `read` is not what we would expect. >> Would you be able to investigate that? >> >> P.S. The good news is that the CSR has been approved: >> >>https://bugs.openjdk.java.net/browse/JDK-8230965 >> >>> On 23 Sep 2019, at 14:20, Milan Mimica wrote: >>> >>> Got it. Thanks Pavel! >>> >>> >>> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo wrote: Milan, How do you check which tests are run? That's what I see in the /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr file after I have run the test locally on my machine: --messages:(5/233)-- command: main TcpTimeout reason: User specified action: run main TcpTimeout Mode: othervm Additional options from @modules: --add-modules java.base --add-exports java.base/sun.security.util=ALL-UNNAMED elapsed time (seconds): 1.751 ... --messages:(5/313)-- command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 reason: User specified action: run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 Mode: othervm Additional options from @modules: --add-modules java.base --add-exports java.base/sun.security.util=ALL-UNNAMED elapsed time (seconds): 5.498 Which is consistent with what I would expect given the timeout values. The following output does not tell the full story, just the name of the test: == Test summary == TEST TOTAL PASS FAIL ERROR jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java 1 1 0 0 == TEST SUCCESS -Pavel > On 20 Sep 2019, at 15:42, Milan Mimica wrote: > > Pavel, > > Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/ > I don't see the
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Hi Pavel Wow, I find this awesome. I don't have a Windows machine to play with, but I think I may have found something. The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix it uses poll(2), on Windows it uses select(2). Regarding timeouts, poll() has "wait at least" semantics and overruns by design[1], while select() on windows has "waits at most" semantics, or how they put it[2]: "specifies the maximum time that select should wait before returning.". It returns early by design! Is this a known thing? I don't think there is much one can do here. It probably makes no sense to loop it and wait for time remainder. Java's soTimeout does not specify[3] should it wait at least or at most the specified timeout, so it's fine I guess. Old, "plain" socket impl are not much different. If the above is correct, should I just add a tolerance for the lower bound? [1] http://man7.org/linux/man-pages/man2/poll.2.html [2] https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select [3] https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int- On Mon, 23 Sep 2019 at 16:15, Pavel Rappo wrote: > > Milan, > > I'm observing the latest version (.04) of this test failing quite frequently > (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test > passes fine on macOS and Linux. Here's the typical output I see in the logs: > > java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000 > java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000 > java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000 > java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000 > ... > > Now, there might be many reasons for that. One of which would be that the > DnsClient code is buggy. The other reason would be that the accuracy > guaranteed by Windows implementation of `read` is not what we would expect. > Would you be able to investigate that? > > P.S. The good news is that the CSR has been approved: > > https://bugs.openjdk.java.net/browse/JDK-8230965 > > > On 23 Sep 2019, at 14:20, Milan Mimica wrote: > > > > Got it. Thanks Pavel! > > > > > > On Mon, 23 Sep 2019 at 13:37, Pavel Rappo wrote: > >> > >> Milan, > >> > >> How do you check which tests are run? That's what I see in the > >> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr > >> file after I have run the test locally on my machine: > >> > >> --messages:(5/233)-- > >> command: main TcpTimeout > >> reason: User specified action: run main TcpTimeout > >> Mode: othervm > >> Additional options from @modules: --add-modules java.base --add-exports > >> java.base/sun.security.util=ALL-UNNAMED > >> elapsed time (seconds): 1.751 > >> > >> ... > >> > >> --messages:(5/313)-- > >> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 > >> reason: User specified action: run main TcpTimeout > >> -Dcom.sun.jndi.dns.timeout.initial=5000 > >> Mode: othervm > >> Additional options from @modules: --add-modules java.base --add-exports > >> java.base/sun.security.util=ALL-UNNAMED > >> elapsed time (seconds): 5.498 > >> > >> > >> > >> Which is consistent with what I would expect given the timeout values. > >> > >> The following output does not tell the full story, just the name of the > >> test: > >> > >> == > >> Test summary > >> == > >> TEST TOTAL PASS FAIL ERROR > >> jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java > >> 1 1 0 0 > >> == > >> TEST SUCCESS > >> > >> -Pavel > >> > >>> On 20 Sep 2019, at 15:42, Milan Mimica wrote: > >>> > >>> Pavel, > >>> > >>> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/ > >>> I don't see the test is run twice when I execute "make test > >>> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am > >>> I missing something? > >>> > > > > > > -- > > Milan Mimica > -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, I'm observing the latest version (.04) of this test failing quite frequently (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test passes fine on macOS and Linux. Here's the typical output I see in the logs: java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000 java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000 java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000 java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000 ... Now, there might be many reasons for that. One of which would be that the DnsClient code is buggy. The other reason would be that the accuracy guaranteed by Windows implementation of `read` is not what we would expect. Would you be able to investigate that? P.S. The good news is that the CSR has been approved: https://bugs.openjdk.java.net/browse/JDK-8230965 > On 23 Sep 2019, at 14:20, Milan Mimica wrote: > > Got it. Thanks Pavel! > > > On Mon, 23 Sep 2019 at 13:37, Pavel Rappo wrote: >> >> Milan, >> >> How do you check which tests are run? That's what I see in the >> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr >> file after I have run the test locally on my machine: >> >> --messages:(5/233)-- >> command: main TcpTimeout >> reason: User specified action: run main TcpTimeout >> Mode: othervm >> Additional options from @modules: --add-modules java.base --add-exports >> java.base/sun.security.util=ALL-UNNAMED >> elapsed time (seconds): 1.751 >> >> ... >> >> --messages:(5/313)-- >> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 >> reason: User specified action: run main TcpTimeout >> -Dcom.sun.jndi.dns.timeout.initial=5000 >> Mode: othervm >> Additional options from @modules: --add-modules java.base --add-exports >> java.base/sun.security.util=ALL-UNNAMED >> elapsed time (seconds): 5.498 >> >> >> >> Which is consistent with what I would expect given the timeout values. >> >> The following output does not tell the full story, just the name of the test: >> >> == >> Test summary >> == >> TEST TOTAL PASS FAIL ERROR >> jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java >> 1 1 0 0 >> == >> TEST SUCCESS >> >> -Pavel >> >>> On 20 Sep 2019, at 15:42, Milan Mimica wrote: >>> >>> Pavel, >>> >>> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/ >>> I don't see the test is run twice when I execute "make test >>> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am >>> I missing something? >>> > > > -- > Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Got it. Thanks Pavel! On Mon, 23 Sep 2019 at 13:37, Pavel Rappo wrote: > > Milan, > > How do you check which tests are run? That's what I see in the > /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr > file after I have run the test locally on my machine: > > --messages:(5/233)-- > command: main TcpTimeout > reason: User specified action: run main TcpTimeout > Mode: othervm > Additional options from @modules: --add-modules java.base --add-exports > java.base/sun.security.util=ALL-UNNAMED > elapsed time (seconds): 1.751 > > ... > > --messages:(5/313)-- > command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 > reason: User specified action: run main TcpTimeout > -Dcom.sun.jndi.dns.timeout.initial=5000 > Mode: othervm > Additional options from @modules: --add-modules java.base --add-exports > java.base/sun.security.util=ALL-UNNAMED > elapsed time (seconds): 5.498 > > > > Which is consistent with what I would expect given the timeout values. > > The following output does not tell the full story, just the name of the test: > > == > Test summary > == >TEST TOTAL PASS FAIL ERROR >jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java > 1 1 0 0 > == > TEST SUCCESS > > -Pavel > > > On 20 Sep 2019, at 15:42, Milan Mimica wrote: > > > > Pavel, > > > > Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/ > > I don't see the test is run twice when I execute "make test > > TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am > > I missing something? > > -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, How do you check which tests are run? That's what I see in the /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr file after I have run the test locally on my machine: --messages:(5/233)-- command: main TcpTimeout reason: User specified action: run main TcpTimeout Mode: othervm Additional options from @modules: --add-modules java.base --add-exports java.base/sun.security.util=ALL-UNNAMED elapsed time (seconds): 1.751 ... --messages:(5/313)-- command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 reason: User specified action: run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 Mode: othervm Additional options from @modules: --add-modules java.base --add-exports java.base/sun.security.util=ALL-UNNAMED elapsed time (seconds): 5.498 Which is consistent with what I would expect given the timeout values. The following output does not tell the full story, just the name of the test: == Test summary == TEST TOTAL PASS FAIL ERROR jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java 1 1 0 0 == TEST SUCCESS -Pavel > On 20 Sep 2019, at 15:42, Milan Mimica wrote: > > Pavel, > > Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/ > I don't see the test is run twice when I execute "make test > TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am > I missing something? > > On Thu, 19 Sep 2019 at 13:16, Pavel Rappo wrote: >> >> Milan, >> >> This looks like a good start. Please consider >> >> 1. Setting TOLERANCE to 5 seconds >> 2. Getting the second time mark immediately after the query returns (i.e. do >> not waste your time in DNSTestUtils.debug(attrs)) >> 3. Making the test parametric instead of hardcoded for the >> DEFAULT_DNS_CLIENT_TIMEOUT >> 4. Running the test for at least 2 different values of the timeout, e.g.: >> >>* @run main TcpTimeout >>* @run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 >> >> As for you question, I'm not sure how we would be able to communicate the >> fact that the response is truncated to the user. You could try to ask this >> on the net-dev mailing list. >> >> -Pavel >> >>> On 18 Sep 2019, at 14:25, Milan Mimica wrote: >>> >>> Hi Pavel >>> >>> Sure. Here is the incremental change: >>> http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/ >>> >>> What actually bothers me from the beginning is the truncated response. >>> The TXT attribute, a String, prints "A very popular h", but does not >>> equals("A very popular h"), because of some stray bytes. I guess it's >>> because of how DNS response parsing works. I can imagine how this >>> could cause problems to users. I think, at least, we should have a way >>> to tell the user that the response is truncated, and the payload is >>> partial/invalid. >>> >>> >>> On Tue, 17 Sep 2019 at 15:09, Pavel Rappo wrote: Milan, While the CSR is being processed, could we maybe think of some additional testing for that change? Otherwise, that test seems kind of anemic. It makes sure that the query doesn't hang, but that's about it. It doesn't check that the timeout is respected. I was wondering if you could propose some way of testing that. > On 17 Sep 2019, at 09:55, Pavel Rappo wrote: > > I have filed the CSR: > > https://bugs.openjdk.java.net/browse/JDK-8230965 > >> On 13 Sep 2019, at 11:21, Pavel Rappo wrote: >> >> Here's the latest webrev accumulating all the changes we've discussed so >> far: >> >> http://cr.openjdk.java.net/~prappo/8228580/webrev.03/ >> >> If people are okay with that I will proceed to creating a CSR. >>> >>> >>> -- >>> Milan Mimica >> > > > -- > Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Pavel, Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/ I don't see the test is run twice when I execute "make test TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am I missing something? On Thu, 19 Sep 2019 at 13:16, Pavel Rappo wrote: > > Milan, > > This looks like a good start. Please consider > > 1. Setting TOLERANCE to 5 seconds > 2. Getting the second time mark immediately after the query returns (i.e. do > not waste your time in DNSTestUtils.debug(attrs)) > 3. Making the test parametric instead of hardcoded for the > DEFAULT_DNS_CLIENT_TIMEOUT > 4. Running the test for at least 2 different values of the timeout, e.g.: > > * @run main TcpTimeout > * @run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 > > As for you question, I'm not sure how we would be able to communicate the > fact that the response is truncated to the user. You could try to ask this on > the net-dev mailing list. > > -Pavel > > > On 18 Sep 2019, at 14:25, Milan Mimica wrote: > > > > Hi Pavel > > > > Sure. Here is the incremental change: > > http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/ > > > > What actually bothers me from the beginning is the truncated response. > > The TXT attribute, a String, prints "A very popular h", but does not > > equals("A very popular h"), because of some stray bytes. I guess it's > > because of how DNS response parsing works. I can imagine how this > > could cause problems to users. I think, at least, we should have a way > > to tell the user that the response is truncated, and the payload is > > partial/invalid. > > > > > > On Tue, 17 Sep 2019 at 15:09, Pavel Rappo wrote: > >> > >> Milan, > >> > >> While the CSR is being processed, could we maybe think of some additional > >> testing for that change? Otherwise, that test seems kind of anemic. It > >> makes sure that the query doesn't hang, but that's about it. It doesn't > >> check that the timeout is respected. I was wondering if you could propose > >> some way of testing that. > >> > >>> On 17 Sep 2019, at 09:55, Pavel Rappo wrote: > >>> > >>> I have filed the CSR: > >>> > >>> https://bugs.openjdk.java.net/browse/JDK-8230965 > >>> > On 13 Sep 2019, at 11:21, Pavel Rappo wrote: > > Here's the latest webrev accumulating all the changes we've discussed so > far: > > http://cr.openjdk.java.net/~prappo/8228580/webrev.03/ > > If people are okay with that I will proceed to creating a CSR. > >> > > > > > > -- > > Milan Mimica > -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, This looks like a good start. Please consider 1. Setting TOLERANCE to 5 seconds 2. Getting the second time mark immediately after the query returns (i.e. do not waste your time in DNSTestUtils.debug(attrs)) 3. Making the test parametric instead of hardcoded for the DEFAULT_DNS_CLIENT_TIMEOUT 4. Running the test for at least 2 different values of the timeout, e.g.: * @run main TcpTimeout * @run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000 As for you question, I'm not sure how we would be able to communicate the fact that the response is truncated to the user. You could try to ask this on the net-dev mailing list. -Pavel > On 18 Sep 2019, at 14:25, Milan Mimica wrote: > > Hi Pavel > > Sure. Here is the incremental change: > http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/ > > What actually bothers me from the beginning is the truncated response. > The TXT attribute, a String, prints "A very popular h", but does not > equals("A very popular h"), because of some stray bytes. I guess it's > because of how DNS response parsing works. I can imagine how this > could cause problems to users. I think, at least, we should have a way > to tell the user that the response is truncated, and the payload is > partial/invalid. > > > On Tue, 17 Sep 2019 at 15:09, Pavel Rappo wrote: >> >> Milan, >> >> While the CSR is being processed, could we maybe think of some additional >> testing for that change? Otherwise, that test seems kind of anemic. It makes >> sure that the query doesn't hang, but that's about it. It doesn't check that >> the timeout is respected. I was wondering if you could propose some way of >> testing that. >> >>> On 17 Sep 2019, at 09:55, Pavel Rappo wrote: >>> >>> I have filed the CSR: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8230965 >>> On 13 Sep 2019, at 11:21, Pavel Rappo wrote: Here's the latest webrev accumulating all the changes we've discussed so far: http://cr.openjdk.java.net/~prappo/8228580/webrev.03/ If people are okay with that I will proceed to creating a CSR. >> > > > -- > Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Hi Pavel Sure. Here is the incremental change: http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/ What actually bothers me from the beginning is the truncated response. The TXT attribute, a String, prints "A very popular h", but does not equals("A very popular h"), because of some stray bytes. I guess it's because of how DNS response parsing works. I can imagine how this could cause problems to users. I think, at least, we should have a way to tell the user that the response is truncated, and the payload is partial/invalid. On Tue, 17 Sep 2019 at 15:09, Pavel Rappo wrote: > > Milan, > > While the CSR is being processed, could we maybe think of some additional > testing for that change? Otherwise, that test seems kind of anemic. It makes > sure that the query doesn't hang, but that's about it. It doesn't check that > the timeout is respected. I was wondering if you could propose some way of > testing that. > > > On 17 Sep 2019, at 09:55, Pavel Rappo wrote: > > > > I have filed the CSR: > > > >https://bugs.openjdk.java.net/browse/JDK-8230965 > > > >> On 13 Sep 2019, at 11:21, Pavel Rappo wrote: > >> > >> Here's the latest webrev accumulating all the changes we've discussed so > >> far: > >> > >> http://cr.openjdk.java.net/~prappo/8228580/webrev.03/ > >> > >> If people are okay with that I will proceed to creating a CSR. > -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, While the CSR is being processed, could we maybe think of some additional testing for that change? Otherwise, that test seems kind of anemic. It makes sure that the query doesn't hang, but that's about it. It doesn't check that the timeout is respected. I was wondering if you could propose some way of testing that. > On 17 Sep 2019, at 09:55, Pavel Rappo wrote: > > I have filed the CSR: > >https://bugs.openjdk.java.net/browse/JDK-8230965 > >> On 13 Sep 2019, at 11:21, Pavel Rappo wrote: >> >> Here's the latest webrev accumulating all the changes we've discussed so far: >> >> http://cr.openjdk.java.net/~prappo/8228580/webrev.03/ >> >> If people are okay with that I will proceed to creating a CSR.
Re: RFR(s): 8228580: DnsClient TCP socket timeout
I have filed the CSR: https://bugs.openjdk.java.net/browse/JDK-8230965 > On 13 Sep 2019, at 11:21, Pavel Rappo wrote: > > Here's the latest webrev accumulating all the changes we've discussed so far: > >http://cr.openjdk.java.net/~prappo/8228580/webrev.03/ > > If people are okay with that I will proceed to creating a CSR. > >> On 12 Sep 2019, at 20:06, Alan Bateman wrote: >> >> On 12/09/2019 12:41, Chris Hegarty wrote: >>> Here is an initial attempt to document these timeout/retry properties. It’s >>> effectively the wording lifted from the tech notes [1], with “UDP” removed. >> Thanks, this look and dropping the reference to "UDP" make sense. >> >> One minor nit is that "are relevant to the DNS provider" might be better >> worded as "may be used when creating the initial context". This will fit >> nicely if/when someone adds a description to the @provides tag. At some >> point I hope the other properties can be used too but that is beyond the >> scope of JDK-8228580. >> >> -Alan >
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Here's the latest webrev accumulating all the changes we've discussed so far: http://cr.openjdk.java.net/~prappo/8228580/webrev.03/ If people are okay with that I will proceed to creating a CSR. > On 12 Sep 2019, at 20:06, Alan Bateman wrote: > > On 12/09/2019 12:41, Chris Hegarty wrote: >> Here is an initial attempt to document these timeout/retry properties. It’s >> effectively the wording lifted from the tech notes [1], with “UDP” removed. > Thanks, this look and dropping the reference to "UDP" make sense. > > One minor nit is that "are relevant to the DNS provider" might be better > worded as "may be used when creating the initial context". This will fit > nicely if/when someone adds a description to the @provides tag. At some point > I hope the other properties can be used too but that is beyond the scope of > JDK-8228580. > > -Alan
Re: RFR(s): 8228580: DnsClient TCP socket timeout
On 12/09/2019 12:41, Chris Hegarty wrote: Here is an initial attempt to document these timeout/retry properties. It’s effectively the wording lifted from the tech notes [1], with “UDP” removed. Thanks, this look and dropping the reference to "UDP" make sense. One minor nit is that "are relevant to the DNSprovider" might be better worded as "may be used when creating the initial context". This will fit nicely if/when someone adds a description to the @provides tag. At some point I hope the other properties can be used too but that is beyond the scope of JDK-8228580. -Alan
Re: RFR(s): 8228580: DnsClient TCP socket timeout
I like the wording, it's concise. Thanks a lot for helping us with this, Chris! Trivially, there's a typo: <2> instead of . > On 12 Sep 2019, at 12:41, Chris Hegarty wrote: > > Here is an initial attempt to document these timeout/retry properties. It’s > effectively the wording lifted from the tech notes [1], with “UDP” removed. > > I deliberately avoided describing the implementation. It serves little > purpose, and in fact will greatly complicate the description, as well as > constrain the implementation. I personally dislike parts of the existing > implementation ( which I will ignore ), so baking them into the spec would be > a mistake. > > There are clearly a lot more properties that could be specified, that is far > far beyond the scope of this change. > > If we add such documentation, then a CSR ( with JDK-scope ) will be needed. > > --- a/src/jdk.naming.dns/share/classes/module-info.java > +++ b/src/jdk.naming.dns/share/classes/module-info.java > @@ -26,7 +26,38 @@ > /** > * Provides the implementation of the DNS Java Naming provider. > * > + * <2>Environment Properties > + * > + * The following JNDI environment properties are relevant to the DNS > + * provider. > + * > + * > + *com.sun.jndi.dns.timeout.initial > + *com.sun.jndi.dns.timeout.retries > + * > + * > + * These properties are used to alter the timeout-related defaults that > the > + * DNS provider uses when submitting queries. The DNS provider submits > queries > + * using the following exponential backoff algorithm. The provider submits a > + * query to a DNS server and waits for a response to arrive within a timeout > + * period (1 second by default). If it receives no response within the > timeout > + * period, it queries the next server, and so on. If the provider receives no > + * response from any server, it doubles the timeout period and repeats the > + * process of submitting the query to each server, up to a maximum number of > + * retries (4 by default). > + * > + * The {@code com.sun.jndi.dns.timeout.initial} property, if set, > specifies > + * the number of milliseconds to use as the initial timeout period (i.e., > before > + * any doubling). If this property has not been set, the default initial > timeout > + * is 1000 milliseconds. > + * > + * The {@code com.sun.jndi.dns.timeout.retries} property, if set, > specifies > + * the number of times to retry each server using the exponential backoff > + * algorithm described previously. If this property has not been set, the > + * default number of retries is 4. > + * > * @provides javax.naming.spi.InitialContextFactory > + * > * @moduleGraph > * @since 9 > */ > > -Chris. > > [1] https://docs.oracle.com/javase/6/docs/technotes/guides/jndi/jndi-dns.html > >> On 11 Sep 2019, at 16:55, Alan Bateman wrote: >> >> On 11/09/2019 16:16, Pavel Rappo wrote: >>> On 11 Sep 2019, at 16:10, Alan Bateman wrote: I assume extending the timeout to TCP will require at least some minimal updates to the descriptions. >>> Which descriptions do you have in mind? I cannot seem to find that text >>> anywhere in the current repo (jdk/jdk) >> I don't think the jndi-dns docs page is in the jdk repo. Since JDK 9, a good >> place for service providers to document capability and configuration is >> their module description and I think at least some of the documentation from >> that page should move into the jdk.naming.dns module description (that's for >> another issue of course). However, for the changes under discussion here >> then I don't think it's unreasonable to provide an updated description of >> the timeout property. >> >> -Alan >> >
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Here is an initial attempt to document these timeout/retry properties. It’s effectively the wording lifted from the tech notes [1], with “UDP” removed. I deliberately avoided describing the implementation. It serves little purpose, and in fact will greatly complicate the description, as well as constrain the implementation. I personally dislike parts of the existing implementation ( which I will ignore ), so baking them into the spec would be a mistake. There are clearly a lot more properties that could be specified, that is far far beyond the scope of this change. If we add such documentation, then a CSR ( with JDK-scope ) will be needed. --- a/src/jdk.naming.dns/share/classes/module-info.java +++ b/src/jdk.naming.dns/share/classes/module-info.java @@ -26,7 +26,38 @@ /** * Provides the implementation of the DNS Java Naming provider. * + * <2>Environment Properties + * + * The following JNDI environment properties are relevant to the DNS + * provider. + * + * + *com.sun.jndi.dns.timeout.initial + *com.sun.jndi.dns.timeout.retries + * + * + * These properties are used to alter the timeout-related defaults that the + * DNS provider uses when submitting queries. The DNS provider submits queries + * using the following exponential backoff algorithm. The provider submits a + * query to a DNS server and waits for a response to arrive within a timeout + * period (1 second by default). If it receives no response within the timeout + * period, it queries the next server, and so on. If the provider receives no + * response from any server, it doubles the timeout period and repeats the + * process of submitting the query to each server, up to a maximum number of + * retries (4 by default). + * + * The {@code com.sun.jndi.dns.timeout.initial} property, if set, specifies + * the number of milliseconds to use as the initial timeout period (i.e., before + * any doubling). If this property has not been set, the default initial timeout + * is 1000 milliseconds. + * + * The {@code com.sun.jndi.dns.timeout.retries} property, if set, specifies + * the number of times to retry each server using the exponential backoff + * algorithm described previously. If this property has not been set, the + * default number of retries is 4. + * * @provides javax.naming.spi.InitialContextFactory + * * @moduleGraph * @since 9 */ -Chris. [1] https://docs.oracle.com/javase/6/docs/technotes/guides/jndi/jndi-dns.html > On 11 Sep 2019, at 16:55, Alan Bateman wrote: > > On 11/09/2019 16:16, Pavel Rappo wrote: >> >>> On 11 Sep 2019, at 16:10, Alan Bateman wrote: >>> >>> I assume extending the timeout to TCP will require at least some minimal >>> updates to the descriptions. >> Which descriptions do you have in mind? I cannot seem to find that text >> anywhere in the current repo (jdk/jdk) > I don't think the jndi-dns docs page is in the jdk repo. Since JDK 9, a good > place for service providers to document capability and configuration is their > module description and I think at least some of the documentation from that > page should move into the jdk.naming.dns module description (that's for > another issue of course). However, for the changes under discussion here then > I don't think it's unreasonable to provide an updated description of the > timeout property. > > -Alan >
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 11 Sep 2019, at 16:55, Alan Bateman wrote: > > I don't think the jndi-dns docs page is in the jdk repo. Could you kindly point me to the location of the current jndi-dns docs page? I can't seem to be able to even find anything related to that on the web. You might want to try to use your favorite search engine. > > > However, for the changes under discussion here then I don't think it's > unreasonable to provide an updated description of the timeout property. It might be reasonable. What wording would you suggest? To me it already *vaguely* addresses the TCP case by saying that "Each lookup is initially performed using UDP. If the response is too long to be returned in a UDP packet without being truncated, the lookup is repeated using TCP." -Pavel
Re: RFR(s): 8228580: DnsClient TCP socket timeout
On 11/09/2019 16:16, Pavel Rappo wrote: On 11 Sep 2019, at 16:10, Alan Bateman wrote: I assume extending the timeout to TCP will require at least some minimal updates to the descriptions. Which descriptions do you have in mind? I cannot seem to find that text anywhere in the current repo (jdk/jdk) I don't think the jndi-dns docs page is in the jdk repo. Since JDK 9, a good place for service providers to document capability and configuration is their module description and I think at least some of the documentation from that page should move into the jdk.naming.dns module description (that's for another issue of course). However, for the changes under discussion here then I don't think it's unreasonable to provide an updated description of the timeout property. -Alan
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 11 Sep 2019, at 16:10, Alan Bateman wrote: > > I assume extending the timeout to TCP will require at least some minimal > updates to the descriptions. Which descriptions do you have in mind? I cannot seem to find that text anywhere in the current repo (jdk/jdk) $ hg tip changeset: 56237:cddef3bde924 tag: tip user:lkorinth date:Wed Sep 11 14:16:30 2019 +0200 summary: 8230398: Remove NULL checks before FREE_C_HEAP_ARRAY.
Re: RFR(s): 8228580: DnsClient TCP socket timeout
On 11/09/2019 15:56, Pavel Rappo wrote: Sure, from https://docs.oracle.com/javase/8/docs/technotes/guides/jndi/jndi-dns.html: "Each lookup is initially performed using UDP. If the response is too long to be returned in a UDP packet without being truncated, the lookup is repeated using TCP." and "com.example.jndi.dns.timeout.initial com.example.jndi.dns.timeout.retries These properties are used to alter the timeout-related defaults that the DNS provider uses when submitting UDP queries. The DNS provider submits UDP queries using the following exponential backoff algorithm. The provider submits a query to a DNS server and waits for a response to arrive within a timeout period (1 second by default). If it receives no response within the timeout period, it queries the next server, and so on. If the provider receives no response from any server, it doubles the timeout period and repeats the process of submitting the query to each server, up to a maximum number of retries (4 by default). The "com.example.jndi.dns.timeout.initial" property, if set, specifies the number of milliseconds to use as the initial timeout period (i.e., before any doubling). If this property has not been set, the default initial timeout is 1000 milliseconds. The "com.example.jndi.dns.timeout.retries" property, if set, specifies the number of times to retry each server using the exponential backoff algorithm described previously. If this property has not been set, the default number of retries is 4." I cannot seem to find a newer version of that document though. I assume extending the timeout to TCP will require at least some minimal updates to the descriptions. That will help reviewers and help decide if a CSR is needed or not. Ideally the authoritative descriptions of these properties would be in the javadoc, probably in the jdk.naming.dns module description. -Alan
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Sure, from https://docs.oracle.com/javase/8/docs/technotes/guides/jndi/jndi-dns.html: "Each lookup is initially performed using UDP. If the response is too long to be returned in a UDP packet without being truncated, the lookup is repeated using TCP." and "com.example.jndi.dns.timeout.initial com.example.jndi.dns.timeout.retries These properties are used to alter the timeout-related defaults that the DNS provider uses when submitting UDP queries. The DNS provider submits UDP queries using the following exponential backoff algorithm. The provider submits a query to a DNS server and waits for a response to arrive within a timeout period (1 second by default). If it receives no response within the timeout period, it queries the next server, and so on. If the provider receives no response from any server, it doubles the timeout period and repeats the process of submitting the query to each server, up to a maximum number of retries (4 by default). The "com.example.jndi.dns.timeout.initial" property, if set, specifies the number of milliseconds to use as the initial timeout period (i.e., before any doubling). If this property has not been set, the default initial timeout is 1000 milliseconds. The "com.example.jndi.dns.timeout.retries" property, if set, specifies the number of times to retry each server using the exponential backoff algorithm described previously. If this property has not been set, the default number of retries is 4." I cannot seem to find a newer version of that document though. > On 11 Sep 2019, at 14:58, Alan Bateman wrote: > > On 11/09/2019 13:26, Pavel Rappo wrote: >> I'm happy with the overall changeset. I have (once again) made some tiny >> changes, you can see them here: >> >> http://cr.openjdk.java.net/~prappo/8228580/webrev.02/ >> >> If you are okay with them, then we wait for a *R*eviewer. If the Reviewer(s) >> are okay with them, we push. For the record, I'm not really happy with how >> we used the DNSTestBase/TestBase infrastructure, however I'm totally fine >> with the retrying logic. >> > Would it be possible to write a clear description for the > com.sun.jndi.dns.timeout and com.sun.jndi.dns.retries properties? The current > description in the JNDI docs describes them for UDP only. I realize this is > may be out of date but new descriptions would be useful when the docs are > updated. Also I think would be useful for potential Reviewers. > > -Alan
Re: RFR(s): 8228580: DnsClient TCP socket timeout
On 11/09/2019 13:26, Pavel Rappo wrote: I'm happy with the overall changeset. I have (once again) made some tiny changes, you can see them here: http://cr.openjdk.java.net/~prappo/8228580/webrev.02/ If you are okay with them, then we wait for a *R*eviewer. If the Reviewer(s) are okay with them, we push. For the record, I'm not really happy with how we used the DNSTestBase/TestBase infrastructure, however I'm totally fine with the retrying logic. Would it be possible to write a clear description for the com.sun.jndi.dns.timeout and com.sun.jndi.dns.retries properties? The current description in the JNDI docs describes them for UDP only. I realize this is may be out of date but new descriptions would be useful when the docs are updated. Also I think would be useful for potential Reviewers. -Alan
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Hi Pavel Your changes look good to me. On Wed, 11 Sep 2019 at 14:26, Pavel Rappo wrote: > > I'm happy with the overall changeset. I have (once again) made some tiny > changes, you can see them here: > > http://cr.openjdk.java.net/~prappo/8228580/webrev.02/ > > If you are okay with them, then we wait for a *R*eviewer. If the Reviewer(s) > are okay with them, we push. For the record, I'm not really happy with how we > used the DNSTestBase/TestBase infrastructure, however I'm totally fine with > the retrying logic. > > Test results are pending. > > -Pavel > > > On 10 Sep 2019, at 16:33, Milan Mimica wrote: > > > >>> On 5 Sep 2019, at 16:02, Pavel Rappo wrote: > >>> > >>> I think we are almost there. What do you think of the following > >>> incremental (i.e. on top of your latest webrev) change? > >>> > >>> http://cr.openjdk.java.net/~prappo/8228580/webrev.01/ > >>> > >>> I fixed a couple of trivial typos and addressed the socket relinquishing > >>> issue. Initializing a socket is not an atomic "all-or-nothing" operation > >>> now. Someone needs to take care of the socket in case things go not as > >>> planned. > > > > Right. Thanks. Here is the merged version: > > http://cr.openjdk.java.net/~mmimica/8228580/webrev.02/ > > Plus, I have added TCP server init retry code from Chris. Works fine > > without changes to TestBase. > > > > > > -- > > Milan Mimica > -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
I'm happy with the overall changeset. I have (once again) made some tiny changes, you can see them here: http://cr.openjdk.java.net/~prappo/8228580/webrev.02/ If you are okay with them, then we wait for a *R*eviewer. If the Reviewer(s) are okay with them, we push. For the record, I'm not really happy with how we used the DNSTestBase/TestBase infrastructure, however I'm totally fine with the retrying logic. Test results are pending. -Pavel > On 10 Sep 2019, at 16:33, Milan Mimica wrote: > >>> On 5 Sep 2019, at 16:02, Pavel Rappo wrote: >>> >>> I think we are almost there. What do you think of the following incremental >>> (i.e. on top of your latest webrev) change? >>> >>> http://cr.openjdk.java.net/~prappo/8228580/webrev.01/ >>> >>> I fixed a couple of trivial typos and addressed the socket relinquishing >>> issue. Initializing a socket is not an atomic "all-or-nothing" operation >>> now. Someone needs to take care of the socket in case things go not as >>> planned. > > Right. Thanks. Here is the merged version: > http://cr.openjdk.java.net/~mmimica/8228580/webrev.02/ > Plus, I have added TCP server init retry code from Chris. Works fine > without changes to TestBase. > > > -- > Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Hi Florian On Mon, 9 Sep 2019 at 15:04, Florian Weimer wrote: > > Ahh. The other option is to stick with one server and keep resending > with larger and larger timeouts. Switching has the advantage that in > case of a server problem, you get to a working server more quickly. > Staying means that if the answer is delayed and you resend exactly the > same query, you might still pick up the answer to the original query and > process it, after the first timeout. Furthermore, the behaviour is documented [1][2] and thus can't be just changed. > > But we know that the server is up because it responded our UDP, so > waiting more than one second for the TCP handshake to succeed might > worthwhile, yes. Heh, except the ticket we are trying to solve here is exactly about the TCP server not being up. So what do you suggest exactly? Should we set a lower cap? Something like: Math.max(timeout, 2000 /* or some other value?*/) The rationale is, if the client requests a initial timeout larger than 1 second (how much larger?) then we can use it as-is. For lower values (default included) we need to override it to some value that, at the end of the day, is platform-specific. Sorry, I just don't see how I can get this right without introducing a bunch of arbitrary constants to the code. [1] https://docs.oracle.com/javase/7/docs/technotes/guides/jndi/jndi-dns.html#TIMEOUT [2] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4630910 -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> > On 5 Sep 2019, at 16:02, Pavel Rappo wrote: > > > > I think we are almost there. What do you think of the following incremental > > (i.e. on top of your latest webrev) change? > > > >http://cr.openjdk.java.net/~prappo/8228580/webrev.01/ > > > > I fixed a couple of trivial typos and addressed the socket relinquishing > > issue. Initializing a socket is not an atomic "all-or-nothing" operation > > now. Someone needs to take care of the socket in case things go not as > > planned. Right. Thanks. Here is the merged version: http://cr.openjdk.java.net/~mmimica/8228580/webrev.02/ Plus, I have added TCP server init retry code from Chris. Works fine without changes to TestBase. -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, ping? > On 5 Sep 2019, at 16:02, Pavel Rappo wrote: > > I think we are almost there. What do you think of the following incremental > (i.e. on top of your latest webrev) change? > >http://cr.openjdk.java.net/~prappo/8228580/webrev.01/ > > I fixed a couple of trivial typos and addressed the socket relinquishing > issue. Initializing a socket is not an atomic "all-or-nothing" operation now. > Someone needs to take care of the socket in case things go not as planned. > > -Pavel > >> On 5 Sep 2019, at 11:20, Milan Mimica wrote: >> >> On Wed, 4 Sep 2019 at 20:32, Florian Weimer wrote: >>> >>> If you use the initial UDP timeout (one second, I think), the kernel >>> will not complete the TCP handshake if the initial SYN segment is lost >>> because the retransmit delay during the handshake is longer than that. >> >> We could set a higher timeout value, but I would not prefer that. >> After all, 1 second is just default value, and can be changed. If the >> user wants us to give up on DNS query after the specified timeout then >> lets do just that. True, some queries that previously worked might >> start failing, but that is true even for slow socket.read() >> operations. >> >> New webrev: http://cr.openjdk.java.net/~mmimica/8228580/webrev.01/ >> >> * simplified test with no thread (nice catch Pavel) >> * set connect timeout and account for it >> >> -- >> Milan Mimica >
Re: RFR(s): 8228580: DnsClient TCP socket timeout
* Milan Mimica: > On Thu, 5 Sep 2019 at 18:59, Florian Weimer wrote: >> >> But I think in the UDP case, the client will retry. I think the total >> timeout in the TCP case should equal the total timeout in the UDP case. >> That's what I'm going to implement for glibc. The difference is that in >> the TCP case, the TCP stack will take care of the retries, not the >> application code. > > I understand that, and it does make sense, but we have to put it in > context of how current DnsClient.java works: > // > // The UDP retry strategy is to try the 1st server, and then > // each server in order. If no answer, double the timeout > // and try each server again. > // Ahh. The other option is to stick with one server and keep resending with larger and larger timeouts. Switching has the advantage that in case of a server problem, you get to a working server more quickly. Staying means that if the answer is delayed and you resend exactly the same query, you might still pick up the answer to the original query and process it, after the first timeout. > Fallback to TCP happens within this process. Going immediately with > timeout*2^maxRetry could yield significantly larger delays, if there > happens to be some other server on the list that works better. > I would rather look into reusing TCP connections, not to close them > immediately. But we know that the server is up because it responded our UDP, so waiting more than one second for the TCP handshake to succeed might worthwhile, yes. > What about read() and non-handshake TCP retransmissions? Do those > usually happen faster? I think so, yes. Thanks, Florian
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 6 Sep 2019, at 9:20 PM, Pavel Rappo wrote: > > Milan, > > In order to simulate the "Address already in use" scenario I did this: > >TcpDnsServer(int port) { >try { > * new ServerSocket(port, 0, InetAddress.getLoopbackAddress()); >serverSocket = new ServerSocket(port, 0, > InetAddress.getLoopbackAddress()); >} >catch (BindException ex) { > > The test failed, instead of being skipped. That's what I saw in the logs: > > > jtreg.SkippedException: Cannot start TCP server > at TcpTimeout$TcpDnsServer.(TcpTimeout.java:97) > at TcpTimeout.initTest(TcpTimeout.java:71) > at TestBase.run(TestBase.java:48) > at TcpTimeout.main(TcpTimeout.java:47) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:830) > Caused by: java.net.BindException: Address already in use: bind > at java.base/sun.nio.ch.Net.bind0(Native Method) > at java.base/sun.nio.ch.Net.bind(Net.java:469) > at java.base/sun.nio.ch.Net.bind(Net.java:458) > at java.base/sun.nio.ch.NioSocketImpl.bind(NioSocketImpl.java:643) > at java.base/java.net.ServerSocket.bind(ServerSocket.java:355) > at java.base/java.net.ServerSocket.(ServerSocket.java:241) > at TcpTimeout$TcpDnsServer.(TcpTimeout.java:91) > ... 9 more > > JavaTest Message: Test threw exception: jtreg.SkippedException > JavaTest Message: shutting down test > > > JavaTest Message: Problem cleaning up the following threads: > Thread-0 > at > java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.socketReceiveOrPeekData(Native > Method) > at > java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.peekData(DualStackPlainDatagramSocketImpl.java:113) > at > java.base@14-internal/java.net.DatagramSocket.receive(DatagramSocket.java:746) > at DNSServer.receiveQuery(DNSServer.java:178) > at DNSServer.run(DNSServer.java:119) > > > What's going on here? It looks like the test didn't call `cleanupTest()` (and > therefore `server.stopServer()`) because the exception was thrown before it > had reached `runTest()`. I think we must address this, otherwise our > jtreg.SkippedException is only half measure. > > I thought about how to do that. I'm thinking of going the full way up to > TestBase as I believe it makes the most sense. Having said that, we'd better > ask the original author (cc'ed), Chris Yin, what he thinks about it. Here's > the proposal: The change to TestBase looks good to me. Just after try to understand what you want to achieve, some other thoughts in my mind: 1. Whether we give up too earlier when the Tcp port been used 2. Whether we put too many init related check logic in TcpDnsServer Here is just pseudo code for your reference. First, we may have a few tries on different ports by reinit DNSServer, if all failed, then give up and skip test. Second, move socket check and throw SkppedExpection logic to initTest method, that looks more clear and TcpDnsServer could simply focus on server logic. Third, explicit cleanup for each retry fails, no need to hide cleanup in TestBase @Override public void initTest(String[] args) { for (loop a few times as you prefer as max retries) { super.initTest(args); var udpServer = (Server) env().get(DNSTestUtils.TEST_DNS_SERVER_THREAD); try { var serverSocket = new ServerSocket(udpServer.getPort(), 0, InetAddress.getLoopbackAddress()); tcpDnsServer = new TcpDnsServer(serverSocket); break; } catch (BindException be) { DNSTestUtils.debug("Failed to bind server socket on port " + udpServer.getPort() + ", retry ..."); } catch (Exception ex) { throw new RuntimeException("Unexpected exception during initTest", ex); } finally { if (tcpDnsServer == null) { super.cleanupTest(); } } } if (tcpDnsServer == null) { throw new SkippedException("Cannot start TCP server after a few tries, skip test"); } } Thanks, Chris > > > diff -r 53e139e55605 test/jdk/com/sun/jndi/dns/lib/TestBase.java > --- a/test/jdk/com/sun/jndi/dns/lib/TestBase.java Thu Sep 05 14:59:29 > 2019 +0100 > +++ b/test/jdk/com/sun/jndi/dns/lib/TestBase.java Fri Sep 06 14:08:48 > 2019 +0100 > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2018, 2019, Oracle and/or its affiliates. All rights > reserved.
Re: RFR(s): 8228580: DnsClient TCP socket timeout
On Thu, 5 Sep 2019 at 18:59, Florian Weimer wrote: > > But I think in the UDP case, the client will retry. I think the total > timeout in the TCP case should equal the total timeout in the UDP case. > That's what I'm going to implement for glibc. The difference is that in > the TCP case, the TCP stack will take care of the retries, not the > application code. I understand that, and it does make sense, but we have to put it in context of how current DnsClient.java works: // // The UDP retry strategy is to try the 1st server, and then // each server in order. If no answer, double the timeout // and try each server again. // Fallback to TCP happens within this process. Going immediately with timeout*2^maxRetry could yield significantly larger delays, if there happens to be some other server on the list that works better. I would rather look into reusing TCP connections, not to close them immediately. What about read() and non-handshake TCP retransmissions? Do those usually happen faster? -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
Milan, In order to simulate the "Address already in use" scenario I did this: TcpDnsServer(int port) { try { * new ServerSocket(port, 0, InetAddress.getLoopbackAddress()); serverSocket = new ServerSocket(port, 0, InetAddress.getLoopbackAddress()); } catch (BindException ex) { The test failed, instead of being skipped. That's what I saw in the logs: jtreg.SkippedException: Cannot start TCP server at TcpTimeout$TcpDnsServer.(TcpTimeout.java:97) at TcpTimeout.initTest(TcpTimeout.java:71) at TestBase.run(TestBase.java:48) at TcpTimeout.main(TcpTimeout.java:47) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) at java.base/java.lang.Thread.run(Thread.java:830) Caused by: java.net.BindException: Address already in use: bind at java.base/sun.nio.ch.Net.bind0(Native Method) at java.base/sun.nio.ch.Net.bind(Net.java:469) at java.base/sun.nio.ch.Net.bind(Net.java:458) at java.base/sun.nio.ch.NioSocketImpl.bind(NioSocketImpl.java:643) at java.base/java.net.ServerSocket.bind(ServerSocket.java:355) at java.base/java.net.ServerSocket.(ServerSocket.java:241) at TcpTimeout$TcpDnsServer.(TcpTimeout.java:91) ... 9 more JavaTest Message: Test threw exception: jtreg.SkippedException JavaTest Message: shutting down test JavaTest Message: Problem cleaning up the following threads: Thread-0 at java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.socketReceiveOrPeekData(Native Method) at java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.peekData(DualStackPlainDatagramSocketImpl.java:113) at java.base@14-internal/java.net.DatagramSocket.receive(DatagramSocket.java:746) at DNSServer.receiveQuery(DNSServer.java:178) at DNSServer.run(DNSServer.java:119) What's going on here? It looks like the test didn't call `cleanupTest()` (and therefore `server.stopServer()`) because the exception was thrown before it had reached `runTest()`. I think we must address this, otherwise our jtreg.SkippedException is only half measure. I thought about how to do that. I'm thinking of going the full way up to TestBase as I believe it makes the most sense. Having said that, we'd better ask the original author (cc'ed), Chris Yin, what he thinks about it. Here's the proposal: diff -r 53e139e55605 test/jdk/com/sun/jndi/dns/lib/TestBase.java --- a/test/jdk/com/sun/jndi/dns/lib/TestBase.java Thu Sep 05 14:59:29 2019 +0100 +++ b/test/jdk/com/sun/jndi/dns/lib/TestBase.java Fri Sep 06 14:08:48 2019 +0100 @@ -1,5 +1,5 @@ /* - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2018, 2019, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it @@ -44,10 +44,14 @@ * @throws Exception if any exception */ public void run(String[] args) throws Exception { -initRes(); -initTest(args); -setupTest(); -launch(); +try { +initRes(); +initTest(args); +setupTest(); +launch(); +} finally { +cleanupTest(); +} } /** @@ -84,8 +88,6 @@ if (!handleException(e)) { throw e; } -} finally { -cleanupTest(); } } @@ -108,6 +110,11 @@ /** * Cleanup test. + * + * This method may be called by the test at any time, including before all + * the resources, initialization, and setup have been completed. This might + * require careful attention. For example, before closing a resource this + * method should check that resource for being {@code null}. */ public abstract void cleanupTest(); } -Pavel > On 5 Sep 2019, at 11:20, Milan Mimica wrote: > > On Wed, 4 Sep 2019 at 20:32, Florian Weimer wrote: >> >> If you use the initial UDP timeout (one second, I think), the kernel >> will not complete the TCP handshake if the initial SYN segment is lost >> because the retransmit delay during the handshake is longer than that. > > We could set a higher timeout value, but I would not prefer that. > After all, 1 second is just default value, and can be changed. If the > user wants us to give up on DNS query after the specified timeout then > lets do just that. True, some queries that previously worked might > start failing, b
Re: RFR(s): 8228580: DnsClient TCP socket timeout
* Milan Mimica: > On Wed, 4 Sep 2019 at 20:32, Florian Weimer wrote: >> >> If you use the initial UDP timeout (one second, I think), the kernel >> will not complete the TCP handshake if the initial SYN segment is lost >> because the retransmit delay during the handshake is longer than that. > > We could set a higher timeout value, but I would not prefer that. > After all, 1 second is just default value, and can be changed. If the > user wants us to give up on DNS query after the specified timeout then > lets do just that. But I think in the UDP case, the client will retry. I think the total timeout in the TCP case should equal the total timeout in the UDP case. That's what I'm going to implement for glibc. The difference is that in the TCP case, the TCP stack will take care of the retries, not the application code. Thanks, Florian
Re: RFR(s): 8228580: DnsClient TCP socket timeout
On Wed, 4 Sep 2019 at 20:32, Florian Weimer wrote: > > If you use the initial UDP timeout (one second, I think), the kernel > will not complete the TCP handshake if the initial SYN segment is lost > because the retransmit delay during the handshake is longer than that. We could set a higher timeout value, but I would not prefer that. After all, 1 second is just default value, and can be changed. If the user wants us to give up on DNS query after the specified timeout then lets do just that. True, some queries that previously worked might start failing, but that is true even for slow socket.read() operations. New webrev: http://cr.openjdk.java.net/~mmimica/8228580/webrev.01/ * simplified test with no thread (nice catch Pavel) * set connect timeout and account for it -- Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
I think we are almost there. What do you think of the following incremental (i.e. on top of your latest webrev) change? http://cr.openjdk.java.net/~prappo/8228580/webrev.01/ I fixed a couple of trivial typos and addressed the socket relinquishing issue. Initializing a socket is not an atomic "all-or-nothing" operation now. Someone needs to take care of the socket in case things go not as planned. -Pavel > On 5 Sep 2019, at 11:20, Milan Mimica wrote: > > On Wed, 4 Sep 2019 at 20:32, Florian Weimer wrote: >> >> If you use the initial UDP timeout (one second, I think), the kernel >> will not complete the TCP handshake if the initial SYN segment is lost >> because the retransmit delay during the handshake is longer than that. > > We could set a higher timeout value, but I would not prefer that. > After all, 1 second is just default value, and can be changed. If the > user wants us to give up on DNS query after the specified timeout then > lets do just that. True, some queries that previously worked might > start failing, but that is true even for slow socket.read() > operations. > > New webrev: http://cr.openjdk.java.net/~mmimica/8228580/webrev.01/ > > * simplified test with no thread (nice catch Pavel) > * set connect timeout and account for it > > -- > Milan Mimica
Re: RFR(s): 8228580: DnsClient TCP socket timeout
* Pavel Rappo: >> On 4 Sep 2019, at 18:54, Florian Weimer wrote: >> >> You should use a larger timeout than the initial UDP timeout, >> though. > > Could you elaborate on that? If you use the initial UDP timeout (one second, I think), the kernel will not complete the TCP handshake if the initial SYN segment is lost because the retransmit delay during the handshake is longer than that. Thanks, Florian
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 4 Sep 2019, at 18:54, Florian Weimer wrote: > > You should use a larger timeout than the initial UDP timeout, > though. Could you elaborate on that?
Re: RFR(s): 8228580: DnsClient TCP socket timeout
On 04/09/2019 18:52, Pavel Rappo wrote: You are right, it definitely is a blocking operation. I missed it. I was focused on 712 sock.setSoTimeout(timeoutLeft); So I'd suggest we use explicit connect with timeout java.net.Socket#connect(java.net.SocketAddress, int) Are you okay with that? Decomposing timeouts is hard to get right. In this case I think you want to specify the initial timeout to connect, then set timeoutLeft to be the remaining (will be less than "timeout"). -Alan
Re: RFR(s): 8228580: DnsClient TCP socket timeout
* Pavel Rappo: >> On 4 Sep 2019, at 18:38, Florian Weimer wrote: >> >> >> >> Maybe I'm mistaken, but I think this: >> >> 692 Tcp(InetAddress server, int port, int timeout) throws IOException { >> 693 sock = new Socket(server, port); >> 694 sock.setTcpNoDelay(true); >> 695 out = new java.io.BufferedOutputStream(sock.getOutputStream()); >> 696 in = new java.io.BufferedInputStream(sock.getInputStream()); >> 697 timeoutLeft = timeout; >> 698 } >> >> creates the TCP socket and connects it. This is a potentially blocking >> operation as well. > > You are right, it definitely is a blocking operation. I missed it. I was > focused on > > 712 sock.setSoTimeout(timeoutLeft); > > So I'd suggest we use explicit connect with timeout > > java.net.Socket#connect(java.net.SocketAddress, int) > > Are you okay with that? Sure. You should use a larger timeout than the initial UDP timeout, though. Can you compute the maximum amount of time the UDP code would wait for reply and use that? Or is that the timeoutLeft value? Thanks, Florian
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 4 Sep 2019, at 18:38, Florian Weimer wrote: > > > > Maybe I'm mistaken, but I think this: > > 692 Tcp(InetAddress server, int port, int timeout) throws IOException { > 693 sock = new Socket(server, port); > 694 sock.setTcpNoDelay(true); > 695 out = new java.io.BufferedOutputStream(sock.getOutputStream()); > 696 in = new java.io.BufferedInputStream(sock.getInputStream()); > 697 timeoutLeft = timeout; > 698 } > > creates the TCP socket and connects it. This is a potentially blocking > operation as well. You are right, it definitely is a blocking operation. I missed it. I was focused on 712 sock.setSoTimeout(timeoutLeft); So I'd suggest we use explicit connect with timeout java.net.Socket#connect(java.net.SocketAddress, int) Are you okay with that?
Re: RFR(s): 8228580: DnsClient TCP socket timeout
* Pavel Rappo: >> On 4 Sep 2019, at 17:35, Florian Weimer wrote: >> >> * Milan Mimica: >> >>> Continuing in a new thread with a RFR of my own: >>> http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/ >> >> I would add a comment why there's no explicit TCP connection timeout in >> the code. I assume you rely on the TCP stack having its own timeout, >> right? But I think it can be quite long under some circumstances. > > If I get this right, there's a default timeout of 1,000 ms (1 second) > on L70 > src/jdk.naming.dns/share/classes/com/sun/jndi/dns/DnsContext.java > which applies to the case where there's no explicit timeout. I agree > though that this deserves a comment. Maybe I'm mistaken, but I think this: 692 Tcp(InetAddress server, int port, int timeout) throws IOException { 693 sock = new Socket(server, port); 694 sock.setTcpNoDelay(true); 695 out = new java.io.BufferedOutputStream(sock.getOutputStream()); 696 in = new java.io.BufferedInputStream(sock.getInputStream()); 697 timeoutLeft = timeout; 698 } creates the TCP socket and connects it. This is a potentially blocking operation as well. It would make sense to align the timeout for that with what typical TCP stacks do for SYN segment retransmission. Different TCP stacks have different connection timeouts, ranging from 70 to 130 seconds or even longer. So the defaults are much larger than typical DNS timeouts. (Coincidentally, I'm working on fixing the glibc stub resolver, and I've decided to use a non-blocking connect there because of the long default timeouts.) I agree on the matter of clock warp issues. UDP needs fixing too, but not in this change. Thanks, Florian
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 4 Sep 2019, at 17:35, Florian Weimer wrote: > > * Milan Mimica: > >> Continuing in a new thread with a RFR of my own: >> http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/ > > I would add a comment why there's no explicit TCP connection timeout in > the code. I assume you rely on the TCP stack having its own timeout, > right? But I think it can be quite long under some circumstances. If I get this right, there's a default timeout of 1,000 ms (1 second) on L70 src/jdk.naming.dns/share/classes/com/sun/jndi/dns/DnsContext.java which applies to the case where there's no explicit timeout. I agree though that this deserves a comment. > The timeout will not be enforced properly if the clock jumps backwards. Here's the link to the previous discussion: https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-August/061918.html If you're talking about the use of System.currentTimeMillis, then I've already told Milan that we could address that later in a follow-up bug, together with the UDP case. -Pavel
Re: RFR(s): 8228580: DnsClient TCP socket timeout
> On 4 Sep 2019, at 13:26, Milan Mimica wrote: > > Hello > > Continuing in a new thread with a RFR of my own: > http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/ Here's the link to the previous discussion: https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-August/061918.html > It has the fixes applied from the first review of the patch. > > The test was sporadically failing not because of missing > synchronization, but more likely because test was not waiting for the > TCP server thread to actually start. That Thread.sleep + AtomicReference construct seems racy (the exception reading thread might miss it). I suggest using a synchronization aid (CountDownLatch, Phaser, Semaphore, etc.) for that. Or... since you don't do anything with the accepted socket anyway you might as well not bother with creating that thread. Once the ServerSocket has been created the connection can be established even without calling `accept`. This might simplify the code quite a lot. > Made it throw jtreg.SkippedException if TCP port is in use. We can > just hope it does not happen too often. We should probably pass the exception to the jtreg.SkippedException's constructor. Why is there timeout on L45? * @run main/timeout=60 TcpTimeout If you're optimizing for a case where the test fails, then please don't. -Pavel
Re: RFR(s): 8228580: DnsClient TCP socket timeout
* Milan Mimica: > Continuing in a new thread with a RFR of my own: > http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/ I would add a comment why there's no explicit TCP connection timeout in the code. I assume you rely on the TCP stack having its own timeout, right? But I think it can be quite long under some circumstances. The timeout will not be enforced properly if the clock jumps backwards. Thanks, Florian