Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-24 Thread Pavel Rappo
Milan,

I satisfied myself by running the final version of the test some 8k times and 
then pushed the change. Thanks for your patience and persistence.
I saw your question on the net-dev and nio-dev mailing lists. Thanks.

-Pavel

> On 24 Sep 2019, at 13:41, Milan Mimica  wrote:
> 
> Pavel,
> 
> Deal. Handling early returns too:
> http://cr.openjdk.java.net/~mmimica/8228580/webrev.05/
> I will ask there about socket timeout semantics.
> 
> On Tue, 24 Sep 2019 at 12:51, Pavel Rappo  wrote:
>> 
>> Milan,
>> 
>> Thanks for looking into this. I think you should ask a question on the 
>> expected timing semantics and guarantees on net-dev (with maybe a cc to 
>> nio-dev).
>> As for our test. I agree with you that we should simply work a possibility 
>> of early returns into the check.
>> 
>> ...
>> 
>> /* The acceptable variation of early returns from timed socket operations. */
>> private static final long PREMATURE_RETURN = adjustTimeout(100);
>> 
>> ...
>> 
>> long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos);
>> if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) {
>>String msg = String.format(
>>"elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s",
>>timeout, timeout, TOLERANCE, PREMATURE_RETURN);
>>throw new RuntimeException(msg);
>> }
>> 
>> ...
>> 
>> Thoughts?
>> 
>> -Pavel
>> 
>>> On 24 Sep 2019, at 09:12, Milan Mimica  wrote:
>>> 
>>> Hi Pavel
>>> 
>>> Wow, I find this awesome. I don't have a Windows machine to play with,
>>> but I think I may have found something.
>>> The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix
>>> it uses poll(2), on Windows it uses select(2). Regarding timeouts,
>>> poll() has "wait at least" semantics and overruns by design[1], while
>>> select() on windows has "waits at most" semantics, or how they put
>>> it[2]: "specifies the maximum time that select should wait before
>>> returning.". It returns early by design! Is this a known thing? I
>>> don't think there is much one can do here. It probably makes no sense
>>> to loop it and wait for time remainder.
>>> Java's soTimeout does not specify[3] should it wait at least or at
>>> most the specified timeout, so it's fine I guess. Old, "plain" socket
>>> impl are not much different.
>>> 
>>> If the above is correct, should I just add a tolerance for the lower bound?
>>> 
>>> [1] http://man7.org/linux/man-pages/man2/poll.2.html
>>> [2] 
>>> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
>>> [3] 
>>> https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int-
>>> 
>>> On Mon, 23 Sep 2019 at 16:15, Pavel Rappo  wrote:
 
 Milan,
 
 I'm observing the latest version (.04) of this test failing quite 
 frequently (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) 
 machines. The test passes fine on macOS and Linux. Here's the typical 
 output I see in the logs:
 
   java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
   java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
   java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
   java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
   ...
 
 Now, there might be many reasons for that. One of which would be that the 
 DnsClient code is buggy. The other reason would be that the accuracy 
 guaranteed by Windows implementation of `read` is not what we would 
 expect. Would you be able to investigate that?
 
 P.S. The good news is that the CSR has been approved:
 
   https://bugs.openjdk.java.net/browse/JDK-8230965
 
> On 23 Sep 2019, at 14:20, Milan Mimica  wrote:
> 
> Got it. Thanks Pavel!
> 
> 
> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo  wrote:
>> 
>> Milan,
>> 
>> How do you check which tests are run? That's what I see in the 
>> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
>>  file after I have run the test locally on my machine:
>> 
>> --messages:(5/233)--
>> command: main TcpTimeout
>> reason: User specified action: run main TcpTimeout
>> Mode: othervm
>> Additional options from @modules: --add-modules java.base --add-exports 
>> java.base/sun.security.util=ALL-UNNAMED
>> elapsed time (seconds): 1.751
>> 
>> ...
>> 
>> --messages:(5/313)--
>> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>> reason: User specified action: run main TcpTimeout 
>> -Dcom.sun.jndi.dns.timeout.initial=5000
>> Mode: othervm
>> Additional options from @modules: --add-modules java.base --add-exports 
>> java.base/sun.security.util=ALL-UNNAMED
>> elapsed time (seconds): 5.498
>> 
>> 
>> 

Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-24 Thread Chris Hegarty



> On 24 Sep 2019, at 13:41, Milan Mimica  wrote:
> 
> Pavel,
> 
> Deal. Handling early returns too:
> http://cr.openjdk.java.net/~mmimica/8228580/webrev.05/

LGTM

-Chris


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-24 Thread Milan Mimica
Pavel,

Deal. Handling early returns too:
http://cr.openjdk.java.net/~mmimica/8228580/webrev.05/
I will ask there about socket timeout semantics.

On Tue, 24 Sep 2019 at 12:51, Pavel Rappo  wrote:
>
> Milan,
>
> Thanks for looking into this. I think you should ask a question on the 
> expected timing semantics and guarantees on net-dev (with maybe a cc to 
> nio-dev).
> As for our test. I agree with you that we should simply work a possibility of 
> early returns into the check.
>
> ...
>
> /* The acceptable variation of early returns from timed socket operations. */
> private static final long PREMATURE_RETURN = adjustTimeout(100);
>
> ...
>
> long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos);
> if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) {
> String msg = String.format(
> "elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s",
> timeout, timeout, TOLERANCE, PREMATURE_RETURN);
> throw new RuntimeException(msg);
> }
>
> ...
>
> Thoughts?
>
> -Pavel
>
> > On 24 Sep 2019, at 09:12, Milan Mimica  wrote:
> >
> > Hi Pavel
> >
> > Wow, I find this awesome. I don't have a Windows machine to play with,
> > but I think I may have found something.
> > The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix
> > it uses poll(2), on Windows it uses select(2). Regarding timeouts,
> > poll() has "wait at least" semantics and overruns by design[1], while
> > select() on windows has "waits at most" semantics, or how they put
> > it[2]: "specifies the maximum time that select should wait before
> > returning.". It returns early by design! Is this a known thing? I
> > don't think there is much one can do here. It probably makes no sense
> > to loop it and wait for time remainder.
> > Java's soTimeout does not specify[3] should it wait at least or at
> > most the specified timeout, so it's fine I guess. Old, "plain" socket
> > impl are not much different.
> >
> > If the above is correct, should I just add a tolerance for the lower bound?
> >
> > [1] http://man7.org/linux/man-pages/man2/poll.2.html
> > [2] 
> > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
> > [3] 
> > https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int-
> >
> > On Mon, 23 Sep 2019 at 16:15, Pavel Rappo  wrote:
> >>
> >> Milan,
> >>
> >> I'm observing the latest version (.04) of this test failing quite 
> >> frequently (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) 
> >> machines. The test passes fine on macOS and Linux. Here's the typical 
> >> output I see in the logs:
> >>
> >>java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
> >>java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
> >>java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
> >>java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
> >>...
> >>
> >> Now, there might be many reasons for that. One of which would be that the 
> >> DnsClient code is buggy. The other reason would be that the accuracy 
> >> guaranteed by Windows implementation of `read` is not what we would 
> >> expect. Would you be able to investigate that?
> >>
> >> P.S. The good news is that the CSR has been approved:
> >>
> >>https://bugs.openjdk.java.net/browse/JDK-8230965
> >>
> >>> On 23 Sep 2019, at 14:20, Milan Mimica  wrote:
> >>>
> >>> Got it. Thanks Pavel!
> >>>
> >>>
> >>> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo  wrote:
> 
>  Milan,
> 
>  How do you check which tests are run? That's what I see in the 
>  /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
>   file after I have run the test locally on my machine:
> 
>  --messages:(5/233)--
>  command: main TcpTimeout
>  reason: User specified action: run main TcpTimeout
>  Mode: othervm
>  Additional options from @modules: --add-modules java.base --add-exports 
>  java.base/sun.security.util=ALL-UNNAMED
>  elapsed time (seconds): 1.751
> 
>  ...
> 
>  --messages:(5/313)--
>  command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>  reason: User specified action: run main TcpTimeout 
>  -Dcom.sun.jndi.dns.timeout.initial=5000
>  Mode: othervm
>  Additional options from @modules: --add-modules java.base --add-exports 
>  java.base/sun.security.util=ALL-UNNAMED
>  elapsed time (seconds): 5.498
> 
>  
> 
>  Which is consistent with what I would expect given the timeout values.
> 
>  The following output does not tell the full story, just the name of the 
>  test:
> 
>  ==
>  Test summary
>  ==
>   TEST  TOTAL  PASS  

Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-24 Thread Pavel Rappo
Milan,

Thanks for looking into this. I think you should ask a question on the expected 
timing semantics and guarantees on net-dev (with maybe a cc to nio-dev).
As for our test. I agree with you that we should simply work a possibility of 
early returns into the check.

...

/* The acceptable variation of early returns from timed socket operations. */
private static final long PREMATURE_RETURN = adjustTimeout(100);

...

long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos);
if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) {
String msg = String.format(
"elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s",
timeout, timeout, TOLERANCE, PREMATURE_RETURN);
throw new RuntimeException(msg);
}

...

Thoughts?

-Pavel

> On 24 Sep 2019, at 09:12, Milan Mimica  wrote:
> 
> Hi Pavel
> 
> Wow, I find this awesome. I don't have a Windows machine to play with,
> but I think I may have found something.
> The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix
> it uses poll(2), on Windows it uses select(2). Regarding timeouts,
> poll() has "wait at least" semantics and overruns by design[1], while
> select() on windows has "waits at most" semantics, or how they put
> it[2]: "specifies the maximum time that select should wait before
> returning.". It returns early by design! Is this a known thing? I
> don't think there is much one can do here. It probably makes no sense
> to loop it and wait for time remainder.
> Java's soTimeout does not specify[3] should it wait at least or at
> most the specified timeout, so it's fine I guess. Old, "plain" socket
> impl are not much different.
> 
> If the above is correct, should I just add a tolerance for the lower bound?
> 
> [1] http://man7.org/linux/man-pages/man2/poll.2.html
> [2] 
> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
> [3] 
> https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int-
> 
> On Mon, 23 Sep 2019 at 16:15, Pavel Rappo  wrote:
>> 
>> Milan,
>> 
>> I'm observing the latest version (.04) of this test failing quite frequently 
>> (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test 
>> passes fine on macOS and Linux. Here's the typical output I see in the logs:
>> 
>>java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
>>java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
>>java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
>>java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
>>...
>> 
>> Now, there might be many reasons for that. One of which would be that the 
>> DnsClient code is buggy. The other reason would be that the accuracy 
>> guaranteed by Windows implementation of `read` is not what we would expect. 
>> Would you be able to investigate that?
>> 
>> P.S. The good news is that the CSR has been approved:
>> 
>>https://bugs.openjdk.java.net/browse/JDK-8230965
>> 
>>> On 23 Sep 2019, at 14:20, Milan Mimica  wrote:
>>> 
>>> Got it. Thanks Pavel!
>>> 
>>> 
>>> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo  wrote:
 
 Milan,
 
 How do you check which tests are run? That's what I see in the 
 /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
  file after I have run the test locally on my machine:
 
 --messages:(5/233)--
 command: main TcpTimeout
 reason: User specified action: run main TcpTimeout
 Mode: othervm
 Additional options from @modules: --add-modules java.base --add-exports 
 java.base/sun.security.util=ALL-UNNAMED
 elapsed time (seconds): 1.751
 
 ...
 
 --messages:(5/313)--
 command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
 reason: User specified action: run main TcpTimeout 
 -Dcom.sun.jndi.dns.timeout.initial=5000
 Mode: othervm
 Additional options from @modules: --add-modules java.base --add-exports 
 java.base/sun.security.util=ALL-UNNAMED
 elapsed time (seconds): 5.498
 
 
 
 Which is consistent with what I would expect given the timeout values.
 
 The following output does not tell the full story, just the name of the 
 test:
 
 ==
 Test summary
 ==
  TEST  TOTAL  PASS  FAIL ERROR
  jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
1 1 0 0
 ==
 TEST SUCCESS
 
 -Pavel
 
> On 20 Sep 2019, at 15:42, Milan Mimica  wrote:
> 
> Pavel,
> 
> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
> I don't see the 

Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-24 Thread Milan Mimica
Hi Pavel

Wow, I find this awesome. I don't have a Windows machine to play with,
but I think I may have found something.
The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix
it uses poll(2), on Windows it uses select(2). Regarding timeouts,
poll() has "wait at least" semantics and overruns by design[1], while
select() on windows has "waits at most" semantics, or how they put
it[2]: "specifies the maximum time that select should wait before
returning.". It returns early by design! Is this a known thing? I
don't think there is much one can do here. It probably makes no sense
to loop it and wait for time remainder.
Java's soTimeout does not specify[3] should it wait at least or at
most the specified timeout, so it's fine I guess. Old, "plain" socket
impl are not much different.

If the above is correct, should I just add a tolerance for the lower bound?

[1] http://man7.org/linux/man-pages/man2/poll.2.html
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
[3] 
https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int-

On Mon, 23 Sep 2019 at 16:15, Pavel Rappo  wrote:
>
> Milan,
>
> I'm observing the latest version (.04) of this test failing quite frequently 
> (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test 
> passes fine on macOS and Linux. Here's the typical output I see in the logs:
>
> java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
> java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
> java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
> java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
> ...
>
> Now, there might be many reasons for that. One of which would be that the 
> DnsClient code is buggy. The other reason would be that the accuracy 
> guaranteed by Windows implementation of `read` is not what we would expect. 
> Would you be able to investigate that?
>
> P.S. The good news is that the CSR has been approved:
>
> https://bugs.openjdk.java.net/browse/JDK-8230965
>
> > On 23 Sep 2019, at 14:20, Milan Mimica  wrote:
> >
> > Got it. Thanks Pavel!
> >
> >
> > On Mon, 23 Sep 2019 at 13:37, Pavel Rappo  wrote:
> >>
> >> Milan,
> >>
> >> How do you check which tests are run? That's what I see in the 
> >> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
> >>  file after I have run the test locally on my machine:
> >>
> >> --messages:(5/233)--
> >> command: main TcpTimeout
> >> reason: User specified action: run main TcpTimeout
> >> Mode: othervm
> >> Additional options from @modules: --add-modules java.base --add-exports 
> >> java.base/sun.security.util=ALL-UNNAMED
> >> elapsed time (seconds): 1.751
> >>
> >> ...
> >>
> >> --messages:(5/313)--
> >> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
> >> reason: User specified action: run main TcpTimeout 
> >> -Dcom.sun.jndi.dns.timeout.initial=5000
> >> Mode: othervm
> >> Additional options from @modules: --add-modules java.base --add-exports 
> >> java.base/sun.security.util=ALL-UNNAMED
> >> elapsed time (seconds): 5.498
> >>
> >> 
> >>
> >> Which is consistent with what I would expect given the timeout values.
> >>
> >> The following output does not tell the full story, just the name of the 
> >> test:
> >>
> >> ==
> >> Test summary
> >> ==
> >>   TEST  TOTAL  PASS  FAIL ERROR
> >>   jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
> >> 1 1 0 0
> >> ==
> >> TEST SUCCESS
> >>
> >> -Pavel
> >>
> >>> On 20 Sep 2019, at 15:42, Milan Mimica  wrote:
> >>>
> >>> Pavel,
> >>>
> >>> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
> >>> I don't see the test is run twice when I execute "make test
> >>> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
> >>> I missing something?
> >>>
> >
> >
> > --
> > Milan Mimica
>


-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-23 Thread Pavel Rappo
Milan,

I'm observing the latest version (.04) of this test failing quite frequently 
(4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test 
passes fine on macOS and Linux. Here's the typical output I see in the logs:

java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
...

Now, there might be many reasons for that. One of which would be that the 
DnsClient code is buggy. The other reason would be that the accuracy guaranteed 
by Windows implementation of `read` is not what we would expect. Would you be 
able to investigate that?

P.S. The good news is that the CSR has been approved:

https://bugs.openjdk.java.net/browse/JDK-8230965

> On 23 Sep 2019, at 14:20, Milan Mimica  wrote:
> 
> Got it. Thanks Pavel!
> 
> 
> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo  wrote:
>> 
>> Milan,
>> 
>> How do you check which tests are run? That's what I see in the 
>> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
>>  file after I have run the test locally on my machine:
>> 
>> --messages:(5/233)--
>> command: main TcpTimeout
>> reason: User specified action: run main TcpTimeout
>> Mode: othervm
>> Additional options from @modules: --add-modules java.base --add-exports 
>> java.base/sun.security.util=ALL-UNNAMED
>> elapsed time (seconds): 1.751
>> 
>> ...
>> 
>> --messages:(5/313)--
>> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>> reason: User specified action: run main TcpTimeout 
>> -Dcom.sun.jndi.dns.timeout.initial=5000
>> Mode: othervm
>> Additional options from @modules: --add-modules java.base --add-exports 
>> java.base/sun.security.util=ALL-UNNAMED
>> elapsed time (seconds): 5.498
>> 
>> 
>> 
>> Which is consistent with what I would expect given the timeout values.
>> 
>> The following output does not tell the full story, just the name of the test:
>> 
>> ==
>> Test summary
>> ==
>>   TEST  TOTAL  PASS  FAIL ERROR
>>   jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
>> 1 1 0 0
>> ==
>> TEST SUCCESS
>> 
>> -Pavel
>> 
>>> On 20 Sep 2019, at 15:42, Milan Mimica  wrote:
>>> 
>>> Pavel,
>>> 
>>> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
>>> I don't see the test is run twice when I execute "make test
>>> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
>>> I missing something?
>>> 
> 
> 
> -- 
> Milan Mimica



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-23 Thread Milan Mimica
Got it. Thanks Pavel!


On Mon, 23 Sep 2019 at 13:37, Pavel Rappo  wrote:
>
> Milan,
>
> How do you check which tests are run? That's what I see in the 
> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
>  file after I have run the test locally on my machine:
>
> --messages:(5/233)--
> command: main TcpTimeout
> reason: User specified action: run main TcpTimeout
> Mode: othervm
> Additional options from @modules: --add-modules java.base --add-exports 
> java.base/sun.security.util=ALL-UNNAMED
> elapsed time (seconds): 1.751
>
> ...
>
> --messages:(5/313)--
> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
> reason: User specified action: run main TcpTimeout 
> -Dcom.sun.jndi.dns.timeout.initial=5000
> Mode: othervm
> Additional options from @modules: --add-modules java.base --add-exports 
> java.base/sun.security.util=ALL-UNNAMED
> elapsed time (seconds): 5.498
>
> 
>
> Which is consistent with what I would expect given the timeout values.
>
> The following output does not tell the full story, just the name of the test:
>
> ==
> Test summary
> ==
>TEST  TOTAL  PASS  FAIL ERROR
>jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
>  1 1 0 0
> ==
> TEST SUCCESS
>
> -Pavel
>
> > On 20 Sep 2019, at 15:42, Milan Mimica  wrote:
> >
> > Pavel,
> >
> > Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
> > I don't see the test is run twice when I execute "make test
> > TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
> > I missing something?
> >


-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-23 Thread Pavel Rappo
Milan,

How do you check which tests are run? That's what I see in the 
/test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
 file after I have run the test locally on my machine:

--messages:(5/233)--
command: main TcpTimeout
reason: User specified action: run main TcpTimeout 
Mode: othervm
Additional options from @modules: --add-modules java.base --add-exports 
java.base/sun.security.util=ALL-UNNAMED
elapsed time (seconds): 1.751

...

--messages:(5/313)--
command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
reason: User specified action: run main TcpTimeout 
-Dcom.sun.jndi.dns.timeout.initial=5000 
Mode: othervm
Additional options from @modules: --add-modules java.base --add-exports 
java.base/sun.security.util=ALL-UNNAMED
elapsed time (seconds): 5.498



Which is consistent with what I would expect given the timeout values. 

The following output does not tell the full story, just the name of the test:

==
Test summary
==
   TEST  TOTAL  PASS  FAIL ERROR   
   jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
 1 1 0 0   
==
TEST SUCCESS

-Pavel

> On 20 Sep 2019, at 15:42, Milan Mimica  wrote:
> 
> Pavel,
> 
> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
> I don't see the test is run twice when I execute "make test
> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
> I missing something?
> 
> On Thu, 19 Sep 2019 at 13:16, Pavel Rappo  wrote:
>> 
>> Milan,
>> 
>> This looks like a good start. Please consider
>> 
>> 1. Setting TOLERANCE to 5 seconds
>> 2. Getting the second time mark immediately after the query returns (i.e. do 
>> not waste your time in DNSTestUtils.debug(attrs))
>> 3. Making the test parametric instead of hardcoded for the 
>> DEFAULT_DNS_CLIENT_TIMEOUT
>> 4. Running the test for at least 2 different values of the timeout, e.g.:
>> 
>>* @run main TcpTimeout
>>* @run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>> 
>> As for you question, I'm not sure how we would be able to communicate the 
>> fact that the response is truncated to the user. You could try to ask this 
>> on the net-dev mailing list.
>> 
>> -Pavel
>> 
>>> On 18 Sep 2019, at 14:25, Milan Mimica  wrote:
>>> 
>>> Hi Pavel
>>> 
>>> Sure. Here is the incremental change:
>>> http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/
>>> 
>>> What actually bothers me from the beginning is the truncated response.
>>> The TXT attribute, a String, prints "A very popular h", but does not
>>> equals("A very popular h"), because of some stray bytes. I guess it's
>>> because of how DNS response parsing works. I can imagine how this
>>> could cause problems to users. I think, at least, we should have a way
>>> to tell the user that the response is truncated, and the payload is
>>> partial/invalid.
>>> 
>>> 
>>> On Tue, 17 Sep 2019 at 15:09, Pavel Rappo  wrote:
 
 Milan,
 
 While the CSR is being processed, could we maybe think of some additional 
 testing for that change? Otherwise, that test seems kind of anemic. It 
 makes sure that the query doesn't hang, but that's about it. It doesn't 
 check that the timeout is respected. I was wondering if you could propose 
 some way of testing that.
 
> On 17 Sep 2019, at 09:55, Pavel Rappo  wrote:
> 
> I have filed the CSR:
> 
>  https://bugs.openjdk.java.net/browse/JDK-8230965
> 
>> On 13 Sep 2019, at 11:21, Pavel Rappo  wrote:
>> 
>> Here's the latest webrev accumulating all the changes we've discussed so 
>> far:
>> 
>> http://cr.openjdk.java.net/~prappo/8228580/webrev.03/
>> 
>> If people are okay with that I will proceed to creating a CSR.
 
>>> 
>>> 
>>> --
>>> Milan Mimica
>> 
> 
> 
> -- 
> Milan Mimica



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-20 Thread Milan Mimica
Pavel,

Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
I don't see the test is run twice when I execute "make test
TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
I missing something?

On Thu, 19 Sep 2019 at 13:16, Pavel Rappo  wrote:
>
> Milan,
>
> This looks like a good start. Please consider
>
> 1. Setting TOLERANCE to 5 seconds
> 2. Getting the second time mark immediately after the query returns (i.e. do 
> not waste your time in DNSTestUtils.debug(attrs))
> 3. Making the test parametric instead of hardcoded for the 
> DEFAULT_DNS_CLIENT_TIMEOUT
> 4. Running the test for at least 2 different values of the timeout, e.g.:
>
> * @run main TcpTimeout
> * @run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>
> As for you question, I'm not sure how we would be able to communicate the 
> fact that the response is truncated to the user. You could try to ask this on 
> the net-dev mailing list.
>
> -Pavel
>
> > On 18 Sep 2019, at 14:25, Milan Mimica  wrote:
> >
> > Hi Pavel
> >
> > Sure. Here is the incremental change:
> > http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/
> >
> > What actually bothers me from the beginning is the truncated response.
> > The TXT attribute, a String, prints "A very popular h", but does not
> > equals("A very popular h"), because of some stray bytes. I guess it's
> > because of how DNS response parsing works. I can imagine how this
> > could cause problems to users. I think, at least, we should have a way
> > to tell the user that the response is truncated, and the payload is
> > partial/invalid.
> >
> >
> > On Tue, 17 Sep 2019 at 15:09, Pavel Rappo  wrote:
> >>
> >> Milan,
> >>
> >> While the CSR is being processed, could we maybe think of some additional 
> >> testing for that change? Otherwise, that test seems kind of anemic. It 
> >> makes sure that the query doesn't hang, but that's about it. It doesn't 
> >> check that the timeout is respected. I was wondering if you could propose 
> >> some way of testing that.
> >>
> >>> On 17 Sep 2019, at 09:55, Pavel Rappo  wrote:
> >>>
> >>> I have filed the CSR:
> >>>
> >>>   https://bugs.openjdk.java.net/browse/JDK-8230965
> >>>
>  On 13 Sep 2019, at 11:21, Pavel Rappo  wrote:
> 
>  Here's the latest webrev accumulating all the changes we've discussed so 
>  far:
> 
>   http://cr.openjdk.java.net/~prappo/8228580/webrev.03/
> 
>  If people are okay with that I will proceed to creating a CSR.
> >>
> >
> >
> > --
> > Milan Mimica
>


-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-19 Thread Pavel Rappo
Milan,

This looks like a good start. Please consider

1. Setting TOLERANCE to 5 seconds
2. Getting the second time mark immediately after the query returns (i.e. do 
not waste your time in DNSTestUtils.debug(attrs))
3. Making the test parametric instead of hardcoded for the 
DEFAULT_DNS_CLIENT_TIMEOUT
4. Running the test for at least 2 different values of the timeout, e.g.:

* @run main TcpTimeout
* @run main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000

As for you question, I'm not sure how we would be able to communicate the fact 
that the response is truncated to the user. You could try to ask this on the 
net-dev mailing list.

-Pavel

> On 18 Sep 2019, at 14:25, Milan Mimica  wrote:
> 
> Hi Pavel
> 
> Sure. Here is the incremental change:
> http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/
> 
> What actually bothers me from the beginning is the truncated response.
> The TXT attribute, a String, prints "A very popular h", but does not
> equals("A very popular h"), because of some stray bytes. I guess it's
> because of how DNS response parsing works. I can imagine how this
> could cause problems to users. I think, at least, we should have a way
> to tell the user that the response is truncated, and the payload is
> partial/invalid.
> 
> 
> On Tue, 17 Sep 2019 at 15:09, Pavel Rappo  wrote:
>> 
>> Milan,
>> 
>> While the CSR is being processed, could we maybe think of some additional 
>> testing for that change? Otherwise, that test seems kind of anemic. It makes 
>> sure that the query doesn't hang, but that's about it. It doesn't check that 
>> the timeout is respected. I was wondering if you could propose some way of 
>> testing that.
>> 
>>> On 17 Sep 2019, at 09:55, Pavel Rappo  wrote:
>>> 
>>> I have filed the CSR:
>>> 
>>>   https://bugs.openjdk.java.net/browse/JDK-8230965
>>> 
 On 13 Sep 2019, at 11:21, Pavel Rappo  wrote:
 
 Here's the latest webrev accumulating all the changes we've discussed so 
 far:
 
  http://cr.openjdk.java.net/~prappo/8228580/webrev.03/
 
 If people are okay with that I will proceed to creating a CSR.
>> 
> 
> 
> -- 
> Milan Mimica



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-18 Thread Milan Mimica
Hi Pavel

Sure. Here is the incremental change:
http://cr.openjdk.java.net/~mmimica/8228580/webrev.03/

What actually bothers me from the beginning is the truncated response.
The TXT attribute, a String, prints "A very popular h", but does not
equals("A very popular h"), because of some stray bytes. I guess it's
because of how DNS response parsing works. I can imagine how this
could cause problems to users. I think, at least, we should have a way
to tell the user that the response is truncated, and the payload is
partial/invalid.


On Tue, 17 Sep 2019 at 15:09, Pavel Rappo  wrote:
>
> Milan,
>
> While the CSR is being processed, could we maybe think of some additional 
> testing for that change? Otherwise, that test seems kind of anemic. It makes 
> sure that the query doesn't hang, but that's about it. It doesn't check that 
> the timeout is respected. I was wondering if you could propose some way of 
> testing that.
>
> > On 17 Sep 2019, at 09:55, Pavel Rappo  wrote:
> >
> > I have filed the CSR:
> >
> >https://bugs.openjdk.java.net/browse/JDK-8230965
> >
> >> On 13 Sep 2019, at 11:21, Pavel Rappo  wrote:
> >>
> >> Here's the latest webrev accumulating all the changes we've discussed so 
> >> far:
> >>
> >>   http://cr.openjdk.java.net/~prappo/8228580/webrev.03/
> >>
> >> If people are okay with that I will proceed to creating a CSR.
>


-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-17 Thread Pavel Rappo
Milan,

While the CSR is being processed, could we maybe think of some additional 
testing for that change? Otherwise, that test seems kind of anemic. It makes 
sure that the query doesn't hang, but that's about it. It doesn't check that 
the timeout is respected. I was wondering if you could propose some way of 
testing that.

> On 17 Sep 2019, at 09:55, Pavel Rappo  wrote:
> 
> I have filed the CSR:
> 
>https://bugs.openjdk.java.net/browse/JDK-8230965
> 
>> On 13 Sep 2019, at 11:21, Pavel Rappo  wrote:
>> 
>> Here's the latest webrev accumulating all the changes we've discussed so far:
>> 
>>   http://cr.openjdk.java.net/~prappo/8228580/webrev.03/
>> 
>> If people are okay with that I will proceed to creating a CSR.



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-17 Thread Pavel Rappo
I have filed the CSR:

https://bugs.openjdk.java.net/browse/JDK-8230965

> On 13 Sep 2019, at 11:21, Pavel Rappo  wrote:
> 
> Here's the latest webrev accumulating all the changes we've discussed so far:
> 
>http://cr.openjdk.java.net/~prappo/8228580/webrev.03/
> 
> If people are okay with that I will proceed to creating a CSR.
> 
>> On 12 Sep 2019, at 20:06, Alan Bateman  wrote:
>> 
>> On 12/09/2019 12:41, Chris Hegarty wrote:
>>> Here is an initial attempt to document these timeout/retry properties. It’s 
>>> effectively the wording lifted from the tech notes [1], with “UDP” removed.
>> Thanks, this look and dropping the reference to "UDP" make sense.
>> 
>> One minor nit is that "are relevant to the DNS provider" might be better 
>> worded as "may be used when creating the initial context". This will fit 
>> nicely if/when someone adds a description to the @provides tag. At some 
>> point I hope the other properties can be used too but that is beyond the 
>> scope of JDK-8228580.
>> 
>> -Alan
> 



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-13 Thread Pavel Rappo
Here's the latest webrev accumulating all the changes we've discussed so far:

http://cr.openjdk.java.net/~prappo/8228580/webrev.03/

If people are okay with that I will proceed to creating a CSR.

> On 12 Sep 2019, at 20:06, Alan Bateman  wrote:
> 
> On 12/09/2019 12:41, Chris Hegarty wrote:
>> Here is an initial attempt to document these timeout/retry properties. It’s 
>> effectively the wording lifted from the tech notes [1], with “UDP” removed.
> Thanks, this look and dropping the reference to "UDP" make sense.
> 
> One minor nit is that "are relevant to the DNS provider" might be better 
> worded as "may be used when creating the initial context". This will fit 
> nicely if/when someone adds a description to the @provides tag. At some point 
> I hope the other properties can be used too but that is beyond the scope of 
> JDK-8228580.
> 
> -Alan



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-12 Thread Alan Bateman

On 12/09/2019 12:41, Chris Hegarty wrote:
Here is an initial attempt to document these timeout/retry properties. 
It’s effectively the wording lifted from the tech notes [1], with 
“UDP” removed.

Thanks, this look and dropping the reference to "UDP" make sense.

One minor nit is that "are relevant to the DNSprovider" might be better 
worded as "may be used when creating the initial context". This will fit 
nicely if/when someone adds a description to the @provides tag. At some 
point I hope the other properties can be used too but that is beyond the 
scope of JDK-8228580.


-Alan


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-12 Thread Pavel Rappo
I like the wording, it's concise. Thanks a lot for helping us with this, Chris!

Trivially, there's a typo: <2> instead of .

> On 12 Sep 2019, at 12:41, Chris Hegarty  wrote:
> 
> Here is an initial attempt to document these timeout/retry properties. It’s 
> effectively the wording lifted from the tech notes [1], with “UDP” removed.
> 
> I deliberately avoided describing the implementation. It serves little 
> purpose, and in fact will greatly complicate the description, as well as 
> constrain the implementation. I personally dislike parts of the existing 
> implementation ( which I will ignore ), so baking them into the spec would be 
> a mistake.
> 
> There are clearly a lot more properties that could be specified, that is far 
> far beyond the scope of this change.
> 
> If we add such documentation, then a CSR ( with JDK-scope ) will be needed.
> 
> --- a/src/jdk.naming.dns/share/classes/module-info.java
> +++ b/src/jdk.naming.dns/share/classes/module-info.java
> @@ -26,7 +26,38 @@
>  /**
>   * Provides the implementation of the DNS Java Naming provider.
>   *
> + * <2>Environment Properties
> + *
> + *  The following JNDI environment properties are relevant to the DNS
> + * provider.
> + *
> + * 
> + *com.sun.jndi.dns.timeout.initial
> + *com.sun.jndi.dns.timeout.retries
> + *  
> + *
> + *  These properties are used to alter the timeout-related defaults that 
> the
> + * DNS provider uses when submitting queries. The DNS provider submits 
> queries
> + * using the following exponential backoff algorithm. The provider submits a
> + * query to a DNS server and waits for a response to arrive within a timeout
> + * period (1 second by default). If it receives no response within the 
> timeout
> + * period, it queries the next server, and so on. If the provider receives no
> + * response from any server, it doubles the timeout period and repeats the
> + * process of submitting the query to each server, up to a maximum number of
> + * retries (4 by default).
> + *
> + *  The {@code com.sun.jndi.dns.timeout.initial} property, if set, 
> specifies
> + * the number of milliseconds to use as the initial timeout period (i.e., 
> before
> + * any doubling). If this property has not been set, the default initial 
> timeout
> + * is 1000 milliseconds.
> + *
> + *  The {@code com.sun.jndi.dns.timeout.retries} property, if set, 
> specifies
> + * the number of times to retry each server using the exponential backoff
> + * algorithm described previously. If this property has not been set, the
> + * default number of retries is 4.
> + *
>   * @provides javax.naming.spi.InitialContextFactory
> + *
>   * @moduleGraph
>   * @since 9
>   */
> 
> -Chris.
> 
> [1] https://docs.oracle.com/javase/6/docs/technotes/guides/jndi/jndi-dns.html
> 
>> On 11 Sep 2019, at 16:55, Alan Bateman  wrote:
>> 
>> On 11/09/2019 16:16, Pavel Rappo wrote:
>>> 
 On 11 Sep 2019, at 16:10, Alan Bateman  wrote:
 
 I assume extending the timeout to TCP will require at least some minimal 
 updates to the descriptions.
>>> Which descriptions do you have in mind? I cannot seem to find that text 
>>> anywhere in the current repo (jdk/jdk)
>> I don't think the jndi-dns docs page is in the jdk repo. Since JDK 9, a good 
>> place for service providers to document capability and configuration is 
>> their module description and I think at least some of the documentation from 
>> that page should move into the jdk.naming.dns module description (that's for 
>> another issue of course). However, for the changes under discussion here 
>> then I don't think it's unreasonable to provide an updated description of 
>> the timeout property.
>> 
>> -Alan
>> 
> 



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-12 Thread Chris Hegarty
Here is an initial attempt to document these timeout/retry properties. It’s 
effectively the wording lifted from the tech notes [1], with “UDP” removed.

I deliberately avoided describing the implementation. It serves little purpose, 
and in fact will greatly complicate the description, as well as constrain the 
implementation. I personally dislike parts of the existing implementation ( 
which I will ignore ), so baking them into the spec would be a mistake.

There are clearly a lot more properties that could be specified, that is far 
far beyond the scope of this change.

If we add such documentation, then a CSR ( with JDK-scope ) will be needed.

--- a/src/jdk.naming.dns/share/classes/module-info.java
+++ b/src/jdk.naming.dns/share/classes/module-info.java
@@ -26,7 +26,38 @@
 /**
  * Provides the implementation of the DNS Java Naming provider.
  *
+ * <2>Environment Properties
+ *
+ *  The following JNDI environment properties are relevant to the DNS
+ * provider.
+ *
+ * 
+ *com.sun.jndi.dns.timeout.initial
+ *com.sun.jndi.dns.timeout.retries
+ *  
+ *
+ *  These properties are used to alter the timeout-related defaults that the
+ * DNS provider uses when submitting queries. The DNS provider submits queries
+ * using the following exponential backoff algorithm. The provider submits a
+ * query to a DNS server and waits for a response to arrive within a timeout
+ * period (1 second by default). If it receives no response within the timeout
+ * period, it queries the next server, and so on. If the provider receives no
+ * response from any server, it doubles the timeout period and repeats the
+ * process of submitting the query to each server, up to a maximum number of
+ * retries (4 by default).
+ *
+ *  The {@code com.sun.jndi.dns.timeout.initial} property, if set, specifies
+ * the number of milliseconds to use as the initial timeout period (i.e., 
before
+ * any doubling). If this property has not been set, the default initial 
timeout
+ * is 1000 milliseconds.
+ *
+ *  The {@code com.sun.jndi.dns.timeout.retries} property, if set, specifies
+ * the number of times to retry each server using the exponential backoff
+ * algorithm described previously. If this property has not been set, the
+ * default number of retries is 4.
+ *
  * @provides javax.naming.spi.InitialContextFactory
+ *
  * @moduleGraph
  * @since 9
  */

-Chris.

[1] https://docs.oracle.com/javase/6/docs/technotes/guides/jndi/jndi-dns.html

> On 11 Sep 2019, at 16:55, Alan Bateman  wrote:
> 
> On 11/09/2019 16:16, Pavel Rappo wrote:
>> 
>>> On 11 Sep 2019, at 16:10, Alan Bateman  wrote:
>>> 
>>> I assume extending the timeout to TCP will require at least some minimal 
>>> updates to the descriptions.
>> Which descriptions do you have in mind? I cannot seem to find that text 
>> anywhere in the current repo (jdk/jdk)
> I don't think the jndi-dns docs page is in the jdk repo. Since JDK 9, a good 
> place for service providers to document capability and configuration is their 
> module description and I think at least some of the documentation from that 
> page should move into the jdk.naming.dns module description (that's for 
> another issue of course). However, for the changes under discussion here then 
> I don't think it's unreasonable to provide an updated description of the 
> timeout property.
> 
> -Alan
> 



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-11 Thread Pavel Rappo
> On 11 Sep 2019, at 16:55, Alan Bateman  wrote:
> 
> I don't think the jndi-dns docs page is in the jdk repo.

Could you kindly point me to the location of the current jndi-dns docs page? I 
can't seem to be able to even find anything related to that on the web. You 
might want to try to use your favorite search engine.

> 
> 
> However, for the changes under discussion here then I don't think it's 
> unreasonable to provide an updated description of the timeout property.

It might be reasonable. What wording would you suggest? To me it already 
*vaguely* addresses the TCP case by saying that "Each lookup is initially 
performed using UDP. If the response is too long to be returned in a UDP packet 
without being truncated, the lookup is repeated using TCP."

-Pavel



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-11 Thread Alan Bateman

On 11/09/2019 16:16, Pavel Rappo wrote:



On 11 Sep 2019, at 16:10, Alan Bateman  wrote:

I assume extending the timeout to TCP will require at least some minimal 
updates to the descriptions.

Which descriptions do you have in mind? I cannot seem to find that text 
anywhere in the current repo (jdk/jdk)
I don't think the jndi-dns docs page is in the jdk repo. Since JDK 9, a 
good place for service providers to document capability and 
configuration is their module description and I think at least some of 
the documentation from that page should move into the jdk.naming.dns 
module description (that's for another issue of course). However, for 
the changes under discussion here then I don't think it's unreasonable 
to provide an updated description of the timeout property.


-Alan



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-11 Thread Pavel Rappo



> On 11 Sep 2019, at 16:10, Alan Bateman  wrote:
> 
> I assume extending the timeout to TCP will require at least some minimal 
> updates to the descriptions. 

Which descriptions do you have in mind? I cannot seem to find that text 
anywhere in the current repo (jdk/jdk)

$ hg tip
changeset:   56237:cddef3bde924
tag: tip
user:lkorinth
date:Wed Sep 11 14:16:30 2019 +0200
summary: 8230398: Remove NULL checks before FREE_C_HEAP_ARRAY.



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-11 Thread Alan Bateman

On 11/09/2019 15:56, Pavel Rappo wrote:

Sure, from 
https://docs.oracle.com/javase/8/docs/technotes/guides/jndi/jndi-dns.html:

"Each lookup is initially performed using UDP. If the response is too long to be 
returned in a UDP packet without being truncated, the lookup is repeated using TCP."

and

"com.example.jndi.dns.timeout.initial
com.example.jndi.dns.timeout.retries

These properties are used to alter the timeout-related defaults that the DNS 
provider uses when submitting UDP queries. The DNS provider submits UDP queries 
using the following exponential backoff algorithm. The provider submits a query 
to a DNS server and waits for a response to arrive within a timeout period (1 
second by default). If it receives no response within the timeout period, it 
queries the next server, and so on. If the provider receives no response from 
any server, it doubles the timeout period and repeats the process of submitting 
the query to each server, up to a maximum number of retries (4 by default).

The "com.example.jndi.dns.timeout.initial" property, if set, specifies the 
number of milliseconds to use as the initial timeout period (i.e., before any doubling). 
If this property has not been set, the default initial timeout is 1000 milliseconds.

The "com.example.jndi.dns.timeout.retries" property, if set, specifies the number of 
times to retry each server using the exponential backoff algorithm described previously. If 
this property has not been set, the default number of retries is 4."

I cannot seem to find a newer version of that document though.

I assume extending the timeout to TCP will require at least some minimal 
updates to the descriptions. That will help reviewers and help decide if 
a CSR is needed or not. Ideally the authoritative descriptions of these 
properties would be in the javadoc, probably in the jdk.naming.dns 
module description.


-Alan


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-11 Thread Pavel Rappo
Sure, from 
https://docs.oracle.com/javase/8/docs/technotes/guides/jndi/jndi-dns.html:

"Each lookup is initially performed using UDP. If the response is too long to 
be returned in a UDP packet without being truncated, the lookup is repeated 
using TCP."

and

"com.example.jndi.dns.timeout.initial
com.example.jndi.dns.timeout.retries

These properties are used to alter the timeout-related defaults that the DNS 
provider uses when submitting UDP queries. The DNS provider submits UDP queries 
using the following exponential backoff algorithm. The provider submits a query 
to a DNS server and waits for a response to arrive within a timeout period (1 
second by default). If it receives no response within the timeout period, it 
queries the next server, and so on. If the provider receives no response from 
any server, it doubles the timeout period and repeats the process of submitting 
the query to each server, up to a maximum number of retries (4 by default).

The "com.example.jndi.dns.timeout.initial" property, if set, specifies the 
number of milliseconds to use as the initial timeout period (i.e., before any 
doubling). If this property has not been set, the default initial timeout is 
1000 milliseconds.

The "com.example.jndi.dns.timeout.retries" property, if set, specifies the 
number of times to retry each server using the exponential backoff algorithm 
described previously. If this property has not been set, the default number of 
retries is 4."

I cannot seem to find a newer version of that document though.

> On 11 Sep 2019, at 14:58, Alan Bateman  wrote:
> 
> On 11/09/2019 13:26, Pavel Rappo wrote:
>> I'm happy with the overall changeset. I have (once again) made some tiny 
>> changes, you can see them here:
>> 
>> http://cr.openjdk.java.net/~prappo/8228580/webrev.02/
>> 
>> If you are okay with them, then we wait for a *R*eviewer. If the Reviewer(s) 
>> are okay with them, we push. For the record, I'm not really happy with how 
>> we used the DNSTestBase/TestBase infrastructure, however I'm totally fine 
>> with the retrying logic.
>> 
> Would it be possible to write a clear description for the 
> com.sun.jndi.dns.timeout and com.sun.jndi.dns.retries properties? The current 
> description in the JNDI docs describes them for UDP only. I realize this is 
> may be out of date but new descriptions would be useful when the docs are 
> updated. Also I think would be useful for potential Reviewers.
> 
> -Alan



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-11 Thread Alan Bateman

On 11/09/2019 13:26, Pavel Rappo wrote:

I'm happy with the overall changeset. I have (once again) made some tiny 
changes, you can see them here:

 http://cr.openjdk.java.net/~prappo/8228580/webrev.02/

If you are okay with them, then we wait for a *R*eviewer. If the Reviewer(s) 
are okay with them, we push. For the record, I'm not really happy with how we 
used the DNSTestBase/TestBase infrastructure, however I'm totally fine with the 
retrying logic.

Would it be possible to write a clear description for the 
com.sun.jndi.dns.timeout and com.sun.jndi.dns.retries properties? The 
current description in the JNDI docs describes them for UDP only. I 
realize this is may be out of date but new descriptions would be useful 
when the docs are updated. Also I think would be useful for potential 
Reviewers.


-Alan


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-11 Thread Milan Mimica
Hi Pavel

Your changes look good to me.

On Wed, 11 Sep 2019 at 14:26, Pavel Rappo  wrote:
>
> I'm happy with the overall changeset. I have (once again) made some tiny 
> changes, you can see them here:
>
> http://cr.openjdk.java.net/~prappo/8228580/webrev.02/
>
> If you are okay with them, then we wait for a *R*eviewer. If the Reviewer(s) 
> are okay with them, we push. For the record, I'm not really happy with how we 
> used the DNSTestBase/TestBase infrastructure, however I'm totally fine with 
> the retrying logic.
>
> Test results are pending.
>
> -Pavel
>
> > On 10 Sep 2019, at 16:33, Milan Mimica  wrote:
> >
> >>> On 5 Sep 2019, at 16:02, Pavel Rappo  wrote:
> >>>
> >>> I think we are almost there. What do you think of the following 
> >>> incremental (i.e. on top of your latest webrev) change?
> >>>
> >>>   http://cr.openjdk.java.net/~prappo/8228580/webrev.01/
> >>>
> >>> I fixed a couple of trivial typos and addressed the socket relinquishing 
> >>> issue. Initializing a socket is not an atomic "all-or-nothing" operation 
> >>> now. Someone needs to take care of the socket in case things go not as 
> >>> planned.
> >
> > Right. Thanks. Here is the merged version:
> > http://cr.openjdk.java.net/~mmimica/8228580/webrev.02/
> > Plus, I have added TCP server init retry code from Chris. Works fine
> > without changes to TestBase.
> >
> >
> > --
> > Milan Mimica
>


-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-10 Thread Milan Mimica
Hi Florian

On Mon, 9 Sep 2019 at 15:04, Florian Weimer  wrote:
>
> Ahh.  The other option is to stick with one server and keep resending
> with larger and larger timeouts.  Switching has the advantage that in
> case of a server problem, you get to a working server more quickly.
> Staying means that if the answer is delayed and you resend exactly the
> same query, you might still pick up the answer to the original query and
> process it, after the first timeout.

Furthermore, the behaviour is documented [1][2] and thus can't be just changed.

>
> But we know that the server is up because it responded our UDP, so
> waiting more than one second for the TCP handshake to succeed might
> worthwhile, yes.

Heh, except the ticket we are trying to solve here is exactly about
the TCP server not being up.

So what do you suggest exactly? Should we set a lower cap? Something
like: Math.max(timeout, 2000 /* or some other value?*/)
The rationale is, if the client requests a initial timeout larger than
1 second (how much larger?) then we can use it as-is. For lower values
(default included) we need to override it to some value that, at the
end of the day, is platform-specific. Sorry, I just don't see how I
can get this right without introducing a bunch of arbitrary constants
to the code.


[1] 
https://docs.oracle.com/javase/7/docs/technotes/guides/jndi/jndi-dns.html#TIMEOUT
[2] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4630910



--
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-10 Thread Milan Mimica
> > On 5 Sep 2019, at 16:02, Pavel Rappo  wrote:
> >
> > I think we are almost there. What do you think of the following incremental 
> > (i.e. on top of your latest webrev) change?
> >
> >http://cr.openjdk.java.net/~prappo/8228580/webrev.01/
> >
> > I fixed a couple of trivial typos and addressed the socket relinquishing 
> > issue. Initializing a socket is not an atomic "all-or-nothing" operation 
> > now. Someone needs to take care of the socket in case things go not as 
> > planned.

Right. Thanks. Here is the merged version:
http://cr.openjdk.java.net/~mmimica/8228580/webrev.02/
Plus, I have added TCP server init retry code from Chris. Works fine
without changes to TestBase.


-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-10 Thread Pavel Rappo
Milan,

ping?

> On 5 Sep 2019, at 16:02, Pavel Rappo  wrote:
> 
> I think we are almost there. What do you think of the following incremental 
> (i.e. on top of your latest webrev) change?
> 
>http://cr.openjdk.java.net/~prappo/8228580/webrev.01/
> 
> I fixed a couple of trivial typos and addressed the socket relinquishing 
> issue. Initializing a socket is not an atomic "all-or-nothing" operation now. 
> Someone needs to take care of the socket in case things go not as planned.
> 
> -Pavel
> 
>> On 5 Sep 2019, at 11:20, Milan Mimica  wrote:
>> 
>> On Wed, 4 Sep 2019 at 20:32, Florian Weimer  wrote:
>>> 
>>> If you use the initial UDP timeout (one second, I think), the kernel
>>> will not complete the TCP handshake if the initial SYN segment is lost
>>> because the retransmit delay during the handshake is longer than that.
>> 
>> We could set a higher timeout value, but I would not prefer that.
>> After all, 1 second is just default value, and can be changed. If the
>> user wants us to give up on DNS query after the specified timeout then
>> lets do just that. True, some queries that previously worked might
>> start failing, but that is true even for slow socket.read()
>> operations.
>> 
>> New webrev: http://cr.openjdk.java.net/~mmimica/8228580/webrev.01/
>> 
>> * simplified test with no thread (nice catch Pavel)
>> * set connect timeout and account for it
>> 
>> -- 
>> Milan Mimica
> 



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-09 Thread Florian Weimer
* Milan Mimica:

> On Thu, 5 Sep 2019 at 18:59, Florian Weimer  wrote:
>>
>> But I think in the UDP case, the client will retry.  I think the total
>> timeout in the TCP case should equal the total timeout in the UDP case.
>> That's what I'm going to implement for glibc.  The difference is that in
>> the TCP case, the TCP stack will take care of the retries, not the
>> application code.
>
> I understand that, and it does make sense, but we have to put it in
> context of how current DnsClient.java works:
> //
> // The UDP retry strategy is to try the 1st server, and then
> // each server in order. If no answer, double the timeout
> // and try each server again.
> //

Ahh.  The other option is to stick with one server and keep resending
with larger and larger timeouts.  Switching has the advantage that in
case of a server problem, you get to a working server more quickly.
Staying means that if the answer is delayed and you resend exactly the
same query, you might still pick up the answer to the original query and
process it, after the first timeout.

> Fallback to TCP happens within this process. Going immediately with
> timeout*2^maxRetry could yield significantly larger delays, if there
> happens to be some other server on the list that works better.
> I would rather look into reusing TCP connections, not to close them 
> immediately.

But we know that the server is up because it responded our UDP, so
waiting more than one second for the TCP handshake to succeed might
worthwhile, yes.

> What about read() and non-handshake TCP retransmissions? Do those
> usually happen faster?

I think so, yes.

Thanks,
Florian


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-09 Thread Chris Yin


> On 6 Sep 2019, at 9:20 PM, Pavel Rappo  wrote:
> 
> Milan,
> 
> In order to simulate the "Address already in use" scenario I did this:
> 
>TcpDnsServer(int port) {
>try {
> *   new ServerSocket(port, 0, InetAddress.getLoopbackAddress());
>serverSocket = new ServerSocket(port, 0, 
> InetAddress.getLoopbackAddress());
>}
>catch (BindException ex) {
> 
> The test failed, instead of being skipped. That's what I saw in the logs:
> 
> 
> jtreg.SkippedException: Cannot start TCP server
>   at TcpTimeout$TcpDnsServer.(TcpTimeout.java:97)
>   at TcpTimeout.initTest(TcpTimeout.java:71)
>   at TestBase.run(TestBase.java:48)
>   at TcpTimeout.main(TcpTimeout.java:47)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at 
> com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
>   at java.base/java.lang.Thread.run(Thread.java:830)
> Caused by: java.net.BindException: Address already in use: bind
>   at java.base/sun.nio.ch.Net.bind0(Native Method)
>   at java.base/sun.nio.ch.Net.bind(Net.java:469)
>   at java.base/sun.nio.ch.Net.bind(Net.java:458)
>   at java.base/sun.nio.ch.NioSocketImpl.bind(NioSocketImpl.java:643)
>   at java.base/java.net.ServerSocket.bind(ServerSocket.java:355)
>   at java.base/java.net.ServerSocket.(ServerSocket.java:241)
>   at TcpTimeout$TcpDnsServer.(TcpTimeout.java:91)
>   ... 9 more
> 
> JavaTest Message: Test threw exception: jtreg.SkippedException
> JavaTest Message: shutting down test
> 
> 
> JavaTest Message: Problem cleaning up the following threads:
> Thread-0
>  at 
> java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.socketReceiveOrPeekData(Native
>  Method)
>  at 
> java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.peekData(DualStackPlainDatagramSocketImpl.java:113)
>  at 
> java.base@14-internal/java.net.DatagramSocket.receive(DatagramSocket.java:746)
>  at DNSServer.receiveQuery(DNSServer.java:178)
>  at DNSServer.run(DNSServer.java:119)
> 
> 
> What's going on here? It looks like the test didn't call `cleanupTest()` (and 
> therefore `server.stopServer()`) because the exception was thrown before it 
> had reached `runTest()`. I think we must address this, otherwise our 
> jtreg.SkippedException is only half measure.
> 
> I thought about how to do that. I'm thinking of going the full way up to 
> TestBase as I believe it makes the most sense. Having said that, we'd better 
> ask the original author (cc'ed), Chris Yin,  what he thinks about it. Here's 
> the proposal:

The change to TestBase looks good to me. Just after try to understand what you 
want to achieve, some other thoughts in my mind:
1. Whether we give up too earlier when the Tcp port been used
2. Whether we put too many init related check logic in TcpDnsServer

Here is just pseudo code for your reference. First, we may have a few tries on 
different ports by reinit DNSServer, if all failed, then give up and skip test. 
Second, move socket check and throw SkppedExpection logic to initTest method, 
that looks more clear and TcpDnsServer could simply focus on server logic. 
Third, explicit cleanup for each retry fails, no need to hide cleanup in 
TestBase

@Override
public void initTest(String[] args) {
for (loop a few times as you prefer as max retries) {
super.initTest(args);
var udpServer = (Server) env().get(DNSTestUtils.TEST_DNS_SERVER_THREAD);
try {
var serverSocket = new ServerSocket(udpServer.getPort(), 0, 
InetAddress.getLoopbackAddress());
tcpDnsServer = new TcpDnsServer(serverSocket);
break;
} catch (BindException be) {
DNSTestUtils.debug("Failed to bind server socket on port " + 
udpServer.getPort() + ", retry ...");
} catch (Exception ex) {
throw new RuntimeException("Unexpected exception during initTest", 
ex);
} finally {
if (tcpDnsServer == null) {
super.cleanupTest();
}
}
}

if (tcpDnsServer == null) {
throw new SkippedException("Cannot start TCP server after a few tries, 
skip test");
}
}

Thanks,
Chris

> 
> 
> diff -r 53e139e55605 test/jdk/com/sun/jndi/dns/lib/TestBase.java
> --- a/test/jdk/com/sun/jndi/dns/lib/TestBase.java Thu Sep 05 14:59:29 
> 2019 +0100
> +++ b/test/jdk/com/sun/jndi/dns/lib/TestBase.java Fri Sep 06 14:08:48 
> 2019 +0100
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2018, 2019, Oracle and/or its affiliates. All rights 
> reserved.

Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-06 Thread Milan Mimica
On Thu, 5 Sep 2019 at 18:59, Florian Weimer  wrote:
>
> But I think in the UDP case, the client will retry.  I think the total
> timeout in the TCP case should equal the total timeout in the UDP case.
> That's what I'm going to implement for glibc.  The difference is that in
> the TCP case, the TCP stack will take care of the retries, not the
> application code.

I understand that, and it does make sense, but we have to put it in
context of how current DnsClient.java works:
//
// The UDP retry strategy is to try the 1st server, and then
// each server in order. If no answer, double the timeout
// and try each server again.
//

Fallback to TCP happens within this process. Going immediately with
timeout*2^maxRetry could yield significantly larger delays, if there
happens to be some other server on the list that works better.
I would rather look into reusing TCP connections, not to close them immediately.

What about read() and non-handshake TCP retransmissions? Do those
usually happen faster?


-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-06 Thread Pavel Rappo
Milan,

In order to simulate the "Address already in use" scenario I did this:

TcpDnsServer(int port) {
try {
*   new ServerSocket(port, 0, InetAddress.getLoopbackAddress());
serverSocket = new ServerSocket(port, 0, 
InetAddress.getLoopbackAddress());
}
catch (BindException ex) {

The test failed, instead of being skipped. That's what I saw in the logs:


jtreg.SkippedException: Cannot start TCP server
at TcpTimeout$TcpDnsServer.(TcpTimeout.java:97)
at TcpTimeout.initTest(TcpTimeout.java:71)
at TestBase.run(TestBase.java:48)
at TcpTimeout.main(TcpTimeout.java:47)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: java.net.BindException: Address already in use: bind
at java.base/sun.nio.ch.Net.bind0(Native Method)
at java.base/sun.nio.ch.Net.bind(Net.java:469)
at java.base/sun.nio.ch.Net.bind(Net.java:458)
at java.base/sun.nio.ch.NioSocketImpl.bind(NioSocketImpl.java:643)
at java.base/java.net.ServerSocket.bind(ServerSocket.java:355)
at java.base/java.net.ServerSocket.(ServerSocket.java:241)
at TcpTimeout$TcpDnsServer.(TcpTimeout.java:91)
... 9 more

JavaTest Message: Test threw exception: jtreg.SkippedException
JavaTest Message: shutting down test


JavaTest Message: Problem cleaning up the following threads:
Thread-0
  at 
java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.socketReceiveOrPeekData(Native
 Method)
  at 
java.base@14-internal/java.net.DualStackPlainDatagramSocketImpl.peekData(DualStackPlainDatagramSocketImpl.java:113)
  at 
java.base@14-internal/java.net.DatagramSocket.receive(DatagramSocket.java:746)
  at DNSServer.receiveQuery(DNSServer.java:178)
  at DNSServer.run(DNSServer.java:119)


What's going on here? It looks like the test didn't call `cleanupTest()` (and 
therefore `server.stopServer()`) because the exception was thrown before it had 
reached `runTest()`. I think we must address this, otherwise our 
jtreg.SkippedException is only half measure.

I thought about how to do that. I'm thinking of going the full way up to 
TestBase as I believe it makes the most sense. Having said that, we'd better 
ask the original author (cc'ed), Chris Yin,  what he thinks about it. Here's 
the proposal:


diff -r 53e139e55605 test/jdk/com/sun/jndi/dns/lib/TestBase.java
--- a/test/jdk/com/sun/jndi/dns/lib/TestBase.java   Thu Sep 05 14:59:29 
2019 +0100
+++ b/test/jdk/com/sun/jndi/dns/lib/TestBase.java   Fri Sep 06 14:08:48 
2019 +0100
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2018, 2019, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -44,10 +44,14 @@
  * @throws Exception if any exception
  */
 public void run(String[] args) throws Exception {
-initRes();
-initTest(args);
-setupTest();
-launch();
+try {
+initRes();
+initTest(args);
+setupTest();
+launch();
+} finally {
+cleanupTest();
+}
 }
 
 /**
@@ -84,8 +88,6 @@
 if (!handleException(e)) {
 throw e;
 }
-} finally {
-cleanupTest();
 }
 }
 
@@ -108,6 +110,11 @@
 
 /**
  * Cleanup test.
+ *
+ * This method may be called by the test at any time, including before all
+ * the resources, initialization, and setup have been completed. This might
+ * require careful attention. For example, before closing a resource this
+ * method should check that resource for being {@code null}.
  */
 public abstract void cleanupTest();
 }


-Pavel

> On 5 Sep 2019, at 11:20, Milan Mimica  wrote:
> 
> On Wed, 4 Sep 2019 at 20:32, Florian Weimer  wrote:
>> 
>> If you use the initial UDP timeout (one second, I think), the kernel
>> will not complete the TCP handshake if the initial SYN segment is lost
>> because the retransmit delay during the handshake is longer than that.
> 
> We could set a higher timeout value, but I would not prefer that.
> After all, 1 second is just default value, and can be changed. If the
> user wants us to give up on DNS query after the specified timeout then
> lets do just that. True, some queries that previously worked might
> start failing, 

Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-05 Thread Florian Weimer
* Milan Mimica:

> On Wed, 4 Sep 2019 at 20:32, Florian Weimer  wrote:
>>
>> If you use the initial UDP timeout (one second, I think), the kernel
>> will not complete the TCP handshake if the initial SYN segment is lost
>> because the retransmit delay during the handshake is longer than that.
>
> We could set a higher timeout value, but I would not prefer that.
> After all, 1 second is just default value, and can be changed. If the
> user wants us to give up on DNS query after the specified timeout then
> lets do just that.

But I think in the UDP case, the client will retry.  I think the total
timeout in the TCP case should equal the total timeout in the UDP case.
That's what I'm going to implement for glibc.  The difference is that in
the TCP case, the TCP stack will take care of the retries, not the
application code.

Thanks,
Florian


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-05 Thread Milan Mimica
On Wed, 4 Sep 2019 at 20:32, Florian Weimer  wrote:
>
> If you use the initial UDP timeout (one second, I think), the kernel
> will not complete the TCP handshake if the initial SYN segment is lost
> because the retransmit delay during the handshake is longer than that.

We could set a higher timeout value, but I would not prefer that.
After all, 1 second is just default value, and can be changed. If the
user wants us to give up on DNS query after the specified timeout then
lets do just that. True, some queries that previously worked might
start failing, but that is true even for slow socket.read()
operations.

New webrev: http://cr.openjdk.java.net/~mmimica/8228580/webrev.01/

* simplified test with no thread (nice catch Pavel)
* set connect timeout and account for it

-- 
Milan Mimica


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-05 Thread Pavel Rappo
I think we are almost there. What do you think of the following incremental 
(i.e. on top of your latest webrev) change?

http://cr.openjdk.java.net/~prappo/8228580/webrev.01/

I fixed a couple of trivial typos and addressed the socket relinquishing issue. 
Initializing a socket is not an atomic "all-or-nothing" operation now. Someone 
needs to take care of the socket in case things go not as planned.

-Pavel

> On 5 Sep 2019, at 11:20, Milan Mimica  wrote:
> 
> On Wed, 4 Sep 2019 at 20:32, Florian Weimer  wrote:
>> 
>> If you use the initial UDP timeout (one second, I think), the kernel
>> will not complete the TCP handshake if the initial SYN segment is lost
>> because the retransmit delay during the handshake is longer than that.
> 
> We could set a higher timeout value, but I would not prefer that.
> After all, 1 second is just default value, and can be changed. If the
> user wants us to give up on DNS query after the specified timeout then
> lets do just that. True, some queries that previously worked might
> start failing, but that is true even for slow socket.read()
> operations.
> 
> New webrev: http://cr.openjdk.java.net/~mmimica/8228580/webrev.01/
> 
> * simplified test with no thread (nice catch Pavel)
> * set connect timeout and account for it
> 
> -- 
> Milan Mimica



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Florian Weimer
* Pavel Rappo:

>> On 4 Sep 2019, at 18:54, Florian Weimer  wrote:
>> 
>> You should use a larger timeout than the initial UDP timeout,
>> though.
>
> Could you elaborate on that?

If you use the initial UDP timeout (one second, I think), the kernel
will not complete the TCP handshake if the initial SYN segment is lost
because the retransmit delay during the handshake is longer than that.

Thanks,
Florian


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Pavel Rappo
> On 4 Sep 2019, at 18:54, Florian Weimer  wrote:
> 
> You should use a larger timeout than the initial UDP timeout,
> though.

Could you elaborate on that?




Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Alan Bateman

On 04/09/2019 18:52, Pavel Rappo wrote:

You are right, it definitely is a blocking operation. I missed it. I was 
focused on

 712 sock.setSoTimeout(timeoutLeft);

So I'd suggest we use explicit connect with timeout

 java.net.Socket#connect(java.net.SocketAddress, int)

Are you okay with that?

Decomposing timeouts is hard to get right. In this case I think you want 
to specify the initial timeout to connect, then set timeoutLeft to be 
the remaining (will be less than "timeout").


-Alan



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Florian Weimer
* Pavel Rappo:

>> On 4 Sep 2019, at 18:38, Florian Weimer  wrote:
>> 
>> 
>> 
>> Maybe I'm mistaken, but I think this:
>> 
>> 692 Tcp(InetAddress server, int port, int timeout) throws IOException {
>> 693 sock = new Socket(server, port);
>> 694 sock.setTcpNoDelay(true);
>> 695 out = new java.io.BufferedOutputStream(sock.getOutputStream());
>> 696 in = new java.io.BufferedInputStream(sock.getInputStream());
>> 697 timeoutLeft = timeout;
>> 698 }
>> 
>> creates the TCP socket and connects it.  This is a potentially blocking
>> operation as well.  
>
> You are right, it definitely is a blocking operation. I missed it. I was 
> focused on
>
> 712 sock.setSoTimeout(timeoutLeft);
>
> So I'd suggest we use explicit connect with timeout
>
> java.net.Socket#connect(java.net.SocketAddress, int)
>
> Are you okay with that?

Sure.  You should use a larger timeout than the initial UDP timeout,
though.  Can you compute the maximum amount of time the UDP code would
wait for reply and use that?  Or is that the timeoutLeft value?

Thanks,
Florian


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Pavel Rappo
> On 4 Sep 2019, at 18:38, Florian Weimer  wrote:
> 
> 
> 
> Maybe I'm mistaken, but I think this:
> 
> 692 Tcp(InetAddress server, int port, int timeout) throws IOException {
> 693 sock = new Socket(server, port);
> 694 sock.setTcpNoDelay(true);
> 695 out = new java.io.BufferedOutputStream(sock.getOutputStream());
> 696 in = new java.io.BufferedInputStream(sock.getInputStream());
> 697 timeoutLeft = timeout;
> 698 }
> 
> creates the TCP socket and connects it.  This is a potentially blocking
> operation as well.  

You are right, it definitely is a blocking operation. I missed it. I was 
focused on

712 sock.setSoTimeout(timeoutLeft);

So I'd suggest we use explicit connect with timeout

java.net.Socket#connect(java.net.SocketAddress, int)

Are you okay with that?



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Florian Weimer
* Pavel Rappo:

>> On 4 Sep 2019, at 17:35, Florian Weimer  wrote:
>> 
>> * Milan Mimica:
>> 
>>> Continuing in a new thread with a RFR of my own:
>>> http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/
>> 
>> I would add a comment why there's no explicit TCP connection timeout in
>> the code.  I assume you rely on the TCP stack having its own timeout,
>> right?  But I think it can be quite long under some circumstances.
>
> If I get this right, there's a default timeout of 1,000 ms (1 second)
> on L70
> src/jdk.naming.dns/share/classes/com/sun/jndi/dns/DnsContext.java
> which applies to the case where there's no explicit timeout. I agree
> though that this deserves a comment.

Maybe I'm mistaken, but I think this:

 692 Tcp(InetAddress server, int port, int timeout) throws IOException {
 693 sock = new Socket(server, port);
 694 sock.setTcpNoDelay(true);
 695 out = new java.io.BufferedOutputStream(sock.getOutputStream());
 696 in = new java.io.BufferedInputStream(sock.getInputStream());
 697 timeoutLeft = timeout;
 698 }

creates the TCP socket and connects it.  This is a potentially blocking
operation as well.  It would make sense to align the timeout for that
with what typical TCP stacks do for SYN segment retransmission.
Different TCP stacks have different connection timeouts, ranging from 70
to 130 seconds or even longer.  So the defaults are much larger than
typical DNS timeouts.  (Coincidentally, I'm working on fixing the glibc
stub resolver, and I've decided to use a non-blocking connect there
because of the long default timeouts.)

I agree on the matter of clock warp issues.  UDP needs fixing too, but
not in this change.

Thanks,
Florian


Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Pavel Rappo
> On 4 Sep 2019, at 17:35, Florian Weimer  wrote:
> 
> * Milan Mimica:
> 
>> Continuing in a new thread with a RFR of my own:
>> http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/
> 
> I would add a comment why there's no explicit TCP connection timeout in
> the code.  I assume you rely on the TCP stack having its own timeout,
> right?  But I think it can be quite long under some circumstances.

If I get this right, there's a default timeout of 1,000 ms (1 second) on L70 
src/jdk.naming.dns/share/classes/com/sun/jndi/dns/DnsContext.java which applies 
to the case where there's no explicit timeout. I agree though that this 
deserves a comment.

> The timeout will not be enforced properly if the clock jumps backwards.

Here's the link to the previous discussion:


https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-August/061918.html

If you're talking about the use of System.currentTimeMillis, then I've already 
told Milan that we could address that later in a follow-up bug, together with 
the UDP case.

-Pavel



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Pavel Rappo
> On 4 Sep 2019, at 13:26, Milan Mimica  wrote:
> 
> Hello
> 
> Continuing in a new thread with a RFR of my own:
> http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/

Here's the link to the previous discussion:


https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-August/061918.html

> It has the fixes applied from the first review of the patch.
> 
> The test was sporadically failing not because of missing
> synchronization, but more likely because test was not waiting for the
> TCP server thread to actually start.

That Thread.sleep + AtomicReference construct seems racy (the exception reading 
thread might miss it). I suggest using a synchronization aid (CountDownLatch, 
Phaser, Semaphore, etc.) for that. Or... since you don't do anything with the 
accepted socket anyway you might as well not bother with creating that thread. 
Once the ServerSocket has been created the connection can be established even 
without calling `accept`. This might simplify the code quite a lot.

> Made it throw jtreg.SkippedException if TCP port is in use. We can
> just hope it does not happen too often.

We should probably pass the exception to the jtreg.SkippedException's 
constructor.

Why is there timeout on L45?

* @run main/timeout=60 TcpTimeout

If you're optimizing for a case where the test fails, then please don't.

-Pavel



Re: RFR(s): 8228580: DnsClient TCP socket timeout

2019-09-04 Thread Florian Weimer
* Milan Mimica:

> Continuing in a new thread with a RFR of my own:
> http://cr.openjdk.java.net/~mmimica/8228580/webrev.00/

I would add a comment why there's no explicit TCP connection timeout in
the code.  I assume you rely on the TCP stack having its own timeout,
right?  But I think it can be quite long under some circumstances.

The timeout will not be enforced properly if the clock jumps backwards.

Thanks,
Florian