Re: RFR: 8339538: Wrong timeout computations in DnsClient [v11]

2024-10-10 Thread Mark Sheppard
On Mon, 7 Oct 2024 09:00:28 GMT, Aleksei Efimov  wrote:

>> This PR proposes the following changes to address wrong timeout computations 
>> in the `com.sun.jndi.dns.DnsClient`:
>> - The `DnsClient` has been updated to use a monotonic high-resolution (nano) 
>> clock. The existing `Timeout` test has also been updated to use the nano 
>> clock to measure observed timeout value.
>> 
>> - The left timeout computation has been fixed to decrease the timeout value 
>> during each retry attempt. A new test, `TimeoutWithEmptyDatagrams`, has been 
>> added to test it.
>> 
>> - The `DnsClient.blockingReceive` has been updated:
>> - to detect if any data is received
>> - to avoid contention with `Selector.close()` that could be called by a 
>> cleaner thread
>> 
>> - The expected timeout calculation in the `Timeout` test has been updated to 
>> take into account the minimum retry timeout (50ms). Additionally, the max 
>> allowed difference between the observed timeout and the expected one has 
>> been increased from 50% to 67%. Taking into account 50 ms retry timeout 
>> decrease the maximum allowed difference is effectively set to 61%. This 
>> change is expected to improve the stability of the `Timeout` test which has 
>> been seen to fail 
>> [intermittentlly](https://bugs.openjdk.org/browse/JDK-8220213). If no 
>> objections, I'm planning to close 
>> [JDK-8220213](https://bugs.openjdk.org/browse/JDK-8220213) as duplicate of 
>> this one.
>> 
>> JNDI/DNS jtreg tests has been executed multiple times (500+) to check if the 
>> new and the modified tests are stable. No failures been observed (so far?).
>
> Aleksei Efimov has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 16 additional 
> commits since the last revision:
> 
>  - Merge remote-tracking branch 'origin' into 
> JDK-8339538_DnsClient_nanoTime_And_InfiniteLoop
>  - improve reporting for unrecoverable exceptions
>  - Add comment suggested by Daniel
>  - Track unfulfilled timeouts during UDP queries.
>Update exceptions handling.
>Update TCP client to use nano time source.
>  - set min timeout to 0; set max allowed timeout to 2x expected timeout in 
> tests
>  -  set max allowed value for retries to 30
>  - update tests to calculate allowed timeout range (max is updated to 75%), 
> print it and use it for elapsed time check
>  - remove redundant clamp from timeoutLeft calculation
>  - clamp timeout and retries property values
>  - Measure time the caller spent waiting. Simplify timeoutLeft computation
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/860af487...c58e34cc

good job ... thanks for considering my feedback

-

Marked as reviewed by msheppar (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20892#pullrequestreview-2360760459


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v11]

2024-10-10 Thread Mark Sheppard
On Mon, 7 Oct 2024 09:00:28 GMT, Aleksei Efimov  wrote:

>> This PR proposes the following changes to address wrong timeout computations 
>> in the `com.sun.jndi.dns.DnsClient`:
>> - The `DnsClient` has been updated to use a monotonic high-resolution (nano) 
>> clock. The existing `Timeout` test has also been updated to use the nano 
>> clock to measure observed timeout value.
>> 
>> - The left timeout computation has been fixed to decrease the timeout value 
>> during each retry attempt. A new test, `TimeoutWithEmptyDatagrams`, has been 
>> added to test it.
>> 
>> - The `DnsClient.blockingReceive` has been updated:
>> - to detect if any data is received
>> - to avoid contention with `Selector.close()` that could be called by a 
>> cleaner thread
>> 
>> - The expected timeout calculation in the `Timeout` test has been updated to 
>> take into account the minimum retry timeout (50ms). Additionally, the max 
>> allowed difference between the observed timeout and the expected one has 
>> been increased from 50% to 67%. Taking into account 50 ms retry timeout 
>> decrease the maximum allowed difference is effectively set to 61%. This 
>> change is expected to improve the stability of the `Timeout` test which has 
>> been seen to fail 
>> [intermittentlly](https://bugs.openjdk.org/browse/JDK-8220213). If no 
>> objections, I'm planning to close 
>> [JDK-8220213](https://bugs.openjdk.org/browse/JDK-8220213) as duplicate of 
>> this one.
>> 
>> JNDI/DNS jtreg tests has been executed multiple times (500+) to check if the 
>> new and the modified tests are stable. No failures been observed (so far?).
>
> Aleksei Efimov has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 16 additional 
> commits since the last revision:
> 
>  - Merge remote-tracking branch 'origin' into 
> JDK-8339538_DnsClient_nanoTime_And_InfiniteLoop
>  - improve reporting for unrecoverable exceptions
>  - Add comment suggested by Daniel
>  - Track unfulfilled timeouts during UDP queries.
>Update exceptions handling.
>Update TCP client to use nano time source.
>  - set min timeout to 0; set max allowed timeout to 2x expected timeout in 
> tests
>  -  set max allowed value for retries to 30
>  - update tests to calculate allowed timeout range (max is updated to 75%), 
> print it and use it for elapsed time check
>  - remove redundant clamp from timeoutLeft calculation
>  - clamp timeout and retries property values
>  - Measure time the caller spent waiting. Simplify timeoutLeft computation
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/772b32cc...c58e34cc

test/jdk/com/sun/jndi/dns/ConfigTests/TimeoutWithEmptyDatagrams.java line 114:

> 112: });
> 113: 
> 114: // Run a virtual thread that will send an empty packets via 
> server socket

similarly here the lambda functionality encapsulated in a class abstraction 
EmptyDatagramSender implements Runnable
Thread.startVirtualThread(new EmptyDatagramSender(. . .))

I make this and the above comment as an observation

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1795595815


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v11]

2024-10-10 Thread Mark Sheppard
On Mon, 7 Oct 2024 09:00:28 GMT, Aleksei Efimov  wrote:

>> This PR proposes the following changes to address wrong timeout computations 
>> in the `com.sun.jndi.dns.DnsClient`:
>> - The `DnsClient` has been updated to use a monotonic high-resolution (nano) 
>> clock. The existing `Timeout` test has also been updated to use the nano 
>> clock to measure observed timeout value.
>> 
>> - The left timeout computation has been fixed to decrease the timeout value 
>> during each retry attempt. A new test, `TimeoutWithEmptyDatagrams`, has been 
>> added to test it.
>> 
>> - The `DnsClient.blockingReceive` has been updated:
>> - to detect if any data is received
>> - to avoid contention with `Selector.close()` that could be called by a 
>> cleaner thread
>> 
>> - The expected timeout calculation in the `Timeout` test has been updated to 
>> take into account the minimum retry timeout (50ms). Additionally, the max 
>> allowed difference between the observed timeout and the expected one has 
>> been increased from 50% to 67%. Taking into account 50 ms retry timeout 
>> decrease the maximum allowed difference is effectively set to 61%. This 
>> change is expected to improve the stability of the `Timeout` test which has 
>> been seen to fail 
>> [intermittentlly](https://bugs.openjdk.org/browse/JDK-8220213). If no 
>> objections, I'm planning to close 
>> [JDK-8220213](https://bugs.openjdk.org/browse/JDK-8220213) as duplicate of 
>> this one.
>> 
>> JNDI/DNS jtreg tests has been executed multiple times (500+) to check if the 
>> new and the modified tests are stable. No failures been observed (so far?).
>
> Aleksei Efimov has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 16 additional 
> commits since the last revision:
> 
>  - Merge remote-tracking branch 'origin' into 
> JDK-8339538_DnsClient_nanoTime_And_InfiniteLoop
>  - improve reporting for unrecoverable exceptions
>  - Add comment suggested by Daniel
>  - Track unfulfilled timeouts during UDP queries.
>Update exceptions handling.
>Update TCP client to use nano time source.
>  - set min timeout to 0; set max allowed timeout to 2x expected timeout in 
> tests
>  -  set max allowed value for retries to 30
>  - update tests to calculate allowed timeout range (max is updated to 75%), 
> print it and use it for elapsed time check
>  - remove redundant clamp from timeoutLeft calculation
>  - clamp timeout and retries property values
>  - Measure time the caller spent waiting. Simplify timeoutLeft computation
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/270d00f1...c58e34cc

test/jdk/com/sun/jndi/dns/ConfigTests/TimeoutWithEmptyDatagrams.java line 92:

> 90: 
> 91: // Run a virtual thread that receives client request packets 
> and extracts
> 92: // sender address from them.

I think the lambda could have been encapsulated in a class abstraction to 
convey clearly its semantics e.g. DnsRequestSink implementing Runnable and 
started with Thread.startVirtualThread(new DnsRequestSink(. . .));

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1795589879


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v8]

2024-10-02 Thread Mark Sheppard
On Wed, 2 Oct 2024 13:25:54 GMT, Aleksei Efimov  wrote:

> > I think that if there is a PortUnreachable thrown, during DnsClient.query 
> > processing from the doUdpQuery, then the timeout may expire early ... if 
> > I've interpreted the outer loop processing correctly
> 
> The `DnsClient.query` marks server as misconfigured (not suitable for further 
> retries) in scenarios when `PortUnreachable` is caught (see `doNotRetry`). 
> And since the timeout/retries are specified per server it is expected for the 
> timeout to expire early. For scenarios with one server and `PortUnreachable` 
> exception - the `DnsClient.query` will throw the `CommunicationException` 
> with `PortUnreachable` set as its cause.

Here's two set of trace (pre your current change) for the main Timeout test. 
The second trace is a simulated receipt of an ICMP Host unreachable

normal successful execution of Timeout test

--System.out:(19/715)--
Skip local DNS Server creation 
DnsClient.query retry == 0
DnsClient.doUdpQuery pktTimeout == 250
DnsClient.blockingReceive: select with timeout == 250
DnsClient.query retry == 1
DnsClient.doUdpQuery pktTimeout == 500
DnsClient.blockingReceive: select with timeout == 500
DnsClient.query retry == 2
DnsClient.doUdpQuery pktTimeout == 1000
DnsClient.blockingReceive: select with timeout == 1000
DnsClient.query retry == 3
DnsClient.doUdpQuery pktTimeout == 2000
DnsClient.blockingReceive: select with timeout == 2000
DnsClient.query retry == 4
DnsClient.doUdpQuery pktTimeout == 4000
DnsClient.blockingReceive: select with timeout == 4000
Elapsed (ms):  7793
Expected (ms): 7750
elapsed time is as long as expected.
--System.err:(18/768)--
DNS: SEND ID (1): 29440
DNS: Trying RECEIVE(1) retry(1) for:29440sock-timeout:250 ms.
DNS: Caught IOException:java.net.SocketTimeoutException
DNS: SEND ID (2): 29440
DNS: Trying RECEIVE(1) retry(2) for:29440sock-timeout:500 ms.
DNS: Caught IOException:java.net.SocketTimeoutException
DNS: SEND ID (3): 29440
DNS: Trying RECEIVE(1) retry(3) for:29440sock-timeout:1000 ms.
DNS: Caught IOException:java.net.SocketTimeoutException
DNS: SEND ID (4): 29440
DNS: Trying RECEIVE(1) retry(4) for:29440sock-timeout:2000 ms.
DNS: Caught IOException:java.net.SocketTimeoutException
DNS: SEND ID (5): 29440
DNS: Trying RECEIVE(1) retry(5) for:29440sock-timeout:4000 ms.
DNS: Caught IOException:java.net.SocketTimeoutException

--oOo
timeout failure due to PortUnreachableException

--System.out:(12/392)--
Skip local DNS Server creation 
DnsClient.query retry == 0
DnsClient.doUdpQuery pktTimeout == 250
DnsClient.blockingReceive: select with timeout == 250
DnsClient.query retry == 1
DnsClient.doUdpQuery pktTimeout == 500
DnsClient.blockingReceive: select with timeout == 500
DnsClient.query retry == 2
DnsClient.query retry == 3
DnsClient.query retry == 4
Elapsed (ms):  777
Expected (ms): 7750
--System.err:(21/1055)--
DNS: SEND ID (1): 27504
DNS: Trying RECEIVE(1) retry(1) for:27504sock-timeout:250 ms.
DNS: Caught IOException:java.net.SocketTimeoutException
DNS: SEND ID (2): 27504
DNS: Trying RECEIVE(1) retry(2) for:27504sock-timeout:500 ms.
DNS: Caught IOException:java.net.SocketTimeoutException
DNS: SEND ID (3): 27504
DNS: Caught Exception:java.net.PortUnreachableException: simulated ICMP Host 
unreachable
java.lang.RuntimeException: Failed: timeout in 777 ms, expected7750ms
at Timeout.handleException(Timeout.java:116)
at TestBase.launch(TestBase.java:84)
at TestBase.run(TestBase.java:50)
at Timeout.main(Timeout.java:59)
at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:588)
at 
com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
at java.base/java.lang.Thread.run(Thread.java:1575)

JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test

result: Failed. Execution failed: `main' threw exception: 
java.lang.RuntimeException: Failed: timeout in 777 ms, expected7750ms


test result: Failed. Execution failed: `main' threw exception: 
java.lang.RuntimeException: Failed: timeout in 777 ms, expected7750ms


So if a PortUnreachableException is thrown, which can happen for either a send 
or receive anytime after the first iteration through the retry loop, then there 
will be an early timeout. As such, your assertion does not appear to be true.

-

PR Comment: https://git.openjdk.org/jdk/pull/20892#issuecomment-2389979568


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v8]

2024-09-30 Thread Mark Sheppard
On Thu, 19 Sep 2024 17:55:13 GMT, Aleksei Efimov  wrote:

>> This PR proposes the following changes to address wrong timeout computations 
>> in the `com.sun.jndi.dns.DnsClient`:
>> - The `DnsClient` has been updated to use a monotonic high-resolution (nano) 
>> clock. The existing `Timeout` test has also been updated to use the nano 
>> clock to measure observed timeout value.
>> 
>> - The left timeout computation has been fixed to decrease the timeout value 
>> during each retry attempt. A new test, `TimeoutWithEmptyDatagrams`, has been 
>> added to test it.
>> 
>> - The `DnsClient.blockingReceive` has been updated:
>> - to detect if any data is received
>> - to avoid contention with `Selector.close()` that could be called by a 
>> cleaner thread
>> 
>> - The expected timeout calculation in the `Timeout` test has been updated to 
>> take into account the minimum retry timeout (50ms). Additionally, the max 
>> allowed difference between the observed timeout and the expected one has 
>> been increased from 50% to 67%. Taking into account 50 ms retry timeout 
>> decrease the maximum allowed difference is effectively set to 61%. This 
>> change is expected to improve the stability of the `Timeout` test which has 
>> been seen to fail 
>> [intermittentlly](https://bugs.openjdk.org/browse/JDK-8220213). If no 
>> objections, I'm planning to close 
>> [JDK-8220213](https://bugs.openjdk.org/browse/JDK-8220213) as duplicate of 
>> this one.
>> 
>> JNDI/DNS jtreg tests has been executed multiple times (500+) to check if the 
>> new and the modified tests are stable. No failures been observed (so far?).
>
> Aleksei Efimov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Track unfulfilled timeouts during UDP queries.
>   Update exceptions handling.
>   Update TCP client to use nano time source.

I think that if there is a PortUnreachable thrown, during DnsClient.query 
processing from the doUdpQuery, then  the  timeout may expire early ... if I've 
interpreted the outer loop processing correctly

I'm surprised that we haven't encountered this more frequently, BUT it may have 
a occurred as there were a couple of failures where timeout was quite minimal

-

PR Comment: https://git.openjdk.org/jdk/pull/20892#issuecomment-2384091460


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v5]

2024-09-10 Thread Mark Sheppard
On Tue, 10 Sep 2024 18:38:15 GMT, Aleksei Efimov  wrote:

>> I think 2 times is good, remove all potential noise  ;-)
>> 
>> the following failures is nearly twice the expected
>> 
>> --System.out:(3/73)--
>> Skip local DNS Server creation
>> Elapsed (ms): 14229
>> Expected (ms): 7750
>> --System.err:(13/652)--
>> java.lang.RuntimeException: Failed: timeout in 14229 ms, expected 7750ms
>> at Timeout.handleException(Timeout.java:116)
>> at TestBase.launch(TestBase.java:84)
>> at TestBase.run(TestBase.java:50)
>> at Timeout.main(Timeout.java:59)
>> 
>> and  I think it fails  the new upper bound check 
>> 
>> most of the elapsed times that have been less than the expected have been 
>> within the 50 * 5  tolerance, but there have been a few outside the -250 
>> mess lower bound
>
> I agree that we don't want to document too much here. Updated the factor to 
> 1.75 (2 seems a bit high and might hide real issues), and to make the timeout 
> value calculation and check less arcane - I have updated test output to print 
> the range of acceptable timeout values: 
> 05ed9e053865293a1938ed7bc6fe208759513813

2 time  is not too high,
I have presented, in the comment, a failures with the elapsed time is almost 
twice the expected time
where the elapsed time is 14229 !! which is approx 1.84 * expected timeout

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1752563030


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v5]

2024-09-10 Thread Mark Sheppard
On Tue, 10 Sep 2024 17:50:59 GMT, Mark Sheppard  wrote:

>> I don't think we want to go down the rabbit hole of documenting too much. 
>> Agreed that using a simple factor 2 would make the code simpler, but do we 
>> want to go that high?
>
> I don't think it is a rabbit hole, to provide some additional clarity on the 
> timeout mechanism and to avoid any perception that it has absolute realtime 
> timeout semantics, such that developers have precise view of how the 
> mechanism works
> 
> https://docs.oracle.com/en/java/javase/22/docs/api/jdk.naming.dns/module-summary.html

I think 2 times is good, remove all potential noise  ;-)

the following failures is nearly twice the expected

--System.out:(3/73)--
Skip local DNS Server creation
Elapsed (ms): 14229
Expected (ms): 7750
--System.err:(13/652)--
java.lang.RuntimeException: Failed: timeout in 14229 ms, expected 7750ms
at Timeout.handleException(Timeout.java:116)
at TestBase.launch(TestBase.java:84)
at TestBase.run(TestBase.java:50)
at Timeout.main(Timeout.java:59)

and  I think it fails  the new upper bound check 

most of the elapsed times that have been less than the expected have been 
within the 50 * 5  tolerance, but there have been a few outside the -250 mess 
lower bound

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1752480088


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v5]

2024-09-10 Thread Mark Sheppard
On Tue, 10 Sep 2024 14:44:35 GMT, Daniel Fuchs  wrote:

>> src/jdk.naming.dns/share/classes/com/sun/jndi/dns/DnsClient.java line 442:
>> 
>>> 440: // use 1L below to ensure conversion to long and avoid 
>>> potential
>>> 441: // integer overflow (timeout is an int).
>>> 442: // no point in supporting timeout > Integer.MAX_VALUE, 
>>> clamp if needed
>> 
>> if I have read this correctly,  timeout is of type int, thus int 
>> Math.clamp(int, int, int) is being called returning type int and promoting 
>> to long. Are there any side effects to consider here?  And as timeoutLeft 
>> (or remainingTimeout) and pktTimeout were both int and now is type long, 
>> then why have timeout declared as type  int ? 
>> 
>> consistency in various declared "timeout" variables' type ?
>
> If I'm not mistaken here  it's `Math.clamp(long, long, long)` which is called 
> - because `timeout * (1L << retry)` is a long. We could make it more obvious 
> by using `0L` instead of `0` too.

thanks for the clarification ... yes indeed, I didn't see the 1L as the 
original was (timeout * (1 << retry))
BUT I should have read the comment more precisely(small screens and no glasses 
!!)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1752428852


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v5]

2024-09-10 Thread Mark Sheppard
On Tue, 10 Sep 2024 14:59:44 GMT, Daniel Fuchs  wrote:

>> test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java line 112:
>> 
>>> 110: // Check that elapsed time is as long as expected, and
>>> 111: // not more than 67% greater. Given the min DNS timeout
>>> 112: // correction above the threshold value is equal to 61%.
>> 
>> this is a bit arcane, why not have a simple measure that elapsed time 
>> shouldn't be more than twice the expected timeout ... this is not that 
>> different to the  multipliedBy(2) and multipliedBy(3) --  
>> elaspedTime.compareTo(expectedTime.multipliedBy(2) <= 0
>> 
>> Additionally based on the internal minimum timeout allowance of 50 secs and 
>> this upper bound calculation, it would suggest that an @implNote might be 
>> useful, or required, to alert developers  to potential timeout variability, 
>> and not to rely on timeout to be absolutely (real time) precise
>
> I don't think we want to go down the rabbit hole of documenting too much. 
> Agreed that using a simple factor 2 would make the code simpler, but do we 
> want to go that high?

I don't think it is a rabbit hole, to provide some additional clarity on the 
timeout mechanism and to avoid any perception that it is an absolute realtime 
timeout semantics, such that developers have precise view of how the mechanism 
works

https://docs.oracle.com/en/java/javase/22/docs/api/jdk.naming.dns/module-summary.html

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1752422523


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v5]

2024-09-10 Thread Mark Sheppard
On Mon, 9 Sep 2024 22:29:23 GMT, Aleksei Efimov  wrote:

>> This PR proposes the following changes to address wrong timeout computations 
>> in the `com.sun.jndi.dns.DnsClient`:
>> - The `DnsClient` has been updated to use a monotonic high-resolution (nano) 
>> clock. The existing `Timeout` test has also been updated to use the nano 
>> clock to measure observed timeout value.
>> 
>> - The left timeout computation has been fixed to decrease the timeout value 
>> during each retry attempt. A new test, `TimeoutWithEmptyDatagrams`, has been 
>> added to test it.
>> 
>> - The `DnsClient.blockingReceive` has been updated:
>> - to detect if any data is received
>> - to avoid contention with `Selector.close()` that could be called by a 
>> cleaner thread
>> 
>> - The expected timeout calculation in the `Timeout` test has been updated to 
>> take into account the minimum retry timeout (50ms). Additionally, the max 
>> allowed difference between the observed timeout and the expected one has 
>> been increased from 50% to 67%. Taking into account 50 ms retry timeout 
>> decrease the maximum allowed difference is effectively set to 61%. This 
>> change is expected to improve the stability of the `Timeout` test which has 
>> been seen to fail 
>> [intermittentlly](https://bugs.openjdk.org/browse/JDK-8220213). If no 
>> objections, I'm planning to close 
>> [JDK-8220213](https://bugs.openjdk.org/browse/JDK-8220213) as duplicate of 
>> this one.
>> 
>> JNDI/DNS jtreg tests has been executed multiple times (500+) to check if the 
>> new and the modified tests are stable. No failures been observed (so far?).
>
> Aleksei Efimov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Measure time the caller spent waiting. Simplify timeoutLeft computation

test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java line 112:

> 110: // Check that elapsed time is as long as expected, and
> 111: // not more than 67% greater. Given the min DNS timeout
> 112: // correction above the threshold value is equal to 61%.

this is a bit arcane, why not have a simple measure that elapsed time shouldn't 
be more than twice the expected timeout ... this is not that different to the  
multipliedBy(2) and multipliedBy(3) --  
elaspedTime.compareTo(expectedTime.multipliedBy(2) <= 0

Additionally based on the internal minimum timeout allowance of 50 secs and 
this upper bound calculation, it would suggest that an @implNote might be 
useful, or required, to alert developers  to potential timeout variability, and 
not to rely on timeout to be absolutely (real time) precise

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1751636105


Re: RFR: 8339538: Wrong timeout computations in DnsClient [v5]

2024-09-10 Thread Mark Sheppard
On Mon, 9 Sep 2024 22:29:23 GMT, Aleksei Efimov  wrote:

>> This PR proposes the following changes to address wrong timeout computations 
>> in the `com.sun.jndi.dns.DnsClient`:
>> - The `DnsClient` has been updated to use a monotonic high-resolution (nano) 
>> clock. The existing `Timeout` test has also been updated to use the nano 
>> clock to measure observed timeout value.
>> 
>> - The left timeout computation has been fixed to decrease the timeout value 
>> during each retry attempt. A new test, `TimeoutWithEmptyDatagrams`, has been 
>> added to test it.
>> 
>> - The `DnsClient.blockingReceive` has been updated:
>> - to detect if any data is received
>> - to avoid contention with `Selector.close()` that could be called by a 
>> cleaner thread
>> 
>> - The expected timeout calculation in the `Timeout` test has been updated to 
>> take into account the minimum retry timeout (50ms). Additionally, the max 
>> allowed difference between the observed timeout and the expected one has 
>> been increased from 50% to 67%. Taking into account 50 ms retry timeout 
>> decrease the maximum allowed difference is effectively set to 61%. This 
>> change is expected to improve the stability of the `Timeout` test which has 
>> been seen to fail 
>> [intermittentlly](https://bugs.openjdk.org/browse/JDK-8220213). If no 
>> objections, I'm planning to close 
>> [JDK-8220213](https://bugs.openjdk.org/browse/JDK-8220213) as duplicate of 
>> this one.
>> 
>> JNDI/DNS jtreg tests has been executed multiple times (500+) to check if the 
>> new and the modified tests are stable. No failures been observed (so far?).
>
> Aleksei Efimov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Measure time the caller spent waiting. Simplify timeoutLeft computation

src/jdk.naming.dns/share/classes/com/sun/jndi/dns/DnsClient.java line 442:

> 440: // use 1L below to ensure conversion to long and avoid 
> potential
> 441: // integer overflow (timeout is an int).
> 442: // no point in supporting timeout > Integer.MAX_VALUE, 
> clamp if needed

if I have read this correctly,  timeout is of type int, thus int 
Math.clamp(int, int, int) is being called returning type int and promoting to 
long. Are the any side effects to consider here?  And as timeoutLeft (or 
remainingTimeout) and pktTimeout were both int and now is type long, then why 
have timeout declared as type  int ?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20892#discussion_r1751619463


Re: RFR: 8328066: WhiteBoxResizeTest failure on linux-x86: Could not reserve enough space for 2097152KB object heap [v3]

2024-03-15 Thread Mark Sheppard
On Fri, 15 Mar 2024 11:30:08 GMT, Jaikiran Pai  wrote:

> The reason why 2GB is needed in this test is explained in 
> https://bugs.openjdk.org/browse/JDK-8285386.

ok thanks ... i'll re-read

-

PR Comment: https://git.openjdk.org/jdk/pull/18290#issuecomment-1999485205


Re: RFR: 8328066: WhiteBoxResizeTest failure on linux-x86: Could not reserve enough space for 2097152KB object heap [v3]

2024-03-15 Thread Mark Sheppard
On Thu, 14 Mar 2024 14:04:51 GMT, Jaikiran Pai  wrote:

>> Can I please get a review of this test-only change which proposes to address 
>> https://bugs.openjdk.org/browse/JDK-8328066?
>> 
>> The test launches a JVM with 2G heap (`-Xmx2G`) and as noted in that issue, 
>> the failure was observed on linux-86 instance on a GitHub jobs run. 
>> 
>> The commit in this PR proposes to add relevant `@requires` so that the test 
>> is only run on 64 bit VM and expects the `os.maxMemory` to be > 2G.
>> 
>> The change has been tested on a linux-x86 run in GitHub actions job, plus 
>> even on local and CI 64 bit VM environments. No failures have been noticed.
>
> Jaikiran Pai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Mark's suggestion - no need for os.maxMemory

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18290#pullrequestreview-1938750496


Re: RFR: 8328066: WhiteBoxResizeTest failure on linux-x86: Could not reserve enough space for 2097152KB object heap [v3]

2024-03-15 Thread Mark Sheppard
On Thu, 14 Mar 2024 14:04:51 GMT, Jaikiran Pai  wrote:

>> Can I please get a review of this test-only change which proposes to address 
>> https://bugs.openjdk.org/browse/JDK-8328066?
>> 
>> The test launches a JVM with 2G heap (`-Xmx2G`) and as noted in that issue, 
>> the failure was observed on linux-86 instance on a GitHub jobs run. 
>> 
>> The commit in this PR proposes to add relevant `@requires` so that the test 
>> is only run on 64 bit VM and expects the `os.maxMemory` to be > 2G.
>> 
>> The change has been tested on a linux-x86 run in GitHub actions job, plus 
>> even on local and CI 64 bit VM environments. No failures have been noticed.
>
> Jaikiran Pai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Mark's suggestion - no need for os.maxMemory

maybe ask the question: why does the test need 2GB ?  Would 1Gb suffice and 
that would be within the allocation limits for 32 bit architectures virtual 
address space.

-

PR Comment: https://git.openjdk.org/jdk/pull/18290#issuecomment-1999413347


Re: RFR: 8328066: WhiteBoxResizeTest failure on linux-x86: Could not reserve enough space for 2097152KB object heap [v2]

2024-03-14 Thread Mark Sheppard
On Thu, 14 Mar 2024 12:15:39 GMT, Jaikiran Pai  wrote:

>> Can I please get a review of this test-only change which proposes to address 
>> https://bugs.openjdk.org/browse/JDK-8328066?
>> 
>> The test launches a JVM with 2G heap (`-Xmx2G`) and as noted in that issue, 
>> the failure was observed on linux-86 instance on a GitHub jobs run. 
>> 
>> The commit in this PR proposes to add relevant `@requires` so that the test 
>> is only run on 64 bit VM and expects the `os.maxMemory` to be > 2G.
>> 
>> The change has been tested on a linux-x86 run in GitHub actions job, plus 
>> even on local and CI 64 bit VM environments. No failures have been noticed.
>
> Jaikiran Pai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   combine requires into one line

I think the os.maxMemory is an artificial constraint.
a requirement to run on 64 bit machine should be sufficient
@requires vm.bits == "64"

as the issue relates to the max user virtual address space available on 32 bit 
architecture

-

PR Comment: https://git.openjdk.org/jdk/pull/18290#issuecomment-1997521612


Re: RFR: 8316649: JMX connection timeout cannot be changed and uses the default of 0 (infinite)

2023-11-24 Thread Mark Sheppard
On Fri, 24 Nov 2023 13:22:07 GMT, Kevin Walls  wrote:

>> OK - sounds good. Meanwhile I had a look at the custom RMI Socket Factories 
>> used by the JMX Agent, and these are actually RMIServerSocketFactories, so 
>> having a timeout for connect there probably makes no sense.
>
> Thanks, yes so JMX SSL clients are without socket connect timeout, good to 
> have that documented.  I logged a parallel RFE for SslRMIClientSocketFactory, 
> which should wait for now as it seems wrong to go changing that right now 
> without knowing if it's ever been an issue for anybody:
> [JDK-8320703](https://bugs.openjdk.org/browse/JDK-8320703): JMX SSL RMI 
> Socket connection timeout cannot be changed

wrt to the property name initialConnectTimeout might infer that it is an 
initial value which can be subsequentually modified, but that is not possible. 
As such, would sun.rmi.transport.tcp.connectTimeout be more appropriate -- 
dropping the initial ?

If the connectTimeout initialization code was placed in a static method, it 
could directly unit testable :-) (if such a test was desirable for coverage 
completeness)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16771#discussion_r1404513377


Re: RFR: 8320665: update jdk_core at test/jdk/TEST.groups

2023-11-24 Thread Mark Sheppard
On Thu, 23 Nov 2023 14:31:40 GMT, Ivan Šipka  wrote:

> @bwhuang-us @mahendrachhipa

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16796#pullrequestreview-1747718538


Re: RFR: 8316649: JMX connection timeout cannot be changed and uses the default of 0 (infinite)

2023-11-23 Thread Mark Sheppard
On Tue, 21 Nov 2023 17:57:32 GMT, Kevin Walls  wrote:

> RMI Connections (in general) should use a timeout on the Socket connect call 
> by default.
> 
> JMX connections use RMI and some connection failures never terminate.  The 
> hang described in 8316649 is hard to reproduce manually: the description says 
> it can be caused by a firewall that never returns a response.
> 
> src/java.rmi/share/classes/sun/rmi/transport/tcp/TCPChannel.java
> has other timeouts but nothing to control the initial Socket connect.
> 
> Defaulting to a 1 minute timeout on connect has no effect on existing tests 
> for RMI and JMX, and should go unnoticed in applications unless there really 
> is a significant connection delay.  Specifying zero for the new System 
> Property sun.rmi.transport.tcp.initialConnectTimeout uses the old code path 
> of a connect with no timeout.
> 
> I have a test, but it is not realistically usable: although specifying a 1 
> millisecond timeout will often fail (as expected/desired for the test), it 
> will often pass as the connection happens immediately.

The introduction of the system property is reasonable. I have questions over 
the veracity of the context described in the bug. I think it's a firewall 
configuration issue. If there is not listening end point for a TCP connect 
request, this will, result in a ConnectException being thrown in the RMI 
client. As such, for no response to happen, It means that a TCP connection has 
been established with the firewall (at the TCP level), from the RMI (JMX) 
client and that connection has not been accepted or passed through by the 
firewall application. Thus, it would appear that the shutdown of the target 
server has not been registered in the firewall.

In any case the change looks fine.
Do you have a test c.f. bug comment

-

PR Comment: https://git.openjdk.org/jdk/pull/16771#issuecomment-1824272681


Re: RFR: 8320586: update manual test/jdk/TEST.groups

2023-11-22 Thread Mark Sheppard
On Wed, 22 Nov 2023 11:32:53 GMT, Ivan Šipka  wrote:

> @mahendrachhipa @bwhuang-us

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16782#pullrequestreview-1745344250


Re: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v4]

2023-11-09 Thread Mark Sheppard
On Thu, 9 Nov 2023 05:01:35 GMT, David Holmes  wrote:

> I remain concerned that this means that a whole swag of tests will never be 
> run with virtual threads, which reduces our virtual thread test coverage. 
> Hard to quantify. Do you have any stats on how many tests this will affect 
> and which ones?

I thought the purpose of the createLimitedTestJavaProcessBuilder is for use 
where vm option are NOT  being propagated to the ProcessBuilder,  as such the 
thread factory property wont come into consideration ? the Java doc notes that 
test using createLimitedTestJavaProcessBuilder would be marked as vm.flagless 
which would suugest thread factory is not being used.
createTestJavaProcessBuilder is for use when vm args are being propagated and 
as such thread factory can be "injected" via this method. So this change is 
streamlining these use cases.
Is this a correct interpretation?

-

PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1803595745


Re: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v3]

2023-11-08 Thread Mark Sheppard
On Wed, 8 Nov 2023 02:33:29 GMT, Leonid Mesnik  wrote:

>> Test thread factory is a mode similar to VM flags and should not be used in 
>> ProcessTools.createLimitedTestJavaProcessBuilder(). Only 
>> createTestJavaProcessBuilder() should use it like jtreg VM options.
>> 
>> Adding the test thread factory requires the injection of arguments in the 
>> middle of the list. I don't think it makes sense to modify arguments in 
>> several places so I replaced it with the flag isLimited and moved all 
>> modifications in createJavaProcessBuilder().
>> 
>> Testing tier1-5.
>
> Leonid Mesnik has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   converted list to array.

test/lib/jdk/test/lib/process/ProcessTools.java line 387:

> 385:  */
> 386: 
> 387: private static String[] addTestThreadFactoryArgs(String 
> testThreadFactoryName, String[] command) {

would it be appropriate, at this juncture, to rename the method parameter 
"command" here, and throughout the associated code, to commandArgs, as the 
actual command i.e. java is added in createJavaProcessBuilder,  and the 
parameter references the java command's args?  or is that too much hassle.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16442#discussion_r1387076880


Re: RFR: JDK-8311961 Update Manual Test Groups for ATR JDK22

2023-11-07 Thread Mark Sheppard
On Mon, 6 Nov 2023 22:25:46 GMT, Bill Huang  wrote:

> Updated jdk_core_manual test groups.

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16531#pullrequestreview-1719065392


Re: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder()

2023-11-03 Thread Mark Sheppard
On Wed, 1 Nov 2023 00:06:35 GMT, Leonid Mesnik  wrote:

> Test thread factory is a mode similar to VM flags and should not be used in 
> ProcessTools.createLimitedTestJavaProcessBuilder(). Only 
> createTestJavaProcessBuilder() should use it like jtreg VM options.
> 
> Adding the test thread factory requires the injection of arguments in the 
> middle of the list. I don't think it makes sense to modify arguments in 
> several places so I replaced it with the flag isLimited and moved all 
> modifications in createJavaProcessBuilder().
> 
> Testing tier1-5.

test/lib/jdk/test/lib/process/ProcessTools.java line 451:

> 449:  * @return The ProcessBuilder instance representing the java command.
> 450:  */
> 451: private static ProcessBuilder createJavaProcessBuilder(boolean 
> isLimited, String... command) {

addThreadFactory might be a more appropriate name
However, Stefan has a more SOLID suggestion

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16442#discussion_r1382236083


Re: RFR: 8319265: TestLoadLibraryDeadlock.java fails on windows-x64 "Unable to load b.jar" [v2]

2023-11-02 Thread Mark Sheppard
On Thu, 2 Nov 2023 15:57:15 GMT, Mandy Chung  wrote:

>> The test fails on windows because unmatched comparison of `Path.toString()` 
>> vs `URL.getPath().toString()` after JDK-8317965.   A simple fix is to 
>> evaluate the JAR file path in the same way as `LoadLibraryDeadlock` does.
>
> Mandy Chung has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   review feedback

TestLoadLibraryDeadlock.java: at line 144 after the creation of the jar files 
via method createJar, it would seem prudent to test that they actually exist at 
the expected location prior to executing the actual load tests, rather than 
purely relying createJar not throwing a RuntimeException ?

-

PR Comment: https://git.openjdk.org/jdk/pull/16466#issuecomment-1791043721


Re: RFR: 8303525: Refactor/cleanup open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java [v5]

2023-10-18 Thread Mark Sheppard
On Wed, 18 Oct 2023 19:05:06 GMT, Matthew Donovan  wrote:

>> This PR refactors the SSLSocketParametersTest by removing 
>> redundant/unnecessary classes and cleans up the logic around expected 
>> exceptions.
>
> Matthew Donovan has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   added reachability fence for the rmi server object

submitted some test runs to mach5 for these changes and they have succeeded

-

Marked as reviewed by msheppar (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14932#pullrequestreview-1686055124


Re: RFR: 8303525: Refactor/cleanup open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java [v2]

2023-10-18 Thread Mark Sheppard
On Wed, 18 Oct 2023 18:20:49 GMT, Stuart Marks  wrote:

>> test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java line 77:
>> 
>>> 75: 
>>> 76: public void testRmiCommunication(RMIServerSocketFactory 
>>> serverFactory) throws Exception {
>>> 77: Hello stub = (Hello)UnicastRemoteObject.exportObject(new 
>>> HelloImpl(),
>> 
>> by not retaining an explicit reference to the test rmi server, you are 
>> exposing it to potentially being GCed during the test execution  and 
>> potentially prior to client invocation... this might sound fanciful but  
>> this has been observed in a few scenarios due to the GC  changes ... 
>> althought it doesn't seem to have been an issue, the structure of the test 
>> appears to be inherently racy, with the potential for the client invocation 
>> to getting ahead of the server with the rmi server launching background 
>> threads.
>> Anyways, caution on not retaining a sever reference and GC interference.
>
> Yes, this can still happen even if the reference to the server object is held 
> in a local variable. It's unlikely to occur, but to prevent any possibility 
> of unexpected GC, you need to use reachabilityFence. Something like:
> 
> HelloImpl server = new HelloImpl();
> try {
> // ... use server ...
> } finally {
> Reference.reachabilityFence(server);
> }
> 
> I would go ahead and add this out of an abundance of caution.

yes, good advice on reachability fence ... added some -Xcomp args to test runs 
and no failures detected

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14932#discussion_r1364479317


Re: RFR: 8303525: Refactor/cleanup open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java [v3]

2023-10-17 Thread Mark Sheppard
On Tue, 17 Oct 2023 00:02:28 GMT, Stuart Marks  wrote:

>> Matthew Donovan has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   retained a reference to the RMI server and improved naming
>
> test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java line 81:
> 
>> 79: 0, new SslRMIClientSocketFactory(), serverSocketFactory);
>> 80: HelloClient client = new HelloClient();
>> 81: client.runClient(stub);
> 
> You could even inline HelloClient here, as I don't think that having a 
> separate class is actually doing anything..

it does emphasize the classic RMI client/server scenario!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14932#discussion_r1361765127


Re: RFR: JDK-8249832: java/util/zip/DataDescriptorSignatureMissing.java uses @ignore w/o bug-id [v3]

2023-10-16 Thread Mark Sheppard
On Fri, 13 Oct 2023 12:18:53 GMT, Agnibho Hom Chowdhury  
wrote:

>> Please review this PR as a fix of 
>> [JDK-8249832](https://bugs.openjdk.org/browse/JDK-8249832). I have added the 
>> bug with after @ignore annotation.
>
> Agnibho Hom Chowdhury has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   reverted copyright year update

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16073#pullrequestreview-1680714528


Re: RFR: 8303525: Refactor/cleanup open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java [v2]

2023-10-16 Thread Mark Sheppard
On Wed, 11 Oct 2023 13:53:58 GMT, Matthew Donovan  wrote:

>> This PR refactors the SSLSocketParametersTest by removing 
>> redundant/unnecessary classes and cleans up the logic around expected 
>> exceptions.
>
> Matthew Donovan has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains four additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into socket-params
>  - Merge branch 'master' into socket-params
>  - added javadocs to new methods
>  - 8303525: Refactor/cleanup 
> open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java

the test method testServerFactory could be viewed as a misnomer, so refactor 
rename testSslServerSocketFactory would seem to be more appropriate, while the 
serverFactory parameter name to testRmiCommunication could likewise be renamed 
to sslServerSocketFactory (or at least serverSocketFactory).

-

PR Comment: https://git.openjdk.org/jdk/pull/14932#issuecomment-1764338318


Re: RFR: 8303525: Refactor/cleanup open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java [v2]

2023-10-16 Thread Mark Sheppard
On Wed, 11 Oct 2023 13:53:58 GMT, Matthew Donovan  wrote:

>> This PR refactors the SSLSocketParametersTest by removing 
>> redundant/unnecessary classes and cleans up the logic around expected 
>> exceptions.
>
> Matthew Donovan has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains four additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into socket-params
>  - Merge branch 'master' into socket-params
>  - added javadocs to new methods
>  - 8303525: Refactor/cleanup 
> open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java

test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java line 77:

> 75: 
> 76: public void testRmiCommunication(RMIServerSocketFactory 
> serverFactory) throws Exception {
> 77: Hello stub = (Hello)UnicastRemoteObject.exportObject(new 
> HelloImpl(),

by not retaining an explicit reference to the test rmi server, you are exposing 
it to potentially being GCed during the test execution  and potemntially prior 
to client invocation... this might sound fanciful but  this has been observed 
in a few scenarios due to the GC  changes ... althought it doesn't seem to have 
been an issue, the structure of the test appears to be inherently racy, with 
the potential for the client invocation to get ahead of the client with the rmi 
server launching bacground threads.
Any caution on not retaining a sever reference and GC interference.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14932#discussion_r1360500322


Re: RFR: 8315097: Rename createJavaProcessBuilder [v3]

2023-09-05 Thread Mark Sheppard
On Wed, 30 Aug 2023 09:23:55 GMT, Leo Korinth  wrote:

>> Rename createJavaProcessBuilder so that it is not used by mistake instead of 
>> createTestJvm.
>> 
>> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed 
>> -i -e 
>> "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"`
>> 
>> Then I have manually modified ProcessTools.java. In that file I have moved 
>> one version of createJavaProcessBuilder so that it is close to the other 
>> version. Then I have added a javadoc comment in bold telling:
>> 
>>/**
>>  * Create ProcessBuilder using the java launcher from the jdk to
>>  * be tested.
>>  *
>>  *  Please observe that you likely should use
>>  * createTestJvm() instead of this method because createTestJvm()
>>  * will add JVM options from "test.vm.opts" and "test.java.opts"
>>  *  and this method will not do that.
>>  *
>>  * @param command Arguments to pass to the java command.
>>  * @return The ProcessBuilder instance representing the java command.
>>  */
>> 
>> 
>> I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of 
>> the name of Utils.prependTestJavaOpts that adds those VM flags. If you have 
>> a better name I could do a rename of the method. I kind of like that it is 
>> long and clumsy, that makes it harder to use...
>> 
>> I have run tier 1 testing, and I have started more exhaustive testing.
>
> Leo Korinth has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix static import

> From talking to other HotSpot devs it is quite clear that the different names 
> lead to mistakes when writing (copy-n-pasting) tests, so I'm happy that we 
> try to figure out some way to make it more obvious when we prepend the extra 
> test options and when we don't.
> 
> I agree with @msheppar that `createTestJvm` isn't a good name and I wouldn't 
> be opposed to changing that name instead of `createJavaProcessBuilder`. 
> However, if we do rename that function I strongly urge us to also rename the 
> corresponding `executeTestJvm` function.
> 
> I also think it is too obscure that functions with _Test_ in the name prepend 
> the extra test options, and those without _Test_ don't, so I'd like to get 
> rid of that convention.
> 
> I wouldn't be opposed to a change that:
> 
> * Keeps the `createJavaProcessBuilder` name
> 
> * Renames `createTestJvm` to `createJavaProcessBuilderPrependTestOpts`
> 
> * Renames `executeTestJvm` to `executeJavaPrependTestOpts`
> 
> * Removes `createTestJava`
> 
> * Removes `executeTestJava`

I think @stefank made a reasonable suggestion which was endorsed by 
@AlanBateman 

which would remove the misconception hurdle

-

PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1707391042


Re: RFR: 8315097: Rename createJavaProcessBuilder [v3]

2023-08-31 Thread Mark Sheppard
On Thu, 31 Aug 2023 05:45:27 GMT, David Holmes  wrote:

> > So you could create a single createJavaProcessBuilder with add an 
> > additional parameter boolean addTestOpts e.g.
> > createJavaProcessBuilder(List command, boolean addTestOpts) { ... }
> 
> @msheppar that is actually where we started, and it was then split into two 
> differently named methods to "make it clear" which one included the test opts 
> without having to remember the name of the parameter that the true/false 
> argument was bound to.

cheers David  thanks for the clarification on background

-

PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1700749844


Re: RFR: 8315097: Rename createJavaProcessBuilder [v3]

2023-08-30 Thread Mark Sheppard
On Wed, 30 Aug 2023 09:23:55 GMT, Leo Korinth  wrote:

>> Rename createJavaProcessBuilder so that it is not used by mistake instead of 
>> createTestJvm.
>> 
>> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed 
>> -i -e 
>> "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"`
>> 
>> Then I have manually modified ProcessTools.java. In that file I have moved 
>> one version of createJavaProcessBuilder so that it is close to the other 
>> version. Then I have added a javadoc comment in bold telling:
>> 
>>/**
>>  * Create ProcessBuilder using the java launcher from the jdk to
>>  * be tested.
>>  *
>>  *  Please observe that you likely should use
>>  * createTestJvm() instead of this method because createTestJvm()
>>  * will add JVM options from "test.vm.opts" and "test.java.opts"
>>  *  and this method will not do that.
>>  *
>>  * @param command Arguments to pass to the java command.
>>  * @return The ProcessBuilder instance representing the java command.
>>  */
>> 
>> 
>> I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of 
>> the name of Utils.prependTestJavaOpts that adds those VM flags. If you have 
>> a better name I could do a rename of the method. I kind of like that it is 
>> long and clumsy, that makes it harder to use...
>> 
>> I have run tier 1 testing, and I have started more exhaustive testing.
>
> Leo Korinth has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix static import

I don't think  a name change is required. The method is 
createJavaProcessBuilder with a "list" of argurments and a builder is returned. 
As such, there is no inference, in the name, that test args will be propagated. 
Adding the additional java doc description should be sufficient to dispell any 
misconceptions. 
The createTestJvm is a misnomer as it returns a ProcessBulder rather than a 
Process, which is what I would expected from createTestJvm, without looking at 
its signature.

So you could create a single createJavaProcessBuilder with add an additional 
parameter boolean addTestOpts e.g.
createJavaProcessBuilder(List command, boolean addTestOpts) { ... }

createProcessBuilderIgnoreJavaTestOpt(cmdArgs)  maps to 
createJavaProcessBuilder(cmdArgs, false)

createTestJvm(cmdArgs) maps to createJavaProcessBuilder(cmdArgs, true)

But this is a lot more work.

alternatively change createTestJvm to createTestJavaProcessBuilder  or 
createJavaProcessBuilderAddTestOpts

-

PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1698985460


Re: RFR: 8314476: TestJstatdPortAndServer.java failed with "java.rmi.NoSuchObjectException: no such object in table" [v3]

2023-08-24 Thread Mark Sheppard
On Thu, 24 Aug 2023 21:38:41 GMT, Kevin Walls  wrote:

>> Several tests from test/jdk/sun/tools/jstatd are intermittent.
>> 
>> Port clashes when run at the same time on the same machine have been a 
>> problem.
>> The RMI error "no such object in table" can mean a reference on the RMI 
>> server has been GC'd.
>> Either way, jstatd fails to startup and and the test fails.
>> 
>> These both have jstatd reporting "Could not bind" which should be recognised 
>> and retried, as we already do for (some) port in use problems. i.e. 
>> JstatdTest.java: the isJstadReady() method (will correct that typo also) 
>> checks for "Port already in use", which can be printed by TCPTransport.java.
>> 
>> Should update its test to check for "Could not bind".
>> Should limit the retries also.
>
> Kevin Walls has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Modernisation

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15414#pullrequestreview-1594519549


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v13]

2023-08-24 Thread Mark Sheppard
On Thu, 24 Aug 2023 20:48:01 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   format the code

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15294#pullrequestreview-1594517760


Re: RFR: 8314476: TestJstatdPortAndServer.java failed with "java.rmi.NoSuchObjectException: no such object in table" [v2]

2023-08-24 Thread Mark Sheppard
On Thu, 24 Aug 2023 14:57:47 GMT, Kevin Walls  wrote:

>> Several tests from test/jdk/sun/tools/jstatd are intermittent.
>> 
>> Port clashes when run at the same time on the same machine have been a 
>> problem.
>> The RMI error "no such object in table" can mean a reference on the RMI 
>> server has been GC'd.
>> Either way, jstatd fails to startup and and the test fails.
>> 
>> These both have jstatd reporting "Could not bind" which should be recognised 
>> and retried, as we already do for (some) port in use problems. i.e. 
>> JstatdTest.java: the isJstadReady() method (will correct that typo also) 
>> checks for "Port already in use", which can be printed by TCPTransport.java.
>> 
>> Should update its test to check for "Could not bind".
>> Should limit the retries also.
>
> Kevin Walls has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Less specific error message, could be ports or other failure..

Observations from other "rmi" test scenarios suggest there might be some issue 
in the Registry, requiring some "deep dive" analysis. In any case, for the 
jstatd test, checking for a "name binding condition" and executing a retry 
seems reasonable

-

PR Comment: https://git.openjdk.org/jdk/pull/15414#issuecomment-1692189768


Re: RFR: 8314476: TestJstatdPortAndServer.java failed with "java.rmi.NoSuchObjectException: no such object in table" [v2]

2023-08-24 Thread Mark Sheppard
On Thu, 24 Aug 2023 14:57:47 GMT, Kevin Walls  wrote:

>> Several tests from test/jdk/sun/tools/jstatd are intermittent.
>> 
>> Port clashes when run at the same time on the same machine have been a 
>> problem.
>> The RMI error "no such object in table" can mean a reference on the RMI 
>> server has been GC'd.
>> Either way, jstatd fails to startup and and the test fails.
>> 
>> These both have jstatd reporting "Could not bind" which should be recognised 
>> and retried, as we already do for (some) port in use problems. i.e. 
>> JstatdTest.java: the isJstadReady() method (will correct that typo also) 
>> checks for "Port already in use", which can be printed by TCPTransport.java.
>> 
>> Should update its test to check for "Could not bind".
>> Should limit the retries also.
>
> Kevin Walls has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Less specific error message, could be ports or other failure..

test/jdk/sun/tools/jstatd/JstatdTest.java line 327:

> 325: int tries = 0;
> 326: try {
> 327: while (jstatdThread == null && ++tries <= 10) {

maybe use a ubiquitous constant MAX_RETRY (private static final in MAX_RETRY = 
10)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15414#discussion_r1304694460


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v10]

2023-08-23 Thread Mark Sheppard
On Tue, 22 Aug 2023 18:21:17 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   format the code

looks like a good approach 👍

-

PR Comment: https://git.openjdk.org/jdk/pull/15294#issuecomment-1690389811


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v10]

2023-08-22 Thread Mark Sheppard
On Tue, 22 Aug 2023 18:21:17 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   format the code

Such a restriction appears to be a flaw in the design of SocketFactory and its 
use in jndi ldap.
Consider the existing updated code  and the inference of your assertion:
 ```
   // create the socket
if (connectTimeout > 0) {
InetSocketAddress endpoint =
createInetSocketAddress(host, port);
// unconnected socket
socket = factory.createSocket();
// connected socket
socket.connect(endpoint, connectTimeout);
if (debug) {
System.err.println("Connection: creating socket with " +
"a timeout using supplied socket factory");
}
} else {
// continue (but ignore connectTimeout)
if (socket == null) {
// connected socket
socket = factory.createSocket(host, port);
if (debug) {
System.err.println("Connection: creating socket using " +
"supplied socket factory");
}
}
}``

Your assertion is, "If the custom factory does not implement the method 
creatSocket (no method parameters), it will fail for creating the socket."
Thus, the first block of the, if (connectTimeout > 0) condition, is 
problematic. This assumes that if the timeout is > 0, then the factory will 
implement the no args factory method createSocket(). BUT that means that an 
implementor  of a customer factory would need to have knowledge of the internal 
implemntation of this class, and ensure no args factory method createSocket is 
implemented. If an implementor has knowledge to implement the no args factory 
method when timeout > 0, then they can also have knowldge to implement the no 
args factory method when the timeout is 0 or < 0, as suggested by the 
restructure.

Additionally, this would not prohibit a restructure of the non factory 
createConnectionSocket.

-

PR Comment: https://git.openjdk.org/jdk/pull/15294#issuecomment-1689079814


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v10]

2023-08-22 Thread Mark Sheppard
On Tue, 22 Aug 2023 18:21:17 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   format the code

Resuggesting restructure to createConnectionSocket methods -- to instantiate 
Socket explicitly in one place and to explicitly use connect in all cases

`
   ```
private Socket createConnectionSocket (String host, int port, int 
connectTimeout) throws Exception {
InetSocketAddress endpoint = createInetSocketAddress(host, port);
Socket socket = new Socket();

if (connectTimeout > 0) {
if (debug) {
System.err.println("Connection: creating socket with " +
"a timeout");
}
socket.connect(endpoint, connectTimeout);
} else {
if (debug) {
System.err.println("Connection: creating socket");
}
// connected socket
socket.connect(endpoint);
}
return socket;
}


// create the socket with the provided factory
private Socket createConnectionSocket(String host, int port, String 
socketFactory,
   int connectTimeout) throws Exception 
{
@SuppressWarnings("unchecked")
Class socketFactoryClass = (Class)
Obj.helper.loadClass(socketFactory);
Method getDefault =
socketFactoryClass.getMethod("getDefault", new Class[]{});
SocketFactory factory = (SocketFactory) getDefault.invoke(null, new 
Object[]{});
InetSocketAddress endpoint =
  createInetSocketAddress(host, port);
// create unconnected socket
Socket socket = factory.createSocket();

// create the socket
if (connectTimeout > 0) {
if (debug) {
System.err.println("Connection: creating socket with " +
"a timeout using supplied socket factory");
}
// connected socket
socket.connect(endpoint, connectTimeout);
} else {
if (debug) {
System.err.println("Connection: creating socket using " +
"supplied socket factory");
}
// connected socket
socket.connect(endpoint);
}
return socket;
}


`

-

PR Comment: https://git.openjdk.org/jdk/pull/15294#issuecomment-1689007121


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v3]

2023-08-17 Thread Mark Sheppard
On Wed, 16 Aug 2023 23:11:11 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   updated the code according to the review

you now have two methods closing a socket. Can these be refactored and a single 
closeSocket method used in both cases ?

method overloading could have been used for the new create socket method

you might consider some restructuring of the call flow in each:

   private Socket createSocketWithoutFactory (String host, int port, int 
connectTimeout) throws Exception {
Socket socket = null;
InetSocketAddress endpoint = createInetSocketAddress(host, port);
socket = new Socket();

if (connectTimeout > 0) {
if (debug) {
System.err.println("Connection: creating socket with " +
"a timeout");
}
socket.connect(endpoint, connectTimeout);
} else {
if (debug) {
System.err.println("Connection: creating socket");
}
// connected socket
socket.connect(endpoint);
}
return socket;
}


// create the socket with the provided factory
private Socket createSocketWithFactory(String host, int port, String 
socketFactory,
   int connectTimeout) throws Exception 
{
Socket socket = null;
@SuppressWarnings("unchecked")
Class socketFactoryClass = (Class)
Obj.helper.loadClass(socketFactory);
Method getDefault =
socketFactoryClass.getMethod("getDefault", new Class[]{});
SocketFactory factory = (SocketFactory) getDefault.invoke(null, new 
Object[]{});
InetSocketAddress endpoint =
  createInetSocketAddress(host, port);
// create unconnected socket
socket = factory.createSocket();

// create the socket
if (connectTimeout > 0) {
if (debug) {
System.err.println("Connection: creating socket with " +
"a timeout using supplied socket factory");
}
// connected socket
socket.connect(endpoint, connectTimeout);
} else {
if (debug) {
System.err.println("Connection: creating socket using " +
"supplied socket factory");
}
// connected socket
socket.connect(endpoint);
}
return socket;
}

-

PR Comment: https://git.openjdk.org/jdk/pull/15294#issuecomment-1681979781


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v3]

2023-08-16 Thread Mark Sheppard
On Wed, 16 Aug 2023 23:11:11 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   updated the code according to the review

src/java.naming/share/classes/com/sun/jndi/ldap/Connection.java line 340:

> 338: 
> 339: // create the socket without the factory
> 340: private Socket createSocketWithoutFactory (String host, int port, 
> int connectTimeout) throws Exception {

i think the logic can be streamlined as per comment in main conversation

src/java.naming/share/classes/com/sun/jndi/ldap/Connection.java line 381:

> 379: createInetSocketAddress(host, port);
> 380: // unconnected socket
> 381: socket = factory.createSocket();

i think the logic can be streamlined as per comment in main conversation

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15294#discussion_r1296543226
PR Review Comment: https://git.openjdk.org/jdk/pull/15294#discussion_r1296543879


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v3]

2023-08-16 Thread Mark Sheppard
On Wed, 16 Aug 2023 23:11:11 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   updated the code according to the review

src/java.naming/share/classes/com/sun/jndi/ldap/Connection.java line 287:

> 285: // create the socket with factory
> 286: if (socketFactory != null) {
> 287: socket = createSocketWithFactory (host, port, 
> socketFactory, connectTimeout) ;

you could use method overloading :-)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15294#discussion_r1296540658


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v3]

2023-08-16 Thread Mark Sheppard
On Wed, 16 Aug 2023 23:11:11 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   updated the code according to the review

together with a refactor extact methods on the createSocket method, the 
existing logic could be simplified something like as follows:


InetSocketAddress endpoint =
   createInetSocketAddress(ldapServerHost, ldapServerPort);

if (socketFactory != null) {
 // further refine with refactor extract method 
createConnectionSocket(InetSocketAddress serverEndpoint, String 
factoryClassName, int timeout)

// create the factory

@SuppressWarnings("unchecked")
Class socketFactoryClass =
(Class) 
Obj.helper.loadClass(socketFactory);
Method getDefault =
socketFactoryClass.getMethod("getDefault", new 
Class[]{});
SocketFactory factory = (SocketFactory) getDefault.invoke(null, 
new Object[]{});
 
// create unconnected socket
socket = factory.createSocket();

   if (connectTimeout > 0) { 
  if (debug) {
System.err.println("Connection.createSocket: creating 
socket with " +
"a timeout using supplied socket factory");
}

// connect socket
socket.connect(endpoint, connectTimeout);
   } else {
   if (debug) {
System.err.println("Connection.createSocket: creating 
socket using " +
"supplied socket factory");
}
// connect socket
socket.connect(endpoint);
   }
 } else { // NO SocketFactory
// further refine with refactor extract method 
createConnectionSocket(InetSocketAddress serverEndpoint,  int timeout)
socket = new Socket();
if (connectTimeout > 0) {
if (debug) {
System.err.println("Connection.createSocket: creating 
socket with " +
"a timeout");
}
socket.connect(endpoint, connectTimeout);
} else {
   if (debug) {
System.err.println("Connection.createSocket: creating 
socket");
   }
   socket.connect(endpoint);
  }

-

PR Comment: https://git.openjdk.org/jdk/pull/15294#issuecomment-1681403755


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v2]

2023-08-16 Thread Mark Sheppard
On Wed, 16 Aug 2023 23:10:02 GMT, Weibing Xiao  wrote:

>> test/jdk/com/sun/jndi/ldap/LdapSSLHandshakeFailureTest.java line 173:
>> 
>>> 171: public void run() {
>>> 172: try (Socket socket = serverSocket.accept()) {
>>> 173: Thread.sleep(1);
>> 
>> What's the purpose of the sleep ? 
>> Regardless, based on the test semantics alluded in the test name, the server 
>> should never enter the read block. So is this code redundant? Or is it there 
>> just in case the accept and the handshake succeeds?
>
> It will slow down the handshake process after the socket is created and 
> connected with the server. The bug is showing the handshake failure and the 
> socket is not closed properly.

OK, So the SSL handshake must take place within the getInputStream call rather 
than the accept, as such you are triggering a timeout on the SSL handshake.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15294#discussion_r1296530516


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v3]

2023-08-16 Thread Mark Sheppard
On Wed, 16 Aug 2023 17:24:57 GMT, Aleksei Efimov  wrote:

>> Weibing Xiao has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   updated the code according to the review
>
> src/java.naming/share/classes/com/sun/jndi/ldap/Connection.java line 372:
> 
>> 370: // 8314063 the socket is not closed after the failure of 
>> handshake
>> 371: if (socket != null && !socket.isClosed()) {
>> 372: socket.close();
> 
> The original exception can be lost if `socket.close()` fails with 
> `IOException` here

a closeConnectionSocket (IIRC) was introduced in previous change which handled 
any IOExceptions on the close. This could be used here.
But some how that seems to have been renamed !!

709 private void closeOpenedSocket() {
 710 try {
 711 sock.close();
 712 } catch (IOException ioEx) {
 713 if (debug) {
 714 System.err.println("Connection.closeConnectionSocket: 
Socket close problem: " + ioEx);
 715 System.err.println("Socket isClosed: " + sock.isClosed());
 716 }
 717 }
 718 }

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15294#discussion_r1296519158


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v2]

2023-08-16 Thread Mark Sheppard
On Wed, 16 Aug 2023 18:10:04 GMT, Aleksei Efimov  wrote:

>> Weibing Xiao has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   update the test code
>
> src/java.naming/share/classes/com/sun/jndi/ldap/Connection.java line 369:
> 
>> 367: }
>> 368: }
>> 369: } catch (Exception e) {
> 
> The code wrapped in this try-catch block can throw unchecked exceptions, for 
> example `SecurityException` thrown by `Socket.connect`. For such cases the 
> newly created socket remain open.

But the catch is on generalized Exception, and SecutityException is a subclass, 
so it is covered, n'est-ce pas?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15294#discussion_r1296515618


Re: RFR: 8314063 : The socket is not closed in Connection::createSocket when the handshake failed for LDAP connection [v2]

2023-08-16 Thread Mark Sheppard
On Wed, 16 Aug 2023 19:12:54 GMT, Weibing Xiao  wrote:

>> Please refer to JDK-8314063.
>> 
>> The failure scenario is due to the setting of connection timeout. It is 
>> either too small or not an optimal value for the system. When the client 
>> tries to connect to the server with LDAPs protocol. It requires the 
>> handshake after the socket is created and connected, but it fails due to 
>> connection timeout and leaves the socket open. It is not closed properly due 
>> to the exception handling in the JDK code.
>> 
>> The change is adding a try/catch block and closing the socket in the catch 
>> block,  and the format of the code got changed consequently.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   update the test code

test/jdk/com/sun/jndi/ldap/LdapSSLHandshakeFailureTest.java line 173:

> 171: public void run() {
> 172: try (Socket socket = serverSocket.accept()) {
> 173: Thread.sleep(1);

What's the purpose of the sleep ? 
Regardless, based on the test semantics alluded in the test name, the server 
should never enter the read block. So is this code redundant? Or is it there 
just in case the accept and the handshake succeeds?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15294#discussion_r1296511511


Re: RFR: 8313657 : com.sun.jndi.ldap.Connection.cleanup does not close connections on SocketTimeoutErrors [v6]

2023-08-14 Thread Mark Sheppard
On Mon, 14 Aug 2023 13:27:05 GMT, Weibing Xiao  wrote:

>> com.sun.jndi.ldap.Connection::leanup does not close the underlying socket if 
>> the is an IOException generation when the output stream was flushing the 
>> buffer.
>> 
>> Please refer to the bug https://bugs.openjdk.org/browse/JDK-8313657.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   removed the unused variable

thanks for incorporating the feedback
LGTM

-

Marked as reviewed by msheppar (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15143#pullrequestreview-1577263299


Re: RFR: 8313657 : com.sun.jndi.ldap.Connection.cleanup does not close connections on SocketTimeoutErrors [v3]

2023-08-07 Thread Mark Sheppard
On Fri, 4 Aug 2023 19:09:58 GMT, Weibing Xiao  wrote:

>> com.sun.jndi.ldap.Connection::leanup does not close the underlying socket if 
>> the is an IOException generation when the output stream was flushing the 
>> buffer.
>> 
>> Please refer to the bug https://bugs.openjdk.org/browse/JDK-8313657.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   update error message

test/jdk/com/sun/jndi/ldap/SocketCloseTest.java line 62:

> 60: props.put(Context.PROVIDER_URL, 
> "ldap://localhost:1389/o=example";);
> 61: props.put("java.naming.ldap.factory.socket", 
> CustomSocketFactory.class.getName());
> 62: try {

this will execute twice: once in the agentvm main and also, as desired, in the 
test JVM main.
The agentvm main sees a ClassNotFoundException being thrown and then the output 
of "The socket was not closed. " in the log c.f. other comments and bug 
comments for further details to avoid execution in agentvm or use of othervm 
test mode execution

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15143#discussion_r1285675453


Re: RFR: 8313657 : com.sun.jndi.ldap.Connection.cleanup does not close connections on SocketTimeoutErrors [v3]

2023-08-07 Thread Mark Sheppard
On Fri, 4 Aug 2023 19:09:58 GMT, Weibing Xiao  wrote:

>> com.sun.jndi.ldap.Connection::leanup does not close the underlying socket if 
>> the is an IOException generation when the output stream was flushing the 
>> buffer.
>> 
>> Please refer to the bug https://bugs.openjdk.org/browse/JDK-8313657.
>
> Weibing Xiao has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   update error message

I ran the test multiple times in mach5 and it executes fine. However, while 
examining the log output I noticed the "The socket was not closed. " output. 
This is a little confusing. While looking at the test and exploring the origins 
of this outout, we see that the test is essentially an othervm execution. As 
such, I restructured the test to othervm and it executes with some informative 
output and eliminates the discombobulating output.

I have left a comment in the JBS bug item with some further details and changes 
for othervm. This gives some imformative output as the test executes.

The origins of the output is the main try block executing in the agentvm and 
the throwing of a ClassNotFoundException.

If you don't wish to do otherrvm and to avoid the  "The socket was not closed. 
" output, and if wish to retain the same structure, then consider adding a run 
command to the test with an arg, and then a conditional on the core test logic 
e.g.

* @run main SocketCloseTest launcher

and in the main method

public static void main(String[] args) throws Exception {
Hashtable props = new Hashtable<>();

if (args.length == 0)  { // only executed in the launched JVM
props.put(Context.INITIAL_CONTEXT_FACTORY, 
"com.sun.jndi.ldap.LdapCtxFactory");
props.put(Context.PROVIDER_URL, "ldap://localhost:1389/o=example";);
props.put("java.naming.ldap.factory.socket", 
CustomSocketFactory.class.getName());
try {
final DirContext ctx = new InitialDirContext(props);
} catch (Exception e) {
if (CustomSocketFactory.customSocket.closeMethodCalledCount() > 0) {
System.out.println(SOCKET_CLOSED_MSG);
} else {
System.out.println(SOCKET_NOT_CLOSED_MSG);
}
}
}

OutputAnalyzer outputAnalyzer = 
ProcessTools.executeTestJvm("SocketCloseTest_II");
outputAnalyzer.stdoutShouldContain(SOCKET_CLOSED_MSG);
outputAnalyzer.stdoutShouldNotContain(SOCKET_NOT_CLOSED_MSG);
outputAnalyzer.stdoutShouldContain(BAD_FLUSH);
}

-

PR Comment: https://git.openjdk.org/jdk/pull/15143#issuecomment-1667550730


Re: RFR: 8313657 : com.sun.jndi.ldap.Connection.cleanup does not close connections on SocketTimeoutErrors [v3]

2023-08-07 Thread Mark Sheppard
On Mon, 7 Aug 2023 06:42:54 GMT, Vyom Tewari  wrote:

>> Weibing Xiao has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   update error message
>
> src/java.naming/share/classes/com/sun/jndi/ldap/Connection.java line 702:
> 
>> 700: 
>> 701: // close socket
>> 702: private void closeConnectionSocket() {
> 
> 'closeSocketConnection' will be more meaningful.

the suggested name was to convey the closing of the  Connection's socket, not a 
socket connection per se.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15143#discussion_r1285577687


Re: RFR: 8313657 : com.sun.jndi.ldap.Connection.cleanup does not close connections on SocketTimeoutErrors

2023-08-04 Thread Mark Sheppard
On Thu, 3 Aug 2023 17:32:43 GMT, Weibing Xiao  wrote:

> com.sun.jndi.ldap.Connection::leanup does not close the underlying socket if 
> the is an IOException generation when the output stream was flushing the 
> buffer.
> 
> Please refer to the bug https://bugs.openjdk.org/browse/JDK-8313657.

why not restructure the finally block a little ... refactor extract methods
flushOutputStream, 
closeConnectionSocket 
... should the outputStream be closed in this finally block, also?

} finally {
flushOutputStream(outStream);
closeConnectionSocket(sock);
try {
unpauseReader();
} catch (IOException ie) {
if (debug)
System.err.println("Connection.cleanup: problem 
with unpuaseReader " + ie);
}
  ...


private void flushOutputStream (OutputStream outputStream) {
try {
outStream.flush();
outStream.close();
} catch (IOException ioEx) {
if (debug)
System.err.println("Connection.flushOutputStream: OutputStream 
flush or close problem " + ioEx);
}
}

private void closeConnectionSocket (Socket sock) {
//bug 8313657, socket not closed util GC running
try {
sock.close();
} catch (IOException ioEx) {
if (debug)
System.err.println("ConnectioncloseConnectioSocket: problem 
closing socket: " + ioEx);
}
}

-

PR Comment: https://git.openjdk.org/jdk/pull/15143#issuecomment-1665777500


Re: RFR: JDK-8295859 Update Manual Test Groups [v3]

2023-04-12 Thread Mark Sheppard
On Tue, 11 Apr 2023 22:06:30 GMT, Bill Huang  wrote:

>> The purpose of this task is to add the difference between -manual jdk_core 
>> and jdk_core_manual to the jdk_core_manual test goal. Furthermore, in order 
>> to streamline the manual test execution process, a new test group called 
>> jdk_core_manual_human has been created for manual tests that demand extra 
>> preparation or test input.
>
> Bill Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Renamed jdk_core_manual to jdk_core_manual_interactive

Marked as reviewed by msheppar (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/12840#pullrequestreview-1381338834


Re: RFR: JDK-8295859 Update Manual Test Groups [v2]

2023-04-11 Thread Mark Sheppard
On Mon, 20 Mar 2023 23:29:25 GMT, Bill Huang  wrote:

>> The purpose of this task is to add the difference between -manual jdk_core 
>> and jdk_core_manual to the jdk_core_manual test goal. Furthermore, in order 
>> to streamline the manual test execution process, a new test group called 
>> jdk_core_manual_human has been created for manual tests that demand extra 
>> preparation or test input.
>
> Bill Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Implemented review comments.

test/jdk/TEST.groups line 600:

> 598: 
> 599: jdk_core_manual = \
> 600: :jdk_core_manual_no_input \

jdk_core_manual_human (as it is now called) should be part of jdk_core_manual, 
as jdk_core_manual will be used in the corelibs-atr job

test/jdk/TEST.groups line 643:

> 641: sun/security/tools/jarsigner/compatibility/Compatibility.java
> 642: 
> 643: jdk_core_manual_human = \

jdk_core_manual_requires_human_input gave a good indication of what is needed 
with this test group ... maybe jdk_core_manual_interactive if we don't want to 
use requires_human_input

-

PR Review Comment: https://git.openjdk.org/jdk/pull/12840#discussion_r1163314338
PR Review Comment: https://git.openjdk.org/jdk/pull/12840#discussion_r1163316682


Re: RFR: 8301737: java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with -Xcomp [v4]

2023-02-10 Thread Mark Sheppard
On Fri, 10 Feb 2023 17:00:47 GMT, Roger Riggs  wrote:

>> yes, this mitigates the issue within the test,  and alls good.
>> BUT it still leave an open question as to why the behaviour of the test is 
>> different for the -Xcomp mode and the interpretative mode?
>> I think it would be reasonable to expect both modes to behave the same. As 
>> such, that the compile mode should generate a fence or whatever to guarantee 
>> that the impl remain a strong reference until it goes out of scope at the 
>> end of the try block ?
>> This is the case in non -Xcomp mode, but in -Xcomp the status of the impl 
>> reference is accelerate to to being unreachable and a candidate for garbage 
>> collection ?
>> Is it not a hotspot compiler issue or the component area responsible for 
>> -Xcomp ?
>
> @msheppar It depends on the timing of GC and the unpredictable interactions 
> between the compiler and gc.
> If gc is particularly aggressive (as may be stimulated by -Xcomp), it may 
> determine that the impl is no longer referenced even though it seems to be in 
> the middle of the method.

Would this behaviour be covered by the JLS and JVM specs ?

-

PR: https://git.openjdk.org/jdk/pull/12399


Re: RFR: 8301737: java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with -Xcomp [v4]

2023-02-10 Thread Mark Sheppard
On Fri, 10 Feb 2023 01:53:18 GMT, SUN Guoyun  wrote:

>> test/jdk/java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java 
>> line 74:
>> 
>>> 72: public void useExportObject(String name, Object obj, int 
>>> expectedFilterCount) throws RemoteException {
>>> 73: try {
>>> 74: impl = RemoteImpl.create();
>> 
>> I'd prefer to use  ReachabilityFence at the end of the try block to keep the 
>> nested scoping.
>> ```Reference.reachabilityFence(impl);```
>
> I just tested it and it really be OK, I'll submit a new commit.

yes, this mitigates the issue within the test,  and alls good.
BUT it still leave an open question as to why the behaviour of the test is 
different for the -Xcomp mode and the interpretative mode?
I think it would be reasonable to expect both modes to behave the same. As 
such, that the compile mode should generate a fence or whatever to guarantee 
that the impl remain a strong reference until it goes out of scope at the end 
of the try block ?
Thus is the case in non -Xcomp mode, but in -Xcomp the status of the impl 
reference is accelerate to to being unreachable and a candidate for garbage 
collection ?
Is it not a hotspot compiler issue or the component area responsible for -Xcomp 
?

-

PR: https://git.openjdk.org/jdk/pull/12399


Re: RFR: 8301737: java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with -Xcomp [v4]

2023-02-09 Thread Mark Sheppard
On Thu, 9 Feb 2023 08:48:17 GMT, SUN Guoyun  wrote:

>> Hi all,
>> When -Xcomp be used, this testcase will use more codecaches, causing the GC 
>> to be triggered early, then causing this test failed on LoongArch64 
>> architecture.
>> 
>> This PR fix the issue, Please help review it.
>> 
>> Thanks.
>
> SUN Guoyun has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8301737: 
> java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with 
> -Xcomp

As you have shown moving the impl to a member variable of the test, allows the 
test to pass and means that when the reaper runs it doesn't find a weak 
reference for the RemoteImpl. We can see this in trace below ... there are no 
remove call on ObjectTable

Feb 09, 2023 12:03:13 PM sun.rmi.transport.Target pinImpl
FINER: MainThread: strongRef = sun.rmi.transport.DGCImpl@7cbf0dfe
Feb 09, 2023 12:03:13 PM sun.rmi.transport.DGCImpl$2 run
FINER: MainThread: add object [0:0:0, 2]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7fff, -828094982909796927]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7ffd, 1016769954862579574]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7ffb, 5046449780262743815]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7ff9, -5609899494232426191]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7ff7, -3723530497788482934]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7ff5, 26547078264881776]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7ff3, 893326842025149942]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7ff1, -5974039217477050622]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7fef, -8933988202720453193]
Feb 09, 2023 12:03:13 PM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [e560232:186360eee7e:-7fed, -3244609101583748901]
STATUS:Passed.


I have an instrumented build and we can see that the Reaper's  reapQueue is 
empty, no weak refreences, when it runs first time

ObjectTable::incrementKeepAliveCount Reaper Thread started 
ObjectTable::Reaper::run ENTER thread Thread[#23,RMI Reaper,5,system]
ObjectTable::Reaper::run check reapQueue Thread[#23,RMI Reaper,5,system]
ObjectTable::incrementKeepAliveCount gcInterval == 360


> After `-Dsun.rmi.dgc.logLevel=VERBOSE`, test run passed. I don't understand 
> why, you know?

as it's a subtle race condition within the test, the logging can delay when the 
removal event happens

If declaring impl as a member variable works, and mitigates the failure, then 
that's a reasonable work around ... I expect Roger will comment on this further.

-

PR: https://git.openjdk.org/jdk/pull/12399


Re: RFR: 8301737: java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with -Xcomp [v4]

2023-02-09 Thread Mark Sheppard
On Thu, 9 Feb 2023 08:48:17 GMT, SUN Guoyun  wrote:

>> Hi all,
>> When -Xcomp be used, this testcase will use more codecaches, causing the GC 
>> to be triggered early, then causing this test failed on LoongArch64 
>> architecture.
>> 
>> This PR fix the issue, Please help review it.
>> 
>> Thanks.
>
> SUN Guoyun has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8301737: 
> java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with 
> -Xcomp

ok thanks for the feedback and trying my suggestion. I would put a D in front 
of GC. The issue relates to the DGC in RMI, and it is possible that a race 
condition could occur in the test.
If you add -Dsun.rmi.dgc.logLevel=VERBOSE as follows and execute the test this 
should generate some informative trace

@run testng/othervm -Dsun.rmi.dgc.logLevel=VERBOSE FilterUROTest

generates trace as follows:

Feb 09, 2023 9:50:57 AM sun.rmi.transport.Target pinImpl
FINER: MainThread: strongRef = sun.rmi.transport.DGCImpl@7cbf0dfe
Feb 09, 2023 9:50:57 AM sun.rmi.transport.DGCImpl$2 run
FINER: MainThread: add object [0:0:0, 2]
Feb 09, 2023 9:50:57 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7fff, 5531192820167141595]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7ffd, 8746623791996069441]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.ObjectTable$Reaper run   
* REAPER RUNNING
FINER: RMI Reaper: remove object [62fd29ac:1863595d6b8:-7fff, 
5531192820167141595]  * obj removed
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7ffb, 1056299151672018720]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7ff9, 8876407693706706001]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7ff7, -7965896295360386187]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7ff5, -7624739283438349379]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7ff3, 7826842570026353870]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7ff1, 2393465261706369534]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7fef, -151694713694850]
Feb 09, 2023 9:50:58 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [62fd29ac:1863595d6b8:-7fed, 2782915653950164382]
STATUS:Passed.

-

PR: https://git.openjdk.org/jdk/pull/12399


Re: RFR: 8301737: java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with -Xcomp [v3]

2023-02-08 Thread Mark Sheppard
On Tue, 7 Feb 2023 08:27:57 GMT, SUN Guoyun  wrote:

>> Hi all,
>> When -Xcomp be used, this testcase will use more codecaches, causing the GC 
>> to be triggered early, then causing this test failed on LoongArch64 
>> architecture.
>> 
>> This PR fix the issue, Please help review it.
>> 
>> Thanks.
>
> SUN Guoyun has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8301737: 
> java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with 
> -Xcomp

if this failure can be reproduced consistently, you could add the following 
properties to the run command of the test
-Dsun.rmi.runtime.logLevel=VERBOSE -Djava.rmi.server.logCalls=true 
-Dsun.rmi.server.logLevel=VERBOSE -Dsun.rmi.transport.logLevel=VERBOSE 
-Dsun.rmi.transport.tcp.logLevel=VERBOSE -Dsun.rmi.dgc.logLevel=VERBOSE
i.e.
@run testng/othervm -Dsun.rmi.runtime.logLevel=VERBOSE 
-Djava.rmi.server.logCalls=true -Dsun.rmi.server.logLevel=VERBOSE 
-Dsun.rmi.transport.logLevel=VERBOSE -Dsun.rmi.transport.tcp.logLevel=VERBOSE 
-Dsun.rmi.dgc.logLevel=VERBOSE FilterUROTes

these will produce log traces on stderr, which is recorded in the test's jtr 
file, and in the event of the failure occuring this trace might be useful in 
the diagnosis of the issue, for example, whether or not the removeTarget method 
is being invoked on the ObjectTable. You will see objects being added with 
trace such as:

FINER: MainThread: add object [0:0:0, 2]

FINER: MainThread: add object [-3d215748:18634027149:-7fff, 248225399703400139]

Feb 09, 2023 2:30:21 AM sun.rmi.transport.Transport exportObject
FINER: MainThread: add object [-3d215748:18634027149:-7ffd, 651168978636195090]

-

PR: https://git.openjdk.org/jdk/pull/12399


Re: RFR: 8301737: java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with -Xcomp [v3]

2023-02-08 Thread Mark Sheppard
On Tue, 7 Feb 2023 08:27:57 GMT, SUN Guoyun  wrote:

>> Hi all,
>> When -Xcomp be used, this testcase will use more codecaches, causing the GC 
>> to be triggered early, then causing this test failed on LoongArch64 
>> architecture.
>> 
>> This PR fix the issue, Please help review it.
>> 
>> Thanks.
>
> SUN Guoyun has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8301737: 
> java/rmi/server/UnicastRemoteObject/serialFilter/FilterUROTest.java fail with 
> -Xcomp

This change seems to be a bit dubious, it may solve your observed issue, but 
could be masking a deeper issue within the GC or the code generated as a result 
the -Xcomp vm arg, or an issue within TestNg, or possibly some obscure issue in 
RMI.
As I understand it, in simple terms, an object instance is not available for GC 
until there are no references held to it i.e. when it is not reachable and all 
references have been discarded.
In this scenario the test method, useExportObject, holds a reference to a 
RemoteImpl impl which has the scope of the try block, that should be sufficient 
for it to retain its liveness for it to be still available when the client 
invokes the fliterCount method.

So maybe extend the scope of the variable by moving the declaration of the 
variable impl outside of the try block, it is still assigned within the try 
block, but now its scope is the full duration of the test method, and not just 
that of the try block. As such, it should only be available for GC when the 
test method completes and it goes out of scope.

-

PR: https://git.openjdk.org/jdk/pull/12399


Re: RFR: 8301214: Adjust handshakeTimeout value in test HandshakeTimeout.java after 8189338 [v2]

2023-02-07 Thread Mark Sheppard
On Tue, 7 Feb 2023 09:38:13 GMT, Daniel Jeliński  wrote:

>> Please review this patch that reduces the socket timeout used in 
>> HandshakeTimeout test to its minimum value of 1 millisecond.
>> 
>> This change makes the test complete 10 seconds faster; before this change it 
>> took 5 seconds for the handshake to timeout, and the test attempts 2 
>> handshakes.
>> 
>> The change also makes the test more likely to pass when it has to compete 
>> with other tests for CPU time.
>
> Daniel Jeliński has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains three additional 
> commits since the last revision:
> 
>  - Restructure test
>  - Merge remote-tracking branch 'origin' into rmi-test-timeout
>  - Reduce socket timeout

Marked as reviewed by msheppar (Reviewer).

the restructuring is good idea. I never think of running clint and server in 
same thread, but it will eliminate any possibility of race between main thread 
and Connector thread. so job done (LGTM)

-

PR: https://git.openjdk.org/jdk/pull/12292


Re: RFR: 8301214: Adjust handshakeTimeout value in test HandshakeTimeout.java after 8189338

2023-01-30 Thread Mark Sheppard
On Mon, 30 Jan 2023 11:06:05 GMT, Daniel Jeliński  wrote:

> Please review this patch that reduces the socket timeout used in 
> HandshakeTimeout test to its minimum value of 1 millisecond.
> 
> This change makes the test complete 10 seconds faster; before this change it 
> took 5 seconds for the handshake to timeout, and the test attempts 2 
> handshakes.
> 
> The change also makes the test more likely to pass when it has to compete 
> with other tests for CPU time.

What is the motivation for reducing the timeout for the handshake? 

I think the failure in the bug hints at a few potential issues that may be 
worth a deeper look.

There are two runs of this test ... plain socket and ssl socket. The failures 
is on the second ssl scenario. 

This is unexpected and question arises why did it happen. Is it an obscure 
issue in the RMI layer, or in the SSL layer, or perhaps in the native code 
associated with "select" for the timeout on the read/write ? 

Reducing the handshake timeout for the first scenario seems ok. But reducing 
the handshake to 1 millisecond for the ssl scenario maybe masking an issue with 
the SSL timers or within the RMI layer -- the failure could be suggesting some 
deeper issues. 

What the failure suggestion is that additional diagnostics would be helpful for 
determining how the main thread has reached the no timeout condition: 
 if (connector.exception == null) { 
throw new RuntimeException( 
"TEST FAILED: remote call did not time out"); 
} else { 

Did the Connector thread actually run? Is it in the RTR state waiting to be 
scheduled when the test completed? is it still alive? should additional state 
be added to indicate that the Connector thread has started? Should the 
Connector thread and the main thread synchronise on a countDownLatch before the 
thread join ? 

Thus, the failure suggests that a bit of restructuring would be useful, to add 
a bit of Connect run state, or check the state of the Connector thread prior to 
join, and some synchronization between the Connector thread and the main thread 
before the join to ensure that the Connector thread has started. The main 
thread waits for the Connector thread to actually start.

-

PR: https://git.openjdk.org/jdk/pull/12292


Re: RFR: 8030616: sun/management/jmxremote/bootstrap/RmiBootstrapTest fails intermittently with cannot find a free port [v2]

2022-10-20 Thread Mark Sheppard
On Thu, 20 Oct 2022 10:01:19 GMT, Jaikiran Pai  wrote:

>> Can I please get a review of this test only change which proposes to fix the 
>> recent intermittent failures in `RmiBootstrapTest` reported in 
>> https://bugs.openjdk.org/browse/JDK-8030616?
>> 
>> The test has been intermittently failing with `cannot find a free port after 
>> 10 tries`.  The tests uses the `jdk.test.lib.Utils.getFreePort()` utility 
>> method to get a free port to use in the tests. That port is then used to 
>> create and bind a JMX connector server. The issue resides in the fact that 
>> the `getFreePort` utility uses loopback address to identify a free port, 
>> whereas the JMX connector server uses that identified port to bind to a 
>> non-loopback address - there's logic in `sun.rmi.transport.tcp.TCPEndpoint` 
>> line 117 which calls `InetAddress.getLocalHost()` which can and does return 
>> a non-loopback address. This effectively means that the port that was 
>> identified as free (on loopback) may not really be free on (some other 
>> address) and a subsequent attempt to bind against it by the connector server 
>> will end up with a `BindException`.
>> 
>> Data collected in failures on the CI system has shown that this is indeed 
>> the case and the port that was chosen (for loopback) as free was already 
>> used by another process on a different address. The test additionally has 
>> logic to attempt retries upto a maximum of 10 times to find a new free port 
>> on such `BindException`. Turns out the next free port (on loopback) returned 
>> by `jdk.test.lib.Utils.getFreePort()` is incremental and it so happens that 
>> the systems where this failed had a process listening on a range of 10 to 12 
>> ports, so these too ended up with `BindException` when the JMX connector 
>> server used that port for a different address.
>> 
>> The commit here makes sure that the JMX connector server in the test is now 
>> configured to loopback address as the address to bind to. That way the test 
>> uses the correct address (loopback) on which the port was free.
>> 
>> The commit also has a change to the javadoc of the test utility 
>> `jdk.test.lib.Utils.getFreePort()` to clarify that it returns a free port on 
>> loopback address. This now matches what the implementation of that method 
>> does.
>> 
>> Multiple test runs after this change hasn't yet run into the failure.
>
> Jaikiran Pai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains three additional 
> commits since the last revision:
> 
>  - Merge latest from master branch
>  - remove unintentionally committed file
>  - 8030616: sun/management/jmxremote/bootstrap/RmiBootstrapTest fails 
> intermittently with cannot find a free port

yes, I think we found the change will be benefial

-

Marked as reviewed by msheppar (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10322


Re: RFR: 8030616: sun/management/jmxremote/bootstrap/RmiBootstrapTest fails intermittently with cannot find a free port

2022-09-18 Thread Mark Sheppard
On Sun, 18 Sep 2022 11:52:28 GMT, Jaikiran Pai  wrote:

> Can I please get a review of this test only change which proposes to fix the 
> recent intermittent failures in `RmiBootstrapTest` reported in 
> https://bugs.openjdk.org/browse/JDK-8030616?
> 
> The test has been intermittently failing with `cannot find a free port after 
> 10 tries`.  The tests uses the `jdk.test.lib.Utils.getFreePort()` utility 
> method to get a free port to use in the tests. That port is then used to 
> create and bind a JMX connector server. The issue resides in the fact that 
> the `getFreePort` utility uses loopback address to identify a free port, 
> whereas the JMX connector server uses that identified port to bind to a 
> non-loopback address - there's logic in `sun.rmi.transport.tcp.TCPEndpoint` 
> line 117 which calls `InetAddress.getLocalHost()` which can and does return a 
> non-loopback address. This effectively means that the port that was 
> identified as free (on loopback) may not really be free on (some other 
> address) and a subsequent attempt to bind against it by the connector server 
> will end up with a `BindException`.
> 
> Data collected in failures on the CI system has shown that this is indeed the 
> case and the port that was chosen (for loopback) as free was already used by 
> another process on a different address. The test additionally has logic to 
> attempt retries upto a maximum of 10 times to find a new free port on such 
> `BindException`. Turns out the next free port (on loopback) returned by 
> `jdk.test.lib.Utils.getFreePort()` is incremental and it so happens that the 
> systems where this failed had a process listening on a range of 10 to 12 
> ports, so these too ended up with `BindException` when the JMX connector 
> server used that port for a different address.
> 
> The commit here makes sure that the JMX connector server in the test is now 
> configured to loopback address as the address to bind to. That way the test 
> uses the correct address (loopback) on which the port was free.
> 
> The commit also has a change to the javadoc of the test utility 
> `jdk.test.lib.Utils.getFreePort()` to clarify that it returns a free port on 
> loopback address. This now matches what the implementation of that method 
> does.
> 
> Multiple test runs after this change hasn't yet run into the failure.

While the change is reasonable, I’m not sure about your rationale.
The contributing factors to the intermittent failures are due to the flawed 
ephemeral port allocation strategy, that incurs a race condition to use the 
selected  ephemeral port within the test before the OS uses it. 
So from the time that the port has been acquired, then released via the 
autoclose of the serversocket, 
and then reused in the test, it is possible that the OS may have also relocated 
that ephemeral port.

This change doesn’t alter this race condition.

The are a number of significant attributes to the port allocation strategy:
The port allocation strategy acquires an ephemeral port from the OS while 
binding it to a ServerSocket. 
The significance of ServerSocket is it surreptitiously sets SO_REUSEADDR socket 
option on the server socket (Net.socket0() == Java_sun_nio_ch_Net_socket0). The 
fact that SO_REUSEADDR is set is important in this scenario.

The objective with using SO_REUSEADDR on a serversocket is allow for efficient 
server restarts due to
a shutdown for whatever reason. With SO_REUSEADDR set the restarting server, 
which typically uses a well known port, won’t get a BindException, while the 
port may still be associated with the “terminating” server i.e. it may not have 
been fully released by the OS, for general reuse.

Thus, this has a number of favourable consequences with your proposed change, 
as the address binding is the
same as that used for acquiring the port, so it is essential a “server restart 
scenario”, albeit with an
Ephemeral port. As such, there still exists the possibility that the port will 
be reused by the OS, where
conditions of heavy ephemeral port allocation is occurring.

As such, the change may reduce the rate of intermittent failure, but it won’t 
solve the issue fully.
There is a possibility that it may increase the rate of failure as many network 
tests now use
Loopback address rather than the local host address i.e. the primary configured 
IP address.
This will depend on the load on a test system, and the amount of concurrently 
executing network tests,
with the corresponding high volume of ephemeral port allocation.

Nonetheless, it is a reasonable change, provided there is no subtle change in 
test semantics. I think it
will ameliorate the intermittent failures.

-

PR: https://git.openjdk.org/jdk/pull/10322


Re: RFR: 8290178: failure_handler: run netstat without name lookups

2022-07-13 Thread Mark Sheppard
On Tue, 12 Jul 2022 13:16:12 GMT, Daniel Jeliński  wrote:

> `netstat -av` in Mac OS X failure handler is frequently running into the 20 
> second timeout, leaving us with no socket information. This PR proposes 
> running `netstat -anv` along with the existing `netstat -av`, so that we have 
> at least some socket information if the original command times out.
> 
> `netstat -anv` does not perform reverse DNS lookups on the socket IP 
> addresses. The output contains IP addresses instead of DNS names. The command 
> usually finishes in a few milliseconds.

Marked as reviewed by msheppar (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/9469