Hi Daniel,

FWIW, a couple of points in passing on your change, while noting its
rationale.
​
It is suggested that the use of the wildcard address (INADDR_ANY) is a
contributing factor, but I'm not sure that is a correct assumption. The
sockets created in the initial version of the test bind to INADDR_ANY and
an ephemeral port. The ephemeral port is chosen by the OS, so for another
test to have the same port as those in this test would suggest an issue in
the OS port allocation strategy.
This also applies to the recent tagging of
test/jdk/java/net/DatagramSocket/ReuseAddressTest.java
as an intermittent failure due to SO_REUSEADDR. All the DatagramSockets in
this test use INADDR_ANY and an ephemeral port. Other than the
MulticastSocket scenario, which also uses an ephemeral port and sets the
SO_REUSEADDR option, all the DatagramSockets use the wildcard address and
an ephemeral port, making an endpoint clash with another test unlikely.
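
To make the ephemeral port point concrete, here is a minimal sketch (the
class and method names are made up for illustration and are not taken from
the test). It binds two DatagramSockets to the wildcard address with port 0
and reports the ports the OS picked. Assuming neither socket sets
SO_REUSEADDR, the OS should never hand the same port to both.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramSocket;

public class EphemeralPortDemo {

    // Binds two sockets to the wildcard address and port 0 (ephemeral),
    // then returns the two local ports chosen by the OS.
    static int[] bindTwoWildcardSockets() {
        try (DatagramSocket first = new DatagramSocket(0);
             DatagramSocket second = new DatagramSocket(0)) {
            return new int[] { first.getLocalPort(), second.getLocalPort() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = bindTwoWildcardSockets();
        System.out.println("first=" + ports[0] + " second=" + ports[1]);
    }
}
```

The actual port numbers vary from run to run and between systems.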
​
​
Just to emphasize: the reuse-address option applies not to the IP address
alone but to the IP address and port combination, or at least that is what
it was meant to do. In the TCP context there are other restrictions as well.
​
Your change is to the "server socket", but the client is its symmetric
equivalent, a DatagramSocket on the wildcard address and an ephemeral port,
so the echo sent from the server could equally be sent astray!
That is, if the wildcard addressing is an issue.
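
If the wildcard address were the culprit, a symmetric fix would bind both
endpoints to a specific address. A minimal sketch, assuming the loopback
address is a suitable choice (the class and method names are hypothetical,
and this is not the code in the webrev):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class LoopbackEcho {

    // Sends msg from client to server and back, with BOTH sockets bound to
    // the loopback address and an ephemeral port instead of the wildcard.
    static String echoOverLoopback(String msg) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (DatagramSocket server = new DatagramSocket(new InetSocketAddress(lo, 0));
             DatagramSocket client = new DatagramSocket(new InetSocketAddress(lo, 0))) {
            // Timeouts so a lost datagram fails fast instead of hanging.
            server.setSoTimeout(5000);
            client.setSoTimeout(5000);

            byte[] out = msg.getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(out, out.length,
                                           server.getLocalSocketAddress()));

            // Server side: receive the request and echo it to its sender.
            DatagramPacket request = new DatagramPacket(new byte[1024], 1024);
            server.receive(request);
            server.send(new DatagramPacket(request.getData(), request.getLength(),
                                           request.getSocketAddress()));

            DatagramPacket reply = new DatagramPacket(new byte[1024], 1024);
            client.receive(reply);
            return new String(reply.getData(), 0, reply.getLength(),
                              StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOverLoopback("ping"));
    }
}
```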
​
Looking at the overall structure of the test, is it not possible that the
server socket has been GCed, finalized and so closed before the server's
packet send has completed within the OS, and so the client hangs?
​
So, a little Jackanory in the context of a jtreg execution with many tests
running concurrently. It would seem possible, although it may be wild
conjecture, that the following happens.

The client sends a packet to the server, executes receive and blocks on
I/O. The server (thread) receives the packet and then echoes it. The echo is
copied from user space into kernel space and placed on a queue for send (the
send is pending), and the OS send call returns. The server releases the
socket, which is now available for GC. With the heavy load on the test
system, with hundreds, maybe thousands, of threads, the server's echo packet
is still pending send and the client hasn't received it. In the meantime GC
is scheduled and, eager beaver, it reclaims the released server socket. This
closes the server socket, which in turn drops the pending echo datagram, and
the client continues to wait in receive?

So it could be down to the load on the system, the number of concurrently
executing threads, and whether the GC thread ran before the server's echo
packet send had completed within the OS kernel.
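
If that conjecture holds, one way to rule it out would be for the server
thread to keep its socket strongly reachable until the client has confirmed
receipt of the echo. A rough sketch under that assumption follows. The class,
the method names and the latch are hypothetical, Reference.reachabilityFence
needs JDK 9 or later, and, as in the scenario above, the server socket is
released rather than closed explicitly.

```java
import java.io.IOException;
import java.lang.ref.Reference;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class KeepServerReachable {

    // Server side: echo one datagram, then hold on to the socket until the
    // client signals that the reply has arrived (or 5 seconds have passed).
    static void serve(DatagramSocket server, CountDownLatch replyReceived) {
        try {
            DatagramPacket p = new DatagramPacket(new byte[64], 64);
            server.receive(p);
            server.send(new DatagramPacket(p.getData(), p.getLength(),
                                           p.getSocketAddress()));
            replyReceived.await(5, TimeUnit.SECONDS);
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            // Keeps the socket strongly reachable up to this point, so it
            // cannot be reclaimed while the echo may still be pending.
            Reference.reachabilityFence(server);
        }
    }

    // Client side: returns true if the echo came back with the same length.
    static boolean roundTrip() {
        InetAddress lo = InetAddress.getLoopbackAddress();
        CountDownLatch replyReceived = new CountDownLatch(1);
        try (DatagramSocket client = new DatagramSocket(new InetSocketAddress(lo, 0))) {
            DatagramSocket server = new DatagramSocket(new InetSocketAddress(lo, 0));
            server.setSoTimeout(5000);
            SocketAddress serverAddr = server.getLocalSocketAddress();
            Thread t = new Thread(() -> serve(server, replyReceived));
            t.start();

            client.setSoTimeout(5000);
            byte[] msg = { 1, 2, 3 };
            client.send(new DatagramPacket(msg, msg.length, serverAddr));
            DatagramPacket reply = new DatagramPacket(new byte[64], 64);
            client.receive(reply);

            // Only now may the server thread let go of its socket.
            replyReceived.countDown();
            t.join(5000);
            return reply.getLength() == msg.length;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("echo received: " + roundTrip());
    }
}
```

This only shows the ordering. Whether an early close really drops a pending
datagram is exactly the part that remains conjecture.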
​
​
Sometimes, with an intermittently failing test, there is a quirkiness about
the network configuration on the test system. It can be useful for
diagnostics to dump the configuration at the start of the test.
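
As an illustration of such a dump, here is a small sketch using only the
standard java.net.NetworkInterface API. The class and method names are
hypothetical, and the fields printed are just one possible selection.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

public class NetConfigDump {

    // Builds one line per network interface: name, state and addresses.
    static String describeInterfaces() {
        StringBuilder sb = new StringBuilder();
        try {
            Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
            if (nifs == null) {
                return "no network interfaces found";
            }
            for (NetworkInterface nif : Collections.list(nifs)) {
                sb.append(nif.getName())
                  .append(" up=").append(nif.isUp())
                  .append(" loopback=").append(nif.isLoopback())
                  .append(" multicast=").append(nif.supportsMulticast())
                  .append(" addrs=").append(Collections.list(nif.getInetAddresses()))
                  .append(System.lineSeparator());
            }
        } catch (SocketException e) {
            sb.append("could not enumerate interfaces: ").append(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Call this first thing in the test so the log shows the environment.
        System.out.print(describeInterfaces());
    }
}
```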
​
In summary, the use of the wildcard address (INADDR_ANY) and an ephemeral
port should not be an issue in this context. There should be no conflicts
for OS-allocated ephemeral ports. If you are making your change to the
"server socket", should it equally be applied to the client datagram socket?
​
regards​
Mark

________________________________
From: net-dev <net-dev-boun...@openjdk.java.net> on behalf of Daniel Fuchs <daniel.fu...@oracle.com>
Sent: Friday 9 August 2019 15:36
To: OpenJDK Network Dev list <net-dev@openjdk.java.net>
Subject: [teststabilization] RFR 8229348: java/net/DatagramSocket/UnreferencedDatagramSockets.java fails intermittently

Hi,

Please find below a trivial fix for:

8229348: java/net/DatagramSocket/UnreferencedDatagramSockets.java
          fails intermittently
https://bugs.openjdk.java.net/browse/JDK-8229348

webrev: http://cr.openjdk.java.net/~dfuchs/webrev_8229348/webrev.00/

This test has been observed failing intermittently in our CI.
The test failed with a timeout, and there's no message saying
that the expected reply has been received or that any file
descriptor has been freed.
I suspect the test was blocked in receive() in its main() method
due to port reuse issues.

The usual fix for that is to avoid binding to the wildcard.

best regards,

-- daniel
