
Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-08 Thread Stuart Marks


On 12/8/14 12:35 PM, Jaroslav Bachorik wrote:

> Please, review the following test change
>
> Issue : https://bugs.openjdk.java.net/browse/JDK-8066708
> Webrev: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.00
>
> The test fails very intermittently when RMI registry is trying to bind to a port
> previously used in the test (via ServerSocket).
>
> This seems to be caused by the sockets created via `new ServerSocket(0)` and
> being in reusable mode. The fix attempts to prevent this by explicitly
> forbidding the reusable mode.


Hi Jaroslav,

I happened to see this fly by, and there are (I think) some similar issues going 
on in the RMI tests.


But first I'll note that I don't think setReuseAddress() will have the effect 
that you want. Typically it's set to true before binding a socket, so that a 
subsequent bind operation will succeed even if the address/port is already in 
use. ServerSockets created with new ServerSocket(0) are already bound, and I'm 
not sure what calling setReuseAddress(false) will do on such sockets. The spec 
says behavior is undefined, but my bet is that it does nothing.
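A minimal sketch of the ordering issue, assuming only the standard java.net.ServerSocket API. The first method mirrors the pattern described in the review request (the constructor has already bound the socket when setReuseAddress(false) is called); the second configures the option on an unbound socket and binds afterwards. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressOrder {

    // Pattern from the review request: new ServerSocket(0) is already bound,
    // so the later setReuseAddress(false) falls into the "not defined" case
    // of the ServerSocket javadoc.
    static ServerSocket bindThenSet() throws IOException {
        ServerSocket ss = new ServerSocket(0);   // bound to an ephemeral port here
        ss.setReuseAddress(false);               // too late to reliably affect the bind
        return ss;
    }

    // Ordering the option is designed for: create an unbound socket,
    // configure SO_REUSEADDR, and only then bind.
    static ServerSocket setThenBind() throws IOException {
        ServerSocket ss = new ServerSocket();    // unbound
        ss.setReuseAddress(false);               // applies to the upcoming bind()
        ss.bind(new InetSocketAddress(0));       // 0 = let the OS pick the port
        return ss;
    }
}
```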


I guess it doesn't hurt to try this out to see if it makes a difference, but I 
don't have much confidence it will help.


The potential similarity to the RMI tests is exemplified by JDK-8049202 (sorry, 
this bug report isn't open), but briefly, that test exercises the RMI registry as follows:


1. Opens port 1099 using new ServerSocket(1099) [1099 is the default
   RMI registry port] in order to ensure that 1099 isn't in use by
   something else already;

2. If this succeeds, it immediately closes the ServerSocket.

3. Then it creates a new RMI registry on port 1099.
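The three steps can be sketched in Java roughly as follows, assuming the probe is a plain java.net.ServerSocket and step 3 uses java.rmi.registry.LocateRegistry.createRegistry. The JDK-8049202 test code isn't shown in this thread, so the class and method names are hypothetical, and the port is a parameter so the sketch does not depend on 1099 being free.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class CheckThenCreateRegistry {

    // Steps 1-3: probe the port, close the probe, then start a registry on
    // the same port. The window between close() and createRegistry() is
    // where the intermittent "port already in use" error shows up.
    static Registry checkThenCreate(int port) throws IOException {
        // Step 1: throws BindException if something already holds the port.
        ServerSocket probe = new ServerSocket(port);
        // Step 2: release the port immediately.
        probe.close();
        // Step 3: the registry opens its own server socket on that port.
        return LocateRegistry.createRegistry(port);
    }
}
```

A registry created this way stays exported, so a test normally releases it afterwards with UnicastRemoteObject.unexportObject(registry, true).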

In principle, this should succeed, yet it fails around 10% of the time on some 
systems. The error is "port already in use". My best theory is that even though 
the socket has just been closed by a user program, the kernel has to run the 
socket through some of the socket states such as FIN_WAIT_1, FIN_WAIT_2, or 
CLOSING before the socket is actually closed and is available for reuse. If a 
program -- even the same one -- attempts to open a socket on the same port 
before the socket has reached its final state, it will get an "already in use" 
error.


If this is true, I don't believe that setting SO_REUSEADDR will work if the 
socket is in one of these final states. (I remember reading this somewhere but 
I'm not sure where at the moment. I can try to dig it up if there is interest.)


I admit this is just a theory and I'm open to alternatives, and I'm also open to 
hearing about ways to deal with this problem.


Could something similar be going on with this JMX test?

s'marks

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-09 Thread Jaroslav Bachorik


On 12/09/2014 01:39 AM, Stuart Marks wrote:

> Could something similar be going on with this JMX test?


Hm, this is exactly what happened with this test :(

The problem is that the port is reported as available while it is still 
occupied and RMI registry attempts to start using that port.


If setting SO_REUSEADDR does not work, then the only solution would be to 
retry the test case when this exception occurs.


-JB-




Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-11 Thread Jaroslav Bachorik


On 12/09/2014 01:25 PM, Jaroslav Bachorik wrote:

> If setting SO_REUSEADDR does not work then the only solution would be to
> retry the test case when this exception occurs.


Further investigation shows that the problem was rather the client 
connecting to a socket being shut down.


It sounds like setting SO_REUSEADDR to false should prevent this failure.

From the ServerSocket javadoc:
"When a TCP connection is closed the connection may remain in a timeout 
state for a period of time after the connection is closed (typically 
known as the TIME_WAIT state or 2MSL wait state). For applications using 
a well known socket address or port it may not be possible to bind a 
socket to the required SocketAddress if there is a connection in the 
timeout state involving the socket address or port."


It also turns out that the test does not close the server sockets 
properly, so there might be several open or timed-out sockets left 
dangling around.


I've updated the test so that it sets SO_REUSEADDR on all the new 
ServerSocket instances, and introduced a mechanism to run the test code 
while properly cleaning up any allocated ports.
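The actual cleanup mechanism lives in webrev.01 and isn't reproduced here. As a rough illustration only, one way to guarantee that every ServerSocket a test opens is eventually closed is a small AutoCloseable tracker used with try-with-resources; ServerSocketTracker is a hypothetical name, not part of the test.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers every ServerSocket opened through it and
// closes all of them when the try-with-resources block exits.
public class ServerSocketTracker implements AutoCloseable {

    private final List<ServerSocket> opened = new ArrayList<>();

    // Opens a ServerSocket on the given port (0 = ephemeral) and tracks it.
    public ServerSocket open(int port) throws IOException {
        ServerSocket ss = new ServerSocket(port);
        opened.add(ss);
        return ss;
    }

    // Closes every tracked socket, continuing past individual failures so
    // that one bad close() does not leave the rest dangling.
    @Override
    public void close() {
        for (ServerSocket ss : opened) {
            try {
                ss.close();
            } catch (IOException ignored) {
                // best effort: keep closing the remaining sockets
            }
        }
        opened.clear();
    }
}
```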


http://cr.openjdk.java.net/~jbachorik/8066708/webrev.01/

-JB-




Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-11 Thread Dmitry Samersoff

Jaroslav,

You can set SO_LINGER to zero; in this case the socket will be closed
immediately without waiting in TIME_WAIT.
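In Java, SO_LINGER is exposed on connected sockets through java.net.Socket.setSoLinger; ServerSocket has no such setter. A minimal sketch, assuming a loopback connection: with linger enabled and a timeout of 0, close() typically aborts the connection with a reset instead of the normal FIN exchange, so this side does not enter TIME_WAIT. LingerZero is a hypothetical name, and whether this helps with the failure at hand is what the rest of the thread questions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;

public class LingerZero {

    // Connects to a local port and enables SO_LINGER with a 0-second timeout,
    // so a later close() aborts the connection instead of lingering.
    static Socket connectWithLingerZero(int port) throws IOException {
        Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
        s.setSoLinger(true, 0);
        return s;
    }
}
```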

But there is no reliable way to predict whether you can take this port
or not after you close it.

So the only valid solution is to try to connect to a random port, and if
this attempt fails, try another random port. Everything else will cause
more or less frequent intermittent failures.
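A sketch of that approach, assuming the code that will actually use the port does the binding itself, so the bound socket is kept rather than closed and reopened. RandomPortBinder and the port range constants are hypothetical; a real range should avoid the OS ephemeral range and any fixed ports other tests rely on.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;

public class RandomPortBinder {

    // Hypothetical bounds for the random choice.
    static final int MIN_PORT = 20000;
    static final int MAX_PORT = 40000;

    // Picks a random port and binds it directly. The returned socket is the
    // one the caller should use, so there is no close/reopen window.
    // On BindException another random port is tried, up to maxAttempts.
    static ServerSocket bindRandomPort(int maxAttempts) throws IOException {
        BindException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            int port = ThreadLocalRandom.current().nextInt(MIN_PORT, MAX_PORT + 1);
            try {
                return new ServerSocket(port);
            } catch (BindException e) {
                last = e; // port busy: try another one
            }
        }
        throw new IOException("no free port after " + maxAttempts + " attempts", last);
    }
}
```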

-Dmitry




-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-11 Thread olivier.lagn...@oracle.com


Hi Jaroslav,

On 11/12/2014 15:06, Jaroslav Bachorik wrote:
> Further investigation shows that the problem was rather the client 
> connecting to a socket being shut down.
I remember running into this situation in an RMI fix a while ago, and IIRC no 
flag setting could help (SO_REUSEADDR included): the port kept being unavailable.


> It sounds like setting SO_REUSEADDR to false should prevent this failure.
>
> From the ServerSocket javadoc:
> "When a TCP connection is closed the connection may remain in a 
> timeout state for a period of time after the connection is closed 
> (typically known as the TIME_WAIT state or 2MSL wait state). For 
> applications using a well known socket address or port it may not be 
> possible to bind a socket to the required SocketAddress if there is a 
> connection in the timeout state involving the socket address or port."


> It also turns out that the test does not close the server sockets 
> properly so there might be several sockets being opened or timed out 
> dangling around.

I think this is the main reason why we see these intermittent failures.


> I've updated the test so it is setting SO_REUSEADDR for all the new 
> ServerSockets instances + introduced the mechanism to run the test 
> code while properly cleaning up any allocated ports.

Olivier.


Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-11 Thread olivier.lagn...@oracle.com


Hi Dmitry,

On 11/12/2014 15:43, Dmitry Samersoff wrote:

> Jaroslav,
>
> You can set SO_LINGER to zero, in this case socket will be closed
> immediately without waiting in TIME_WAIT

SO_LINGER did not help either in my case (see my previous mail to Jaroslav).
I ended up using another hard-coded (supposedly free) port.
Note that this was before RMI tests used randomly allocated ports.


> But there are no reliable way to predict whether you can take this port
> or not after you close it.

This is what I observed in my case.


> So the only valid solution is to try to connect to a random port and if
> this attempt fails try another random port. Everything else will cause
> more or less frequent intermittent failures.

IIRC, this is what is currently done in the RMI tests.

Olivier.




Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-11 Thread Stuart Marks




On 12/11/14 7:09 AM, olivier.lagn...@oracle.com wrote:
> On 11/12/2014 15:43, Dmitry Samersoff wrote:
>> You can set SO_LINGER to zero, in this case socket will be closed
>> immediately without waiting in TIME_WAIT
> SO-LINGER did not help either in my case (see my previous mail to Jaroslav).
> That ended-up in using another hard-coded (supposedly free) port.
> Note that was before RMI tests used randomly allocated ports.
>
>> But there are no reliable way to predict whether you can take this port
>> or not after you close it.
> This is what I observed in my case.
>
>> So the only valid solution is to try to connect to a random port and if
>> this attempt fails try another random port. Everything else will cause
>> more or less frequent intermittent failures.
> IIRC think this is what is currently done in RMI tests.


The RMI tests are still suffering from this problem, unfortunately.

The RMI test library gets a "random" port with "new ServerSocket(0)", gets the 
port number, closes the socket, then returns the port to the caller. The caller 
then assumes that it can use that port as it wishes. That's when the 
BindException can occur. There are about 10 RMI test bugs in the database that 
all seem to have this as their root cause.


There is some retry logic in RMI's test library, but that's to avoid the 
so-called "reserved ports" that specific RMI tests use, or if "new 
ServerSocket(0)" fails. It doesn't have anything to do with the BindException 
that occurs when the caller attempts to reuse the port with another socket.
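For reference, the pattern just described looks roughly like this. It is a hypothetical reconstruction, not the RMI test library's code: the class, method, and reserved-port range are made up for illustration. The retry loop only covers reserved ports and failures of new ServerSocket(0); nothing in it protects the caller's later bind.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortLookup {

    // Hypothetical stand-in for fixed ports that specific tests reserve.
    static final int RESERVED_MIN = 60000;
    static final int RESERVED_MAX = 60009;

    // Asks the OS for an ephemeral port, records the number, closes the
    // socket, and returns only the number.
    static int allocatePortNumber(int maxAttempts) throws IOException {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            int port;
            try (ServerSocket ss = new ServerSocket(0)) {
                port = ss.getLocalPort();
            } catch (IOException e) {
                last = e;
                continue; // could not get an ephemeral port: try again
            }
            // The socket is already closed here: the race window is open,
            // and the caller may still hit BindException when reusing it.
            if (port < RESERVED_MIN || port > RESERVED_MAX) {
                return port;
            }
        }
        throw new IOException("no usable port after " + maxAttempts + " attempts", last);
    }
}
```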


My observation is also that setting SO_REUSEADDR has no effect. I haven't tried 
SO_LINGER. My hunch is that it won't have any effect, since the sockets in 
question aren't actually going into TIME_WAIT state. But I suppose it's worth a try.


I don't have any solution for this; we're still discussing the issue. I think 
the best approach would be to refactor the code so that the eventual user of the 
socket opens it up on an ephemeral port in the first place. That avoids the 
open/close/reopen business. Unfortunately that doesn't help the case where you 
want to tell another JVM to run a service on a specific port. We don't have a 
solution for that case yet.
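A minimal sketch of the preferred approach, assuming the eventual user of the socket runs in the same JVM: it opens the ServerSocket on port 0 and keeps it, and only the number from getLocalPort() is shared with the client. EphemeralOwner and its methods are hypothetical; the one-byte echo exists only to show that the port is usable without any close/reopen step.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EphemeralOwner {

    // The eventual user opens the socket on port 0 itself and keeps it open,
    // so the port can never be lost between "learn it" and "use it".
    static ServerSocket openService() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    // A client connects to the published port; the service echoes one byte.
    static int echoOnce(ServerSocket service, int value) throws Exception {
        Thread server = new Thread(() -> {
            try (Socket s = service.accept()) {
                s.getOutputStream().write(s.getInputStream().read());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        server.start();
        try (Socket client = new Socket(InetAddress.getLoopbackAddress(), service.getLocalPort())) {
            client.setSoTimeout(5000); // do not hang forever if the echo never arrives
            client.getOutputStream().write(value);
            int reply = client.getInputStream().read();
            server.join();
            return reply;
        }
    }
}
```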


The second-best approach (not really a solution) is to open/close a ServerSocket 
to get the port, sleep for a little bit, then return the port number to the 
caller. This might give the kernel a chance to clean up the socket after the 
close. Of course, this still has a race condition, but it might reduce the 
incidence of problems to an acceptable level.
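That second-best approach might look like the following sketch. The class name and the delay are assumptions; the sleep only narrows the window between close() and the caller's bind, it does not remove it.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class DelayedPortHandoff {

    // Open/close to learn a free port, then pause before handing the number
    // out, hoping the kernel finishes tearing down the closed socket.
    static int portAfterDelay(long delayMillis) throws IOException, InterruptedException {
        int port;
        try (ServerSocket ss = new ServerSocket(0)) {
            port = ss.getLocalPort();
        }
        Thread.sleep(delayMillis); // the delay length is a guess, not a guarantee
        return port;
    }
}
```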


I'll let you know if we come up with anything better.

s'marks

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-11 Thread Dmitry Samersoff

Stuart,

As soon as you close the socket, you open the door to a race.

So you need another communication channel to pass a port number (or bind
result) between a client and a server without closing a socket on the
server side.

A typical scenario used by network-related code is:

1. Server opens the socket
2. Server binds to port(0)
3. Server gets port number assigned by OS
4. Server informs client (e.g. writes the port number to a known file,
broadcasts it, etc.)
5. Client establishes connection.
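Steps 1 to 5 can be sketched in Java with a port file as the side channel, assuming client and server share a filesystem. PortFileHandoff and the file layout are hypothetical; writing a temp file and atomically moving it into place is one way to keep the client from reading a half-written number.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PortFileHandoff {

    // Server, steps 1-4. The returned socket stays open, so the port is
    // never released between learning the number and using it.
    static ServerSocket startAndPublish(Path portFile) throws IOException {
        ServerSocket ss = new ServerSocket();                                 // 1. open
        ss.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));  // 2. bind to port 0
        int port = ss.getLocalPort();                                         // 3. port assigned by the OS
        // 4. publish: write a temp file next to the target, then rename it.
        Path tmp = Files.createTempFile(portFile.getParent(), "port", ".tmp");
        Files.write(tmp, Integer.toString(port).getBytes(StandardCharsets.US_ASCII));
        Files.move(tmp, portFile, StandardCopyOption.ATOMIC_MOVE);
        return ss;
    }

    // Client, step 5: wait for the port file, then connect to that port.
    static Socket connectFromFile(Path portFile, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!Files.exists(portFile)) {
            if (System.currentTimeMillis() > deadline) {
                throw new IOException("port file not written in time: " + portFile);
            }
            Thread.sleep(10);
        }
        String text = new String(Files.readAllBytes(portFile), StandardCharsets.US_ASCII).trim();
        return new Socket(InetAddress.getLoopbackAddress(), Integer.parseInt(text));
    }
}
```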

If the server is a black box and has to get a port number from outside,
the scenario looks like:

WHILE(!success and !timeout)
1. Driver chooses random port number
2. Driver runs a server with this number
3. Driver checks that server is actually listening on this port
   (e.g. tries to connect to it itself)
WEND

4. Driver runs a client with this port number or bails out with
   descriptive error message.
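A sketch of the driver loop, assuming a hypothetical ServerLauncher hook that starts the black-box server on a given port and reports whether the launch succeeded. The port range and timeouts are placeholders. Step 3 only proves that something is listening on the port; a real driver would also confirm it is the server it launched, and stop a failed launch before retrying.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.ThreadLocalRandom;

public class BlackBoxDriver {

    // Hypothetical hook: starts the server on `port`; returns false if the
    // launch itself reported a failure (e.g. a bind error).
    interface ServerLauncher {
        boolean launch(int port) throws Exception;
    }

    // WHILE(!success and !timeout): pick a port, launch, verify.
    static int startOnRandomPort(ServerLauncher launcher, long timeoutMillis) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            // 1. random port; the 20000-40000 range is a placeholder
            int port = ThreadLocalRandom.current().nextInt(20000, 40001);
            // 2. run the server with this port, 3. check it is listening
            if (launcher.launch(port) && isListening(port)) {
                return port; // 4. caller now runs the client on this port
            }
        }
        // 4. (failure branch) bail out with a descriptive message
        throw new IOException("server did not start listening on any random port within "
                + timeoutMillis + " ms");
    }

    // Step 3: try to connect to the port ourselves.
    static boolean isListening(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```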

-Dmitry




Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-16 Thread Stuart Marks


Hi Dmitry,

Strictly speaking you are correct. As soon as you close a socket, there is a 
possibility -- perhaps vanishingly small but nonzero -- that you might not be 
able to open it again.


The first scenario, where the user of the socket itself opens the socket using 
an ephemeral port (e.g. new ServerSocket(0)) is of course preferred. This avoids 
race conditions entirely.
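A minimal sketch of the preferred arrangement, assuming the component that serves on the port can bind it itself. It binds port 0 and keeps the socket open while the number is handed out, so there is no close/reopen window. `EphemeralOwner` and `roundTrip` are hypothetical names; the one-byte echo exists only to show that a client can connect to the published port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical example: the socket's user binds port 0 itself and never closes it in between. */
final class EphemeralOwner {
    private EphemeralOwner() {}

    /** Serves one loopback connection on an ephemeral port and echoes one byte back to the client. */
    static int roundTrip(int value) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            int port = server.getLocalPort(); // handed to the client while the socket stays bound
            try (Socket client = new Socket(loopback, port); // queued in the backlog until accept()
                 Socket accepted = server.accept()) {
                client.getOutputStream().write(value);
                client.getOutputStream().flush();
                int received = accepted.getInputStream().read();
                accepted.getOutputStream().write(received); // echo back
                accepted.getOutputStream().flush();
                return client.getInputStream().read();
            }
        }
    }
}
```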


It's the second case that I'm still wrestling with, and maybe Jaroslav too. It's 
fairly difficult to get such "black box" systems to open an ephemeral port and 
report it back, as opposed to opening up their service on some port number 
handed in from the outside. (For RMI, rmid is the culprit here. I don't know 
about JMX.) What makes this difficult is that the rmid service is running in a 
separate VM, so getting reliable information back from it can be difficult.


It's also fairly difficult to establish the retry logic in such cases. If the 
service fails with a BindException, maybe -- maybe -- it was because there was a 
conflict over the port, and a retry is warranted. But this needs to be 
distinguished from other failure modes that might occur, that should be reported 
as failures instead of causing a retry. In principle, this is possible to do, of 
course, it's just that it involves more restructuring of the tests, and possibly 
adding debug/test code to rmid. (It may yet come to that.)
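If the failure reaches the driver as a Java exception (an assumption; for a forked VM you may only have its exit status and output), one way to separate "port conflict, retry" from "real failure" is to look for a `BindException` anywhere in the cause chain. RMI, for instance, typically reports "Port already in use" with the `BindException` attached as the cause. `RetryPolicy.isPortConflict` is a hypothetical helper.

```java
import java.net.BindException;

/** Hypothetical helper: decides whether a failure looks like a port conflict worth retrying. */
final class RetryPolicy {
    private RetryPolicy() {}

    static boolean isPortConflict(Throwable failure) {
        int depth = 0;
        // Walk the cause chain, since the BindException is often wrapped (e.g. in an RMI ExportException).
        for (Throwable t = failure; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof BindException) {
                return true;
            }
        }
        return false; // anything else should be reported as a genuine failure
    }
}
```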


I'm still pondering why, in the open/close/reopen scenario, the reopen might 
fail. The obvious reason is that some other process on the system 
has opened that port between the close and the reopen. I admit that this is a 
possibility. However, with the open/close/reopen scenario in place, we see tests 
that fail up to 15% of the time with BindExceptions. This is an extraordinarily 
high failure rate to be caused by some random other process happening to open 
the same port in the few microseconds between the close and reopen. It's simply 
not believable to me.


My thinking is still that the port isn't ready for reuse until a small amount of 
time after it's closed. I have some test programs that exercise sockets in a 
particular way (e.g., from multiple threads, or opening and closing batches of 
sockets) that can reproduce the problem on some systems, and these test programs 
seem to behave better if a time delay is added between the close and the reopen. 
The exact circumstances under which the problem occurs are difficult to pin down 
and seem OS-specific, so choosing the "right" delay time is very difficult. 
But it does strengthen this conjecture in my mind.
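A rough sketch of the kind of stress program described above, assuming plain `ServerSocket`s on the wildcard address. Several threads repeatedly obtain a port with `new ServerSocket(0)`, close it, optionally sleep, and try to bind the same port again, counting `BindException`s. `ReopenStress` is a hypothetical name, and the counts are entirely OS- and load-dependent; with several threads a failure can also mean another thread was handed the same port.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stress program: open/close/reopen from several threads and count BindExceptions. */
final class ReopenStress {
    private ReopenStress() {}

    private static int openAndClose() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    static int countBindFailures(int threads, int iterations, long delayMillis) throws Exception {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        int port = openAndClose();
                        if (delayMillis > 0) {
                            Thread.sleep(delayMillis); // the delay being evaluated
                        }
                        try (ServerSocket reopened = new ServerSocket(port)) {
                            // reopen succeeded; the socket is closed again immediately
                        } catch (BindException e) {
                            failures.incrementAndGet(); // not yet reusable, or taken by another thread
                        }
                    }
                    return null;
                }));
            }
            for (Future<Void> f : tasks) {
                f.get(); // rethrows unexpected failures as ExecutionException
            }
        } finally {
            pool.shutdownNow();
        }
        return failures.get();
    }
}
```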


Naturally it would be better if there were a way to determine when a port is 
available for reuse without actually opening it. I'm not aware of any such way, 
but I'm holding onto a little hope that one can be found.


s'marks



On 12/11/14 10:18 AM, Dmitry Samersoff wrote:

Stuart,

As soon as you close a socket, you open the door to a race.

So you need another communication channel to pass a port number (or bind
result) between a client and a server without closing a socket on the
server side.

A typical scenario used by network-related code is:

1. Server opens the socket
2. Server binds to port(0)
3. Server gets port number assigned by OS
4. Server informs client (e.g. writes the port to a known file,
broadcasts it, etc.)
5. Client establishes connection.
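The five steps above, sketched in Java under the assumption that a file is the side channel. The server keeps its socket open and publishes the OS-assigned port by writing a temporary file and renaming it atomically, so the client never reads a half-written number. `PortFileHandshake` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of "bind port 0, then publish the port through a file". */
final class PortFileHandshake {
    private PortFileHandshake() {}

    /** Server side (steps 1-4): bind port 0 and publish the assigned port; the socket stays open. */
    static ServerSocket bindAndPublish(Path portFile) throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress()); // steps 1-2
        int port = server.getLocalPort();                                               // step 3
        Path tmp = portFile.resolveSibling(portFile.getFileName() + ".tmp");
        Files.write(tmp, Integer.toString(port).getBytes(StandardCharsets.US_ASCII));
        Files.move(tmp, portFile, StandardCopyOption.ATOMIC_MOVE);                       // step 4
        return server;
    }

    /** Client side (step 5): read the published port and connect to it. */
    static Socket connectFromFile(Path portFile) throws IOException {
        String text = new String(Files.readAllBytes(portFile), StandardCharsets.US_ASCII).trim();
        return new Socket(InetAddress.getLoopbackAddress(), Integer.parseInt(text));
    }
}
```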

If the server is a black box and has to get its port number from outside,
the scenario looks like this:

WHILE(!success and !timeout)
1. Driver chooses random port number
2. Driver runs a server with this number
3. Driver checks that server is actually listening on this port
(e.g. by trying to connect to it itself)
WEND

4. Driver runs a client with this port number or bails out with a
descriptive error message.
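The driver loop above might look like this. It is a sketch: `BlackBoxDriver`, `Launcher` and `Started` are hypothetical, the launcher stands in for whatever actually starts the black-box server, and the 49152-65535 range and one-second connect timeout are assumptions. A real driver should probably retry only on port conflicts rather than on any launch failure.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical driver: pick a random port, start the server on it, verify by connecting. */
final class BlackBoxDriver {
    private BlackBoxDriver() {}

    /** Hypothetical hook that starts the black-box server on the given port. */
    interface Launcher {
        AutoCloseable start(int port) throws Exception;
    }

    /** The port the server ended up on, plus a handle for stopping it. */
    static final class Started {
        final int port;
        final AutoCloseable handle;
        Started(int port, AutoCloseable handle) { this.port = port; this.handle = handle; }
    }

    static Started startOnRandomPort(Launcher launcher, long timeoutMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {                            // WHILE(!success and !timeout)
            int port = ThreadLocalRandom.current().nextInt(49152, 65536); // 1. choose a random port
            AutoCloseable handle;
            try {
                handle = launcher.start(port);                            // 2. run the server on it
            } catch (Exception e) {
                continue;                                                 // could not start; try another port
            }
            if (isListening(port)) {                                      // 3. verify by connecting
                return new Started(port, handle);
            }
            try { handle.close(); } catch (Exception ignored) { }
        }
        throw new IOException("server did not come up on any port before the timeout"); // 4. bail out
    }

    static boolean isListening(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```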

-Dmitry

On 2014-12-11 20:53, Stuart Marks wrote:



On 12/11/14 7:09 AM, olivier.lagn...@oracle.com wrote:

On 11/12/2014 15:43, Dmitry Samersoff wrote:

You can set SO_LINGER to zero, in which case the socket will be closed
immediately without waiting in TIME_WAIT

SO_LINGER did not help either in my case (see my previous mail to
Jaroslav).
In the end I used another hard-coded (supposedly free) port.
Note that was before RMI tests used randomly allocated ports.


But there is no reliable way to predict whether you can take this port
or not after you close it.

This is what I observed in my case.


So the only valid solution is to try to connect to a random port and if
this attempt fails try another random port. Everything else will cause
more or less frequent intermittent failures.

IIRC, this is what is currently done in the RMI tests.


The RMI tests are still suffering from this problem, unfortunately.


Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-17 Thread Dmitry Samersoff

Stuart,

1. Even if you set SO_LINGER to zero, the socket will not be closed
immediately; see the TCP shutdown sequence.

2. In a native world it's quite easy to find the port your rmi server
uses; it can be done by parsing /proc//net/tcp on Linux or
using special APIs on Windows and Solaris.

3. For rmid with a port provided from outside, I think the only reliable way
to get what you need is:

 Write a driver that:

 1. starts rmid -port 
 2. connects to this port to make sure rmid is actually started (no RMI,
just a plain TCP connect)

 3. if (2) fails, returns to (1)
 4. runs the client with this port number.


4. For rmid you can also emulate inetd behaviour, i.e. the driver opens a
server port, communicates it to the client, then redirects everything that
comes to this port to the stdin of rmid.
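When the server is a separate VM (rmid, say), a single connect attempt right after launch can fail simply because the process has not bound its port yet, so the connect check is usually a poll with a deadline. A minimal sketch, plain TCP with no RMI involved; `ListenProbe.awaitListening` and its parameters are hypothetical. A successful connect proves only that something is listening on that port, not necessarily the server just started.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical driver helper: poll with plain TCP connects until a server listens or time runs out. */
final class ListenProbe {
    private ListenProbe() {}

    static boolean awaitListening(String host, int port, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        do {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), (int) Math.max(1, pollMillis)); // plain TCP
                return true;              // something accepted the connection
            } catch (IOException notYet) {
                Thread.sleep(pollMillis); // the server may still be starting up
            }
        } while (System.nanoTime() < deadline);
        return false;                     // caller should stop the server and retry on another port
    }
}
```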

-Dmitry

On 2014-12-17 01:55, Stuart Marks wrote:
> Hi Dmitry,
> 
> Strictly speaking you are correct. As soon as you close a socket, there
> is a possibility -- perhaps vanishingly small but nonzero -- that you
> might not be able to open it again.
> 
> The first scenario, where the user of the socket itself opens the socket
> using an ephemeral port (e.g. new ServerSocket(0)) is of course
> preferred. This avoids race conditions entirely.
> 
> It's the second case that I'm still wrestling with, and maybe Jaroslav
> too. It's fairly difficult to get such "black box" systems to open an
> ephemeral port and report it back, as opposed to opening up their
> service on some port number handed in from the outside. (For RMI, rmid
> is the culprit here. I don't know about JMX.) What makes this difficult
> is that the rmid service is running in a separate VM, so getting
> reliable information back from it can be difficult.
> 
> It's also fairly difficult to establish the retry logic in such cases.
> If the service fails with a BindException, maybe -- maybe -- it was
> because there was a conflict over the port, and a retry is warranted.
> But this needs to be distinguished from other failure modes that might
> occur, that should be reported as failures instead of causing a retry.
> In principle, this is possible to do, of course, it's just that it
> involves more restructuring of the tests, and possibly adding debug/test
> code to rmid. (It may yet come to that.)
> 
> I'm still pondering why, in the open/close/reopen scenario, the reopen
> might fail. The obvious reason is that some other process
> on the system has opened that port between the close and the reopen. I
> admit that this is a possibility. However, with the open/close/reopen
> scenario in place, we see tests that fail up to 15% of the time with
> BindExceptions. This is an extraordinarily high failure rate to be
> caused by some random other process happening to open the same port in
> the few microseconds between the close and reopen. It's simply not
> believable to me.
> 
> My thinking is still that the port isn't ready for reuse until a small
> amount of time after it's closed. I have some test programs that
> exercise sockets in a particular way (e.g., from multiple threads, or
> opening and closing batches of sockets) that can reproduce the problem
> on some systems, and these test programs seem to behave better if a time
> delay is added between the close and the reopen. The exact circumstances
> under which the problem occurs are difficult to pin down and seem
> OS-specific, so choosing the "right" delay time is very difficult. But
> it does strengthen this conjecture in my mind.
> 
> Naturally it would be better if there were a way to determine when a
> port is available for reuse without actually opening it. I'm not aware
> of any such way, but I'm holding onto a little hope that one can be found.
> 
> s'marks

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-18 Thread Jaroslav Bachorik


On 12/11/2014 03:43 PM, Dmitry Samersoff wrote:

Jaroslav,

You can set SO_LINGER to zero, in which case the socket will be closed
immediately without waiting in TIME_WAIT

But there is no reliable way to predict whether you can take this port
or not after you close it.

So the only valid solution is to try to connect to a random port and if
this attempt fails try another random port. Everything else will cause
more or less frequent intermittent failures.


Thanks for all the suggestions!

http://cr.openjdk.java.net/~jbachorik/8066708/webrev.02

I've enhanced the original patch with retry logic that uses a different 
random port if starting the JMX agent on the provided port fails with 
BindException.
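The retry-on-BindException policy can be sketched as follows, assuming the code that starts the agent throws `BindException` directly (if it is wrapped, a cause-chain check would be needed instead). `BindRetry`, `PortAction` and the `portChooser` parameter are hypothetical names, not the test's actual code. Only `BindException` triggers a retry; any other exception propagates so that genuine failures are still reported.

```java
import java.net.BindException;
import java.util.function.IntSupplier;

/** Hypothetical retry helper: try again on a different port only when the bind fails. */
final class BindRetry {
    private BindRetry() {}

    @FunctionalInterface
    interface PortAction<T> {
        T run(int port) throws Exception;
    }

    static <T> T withPortRetry(int maxAttempts, IntSupplier portChooser, PortAction<T> action)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int port = portChooser.getAsInt(); // a fresh port for every attempt
            try {
                return action.run(port);
            } catch (BindException e) {
                last = e; // port conflict: retry; other exceptions propagate immediately
            }
        }
        throw last; // every attempt hit a port conflict
    }
}
```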


I'm keeping the changes for properly closing the ports opened for 
test purposes and for setting SO_REUSEADDR; it does not make sense 
to reuse the ephemeral test ports anyway.


I've split the original "test_06" test case in order to keep it readable 
even with the new retry logic, and also so that each test case tests 
just one scenario.


Cheers,

-JB-



-Dmitry


On 2014-12-11 17:06, Jaroslav Bachorik wrote:

On 12/09/2014 01:25 PM, Jaroslav Bachorik wrote:

On 12/09/2014 01:39 AM, Stuart Marks wrote:

On 12/8/14 12:35 PM, Jaroslav Bachorik wrote:

Please, review the following test change

Issue : https://bugs.openjdk.java.net/browse/JDK-8066708
Webrev: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.00

The test fails very intermittently when RMI registry is trying to bind
to a port previously used in the test (via ServerSocket).

This seems to be caused by the sockets created via `new ServerSocket(0)`
and being in reusable mode. The fix attempts to prevent this by explicitly
forbidding the reusable mode.


Hi Jaroslav,

I happened to see this fly by, and there are (I think) some similar
issues going on in the RMI tests.

But first I'll note that I don't think setReuseAddress() will have the
effect that you want. Typically it's set to true before binding a
socket, so that a subsequent bind operation will succeed even if the
address/port is already in use. ServerSockets created with new
ServerSocket(0) are already bound, and I'm not sure what calling
setReuseAddress(false) will do on such sockets. The spec says behavior
is undefined, but my bet is that it does nothing.
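For comparison, SO_REUSEADDR only has a defined effect when it is set before the socket is bound, which means creating an unbound `ServerSocket` first. A minimal sketch; `ReuseAddressDemo.openWithReuse` is a hypothetical helper, and whether either setting avoids the reopen failures discussed here is exactly what is in question.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical helper: set SO_REUSEADDR while the socket is still unbound, then bind. */
final class ReuseAddressDemo {
    private ReuseAddressDemo() {}

    static ServerSocket openWithReuse(boolean reuse) throws IOException {
        ServerSocket ss = new ServerSocket();  // unbound, unlike new ServerSocket(0)
        try {
            ss.setReuseAddress(reuse);         // must precede bind(); afterwards behaviour is not defined
            ss.bind(new InetSocketAddress(0)); // port 0: let the OS choose
            return ss;
        } catch (IOException e) {
            ss.close();
            throw e;
        }
    }
}
```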

I guess it doesn't hurt to try this out to see if it makes a difference,
but I don't have much confidence it will help.

The potential similarity to the RMI tests is exemplified by JDK-8049202
(sorry, this bug report isn't open) but briefly this tests the RMI
registry as follows:

1. Opens port 1099 using new ServerSocket(1099) [1099 is the default
 RMI registry port] in order to ensure that 1099 isn't in use by
 something else already;

2. If this succeeds, it immediately closes the ServerSocket.

3. Then it creates a new RMI registry on port 1099.

In principle, this should succeed, yet it fails around 10% of the time
on some systems. The error is "port already in use". My best theory is
that even though the socket has just been closed by a user program, the
kernel has to run the socket through some of the socket states such as
FIN_WAIT_1, FIN_WAIT_2, or CLOSING before the socket is actually closed
and is available for reuse. If a program -- even the same one --
attempts to open a socket on the same port before the socket has reached
its final state, it will get an "already in use" error.

If this is true I don't believe that setting SO_REUSEADDR will work if
the socket is in one of these final states. (I remember reading this
somewhere but I'm not sure where at the moment. I can try to dig it up
if there is interest.)

I admit this is just a theory and I'm open to alternatives, and I'm also
open to hearing about ways to deal with this problem.

Could something similar be going on with this JMX test?


Hm, this is exactly what happened with this test :(

The problem is that the port is reported as available while it is still
occupied and RMI registry attempts to start using that port.

If setting SO_REUSEADDR does not work then the only solution would be to
retry the test case when this exception occurs.


Further investigation shows that the problem was rather the client
connecting to a socket being shut down.

It sounds like setting SO_REUSEADDR to false should prevent this failure.

 From the ServerSocket javadoc:
"When a TCP connection is closed the connection may remain in a timeout
state for a period of time after the connection is closed (typically
known as the TIME_WAIT state or 2MSL wait state). For applications using
a well known socket address or port it may not be possible to bind a
socket to the required SocketAddress if there is a connection in the
timeout state involving the socket address or port."

It also turns out that the test does not close the server sockets
properly, so there might be several open or timed-out sockets left
dangling around.

I've updated the test so it is setting SO_

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2014-12-19 Thread Stuart Marks


On 12/17/14 10:40 AM, Dmitry Samersoff wrote:

1. Even if you set SO_LINGER to zero, the socket will not be closed
immediately; see the TCP shutdown sequence.


Which TCP shutdown sequence? The one that involves FIN_WAIT_1, FIN_WAIT_2, and 
TIME_WAIT? That's the state machine for a connected socket; the sockets in 
question here have never been connected. There is still apparently a state the 
socket goes through before the port is actually freed, but it might not have 
anything to do with TCP.
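To make the distinction concrete: in the Java API, SO_LINGER is an option on a connected `Socket`; `ServerSocket` has no `setSoLinger` method at all, so it cannot be applied to the listening sockets these tests open and close. A minimal sketch (`LingerDemo` is a hypothetical name) that sets a zero linger time on a loopback client connection. On typical BSD-style stacks, closing such a socket aborts the connection with a reset instead of the normal FIN exchange.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: SO_LINGER applies to a connected Socket, not to a listening ServerSocket. */
final class LingerDemo {
    private LingerDemo() {}

    /** Connects over loopback, sets a zero linger time on the client, and returns the reported value. */
    static int zeroLingerClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoLinger(true, 0); // close() will typically be abortive (RST) for this socket
            return client.getSoLinger(); // 0 when enabled with a zero timeout, -1 when disabled
        }
    }
}
```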


If you have references for this behavior I'd appreciate them. What I've learned 
about this issue is only through empirical observation.


By the way, more empiricism: I can reproduce the EADDRINUSE on Solaris a large 
fraction of the time, by running multiple programs (or threads) that simply 
close and reopen distinct sockets. I've also observed that setting SO_REUSEPORT 
(introduced in Solaris 11) seems to avoid this problem entirely.


Unfortunately SO_REUSEPORT isn't available from Java, it doesn't exist on all 
systems, and I don't know if it would have this same behavior on other systems 
on which it does exist.



2. In a native world it's quite easy to find the port your rmi server
uses - it could be achieved by parsing /proc//net/tcp on Linux or
using special API on windows and solaris.


I think your definition of "easy" differs from mine. :-)

The context is the JDK regression tests, which do support native code. I'd prefer 
not to have to explore this new area, especially in addition to writing a bunch 
of system-specific native code.



3. For rmid and port provided from outside I think the only reliable way
to get what you need is: [...]


I think it's pretty clear at this point that the open-close-reopen approach 
can't be made reliable in any platform-independent way. For rmid I think I might 
create a new mode that opens an ephemeral port and sends that to the test driver 
somehow. Looks like Jaroslav is proceeding with a retry strategy.
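One possible shape for such a mode, assuming the child can write to its standard output: after binding port 0 it prints a marker line, and the driver scans the child's output for it (in a real driver the reader would wrap `Process.getInputStream()`). `PortAnnouncement` and the `LISTENING_PORT=` marker are hypothetical; this is not an existing rmid option.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.OptionalInt;

/** Hypothetical protocol: a child VM announces its ephemeral port on stdout; the driver parses it. */
final class PortAnnouncement {
    private PortAnnouncement() {}

    static final String PREFIX = "LISTENING_PORT="; // hypothetical marker line

    /** Child side: the line it would print after binding port 0. */
    static String announce(int port) {
        return PREFIX + port;
    }

    /** Driver side: scan the child's output until the marker line appears or the stream ends. */
    static OptionalInt findPort(BufferedReader childOutput) throws IOException {
        String line;
        while ((line = childOutput.readLine()) != null) {
            if (line.startsWith(PREFIX)) {
                return OptionalInt.of(Integer.parseInt(line.substring(PREFIX.length()).trim()));
            }
        }
        return OptionalInt.empty(); // child exited (or closed stdout) without announcing a port
    }
}
```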



4. For rmid you can also emulate inetd behaviour, i.e. the driver opens a
server port, communicates it to the client, then redirects everything that
comes to this port to the stdin of rmid.


Thanks, but unfortunately this is actually one of the modes that needs to be 
tested in rmid. It has a mode where it opens and listens on its own socket, and 
another mode where it inherits one from its parent process, so that it can be 
invoked from inetd.
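For reference, a Java process can support both modes through `System.inheritedChannel()`, which returns the channel inherited from the launching process (for example a listening socket passed by inetd as stdin) or `null`. A minimal sketch with a hypothetical `ListenerSource.obtain` helper; it illustrates the mechanism and is not necessarily how rmid implements it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.Channel;
import java.nio.channels.ServerSocketChannel;

/** Hypothetical helper: use an inherited listening socket (inetd style) if present, else bind one. */
final class ListenerSource {
    private ListenerSource() {}

    static ServerSocketChannel obtain(int port) throws IOException {
        Channel inherited = System.inheritedChannel(); // typically null when launched normally
        if (inherited instanceof ServerSocketChannel) {
            return (ServerSocketChannel) inherited;    // inetd-style mode
        }
        ServerSocketChannel own = ServerSocketChannel.open(); // self-listening mode
        try {
            own.bind(new InetSocketAddress(port));     // port 0 means ephemeral
        } catch (IOException e) {
            own.close();
            throw e;
        }
        return own;
    }
}
```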


s'marks



-Dmitry


Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-01-06 Thread Dmitry Samersoff

Jaroslav,

It might be better to just choose a random port number between 49152 and
65535 and attempt to use it.
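Dmitry's suggestion amounts to something like the following sketch (hypothetical `DynamicPorts` class; 49152-65535 is the IANA dynamic/private range). Note that the upper bound of `nextInt(origin, bound)` is exclusive. Paired with a retry on `BindException`, no open/close step is needed.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: pick a port from the IANA dynamic/private range. */
final class DynamicPorts {
    private DynamicPorts() {}

    static final int FIRST = 49152; // start of the dynamic/private range
    static final int LAST = 65535;  // highest TCP port number

    static int randomDynamicPort() {
        return ThreadLocalRandom.current().nextInt(FIRST, LAST + 1); // bound is exclusive, hence + 1
    }
}
```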

-Dmitry


On 2014-12-18 17:29, Jaroslav Bachorik wrote:
> On 12/11/2014 03:43 PM, Dmitry Samersoff wrote:
>> Jaroslav,
>>
>> You can set SO_LINGER to zero, in which case the socket will be closed
>> immediately without waiting in TIME_WAIT
>>
>> But there is no reliable way to predict whether you can take this port
>> or not after you close it.
>>
>> So the only valid solution is to try to connect to a random port and if
>> this attempt fails try another random port. Everything else will cause
>> more or less frequent intermittent failures.
> 
> Thanks for all the suggestions!
> 
> http://cr.openjdk.java.net/~jbachorik/8066708/webrev.02
> 
> I've enhanced the original patch with retry logic that uses a different
> random port if starting the JMX agent on the provided port fails with
> BindException.
> 
> I'm keeping the changes for properly closing the ports opened for
> test purposes and for setting SO_REUSEADDR; it does not make sense
> to reuse the ephemeral test ports anyway.
> 
> I've split the original "test_06" test case in order to keep it readable
> even with the new retry logic, and also so that each test case tests
> just one scenario.
> 
> Cheers,
> 
> -JB-

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-01-08 Thread Stuart Marks


Hi Jaroslav,

I'm distant enough from this code that I don't think I'm in a position to say 
"no you can't check this in," and I'm mindful of the fact that this bug is a 
high priority and you want to get a fix in. But having said that I think there's 
a surprising amount of complexity here for what it does.


Implementing a retry-on-BindException policy seems to be fairly sensible, since 
I assume it would be fairly invasive to modify the code in the JVMs being forked 
to use an ephemeral port and send that information back to the test.


My conjecture is however that the open/close/reopen logic is the primary cause 
of the BindExceptions that occur. If you're going to retry on BindException, you 
might as well choose a random port number instead of doing the open/close to get 
a port number from the system.


The range that Dmitry suggests is reasonable, though I note that the actual 
ephemeral port range used by the kernel will differ from OS to OS and even from 
system to system. I don't know if that's really significant though. If you end 
up choosing a port outside the ephemeral range for some system, does it really 
matter?


If you do decide to have PortAllocator open and close a ServerSocket (in order 
to find a previously unused port) I'd suggest removing the logic that leaves the 
socket open until the first call to get(). That logic reduces the possibility 
that some other process will open the socket after the close but before the 
reopen. In my experience that's not what's causing the BindExceptions. It could 
still happen, though, but you're protected by the retry logic anyway. Leaving 
the socket open longer actually hurts, I think, because it increases the chance 
that the kernel won't have cleaned up the port by the time the test wants to 
reopen it.


If you change PortAllocator to close the socket immediately, you can get rid of 
the need to call release() in a finally-block of the caller. You could also 
change the type of the functional interface to be


int[] -> void

since the PortAllocator doesn't hold onto any resources that need to be cleaned 
up. It just calls the execute() method and passes an array of n port numbers.
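A sketch of the suggested refactoring, assuming a PortAllocator-like helper. It grabs n distinct ports by holding n `ServerSocket(0)` instances only for the duration of the allocation, closes them before the action runs (so the caller has nothing to `release()`), and passes the port numbers to an `int[] -> void` action, retrying on `BindException`. `PortAllocatorSketch`, `PortsConsumer`, `allocate` and the attempt limit are hypothetical and need not match the webrev.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical PortAllocator-style helper: no resources outlive allocate(). */
final class PortAllocatorSketch {
    private PortAllocatorSketch() {}

    @FunctionalInterface
    interface PortsConsumer {
        void accept(int[] ports) throws Exception; // the "int[] -> void" shape
    }

    static int[] allocate(int n) throws IOException {
        ServerSocket[] probes = new ServerSocket[n];
        int[] ports = new int[n];
        try {
            for (int i = 0; i < n; i++) {
                probes[i] = new ServerSocket(0);     // held together only so the n ports are distinct
                ports[i] = probes[i].getLocalPort();
            }
        } finally {
            for (ServerSocket s : probes) {
                if (s != null) {
                    s.close();                       // closed before returning: nothing left to release()
                }
            }
        }
        return ports;
    }

    static void execute(int n, int maxAttempts, PortsConsumer action) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                action.accept(allocate(n));
                return;
            } catch (BindException e) {
                if (attempt >= maxAttempts) {
                    throw e;                         // out of attempts: report the conflict
                }
                // otherwise loop and retry with freshly allocated ports
            }
        }
    }
}
```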


It's probably not necessary to have the socket close() call in a retry loop. The 
socket is always closed immediately from the user process point of view, so 
isClosed() will always return true immediately after the close() call returns. 
You can verify this easily by looking in the ServerSocket.java source code. I 
believe the state that prevents the port from being reused immediately is 
private to the kernel and cannot be observed from a user process, at least not 
without attempting to reopen the socket.


Side note: one of the jcmd() overloads says that parameter 'c' (a Consumer) may 
be null. It doesn't look like this is handled. If you really want to support 
this, I'd assign a no-op such as s -> { } to it if it's null so that it can be called 
unconditionally. (Or just disallow null.)
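A minimal sketch of the null handling (hypothetical `OptionalConsumer.run` standing in for the jcmd() overload). Since `Consumer.accept` takes one argument, the no-op default is a one-parameter lambda; the alternative is to reject null up front with `Objects.requireNonNull(c)`.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for a jcmd()-style helper that accepts an optional output consumer. */
final class OptionalConsumer {
    private OptionalConsumer() {}

    static void run(String command, Consumer<String> c) {
        // Substitute a no-op when c is null so the consumer can be called unconditionally.
        Consumer<String> sink = (c != null) ? c : line -> { };
        sink.accept("output of " + command); // placeholder for the real command output
    }
}
```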


s'marks


On 1/6/15 2:00 PM, Dmitry Samersoff wrote:

Jaroslav,

It might be better to just choose a random port number between 49152 and
65535 and attempt to use it.

-Dmitry



Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-01-09 Thread Dmitry Samersoff

Stuart,

> The range that Dmitry suggests is reasonable, though I note that the
> actual ephemeral port range used by the kernel will differ from OS to
> OS and even from system to system. I don't know if that's really
> significant though. If you end up choosing a port outside the
> ephemeral range for some system, does it really matter?

This range is assigned by IANA so it's a standard.

-Dmitry


On 2015-01-09 05:50, Stuart Marks wrote:
> Hi Jaroslav,
> 
> I'm distant enough from this code that I don't think I'm in a position
> to say "no you can't check this in," and I'm mindful of the fact that
> this bug is a high priority and you want to get a fix in. But having
> said that I think there's a surprising amount of complexity here for
> what it does.
> 
> Implementing a retry-on-BindException policy seems to be fairly
> sensible, since I assume it would be fairly invasive to modify the code
> in the JVMs being forked to use an ephemeral port and send that
> information back to the test.
> 
> My conjecture is however that the open/close/reopen logic is the primary
> cause of the BindExceptions that occur. If you're going to retry on
> BindException, you might as well choose a random port number instead of
> doing the open/close to get a port number from the system.
> 
> The range that Dmitry suggests is reasonable, though I note that the
> actual ephemeral port range used by the kernel will differ from OS to OS
> and even from system to system. I don't know if that's really
> significant though. If you end up choosing a port outside the ephemeral
> range for some system, does it really matter?
> 
> If you do decide to have PortAllocator open and close a ServerSocket (in
> order to find a previously unused port) I'd suggest removing the logic
> that leaves the socket open until the first call to get(). That logic
> reduces the possibility that some other process will open the socket
> after the close but before the reopen. In my experience that's not
> what's causing the BindExceptions. It could still happen, though, but
> you're protected by the retry logic anyway. Leaving the socket open
> longer actually hurts, I think, because it increases the chance that the
> kernel won't have cleaned up the port by the time the test wants to
> reopen it.
> 
> If you change PortAllocator to close the socket immediately, you can get
> rid of the need to call release() in a finally-block of the caller. You
> could also change the type of the functional interface to be
> 
> int[] -> void
> 
> since the PortAllocator doesn't hold onto any resources that need to be
> cleaned up. It just calls the execute() method and passes an array of n
> port numbers.
> 
> It's probably unnecessary to have the socket close() call in a retry
> loop. The socket is always closed immediately from the user process's
> point of view, so isClosed() will always return true immediately after
> the close() call returns. You can verify this easily by looking in the
> ServerSocket.java source code. I believe the state that prevents the
> port from being reused immediately is private to the kernel and cannot
> be observed from a user process, at least not without attempting to
> reopen the socket.
> 
> Side note: one of the jcmd() overloads says that parameter 'c' (a
> Consumer) may be null. It doesn't look like this is handled. If you
> really want to support this, I'd assign a no-op lambda (x -> { }) to it
> if it's null so that it can be called unconditionally. (Or just
> disallow null.)
> 
> s'marks
> 
> 
> On 1/6/15 2:00 PM, Dmitry Samersoff wrote:
>> Jaroslav,
>>
>> It might be better to just choose a random port number between 49152
>> and 65535 and attempt to use it.
>>
>> -Dmitry
>>
>>
>> On 2014-12-18 17:29, Jaroslav Bachorik wrote:
>>> On 12/11/2014 03:43 PM, Dmitry Samersoff wrote:
>>>> Jaroslav,
>>>>
>>>> You can set SO_LINGER to zero, in which case the socket will be closed
>>>> immediately without waiting in TIME_WAIT.
>>>>
>>>> But there is no reliable way to predict whether you can take this port
>>>> or not after you close it.
>>>>
>>>> So the only valid solution is to try to connect to a random port and, if
>>>> this attempt fails, try another random port. Everything else will cause
>>>> more or less frequent intermittent failures.
>>>
>>> Thanks for all the suggestions!
>>>
>>> http://cr.openjdk.java.net/~jbachorik/8066708/webrev.02
>>>
>>> I've enhanced the original patch with retry logic that uses a different
>>> random port if starting the JMX agent on the provided port fails with a
>>> BindException.
>>>
>>> I'm keeping the changes for properly closing the ports opened for the
>>> test purposes and also the SO_REUSEADDR setting; it does not make sense
>>> to reuse the ephemeral test ports anyway.
>>>
>>> I've split the original "test_06" test case in order to keep it readable
>>> even with the new retry logic, and also to make each test case test
>>> just one scenario.
>>>
>>> Cheers,
>>>
>>> -JB-
>>>

>>>> -Dmitry
>>>>
>>>> On 2014-12-11

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-01-09 Thread Jaroslav Bachorik


Thank you all for the valuable input!

On 9.1.2015 03:50, Stuart Marks wrote:

Hi Jaroslav,

I'm distant enough from this code that I don't think I'm in a position
to say "no you can't check this in," and I'm mindful of the fact that
this bug is a high priority and you want to get a fix in. But having
said that I think there's a surprising amount of complexity here for
what it does.

Implementing a retry-on-BindException policy seems to be fairly
sensible, since I assume it would be fairly invasive to modify the code
in the JVMs being forked to use an ephemeral port and send that
information back to the test.

My conjecture is however that the open/close/reopen logic is the primary
cause of the BindExceptions that occur. If you're going to retry on
BindException, you might as well choose a random port number instead of
doing the open/close to get a port number from the system.

The range that Dmitry suggests is reasonable, though I note that the
actual ephemeral port range used by the kernel will differ from OS to OS
and even from system to system. I don't know if that's really
significant though. If you end up choosing a port outside the ephemeral
range for some system, does it really matter?

If you do decide to have PortAllocator open and close a ServerSocket (in
order to find a previously unused port) I'd suggest removing the logic
that leaves the socket open until the first call to get(). That logic
reduces the possibility that some other process will open the socket
after the close but before the reopen. In my experience that's not
what's causing the BindExceptions. It could still happen, though, but
you're protected by the retry logic anyway. Leaving the socket open
longer actually hurts, I think, because it increases the chance that the
kernel won't have cleaned up the port by the time the test wants to
reopen it.

If you change PortAllocator to close the socket immediately, you can get
rid of the need to call release() in a finally-block of the caller. You
could also change the type of the functional interface to be

 int[] -> void

since the PortAllocator doesn't hold onto any resources that need to be
cleaned up. It just calls the execute() method and passes an array of n
port numbers.
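To make the "int[] -> void" shape concrete, here is a minimal Java sketch of a retry-on-BindException helper. RetryOnBind, PortTask and withPorts are hypothetical names, not taken from the webrev, and the sketch assumes the task reports a port conflict by letting java.net.BindException propagate. Where the ports come from is left to a Supplier, so the same helper works with random port numbers.

```java
import java.io.IOException;
import java.net.BindException;
import java.util.function.Supplier;

/** Hypothetical sketch of a retry-on-BindException policy (not the webrev's code). */
final class RetryOnBind {
    /** The "int[] -> void" shape: the task only receives port numbers. */
    @FunctionalInterface
    interface PortTask {
        void execute(int[] ports) throws IOException;
    }

    private RetryOnBind() {}

    /**
     * Runs the task with a fresh set of ports from portSource, retrying up to
     * maxAttempts times if it fails with a BindException. No sockets are held
     * between attempts, so there is nothing to release() afterwards.
     */
    static void withPorts(Supplier<int[]> portSource, int maxAttempts, PortTask task)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                task.execute(portSource.get());
                return;
            } catch (BindException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: report the last conflict
                }
                // port conflict: loop and ask portSource for new ports
            }
        }
    }
}
```

Because nothing is held open between attempts, the caller needs no finally-block or release() call.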

It's probably unnecessary to have the socket close() call in a retry loop.
The socket is always closed immediately from the user process's point of
view, so isClosed() will always return true immediately after the
close() call returns. You can verify this easily by looking in the
ServerSocket.java source code. I believe the state that prevents the
port from being reused immediately is private to the kernel and cannot
be observed from a user process, at least not without attempting to
reopen the socket.
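A quick Java check of the isClosed() point: the flag is set as soon as close() returns, so a loop that polls it cannot observe whether the kernel is ready to hand the port out again. CloseIsImmediate is a hypothetical name used only for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical demo: ServerSocket.isClosed() is true as soon as close() returns. */
final class CloseIsImmediate {
    private CloseIsImmediate() {}

    /** Opens an ephemeral server socket, closes it, and reports isClosed(). */
    static boolean closedRightAfterClose() throws IOException {
        ServerSocket ss = new ServerSocket(0);
        ss.close();
        // isClosed() reflects the Java object's state only; any cleanup the
        // OS still has to do for the port is not visible through this flag.
        return ss.isClosed();
    }
}
```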

Side note: one of the jcmd() overloads says that parameter 'c' (a
Consumer) may be null. It doesn't look like this is handled. If you
really want to support this, I'd assign a no-op lambda (x -> { }) to it if
it's null so that it can be called unconditionally. (Or just disallow null.)
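A minimal sketch of the null-handling suggestion, using a hypothetical forEachLine helper in place of the test's actual jcmd() overloads. One detail worth noting: a Consumer's no-op lambda takes one parameter (s -> { }), since () -> { } only fits zero-argument interfaces such as Runnable. Rejecting null with Objects.requireNonNull(c) is the simpler alternative.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a jcmd-style helper that tolerates a null callback. */
final class NullableCallback {
    private NullableCallback() {}

    static void forEachLine(List<String> output, Consumer<String> c) {
        // Substitute a one-argument no-op so the callback can be invoked unconditionally.
        Consumer<String> sink = (c != null) ? c : s -> { };
        for (String line : output) {
            sink.accept(line);
        }
    }
}
```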


I've changed the PortAllocator to allocate an array of unique random
ports instead of letting ServerSocket take care of it.


I ran the test 200x in a tight loop without a failure.

I hope this will resolve this test's intermittent failures due to port 
conflicts once and for all.


Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.03

Thanks,

-JB-




s'marks


On 1/6/15 2:00 PM, Dmitry Samersoff wrote:

Jaroslav,

It might be better to just choose a random port number between 49152
and 65535 and attempt to use it.

-Dmitry


On 2014-12-18 17:29, Jaroslav Bachorik wrote:

On 12/11/2014 03:43 PM, Dmitry Samersoff wrote:

Jaroslav,

You can set SO_LINGER to zero, in which case the socket will be closed
immediately without waiting in TIME_WAIT.

But there is no reliable way to predict whether you can take this port
or not after you close it.

So the only valid solution is to try to connect to a random port and, if
this attempt fails, try another random port. Everything else will cause
more or less frequent intermittent failures.
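For reference, in Java SO_LINGER is set on a connected java.net.Socket (ServerSocket has no setSoLinger method), so it affects how individual connections are torn down rather than the listening port itself. A hedged sketch with hypothetical names: with linger enabled and a timeout of 0, close() typically aborts the connection with a reset instead of the normal FIN exchange, so this side does not enter TIME_WAIT. It still does not make the port's later availability predictable.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo of SO_LINGER = 0 on a client socket. */
final class LingerZero {
    private LingerZero() {}

    /**
     * Connects to a local server, enables SO_LINGER with a 0-second timeout,
     * and closes both sockets via try-with-resources (client first).
     * Returns the linger value in effect just before close().
     */
    static int connectAndAbort() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoLinger(true, 0); // close() will abort rather than linger
            return client.getSoLinger();
        }
    }
}
```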


Thanks for all the suggestions!

http://cr.openjdk.java.net/~jbachorik/8066708/webrev.02

I've enhanced the original patch with retry logic that uses a different
random port if starting the JMX agent on the provided port fails with a
BindException.

I'm keeping the changes for properly closing the ports opened for the
test purposes and also the SO_REUSEADDR setting; it does not make sense
to reuse the ephemeral test ports anyway.

I've split the original "test_06" test case in order to keep it readable
even with the new retry logic, and also to make each test case test
just one scenario.

Cheers,

-JB-



-Dmitry


On 2014-12-11 17:06, Jaroslav Bachorik wrote:

On 12/09/2014 01:25 PM, Jaroslav Bachorik wrote:

On 12/09/2014 01:39 AM, Stuart Marks wrote:

On 12/8/14 12:35 PM, Jaroslav Bachorik wrote:

Please, review the following test change

Issue : https://bugs.openjdk.java.net/browse/JDK-8066708
Webrev: http://

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-01-14 Thread Stuart Marks


On 1/9/15 4:17 AM, Jaroslav Bachorik wrote:

I've changed the PortAllocator to allocate an array of unique random ports
instead of letting ServerSocket take care of it.

I ran the test 200x in a tight loop without a failure.

I hope this will resolve this test's intermittent failures due to port conflicts
once and for all.

Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.03


Hi Jaroslav,

Good to hear that the test seems to be running more reliably. (I'm assuming that 
you'd see failures before if you ran it 200x in a tight loop.) This is probably 
because you're avoiding the open/close/reopen approach that we've now thoroughly 
discredited. :-)


That said, it still looks to me like the code is more complex than it needs to 
be. You're the one who's going to be maintaining it, but if it were me, I'd put 
more effort into simplifying it. Here are a few approaches I'd suggest.


1) It looks like occupyPort() is used only once. This is I think the only place 
in PortAllocator that actually opens a socket. It's used in test_09 where it 
looks like the intent is to keep a port busy by opening a socket, and then 
making sure that the JMX stuff in the child process does the right thing when it 
encounters a busy port.


Since this is the only place that needs a real socket, you might as well move 
the ServerSocket creation stuff out of PortAllocator and create it directly here 
with new ServerSocket(0). This should never fail with a BindException, so you 
needn't worry about retries. Then get the local port, and pass it to the 
subprocess. You should put this within a try-with-resources in test_09 so that 
the socket will be closed properly.
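A minimal sketch of suggestion 1, assuming test_09 only needs some port that is already bound while the child JVM starts. OccupiedPort and runWithBusyPort are hypothetical names; in the real test the callback would launch the subprocess with the busy port number.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.function.IntConsumer;

/** Hypothetical sketch of the "occupy a port" step for a test like test_09. */
final class OccupiedPort {
    private OccupiedPort() {}

    /**
     * Binds a ServerSocket to a kernel-chosen port and keeps it open while the
     * body runs. try-with-resources guarantees the socket is closed afterwards.
     * Returns the port that was held.
     */
    static int runWithBusyPort(IntConsumer body) throws IOException {
        try (ServerSocket busy = new ServerSocket(0)) {
            int port = busy.getLocalPort();
            body.accept(port); // e.g. start the child JVM pointed at this busy port
            return port;
        }
    }
}
```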


2) If you do this, then PortAllocator gets a lot simpler. There's no need to 
keep a collection of sockets, so release() can go away, and the finally-block of 
withAllocatedPorts can go away too.


3) Now PortAllocator's only instance state is the array of random port numbers. 
But once this is generated, the only reason PortAllocator stays around is to 
host the getter for array elements; basically it's just a wrapper for the array. 
And the reset() method regenerates the array. The essence of this is now just a 
function that returns an array of N random port numbers. You can pass the array 
directly to the Task and it can just use the ports from the array, instead of 
calling a getter. If there's a BindException and a "reset" needs to be done, 
this is just another call to the function to generate another array. So there's 
really no longer a need to have PortAllocator instances.
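Suggestion 3 reduces PortAllocator to a plain function. A sketch under the assumption that ports are drawn from the IANA dynamic range Dmitry suggested (49152 to 65535); RandomPorts and allocatePorts are illustrative names. Nothing is opened or held, so a "reset" after a BindException is just another call.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for PortAllocator as a plain function. */
final class RandomPorts {
    private static final int MIN = 49152; // start of IANA dynamic/private range
    private static final int MAX = 65535; // end of IANA dynamic/private range

    private RandomPorts() {}

    /** Returns n distinct random port numbers in [MIN, MAX]; none is checked or opened. */
    static int[] allocatePorts(int n) {
        if (n < 0 || n > MAX - MIN + 1) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        // Infinite stream of candidates; distinct() drops repeats, limit() stops after n.
        return ThreadLocalRandom.current()
                .ints(MIN, MAX + 1)
                .distinct()
                .limit(n)
                .toArray();
    }
}
```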


This is a bit farther afield from this particular change, but there are some 
other opportunities for simplification:


4) Each of the test_NN methods consists entirely of a println followed by a 
withAllocatedPorts() call which is passed a long multi-line lambda. (In my book, 
multi-line lambdas are a bit of a code smell.) The withAllocatedPorts() method 
essentially implements the retry-on-BindException policy. Since each test_NN 
method is invoked reflectively by a mini-framework in the test, you could merge 
that logic into the mini-framework. In turn, each test method would then call 
the random port generator method and get an array of the requested number of 
ports, and just use them. This would remove a level of nesting and a few lines 
of vertical space from every test method.
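A rough Java sketch of folding the retry policy into a reflective mini-framework, as suggestion 4 describes. It assumes the test methods are no-arg methods named test_NN that allocate their own random ports and let a BindException propagate; TestRunner and runAll are hypothetical, and the real test's framework may differ.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.BindException;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical mini-framework: runs test_* methods, retrying on BindException. */
final class TestRunner {
    private TestRunner() {}

    /** Invokes each no-arg test_* method on the object; returns how many passed. */
    static int runAll(Object tests, int maxAttempts) throws Exception {
        Method[] methods = tests.getClass().getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // stable order
        int run = 0;
        for (Method m : methods) {
            if (m.isSynthetic() || !m.getName().startsWith("test_")
                    || m.getParameterCount() != 0) {
                continue;
            }
            m.setAccessible(true); // test classes need not be public
            for (int attempt = 1; ; attempt++) {
                try {
                    m.invoke(tests);
                    run++;
                    break;
                } catch (InvocationTargetException e) {
                    Throwable cause = e.getCause(); // the test's own exception
                    if (cause instanceof BindException && attempt < maxAttempts) {
                        continue; // port conflict: run the test again
                    }
                    throw e;
                }
            }
        }
        return run;
    }
}
```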


5) Most (but not all) of the tests call doSomething and then follow it with a 
try/finally block that calls stop(). It seems like this commonality could be 
extracted somehow, but it eludes me at the moment.
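One possible way to extract the start/try/finally/stop() shape (not something proposed in the thread): since AutoCloseable has a single abstract method, a method reference to stop() can serve as the resource in a try-with-resources statement. Something, doSomething() and startCheckStop() are hypothetical stand-ins for the test's own types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: let try-with-resources call stop() instead of try/finally. */
final class StopPattern {
    private StopPattern() {}

    /** Stand-in for the handle returned by doSomething(); records its lifecycle. */
    static final class Something {
        final List<String> events = new ArrayList<>();

        void stop() {
            events.add("stopped");
        }
    }

    /** Stand-in for doSomething(): starts something and returns it. */
    static Something doSomething() {
        Something s = new Something();
        s.events.add("started");
        return s;
    }

    /** Start, run the checks, and always stop, even if the checks throw. */
    static Something startCheckStop(Runnable checks) throws Exception {
        Something s = doSomething();
        try (AutoCloseable stopper = s::stop) { // close() delegates to s.stop()
            checks.run();
        }
        return s;
    }
}
```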


6) The internal jcmd() methods all have a void return value, but some of them 
take a Consumer, which is usually passed a lambda argument that performs some 
test and then sets an AtomicBoolean as a side effect. (This is another code 
smell. Sometimes it's necessary, but only sometimes.) This parameter really 
wants to be a Predicate. The jcmd() method can just call the predicate and keep 
track of the results, and return a final result boolean that, for example, 
indicates whether the predicate had ever returned true. I'm not sure that's 
exactly the semantic you want. But a preferable idiom is to return a value 
instead of calling a lambda that performs side effects on captured locals.
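A sketch of the Predicate-returning idiom with a hypothetical jcmd-like helper that takes the output lines directly; a real helper would read them from the jcmd process. It reports whether the predicate ever returned true, one of the semantics mentioned above, and it keeps consuming lines after the first match rather than short-circuiting.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical jcmd-like helper that returns a result instead of mutating captured state. */
final class PredicateJcmd {
    private PredicateJcmd() {}

    /** Tests every output line; returns true if the predicate ever returned true. */
    static boolean jcmd(List<String> outputLines, Predicate<String> check) {
        boolean matched = false;
        for (String line : outputLines) {
            if (check.test(line)) {
                matched = true; // remember the hit but keep consuming output
            }
        }
        return matched;
    }
}
```

A caller then writes `boolean ok = jcmd(lines, l -> l.contains("expected text"));` instead of setting an AtomicBoolean inside a Consumer.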


--

This is quite a bit of stuff, and I don't necessarily expect you to fix it all. 
At least not in this changeset. But I've found that investing in refactoring of 
test code usually pays off, as it makes it easier to maintain the tests when 
you're forced to make changes to them again in six months.


s'marks

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-02-02 Thread Jaroslav Bachorik


Hi Stuart,

On 15.1.2015 02:14, Stuart Marks wrote:

On 1/9/15 4:17 AM, Jaroslav Bachorik wrote:

I've changed the PortAllocator to allocate an array of unique random
ports instead of letting ServerSocket take care of it.

I ran the test 200x in a tight loop without a failure.

I hope this will resolve this test's intermittent failures due to port
conflicts once and for all.

Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.03


Hi Jaroslav,

Good to hear that the test seems to be running more reliably. (I'm
assuming that you'd see failures before if you ran it 200x in a tight
loop.) This is probably because you're avoiding the open/close/reopen
approach that we've now thoroughly discredited. :-)

That said, it still looks to me like the code is more complex than it
needs to be. You're the one who's going to be maintaining it, but if it
were me, I'd put more effort into simplifying it. Here are a few
approaches I'd suggest.

1) It looks like occupyPort() is used only once. This is I think the
only place in PortAllocator that actually opens a socket. It's used in
test_09 where it looks like the intent is to keep a port busy by opening
a socket, and then making sure that the JMX stuff in the child process
does the right thing when it encounters a busy port.

Since this is the only place that needs a real socket, you might as well
move the ServerSocket creation stuff out of PortAllocator and create it
directly here with new ServerSocket(0). This should never fail with a
BindException, so you needn't worry about retries. Then get the local
port, and pass it to the subprocess. You should put this within a
try-with-resources in test_09 so that the socket will be closed properly.

2) If you do this, then PortAllocator gets a lot simpler. There's no
need to keep a collection of sockets, so release() can go away, and the
finally-block of withAllocatedPorts can go away too.

3) Now PortAllocator's only instance state is the array of random port
numbers. But once this is generated, the only reason PortAllocator stays
around is to host the getter for array elements; basically it's just a
wrapper for the array. And the reset() method regenerates the array. The
essence of this is now just a function that returns an array of N random
port numbers. You can pass the array directly to the Task and it can
just use the ports from the array, instead of calling a getter. If
there's a BindException and a "reset" needs to be done, this is just
another call to the function to generate another array. So there's
really no longer a need to have PortAllocator instances.

This is a bit farther afield from this particular change, but there are
some other opportunities for simplification:

4) Each of the test_NN methods consists entirely of a println followed
by a withAllocatedPorts() call which is passed a long multi-line lambda.
(In my book, multi-line lambdas are a bit of a code smell.) The
withAllocatedPorts() method essentially implements the
retry-on-BindException policy. Since each test_NN method is invoked
reflectively by a mini-framework in the test, you could merge that logic
into the mini-framework. In turn, each test method would then call the
random port generator method and get an array of the requested number of
ports, and just use them. This would remove a level of nesting and a
few lines of vertical space from every test method.

5) Most (but not all) of the tests call doSomething and then follow it
with a try/finally block that calls stop(). It seems like this
commonality could be extracted somehow, but it eludes me at the moment.

6) The internal jcmd() methods all have a void return value, but some of
them take a Consumer, which is usually passed a lambda argument that
performs some test and then sets an AtomicBoolean as a side effect.
(This is another code smell. Sometimes it's necessary, but only
sometimes.) This parameter really wants to be a Predicate. The jcmd()
method can just call the predicate and keep track of the results, and
return a final result boolean that, for example, indicates whether the
predicate had ever returned true. I'm not sure that's exactly the
semantic you want. But a preferable idiom is to return a value instead
of calling a lambda that performs side effects on captured locals.



I've applied your comments and the code is a tad simpler now. I also 
ironed out a few more corner cases. I ran the test 500x in a tight loop 
and no failure, yay!


Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.04

-JB-

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-02-02 Thread Stuart Marks




On 2/2/15 2:33 AM, Jaroslav Bachorik wrote:

I've applied your comments and the code is a tad simpler now. I also ironed out
a few more corner cases. I ran the test 500x in a tight loop and no failure, 
yay!

Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.04


Hi Jaroslav,

Looks quite a bit more straightforward now. I'm pretty much OK with this if 
you're OK with it; I think I've reviewed it enough times already. :-)


I have a couple comments on test_09 that you might want to address before you 
push or sometime in the future.


Line 714 typo "hugging" => "hogging"

But I'm not convinced this retry logic in the while-loop from lines 716-723 is 
necessary. If you've already opened a server socket on a port, I've never seen a 
case where opening the same port *again* will succeed, so why bother?


I'd suggest simply opening ServerSocket(0) and then getting the port via 
getLocalPort(). I've never seen this fail. This case should work since you 
actually want to open up some random port, instead of generating a random port 
number for somebody else (the subprocess) to open. You can then allocate one 
fewer random port. You might want to have a little loop to check that the random 
port number isn't a duplicate of the actual port that you just opened. I think 
this lets the code boil down a bit further:


    ServerSocket ss = new ServerSocket(0);
    try {
        int localPort = ss.getLocalPort();
        int[] ports;
        do {
            // draw again if the random port collides with the one just opened
            ports = PortAllocator.allocatePorts(1);
        } while (localPort == ports[0]);

        AtomicBoolean checks = new AtomicBoolean(false);
        ...
    } finally {
        ss.close();
        s.stop();
    }


s'marks

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-02-03 Thread Jaroslav Bachorik


On 2.2.2015 23:09, Stuart Marks wrote:



On 2/2/15 2:33 AM, Jaroslav Bachorik wrote:

I've applied your comments and the code is a tad simpler now. I also
ironed out a few more corner cases. I ran the test 500x in a tight loop
and no failure, yay!

Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.04


Hi Jaroslav,

Looks quite a bit more straightforward now. I'm pretty much OK with this
if you're OK with it; I think I've reviewed it enough times already. :-)

I have a couple comments on test_09 that you might want to address
before you push or sometime in the future.

Line 714 typo "hugging" => "hogging"

D'oh. Thanks.



But I'm not convinced this retry logic in the while-loop from lines
716-723 is necessary. If you've already opened a server socket on a
port, I've never seen a case where opening the same port *again* will
succeed, so why bother?


I saw the test failing without this retry logic. The purpose is to make 
sure that the port is really taken before the test continues.
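If the goal is to be sure the port is really taken before continuing, one hedged way to express that check in Java is to try binding a second ServerSocket to it and treat a BindException as "taken". PortCheck and appearsTaken are hypothetical names. Note that when the port happens to be free, the probe opens and immediately closes it, which is the open/close pattern discussed earlier in the thread, so it is best used only on a port the test already holds.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical helper: does binding this port fail because it is in use? */
final class PortCheck {
    private PortCheck() {}

    static boolean appearsTaken(int port) throws IOException {
        try (ServerSocket probe = new ServerSocket(port)) {
            return false; // bind succeeded, so the port was free (probe is closed again)
        } catch (BindException e) {
            return true; // "Address already in use"
        }
    }
}
```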




I'd suggest simply opening ServerSocket(0) and then getting the port via
getLocalPort(). I've never seen this fail. This case should work since
you actually want to open up some random port, instead of generating a
random port number for somebody else (the subprocess) to open. You can
then allocate one fewer random port. You might want to have a little
loop to check that the random port number isn't a duplicate of the
actual port that you just opened. I think this lets the code boil down a
bit further:

    ServerSocket ss = new ServerSocket(0);
    try {
        int localPort = ss.getLocalPort();
        int[] ports;
        do {
            // draw again if the random port collides with the one just opened
            ports = PortAllocator.allocatePorts(1);
        } while (localPort == ports[0]);

        AtomicBoolean checks = new AtomicBoolean(false);
        ...
    } finally {
        ss.close();
        s.stop();
    }


I'll give it a try and see how this behaves.

Thanks,

-JB-




s'marks

Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-02-03 Thread Dmitry Samersoff

Jaroslav,

Looks good to me!

> I've changed the PortAllocator to allocate an array of unique random
> ports instead of letting ServerSocket take care of it.
>
> I ran the test 200x in a tight loop without a failure.
>
> I hope this will resolve this test's intermittent failures due to port
> conflicts once and for all.
>
> Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.03
>

-Dmitry


On 2015-01-09 15:17, Jaroslav Bachorik wrote:
> Thank you all for the valuable input!
> 
> On 9.1.2015 03:50, Stuart Marks wrote:
>> Hi Jaroslav,
>>
>> I'm distant enough from this code that I don't think I'm in a position
>> to say "no you can't check this in," and I'm mindful of the fact that
>> this bug is a high priority and you want to get a fix in. But having
>> said that I think there's a surprising amount of complexity here for
>> what it does.
>>
>> Implementing a retry-on-BindException policy seems to be fairly
>> sensible, since I assume it would be fairly invasive to modify the code
>> in the JVMs being forked to use an ephemeral port and send that
>> information back to the test.
>>
>> My conjecture is however that the open/close/reopen logic is the primary
>> cause of the BindExceptions that occur. If you're going to retry on
>> BindException, you might as well choose a random port number instead of
>> doing the open/close to get a port number from the system.
>>
>> The range that Dmitry suggests is reasonable, though I note that the
>> actual ephemeral port range used by the kernel will differ from OS to OS
>> and even from system to system. I don't know if that's really
>> significant though. If you end up choosing a port outside the ephemeral
>> range for some system, does it really matter?
>>
>> If you do decide to have PortAllocator open and close a ServerSocket (in
>> order to find a previously unused port) I'd suggest removing the logic
>> that leaves the socket open until the first call to get(). That logic
>> reduces the possibility that some other process will open the socket
>> after the close but before the reopen. In my experience that's not
>> what's causing the BindExceptions. It could still happen, though, but
>> you're protected by the retry logic anyway. Leaving the socket open
>> longer actually hurts, I think, because it increases the chance that the
>> kernel won't have cleaned up the port by the time the test wants to
>> reopen it.
>>
>> If you change PortAllocator to close the socket immediately, you can get
>> rid of the need to call release() in a finally-block of the caller. You
>> could also change the type of the functional interface to be
>>
>>  int[] -> void
>>
>> since the PortAllocator doesn't hold onto any resources that need to be
>> cleaned up. It just calls the execute() method and passes an array of n
>> port numbers.
>>
>> It's probably unnecessary to have the socket close() call in a retry
>> loop. The socket is always closed immediately from the user process's
>> point of view, so isClosed() will always return true immediately after
>> the close() call returns. You can verify this easily by looking in the
>> ServerSocket.java source code. I believe the state that prevents the
>> port from being reused immediately is private to the kernel and cannot
>> be observed from a user process, at least not without attempting to
>> reopen the socket.
>>
>> Side note: one of the jcmd() overloads says that parameter 'c' (a
>> Consumer) may be null. It doesn't look like this is handled. If you
>> really want to support this, I'd assign a no-op lambda (x -> { }) to it
>> if it's null so that it can be called unconditionally. (Or just
>> disallow null.)
> 
> I've changed the PortAllocator to allocate an array of unique random
> ports instead of letting ServerSocket take care of it.
> 
> I ran the test 200x in a tight loop without a failure.
> 
> I hope this will resolve this test's intermittent failures due to port
> conflicts once and for all.
> 
> Update: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.03
> 
> Thanks,
> 
> -JB-
> 
> 
>>
>> s'marks
>>
>>
>> On 1/6/15 2:00 PM, Dmitry Samersoff wrote:
>>> Jaroslav,
>>>
>>> It might be better to just choose a random port number between 49152
>>> and 65535 and attempt to use it.
>>>
>>> -Dmitry
>>>
>>>
>>> On 2014-12-18 17:29, Jaroslav Bachorik wrote:
>>>> On 12/11/2014 03:43 PM, Dmitry Samersoff wrote:
>>>>> Jaroslav,
>>>>>
>>>>> You can set SO_LINGER to zero, in which case the socket will be closed
>>>>> immediately without waiting in TIME_WAIT.
>>>>>
>>>>> But there is no reliable way to predict whether you can take this
>>>>> port or not after you close it.
>>>>>
>>>>> So the only valid solution is to try to connect to a random port and,
>>>>> if this attempt fails, try another random port. Everything else will
>>>>> cause more or less frequent intermittent failures.
>>>>
>>>> Thanks for all the suggestions!
>>>>
>>>> http://cr.openjdk.java.net/~jbachorik/8066708/webrev.02
>>>>
>>>> I've enhanced the original patch with the retry lo

[ping] Re: RFR 8066708: JMXStartStopTest fails to connect to port 38112

2015-01-06 Thread Jaroslav Bachorik


On 18.12.2014 15:29, Jaroslav Bachorik wrote:

On 12/11/2014 03:43 PM, Dmitry Samersoff wrote:

Jaroslav,

You can set SO_LINGER to zero, in which case the socket will be closed
immediately without waiting in TIME_WAIT.

But there is no reliable way to predict whether you can take this port
or not after you close it.

So the only valid solution is to try to connect to a random port and, if
this attempt fails, try another random port. Everything else will cause
more or less frequent intermittent failures.


Thanks for all the suggestions!

http://cr.openjdk.java.net/~jbachorik/8066708/webrev.02

I've enhanced the original patch with retry logic that uses a different
random port if starting the JMX agent on the provided port fails with a
BindException.

I'm keeping the changes for properly closing the ports opened for the
test purposes and also the SO_REUSEADDR setting; it does not make sense
to reuse the ephemeral test ports anyway.

I've split the original "test_06" test case in order to keep it readable
even with the new retry logic, and also to make each test case test
just one scenario.

Cheers,

-JB-



-Dmitry


On 2014-12-11 17:06, Jaroslav Bachorik wrote:

On 12/09/2014 01:25 PM, Jaroslav Bachorik wrote:

On 12/09/2014 01:39 AM, Stuart Marks wrote:

On 12/8/14 12:35 PM, Jaroslav Bachorik wrote:

Please, review the following test change

Issue : https://bugs.openjdk.java.net/browse/JDK-8066708
Webrev: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.00

The test fails very intermittently when the RMI registry is trying to
bind to a port previously used in the test (via ServerSocket).

This seems to be caused by the sockets created via `new ServerSocket(0)`
being in reusable mode. The fix attempts to prevent this by explicitly
forbidding the reusable mode.


Hi Jaroslav,

I happened to see this fly by, and there are (I think) some similar
issues going on in the RMI tests.

But first I'll note that I don't think setReuseAddress() will have the
effect that you want. Typically it's set to true before binding a
socket, so that a subsequent bind operation will succeed even if the
address/port is already in use. ServerSockets created with new
ServerSocket(0) are already bound, and I'm not sure what calling
setReuseAddress(false) will do on such sockets. The spec says behavior
is undefined, but my bet is that it does nothing.
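The ordering Stuart describes, sketched in Java: create an unbound ServerSocket with the no-arg constructor, set SO_REUSEADDR, and only then bind. The ServerSocket javadoc states that the behaviour when SO_REUSEADDR is enabled or disabled after the socket is bound is not defined, which is why the option is set first. ReuseBeforeBind is a hypothetical name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical helper: configure SO_REUSEADDR before the socket is bound. */
final class ReuseBeforeBind {
    private ReuseBeforeBind() {}

    static ServerSocket bindWithReuse(int port, boolean reuse) throws IOException {
        ServerSocket ss = new ServerSocket(); // unbound, so the option is still meaningful
        try {
            ss.setReuseAddress(reuse);
            ss.bind(new InetSocketAddress(port)); // port 0 lets the kernel choose
            return ss;
        } catch (IOException e) {
            ss.close(); // don't leak the socket if bind fails
            throw e;
        }
    }
}
```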

I guess it doesn't hurt to try this out to see if it makes a
difference, but I don't have much confidence it will help.

The potential similarity to the RMI tests is exemplified by JDK-8049202
(sorry, this bug report isn't open) but briefly this tests the RMI
registry as follows:

1. Opens port 1099 using new ServerSocket(1099) [1099 is the default
 RMI registry port] in order to ensure that 1099 isn't in use by
 something else already;

2. If this succeeds, it immediately closes the ServerSocket.

3. Then it creates a new RMI registry on port 1099.

In principle, this should succeed, yet it fails around 10% of the time
on some systems. The error is "port already in use". My best theory is
that even though the socket has just been closed by a user program, the
kernel has to run the socket through some of the socket states such as
FIN_WAIT_1, FIN_WAIT_2, or CLOSING before the socket is actually closed
and is available for reuse. If a program -- even the same one --
attempts to open a socket on the same port before the socket has
reached its final state, it will get an "already in use" error.
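A hypothetical Java harness for testing this theory on a given machine: it repeats the open/close/reopen sequence from steps 1 to 3 on kernel-chosen ports and counts BindExceptions. It uses ServerSocket(0) rather than the fixed port 1099 so it can run next to other processes. Results will vary by OS and load, and a count of zero on one machine does not rule the race out elsewhere.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical experiment: how often does open/close/reopen of a port fail? */
final class ReopenExperiment {
    private ReopenExperiment() {}

    static int countReopenFailures(int iterations) throws IOException {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            int port;
            // Steps 1-2: open a socket to claim a port, then close it at once.
            try (ServerSocket probe = new ServerSocket(0)) {
                port = probe.getLocalPort();
            }
            // Step 3: reopen the same port, as the RMI registry would.
            try (ServerSocket reopened = new ServerSocket(port)) {
                // bound successfully; closed again by try-with-resources
            } catch (BindException e) {
                failures++; // "address already in use"
            }
        }
        return failures;
    }
}
```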

If this is true I don't believe that setting SO_REUSEADDR will work if
the socket is in one of these final states. (I remember reading this
somewhere but I'm not sure where at the moment. I can try to dig it up
if there is interest.)

I admit this is just a theory and I'm open to alternatives, and I'm also
open to hearing about ways to deal with this problem.

Could something similar be going on with this JMX test?


Hm, this is exactly what happened with this test :(

The problem is that the port is reported as available while it is still
occupied and RMI registry attempts to start using that port.

If setting SO_REUSEADDR does not work then the only solution would be to
retry the test case when this exception occurs.


Further investigation shows that the problem was rather the client
connecting to a socket being shut down.

It sounds like setting SO_REUSEADDR to false should prevent this failure.

From the ServerSocket javadoc:
"When a TCP connection is closed the connection may remain in a timeout
state for a period of time after the connection is closed (typically
known as the TIME_WAIT state or 2MSL wait state). For applications using
a well known socket address or port it may not be possible to bind a
socket to the required SocketAddress if there is a connection in the
timeout state involving the socket address or port."

It also turns out that the test does not close the server sockets
properly so there might be several sockets being opened or timed out
dangling around.

I've
