Here's a test that consistently fails with the current Mux
implementation and passes with the patch I proposed at the beginning of
this thread. In my test I explicitly pretend that the server side of the
connection has blocked. In reality, all we need to agree on is that it's
possible for the server side to block.
The proposed patch needs a little more work to make the timeout
configurable. Once it is, the test can be sped up by setting that timeout
to something unrealistically short.
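For illustration only, here is a minimal sketch of what a configurable handshake timeout could look like, assuming the client simply waits on a monitor until a "handshake received" flag is set. The class HandshakeWaiter, its timeoutMillis parameter and its method names are hypothetical; this is not the proposed patch and not the actual Mux internals.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration (not the real Mux code): a start-style wait that
 * gives up after a configurable timeout instead of blocking forever.
 */
public class HandshakeWaiter {
    private final long timeoutMillis;      // hypothetical configurable timeout
    private boolean handshakeDone = false; // guarded by "this"

    public HandshakeWaiter(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by the reader thread when the server's handshake arrives. */
    public synchronized void handshakeReceived() {
        handshakeDone = true;
        notifyAll();
    }

    /** Blocks until the handshake arrives, or throws once the timeout elapses. */
    public synchronized void awaitHandshake() throws IOException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (!handshakeDone) { // loop also absorbs spurious wakeups
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new IOException("timed out waiting for server handshake");
                }
                // wait(0) would block forever, so never pass less than 1 ms
                wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IOException("interrupted waiting for server handshake");
        }
    }

    public static void main(String[] args) throws IOException {
        // A server that never answers: with a 100 ms timeout the wait fails fast.
        HandshakeWaiter silent = new HandshakeWaiter(100);
        try {
            silent.awaitHandshake();
            System.out.println("unexpected: handshake received");
        } catch (IOException e) {
            System.out.println("timed out as expected");
        }

        // A server that answers: another thread delivers the handshake.
        final HandshakeWaiter answered = new HandshakeWaiter(5000);
        new Thread(new Runnable() {
            public void run() {
                answered.handshakeReceived();
            }
        }).start();
        answered.awaitHandshake();
        System.out.println("handshake received");
    }
}
```

With a short value such as 100 ms, a test like the one below would no longer need to wait anywhere near its 20-second join limit.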
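import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

import org.testng.Assert;           // or org.junit.Assert
import org.testng.annotations.Test; // or org.junit.Test

import com.sun.jini.jeri.internal.mux.MuxClient;

public class MuxStartTimeout {
    @Test
    public void test() throws IOException, InterruptedException {
        // Fake streams: writes are discarded, and reads block forever,
        // simulating a server that never answers the Mux handshake.
        OutputStream os = new ByteArrayOutputStream();
        InputStream is = new InputStream() {
            @Override
            public synchronized int read() throws IOException {
                try {
                    // block indefinitely
                    while (true)
                        wait();
                } catch (InterruptedException e) {
                    return 0;
                }
            }
        };
        final AtomicBoolean finished = new AtomicBoolean(false);
        final AtomicBoolean succeeded = new AtomicBoolean(false);
        final AtomicBoolean failed = new AtomicBoolean(false);
        final MuxClient muxClient = new MuxClient(os, is);
        Thread t = null;
        try {
            t = new Thread(new Runnable() {
                public void run() {
                    try {
                        muxClient.start();
                        succeeded.set(true);
                    } catch (IOException e) {
                        failed.set(true);
                    }
                    finished.set(true);
                }
            });
            t.start();
            // start() should give up (throw IOException) within 20 seconds
            t.join(20000);
            Assert.assertTrue(finished.get());
            Assert.assertFalse(succeeded.get());
            Assert.assertTrue(failed.get());
        } finally {
            // Unblock the client thread even if an assertion above failed.
            if (t != null)
                t.interrupt();
            muxClient.shutdown("end of test");
        }
    }
}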
Chris
P.S. Amusingly, I actually compiled the test against
org.testng.annotations.Test and org.testng.Assert, but it should also work
as written against org.junit.Test and org.junit.Assert.
-----Original Message-----
From: Patricia Shanahan [mailto:[email protected]]
Sent: Wednesday, May 04, 2011 11:24 AM
To: [email protected]
Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()
This raises a more general question that has been troubling me: What
should we do about theoretical deadlocks and similar concurrency issues
that have not been demonstrated in practice?
On the one hand, I like to have a test to show that a change really
fixed something. On the other hand, a concurrency problem can contribute
to general flakiness without ever reaching the point of being reported
as a bug or having a test that demonstrates it.
Patricia
On 5/4/2011 8:47 AM, Christopher Dolan wrote:
...
> I haven't conclusively witnessed that specific deadlock, but I've had a
> closely related problem where another process coincidentally grabs port
> 4160 before Reggie gets it. This happens because Win2k, WinXP and Win2k3
> use 1024-5000 for their dynamic port range, contrary to IANA
> recommendations. I suspect the deadlock described above happens in real
> life, but I've never gotten detailed enough logs to prove it, just
> client stack traces showing the hang in Mux.start().
...
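A rough diagnostic for the port-collision case described in the quoted message: the sketch below, assuming plain java.net.ServerSocket binding is a good enough probe, checks whether port 4160 can currently be bound on the local host and whether it falls inside the 1024-5000 dynamic range cited above. The class PortCheck and its method names are hypothetical, not part of River.

```java
import java.io.IOException;
import java.net.ServerSocket;

/**
 * Hypothetical diagnostic (not part of River): checks whether a TCP port is
 * free on this host, e.g. whether another process already grabbed 4160.
 */
public class PortCheck {
    /** Returns true if a server socket can currently be bound to the port. */
    public static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true; // bound successfully; the socket is closed again here
        } catch (IOException e) {
            return false; // typically java.net.BindException: address in use
        }
    }

    /** True if the port lies in the 1024-5000 dynamic range cited in the thread. */
    public static boolean inLegacyWindowsDynamicRange(int port) {
        return port >= 1024 && port <= 5000;
    }

    public static void main(String[] args) {
        int port = 4160;
        System.out.println("port " + port + " free: " + isPortFree(port));
        System.out.println("inside 1024-5000: " + inLegacyWindowsDynamicRange(port));
    }
}
```

A probe like this only reports the state at the moment it runs, so it can confirm a collision after the fact but cannot reserve the port ahead of Reggie.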