I've experienced occasional cases where clients get stuck in the
following block of code in Mux.start. Has anyone experienced this
problem? I have a proposed solution below. Has anyone thought about a
similar solution already?

-- Current code --
1           asyncSendClientConnectionHeader();
2           synchronized (muxLock) {
3               while (!muxDown && !clientConnectionReady) {
4                   try {
5                       muxLock.wait();         // REMIND: timeout?
6                   } catch (InterruptedException e) {
7                       ...
8                   }
9               }
10              if (muxDown) {
11                  IOException ioe = new IOException(muxDownMessage);
12                  ioe.initCause(muxDownCause);
13                  throw ioe;
14              }
15          }

-- Explanation of the code --
This code handles the initial client-server handshake that starts a JERI
connection. In line 1, the client sends its 8-byte greeting to the
server. Then, in the loop on lines 3-9, the invoker thread waits for the
server's response. If the reader thread gets a satisfactory response
from the server, it sets clientConnectionReady=true and calls
muxLock.notifyAll(). In all other cases (aborted connection, mismatched
protocol version, etc.) the reader invokes Mux.setDown(), which sets
muxDown=true and calls muxLock.notifyAll(). In lines 10-14, the invoker
thread throws an IOException if the handshake failed.
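
To make the signaling above concrete, here is a rough, self-contained
model of it. This is not the actual Mux source: the class name
HandshakeModel and the method markReady() are hypothetical, and the
interrupt handling is my own guess, since the real handler is elided
above. Only the field names and setDown() follow the code in this post.

```java
import java.io.IOException;

/** Toy model of the handshake state described above; not the real Mux. */
class HandshakeModel {
    private final Object muxLock = new Object();
    private boolean clientConnectionReady = false;
    private boolean muxDown = false;
    private String muxDownMessage;
    private Throwable muxDownCause;

    /** Reader thread: the server's connection header was acceptable. */
    void markReady() { // hypothetical name
        synchronized (muxLock) {
            clientConnectionReady = true;
            muxLock.notifyAll(); // wake the invoker blocked in awaitHandshake()
        }
    }

    /** Reader thread: aborted connection, protocol mismatch, and so on. */
    void setDown(String message, Throwable cause) {
        synchronized (muxLock) {
            if (muxDown) {
                return; // keep the first failure reason
            }
            muxDown = true;
            muxDownMessage = message;
            muxDownCause = cause;
            muxLock.notifyAll(); // wake the invoker so it can throw
        }
    }

    /** Invoker thread: same shape as the current code, with no timeout. */
    void awaitHandshake() throws IOException {
        synchronized (muxLock) {
            while (!muxDown && !clientConnectionReady) {
                try {
                    muxLock.wait(); // blocks forever if nobody ever notifies
                } catch (InterruptedException e) {
                    // Assumption: treat an interrupt as a failed handshake.
                    Thread.currentThread().interrupt();
                    setDown("interrupted waiting for connection header", e);
                }
            }
            if (muxDown) {
                IOException ioe = new IOException(muxDownMessage);
                ioe.initCause(muxDownCause);
                throw ioe;
            }
        }
    }
}
```

The point of the model is that awaitHandshake() can only return or
throw after one of the two reader-side methods has run. If the reader
never reaches either of them, the invoker never wakes up.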

In my scenario (which uses simple TCP sockets, nothing fancy), the
invoker thread sits on line 5 indefinitely. My problem is hard to
reproduce, so I haven't found out what the server is doing in this case.
I hope to figure that out eventually, but presently I'm interested in
the "REMIND: timeout?" comment.

-- Timeout solution --
It seems obvious to me that there should be a timeout here. There are
lots of imaginable cases where the client could get stuck here:
server-side deadlock, abrupt server crash, logic error in client Mux
code. You'd expect that the server would either respond with its 8-byte
handshake very quickly or never, so a modest timeout (like 15 or 30
seconds) should be good. If that timeout is triggered, I would expect
that the code above would call Mux.setDown() and throw an IOException.
That exception would either cause a retry or be thrown up to the invoker
as a RemoteException.

-- Proposed code (untested) --
3               long now = System.currentTimeMillis();
4               long endTime = now + timeoutMillis;
5               while (!muxDown && !clientConnectionReady) {
6                   if (now >= endTime) {
7                       setDown("timeout waiting for server to respond to handshake", null);
8                   } else {
9                       try {
10                          muxLock.wait(endTime - now);
11                          now = System.currentTimeMillis();
12                      } catch (InterruptedException e) {
13                          setDown("interrupt waiting for connection header", e);
14                      }
15                  }
16              }

This code assumes a configurable timeoutMillis parameter has been set
earlier.
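
For anyone who wants to experiment with the idea outside of JERI, here
is a standalone sketch of the same timed wait. It is a toy stand-in,
not a patch against Mux: the class TimedHandshake and the method
markReady() are hypothetical, and the timeout is passed as a method
argument only for illustration. It also swaps in System.nanoTime() for
the deadline, so that a wall-clock adjustment cannot stretch or shrink
the wait.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Toy stand-in for the handshake wait with a deadline; not the real Mux. */
class TimedHandshake {
    private final Object muxLock = new Object();
    private boolean clientConnectionReady = false;
    private boolean muxDown = false;
    private String muxDownMessage;
    private Throwable muxDownCause;

    /** Reader side: handshake succeeded (hypothetical name). */
    void markReady() {
        synchronized (muxLock) {
            clientConnectionReady = true;
            muxLock.notifyAll();
        }
    }

    /** Records the first failure and wakes any waiter. */
    void setDown(String message, Throwable cause) {
        synchronized (muxLock) {
            if (muxDown) {
                return;
            }
            muxDown = true;
            muxDownMessage = message;
            muxDownCause = cause;
            muxLock.notifyAll();
        }
    }

    /** Invoker side: wait for the handshake, but no longer than timeoutMillis. */
    void awaitHandshake(long timeoutMillis) throws IOException {
        synchronized (muxLock) {
            final long deadline =
                System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (!muxDown && !clientConnectionReady) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    setDown("timeout waiting for server to respond to handshake", null);
                    break;
                }
                try {
                    // Never pass 0: Object.wait(0) means "wait with no timeout".
                    long millis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                    muxLock.wait(millis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    setDown("interrupt waiting for connection header", e);
                }
            }
            if (muxDown) {
                IOException ioe = new IOException(muxDownMessage);
                ioe.initCause(muxDownCause);
                throw ioe;
            }
        }
    }
}
```

Calling new TimedHandshake().awaitHandshake(50) with no reader thread
throws the timeout IOException once the 50 ms have elapsed, whereas
calling markReady() first lets it return immediately. The loop
recomputes the remaining time on every pass, so a spurious wakeup
cannot shorten the overall deadline.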

I can't think of any alternative solutions. Putting the timeout in the
Reader logic seems higher risk. There's incomplete code in JERI to
implement a ping packet (see Mux.asyncSendPing, never used), but that
would only be relevant after the initial handshake and wouldn't help
here.

Thanks,
Chris
