Yes I think a timeout is what will work the best. I just want to make sure we can release any associated server resources at the same time so that this doesn't result in a resource leak in the server.
Gregg Sent from my iPhone On May 2, 2011, at 11:27 AM, "Christopher Dolan" <[email protected]> wrote: > I strongly agree with Gregg's comments about timeouts in general, but I > think this might be a special case. In the Mux.start() data flow, the > client sends an 8-byte handshake and expects the server to respond with > a similar 8-byte handshake (or error) promptly. I'm seeing indefinite > stalls in real-world cases, so I need to do something and a timeout is > the only solution I can think of. > > Chris > > -----Original Message----- > From: Gregg Wonderly [mailto:[email protected]] > Sent: Saturday, April 30, 2011 10:03 AM > To: [email protected] > Cc: Mike McGrady > Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start() > > In the history of original Java threading model, and the NIO development > to use > "select/poll" from your own thread, rather than registering call back > methods > (via an interface) kept a lot of development from using a model where > threading > was managed internally by the package or by the JVM. As a result, we > have > structures like today where notifications are less common. In this code > though, > I think the structure is internal enough that it's not necessary to > really use > Future or some other mechanisms. > > Timeouts are always a "hard way" to manage "loss of functionality" > because you > really don't know when things are "not working", only that something is > taking > longer than your timeout accounted for. Timeouts can make it possible > for more > pending work to pile up on the other end that might slow the results > even more. > E.g. you wait 30seconds and retry while the actual work on the other > end is > taking 35 seconds to get to and thus the queue rate exceeds the dequeue > rate and > things start piling up. > > If you are going to use a timeout, we need to have some sort of > indication from > both ends perspective that the attempt has been aborted, as early as > possible. > For cases where I/O traffic is written/read, that usually means closing > the > socket. I'm not sure of the ramifications of doing that, since I > haven't looked > too hard at this code. > > Gregg Wonderly > > On 4/29/2011 4:38 PM, Mike McGrady wrote: >> Throwing my two cents in here just to state my opinion. This is an > effort that >> could pay dividends if it were done with a view toward the future - > absolutely >> no pun intended. I do not know the details of the code but if > futures could be >> useful here, they would be welcomed by myself. >> >> MG >> >> >> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote: >> >>> Thanks, Tom. >>> >>> I don't really understand your Future suggestion. Are you suggesting >>> changing the async handshake to a Future? If so, that sounds like a > very >>> involved change, touching a lot of code in Mux and its subclasses. >>> >>> setDown() changes the value of the muxDown boolean to true, so it's a >>> valid way out of the loop. >>> >>> Chris >>> >>> -----Original Message----- >>> From: Tom Hobbs [mailto:[email protected]] >>> Sent: Friday, April 29, 2011 2:27 PM >>> To: [email protected] >>> Subject: Re: client hang in > com.sun.jini.jeri.internal.mux.Mux.start() >>> >>> The proposed code looks fine to me. Two points leap out, more >>> discussion >>> points than anything else. >>> >>> For some reason I've recently developed an aversion to writing my on >>> timeout >>> logic. did you consider using something like a Future here or might >>> that be >>> serious overkill (it wouldn't surprise me if it was)? >>> >>> Also is Setdown intended to break out of the while loop? Because I >>> can't >>> see a way to escape it. (I don't have the rest of the code in front > of >>> me) >>> >>> Thanks for keep raising these issues - especially because you usually >>> supply >>> fixes for them! >>> >>> Tom >>> >>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]> >>> wrote: >>>> I've experienced occasional cases where clients get stuck in the >>>> following block of code in Mux.start. Has anyone experienced this >>>> problem? I have a proposed solution below. Has anyone thought about > a >>>> similar solution already? >>>> >>>> -- Current code -- >>>> 1 asyncSendClientConnectionHeader(); >>>> 2 synchronized (muxLock) { >>>> 3 while (!muxDown&& !clientConnectionReady) { >>>> 4 try { >>>> 5 muxLock.wait(); // REMIND: timeout? >>>> 6 } catch (InterruptedException e) { >>>> 7 ... >>>> 8 } >>>> 9 } >>>> 10 if (muxDown) { >>>> 11 IOException ioe = new IOException(muxDownMessage); >>>> 12 ioe.initCause(muxDownCause); >>>> 13 throw ioe; >>>> 14 } >>>> 15 } >>>> >>>> -- Explanation of the code -- >>>> This code handles the initial client-server handshake that starts a >>> JERI >>>> connection. In line 1, the client sends its 8-byte greeting to the >>>> server. Then in the loop on lines 3-9, it waits for the server's >>>> response. If the reader thread gets a satisfactory response from the >>>> server, it sets clientConnectionReady=true and calls >>>> muxLock.notifyAll(). In all other cases (aborted connection, >>> mismatched >>>> protocol version, etc) the reader invokes Mux.setDown() which sets >>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it > throws >>> if >>>> the handshake was a failure. >>>> >>>> In my scenario (which uses simple TCP sockets, nothing fancy), the >>>> invoker thread sits on line 5 indefinitely. My problem hard to >>>> reproduce, so I haven't found out what the server is doing in this >>> case. >>>> I hope to figure that out eventually, but presently I'm interested > in >>>> the "REMIND: timeout?" comment. >>>> >>>> -- Timeout solution -- >>>> It seems obvious to me that there should be a timeout here. There > are >>>> lots of imaginable cases where the client could get stuck here: >>>> server-side deadlock, abrupt server crash, logic error in client Mux >>>> code. You'd expect that the server would either respond with its >>> 8-byte >>>> handshake very quickly or never, so a modest timeout (like 15 or 30 >>>> seconds) should be good. If that timeout is triggered, I would > expect >>>> that the code above would call Mux.setDown() and throw an > IOException. >>>> That exception would either cause a retry or be thrown up to the >>> invoker >>>> as a RemoteException. >>>> >>>> -- Proposed code (untested) -- >>>> 3 long now = System.currentTimeMillis(); >>>> 4 long endTime = now + timeoutMillis; >>>> 5 while (!muxDown&& !clientConnectionReady) { >>>> 6 if (now>= endTime) { >>>> 7 setDown("timeout waiting for server to respond >>>> to handshake", null); >>>> 8 } else { >>>> 9 try { >>>> 10 muxLock.wait(endTime - now); >>>> 11 now = System.currentTimeMillis(); >>>> 12 } catch (InterruptedException e) { >>>> 13 setDown("interrupt waiting for connection >>>> header", e); >>>> 14 } >>>> 15 } >>>> 16 } >>>> >>>> This code assumes a configurable timeoutMillis parameter has been > set >>>> earlier. >>>> >>>> I can't think of any alternative solutions. Putting the timeout in > the >>>> Reader logic seems higher risk. There's incomplete code in JERI to >>>> implement a ping packet (see Mux.asyncSendPing, never used), but > that >>>> would only be relevant after the initial handshake and wouldn't help >>>> here. >>>> >>>> Thanks, >>>> Chris >>>> >> >> Michael McGrady >> Chief Systems Architect >> Topia Technology, Inc. >> Cel 1.253.720.3365 >> Work 1.253.572.9712 extension 2037 >> Skype ID: michael.mcgrady5 >> [email protected] >> >> >> >> >
