Mmm, agreed though I think to do it requires that Chris get to the root cause that is producing the need for a client timeout.
On 2 May 2011 23:17, Gregg Wonderly <[email protected]> wrote: > Yes I think a timeout is what will work the best. I just want to make sure > we can release any associated server resources at the same time so that this > doesn't result in a resource leak in the server. > > Gregg > > Sent from my iPhone > > On May 2, 2011, at 11:27 AM, "Christopher Dolan" <[email protected]> > wrote: > >> I strongly agree with Gregg's comments about timeouts in general, but I >> think this might be a special case. In the Mux.start() data flow, the >> client sends an 8-byte handshake and expects the server to respond with >> a similar 8-byte handshake (or error) promptly. I'm seeing indefinite >> stalls in real-world cases, so I need to do something and a timeout is >> the only solution I can think of. >> >> Chris >> >> -----Original Message----- >> From: Gregg Wonderly [mailto:[email protected]] >> Sent: Saturday, April 30, 2011 10:03 AM >> To: [email protected] >> Cc: Mike McGrady >> Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start() >> >> In the history of original Java threading model, and the NIO development >> to use >> "select/poll" from your own thread, rather than registering call back >> methods >> (via an interface) kept a lot of development from using a model where >> threading >> was managed internally by the package or by the JVM. As a result, we >> have >> structures like today where notifications are less common. In this code >> though, >> I think the structure is internal enough that it's not necessary to >> really use >> Future or some other mechanisms. >> >> Timeouts are always a "hard way" to manage "loss of functionality" >> because you >> really don't know when things are "not working", only that something is >> taking >> longer than your timeout accounted for. Timeouts can make it possible >> for more >> pending work to pile up on the other end that might slow the results >> even more. >> E.g. you wait 30seconds and retry while the actual work on the other >> end is >> taking 35 seconds to get to and thus the queue rate exceeds the dequeue >> rate and >> things start piling up. >> >> If you are going to use a timeout, we need to have some sort of >> indication from >> both ends perspective that the attempt has been aborted, as early as >> possible. >> For cases where I/O traffic is written/read, that usually means closing >> the >> socket. I'm not sure of the ramifications of doing that, since I >> haven't looked >> too hard at this code. >> >> Gregg Wonderly >> >> On 4/29/2011 4:38 PM, Mike McGrady wrote: >>> Throwing my two cents in here just to state my opinion. This is an >> effort that >>> could pay dividends if it were done with a view toward the future - >> absolutely >>> no pun intended. I do not know the details of the code but if >> futures could be >>> useful here, they would be welcomed by myself. >>> >>> MG >>> >>> >>> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote: >>> >>>> Thanks, Tom. >>>> >>>> I don't really understand your Future suggestion. Are you suggesting >>>> changing the async handshake to a Future? If so, that sounds like a >> very >>>> involved change, touching a lot of code in Mux and its subclasses. >>>> >>>> setDown() changes the value of the muxDown boolean to true, so it's a >>>> valid way out of the loop. >>>> >>>> Chris >>>> >>>> -----Original Message----- >>>> From: Tom Hobbs [mailto:[email protected]] >>>> Sent: Friday, April 29, 2011 2:27 PM >>>> To: [email protected] >>>> Subject: Re: client hang in >> com.sun.jini.jeri.internal.mux.Mux.start() >>>> >>>> The proposed code looks fine to me. Two points leap out, more >>>> discussion >>>> points than anything else. >>>> >>>> For some reason I've recently developed an aversion to writing my on >>>> timeout >>>> logic. did you consider using something like a Future here or might >>>> that be >>>> serious overkill (it wouldn't surprise me if it was)? >>>> >>>> Also is Setdown intended to break out of the while loop? Because I >>>> can't >>>> see a way to escape it. (I don't have the rest of the code in front >> of >>>> me) >>>> >>>> Thanks for keep raising these issues - especially because you usually >>>> supply >>>> fixes for them! >>>> >>>> Tom >>>> >>>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]> >>>> wrote: >>>>> I've experienced occasional cases where clients get stuck in the >>>>> following block of code in Mux.start. Has anyone experienced this >>>>> problem? I have a proposed solution below. Has anyone thought about >> a >>>>> similar solution already? >>>>> >>>>> -- Current code -- >>>>> 1 asyncSendClientConnectionHeader(); >>>>> 2 synchronized (muxLock) { >>>>> 3 while (!muxDown&& !clientConnectionReady) { >>>>> 4 try { >>>>> 5 muxLock.wait(); // REMIND: timeout? >>>>> 6 } catch (InterruptedException e) { >>>>> 7 ... >>>>> 8 } >>>>> 9 } >>>>> 10 if (muxDown) { >>>>> 11 IOException ioe = new IOException(muxDownMessage); >>>>> 12 ioe.initCause(muxDownCause); >>>>> 13 throw ioe; >>>>> 14 } >>>>> 15 } >>>>> >>>>> -- Explanation of the code -- >>>>> This code handles the initial client-server handshake that starts a >>>> JERI >>>>> connection. In line 1, the client sends its 8-byte greeting to the >>>>> server. Then in the loop on lines 3-9, it waits for the server's >>>>> response. If the reader thread gets a satisfactory response from the >>>>> server, it sets clientConnectionReady=true and calls >>>>> muxLock.notifyAll(). In all other cases (aborted connection, >>>> mismatched >>>>> protocol version, etc) the reader invokes Mux.setDown() which sets >>>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it >> throws >>>> if >>>>> the handshake was a failure. >>>>> >>>>> In my scenario (which uses simple TCP sockets, nothing fancy), the >>>>> invoker thread sits on line 5 indefinitely. My problem hard to >>>>> reproduce, so I haven't found out what the server is doing in this >>>> case. >>>>> I hope to figure that out eventually, but presently I'm interested >> in >>>>> the "REMIND: timeout?" comment. >>>>> >>>>> -- Timeout solution -- >>>>> It seems obvious to me that there should be a timeout here. There >> are >>>>> lots of imaginable cases where the client could get stuck here: >>>>> server-side deadlock, abrupt server crash, logic error in client Mux >>>>> code. You'd expect that the server would either respond with its >>>> 8-byte >>>>> handshake very quickly or never, so a modest timeout (like 15 or 30 >>>>> seconds) should be good. If that timeout is triggered, I would >> expect >>>>> that the code above would call Mux.setDown() and throw an >> IOException. >>>>> That exception would either cause a retry or be thrown up to the >>>> invoker >>>>> as a RemoteException. >>>>> >>>>> -- Proposed code (untested) -- >>>>> 3 long now = System.currentTimeMillis(); >>>>> 4 long endTime = now + timeoutMillis; >>>>> 5 while (!muxDown&& !clientConnectionReady) { >>>>> 6 if (now>= endTime) { >>>>> 7 setDown("timeout waiting for server to respond >>>>> to handshake", null); >>>>> 8 } else { >>>>> 9 try { >>>>> 10 muxLock.wait(endTime - now); >>>>> 11 now = System.currentTimeMillis(); >>>>> 12 } catch (InterruptedException e) { >>>>> 13 setDown("interrupt waiting for connection >>>>> header", e); >>>>> 14 } >>>>> 15 } >>>>> 16 } >>>>> >>>>> This code assumes a configurable timeoutMillis parameter has been >> set >>>>> earlier. >>>>> >>>>> I can't think of any alternative solutions. Putting the timeout in >> the >>>>> Reader logic seems higher risk. There's incomplete code in JERI to >>>>> implement a ping packet (see Mux.asyncSendPing, never used), but >> that >>>>> would only be relevant after the initial handshake and wouldn't help >>>>> here. >>>>> >>>>> Thanks, >>>>> Chris >>>>> >>> >>> Michael McGrady >>> Chief Systems Architect >>> Topia Technology, Inc. >>> Cel 1.253.720.3365 >>> Work 1.253.572.9712 extension 2037 >>> Skype ID: michael.mcgrady5 >>> [email protected] >>> >>> >>> >>> >> >
