Dan raises a good point, which is probably of higher importance than the one I did.
My only comment about the use of a timeout was questioning whether we could start to use some Java 5 timeout stuff. You guys have been looking at that bit of the code more than me, so if the answer is "no" then I can accept that. On 3 May 2011 08:29, "Dan Creswell" <[email protected]> wrote: > Mmm, agreed though I think to do it requires that Chris get to the > root cause that is producing the need for a client timeout. > > On 2 May 2011 23:17, Gregg Wonderly <[email protected]> wrote: >> Yes I think a timeout is what will work the best. I just want to make sure we can release any associated server resources at the same time so that this doesn't result in a resource leak in the server. >> >> Gregg >> >> Sent from my iPhone >> >> On May 2, 2011, at 11:27 AM, "Christopher Dolan" < [email protected]> wrote: >> >>> I strongly agree with Gregg's comments about timeouts in general, but I >>> think this might be a special case. In the Mux.start() data flow, the >>> client sends an 8-byte handshake and expects the server to respond with >>> a similar 8-byte handshake (or error) promptly. I'm seeing indefinite >>> stalls in real-world cases, so I need to do something and a timeout is >>> the only solution I can think of. >>> >>> Chris >>> >>> -----Original Message----- >>> From: Gregg Wonderly [mailto:[email protected]] >>> Sent: Saturday, April 30, 2011 10:03 AM >>> To: [email protected] >>> Cc: Mike McGrady >>> Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start() >>> >>> In the history of original Java threading model, and the NIO development >>> to use >>> "select/poll" from your own thread, rather than registering call back >>> methods >>> (via an interface) kept a lot of development from using a model where >>> threading >>> was managed internally by the package or by the JVM. As a result, we >>> have >>> structures like today where notifications are less common. In this code >>> though, >>> I think the structure is internal enough that it's not necessary to >>> really use >>> Future or some other mechanisms. >>> >>> Timeouts are always a "hard way" to manage "loss of functionality" >>> because you >>> really don't know when things are "not working", only that something is >>> taking >>> longer than your timeout accounted for. Timeouts can make it possible >>> for more >>> pending work to pile up on the other end that might slow the results >>> even more. >>> E.g. you wait 30seconds and retry while the actual work on the other >>> end is >>> taking 35 seconds to get to and thus the queue rate exceeds the dequeue >>> rate and >>> things start piling up. >>> >>> If you are going to use a timeout, we need to have some sort of >>> indication from >>> both ends perspective that the attempt has been aborted, as early as >>> possible. >>> For cases where I/O traffic is written/read, that usually means closing >>> the >>> socket. I'm not sure of the ramifications of doing that, since I >>> haven't looked >>> too hard at this code. >>> >>> Gregg Wonderly >>> >>> On 4/29/2011 4:38 PM, Mike McGrady wrote: >>>> Throwing my two cents in here just to state my opinion. This is an >>> effort that >>>> could pay dividends if it were done with a view toward the future - >>> absolutely >>>> no pun intended. I do not know the details of the code but if >>> futures could be >>>> useful here, they would be welcomed by myself. >>>> >>>> MG >>>> >>>> >>>> On Apr 29, 2011, at 1:33 PM, Christopher Dolan wrote: >>>> >>>>> Thanks, Tom. >>>>> >>>>> I don't really understand your Future suggestion. Are you suggesting >>>>> changing the async handshake to a Future? If so, that sounds like a >>> very >>>>> involved change, touching a lot of code in Mux and its subclasses. >>>>> >>>>> setDown() changes the value of the muxDown boolean to true, so it's a >>>>> valid way out of the loop. >>>>> >>>>> Chris >>>>> >>>>> -----Original Message----- >>>>> From: Tom Hobbs [mailto:[email protected]] >>>>> Sent: Friday, April 29, 2011 2:27 PM >>>>> To: [email protected] >>>>> Subject: Re: client hang in >>> com.sun.jini.jeri.internal.mux.Mux.start() >>>>> >>>>> The proposed code looks fine to me. Two points leap out, more >>>>> discussion >>>>> points than anything else. >>>>> >>>>> For some reason I've recently developed an aversion to writing my on >>>>> timeout >>>>> logic. did you consider using something like a Future here or might >>>>> that be >>>>> serious overkill (it wouldn't surprise me if it was)? >>>>> >>>>> Also is Setdown intended to break out of the while loop? Because I >>>>> can't >>>>> see a way to escape it. (I don't have the rest of the code in front >>> of >>>>> me) >>>>> >>>>> Thanks for keep raising these issues - especially because you usually >>>>> supply >>>>> fixes for them! >>>>> >>>>> Tom >>>>> >>>>> On 29 Apr 2011 19:41, "Christopher Dolan"<[email protected]> >>>>> wrote: >>>>>> I've experienced occasional cases where clients get stuck in the >>>>>> following block of code in Mux.start. Has anyone experienced this >>>>>> problem? I have a proposed solution below. Has anyone thought about >>> a >>>>>> similar solution already? >>>>>> >>>>>> -- Current code -- >>>>>> 1 asyncSendClientConnectionHeader(); >>>>>> 2 synchronized (muxLock) { >>>>>> 3 while (!muxDown&& !clientConnectionReady) { >>>>>> 4 try { >>>>>> 5 muxLock.wait(); // REMIND: timeout? >>>>>> 6 } catch (InterruptedException e) { >>>>>> 7 ... >>>>>> 8 } >>>>>> 9 } >>>>>> 10 if (muxDown) { >>>>>> 11 IOException ioe = new IOException(muxDownMessage); >>>>>> 12 ioe.initCause(muxDownCause); >>>>>> 13 throw ioe; >>>>>> 14 } >>>>>> 15 } >>>>>> >>>>>> -- Explanation of the code -- >>>>>> This code handles the initial client-server handshake that starts a >>>>> JERI >>>>>> connection. In line 1, the client sends its 8-byte greeting to the >>>>>> server. Then in the loop on lines 3-9, it waits for the server's >>>>>> response. If the reader thread gets a satisfactory response from the >>>>>> server, it sets clientConnectionReady=true and calls >>>>>> muxLock.notifyAll(). In all other cases (aborted connection, >>>>> mismatched >>>>>> protocol version, etc) the reader invokes Mux.setDown() which sets >>>>>> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it >>> throws >>>>> if >>>>>> the handshake was a failure. >>>>>> >>>>>> In my scenario (which uses simple TCP sockets, nothing fancy), the >>>>>> invoker thread sits on line 5 indefinitely. My problem hard to >>>>>> reproduce, so I haven't found out what the server is doing in this >>>>> case. >>>>>> I hope to figure that out eventually, but presently I'm interested >>> in >>>>>> the "REMIND: timeout?" comment. >>>>>> >>>>>> -- Timeout solution -- >>>>>> It seems obvious to me that there should be a timeout here. There >>> are >>>>>> lots of imaginable cases where the client could get stuck here: >>>>>> server-side deadlock, abrupt server crash, logic error in client Mux >>>>>> code. You'd expect that the server would either respond with its >>>>> 8-byte >>>>>> handshake very quickly or never, so a modest timeout (like 15 or 30 >>>>>> seconds) should be good. If that timeout is triggered, I would >>> expect >>>>>> that the code above would call Mux.setDown() and throw an >>> IOException. >>>>>> That exception would either cause a retry or be thrown up to the >>>>> invoker >>>>>> as a RemoteException. >>>>>> >>>>>> -- Proposed code (untested) -- >>>>>> 3 long now = System.currentTimeMillis(); >>>>>> 4 long endTime = now + timeoutMillis; >>>>>> 5 while (!muxDown&& !clientConnectionReady) { >>>>>> 6 if (now>= endTime) { >>>>>> 7 setDown("timeout waiting for server to respond >>>>>> to handshake", null); >>>>>> 8 } else { >>>>>> 9 try { >>>>>> 10 muxLock.wait(endTime - now); >>>>>> 11 now = System.currentTimeMillis(); >>>>>> 12 } catch (InterruptedException e) { >>>>>> 13 setDown("interrupt waiting for connection >>>>>> header", e); >>>>>> 14 } >>>>>> 15 } >>>>>> 16 } >>>>>> >>>>>> This code assumes a configurable timeoutMillis parameter has been >>> set >>>>>> earlier. >>>>>> >>>>>> I can't think of any alternative solutions. Putting the timeout in >>> the >>>>>> Reader logic seems higher risk. There's incomplete code in JERI to >>>>>> implement a ping packet (see Mux.asyncSendPing, never used), but >>> that >>>>>> would only be relevant after the initial handshake and wouldn't help >>>>>> here. >>>>>> >>>>>> Thanks, >>>>>> Chris >>>>>> >>>> >>>> Michael McGrady >>>> Chief Systems Architect >>>> Topia Technology, Inc. >>>> Cel 1.253.720.3365 >>>> Work 1.253.572.9712 extension 2037 >>>> Skype ID: michael.mcgrady5 >>>> [email protected] >>>> >>>> >>>> >>>> >>> >>
