The problem is that we don't know whether there is packet loss or whether the delays are due to a clogged-up network.
The goal is to keep operating in the presence of random network errors. Although I have not tried it, AMQ apparently has a way to retry automatically, with a configurable delay and backoff between retries. I would suggest trying that instead of building recovery into UIMA-AS. My point is: if this problem can be solved by AMQ, why do it in UIMA-AS? That said, one benefit of doing the recovery in UIMA-AS is that we can learn when the delays occur and approximately how long they last. I think using AMQ for recovery will hide the delays, since I suspect the recovery is silent (this may need to be tested).

On Fri, Feb 7, 2014 at 11:52 AM, Marshall Schor (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/UIMA-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894717#comment-13894717 ]
>
> Marshall Schor commented on UIMA-3605:
> --------------------------------------
>
> hmmm, is the design point for uima-as to operate OK in environments where tcp/ip packets are lost (I thought tcp/ip has recovery for that: http://en.wikipedia.org/wiki/Packet_loss )? Or is it just that the network is slow? It seems that if the network is so slow that a 10 second timeout pops, then maybe the network is too poor to support a UIMA-AS scaleout?
>
> Or, is the goal to keep operating in the presence of random, occasional network hangs?
>
> Do we have any profile of what's going on in networks when this kind of problem happens - is it temporary?
>
> The answer could guide what kind of solution is appropriate. For instance, if it is determined that for some reason the network is usually great, but occasionally delays packets for 1 minute, then the recovery might want to do something like wait 1 minute before retrying.
> Or, if the desire is to operate in the presence of occasional network hangs, perhaps some design which measures the duration of these hangs on an ongoing basis would be useful: if it found they were 20 seconds, then the delay before a higher-level retry could be set at 20 + delta seconds.
>
> If it is thought this is too much for UIMA-AS to handle, and it should be handled by fixing the networks, then perhaps the current design is OK :-)
>
> > UIMA-AS gets "Wire format negotiation timeout" on connection.open()
> > -------------------------------------------------------------------
> >
> > Key: UIMA-3605
> > URL: https://issues.apache.org/jira/browse/UIMA-3605
> > Project: UIMA
> > Issue Type: Bug
> > Components: Async Scaleout
> > Affects Versions: 2.4.2AS
> > Reporter: Jerry Cwiklik
> > Assignee: Jerry Cwiklik
> > Fix For: 2.5.0AS
> >
> > It appears that under heavy network load UIMA-AS is getting a "Wire format negotiation timeout" exception when opening a connection to a broker.
> >
> > The client side of AMQ sends a frame containing its parameters to the server (broker). The broker reconciles the client's parameters against its own and sends a reply back to the client. The reply apparently never reaches the client, causing the timer to pop (default = 10 secs), and an exception is thrown.
> >
> > An attempt to extend the client timeout via wireFormat.maxInactivityDurationInitalDelay=60000 doesn't fix the problem. One possible explanation is that either the client's wire format frame is not reaching the server or the server's reply doesn't reach the client. This may be due to a lost TCP packet.
> >
> > Since the low-level AMQ wire negotiation doesn't offer retry, UIMA-AS may need to implement a higher-level retry around the connection open() logic. It should catch the generic JMSException and check the associated description for a "wire format ..." problem. In such a case, the connection should be closed and reopened.
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
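For reference, the AMQ auto-retry mentioned at the top of this reply appears to correspond to ActiveMQ's failover transport, whose reconnect delay and backoff are set as options on the broker URI. The sketch below only assembles such a URI. The `FailoverUri` helper, the broker address, and every value are assumptions for illustration, and whether failover actually retries a wire format negotiation timeout during the initial open is untested.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that assembles an ActiveMQ failover URI.
// Option names are failover transport options; the values are
// illustrative assumptions, not tested recommendations.
final class FailoverUri {
    // Wraps a single broker URI in failover:(...) and appends the options as a query string.
    static String build(String brokerUri, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : options.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        String base = "failover:(" + brokerUri + ")";
        return options.isEmpty() ? base : base + "?" + query;
    }

    // Delay and backoff settings for reconnect attempts (insertion order is kept).
    static Map<String, String> backoffOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("initialReconnectDelay", "1000");      // first retry after 1 s
        opts.put("maxReconnectDelay", "60000");         // never wait more than 60 s
        opts.put("useExponentialBackOff", "true");      // grow the delay between attempts
        opts.put("backOffMultiplier", "2");             // doubling it each time
        opts.put("startupMaxReconnectAttempts", "10");  // attempts allowed for the initial connect
        return opts;
    }
}
```

`FailoverUri.build("tcp://localhost:61616", FailoverUri.backoffOptions())` yields a string that would be passed as the broker URL, for example to `ActiveMQConnectionFactory`. Because these retries happen inside AMQ, UIMA-AS would not necessarily notice them, which is the "silent recovery" concern raised above.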
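Marshall's idea of measuring hang durations on an ongoing basis and setting the retry delay to "20 + delta seconds" could be sketched as below. `HangTracker`, the window size, and the fallback delay are hypothetical; how a hang is measured (for instance, the time until a stalled open eventually succeeds) is left open here.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: remembers the most recent observed hang durations and
// proposes a retry delay of (longest recent hang + delta).
final class HangTracker {
    private final Deque<Duration> recent = new ArrayDeque<>();
    private final int window;         // how many observations to keep
    private final Duration delta;     // safety margin added to the longest hang
    private final Duration fallback;  // delay used before anything is observed

    HangTracker(int window, Duration delta, Duration fallback) {
        this.window = window;
        this.delta = delta;
        this.fallback = fallback;
    }

    // Record one observed hang; the oldest entry is dropped once the window is full.
    synchronized void record(Duration hang) {
        recent.addLast(hang);
        if (recent.size() > window) {
            recent.removeFirst();
        }
    }

    // Delay before a higher-level retry: longest recent hang plus delta, or the fallback.
    synchronized Duration retryDelay() {
        return recent.stream()
                .max(Duration::compareTo)
                .map(longest -> longest.plus(delta))
                .orElse(fallback);
    }
}
```

With a 5 second delta, a single observed 20 second hang gives a 25 second retry delay. Keeping only a window of recent observations lets the delay shrink again once the long hangs stop occurring.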
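A minimal sketch of the higher-level retry proposed in UIMA-3605 (catch the exception, check its description for the wire format problem, close, and reopen). `WireFormatRetry`, `openWithRetry`, and the close hook are hypothetical names. Real code would catch `javax.jms.JMSException` rather than `Exception`, and the match assumes the description contains "Wire format negotiation timeout", as in the issue title; walking the cause chain is a further assumption, in case the text sits on a nested exception.

```java
import java.util.Locale;
import java.util.concurrent.Callable;

// Hypothetical sketch of the UIMA-AS-level retry described in UIMA-3605.
// Real code would catch javax.jms.JMSException; Exception is used here only
// to keep the example free of JMS dependencies.
final class WireFormatRetry {
    // True if the exception, or any of its causes, describes a wire format negotiation timeout.
    static boolean isWireFormatTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null && msg.toLowerCase(Locale.ROOT).contains("wire format negotiation timeout")) {
                return true;
            }
        }
        return false;
    }

    // Calls open; on a negotiation timeout, runs the close hook, waits, and tries again.
    // Any other failure, or a timeout on the last attempt, is rethrown to the caller.
    static <T> T openWithRetry(Callable<T> open, Runnable closeQuietly,
                               int maxAttempts, long delayMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return open.call();
            } catch (Exception e) {
                if (!isWireFormatTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
                closeQuietly.run();         // discard the half-open connection
                Thread.sleep(delayMillis);  // back off before reopening
            }
        }
    }
}
```

Only the negotiation timeout is retried; any other failure is rethrown immediately, so genuine configuration errors are not masked. Doing the retry at this level also gives UIMA-AS a place to log each occurrence, which addresses the wish to learn when the delays occur.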
