[ 
https://issues.apache.org/jira/browse/AMQ-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017577#comment-13017577
 ] 

Scott Emmons commented on AMQ-2730:
-----------------------------------

Related to the ORIGINAL description of this bug, here's our experience in 
debugging EOFException errors with AMQ 5.4.2 on linux, using a failover 
connector:

We just spent some time debugging a similar EOFException situation and believe 
the cause is memory related on the ActiveMQ server, even though the server may 
not ever throw an OutOfMemory exception. In our case, this cause was not the 
least bit obvious as the server would appear to function fine, but at some 
point the inactivity timer would kick in and cause reconnects. When this would 
happen, new connections (from our Java clients) to the server would be 
immediately closed and the client would start hammering the server up until the 
max reconnect attempts is hit (even though the documentation says the default 
is unlimited, it's actually 1000 in TaskRunnerFactory - due to 
DefaultThreadPools having what we think is the right code commented out, unless 
that means something different). Because of synchronization in this code, a 
single thread will hit this loop, while all the other threads are stuck waiting 
on the mutex (which is fine, because nobody can do any work anyway). Once the 
1000 connection attempts are done, the client JVM becomes effectively useless 
for any JMS activity, as the thread in the reconnection loop will give up and 
end up waiting forever on an event that will never occur. All other threads are 
stuck waiting on this mutex.

We have seen many cases where restarting the client appears fine (it reconnects 
to ActiveMQ server just fine), giving a false sense that all is well. At some 
point, the client will once again get into this state. Because of this, we 

While there is little the client code can do in this situation, there are 
things that can be done to help with this situation. Here's my suggestions, if 
it helps:

1. In the TaskRunnerFactory, if we hit the max reconnect attempts, a high 
severity log entry should be written saying that AMQ client is effectively dead 
and giving up. This message should repeat at some reasonable interval so it 
doesn't get buried in logs.

2. Improve AMQ client logging when it is this reconnect loop. We found that we 
had to enable TRACE level logging to really get enough detail to see what was 
going on.

3. Improve AMQ server logging, if possible, to indicate why it is closing 
sockets from the client.

4. Validate if the current code in DefaultThreadPools is indeed correct (not 
the commented out code).

5. I don't know what the default memoryUsage limit is in the example 
activemq.xml, but make sure that it's a sane value considering the default max 
heap is 512MB.


> Transport failed: java.io.EOFException
> --------------------------------------
>
>                 Key: AMQ-2730
>                 URL: https://issues.apache.org/jira/browse/AMQ-2730
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.3.1
>         Environment: Red hat enterprise
>            Reporter: Dennis Klinkott
>             Fix For: 5.6.0
>
>         Attachments: TCP Dump.jpg
>
>
> We see many Exceptions of this type in the broker logs: 
> 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | 
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ 
> Transport: tcp:///10.231.233.55:50809
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at 
> org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at 
> org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
>         at 
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
>         at 
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
>         at java.lang.Thread.run(Thread.java:619)
> This exception occurs with a lot of clients that run on different machines.
> One case where we almost always see this exception is when shutting down a 
> bridge. We did a tcp dump on the broker and on the client machine: there a no 
> packets lost.
> LOG ON BRIDGE:
> 2010-05-12 14:10:02,634 INFO  org.apache.activemq.broker.BrokerService - 
> ActiveMQ Message Broker (solr-2073bridge, ID:solr-2073-50804-
> 1273665033782-0:0) is shutting down
> 2010-05-12 14:10:02,799 INFO  org.apache.activemq.network.jms.JmsConnector - 
> JMS Connector Connector:0 Stopped
> 2010-05-12 14:10:02,819 INFO  org.apache.activemq.broker.BrokerService - 
> ActiveMQ JMS Message Broker (solr-2073bridge, ID:solr-2073-50
> 804-1273665033782-0:0) stopped
> LOG ON BROKER:
> 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | 
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ 
> Transport: tcp:///10.231.233.55:50809
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at 
> org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at 
> org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
>         at 
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
>         at 
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-05-12 14:10:03,492 | DEBUG | Stopping connection: /10.231.233.55:50809 | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Task
> 2010-05-12 14:10:03,492 | DEBUG | Stopping transport 
> tcp:///10.231.233.55:50809 | org.apache.activemq.transport.tcp.TcpTransport | 
> ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | Stopped transport: /10.231.233.55:50809 | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | Cleaning up connection resources: 
> /10.231.233.55:50809 | org.apache.activemq.broker.TransportConnection | 
> ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | remove connection id: 
> ID:solr-2073-50804-1273665033782-2:0 | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | jms1 removing consumer: 
> ID:solr-2073-50804-1273665033782-2:0:225:1 for destination: 
> temp-queue://ID:solr-2073-50804-1273665033782-2:0:105 | 
> org.apache.activemq.broker.region.AbstractRegion | ActiveMQ Task
> 2010-05-12 14:10:03,493 | DEBUG | remove sub: QueueSubscription: 
> consumer=ID:solr-2073-50804-1273665033782-2:0:225:1, destinations=1, 
> dispatched=0, delivered=0, pending=0, lastDeliveredSeqId: 0, dequeues: 0, 
> dispatched: 0, inflight: 0 | org.apache.activemq.broker.region.Queue | 
> ActiveMQ Task

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to