[ https://issues.apache.org/jira/browse/AMQ-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017577#comment-13017577 ]
Scott Emmons commented on AMQ-2730: ----------------------------------- Related to the ORIGINAL description of this bug, here's our experience in debugging EOFException errors with AMQ 5.4.2 on linux, using a failover connector: We just spent some time debugging a similar EOFException situation and believe the cause is memory related on the ActiveMQ server, even though the server may not ever throw an OutOfMemory exception. In our case, this cause was not the least bit obvious as the server would appear to function fine, but at some point the inactivity timer would kick in and cause reconnects. When this would happen, new connections (from our Java clients) to the server would be immediately closed and the client would start hammering the server up until the max reconnect attempts is hit (even though the documentation says the default is unlimited, it's actually 1000 in TaskRunnerFactory - due to DefaultThreadPools having what we think is the right code commented out, unless that means something different). Because of synchronization in this code, a single thread will hit this loop, while all the other threads are stuck waiting on the mutex (which is fine, because nobody can do any work anyway). Once the 1000 connection attempts are done, the client JVM becomes effectively useless for any JMS activity, as the thread in the reconnection loop will give up and end up waiting forever on an event that will never occur. All other threads are stuck waiting on this mutex. We have seen many cases where restarting the client appears fine (it reconnects to ActiveMQ server just fine), giving a false sense that all is well. At some point, the client will once again get into this state. Because of this, we While there is little the client code can do in this situation, there are things that can be done to help with this situation. Here's my suggestions, if it helps: 1. In the TaskRunnerFactory, if we hit the max reconnect attempts, a high severity log entry should be written saying that AMQ client is effectively dead and giving up. This message should repeat at some reasonable interval so it doesn't get buried in logs. 2. Improve AMQ client logging when it is this reconnect loop. We found that we had to enable TRACE level logging to really get enough detail to see what was going on. 3. Improve AMQ server logging, if possible, to indicate why it is closing sockets from the client. 4. Validate if the current code in DefaultThreadPools is indeed correct (not the commented out code). 5. I don't know what the default memoryUsage limit is in the example activemq.xml, but make sure that it's a sane value considering the default max heap is 512MB. > Transport failed: java.io.EOFException > -------------------------------------- > > Key: AMQ-2730 > URL: https://issues.apache.org/jira/browse/AMQ-2730 > Project: ActiveMQ > Issue Type: Bug > Affects Versions: 5.3.1 > Environment: Red hat enterprise > Reporter: Dennis Klinkott > Fix For: 5.6.0 > > Attachments: TCP Dump.jpg > > > We see many Exceptions of this type in the broker logs: > 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | > org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ > Transport: tcp:///10.231.233.55:50809 > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at > org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269) > at > org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211) > at > org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203) > at > org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186) > at java.lang.Thread.run(Thread.java:619) > This exception occurs with a lot of clients that run on different machines. > One case where we almost always see this exception is when shutting down a > bridge. We did a tcp dump on the broker and on the client machine: there a no > packets lost. > LOG ON BRIDGE: > 2010-05-12 14:10:02,634 INFO org.apache.activemq.broker.BrokerService - > ActiveMQ Message Broker (solr-2073bridge, ID:solr-2073-50804- > 1273665033782-0:0) is shutting down > 2010-05-12 14:10:02,799 INFO org.apache.activemq.network.jms.JmsConnector - > JMS Connector Connector:0 Stopped > 2010-05-12 14:10:02,819 INFO org.apache.activemq.broker.BrokerService - > ActiveMQ JMS Message Broker (solr-2073bridge, ID:solr-2073-50 > 804-1273665033782-0:0) stopped > LOG ON BROKER: > 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | > org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ > Transport: tcp:///10.231.233.55:50809 > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at > org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269) > at > org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211) > at > org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203) > at > org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186) > at java.lang.Thread.run(Thread.java:619) > 2010-05-12 14:10:03,492 | DEBUG | Stopping connection: /10.231.233.55:50809 | > org.apache.activemq.broker.TransportConnection | ActiveMQ Task > 2010-05-12 14:10:03,492 | DEBUG | Stopping transport > tcp:///10.231.233.55:50809 | org.apache.activemq.transport.tcp.TcpTransport | > ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | Stopped transport: /10.231.233.55:50809 | > org.apache.activemq.broker.TransportConnection | ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | Cleaning up connection resources: > /10.231.233.55:50809 | org.apache.activemq.broker.TransportConnection | > ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | remove connection id: > ID:solr-2073-50804-1273665033782-2:0 | > org.apache.activemq.broker.TransportConnection | ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | jms1 removing consumer: > ID:solr-2073-50804-1273665033782-2:0:225:1 for destination: > temp-queue://ID:solr-2073-50804-1273665033782-2:0:105 | > org.apache.activemq.broker.region.AbstractRegion | ActiveMQ Task > 2010-05-12 14:10:03,493 | DEBUG | remove sub: QueueSubscription: > consumer=ID:solr-2073-50804-1273665033782-2:0:225:1, destinations=1, > dispatched=0, delivered=0, pending=0, lastDeliveredSeqId: 0, dequeues: 0, > dispatched: 0, inflight: 0 | org.apache.activemq.broker.region.Queue | > ActiveMQ Task -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira