[ https://issues.apache.org/activemq/browse/AMQ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=47059#action_47059 ]
Filip Hanik commented on AMQ-1993:
----------------------------------

That's correct Gary, thank you!

> Systems hang due to inability to timeout socket write operation
> ---------------------------------------------------------------
>
>                 Key: AMQ-1993
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1993
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0, 5.2.0
>         Environment: Unix (Solaris and Linux tested)
>            Reporter: Filip Hanik
>            Assignee: Gary Tully
>            Priority: Critical
>             Fix For: 5.3.0
>
>         Attachments: patch-1-threadname-filter.patch, patch-3-tcp-writetimeout.patch
>
>
> The blocking Java Socket API doesn't have a timeout on socketWrite
> invocations.
> This means that if a TCP session is dropped or terminated without RST or FIN
> packets, the operating system is left to eventually time out the session. On
> the Linux kernel this timeout usually takes 15 to 30 minutes.
> For this entire period, the AMQ server hangs, and producers and consumers are
> unable to use a topic.
> I have created two patches for this at the page:
> http://www.hanik.com/covalent/amq/index.html
> Let me show a bit more
> ---------------------------------
> "ActiveMQ Transport: tcp:///X.YYY.XXX.ZZZZ:2011" daemon prio=10 tid=0x0000000055d39000 nid=0xc78 runnable [0x00000000447c9000..0x00000000447cac10]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> This is a thread stuck in blocking IO, and it can stay stuck for 30 minutes
> during the kernel TCP retransmission attempts.
> Unfortunately the thread dump is very misleading, since the name of the
> thread is not the destination, or even remotely related to the socket it is
> operating on.
> To mend this, a very simple (and configurable) ThreadNameFilter is proposed
> in the patch. It appends the destination to the thread name and helps the
> system administrator correctly identify the client that is about to receive
> data.
> -----------------------------------
>         at org.apache.activemq.broker.region.Topic.dispatch(Topic.java:581)
>         at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:421)
>         - locked <0x00002aaaec155818> (a org.apache.activemq.broker.region.Topic)
>         at org.apache.activemq.broker.region.Topic.send(Topic.java:363)
> The lock held at this point unfortunately makes the entire Topic single
> threaded.
> While this lock is held, no other clients (producers and consumers) can
> publish to or receive from this topic.
> And this lock can be held for up to 30 minutes.
> I consider solving this single threaded behavior a 'feature enhancement' that
> should be handled separately from this bug, because even if it is solved,
> threads still risk being stuck in socketWrite0 for dropped connections that
> still appear to be established.
> For this, I have implemented a socket timeout filter based on a
> TransportFilter. This filter only times out connections that are actually
> writing data.
> The two patches are at:
> http://www.hanik.com/covalent/amq/patch-1-threadname-filter.patch
> http://www.hanik.com/covalent/amq/patch-3-tcp-writetimeout.patch
> The binary 0000.jar applies to both 5.1 and trunk and can be used today in
> existing environments.
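
For readers who want to see the write-timeout idea in isolation, here is a minimal sketch in plain Java. It is not the code from patch-3-tcp-writetimeout.patch; the class WriteTimeoutWatchdog and its methods writeStarted/writeFinished are hypothetical names made up for this illustration. The only assumption about the JDK is the documented behaviour that closing a java.net.Socket from another thread makes a thread blocked in a write on that socket fail with a SocketException.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (illustration only, not the patch's class).
 * Tracks connections that are currently inside a blocking write and
 * closes any whose write has been running longer than the timeout.
 */
class WriteTimeoutWatchdog implements AutoCloseable {
    // connection -> System.nanoTime() at which its current write started
    private final Map<Closeable, Long> writesInProgress = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scanner;
    private final long timeoutNanos;

    WriteTimeoutWatchdog(long timeoutMillis, long scanIntervalMillis) {
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.scanner = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "write-timeout-watchdog");
            t.setDaemon(true); // never keep the JVM alive just for the watchdog
            return t;
        });
        scanner.scheduleAtFixedRate(this::closeStalled,
                scanIntervalMillis, scanIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Call immediately before a blocking write on the connection. */
    void writeStarted(Closeable connection) {
        writesInProgress.put(connection, System.nanoTime());
    }

    /** Call in a finally block once the write has returned or thrown. */
    void writeFinished(Closeable connection) {
        writesInProgress.remove(connection);
    }

    // Only connections that are actually writing are ever considered here;
    // idle connections are not registered, so they are never timed out.
    private void closeStalled() {
        long now = System.nanoTime();
        for (Map.Entry<Closeable, Long> e : writesInProgress.entrySet()) {
            boolean expired = now - e.getValue() > timeoutNanos;
            // remove(key, value) succeeds only if this is still the same write
            if (expired && writesInProgress.remove(e.getKey(), e.getValue())) {
                try {
                    e.getKey().close(); // unblocks the writer thread
                } catch (IOException ignored) {
                    // best effort: the connection is being abandoned anyway
                }
            }
        }
    }

    @Override
    public void close() {
        scanner.shutdownNow();
    }
}
```

A transport would wrap each socket write as writeStarted(socket); try { out.write(data); } finally { writeFinished(socket); }, so the thread holding the Topic lock gets an exception after the configured timeout instead of waiting on kernel retransmissions. The timeout and scan interval values are left to the caller.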