[ https://issues.apache.org/activemq/browse/AMQ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Tully resolved AMQ-1993.
-----------------------------

    Resolution: Fixed

Update applied in r711292.

> Systems hang due to inability to timeout socket write operation
> ---------------------------------------------------------------
>
>                 Key: AMQ-1993
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1993
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0, 5.2.0
>         Environment: Unix (Solaris and Linux tested)
>            Reporter: Filip Hanik
>            Assignee: Gary Tully
>            Priority: Critical
>             Fix For: 5.3.0
>
>         Attachments: patch-1-threadname-filter.patch, patch-3-tcp-writetimeout.patch
>
>
> The blocking Java Socket API has no timeout for socketWrite invocations.
> This means that if a TCP session is dropped or terminated without RST or FIN
> packets, the operating system is left to eventually time out the session. On
> the Linux kernel this timeout usually takes 15 to 30 minutes.
> For this entire period, the AMQ server hangs, and producers and consumers are
> unable to use the affected topic.
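For illustration only, here is a minimal sketch that is not part of the reported patches. In java.net.Socket, the only built-in timeout is SO_TIMEOUT, set through setSoTimeout(), and it applies to reads (and to ServerSocket.accept()). A blocking read can therefore be bounded, but a blocking write cannot. The class name ReadTimeoutDemo and the loopback setup are assumptions made for this example.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: shows that SO_TIMEOUT bounds a blocking read.
public class ReadTimeoutDemo {
    // Returns true if a blocking read() was interrupted by SO_TIMEOUT.
    static boolean readTimesOut(int timeoutMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);   // ephemeral port
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {                           // peer never sends anything
            client.setSoTimeout(timeoutMillis);                         // applies to reads only
            InputStream in = client.getInputStream();
            try {
                in.read();                                              // blocks: no data arrives
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read timed out: " + readTimesOut(200));
        // OutputStream.write() has no equivalent setter: a write that fills the
        // kernel send buffer simply blocks until the OS gives up on the session.
    }
}
```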
> I have created two patches for this, available at:
> http://www.hanik.com/covalent/amq/index.html
> Here is a bit more detail:
> ---------------------------------
> "ActiveMQ Transport: tcp:///X.YYY.XXX.ZZZZ:2011" daemon prio=10 tid=0x0000000055d39000 nid=0xc78 runnable [0x00000000447c9000..0x00000000447cac10]
>    java.lang.Thread.State: RUNNABLE
>       at java.net.SocketOutputStream.socketWrite0(Native Method)
>       at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> This thread is stuck in blocking I/O and can remain stuck for up to 30 minutes
> during the kernel's TCP retransmission attempts.
> Unfortunately, the thread dump is very misleading, since the name of the
> thread does not identify the destination and is not even remotely related to
> the socket it is writing to.
> To address this, the patch adds a very simple (and configurable)
> ThreadNameFilter that appends the destination to the thread name, helping the
> system administrator identify the client that is about to receive data.
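The idea behind such a thread-name filter can be sketched as follows, under stated assumptions. This is not the code from patch-1-threadname-filter.patch. Transport, ThreadNameFilter, and nameSeenDuringWrite are hypothetical simplifications, and the real ActiveMQ Transport/TransportFilter types have more methods. While a command is being written, the current thread's name is temporarily suffixed with the remote address. A thread dump taken during a stuck write would then name the peer.

```java
import java.io.IOException;

public class ThreadNameFilterSketch {
    // Hypothetical minimal transport abstraction (not ActiveMQ's real interface).
    interface Transport {
        void oneway(Object command) throws IOException;
        String getRemoteAddress();
    }

    // Hypothetical filter: renames the writing thread for the duration of the write.
    static class ThreadNameFilter implements Transport {
        private final Transport next;

        ThreadNameFilter(Transport next) { this.next = next; }

        @Override
        public void oneway(Object command) throws IOException {
            Thread current = Thread.currentThread();
            String originalName = current.getName();
            current.setName(originalName + " -> " + next.getRemoteAddress());
            try {
                next.oneway(command);            // a stuck write now shows the peer in a thread dump
            } finally {
                current.setName(originalName);   // always restore, even if the write fails
            }
        }

        @Override
        public String getRemoteAddress() { return next.getRemoteAddress(); }
    }

    // Runs one write through the filter and returns the thread name that the
    // underlying (fake) transport observed while the write was in progress.
    static String nameSeenDuringWrite(String remoteAddress) throws IOException {
        final String[] seen = new String[1];
        Transport fake = new Transport() {
            public void oneway(Object command) { seen[0] = Thread.currentThread().getName(); }
            public String getRemoteAddress() { return remoteAddress; }
        };
        new ThreadNameFilter(fake).oneway("message");
        return seen[0];
    }

    public static void main(String[] args) throws IOException {
        Thread.currentThread().setName("ActiveMQ Transport: producer");
        System.out.println(nameSeenDuringWrite("tcp://192.0.2.10:61616"));
        // prints "ActiveMQ Transport: producer -> tcp://192.0.2.10:61616"
        System.out.println(Thread.currentThread().getName());
        // prints "ActiveMQ Transport: producer" (original name restored)
    }
}
```

The name is restored in a finally block, so the suffix is only present while the write is actually in progress.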
> -----------------------------------
>       at org.apache.activemq.broker.region.Topic.dispatch(Topic.java:581)
>       at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:421)
>       - locked <0x00002aaaec155818> (a org.apache.activemq.broker.region.Topic)
>       at org.apache.activemq.broker.region.Topic.send(Topic.java:363)
> The lock held here unfortunately makes the entire Topic single-threaded.
> While this lock is held, no other clients (producers or consumers) can
> publish to or receive from this topic, and the lock can be held for up to
> 30 minutes.
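The effect of that lock can be illustrated with a minimal sketch. It assumes a hypothetical Topic class whose send path is synchronized on the topic object, mirroring the "locked ... (a org.apache.activemq.broker.region.Topic)" frame; this is not ActiveMQ's actual Topic code. A CountDownLatch stands in for a socketWrite0 call that does not return. While the first producer is stuck inside the monitor, a second producer ends up BLOCKED.

```java
import java.util.concurrent.CountDownLatch;

public class TopicLockSketch {
    // Hypothetical stand-in for a topic whose send path holds the topic's monitor.
    static class Topic {
        private final CountDownLatch stuckWrite;

        Topic(CountDownLatch stuckWrite) { this.stuckWrite = stuckWrite; }

        synchronized void send(boolean toStuckConsumer) throws InterruptedException {
            if (toStuckConsumer) {
                stuckWrite.await();   // simulates a socketWrite0 that never returns; monitor stays held
            }
        }
    }

    // Returns the state of a second producer while the first is stuck inside send().
    static Thread.State secondProducerState() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Topic topic = new Topic(release);
        Thread first = new Thread(() -> {
            try { topic.send(true); } catch (InterruptedException ignored) { }
        }, "producer-1");
        first.start();
        while (first.getState() != Thread.State.WAITING) {
            Thread.sleep(10);         // wait until producer-1 is parked inside the monitor
        }
        Thread second = new Thread(() -> {
            try { topic.send(false); } catch (InterruptedException ignored) { }
        }, "producer-2");
        second.start();
        Thread.State observed = second.getState();
        for (int i = 0; i < 200 && observed != Thread.State.BLOCKED; i++) {
            Thread.sleep(10);
            observed = second.getState();
        }
        release.countDown();          // let the "write" finish so both threads can exit
        first.join();
        second.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second producer state: " + secondProducerState());
    }
}
```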
> I consider solving this single-threaded behavior a 'feature enhancement' that
> should be handled separately from this bug, because even if it is solved,
> threads still risk being stuck in socketWrite0 for dropped connections that
> still appear to be established.
> For this, I have implemented a socket write-timeout filter based on
> TransportFilter; it only times out connections that are actually in the
> middle of writing data.
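Here is one way such a write-timeout filter can work, sketched under assumptions. The filter records when a write starts, and a periodic watchdog closes the underlying connection only if a write has been in progress longer than the timeout. Per the java.net.Socket.close() documentation, any thread blocked in an I/O operation on the socket then throws a SocketException. Sink, WriteTimeoutFilter, and StuckSink are hypothetical names. This is not necessarily how patch-3-tcp-writetimeout.patch is implemented.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WriteTimeoutSketch {
    // Hypothetical sink: a blocking write plus a close() that, like
    // Socket.close(), makes a write blocked in another thread fail.
    interface Sink {
        void write(byte[] data) throws IOException;
        void close() throws IOException;
    }

    // Hypothetical write-timeout filter. A periodic watchdog closes the sink only
    // while a write has been in progress longer than the timeout, so idle
    // connections are never closed by it.
    static class WriteTimeoutFilter implements Sink {
        private final Sink next;
        private volatile long writeStartNanos;
        private volatile boolean writing;

        WriteTimeoutFilter(Sink next, long timeoutMillis, ScheduledExecutorService watchdog) {
            this.next = next;
            long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            long periodMillis = Math.max(1, timeoutMillis / 4);
            watchdog.scheduleAtFixedRate(() -> {
                if (writing && System.nanoTime() - writeStartNanos > timeoutNanos) {
                    try { next.close(); } catch (IOException ignored) { }  // unblocks the stuck writer
                }
            }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        }

        @Override
        public void write(byte[] data) throws IOException {
            writeStartNanos = System.nanoTime();  // set before 'writing' so the watchdog sees it
            writing = true;
            try {
                next.write(data);
            } finally {
                writing = false;
            }
        }

        @Override public void close() throws IOException { next.close(); }
    }

    // Fake sink whose write() blocks until close() is called and then fails,
    // roughly like a write stuck in socketWrite0 when its socket is closed.
    static class StuckSink implements Sink {
        final CountDownLatch closed = new CountDownLatch(1);

        @Override
        public void write(byte[] data) throws IOException {
            try {
                closed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted", e);
            }
            throw new IOException("socket closed");
        }

        @Override public void close() { closed.countDown(); }
    }

    // True if a write that never completes on its own is aborted by the watchdog.
    static boolean stuckWriteIsAborted(long timeoutMillis) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        try {
            new WriteTimeoutFilter(new StuckSink(), timeoutMillis, watchdog).write(new byte[] {1});
            return false;
        } catch (IOException expected) {
            return true;
        } finally {
            watchdog.shutdownNow();
        }
    }

    // True if a connection with no write in progress survives several timeout periods.
    static boolean idleSinkIsLeftOpen(long timeoutMillis) throws InterruptedException {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        try {
            StuckSink sink = new StuckSink();
            new WriteTimeoutFilter(sink, timeoutMillis, watchdog);
            Thread.sleep(timeoutMillis * 3);
            return sink.closed.getCount() == 1;
        } finally {
            watchdog.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stuck write aborted: " + stuckWriteIsAborted(200));
        System.out.println("idle connection left open: " + idleSinkIsLeftOpen(200));
    }
}
```

Tracking only in-progress writes is what keeps this from behaving like a plain idle timeout. A quiet but healthy consumer is never disconnected.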
> The two patches are at:
> http://www.hanik.com/covalent/amq/patch-1-threadname-filter.patch
> http://www.hanik.com/covalent/amq/patch-3-tcp-writetimeout.patch
> The binary 0000.jar applies to both 5.1 and trunk and can be used today in
> existing environments.

-- 
This message is automatically generated by JIRA.