[
https://issues.apache.org/jira/browse/AMQ-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Justin Bertram updated AMQ-9787:
--------------------------------
Description:
When the network cable is pulled from the host of a remote broker, clients
hang for extended periods (up to several minutes) while waiting for the network
stack to close the connection. This occurs even though the client
InactivityMonitor is configured with the default 30-second timeout
(maxInactivityDuration).
A thread dump of the client captured during this state indicates that the
InactivityMonitor is blocked because a write operation holds a mutex, and that
write is itself blocked while waiting for the network stack to complete TCP
transmission attempts. This behavior can lead to significantly delayed client
failover in scenarios where the network connection is abruptly lost, even with
inactivity monitoring enabled.
Analysis:
* The producer holds a lock (reconnectMutex) in FailoverTransport while
performing a write operation.
* TCP retransmissions caused by network disconnection block the write.
* Because the write retains the lock, the InactivityMonitor is blocked when
attempting to fire the InactivityIOException, so the dead connection is not
detected within the configured timeout.
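The contention described in the analysis can be reproduced without ActiveMQ. The following is a minimal sketch, not the FailoverTransport code: the class HeldLockDemo and its fields are hypothetical, the method names only mirror the frames in the thread dumps, and a CountDownLatch stands in for the socket write that is stuck on TCP retransmissions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only imitates the locking pattern seen in the thread dumps.
public class HeldLockDemo {
    // Stand-in for the monitor object that the dumps show as <0x00000000c15946f0>.
    private final Object reconnectMutex = new Object();
    private final CountDownLatch writeStarted = new CountDownLatch(1);
    private final CountDownLatch networkGaveUp = new CountDownLatch(1);

    // Plays the producer: takes the mutex, then blocks in the "write".
    void oneway() throws InterruptedException {
        synchronized (reconnectMutex) {
            writeStarted.countDown();
            networkGaveUp.await(); // simulated blocking socket write
        }
    }

    // Plays the inactivity monitor: needs the same mutex to report the failure.
    void handleTransportFailure() {
        synchronized (reconnectMutex) {
            // failover would be triggered here
        }
    }

    // Returns the state of the monitor thread while the write still holds the mutex.
    public static Thread.State observeMonitorState() throws InterruptedException {
        HeldLockDemo demo = new HeldLockDemo();
        Thread producer = new Thread(() -> {
            try {
                demo.oneway();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");
        Thread monitor = new Thread(demo::handleTransportFailure, "inactivity-monitor");

        producer.start();
        demo.writeStarted.await(); // the producer now owns the mutex
        monitor.start();

        // Wait (up to 5 seconds) for the monitor thread to reach the contended lock.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (monitor.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State observed = monitor.getState();

        demo.networkGaveUp.countDown(); // the "network stack" finally gives up
        producer.join();
        monitor.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("monitor thread state during the stuck write: " + observeMonitorState());
    }
}
```

In this sketch the monitor thread can only proceed once the simulated write returns, which corresponds to the several-minute delay reported above.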
Steps to reproduce:
# Configure a client to connect to a failover URL specifying two or more
ActiveMQ brokers over TCP/SSL.
# Configure the client inactivity monitor with the default 30-second timeout.
# Pull the network cable (or otherwise sever the network connection) to the
broker the clients are connected to.
# Observe that the clients hang for several minutes until the network stack
eventually closes the connection, and only then fail over to an alternate host
address.
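For steps 1 and 2, a client URL might be assembled as in the sketch below. This is an illustration under assumptions: the class FailoverUrlSketch, the host names, and port 61617 are hypothetical, and the report does not show the URL that was actually used. The wireFormat.maxInactivityDuration option is given in milliseconds, so 30000 corresponds to the 30-second default.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; broker host names and ports below are placeholders.
public class FailoverUrlSketch {
    // Builds a failover URL with one ssl:// entry per broker, each carrying
    // the inactivity timeout (in milliseconds) as a wire format option.
    public static String failoverUrl(List<String> hostPorts, long maxInactivityMillis) {
        return hostPorts.stream()
                .map(hp -> "ssl://" + hp + "?wireFormat.maxInactivityDuration=" + maxInactivityMillis)
                .collect(Collectors.joining(",", "failover:(", ")"));
    }

    public static void main(String[] args) {
        String url = failoverUrl(
                List.of("broker-a.example.com:61617", "broker-b.example.com:61617"),
                30_000L);
        System.out.println(url);
    }
}
```

The resulting string would be passed to the client's connection factory as its broker URL.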
Expected behavior:
Clients should detect inactivity within the configured timeout period, even if
the network transport is experiencing retransmissions.
+Thread dump snippet – InactivityMonitor blocked:+
{noformat}
"ActiveMQ InactivityMonitor Worker 4" #3386 daemon prio=5 os_prio=0 cpu=16.18ms elapsed=655.78s tid=0x00007f5a78003020 nid=0xce84 waiting for monitor entry [0x00007f59ecd99000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.activemq.transport.failover.FailoverTransport.handleTransportFailure(FailoverTransport.java:276)
    - waiting to lock <0x00000000c15946f0> (a java.lang.Object)
    at org.apache.activemq.transport.failover.FailoverTransport$3.onException(FailoverTransport.java:226)
    at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:114)
    at org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:173)
    at org.apache.activemq.transport.AbstractInactivityMonitor.onException(AbstractInactivityMonitor.java:346)
    at org.apache.activemq.transport.AbstractInactivityMonitor$5.run(AbstractInactivityMonitor.java:248)
    at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
    at java.lang.Thread.run([email protected]/Thread.java:840)
{noformat}
+Producer thread that owns the mutex 0x00000000c15946f0, blocked in the socket write:+
{noformat}
"Camel (camelContext) thread #34 - seda://publish" #383 daemon prio=5 os_prio=0 cpu=1600.83ms elapsed=1357.89s tid=0x00007f5acc2582b0 nid=0x725a runnable
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcherImpl.write0([email protected]/Native Method)
    at sun.nio.ch.SocketDispatcher.write([email protected]/SocketDispatcher.java:62)
    at sun.nio.ch.NioSocketImpl.tryWrite([email protected]/NioSocketImpl.java:403)
    at sun.nio.ch.NioSocketImpl.implWrite([email protected]/NioSocketImpl.java:418)
    at sun.nio.ch.NioSocketImpl.write([email protected]/NioSocketImpl.java:445)
    at sun.nio.ch.NioSocketImpl$2.write([email protected]/NioSocketImpl.java:831)
    at java.net.Socket$SocketOutputStream.write([email protected]/Socket.java:1035)
    at sun.security.ssl.SSLSocketOutputRecord.deliver([email protected]/SSLSocketOutputRecord.java:345)
    at sun.security.ssl.SSLSocketImpl$AppOutputStream.write([email protected]/SSLSocketImpl.java:1308)
    at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
    at java.io.DataOutputStream.flush([email protected]/DataOutputStream.java:128)
    at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:194)
    at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:336)
    at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:318)
    at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:94)
    at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:116)
    at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:670)
    - locked <0x00000000c15946f0> (a java.lang.Object)
    at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
    at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
    at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1311)
    at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1305)
    at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1965)
    - locked <0x00000000c0c29e08> (a java.lang.Object)
    at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
    at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
    at org.apache.activemq.jms.pool.PooledProducer.send(PooledProducer.java:95)
{noformat}
> FailoverTransport deadlock on half-open broker connection
> ---------------------------------------------------------
>
> Key: AMQ-9787
> URL: https://issues.apache.org/jira/browse/AMQ-9787
> Project: ActiveMQ Classic
> Issue Type: Bug
> Affects Versions: 5.17.1, 6.1.2
> Reporter: Mitchell Wagner
> Priority: Major
>