[ https://issues.apache.org/activemq/browse/AMQ-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63084#action_63084 ]
Eric commented on AMQ-3016:
---------------------------

Hi

Since duplex connections are very different from non-duplex ones, this patch must be carefully tested in all conditions. Did you run all the ActiveMQ Maven tests and verify that no side effects were discovered?

Regards
Eric-AWL

> Race condition in DemandForwardingBridgeSupport can cause remote connection
> to be leaked.
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-3016
>                 URL: https://issues.apache.org/activemq/browse/AMQ-3016
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Connector, Transport
>    Affects Versions: 5.4.1
>            Reporter: Stirling Chow
>         Attachments: ConnectionLeakTest.java, patch.txt
>
>
> Symptom
> =======
> I set up two brokers and a network bridge from Broker A to Broker B over
> HTTP. When the bridge is established, each broker has a single transport
> connection (VM on Broker A and HTTP on Broker B), as recorded in
> RegionBroker.connections.
> I noticed that when Broker A was stopped (normally), the HTTP connection
> would periodically remain in Broker B's RegionBroker.connections until the
> InactivityMonitor on the connection timed out. If the InactivityMonitor was
> disabled, the connection would remain indefinitely.
> If Broker A was restarted, the bridge would be re-established and a second
> connection would be recorded in Broker B's RegionBroker.connections.
> Repeated restarts of Broker A would cause an accumulation of "dead"
> connections, which would eventually lead to an OOM.
>
> Cause
> =====
> When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and
> sends a ShutdownInfo command to the local and remote transports. When the
> transports receive the ShutdownInfo, they remove the connection from their
> associated RegionBroker.connections as part of
> TransportConnection.processRemoveConnection(ConnectionId, long):
>
>     public synchronized Response processRemoveConnection(ConnectionId id, long lastDeliveredSequenceId)
>             throws InterruptedException {
>         ...
>         try {
>             broker.removeConnection(cs.getContext(), cs.getInfo(), null);
>         } catch (Throwable e) {
>             SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
>         }
>
> In the cases where Broker B would not clean up its connection, I traced the
> code and discovered that the ShutdownInfo message was not being sent because
> the remote transport (HttpClientTransport) had already had its "stopped" flag
> set to true, as checked in HttpClientTransport.oneway(Object command):
>
>     public void oneway(Object command) throws IOException {
>         if (isStopped()) {
>             throw new IOException("stopped.");
>         }
>         ...
>
> DemandForwardingBridgeSupport's stop() method has the following code:
>
>     public void stop() throws Exception {
>         ...
>         ASYNC_TASKS.execute(new Runnable() {
>             public void run() {
>                 try {
>                     localBroker.oneway(new ShutdownInfo());
>                     sendShutdown.countDown();
>                     remoteBroker.oneway(new ShutdownInfo());
>                 } catch (Throwable e) {
>                     LOG.debug("Caught exception sending shutdown", e);
>                 } finally {
>                     sendShutdown.countDown();
>                 }
>             }
>         });
>         if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>             LOG.info("Network Could not shutdown in a timely manner");
>         }
>     } finally {
>         ServiceStopper ss = new ServiceStopper();
>         ss.stop(remoteBroker);
>         ss.stop(localBroker);
>         // Release the started Latch since another thread could be
>         // stuck waiting for it to start up.
>         startedLatch.countDown();
>         startedLatch.countDown();
>         localStartedLatch.countDown();
>         ss.throwFirstException();
>     }
> }
>
> ShutdownInfo is sent asynchronously to the local and remote transports by a
> slave thread:
>
>     localBroker.oneway(new ShutdownInfo());
>     sendShutdown.countDown();
>     remoteBroker.oneway(new ShutdownInfo());
>
> The sendShutdown latch is used by the master thread to prevent running the
> finally clause until the ShutdownInfo has been sent:
>
>     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>         LOG.info("Network Could not shutdown in a timely manner");
>     }
> } finally {
>     ServiceStopper ss = new ServiceStopper();
>     ss.stop(remoteBroker);
>     ss.stop(localBroker);
>     ...
> }
>
> However, because the latch is counted down *before* remoteBroker.oneway(new
> ShutdownInfo()), there is a race condition: in most cases the main thread
> calls ss.stop(remoteBroker) before the slave thread has completed its call
> to remoteBroker.oneway(new ShutdownInfo()). As a result, the remote
> transport appears already stopped and the ShutdownInfo is discarded. This
> leaves the connection dangling on the remote broker until the
> InactivityMonitor closes it.
>
> Solution
> ========
> The sendShutdown latch should be counted down *after*
> remoteBroker.oneway(new ShutdownInfo()).
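As a rough illustration of the reordering described under Solution, here is a minimal sketch of the slave-thread runnable in DemandForwardingBridgeSupport.stop(). It assumes the names shown in the quoted code (ASYNC_TASKS, localBroker, remoteBroker, sendShutdown, LOG) and that sendShutdown is created with a count of one; it is only meant to show the ordering, not the actual change in the attached patch.txt:

    // Sketch only: the early countDown() after localBroker.oneway() is removed,
    // so the latch is released only once the remote ShutdownInfo has been attempted.
    ASYNC_TASKS.execute(new Runnable() {
        public void run() {
            try {
                localBroker.oneway(new ShutdownInfo());
                remoteBroker.oneway(new ShutdownInfo());
            } catch (Throwable e) {
                LOG.debug("Caught exception sending shutdown", e);
            } finally {
                // Assumes sendShutdown is a CountDownLatch(1): counting down here,
                // after remoteBroker.oneway(), keeps the master thread's await()
                // from returning while the remote ShutdownInfo is still in flight.
                sendShutdown.countDown();
            }
        }
    });

With the countDown() only in the finally clause, sendShutdown.await(10, TimeUnit.SECONDS) cannot return until the remote oneway() has been attempted, so ss.stop(remoteBroker) in the finally block of stop() no longer races with the ShutdownInfo being sent.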