[ https://issues.apache.org/activemq/browse/AMQ-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Davies reassigned AMQ-1709:
-------------------------------

    Assignee: Rob Davies

> Network of Brokers Memory Leak Due to Race Condition
> ----------------------------------------------------
>
>                 Key: AMQ-1709
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1709
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Transport
>    Affects Versions: 4.1.2, 5.0.0
>            Reporter: Howard Orner
>            Assignee: Rob Davies
>
> When you have a network of brokers configuration with at least 3 brokers, 
> such as:
> <broker brokerName="A" persistent="false" ...
> ...
> <transportConnector name="AListener" uri="tcp://localhost:61610"/>
> ...
> <networkConnector name="BConnector" uri="static:(tcp://localhost:61620)"/>
> <networkConnector name="CConnector" uri="static:(tcp://localhost:61630)"/>
> with the other brokers having a similar configuration.
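> For reference, the same broker A topology can also be set up 
> programmatically, roughly like this (a sketch using the standard 
> BrokerService API; the names and ports are just the ones from the snippet 
> above):
> import org.apache.activemq.broker.BrokerService;
> public class BrokerA {
>     public static void main(String[] args) throws Exception {
>         BrokerService broker = new BrokerService();
>         broker.setBrokerName("A");
>         broker.setPersistent(false);
>         // transport connector that publishers/subscribers connect to
>         broker.addConnector("tcp://localhost:61610");
>         // network connectors to brokers B and C
>         broker.addNetworkConnector("static:(tcp://localhost:61620)");
>         broker.addNetworkConnector("static:(tcp://localhost:61630)");
>         broker.start();
>     }
> }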
> Then, if you have subscribers trying to connect to all of the brokers, you 
> can hit a race condition at startup where the transports accept connections 
> from subscribers before the network connectors are initialized.  In 
> BrokerService.startAllConnectors(), the transports are started first, then 
> the network connectors.  As part of starting the network connectors, their 
> constructors take a collection obtained by calling 
> getBroker().getDurableDestinations().  Normally this list would be empty.  
> However, if clients connect before this is called, an entry is returned for 
> each topic subscribed to.  Then, instead of creating standard 
> TopicSubscriptions for the network connector, DurableTopicSubscriptions are 
> created.  I'm not sure whether this by itself should be a problem, but it 
> is, because SimpleDispatchPolicy, while iterating through the 
> DurableTopicSubscriptions, causes messages to be queued up for prefetch 
> without clearing all of the references (for each pass it looks like three 
> references are registered and only two are cleared).  This becomes a memory 
> leak.  In the logs you see a message saying the PrefetchLimit was reached, 
> and then you start seeing logs about memory usage increasing until it gets 
> to 100%, at which point everything stops.
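> Roughly, the problematic ordering in startAllConnectors() looks like this 
> (a simplified sketch of what I read in the source, not the actual code; the 
> setDurableDestinations() call just stands in for however the collection is 
> handed to the connector):
> // transports start first, so clients can already connect at this point
> for (TransportConnector tc : transportConnectors) {
>     tc.start();
> }
> // network connectors start afterwards; if a subscriber got in above,
> // getDurableDestinations() is no longer empty and the bridge ends up
> // with DurableTopicSubscriptions instead of plain TopicSubscriptions
> for (NetworkConnector nc : networkConnectors) {
>     nc.setDurableDestinations(getBroker().getDurableDestinations());
>     nc.start();
> }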
> To reproduce this, create a network of brokers configuration with at least 
> 3 brokers -- the more you have, the more likely you are to hit this without 
> a lot of tries, so I suggest a bunch.  Start all brokers.  Establish a 
> publisher on broker A using failover://(tcp://localhost:61610), then 
> establish a bunch of subscribers on all the brokers using a similar 
> configuration, e.g. failover://(tcp://localhost:61610), 
> failover://(tcp://localhost:61620).  The more subscribers you have on 
> broker A the better, since you are trying to reproduce the race condition.  
> You want the others up so that the other brokers expect messages to be 
> passed to them.  Once everybody is up and happy, kill broker A and restart 
> it.  If you do that enough times, you will hit the race condition and the 
> memory leak will start.  You can also put a breakpoint in 
> BrokerService.startAllConnectors() after the transports are started but 
> before the network connectors are started; that gives clients time to 
> connect to the transport threads before you tell the VM to continue.
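> For the subscribers, a plain JMS client against the failover URI is enough 
> (a sketch; the topic name is just an example):
> import javax.jms.Connection;
> import javax.jms.MessageConsumer;
> import javax.jms.Session;
> import org.apache.activemq.ActiveMQConnectionFactory;
> public class LeakSubscriber {
>     public static void main(String[] args) throws Exception {
>         ActiveMQConnectionFactory factory =
>             new ActiveMQConnectionFactory("failover://(tcp://localhost:61610)");
>         Connection connection = factory.createConnection();
>         connection.start();
>         Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
>         MessageConsumer consumer =
>             session.createConsumer(session.createTopic("TEST.TOPIC"));
>         while (true) {
>             consumer.receive(); // keep the subscription alive and drain messages
>         }
>     }
> }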
> I found an easy fix: store the durable destination list in a local variable 
> before starting the transports and pass that to the network connectors 
> instead of making separate calls.  I'm not sure whether there are 'normal' 
> ways for that list to be anything other than empty.  If not, you could just 
> pass an empty set to the network connectors, but I suspect there are 
> legitimate configurations that need this list to be requested.  If so, this 
> memory leak would likely occur in those cases, too.
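> In code, the change amounts to something like this (a sketch of the idea 
> only, not a tested patch):
> // snapshot the durable destinations *before* any transport starts
> // accepting clients, then hand the snapshot to every network connector
> Set<ActiveMQDestination> durableDestinations = getBroker().getDurableDestinations();
> for (TransportConnector tc : transportConnectors) {
>     tc.start();
> }
> for (NetworkConnector nc : networkConnectors) {
>     nc.setDurableDestinations(durableDestinations); // snapshot, not a fresh call
>     nc.start();
> }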
> I ran into this in 4.1.2.  I haven't tested 5.0, since our attempts to 
> switch to 5.0 were met with failure due to the number of bugs in 5.0 
> (already reported by others).  Looking at the 5.0.0 source, the race 
> condition is still there in BrokerService.startAllConnectors(), so I 
> suspect the memory leak is there as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
