[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357632#comment-14357632
 ] 

Jim Robinson edited comment on AMQ-5082 at 3/11/15 9:19 PM:
------------------------------------------------------------

I was seeing similar behavior on 5.11.1 when I spun up two 3-node clusters in a 
VirtualBox environment.  I assumed the problem was that there were not enough 
CPU cycles to keep the ZooKeeper session alive.  I've got a proposed unit test:

https://github.com/jimrobinson/activemq/commit/58b7198880f5296af6b2e4e9efbbdfdb51220411

and a potential fix:

https://github.com/jimrobinson/activemq/commit/d272a116ff5c0916a6044d657f99df48f264bd2a

I believe the underlying issue is that ZooKeeperGroup is relying on

org.linkedin.zookeeper.tracker.ZooKeeperTreeTracker

to keep its membership list up to date, and I don't believe that is happening 
when the ZooKeeper session expires.
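
To illustrate the suspected failure mode, here is a minimal sketch in Java. It
does not use the real ZooKeeper client, ZooKeeperGroup, or ZooKeeperTreeTracker,
and it is not the proposed fix linked above. FakeEnsemble, Tracker, and
viewAfterExpiry are hypothetical names made up for this example. The only
ZooKeeper behavior it assumes is that expiring a session deletes that session's
ephemeral nodes and stops its watches from firing. Under that assumption, a
tracker that does nothing on expiry keeps serving a stale membership list,
while one that opens a new session, re-registers, and re-reads the children
catches up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class SessionExpirySketch {

    /** Hypothetical in-memory stand-in for a ZooKeeper ensemble (not the real API). */
    static class FakeEnsemble {
        private long nextSession = 1;
        private final Map<String, Long> ephemerals = new HashMap<>();      // member -> owning session
        private final Map<Long, List<Runnable>> watches = new HashMap<>(); // session -> callbacks

        long openSession() { return nextSession++; }

        void join(long session, String member) {
            ephemerals.put(member, session);
            fireWatches();
        }

        // Simplification: watches here are persistent; real ZooKeeper watches are one-shot.
        void watch(long session, Runnable callback) {
            watches.computeIfAbsent(session, s -> new ArrayList<>()).add(callback);
        }

        TreeSet<String> children() { return new TreeSet<>(ephemerals.keySet()); }

        // Expiry removes the session's ephemeral nodes and silences its watches.
        void expire(long session) {
            ephemerals.values().removeIf(owner -> owner == session);
            watches.remove(session);
            fireWatches();
        }

        private void fireWatches() {
            List<Runnable> snapshot = new ArrayList<>();
            watches.values().forEach(snapshot::addAll);
            snapshot.forEach(Runnable::run);
        }
    }

    /** Hypothetical group member that caches the membership list. */
    static class Tracker {
        private final FakeEnsemble zk;
        private final String name;
        private final boolean recoverOnExpiry;
        private long session;
        private TreeSet<String> cached = new TreeSet<>();

        Tracker(FakeEnsemble zk, String name, boolean recoverOnExpiry) {
            this.zk = zk;
            this.name = name;
            this.recoverOnExpiry = recoverOnExpiry;
            connect();
        }

        private void connect() {
            session = zk.openSession();
            zk.watch(session, () -> cached = zk.children());
            zk.join(session, name); // fires the watch, which fills the cache
        }

        // With recoverOnExpiry == false the cache is simply left as it was.
        void onSessionExpired() {
            if (recoverOnExpiry) {
                connect();
            }
        }

        long session() { return session; }
        TreeSet<String> members() { return cached; }
    }

    /** Returns node-a's view of the group after its session expired and node-c joined. */
    static TreeSet<String> viewAfterExpiry(boolean recover) {
        FakeEnsemble zk = new FakeEnsemble();
        Tracker a = new Tracker(zk, "node-a", recover);
        new Tracker(zk, "node-b", false);
        zk.expire(a.session());           // server side: node-a's session times out
        a.onSessionExpired();             // client side: node-a learns about it
        new Tracker(zk, "node-c", false); // membership changes afterwards
        return a.members();
    }

    public static void main(String[] args) {
        System.out.println("no recovery:   " + viewAfterExpiry(false));
        System.out.println("with recovery: " + viewAfterExpiry(true));
    }
}
```

In this model the tracker without recovery still reports node-a and node-b
after the expiry, even though the ensemble actually holds node-b and node-c;
the recovering tracker reports all three nodes.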





> ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
> -------------------------------------------------------------------
>
>                 Key: AMQ-5082
>                 URL: https://issues.apache.org/jira/browse/AMQ-5082
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: activemq-leveldb-store
>    Affects Versions: 5.9.0, 5.10.0
>            Reporter: Scott Feldstein
>            Priority: Critical
>         Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
> mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
> zookeeper.out-cluster.failure
>
>
> I have a 3-node AMQ cluster and one ZooKeeper node, using a replicatedLevelDB 
> persistence adapter.
> {code}
>         <persistenceAdapter>
>             <replicatedLevelDB
>               directory="${activemq.data}/leveldb"
>               replicas="3"
>               bind="tcp://0.0.0.0:0"
>               zkAddress="zookeep0:2181"
>               zkPath="/activemq/leveldb-stores"/>
>         </persistenceAdapter>
> {code}
> After about a day or so of sitting idle there are cascading failures and the 
> cluster stops listening altogether.
> I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
> 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
> the three MQ nodes and the ZooKeeper logs that reflect the time when the 
> cluster starts having issues.
> The cluster stops listening Mar 4, 2014 4:56:50 AM (within 5 seconds).
> The OSes are all CentOS 5.9 on one ESX server, so I doubt networking is an 
> issue.
> If you need more data it should be pretty easy to get whatever is needed 
> since it is consistently reproducible.
> This bug may be related to AMQ-5026, but looks different enough to file a 
> separate issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
