[ 
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387659#comment-14387659
 ] 

Christian Posta commented on AMQ-5082:
--------------------------------------

After running a handful more times, it seems to pass ~90% of the time. Is that 
the same behavior for you? The patch looks fine to me (e.g., rebuilding the ZK 
tracking tree on reconnect). I'll keep running to see if it's my env or if 
there is a timing issue with the test.

I've applied the patch (w/o using the pull request which had some other merge 
noise) as a clean commit here: 
https://github.com/apache/activemq/commit/a39e51e0519852332f7779443c3db09b6e691d49

[~scottmf] can you test on the latest master to verify things look good?

> ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
> -------------------------------------------------------------------
>
>                 Key: AMQ-5082
>                 URL: https://issues.apache.org/jira/browse/AMQ-5082
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: activemq-leveldb-store
>    Affects Versions: 5.9.0, 5.10.0
>            Reporter: Scott Feldstein
>            Assignee: Christian Posta
>            Priority: Critical
>             Fix For: 5.12.0
>
>         Attachments: 03-07.tgz, amq_5082_threads.tar.gz, 
> mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure, 
> zookeeper.out-cluster.failure
>
>
> I have a 3 node amq cluster and one zookeeper node using a replicatedLevelDB 
> persistence adapter.
> {code}
>         <persistenceAdapter>
>             <replicatedLevelDB
>               directory="${activemq.data}/leveldb"
>               replicas="3"
>               bind="tcp://0.0.0.0:0"
>               zkAddress="zookeep0:2181"
>               zkPath="/activemq/leveldb-stores"/>
>         </persistenceAdapter>
> {code}
> After about a day or so of sitting idle there are cascading failures and the 
> cluster completely stops listening all together.
> I can reproduce this consistently on 5.9 and the latest 5.10 (commit 
> 2360fb859694bacac1e48092e53a56b388e1d2f0).  I am going to attach logs from 
> the three mq nodes and the zookeeper logs that reflect the time where the 
> cluster starts having issues.
> The cluster stops listening Mar 4, 2014 4:56:50 AM (within 5 seconds).
> The OSs are all centos 5.9 on one esx server, so I doubt networking is an 
> issue.
> If you need more data it should be pretty easy to get whatever is needed 
> since it is consistently reproducible.
> This bug may be related to AMQ-5026, but looks different enough to file a 
> separate issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to