Guillaume created AMQ-4837:
------------------------------

             Summary: LevelDB corrupted in AMQ cluster
                 Key: AMQ-4837
                 URL: https://issues.apache.org/jira/browse/AMQ-4837
             Project: ActiveMQ
          Issue Type: Bug
          Components: activemq-leveldb-store
    Affects Versions: 5.9.0
         Environment: CentOS, Linux version 2.6.32-71.29.1.el6.x86_64
java-1.7.0-openjdk.x86_64/java-1.6.0-openjdk.x86_64
zookeeper-3.4.5.2
            Reporter: Guillaume
            Priority: Critical


I have clustered 3 ActiveMQ instances using replicated LevelDB and ZooKeeper.
While running some tests through the web UI, I came across an issue that
appears to corrupt the LevelDB data files.

The issue can be reproduced with the following steps:
1.      Start 3 ActiveMQ nodes.
2.      Push a message to the master (Node1) and browse the queue using the
web UI.
3.      Stop the master node (Node1).
4.      Push a message to the new master (Node2) and browse the queue using
the web UI. Message summary and queue content are ok.
5.      Start Node1.
6.      Stop the master node (Node2).
7.      Browse the queue using the web UI on the new master (Node3). The
message summary is ok; however, clicking on the queue shows no message
details. An error (see below) is logged by the master, which then attempts a
restart.

From this point on, the database appears to be corrupted, and the same error
occurs on each node indefinitely (shutdown/restart loop). The only workaround
is to stop the nodes and clear the data files.

However, if a message is pushed between steps 5 and 6, the error does not occur.
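
For anyone who wants to repeat the sequence without clicking through the web UI, the steps above could be scripted roughly as follows. This is only a sketch: the host names (node1..node3), install path, queue name, and admin credentials are assumptions, not values from my setup. It also assumes that the broker's REST endpoint (/api/message) and the web console browse page (/admin/browse.jsp) are reachable on port 8161. It does not wait for a new master to be elected between steps, and I have not verified that browsing through these URLs triggers the same error as the web UI does.

```python
import subprocess

# All values below are assumptions for illustration; none come from the report.
NODES = ["node1", "node2", "node3"]   # assumed host names for Node1..Node3
AMQ_HOME = "/opt/activemq"            # assumed install directory
QUEUE = "TEST"                        # assumed queue name
AUTH = "admin:admin"                  # assumed web console credentials


def start(node):
    return ["ssh", node, f"{AMQ_HOME}/bin/activemq", "start"]


def stop(node):
    return ["ssh", node, f"{AMQ_HOME}/bin/activemq", "stop"]


def push(node):
    # Assumes the REST endpoint of the broker's bundled web application.
    url = f"http://{node}:8161/api/message/{QUEUE}?type=queue"
    return ["curl", "-s", "-u", AUTH, "-d", "body=test", url]


def browse(node):
    # Assumes the web console's queue browse page.
    url = f"http://{node}:8161/admin/browse.jsp?JMSDestination={QUEUE}"
    return ["curl", "-s", "-u", AUTH, url]


def reproduction_plan():
    """Return the commands for steps 1-7 in order."""
    n1, n2, n3 = NODES
    return [
        start(n1), start(n2), start(n3),   # step 1
        push(n1), browse(n1),              # step 2
        stop(n1),                          # step 3
        push(n2), browse(n2),              # step 4
        start(n1),                         # step 5
        stop(n2),                          # step 6
        browse(n3),                        # step 7
    ]


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(reproduction_plan())
```

By default the script only prints the commands (dry run). Calling run(reproduction_plan(), dry_run=False) would execute them, in which case a pause after each start/stop is probably needed so that the cluster can settle.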

=================================
LevelDB configuration on the 3 instances:
    <persistenceAdapter>
        <replicatedLevelDB
            directory="${activemq.data}/leveldb"
            replicas="3"
            bind="tcp://0.0.0.0:0"
            zkAddress="zkserver:2181"
            zkPath="/activemq/leveldb-stores"
            />
    </persistenceAdapter>

=================================
The error is:
INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException
java.io.IOException
        at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
        at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
        at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
        at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1270)
        at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1194)
        at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:708)
        at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:741)
        at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
        at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:258)
        at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:108)
        at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
        at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1875)
        at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2086)
        at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1581)
        at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
        at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1198)
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1194)
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1272)
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1271)
        at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:315)
        at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:317)
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1271)
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
        at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:968)
        at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:974)
        at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:540)
        ... 17 more





--
This message was sent by Atlassian JIRA
(v6.1#6144)
