[ https://issues.apache.org/jira/browse/AMQ-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810473#comment-13810473 ]

Hiram Chirino commented on AMQ-4837:
------------------------------------

Ah.. hold off.. I'm still seeing the issue. Need better unit test.

> LevelDB corrupted in AMQ cluster
> --------------------------------
>
>                 Key: AMQ-4837
>                 URL: https://issues.apache.org/jira/browse/AMQ-4837
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: activemq-leveldb-store
>    Affects Versions: 5.9.0
>         Environment: CentOS, Linux version 2.6.32-71.29.1.el6.x86_64
>                      java-1.7.0-openjdk.x86_64/java-1.6.0-openjdk.x86_64
>                      zookeeper-3.4.5.2
>            Reporter: Guillaume
>            Assignee: Hiram Chirino
>            Priority: Critical
>         Attachments: LevelDBCorrupted.zip
>
>
> I have clustered 3 ActiveMQ instances using replicated leveldb and zookeeper.
> When performing some tests using the Web UI, I came across issues that appear to corrupt the leveldb data files.
> The issue can be reproduced by performing the following steps:
> 1. Start 3 activemq nodes.
> 2. Push a message to the master (Node1) and browse the queue using the web UI.
> 3. Stop master node (Node1).
> 4. Push a message to the new master (Node2) and browse the queue using the web UI. Message summary and queue content ok.
> 5. Start Node1.
> 6. Stop master node (Node2).
> 7. Browse the queue using the web UI on the new master (Node3). Message summary ok; however, when clicking on the queue, no message details are shown. An error (see below) is logged by the master, which attempts a restart.
> From this point, the database appears to be corrupted and the same error occurs on each node indefinitely (shutdown/restart). The only way around it is to stop the nodes and clear the data files.
> However, when a message is pushed between steps 5 and 6, the error doesn’t occur.
> =================================
> Leveldb configuration on the 3 instances:
>   <persistenceAdapter>
>     <replicatedLevelDB
>       directory="${activemq.data}/leveldb"
>       replicas="3"
>       bind="tcp://0.0.0.0:0"
>       zkAddress="zkserver:2181"
>       zkPath="/activemq/leveldb-stores"
>       />
>   </persistenceAdapter>
> =================================
> The error is:
> INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException
> java.io.IOException
>     at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
>     at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
>     at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
>     at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1270)
>     at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1194)
>     at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:708)
>     at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:741)
>     at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
>     at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:258)
>     at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:108)
>     at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
>     at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1875)
>     at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2086)
>     at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1581)
>     at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
>     at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1198)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1194)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1272)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:315)
>     at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:317)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:968)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:974)
>     at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:540)
>     ... 17 more

--
This message was sent by Atlassian JIRA (v6.1#6144)