[ https://issues.apache.org/jira/browse/AMQ-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347030#comment-14347030 ]
Artem Karpenko commented on AMQ-5618: ------------------------------------- For some reason these steps did not reproduce the problem on our working servers (I've setup this one separate environment). Additional step "helped": restart any slave between steps 2 and 3 (before restarting masters). Additionally, if you restart 2 slaves one at a time instead of one between steps 2 and 3 then it's enough to restart only one master to reproduce a problem (i.e. step 3 was not required). > Infinite loop in log replay with Replicated LevelDB > --------------------------------------------------- > > Key: AMQ-5618 > URL: https://issues.apache.org/jira/browse/AMQ-5618 > Project: ActiveMQ > Issue Type: Bug > Affects Versions: 5.11.0, 5.11.1 > Environment: Linux > Reporter: Artem Karpenko > Priority: Critical > > This is very similar to AMQ-5300 except that I use replicatedLevelDB > persistence adapter and in order to reproduce I don't have to delete any > index files. > Setup: 1 ZK instance, 3 AMQ nodes. > One of the AMQ configs: > {code} > <replicatedLevelDB directory="${activemq.data}/replicatedLevelDB" > replicas="3" > bind="tcp://0.0.0.0:61619" > zkAddress="instance-6:2181" > zkPath="/activemq/leveldb-stores" > hostname="instance-7" /> > {code} > Difference between nodes is only in hostname attribute. > The way to reproduce is almost the same as in AMQ-5300: > # Produce lots of messages to generate several log files in leveldb data > directory. > # Consume _some_ messages until you see "Deleting log" in activemq.log. > # Restart master. Wait for system to rebalance itself. Everything's fine at > this point. > # Restart the second master. > # Observe the massive (infinite?) logging on slave and relatively calm but > still possibly infinite logging on master. > This is what the first master logs after it's restarted: > {code} > 2015-02-25 21:37:08,338 | DEBUG | Download session connected... | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:08,582 | INFO | Slave skipping download of: > log/00000000190be289.log | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,099 | INFO | Slave skipping download of: > log/000000000642f848.log | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,411 | INFO | Slave skipping download of: > log/000000000c85f06d.log | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,838 | INFO | Slave skipping download of: > log/0000000012c8e921.log | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,842 | INFO | Slave requested: > 000000001c9373b4.index/CURRENT | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,846 | INFO | Slave requested: > 000000001c9373b4.index/MANIFEST-000002 | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,850 | INFO | Slave requested: > 000000001c9373b4.index/000003.log | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,857 | INFO | Attaching... Downloaded 0.02/95.65 kb and > 1/3 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,859 | INFO | Attaching... Downloaded 0.06/95.65 kb and > 2/3 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,861 | INFO | Attaching... Downloaded 95.65/95.65 kb and > 3/3 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,862 | INFO | Attached | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:37:09,878 | DEBUG | Taking a snapshot of the current index: > /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index > | org.apache.activemq.leveldb.LevelDBClient | Thread-2 > 2015-02-25 21:37:10,352 | DEBUG | Recovering from last index snapshot at: > /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index | > org.apache.activemq.leveldb.LevelDBClient | Thread-2 > {code} > Right after that everything seems fine. But as soon as I stop the new master, > the another new master (that would be the third one) logs > {code} > 2015-02-25 21:38:43,876 | INFO | Promoted to master | > org.apache.activemq.leveldb.replicated.MasterElector | main-EventThread > 2015-02-25 21:38:43,894 | INFO | Using the pure java LevelDB implementation. > | org.apache.activemq.leveldb.LevelDBClient | ActiveMQ > BrokerService[localhost] Task-5 > 2015-02-25 21:38:45,954 | WARN | Invalid log position: 44 | > org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] > Task-5 > 2015-02-25 21:38:45,955 | WARN | Invalid log position: 325280 | > org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] > Task-5 > ... > 2015-02-25 21:38:46,696 | WARN | Invalid log position: 104726993 | > org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] > Task-5 > 2015-02-25 21:38:46,902 | INFO | Master started: tcp://instance-8:61619 | > org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ > BrokerService[localhost] Task-6 > ... > 2015-02-25 21:38:52,127 | INFO | Initializing Spring FrameworkServlet > 'dispatcher' | /admin | main > 2015-02-25 21:38:53,181 | INFO | jolokia-agent: No access restrictor found > at classpath:/jolokia-access.xml, access to all MBeans is allowed | /api | > main > 2015-02-25 21:38:56,881 | WARN | Invalid log position: 0 | > org.apache.activemq.leveldb.LevelDBClient | Thread-4 > 2015-02-25 21:39:06,884 | WARN | Invalid log position: 0 | > org.apache.activemq.leveldb.LevelDBClient | Thread-4 > 2015-02-25 21:39:16,887 | WARN | Invalid log position: 0 | > org.apache.activemq.leveldb.LevelDBClient | Thread-4 > ... > {code} > the last message starts repeating every 10 seconds without hint of stopping. > At the same time the one left slave (that was the 1st master) logs > {code} > 2015-02-25 21:38:50,124 | INFO | Attached | > org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | > hawtdispatch-DEFAULT-1 > 2015-02-25 21:38:50,134 | DEBUG | Taking a snapshot of the current index: > /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index > | org.apache.activemq.leveldb.LevelDBClient | Thread-3 > 2015-02-25 21:38:50,149 | DEBUG | Recovering from last index snapshot at: > /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index | > org.apache.activemq.leveldb.LevelDBClient | Thread-3 > 2015-02-25 21:38:50,189 | DEBUG | Replay of journal from: 0 to 479425460. | > org.apache.activemq.leveldb.LevelDBClient | Thread-3 > 2015-02-25 21:38:50,328 | WARN | No reader available for position: 0, > log_infos: > {105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197), > > 210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340), > > 315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520), > > 420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)} > | org.apache.activemq.leveldb.RecordLog | Thread-3 > 2015-02-25 21:38:50,332 | WARN | No reader available for position: 0, > log_infos: > {105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197), > > 210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340), > > 315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520), > > 420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)} > | org.apache.activemq.leveldb.RecordLog | Thread-3 > ... > {code} > the last message repeats infinitely and every ~1ms, with activemq process > taking 100% CPU. > If I would now start the stopped node (that was the 2d master) it would log > exactly the same message with the same intensity. > It should also be noted that while I can observe correct diminished > (relatively to initial, due to some consumption) count of messages in a queue > via console after the first restart, the second restart changes that and > queue size becomes bigger (although I'm not sure if it is equal to the size > of queue before any consumption). -- This message was sent by Atlassian JIRA (v6.3.4#6332)