Hi All,
I have been trying to set up NN-HA using the BKJournal plugin. I have observed that checkpointing operations on the Standby are being skipped, and the Standby is not receiving the latest transactions. However, when the Active fails, the Standby is able to switch to Active by reading the log from the bookies. I would just like to improve the switching time.

Have I missed anything? Is there any configuration available on the BKJournal side to make this a 'Hot Standby', rather than silently skipping the log streams?

Logs of Standby NN:
-------------------------
2012-04-05 20:32:29,365 INFO ha.StandbyCheckpointer (StandbyCheckpointer.java:start(119)) - Starting standby checkpoint thread...
Checkpointing active NN at 10.18.40.45:50070
Serving checkpoints at HOST-10-18-40-91/10.18.40.91:50070
2012-04-05 20:32:58,484 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.91:50010 storage DS-1584274703-10.18.40.91-50010-1333638178337
2012-04-05 20:32:58,487 INFO net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.91:50010
2012-04-05 20:32:58,557 INFO blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.91:50010 after becoming active. Its block contents are no longer considered stale.
2012-04-05 20:32:58,557 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 2 msecs
2012-04-05 20:33:05,077 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.45:50010 storage DS-1120258987-10.18.40.45-50010-1333638341930
2012-04-05 20:33:05,078 INFO net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.45:50010
2012-04-05 20:33:05,185 INFO blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.45:50010 after becoming active. Its block contents are no longer considered stale.
2012-04-05 20:33:05,185 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
2012-04-05 20:37:29,356 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(263)) - Triggering log roll on remote NameNode /10.18.40.45:8020
2012-04-05 20:38:37,614 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 0 msecs
2012-04-05 20:39:14,209 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
2012-04-05 20:42:29,368 INFO ha.StandbyCheckpointer (StandbyCheckpointer.java:doWork(270)) - Triggering checkpoint because it has been 600 seconds since the last checkpoint, which exceeds the configured interval 600
2012-04-05 20:42:29,368 INFO ha.StandbyCheckpointer (StandbyCheckpointer.java:doCheckpoint(151)) - A checkpoint was triggered but the Standby Node has not received any transactions since the last checkpoint at txid 0. Skipping...

Thanks & Regards,
Rakesh R
