Thanks Ivan. I am using version 0.24. I have configured <dfs.namenode.edit.dirs> with the bookie details on both NNs and observed that no 'journalSet' is created on the BNN side. I will probably try either merging HDFS-3058 or using the GitHub repo.
Thanks,
Rakesh R
________________________________________
From: Ivan Kelly [[email protected]]
Sent: Tuesday, April 10, 2012 7:48 PM
To: [email protected]
Subject: Re: Standby NN is skipping the Checkpointing ?

What is the state of the EditLogTailer thread in this case. 20:37:29,356 indicates that the log has rolled, so it should be possible to read events from it, even if the events are just start and end segment. However, doTailEdits() doesn't seem to be called.

Which version of the code are you using? The code to get BK working with HA isn't in hadoop-common trunk yet. In particular, HDFS-3058 needs to be applied for it to work. All changes already exist in https://github.com/ivankelly/hadoop-common/tree/BKJM-benching

-Ivan

On Mon, Apr 09, 2012 at 05:22:18AM +0000, Rakesh R wrote:
> Hi All,
>
> I have been trying to setup NN-HA using BKJournal plugins. Here I have observed the Checkpointing operations are getting skipped and not receiving the latest transactions. However on Active failure, the Standby is able to switch to Active by reading the log from the bookies.
>
> I just wanted to improve the switching time. Anything I have missed?
>
> Is there any configurations available in the Bookie Journal side, to make 'Hot Standby' rather than silently skipping the log streams ?
>
> Logs of Standby NN:-
> -------------------------
> 2012-04-05 20:32:29,365 INFO ha.StandbyCheckpointer (StandbyCheckpointer.java:start(119)) - Starting standby checkpoint thread...
> Checkpointing active NN at 10.18.40.45:50070
> Serving checkpoints at HOST-10-18-40-91/10.18.40.91:50070
> 2012-04-05 20:32:58,484 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.91:50010 storage DS-1584274703-10.18.40.91-50010-1333638178337
> 2012-04-05 20:32:58,487 INFO net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.91:50010
> 2012-04-05 20:32:58,557 INFO blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.91:50010 after becoming active. Its block contents are no longer considered stale.
> 2012-04-05 20:32:58,557 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 2 msecs
> 2012-04-05 20:33:05,077 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.45:50010 storage DS-1120258987-10.18.40.45-50010-1333638341930
> 2012-04-05 20:33:05,078 INFO net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.45:50010
> 2012-04-05 20:33:05,185 INFO blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.45:50010 after becoming active. Its block contents are no longer considered stale.
> 2012-04-05 20:33:05,185 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
> 2012-04-05 20:37:29,356 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(263)) - Triggering log roll on remote NameNode /10.18.40.45:8020
> 2012-04-05 20:38:37,614 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 0 msecs
> 2012-04-05 20:39:14,209 INFO hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
> 2012-04-05 20:42:29,368 INFO ha.StandbyCheckpointer (StandbyCheckpointer.java:doWork(270)) - Triggering checkpoint because it has been 600 seconds since the last checkpoint, which exceeds the configured interval 600
> 2012-04-05 20:42:29,368 INFO ha.StandbyCheckpointer (StandbyCheckpointer.java:doCheckpoint(151)) - A checkpoint was triggered but the Standby Node has not received any transactions since the last checkpoint at txid 0. Skipping...
>
> Thanks & Regards,
>
> Rakesh R
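
For anyone reading this thread later: the two StandbyCheckpointer lines at 20:42:29,368 can be read as a two-step decision. The sketch below is only my reading of those log messages, not the actual Hadoop source. The class, enum, and method names are hypothetical, and the txid 42 in the second call is made up for contrast. The values 600, 600, and txid 0 come from the log.

```java
// Illustrative sketch only: NOT the actual Hadoop StandbyCheckpointer code.
// CheckpointDecision, Action, and decide() are hypothetical names.
public class CheckpointDecision {

    enum Action { WAIT, SKIP, CHECKPOINT }

    // Decide what the standby does when its checkpoint timer fires.
    static Action decide(long secsSinceLastCheckpoint, long periodSecs,
                         long lastAppliedTxId, long lastCheckpointTxId) {
        // Step 1: nothing to do until the configured interval has elapsed.
        if (secsSinceLastCheckpoint < periodSecs) {
            return Action.WAIT;
        }
        // Step 2: the interval elapsed, but no edits were applied since the
        // last checkpoint, so a new image would be identical. Skip it.
        if (lastAppliedTxId == lastCheckpointTxId) {
            return Action.SKIP;
        }
        // New transactions were tailed from the shared journal.
        return Action.CHECKPOINT;
    }

    public static void main(String[] args) {
        // Values from the log: 600 s elapsed, interval 600, still at txid 0.
        System.out.println(decide(600, 600, 0, 0));   // prints SKIP
        // Hypothetical case where the tailer had applied edits up to txid 42.
        System.out.println(decide(600, 600, 42, 0));  // prints CHECKPOINT
    }
}
```

Under this reading, the "Skipping..." message is a symptom rather than the cause: the standby stays at txid 0 because no edits are being tailed from the bookies, which is consistent with doTailEdits() not being called.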
