Hi All,


I have been trying to set up NameNode HA (NN-HA) using the BKJournal plugin. I
have observed that checkpointing operations are being skipped on the Standby,
and it is not receiving the latest transactions. However, when the Active
fails, the Standby is able to switch to Active by reading the log from the
bookies.



I just want to improve the failover (switching) time. Have I missed anything?

Is there any configuration available on the BookKeeper journal side to make the
Standby a 'Hot Standby', rather than having it silently skip the log streams?



Logs of Standby NN:-

-------------------------

2012-04-05 20:32:29,365 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:start(119)) - Starting standby checkpoint thread...
Checkpointing active NN at 10.18.40.45:50070
Serving checkpoints at HOST-10-18-40-91/10.18.40.91:50070
2012-04-05 20:32:58,484 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.91:50010 storage DS-1584274703-10.18.40.91-50010-1333638178337
2012-04-05 20:32:58,487 INFO  net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.91:50010
2012-04-05 20:32:58,557 INFO  blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.91:50010 after becoming active. Its block contents are no longer considered stale.
2012-04-05 20:32:58,557 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 2 msecs
2012-04-05 20:33:05,077 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 10.18.40.45:50010 storage DS-1120258987-10.18.40.45-50010-1333638341930
2012-04-05 20:33:05,078 INFO  net.NetworkTopology (NetworkTopology.java:add(354)) - Adding a new node: /default-rack/10.18.40.45:50010
2012-04-05 20:33:05,185 INFO  blockmanagement.BlockManager (BlockManager.java:processReport(1439)) - BLOCK* processReport: Received first block report from 10.18.40.45:50010 after becoming active. Its block contents are no longer considered stale.
2012-04-05 20:33:05,185 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
2012-04-05 20:37:29,356 INFO  ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(263)) - Triggering log roll on remote NameNode /10.18.40.45:8020
2012-04-05 20:38:37,614 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.91:50010, blocks: 0, processing time: 0 msecs
2012-04-05 20:39:14,209 INFO  hdfs.StateChange (BlockManager.java:processReport(1453)) - BLOCK* processReport: from 10.18.40.45:50010, blocks: 0, processing time: 0 msecs
2012-04-05 20:42:29,368 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:doWork(270)) - Triggering checkpoint because it has been 600 seconds since the last checkpoint, which exceeds the configured interval 600
2012-04-05 20:42:29,368 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:doCheckpoint(151)) - A checkpoint was triggered but the Standby Node has not received any transactions since the last checkpoint at txid 0. Skipping...



Thanks & Regards,

Rakesh R
