Hi, I have HDFS cluster with 4 nodes: 2 master nodes (name0, name1) and 2 slave nodes (data0, data1). Hadoop 0.23 is used. Cluster OS is Ubuntu 10.04 Lucid. On name0 I bootstrap NameNode, on data0 and data1 - DataNode. On name1 I want to bootstrap BackupNode, but I got an error:
2011-12-28 15:49:04,543 INFO namenode.FSEditLog (FSEditLog.java: registerBackupNode(976)) - Registering new backup node: NamenodeRegistration(name1:50100, role=Backup Node) 2011-12-28 15:49:04,568 INFO namenode.FSNamesystem (FSNamesystem.java:startCheckpoint(3435)) - Start checkpoint for name1:50100 2011-12-28 15:49:04,568 INFO namenode.FSEditLog (FSEditLog.java:rollEditLog(798)) - Rolling edit logs. 2011-12-28 15:49:04,568 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(859)) - Ending log segment 94 2011-12-28 15:49:04,569 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(492)) - Number of transactions: 2 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 16 2011-12-28 15:49:04,570 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(814)) - Starting log segment at 96 2011-12-28 15:49:04,580 INFO common.Storage (EditLogBackupOutputStream.java:<init>(57)) - EditLogBackupOutputStream connects to: name1/10.4.113.70:50100 2011-12-28 15:49:04,878 ERROR namenode.FSEditLog (FSEditLog.java:logSync(425)) - Unable to sync edit log. java.io.IOException: Journal request from unexpected name-node: name0:9000 expecting name1/10.4.113.70:50100 at org.apache.hadoop.hdfs.server.namenode.BackupNode$BackupNodeRpcServer.journal(BackupNode.java:246) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:365) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1490) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1486) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484) at org.apache.hadoop.ipc.Client.call(Client.java:1085) ... 2011-12-28 15:49:04,879 ERROR namenode.FSEditLog (FSEditLog.java:disableAndReportErrorOnJournals(1066)) - Disabling journal JournalAndStream(mgr=org.apache.hadoop.hdfs.server.namenode.BackupJournalManager@3ed19cdf, stream=name1:50100) 2011-12-28 15:54:05,052 INFO namenode.FSNamesystem (FSNamesystem.java:startCheckpoint(3435)) - Start checkpoint for name1:50100 Also I try to bootstrap BackupNode on the NameNode host (name0). Looks better, but I still got the same error: 2011-12-29 10:06:06,940 INFO namenode.FSEditLog (FSEditLog.java:registerBackupNode(976)) - Registering new backup node: NamenodeRegistration(0.0.0.0:50100, role=Backup Node) 2011-12-29 10:06:06,971 INFO namenode.FSNamesystem (FSNamesystem.java:startCheckpoint(3435)) - Start checkpoint for 0.0.0.0:50100 2011-12-29 10:06:06,971 INFO namenode.FSEditLog (FSEditLog.java:rollEditLog(798)) - Rolling edit logs. 2011-12-29 10:06:06,972 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(859)) - Ending log segment 236 2011-12-29 10:06:06,972 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(492)) - Number of transactions: 2 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs : 2 SyncTimes(ms): 29 2011-12-29 10:06:06,973 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(492)) - Number of transactions: 2 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 29 2011-12-29 10:06:06,975 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(814)) - Starting log segment at 238 2011-12-29 10:06:06,990 INFO common.Storage (EditLogBackupOutputStream.java:<init>(57)) - EditLogBackupOutputStream connects to: /0.0.0.0:50100 2011-12-29 10:06:07,249 ERROR namenode.FSEditLog (FSEditLog.java:logSync(425)) - Unable to sync edit log. java.io.IOException: Journal request from unexpected name-node: name0:9000 expecting 0.0.0.0/0.0.0.0:50100 at org.apache.hadoop.hdfs.server.namenode.BackupNode$BackupNodeRpcServer.journal(BackupNode.java:246) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:365) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1490) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1486) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484) at org.apache.hadoop.ipc.Client.call(Client.java:1085) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:193) ... BackupNode logs: 011-12-29 10:06:06,935 INFO namenode.NameNode (NameNode.java:initialize(344)) - Backup Node up at: 0.0.0.0/0.0.0.0:50100 2011-12-29 10:06:06,936 INFO ipc.Server (Server.java:run(1460)) - IPC Server handler 2 on 50100: starting 2011-12-29 10:06:06,968 INFO namenode.Checkpointer (Checkpointer.java:initialize(110)) - Checkpoint Period : 3600 secs (60 min) 2011-12-29 10:06:06,968 INFO namenode.Checkpointer (Checkpointer.java:initialize(112)) - Transactions count is : 40000, to trigger checkpoint 2011-12-29 10:06:07,115 INFO namenode.FSImage (BackupImage.java:namenodeStartedLogSegment(320)) - NameNode started a new log segment at txid 238 2011-12-29 10:06:07,116 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(814)) - Starting log segment at 238 2011-12-29 10:06:07,236 INFO ipc.Server (Server.java:run(1498)) - IPC Server handler 1 on 50100, call: journal(NamenodeRegistration(name0:9000, role=NameNode), 238, 1, [B@4412dd76), rpc version=1, client version=1, methodsFingerPrint=-852377201 from 10.4.113.72:54533, error: java.io.IOException: Journal request from unexpected name-node: name0:9000 expecting 0.0.0.0/0.0.0.0:50100 at org.apache.hadoop.hdfs.server.namenode.BackupNode$BackupNodeRpcServer.journal(BackupNode.java:246) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:365) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1490) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1486) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484) 2011-12-29 10:06:07,303 INFO namenode.Checkpointer (Checkpointer.java:doCheckpoint(218)) - Unable to roll forward using only logs. Downloading image with txid 235 2011-12-29 10:06:07,477 INFO namenode.TransferFsImage (TransferFsImage.java:downloadImageToStorage(65)) - Downloaded file fsimage.ckpt_0000000000000000235 size 3427 bytes. 2011-12-29 10:06:07,484 INFO namenode.Checkpointer (Checkpointer.java:doCheckpoint(226)) - Loading image with txid 235 2011-12-29 10:06:07,490 INFO namenode.FSImage (FSImageFormat.java:load(181)) - Loading image file /tmp/hadoop-hduser/dfs/backup/current/fsimage_0000000000000000235 using no compression 2011-12-29 10:06:07,491 INFO namenode.FSImage (FSImageFormat.java:load(184)) - Number of files = 33 2011-12-29 10:06:07,502 INFO namenode.FSImage (FSImageFormat.java:loadFilesUnderConstruction(423)) - Number of files under construction = 0 2011-12-29 10:06:07,502 INFO namenode.FSImage (FSImageFormat.java:load(210)) - Image file of size 3427 loaded in 0 seconds. 2011-12-29 10:06:07,503 INFO namenode.FSImage (FSImage.java:loadFSImage(725)) - Loaded image for txid 235 from /tmp/hadoop-hduser/dfs/backup/current/fsimage_0000000000000000235 2011-12-29 10:06:07,508 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000094-0000000000000000095 size 30 bytes. 2011-12-29 10:06:07,511 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000096-0000000000000000097 size 30 bytes. 2011-12-29 10:06:07,515 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000098-0000000000000000099 size 30 bytes. 2011-12-29 10:06:07,518 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000100-0000000000000000101 size 30 bytes. 2011-12-29 10:06:07,522 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000102-0000000000000000103 size 30 bytes. 2011-12-29 10:06:07,526 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000104-0000000000000000105 size 30 bytes. 2011-12-29 10:06:07,530 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000106-0000000000000000107 size 30 bytes. 2011-12-29 10:06:07,534 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000108-0000000000000000109 size 30 bytes. 2011-12-29 10:06:07,538 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000110-0000000000000000111 size 30 bytes. 2011-12-29 10:06:07,541 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000112-0000000000000000113 size 30 bytes. 2011-12-29 10:06:07,569 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000114-0000000000000000235 size 1048576 bytes. 2011-12-29 10:06:07,573 INFO namenode.TransferFsImage (TransferFsImage.java:downloadEditsToStorage(93)) - Downloaded file edits_0000000000000000236-0000000000000000237 size 30 bytes. 2011-12-29 10:06:07,576 INFO namenode.Checkpointer (Checkpointer.java:rollForwardByApplyingLogs(287)) - Checkpointer about to load edits from 1 file(s). 2011-12-29 10:06:07,592 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(80)) - Edits file /tmp/hadoop-hduser/dfs/backup/current/edits_0000000000000000236-0000000000000000237 of size 30 edits # 2 loaded in 0 seconds. 2011-12-29 10:06:07,597 INFO namenode.FSImage (FSImageFormat.java:save(567)) - Saving image file /tmp/hadoop-hduser/dfs/backup/current/fsimage.ckpt_0000000000000000237 using no compression 2011-12-29 10:06:07,637 INFO namenode.FSImage (FSImageFormat.java:save(592)) - Image file of size 3427 saved in 0 seconds. 2011-12-29 10:06:07,643 INFO namenode.FSImageTransactionalStorageInspector (FSImageTransactionalStorageInspector.java:inspectDirectory(69)) - No version file in /tmp/hadoop-hduser/dfs/backup 2011-12-29 10:06:12,834 INFO namenode.TransferFsImage (TransferFsImage.java:uploadImageFromStorage(115)) - Uploaded image with txid 237 to namenode at 0.0.0.0:50070 2011-12-29 10:06:12,837 INFO namenode.FSImage (BackupImage.java:tryConvergeJournalSpool(282)) - Going to finish converging with remaining 0 txns from in-progress stream /tmp/hadoop-hduser/dfs/backup/current/edits_inprogress_0000000000000000238 2011-12-29 10:06:12,837 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(80)) - Edits file /tmp/hadoop-hduser/dfs/backup/current/edits_inprogress_0000000000000000238 of size 1048576 edits # 0 loaded in 0 seconds. 2011-12-29 10:06:12,838 INFO namenode.FSImage (BackupImage.java:tryConvergeJournalSpool(295)) - Successfully synced BackupNode with NameNode at txnid 237 2011-12-29 10:06:12,838 INFO namenode.Checkpointer (Checkpointer.java:doCheckpoint(263)) - Checkpoint completed in 5 seconds. New Image Size: 3427 Also I tried BackupNode using Hadoop 0.21 single-node installation - works fine. Is it a bug of 0.23 version or do I miss something? Can someone share an example of BackupNode configuration or give some useful link? -- Andrey