According to the Hadoop tutorial on the Yahoo! Developer Network and the Hadoop
documentation at Apache, a simple way to back up the namenode and recover from a
single-point namenode failure is to list a second directory in dfs.name.dir, in
addition to the local one on the namenode: a folder that is mounted on the
namenode machine but actually lives on a different machine, so the DFS metadata
is written there as well. For example:

<property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name,/mnt/namenode-backup</value>
    <final>true</final>
</property>

where /mnt/namenode-backup is mounted on the namenode machine.

I followed this approach, but not on a fresh cluster: the cluster had been
running for a while and already had data in HDFS.

Either the method or my deployment failed, and the namenode simply would not
start. I did almost exactly what is described above, except that instead of
mounting the backup folder under /mnt, I mounted it under "/". The folder
"/namenode-backup" belongs to the account "hadoop", under which the cluster
runs, so there should be no access-restriction issue.

I got the following errors in the namenode log on the namenode machine:

/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = namenodedomainname/#.#.#.#
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2+228
STARTUP_MSG:   build =  -r cfc3233ece0769b11af9add328261295aaf4d1ad; compiled 
by 'root' on Mon Mar 22 03:11:39 EDT 2010
************************************************************/
2010-06-14 16:46:53,879 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=NameNode, port=50001
2010-06-14 16:46:53,886 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
Namenode up at: namenodedomainname/#.#.#.#:50001
2010-06-14 16:46:53,888 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-06-14 16:46:53,889 INFO 
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing 
NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-06-14 16:46:53,934 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2010-06-14 16:46:53,934 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-06-14 16:46:53,934 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2010-06-14 16:46:53,940 INFO 
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: 
Initializing FSNamesystemMetrics using context 
object:org.apache.hadoop.metrics.spi.NullContext
2010-06-14 16:46:53,942 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered 
FSNamesystemStatusMBean
2010-06-14 16:47:23,974 INFO org.apache.hadoop.hdfs.server.common.Storage: 
java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)


2010-06-14 16:47:23,976 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
initialization failed.
java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)
2010-06-14 16:47:23,976 INFO org.apache.hadoop.ipc.Server: Stopping server on 
50001
2010-06-14 16:47:23,977 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: 
java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:881)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:527)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:505)
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:285)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)

2010-06-14 16:47:23,978 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenodedomainname/#.#.#.#
************************************************************/
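My guess is that the failure is in FileChannel.tryLock, which (per the stack
trace) Hadoop's Storage$StorageDirectory calls to lock each dfs.name.dir entry;
"No locks available" on an NFS mount usually means the NFS lock daemon
(nfslock/statd) is not running on the client or server, or the share is mounted
with the "nolock" option. To narrow it down, I wrote a small standalone check
that imitates that lock attempt (the class name NfsLockCheck and the lock-file
name are mine, not Hadoop's, though the NameNode does use a file called
in_use.lock):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Rough imitation of the lock the NameNode takes on each dfs.name.dir
// entry. On an NFS mount without lock support, tryLock() throws
// java.io.IOException: No locks available, just like in the log above.
public class NfsLockCheck {

    // Returns true if an exclusive lock can be taken on <dir>/in_use.lock.
    public static boolean canLock(File dir) {
        File lockFile = new File(dir, "in_use.lock");
        try {
            RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
            FileChannel channel = raf.getChannel();
            try {
                FileLock lock = channel.tryLock();
                if (lock == null) {
                    return false; // lock held by another process
                }
                lock.release();
                return true;
            } finally {
                channel.close();
                raf.close();
            }
        } catch (IOException e) {
            // "No locks available" lands here on a lock-less NFS mount
            return false;
        } finally {
            lockFile.delete();
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir + " lockable: " + canLock(dir));
    }
}
```

Running this against a local directory prints "lockable: true" for me; I would
expect it to print "lockable: false" against the NFS-backed backup folder if
locking is the problem.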

Thanks for your help!

-Michael


