Hi,

(I should preface this by saying that bin/hadoop fsck reported a corrupt HDFS 
after I replaced one of the DNs with a new, empty DN.)

I've removed 1 old DN and added 1 new DN.  The cluster has 4 nodes total (all 
4 act as DNs) and a replication factor of 3.  I'm trying to re-balance the data 
by following http://wiki.apache.org/hadoop/FAQ#6:
- I stopped all daemons
- I removed the old DN and added the new DN to conf/slaves
- I started all daemons

The new DN shows up in the JT and NN GUIs, and bin/hadoop dfsadmin -report 
lists it.  At this point I expected the NN to figure out that it needs to 
re-balance under-replicated blocks and start pushing data to the new DN.  
However, no data got copied to the new DN.  I bumped the replication factor up 
to 6 and restarted all daemons, but still nothing.  I noticed the NN GUI says 
the NN is in safe mode, and it has been stuck there for 10+ minutes now, which 
seems too long.

I then tried running bin/hadoop balancer, but got this:

 
$ bin/hadoop balancer
Received an IO exception: org.apache.hadoop.dfs.SafeModeException: Cannot create file/system/balancer.id. Name node is in safe mode.
Safe mode will be turned off automatically.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:947)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:931)
...
...

So now I'm wondering what steps one needs to follow when replacing a DN.  Just 
pulling it out and listing a new one in conf/slaves seems to leave the NN stuck 
in permanent(?) safe mode.

I know I can run bin/hadoop dfsadmin -safemode leave ... but is that safe? ;)
If I do that, will I then be able to run bin/hadoop balancer and get some 
replicas of the old HDFS data on the newly added DN?
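
To be concrete, the sequence I have in mind is roughly the sketch below.  It is 
only what I'm considering, not something I've verified to be correct or safe.  
The run helper, the replace_dn_steps function, and the DRY_RUN and HADOOP 
variables are my own scaffolding (hypothetical names), and by default the script 
only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1), so nothing
# touches the cluster until DRY_RUN=0 is set explicitly.
HADOOP="${HADOOP:-bin/hadoop}"   # assumed path, relative to the Hadoop install dir

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The steps in question, in the order I would try them.
replace_dn_steps() {
    run "$HADOOP" dfsadmin -safemode get     # is the NN still in safe mode?
    run "$HADOOP" fsck /                     # how bad is the reported corruption?
    run "$HADOOP" dfsadmin -safemode leave   # force the NN out of safe mode (safe?)
    run "$HADOOP" balancer                   # then try to move blocks to the new DN
}

replace_dn_steps
```

Running it as-is just lists the four commands; DRY_RUN=0 would actually execute 
them, which is the part I'm unsure about.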

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
