On 4/12/2011 10:46 AM, felix gao wrote:

What reason/condition would cause a datanode's blocks to be removed? Our cluster had one of its datanodes crash because of bad RAM. After the system was upgraded and the datanode/tasktracker brought back online the next day, we noticed the amount of space utilized was minimal and the cluster was rebalancing blocks to the datanode. It would seem the prior blocks were removed. Was this because the datanode was declared dead? What are the criteria for the namenode to decide (assuming it's the namenode) when a datanode should remove its prior blocks?

1- Did you check the DataNode's logs?
2- Did you protect the NameNode's dfs.name.dir and dfs.edits.dir directories? The NameNode stores the file system image in the first, and the edit log (journal) is written to the second. A good practice is to keep these directories on RAID 1 or RAID 10 to guarantee the consistency of your cluster.
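Beyond RAID, the NameNode can also mirror its own metadata: dfs.name.dir accepts a comma-separated list of directories, and the NameNode writes the fsimage to every one of them, so a common setup combines local disks with an NFS mount. A minimal hdfs-site.xml sketch (the paths below are hypothetical examples):

```xml
<!-- hdfs-site.xml: redundant NameNode metadata directories (example paths) -->
<property>
  <name>dfs.name.dir</name>
  <!-- The NameNode writes the file system image to each listed directory -->
  <value>/data/1/dfs/name,/data/2/dfs/name,/mnt/nfs/dfs/name</value>
</property>
```

If any one directory is lost, the metadata can still be recovered from the surviving copies.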

Any data loss in these directories (dfs.name.dir and dfs.edits.dir) will result in a loss of data in your HDFS. So, the second good practice is to set up a Secondary NameNode in case the primary NameNode fails.
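Note that in the 0.20/1.x line the Secondary NameNode is a checkpointing daemon rather than a hot standby: it periodically merges the edit log into a fresh fsimage, which keeps the edit log from growing without bound and leaves you a recent copy of the metadata to recover from. A sketch of the relevant configuration (the path is a hypothetical example; the period shown is the default):

```xml
<!-- core-site.xml on the Secondary NameNode host (example values) -->
<property>
  <name>fs.checkpoint.dir</name>
  <!-- Where the merged checkpoint image is stored -->
  <value>/data/1/dfs/namesecondary</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <!-- Seconds between checkpoints; 3600 is the default -->
  <value>3600</value>
</property>
```

The host that runs the daemon is the one listed in conf/masters, and start-dfs.sh brings it up along with the NameNode.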

Another thing to keep in mind: when the NameNode fails, you have to restart the JobTracker and the TaskTrackers after the NameNode has been restarted.

Regards

--
Marcos Luís Ortíz Valmaseda
 Software Engineer (Large-Scaled Distributed Systems)
 University of Information Sciences,
 La Habana, Cuba
 Linux User # 418229
