On 4/12/2011 10:46 AM, felix gao wrote:
What reason/condition would cause a datanode’s blocks to be removed?
Our cluster had one of its datanodes crash because of bad RAM.
After the system was upgraded and the datanode/tasktracker brought
online the next day we noticed the amount of space utilized was
minimal and the cluster was rebalancing blocks to the datanode. It
would seem the prior blocks were removed. Was this because the
datanode was declared dead? What are the criteria a NameNode uses
(assuming it is the NameNode that decides) to determine when a datanode
should remove its prior blocks?
1- Did you check the DataNode's logs?
2- Did you protect the NameNode's dfs.name.dir and dfs.name.edits.dir
directories?
In the first of these directories, the NameNode stores the file system
image, and in the second it writes the edit log (the journal). A good
practice is to keep both directories on RAID 1 or RAID 10 to guarantee
the consistency of your cluster.
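As a minimal hdfs-site.xml sketch of that practice (the paths below are hypothetical examples, not defaults): dfs.name.dir accepts a comma-separated list, and the NameNode writes a full copy of the image to every directory listed, so one copy can sit on a RAID volume and another on a separate mount:

```xml
<!-- hdfs-site.xml: example paths, adjust to your own mounts -->
<property>
  <name>dfs.name.dir</name>
  <!-- the NameNode mirrors the fsimage into each listed directory -->
  <value>/raid1/hadoop/name,/mnt/nfs/hadoop/name</value>
</property>
```

Losing one directory then still leaves an intact copy of the metadata in the other.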
Any data loss in these directories (dfs.name.dir and dfs.name.edits.dir)
will result in data loss in your HDFS. So the second good practice
is to have a Secondary NameNode set up, in case the primary
NameNode fails.
Another thing to keep in mind: when the NameNode fails, you have
to restart the JobTracker and the TaskTrackers after the NameNode
itself has been restarted.
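As a sketch of that restart order, assuming a 0.20-era deployment with the stock control scripts from $HADOOP_HOME/bin on the PATH (this is cluster-specific and not something to run blindly):

```
stop-mapred.sh    # stop the JobTracker and TaskTrackers
start-dfs.sh      # bring the NameNode (and DataNodes) back first
start-mapred.sh   # then restart the MapReduce daemons
```

The point is simply that the MapReduce daemons should come up only after HDFS is available again.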
Regards
--
Marcos Luís Ortíz Valmaseda
Software Engineer (Large-Scaled Distributed Systems)
University of Information Sciences,
La Habana, Cuba
Linux User # 418229