HDFS currently guarantees the replication level. When a datanode is declared dead, HDFS automatically re-replicates all of the blocks that were stored on that datanode, as long as at least one other datanode still holds a replica. When the datanode later rejoins the cluster, HDFS removes the excess replicas.
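
To make that behaviour concrete, here is a minimal sketch of the kind of bookkeeping involved, assuming a simple map from block IDs to the datanodes holding a replica; the class and method names (ReplicationMonitor, handleDatanodeDeath, handleDatanodeRejoin) are hypothetical illustrations, not the actual HDFS namenode code.

```java
import java.util.*;

// Hypothetical illustration of namenode-style replication bookkeeping;
// not the real HDFS implementation, just the behaviour described above.
public class ReplicationMonitor {
    private final int targetReplication;
    // blockId -> datanodes currently holding a replica
    private final Map<String, Set<String>> blockReplicas = new HashMap<>();
    // blocks that have fallen below the target and need re-replication
    private final Queue<String> underReplicated = new ArrayDeque<>();

    public ReplicationMonitor(int targetReplication) {
        this.targetReplication = targetReplication;
    }

    public void addReplica(String blockId, String datanode) {
        blockReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    // Called when a datanode is declared dead: every block it held loses one
    // replica, and any block now below the target (but with at least one
    // surviving replica) is queued for copying.
    public void handleDatanodeDeath(String datanode) {
        for (Map.Entry<String, Set<String>> e : blockReplicas.entrySet()) {
            if (e.getValue().remove(datanode)
                    && !e.getValue().isEmpty()
                    && e.getValue().size() < targetReplication) {
                underReplicated.add(e.getKey());
            }
        }
    }

    // Called when a datanode rejoins and reports its blocks: replicas that
    // would push a block above the target are returned as excess (to delete).
    public List<String> handleDatanodeRejoin(String datanode, List<String> reportedBlocks) {
        List<String> excess = new ArrayList<>();
        for (String blockId : reportedBlocks) {
            Set<String> holders = blockReplicas.computeIfAbsent(blockId, k -> new HashSet<>());
            if (holders.size() >= targetReplication) {
                excess.add(blockId);      // already at target: this copy is surplus
            } else {
                holders.add(datanode);    // still useful: count it again
            }
        }
        return excess;
    }

    public Queue<String> pendingReplications() {
        return underReplicated;
    }
}
```

The point of the sketch is the two transitions described above: losing a node queues under-replicated blocks for copying, and a rejoining node only contributes replicas up to the target, with anything beyond that treated as excess.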
A datanode rejoin carries a risk if the datanode holds an obsolete version of a block, so it is recommended that an administrator remove the old data from the disk before bringing the node back into the cluster.

Hairong

On 4/2/08 7:01 AM, "Alfonso Olias Sanz" <[EMAIL PROTECTED]> wrote:
> Hi Hadoopers,
>
> I opened a discussion on the core-users list about the replication
> level. Whenever a datanode is dead, can all the blocks (files) contained
> in that node be considered lost?
>
> And if that node never comes back, or at least takes a long time
> until it is ready again, some files can have their replication level
> compromised.
>
> Shouldn't there be a daemon, or shouldn't it be part of the namenode
> server's responsibilities, to recover from that failure?
>
> My point is that whenever a datanode is gone, a replication process
> should be started in order to restore the replication level of all
> the files that have lost a replica.
>
> Then the replication level would be guaranteed.
>
> If the faulty node comes back during the recovery process, it should
> not be considered part of the datanode group until this process is
> over. Then the file system would add the datanode and free all of the
> blocks it contains. Alternatively, it could let the node join and
> delete only the files that were modified while the node was down, as
> well as the files that have already been re-replicated elsewhere.
> That would save time and bandwidth, but the process would be more
> complex.
>
> Cheers
> Alfonso
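
As a side note, anyone who wants to check whether a file's replication has been restored after a node failure can do so with the public Hadoop FileSystem API, along the following lines. This is only a sketch: the class name ReplicationCheck and the example path are placeholders, and the configuration is assumed to be picked up from the usual client config files.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print the target replication factor of a file and the datanodes
// that currently hold each of its blocks. The default path is a placeholder.
public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path(args.length > 0 ? args[0] : "/user/example/data.txt");
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Target replication: " + status.getReplication());

        // One BlockLocation per block, listing the hosts that store a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Block at offset " + block.getOffset()
                    + " has replicas on: " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```

Running something like this before and after a datanode is declared dead should show the replicas of its blocks reappearing on the surviving nodes, which is the automatic recovery described in the reply above.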
