HDFS currently guarantees the replication level. When a datanode is declared dead, HDFS automatically re-replicates all of the blocks that were stored on that datanode, as long as at least one other datanode still holds a replica. When the datanode later rejoins the cluster, HDFS removes the excess replicas.
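
To make that behaviour concrete, here is a minimal sketch of the kind of bookkeeping involved, assuming a simple map from block IDs to the datanodes holding a replica; the class and method names (ReplicationMonitor, handleDatanodeDeath, handleDatanodeRejoin) are hypothetical illustrations, not the actual HDFS namenode code.

```java
import java.util.*;

// Hypothetical illustration of namenode-style replication bookkeeping;
// not the real HDFS implementation, just the behaviour described above.
public class ReplicationMonitor {
    private final int targetReplication;
    // blockId -> datanodes currently holding a replica
    private final Map<String, Set<String>> blockReplicas = new HashMap<>();
    // blocks that have fallen below the target and need re-replication
    private final Queue<String> underReplicated = new ArrayDeque<>();

    public ReplicationMonitor(int targetReplication) {
        this.targetReplication = targetReplication;
    }

    public void addReplica(String blockId, String datanode) {
        blockReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    // Called when a datanode is declared dead: every block it held loses one
    // replica, and any block now below the target (but with at least one
    // surviving replica) is queued for copying.
    public void handleDatanodeDeath(String datanode) {
        for (Map.Entry<String, Set<String>> e : blockReplicas.entrySet()) {
            if (e.getValue().remove(datanode)
                    && !e.getValue().isEmpty()
                    && e.getValue().size() < targetReplication) {
                underReplicated.add(e.getKey());
            }
        }
    }

    // Called when a datanode rejoins and reports its blocks: replicas that
    // would push a block above the target are returned as excess (to delete).
    public List<String> handleDatanodeRejoin(String datanode, List<String> reportedBlocks) {
        List<String> excess = new ArrayList<>();
        for (String blockId : reportedBlocks) {
            Set<String> holders = blockReplicas.computeIfAbsent(blockId, k -> new HashSet<>());
            if (holders.size() >= targetReplication) {
                excess.add(blockId);      // already at target: this copy is surplus
            } else {
                holders.add(datanode);    // still useful: count it again
            }
        }
        return excess;
    }

    public Queue<String> pendingReplications() {
        return underReplicated;
    }
}
```

The point of the sketch is the two transitions described above: losing a node queues under-replicated blocks for copying, and a rejoining node only contributes replicas up to the target, with anything beyond that treated as excess.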
A datanode rejoin carries a risk if the datanode holds an obsolete version of a block, so it is recommended that an administrator remove the old data from the disk before bringing the node back into the cluster.

Hairong

On 4/2/08 7:01 AM, "Alfonso Olias Sanz" <[EMAIL PROTECTED]> wrote:
> Hi Hadoopers,
>
> I opened a discussion on the core-users list about the replication
> level. Whenever a datanode is dead, can all the blocks (files) contained
> in that node be considered lost?
>
> And if that node never comes back, or at least takes a long time
> until it is ready again, some files can have their replication level
> compromised.
>
> Shouldn't there be a daemon, or shouldn't it be part of the namenode
> server's responsibilities, to recover from that failure?
>
> My point is that whenever a datanode is gone, a replication process
> should be started in order to restore the replication level of all
> the files that have lost a replica.
>
> Then the replication level would be guaranteed.
>
> If the faulty node comes back during the recovery process, it should
> not be considered part of the datanode group until this process is
> over. Then the file system would add the datanode and free all of the
> blocks it contains. Alternatively, it could let the node join and
> delete only the files that were modified while the node was down, as
> well as the files that have already been re-replicated elsewhere.
> That would save time and bandwidth, but the process would be more
> complex.
>
> Cheers
> Alfonso
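
As a side note, anyone who wants to check whether a file's replication has been restored after a node failure can do so with the public Hadoop FileSystem API, along the following lines. This is only a sketch: the class name ReplicationCheck and the example path are placeholders, and the configuration is assumed to be picked up from the usual client config files.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print the target replication factor of a file and the datanodes
// that currently hold each of its blocks. The default path is a placeholder.
public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path(args.length > 0 ? args[0] : "/user/example/data.txt");
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Target replication: " + status.getReplication());

        // One BlockLocation per block, listing the hosts that store a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Block at offset " + block.getOffset()
                    + " has replicas on: " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```

Running something like this before and after a datanode is declared dead should show the replicas of its blocks reappearing on the surviving nodes, which is the automatic recovery described in the reply above.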
