Hello Hadoop Users list:

                We are running Hadoop version 0.18.2. My team lead has asked me 
to investigate the answer to a particular question regarding Hadoop's handling 
of offline DataNodes - specifically, we would like to know how long a node can 
be offline before it is totally rebuilt when it is re-added to the cluster.
                From what I've been able to determine from the documentation, it 
appears that once a DataNode goes offline, the NameNode simply begins scheduling 
replication of that node's blocks on the remaining cluster members. If the 
offline node comes back online and reports all its blocks as uncorrupted, the 
NameNode just cleans up the "extra" blocks.
                In other words, there is no explicit threshold on the length 
of the outage - the outcome simply depends on how much re-replication has taken 
place while the node was offline.

                Anyone care to shed some light on this?

                Thanks!
Regards,
                Joseph Hammerman
