Hello Hadoop Users list:

We are running Hadoop version 0.18.2. My team lead has asked me to investigate a question about how Hadoop handles offline DataNodes. Specifically, we would like to know how long a node can be offline before it is totally rebuilt when it is re-added to the cluster.

From what I have been able to determine from the documentation, it appears that the NameNode will simply begin scheduling block replication on the remaining cluster members. If the offline node comes back online and reports all its blocks as uncorrupted, the NameNode just cleans up the "extra" blocks. In other words, there is no explicit handling based on the length of the outage - the behavior of the cluster depends on the outage duration only implicitly, through how much re-replication took place while the node was down.

Anyone care to shed some light on this?

Thanks!

Regards,
Joseph Hammerman