Datanode 'alive' but with its disk failed, Namenode thinks it's alive
---------------------------------------------------------------------
                 Key: HDFS-1234
                 URL: https://issues.apache.org/jira/browse/HDFS-1234
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.20.1
            Reporter: Thanh Do

- Summary: Datanode 'alive' but with its disk failed; Namenode still thinks it's alive

- Setup:
  + Replication = 1
  + # available datanodes = 2
  + # disks / datanode = 1
  + # failures = 1
  + Failure type = bad disk
  + When/where failure happens = first phase of the pipeline

- Details: In this experiment we have two datanodes, each with one disk. If one datanode's disk fails (while the node itself stays alive), the datanode does not keep track of the failure. From the namenode's perspective, that datanode is still alive, so the namenode hands the same datanode back to the client. The client retries three times, asking the namenode for a new set of datanodes, but always receives the same one, and every attempt to write there raises an exception.
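The retry loop described above can be sketched as a toy simulation. This is an illustrative model only, under the assumption that target selection considers heartbeat liveness but not disk health; the class and method names (ToyNamenode, ToyDatanode, chooseTarget) are hypothetical and are not real HDFS APIs:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Toy model of the reported failure mode: a datanode with a dead disk still
// heartbeats, so the namenode keeps handing it out, and every write fails.
public class RetrySameDatanodeDemo {

    static class ToyDatanode {
        final String id;
        final boolean diskFailed;
        ToyDatanode(String id, boolean diskFailed) {
            this.id = id;
            this.diskFailed = diskFailed;
        }
        void write(byte[] data) throws IOException {
            // The node is "alive" but its only disk is bad, so writes fail.
            if (diskFailed) throw new IOException("bad disk on " + id);
        }
    }

    static class ToyNamenode {
        final List<ToyDatanode> liveNodes;
        ToyNamenode(List<ToyDatanode> liveNodes) {
            this.liveNodes = liveNodes;
        }
        // Liveness is judged only by heartbeats, not disk health, so the
        // same (bad) node is chosen every time -- the bug in this report.
        ToyDatanode chooseTarget() {
            return liveNodes.get(0);
        }
    }

    // Ask the namenode for a target and try to write, up to maxRetries
    // times; returns how many attempts failed.
    public static int attemptWrite(ToyNamenode nn, int maxRetries) {
        int failures = 0;
        for (int i = 0; i < maxRetries; i++) {
            ToyDatanode dn = nn.chooseTarget();
            try {
                dn.write(new byte[]{1});
                return failures; // success
            } catch (IOException e) {
                failures++; // same bad node again next iteration
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        ToyDatanode bad = new ToyDatanode("dn1", true);   // disk failed
        ToyDatanode good = new ToyDatanode("dn2", false); // never chosen
        ToyNamenode nn = new ToyNamenode(Arrays.asList(bad, good));
        System.out.println("failed attempts = " + attemptWrite(nn, 3));
    }
}
```

All three retries go to the same bad datanode even though a healthy one exists, matching the behavior observed in the experiment.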