[ https://issues.apache.org/jira/browse/HDFS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved HDFS-1234.
-------------------------------
Resolution: Duplicate
Resolved by HDFS-630
> Datanode 'alive' but with its disk failed, Namenode thinks it's alive
> ---------------------------------------------------------------------
>
> Key: HDFS-1234
> URL: https://issues.apache.org/jira/browse/HDFS-1234
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.1
> Reporter: Thanh Do
>
> - Summary: Datanode 'alive' but with its disk failed; Namenode still thinks
> it's alive
>
> - Setups:
> + Replication = 1
> + # available datanodes = 2
> + # disks / datanode = 1
> + # failures = 1
> + Failure type = bad disk
> + When/where failure happens = first phase of the pipeline
>
> - Details:
> In this experiment we have two datanodes, each with one disk.
> If one datanode's disk fails but the node itself stays alive, the datanode
> does not keep track of the failure. From the namenode's perspective, that
> datanode is still alive, so the namenode keeps handing the same datanode
> back to the client. The client retries 3 times, asking the namenode for a
> new set of datanodes each time, and always gets the same datanode.
> Every time the client tries to write to it, it gets an exception.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do ([email protected]) and
> Haryadi Gunawi ([email protected])
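The retry loop described in the Details section can be illustrated with a small toy simulation. This is a sketch under stated assumptions, not HDFS code: the names `choose_target`, `write_block`, and `write_with_retries` are hypothetical, and the namenode's placement policy is modeled as "first live node not excluded by the client", chosen only to reproduce the report's observation that the same datanode comes back every time. The optional client-side exclusion list is likewise an assumed mitigation, shown for contrast.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical cluster state matching the report's setup: two datanodes,
# both heartbeating ("alive"), but dn1's only disk has failed.
LIVE_DATANODES = ["dn1", "dn2"]
FAILED_DISKS = {"dn1"}
MAX_RETRIES = 3  # the report observes the client retrying 3 times


class DiskFailedError(IOError):
    """Raised by the toy datanode when its disk is bad."""


def choose_target(live: Iterable[str], excluded: Set[str]) -> Optional[str]:
    """Toy namenode policy (an assumption, not HDFS's placement code):
    return the first live datanode the client has not excluded."""
    for node in live:
        if node not in excluded:
            return node
    return None


def write_block(node: str) -> None:
    """Toy datanode write: fails whenever the node's disk has failed."""
    if node in FAILED_DISKS:
        raise DiskFailedError(f"disk failed on {node}")


def write_with_retries(use_exclusion: bool) -> List[Tuple[int, str, bool]]:
    """Return the (attempt, node, succeeded) tuples the client observed."""
    excluded: Set[str] = set()
    attempts: List[Tuple[int, str, bool]] = []
    for attempt in range(1, MAX_RETRIES + 1):
        node = choose_target(LIVE_DATANODES, excluded)
        if node is None:
            break  # no candidate datanodes left
        try:
            write_block(node)
            attempts.append((attempt, node, True))
            return attempts
        except DiskFailedError:
            attempts.append((attempt, node, False))
            if use_exclusion:
                # Hypothetical mitigation: remember the bad node so the
                # next request asks the namenode to skip it.
                excluded.add(node)
    return attempts
```

Without exclusion, all three attempts land on dn1 and fail, which mirrors the reported behavior. With exclusion, the first attempt fails on dn1 and the second goes to dn2 and succeeds.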