[ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273617#comment-13273617 ]
Suresh Srinivas commented on HDFS-3368:
---------------------------------------

Sorry, I should have added more details to my comment:

In your description of the problem, the nodes first fail one by one: "At different times all three nodes malfunctioned and died, causing the replicas to migrate to dn1, dn2, dn3."

Later they fail together within a short time: "Expectedly do1, do2, do3 malfunction again and go down shortly after reporting their blocks to NN".

Even if you change how the replicas to delete are chosen, the presence of nodes like do1, do2, and do3 means that the following scenario is still possible:
* do1, do2, and do3 are chosen as targets for a new block.
* The client writes the block to these nodes.
* Shortly afterwards, do1, do2, and do3 all go down.

Now the replicas are no longer available. HDFS replication assumes that the probability of all three nodes holding replicas of the same block going down together within a short time is low. Given that, I am not sure this problem is important enough to change the default behavior.

Alternatively, given that the block placement policy is pluggable, could you write a custom implementation and leave the default implementation unchanged?

> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down during cluster restart resulting in data loss.
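
As a rough illustration of the replication assumption mentioned in the comment above, the sketch below computes the chance that every node holding a replica is down in the same short window. It assumes node failures are independent, which is exactly what repeatedly failing nodes such as do1, do2, and do3 would violate. The function name `all_replicas_lost` and the per-node probabilities are hypothetical values chosen for illustration, not measurements or HDFS APIs.

```python
def all_replicas_lost(p_node_down: float, replication: int = 3) -> float:
    """Probability that all replica-holding nodes are down in the same window.

    Assumes each node fails independently with probability p_node_down
    (an illustrative assumption, not a property guaranteed by HDFS).
    """
    if not 0.0 <= p_node_down <= 1.0:
        raise ValueError("p_node_down must be between 0 and 1")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return p_node_down ** replication


if __name__ == "__main__":
    # Hypothetical per-node failure probabilities within one short window.
    for label, p in [("healthy nodes", 0.01), ("flaky nodes", 0.5)]:
        print(f"{label}: p={p}, all 3 replicas lost with probability "
              f"{all_replicas_lost(p):.6g}")
```

With independent, mostly healthy nodes the combined probability is tiny; when the same few unreliable nodes hold all replicas, it is not, which is the scenario described in the bullet list.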