[ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268200#comment-13268200 ]
Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------

I propose to adjust {{BlockPlacementPolicyDefault.chooseReplicaToDelete()}} to look first at the oldest heartbeat time, and second at the free space, when all heartbeats are within the heartbeat interval. With such a policy, in the scenario above the replicas chosen for deletion will most likely be those on dn1, dn2, dn3, but they will never actually be deleted, because those old nodes have already died. The NN will automatically remove the replicas from the live nodes about 10 minutes later. Also, when only one or two DNs malfunction in a similar scenario, this will reduce unnecessary deletions and replications. No change in behavior will be seen in the regular case, when all nodes function properly.

> Missing blocks due to bad DataNodes coming up and down.
> --------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>
> All replicas of a block can be removed if bad DataNodes come up and down
> during cluster restart, resulting in data loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
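The proposed ordering can be expressed as a composite comparator. A minimal sketch of the idea, not the actual {{BlockPlacementPolicyDefault}} code; {{NodeInfo}} and its field names are invented stand-ins for the relevant parts of HDFS's DatanodeDescriptor:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for a DataNode's state; only the
// two fields the proposed deletion ordering needs.
class NodeInfo {
    final String name;
    final long lastHeartbeatMillis; // timestamp of the last heartbeat
    final long freeSpaceBytes;      // remaining capacity on the node

    NodeInfo(String name, long lastHeartbeatMillis, long freeSpaceBytes) {
        this.name = name;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
        this.freeSpaceBytes = freeSpaceBytes;
    }
}

class ReplicaDeletionSketch {
    // Pick the replica to delete: the node with the oldest heartbeat wins;
    // when heartbeat times tie (all nodes equally fresh), the node with
    // the least free space wins, i.e. the existing free-space criterion
    // becomes the secondary key.
    static NodeInfo chooseReplicaToDelete(List<NodeInfo> candidates) {
        return candidates.stream()
                .min(Comparator
                        .comparingLong((NodeInfo n) -> n.lastHeartbeatMillis)
                        .thenComparingLong(n -> n.freeSpaceBytes))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = List.of(
                new NodeInfo("dn1", 10_000L, 500L),   // oldest heartbeat
                new NodeInfo("dn2", 99_000L, 100L),   // fresh, least space
                new NodeInfo("dn3", 98_000L, 1_000L));
        // dn1 is chosen: its stale heartbeat outranks dn2's low free space.
        System.out.println(chooseReplicaToDelete(nodes).name);
    }
}
```

A single composite comparator captures both rules: among nodes with identical (fresh) heartbeats the ordering degenerates to the free-space comparison, so behavior is unchanged when all nodes are healthy.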