[ 
https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268200#comment-13268200
 ] 

Konstantin Shvachko commented on HDFS-3368:
-------------------------------------------

I propose to adjust {{BlockPlacementPolicyDefault.chooseReplicaToDelete()}} to 
first look at the oldest heartbeat time, and second at the free space, when all 
heartbeats are within the heartbeat interval.
With such a policy, in the scenario above the replicas chosen for deletion will 
most likely be the ones on dn1, dn2, dn3, but they will never actually be 
deleted, because those old nodes have already died. The NN will automatically 
remove the excess replicas from the live nodes 10 minutes or so later. 
Also, when only one or two DNs malfunction in a similar scenario, this will 
reduce unnecessary deletions and replications.
No change in behavior will be seen in the regular case when all nodes function 
properly.
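
The proposed ordering can be sketched as a small selection routine (a 
hypothetical Java sketch, not the actual {{BlockPlacementPolicyDefault}} code; 
the {{Node}} class below stands in for {{DatanodeDescriptor}}):

```java
import java.util.List;

// Hypothetical stand-in for DatanodeDescriptor: just the two fields
// the proposed ordering looks at.
class Node {
    final String name;
    final long lastHeartbeatMs; // timestamp of the last heartbeat
    final long freeBytes;       // remaining capacity on the node

    Node(String name, long lastHeartbeatMs, long freeBytes) {
        this.name = name;
        this.lastHeartbeatMs = lastHeartbeatMs;
        this.freeBytes = freeBytes;
    }
}

class ReplicaToDeleteSketch {
    // Pick the replica to delete: prefer the node with the oldest
    // heartbeat; among nodes with equally old heartbeats (e.g. when all
    // are within the heartbeat interval), prefer the least free space.
    static Node choose(List<Node> candidates) {
        Node chosen = null;
        for (Node n : candidates) {
            if (chosen == null
                    || n.lastHeartbeatMs < chosen.lastHeartbeatMs
                    || (n.lastHeartbeatMs == chosen.lastHeartbeatMs
                        && n.freeBytes < chosen.freeBytes)) {
                chosen = n;
            }
        }
        return chosen;
    }
}
```

This puts heartbeat age ahead of free space, so stale nodes are preferred as 
deletion targets even before they are declared dead.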
                
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>
> All replicas of a block can be removed if bad DataNodes come up and down 
> during cluster restart resulting in data loss.

