[ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273617#comment-13273617 ]

Suresh Srinivas commented on HDFS-3368:
---------------------------------------

Sorry, I should have added more details to my comment:
In your description of the problem, the first failures happen one node at a 
time: "At different times all three nodes malfunctioned and died, causing the 
replicas to migrate to dn1, dn2, dn3." Later, the failures happen together 
within a short time: "Expectedly do1, do2, do3 malfunction again and go down 
shortly after reporting their blocks to NN".
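
To make that timeline concrete, here is a toy replay of it. This is a minimal 
sketch, not NameNode code: it assumes a hypothetical excess-replica rule that 
deletes from the nodes with the least free space, and the free-space figures 
are made up so that this rule happens to pick dn1, dn2, dn3. The real NameNode 
also takes rack placement into account, which the sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ExcessReplicaTimeline {
    static final int REPLICATION = 3;

    // Hypothetical deletion rule (assumption): remove excess replicas from
    // the nodes with the least free space.
    static List<String> chooseExcess(Set<String> holders, Map<String, Long> freeGb) {
        List<String> sorted = new ArrayList<>(holders);
        sorted.sort((a, b) -> Long.compare(freeGb.get(a), freeGb.get(b)));
        return new ArrayList<>(sorted.subList(0, holders.size() - REPLICATION));
    }

    // Replays the timeline from the issue and returns the surviving replica count.
    static int liveReplicasAfterTimeline() {
        // Made-up free space (GB), chosen so the rule above picks dn1..dn3.
        Map<String, Long> freeGb = Map.of(
                "dn1", 10L, "dn2", 12L, "dn3", 11L,
                "do1", 90L, "do2", 95L, "do3", 80L);

        // After do1..do3 died one by one, the replicas migrated to dn1..dn3.
        Set<String> holders = new HashSet<>(List.of("dn1", "dn2", "dn3"));

        // On restart do1..do3 come back and report their old replicas: 6 copies.
        holders.addAll(List.of("do1", "do2", "do3"));

        // The NameNode trims the block back to REPLICATION copies.
        holders.removeAll(chooseExcess(holders, freeGb));

        // do1..do3 malfunction again shortly after reporting their blocks.
        holders.removeAll(List.of("do1", "do2", "do3"));
        return holders.size();
    }

    public static void main(String[] args) {
        System.out.println("live replicas: " + liveReplicasAfterTimeline());
    }
}
```

Under these assumptions the block ends up with zero live replicas, which is the 
data loss described in the issue.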

Although your patch changes how the replicas to delete are chosen, the presence 
of nodes like do1, do2 and do3 still means that the following scenario is 
possible:
* do1, do2, do3 are chosen as targets for a new block.
* The client writes the block to these nodes.
* Shortly afterwards, do1, do2 and do3 all go down.
Now none of the block's replicas are available.
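
This scenario does not depend on the deletion rule at all. A minimal sketch, 
assuming (hypothetically) that placement picks do1, do2, do3 for a new block: 
the block never has more than three replicas, so no excess-replica choice is 
ever made before all three copies are lost.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NewBlockOnFlakyNodes {
    // Writes a block to the given targets, then fails the given nodes,
    // and returns how many replicas are still live.
    static int liveAfterWriteAndFailure(List<String> targets, List<String> failed) {
        // With exactly as many targets as the replication factor there is
        // no excess replica, so no deletion policy, old or new, is consulted.
        Set<String> live = new HashSet<>(targets);
        live.removeAll(failed);
        return live.size();
    }

    public static void main(String[] args) {
        // Hypothetical placement decision: all three targets are flaky nodes.
        List<String> targets = List.of("do1", "do2", "do3");
        // Shortly after the write, all three go down.
        int live = liveAfterWriteAndFailure(targets, List.of("do1", "do2", "do3"));
        System.out.println("live replicas: " + live);
    }
}
```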

HDFS replication assumes that the probability of all three nodes holding 
replicas of the same block going down together within a short time is low. 
Given that, I am not sure this problem is important enough.
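
That assumption can be put as a back-of-the-envelope formula. Assuming each 
replica holder fails independently with probability p within the same short 
window, all r replicas are lost with probability p^r. The values of p below are 
illustrative, not measurements; they only show why nodes that keep 
malfunctioning weaken the assumption. Correlated failures are not captured by 
the independence assumption at all.

```java
class ReplicaLossProbability {
    // Probability that every one of `replicas` holders fails in the same
    // window, ASSUMING independent failures with per-node probability p.
    static double allReplicasLost(double p, int replicas) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        return Math.pow(p, replicas);
    }

    public static void main(String[] args) {
        // Illustrative, made-up per-node failure probabilities.
        System.out.println("healthy nodes (p=0.001): " + allReplicasLost(0.001, 3));
        System.out.println("flaky nodes   (p=0.5):   " + allReplicasLost(0.5, 3));
    }
}
```

With p = 0.5 the result is 0.125, roughly one block in eight, compared with 
about one in a billion for p = 0.001.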

Alternatively, given that the block placement policy is pluggable, could you 
write a custom implementation instead of changing the default one?
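
For the pluggable route, here is a rough idea of what a custom excess-replica 
choice could look like. This is a self-contained sketch with a hypothetical 
Node class and method names; it is not Hadoop's BlockPlacementPolicy API and 
not the attached patch. The heuristic is only an illustration aimed at the 
restart timeline in the description: delete from the node that (re)registered 
with the NameNode most recently, and break ties by least free space. It does 
nothing for the new-block scenario above. As far as I know, the policy class is 
selected through the dfs.block.replicator.classname setting, but that should be 
checked against the version in use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RecentRegistrationDeletionChooser {
    // Hypothetical view of a DataNode; NOT Hadoop's DatanodeDescriptor.
    static final class Node {
        final String name;
        final long registeredAtMs; // when the node last (re)registered with the NameNode
        final long freeBytes;

        Node(String name, long registeredAtMs, long freeBytes) {
            this.name = name;
            this.registeredAtMs = registeredAtMs;
            this.freeBytes = freeBytes;
        }
    }

    // Hypothetical heuristic: delete the replica on the most recently
    // (re)registered node; among equally recent nodes, the one with least free space.
    static Node chooseReplicaToDelete(List<Node> holders) {
        return holders.stream()
                .max(Comparator.comparingLong((Node n) -> n.registeredAtMs)
                        .thenComparingLong(n -> -n.freeBytes))
                .orElseThrow(() -> new IllegalArgumentException("no replicas"));
    }

    // Removes excess replicas one at a time until `replication` remain.
    static List<Node> trimToReplication(List<Node> holders, int replication) {
        List<Node> kept = new ArrayList<>(holders);
        while (kept.size() > replication) {
            kept.remove(chooseReplicaToDelete(kept));
        }
        return kept;
    }

    public static void main(String[] args) {
        // dn1..dn3 have been up for a while; do1..do3 just came back after a restart.
        List<Node> holders = List.of(
                new Node("dn1", 1_000L, 10L), new Node("dn2", 1_000L, 12L),
                new Node("dn3", 1_000L, 11L), new Node("do1", 5_000L, 90L),
                new Node("do2", 5_000L, 95L), new Node("do3", 5_000L, 80L));
        for (Node n : trimToReplication(holders, 3)) {
            System.out.println("kept " + n.name);
        }
    }
}
```

In this example the copies on do1, do2, do3 are trimmed and dn1, dn2, dn3 keep 
the block.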


                
> Missing blocks due to bad DataNodes coming up and down.
> -------------------------------------------------------
>
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, 
> blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
>
>
> All replicas of a block can be removed if bad DataNodes come up and down 
> during cluster restart, resulting in data loss.
