[ https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405178#comment-13405178 ]

Uma Maheswara Rao G commented on HDFS-3586:
-------------------------------------------

Thanks, Brahma, for filing the JIRA.

This is similar to HDFS-3493. The difference here is that more DNs are
available than the replication factor requires, so ideally the block should
get replicated.
The problem is that you have 2 live replicas, while the remaining 2 DNs hold a
partial block in RBW. So when the NN tries to replicate to them, the DNs reject
the transfer because the block already exists in RBW. As a result, replication
may never happen even though you have more nodes.
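To make the stall concrete, here is a minimal Java model of it. This is a sketch, not HDFS code: the class `ReplicationStallModel`, its `ReplicaState` enum, the simplification that the NN counts only FINALIZED replicas as live, and the rule that a DN refuses an incoming copy of a block it already holds in any state are all illustrative assumptions that mirror the behaviour described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not HDFS code: one block on a small cluster.
// Every DN in the cluster has a map entry; a null value means "no replica".
public class ReplicationStallModel {

    // Simplified replica states (HDFS has more, e.g. TEMPORARY, RWR, RUR).
    enum ReplicaState { FINALIZED, RBW }

    // Assumed DN-side rule: refuse an incoming copy if any replica of the
    // block already exists on this node, whatever its state.
    static boolean acceptTransfer(ReplicaState existing) {
        return existing == null;
    }

    // Runs one replication round and returns how many new replicas it created.
    static int replicate(Map<String, ReplicaState> replicas, int replication) {
        long live = replicas.values().stream()
                .filter(s -> s == ReplicaState.FINALIZED).count();
        int needed = (int) Math.max(0, replication - live);

        // Simplification: only FINALIZED replicas count as live, so DNs that
        // hold a stale RBW copy still look like valid replication targets.
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, ReplicaState> e : replicas.entrySet()) {
            if (e.getValue() != ReplicaState.FINALIZED && targets.size() < needed) {
                targets.add(e.getKey());
            }
        }

        int created = 0;
        for (String dn : targets) {
            if (acceptTransfer(replicas.get(dn))) {
                replicas.put(dn, ReplicaState.FINALIZED);
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Map<String, ReplicaState> replicas = new LinkedHashMap<>();
        replicas.put("DN1", ReplicaState.FINALIZED);
        replicas.put("DN2", ReplicaState.FINALIZED);
        replicas.put("DN3", ReplicaState.RBW);
        replicas.put("DN4", ReplicaState.RBW);
        // RF=3 and 2 live replicas: one more is needed, but the only
        // candidates (DN3, DN4) already hold the block in RBW.
        System.out.println("new replicas: " + replicate(replicas, 3));
    }
}
```

Running `main` prints `new replicas: 0`: the one missing replica can only be placed on DN3 or DN4, and in this model both refuse it, which is the situation the comment describes.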

I think a possible fix is to change the condition below
*if (countNodes(b.stored).liveReplicas() >= bc.getReplication()) {*

to something like *if ((countNodes(b.stored).liveReplicas() + countNodes(b.stored).corruptReplicas()) > bc.getReplication()) {*

That way, the extra corrupt replicas (those beyond the replication factor) will
get invalidated, and replication can later proceed normally.
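Plugging the scenario's numbers into the two conditions shows why the change matters. In this hedged sketch, plain integers stand in for `countNodes(b.stored).liveReplicas()`, `countNodes(b.stored).corruptReplicas()` and `bc.getReplication()`; the two stale RBW copies are treated as the corrupt replicas, as the proposal implies; and the class `InvalidateConditionCheck` plus the assumption that the condition is re-checked once per corrupt replica, invalidating one replica each time it holds, are illustrative only.

```java
// Hypothetical sketch: integers stand in for the BlockManager calls
// quoted in the comment; this is not the actual HDFS code path.
public class InvalidateConditionCheck {

    // Current condition: live replicas alone must satisfy the replication
    // factor (the corrupt count is ignored).
    static boolean currentCondition(int live, int corrupt, int replication) {
        return live >= replication;
    }

    // Proposed condition: live + corrupt replicas exceed the replication factor.
    static boolean proposedCondition(int live, int corrupt, int replication) {
        return live + corrupt > replication;
    }

    // Assumption: the condition is re-evaluated once per corrupt replica, and
    // each time it holds, exactly one corrupt replica is invalidated.
    static int corruptLeftAfterProposedFix(int live, int corrupt, int replication) {
        while (corrupt > 0 && proposedCondition(live, corrupt, replication)) {
            corrupt--; // one stale RBW replica invalidated
        }
        return corrupt;
    }

    public static void main(String[] args) {
        // DN1/DN2 FINALIZED (live), DN3/DN4 RBW (corrupt), RF=3.
        int live = 2, corrupt = 2, replication = 3;
        System.out.println("current:  " + currentCondition(live, corrupt, replication));
        System.out.println("proposed: " + proposedCondition(live, corrupt, replication));
        System.out.println("corrupt left: " + corruptLeftAfterProposedFix(live, corrupt, replication));
    }
}
```

This prints `current:  false`, `proposed: true` and `corrupt left: 1`. Under the current condition (2 >= 3 is false) neither stale copy is ever invalidated; under the proposed one (2 + 2 > 3) exactly one is, which frees a DN to receive the missing third replica while leaving the remaining stale copy untouched by this condition.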


                
> Blocks are not getting replicated even though DNs are available.
> ----------------------------------------------------------------
>
>                 Key: HDFS-3586
>                 URL: https://issues.apache.org/jira/browse/HDFS-3586
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node
>    Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 3.0.0
>            Reporter: Brahma Reddy Battula
>         Attachments: HDFS-3586-analysis.txt
>
>
> Scenario:
> =========
> Started four DNs (say DN1, DN2, DN3 and DN4).
> Writing files with RF=3.
> The pipeline was formed as DN1->DN2->DN3.
> Since DN3's network is very slow, it is not able to send acks.
> The pipeline is formed again as DN1->DN2->DN4.
> Here DN4's network is also slow.
> So finally commitBlockSynchronization completed successfully with DN1 and DN2.
> The block is present on all four DNs (FINALIZED state on two DNs and RBW state
> on the other two).
> Here the NN asks to replicate to DN3 and DN4, but this fails since the replicas
> are already present in the RBW dir.
