[https://issues.apache.org/jira/browse/HADOOP-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Hairong Kuang updated HADOOP-5479:
----------------------------------

    Attachment: numTransfers.patch

This patch makes three changes:
# NameNode now interprets numOfTransfers as the number of blocks to be
replicated; the current code interprets it as the number of targets. This
change is made in DatanodeDescriptor#BlockTargetPair.poll(). It prevents empty
replication requests as well as empty recovery requests.
# The number of targets to be chosen is no longer capped by the number of
transfers. Again, NameNode should not treat the number of transfers as the
number of targets.
# The third change is not directly related to this issue, but I saw the problem
while debugging it. The current code moves a block to the pending replication
queue only when it reaches its replication factor. Because not all pending
replications are tracked, this sometimes causes over-replication. This patch
adds a block to the pending replication queue whenever at least one
replication is scheduled for it.
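
To make the first two changes concrete, here is a minimal Java sketch. It is
not the code from numTransfers.patch: the class ReplicationQueueSketch, its
nested BlockTargetPair, and the string block and DataNode IDs are hypothetical
stand-ins. It only assumes what the list says: the transfer count limits
blocks rather than targets, and an empty result should never be turned into a
request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical sketch, not Hadoop's DatanodeDescriptor code: a per-DataNode
 * queue of pending replications in which each entry is one block plus all of
 * the targets in its replication pipeline.
 */
class ReplicationQueueSketch {

    /** One block and the DataNodes it should be copied to (hypothetical type). */
    static final class BlockTargetPair {
        final String block;
        final List<String> targets;

        BlockTargetPair(String block, List<String> targets) {
            this.block = block;
            this.targets = Collections.unmodifiableList(new ArrayList<>(targets));
        }
    }

    private final Queue<BlockTargetPair> pending = new ArrayDeque<>();

    /** Queues one block together with its full target list. */
    void offer(String block, List<String> targets) {
        pending.add(new BlockTargetPair(block, targets));
    }

    /**
     * Removes up to numTransfers entries. numTransfers counts blocks, not
     * targets: a block with three targets is still one pipelined transfer
     * from the source DataNode. Returns null instead of an empty list so the
     * caller never has an empty request to send.
     */
    List<BlockTargetPair> poll(int numTransfers) {
        if (numTransfers <= 0 || pending.isEmpty()) {
            return null;
        }
        List<BlockTargetPair> result = new ArrayList<>();
        while (result.size() < numTransfers && !pending.isEmpty()) {
            result.add(pending.poll());
        }
        return result;
    }
}
```

For example, with two queued blocks that each have three targets, poll(2)
returns both blocks with all six targets. If the same budget of 2 were instead
spent on targets, a three-target block could never fit and the result would be
empty.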

> NameNode should not send empty block replication request to DataNode
> --------------------------------------------------------------------
>
>                 Key: HADOOP-5479
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5479
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0, 0.21.0
>
>         Attachments: numTransfers.patch
>
>
> On our production clusters, we occasionally see NameNode send an empty
> block replication request to a DataNode on every heartbeat, which blocks that
> DataNode from replicating or deleting any block.
> This is partly caused by DataNode sending an incorrect number of replications
> in progress, which will be fixed by HADOOP-5465. There is also a flaw on the
> NameNode side: NameNode should not interpret the number of replications in
> progress as the number of targets, since replication is done through a
> pipeline. It should also make sure that no empty replication request is sent
> to DataNode.
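
The pipeline argument in the description can be illustrated with a small slot
calculation. This is a hedged sketch: the names maxReplicationStreams and
xmitsInProgress, and the helper methods, are illustrative assumptions and may
not match the exact Hadoop fields or methods. Each transfer in progress counts
as one block, however many DataNodes its pipeline spans.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the NameNode-side decision made on a heartbeat.
 * Field and method names are illustrative, not Hadoop's actual API.
 */
class HeartbeatReplicationSketch {

    /** Free transfer slots; each in-progress transfer is one pipelined block. */
    static int freeTransferSlots(int maxReplicationStreams, int xmitsInProgress) {
        return Math.max(0, maxReplicationStreams - xmitsInProgress);
    }

    /**
     * Returns the blocks to put in a replication request, or null when no
     * request should be sent (no free slots or nothing queued).
     */
    static List<String> blocksToReplicate(List<String> queuedBlocks,
                                          int maxReplicationStreams,
                                          int xmitsInProgress) {
        int slots = freeTransferSlots(maxReplicationStreams, xmitsInProgress);
        if (slots == 0 || queuedBlocks.isEmpty()) {
            return null; // never build an empty replication request
        }
        int n = Math.min(slots, queuedBlocks.size());
        return new ArrayList<>(queuedBlocks.subList(0, n));
    }
}
```

With a limit of 2 streams and 1 transfer in progress, one slot is free, so
exactly one block is scheduled even if its pipeline has three targets. With 2
transfers in progress, or with nothing queued, the method returns null and no
request is built.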

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
