[
https://issues.apache.org/jira/browse/HADOOP-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dhruba borthakur updated HADOOP-940:
------------------------------------
Attachment: pendingReplication.patch
First version of the code. Review comments welcome.
The patch adds a new thread that tracks all replications currently in progress. If a
replication request does not complete within 10 minutes, the block is put back
in neededReplication.
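
To make the timeout behaviour concrete, here is a minimal sketch of how such a tracker could work. It is not the code in pendingReplication.patch: the class name, the method names, and the use of plain block ids are assumptions made for illustration, and the clock is passed in explicitly so the logic can be exercised without a real thread. A background thread would call removeTimedOut periodically and hand the returned blocks back to neededReplication.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; names and structure are assumptions, not the patch. */
class PendingReplicationSketch {
    // 10 minutes, the timeout mentioned in the comment above.
    static final long TIMEOUT_MS = 10 * 60 * 1000L;

    // block id -> time (ms) at which the replication request was issued
    private final Map<Long, Long> pending = new HashMap<>();

    /** Record that a replication of the block was requested at nowMs. */
    synchronized void add(long blockId, long nowMs) {
        pending.put(blockId, nowMs);
    }

    /** The replication was confirmed, so the block is no longer pending. */
    synchronized void remove(long blockId) {
        pending.remove(blockId);
    }

    synchronized int size() {
        return pending.size();
    }

    /**
     * Remove and return every block whose request is older than TIMEOUT_MS.
     * The caller is expected to put these blocks back in neededReplication.
     */
    synchronized List<Long> removeTimedOut(long nowMs) {
        List<Long> timedOut = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMs - entry.getValue() > TIMEOUT_MS) {
                timedOut.add(entry.getKey());
                it.remove(); // safe removal while iterating
            }
        }
        return timedOut;
    }
}
```

Keeping the scan separate from the thread that drives it is a design choice of this sketch; it keeps the timeout rule testable on its own.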
> pendingReplications of FSNamesystem is not informative
> ------------------------------------------------------
>
> Key: HADOOP-940
> URL: https://issues.apache.org/jira/browse/HADOOP-940
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.10.1
> Reporter: Hairong Kuang
> Attachments: pendingReplication.patch
>
>
> Currently, when a neededReplication block is scheduled to be replicated, it is
> put into the pendingReplications queue. When it is no longer under-replicated,
> it is pulled out of the pendingReplications queue. But the queue does not
> provide any information such as how many targets have been chosen or which
> data nodes those targets are. pendingReplications is not used when deciding
> whether a block is under-replicated. This may cause a block to be
> over-replicated or lead to an inaccurate estimate of its replication priority.
> For example, when a block has 1 replica but its replication factor is 2, a
> data node is chosen to replicate this block and the block is put in the
> pendingReplications queue. If the block's replication factor is changed to
> 3 before the block replication notification, which is the next block report,
> comes in, the block will be put into the neededReplications queue again under
> the assumption that it needs 2 more targets instead of 1. So the block will
> end up with 4 replicas.
> I propose that we change pendingReplications to be a map from a block to the
> chosen data nodes. Data nodes in both pendingReplications and blockMap are
> counted when deciding the total number of replicas that a block has. When the
> name node is notified that the block has been replicated to a chosen data
> node, the data node is moved from pendingReplications to blockMap.
> Each chosen target is also associated with a timer indicating how long to
> wait for the block replication notification. The pendingReplications queue
> needs to be scanned periodically to remove those data nodes whose timers
> have expired.
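
The accounting proposed in the issue description can be sketched as follows. This is an illustration under stated assumptions, not FSNamesystem code: the class and method names are hypothetical, data nodes are represented by plain strings, and the `confirmed` map merely stands in for blockMap. The sketch reproduces the example from the description: a block with 1 confirmed replica and 1 pending target needs only 1 more target when its replication factor is raised to 3.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of counting pending targets as replicas. */
class ReplicaAccountingSketch {
    // block id -> data nodes that hold a confirmed replica (stands in for blockMap)
    private final Map<Long, Set<String>> confirmed = new HashMap<>();
    // block id -> data nodes chosen as targets but not yet confirmed
    private final Map<Long, Set<String>> pending = new HashMap<>();

    void addConfirmed(long blockId, String node) {
        confirmed.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
    }

    /** A target was chosen for the block; remember which node it is. */
    void scheduleReplication(long blockId, String target) {
        pending.computeIfAbsent(blockId, k -> new HashSet<>()).add(target);
    }

    /** Notification arrived: move the node from pending to confirmed. */
    void replicationConfirmed(long blockId, String node) {
        Set<String> targets = pending.get(blockId);
        if (targets != null) {
            targets.remove(node);
            if (targets.isEmpty()) {
                pending.remove(blockId);
            }
        }
        addConfirmed(blockId, node);
    }

    private static int count(Map<Long, Set<String>> map, long blockId) {
        Set<String> nodes = map.get(blockId);
        return nodes == null ? 0 : nodes.size();
    }

    /** Confirmed replicas plus targets that are still in flight. */
    int totalReplicas(long blockId) {
        return count(confirmed, blockId) + count(pending, blockId);
    }

    /** How many additional targets must be chosen for the given factor. */
    int targetsNeeded(long blockId, int replicationFactor) {
        return Math.max(0, replicationFactor - totalReplicas(blockId));
    }
}
```

With one confirmed replica on "dn1" and one pending target "dn2", targetsNeeded(blockId, 3) yields 1 rather than 2, so the block ends up with 3 replicas instead of 4. Per-target timers are left out of this sketch.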