[ https://issues.apache.org/jira/browse/HADOOP-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Chansler updated HADOOP-2259:
------------------------------------
Component/s: dfs
> Replication should be decoupled from heartbeat
> ----------------------------------------------
>
> Key: HADOOP-2259
> URL: https://issues.apache.org/jira/browse/HADOOP-2259
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.0
> Environment: Hadoop 80 node cluster
> Reporter: Srikanth Kakani
>
> I ran a simple experiment: I shot down one node in the cluster and
> measured the time taken to replicate the under-replicated blocks.
> ~30000 blocks were under-replicated, i.e. ~400 per node. At 2 blocks per
> heartbeat and a 1 minute heartbeat interval, that should take ~200 minutes
> to replicate completely.
> My findings: it took around 220 minutes, which is consistent with that
> estimate.
> Bug: Replication is coupled with the heartbeat. The heartbeat interval is
> based on how much the namenode can handle. Replication should be based on
> how much a datanode can handle.
> Given the default heartbeat interval of 20 seconds, we computed that a
> datanode can handle 2 replications in that interval, which is why the
> namenode gives each datanode 2 blocks to replicate per heartbeat.
> We propose keeping the ratio of 2 blocks per 20 seconds constant, so a
> datanode with a 1 minute heartbeat interval should be given 6 blocks to
> replicate per heartbeat. In this case, instead of taking 200 minutes, it
> should take 200/3 = ~67 minutes (roughly 1 hour) to replicate the entire
> node.
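
The arithmetic in the description can be checked with a small sketch. This is only an illustration of the proposed scaling rule, not Hadoop code: the function names and constants below are hypothetical, and the only inputs taken from the report are the 20 second / 2 block baseline, the 1 minute heartbeat interval, and the ~400 blocks per node.

```python
import math

# Baseline from the report: 2 blocks per 20-second heartbeat.
BASELINE_INTERVAL_S = 20
BASELINE_BLOCKS = 2


def blocks_per_heartbeat(heartbeat_interval_s):
    """Hypothetical quota rule: scale the per-heartbeat block count with the
    heartbeat interval so the 2 blocks / 20 s rate stays constant."""
    return max(1, (BASELINE_BLOCKS * heartbeat_interval_s) // BASELINE_INTERVAL_S)


def minutes_to_replicate(blocks_per_node, heartbeat_interval_s, quota):
    """Minutes for one datanode to work through its share of blocks when it
    receives `quota` blocks on every heartbeat."""
    heartbeats = math.ceil(blocks_per_node / quota)
    return heartbeats * heartbeat_interval_s / 60


# Current behaviour: the quota stays at 2 even with a 60 s heartbeat.
coupled = minutes_to_replicate(400, 60, BASELINE_BLOCKS)

# Proposed behaviour: the quota scales to 6 for a 60 s heartbeat.
decoupled = minutes_to_replicate(400, 60, blocks_per_heartbeat(60))

print(coupled, decoupled)
```

With the numbers from the report, the fixed quota gives 200 minutes and the scaled quota gives 67 minutes, which matches the 200/3 estimate above.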
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.