[ https://issues.apache.org/jira/browse/HDFS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143493#comment-13143493 ]

Aaron T. Myers commented on HDFS-2537:
--------------------------------------

+1, something to address this issue would be great: either a way of dynamically 
changing the various settings that control the replication rate, or a smarter 
algorithm for determining the appropriate rate. In particular, I would love to 
see something that gets *way* more aggressive about blocks which end up with 
only replication 1 after some failures. We already prioritize these blocks over 
others, but I don't think we currently do anything to increase the overall 
replication rate.

We've seen cases where, when a whole rack (or a large number of DNs) goes 
offline simultaneously, blocks sit at replication 1 for far too long. This is 
obviously undesirable, and currently the only way to increase the replication 
rate is to change some configs and bounce the NN, which may be even more 
undesirable.
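
To put rough numbers on "far too long" (a sketch under assumed inputs, not a 
measurement from a real cluster): with each DN handed at most 2 new replication 
tasks per 3-second heartbeat (the per-heartbeat cap discussed below) and DNs 
that finish each batch before the next heartbeat, a modest cluster needs 
several hours to recover a node's worth of small blocks:

{code:java}
// Back-of-envelope estimate only -- the cluster size, block count, and the
// assumption that DNs always finish their batch between heartbeats are
// illustrative, not measured values.
public class ReReplicationEstimate {
    public static void main(String[] args) {
        int dataNodes = 40;                 // assumed cluster size
        int blocksPerHeartbeat = 2;         // default per-DN cap (see below)
        double heartbeatSeconds = 3.0;      // default dfs.heartbeat.interval
        long blocksOnFailedNode = 500000L;  // assumed count of small blocks

        double blocksPerSecond = dataNodes * blocksPerHeartbeat / heartbeatSeconds;
        double hoursToRecover = blocksOnFailedNode / blocksPerSecond / 3600.0;
        System.out.printf("~%.0f blocks/s cluster-wide, ~%.1f hours to re-replicate%n",
            blocksPerSecond, hoursToRecover);
    }
}
{code}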

Part of the problem is that the replication work given to a DN on each 
heartbeat is capped at {{dfs.namenode.replication.max-streams}}, which defaults 
to 2. It's entirely possible, if not likely, that a DN will get through all of 
its replication work before the next heartbeat (especially if the blocks are 
small), in which case it will just sit idle until the next heartbeat arrives. 
One relatively easy improvement would be to have two configs along the lines of 
{{dfs.namenode.replication.max-blocks-per-heartbeat}} and 
{{dfs.datanode.replication.max-streams}}. That way the NN can send the DN more 
work than the DN may actually be able to get through in any given heartbeat, 
and the DN can limit its own maximum number of simultaneous block streams 
dedicated to replication.
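
A rough sketch of the DN-side half of that idea (illustrative only, not the 
actual DataNode code; {{dfs.datanode.replication.max-streams}} is just the 
property name proposed above and doesn't exist today): the DN accepts whatever 
batch of transfers the NN queued for it and uses a semaphore to cap how many 
run at once.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Illustrative sketch only -- not the real DataNode implementation.
// The DN can accept an arbitrarily large batch of replication commands
// from the NN but caps how many block transfers run simultaneously.
public class ReplicationThrottle {
    private final Semaphore streamSlots;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public ReplicationThrottle(int maxStreams) {
        // maxStreams would be read from the proposed (hypothetical)
        // dfs.datanode.replication.max-streams setting.
        this.streamSlots = new Semaphore(maxStreams);
    }

    /** Queue one block transfer; at most maxStreams are in flight at a time. */
    public void transferBlock(final Runnable sendBlockToTarget) {
        pool.submit(new Runnable() {
            public void run() {
                try {
                    streamSlots.acquire();   // wait for a free replication stream
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    sendBlockToTarget.run(); // the actual block transfer
                } finally {
                    streamSlots.release();   // free the slot for the next queued block
                }
            }
        });
    }
}
{code}

The NN-side {{dfs.namenode.replication.max-blocks-per-heartbeat}} half of the 
proposal would then just control how large that queued batch is allowed to get.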
                
> re-replicating under replicated blocks should be more dynamic
> -------------------------------------------------------------
>
>                 Key: HDFS-2537
>                 URL: https://issues.apache.org/jira/browse/HDFS-2537
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Nathan Roberts
>
> When a node fails or is decommissioned, a large number of blocks become 
> under-replicated. Since re-replication work is distributed, the hope would be 
> that all blocks could be restored to their desired replication factor in very 
> short order. This doesn't happen, though, because the load the cluster is 
> willing to devote to this activity is mostly static (controlled by 
> configuration variables). Since it's mostly static, the rate has to be set 
> conservatively to avoid overloading the cluster with replication work.
> This problem is especially noticeable when you have lots of small blocks. It 
> can take many hours to re-replicate the blocks that were on a node while the 
> cluster is mostly idle. 
