[ https://issues.apache.org/jira/browse/HADOOP-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465520 ]

Hairong Kuang commented on HADOOP-659:
--------------------------------------

I am OK with prioritizing blocks that have less than 1/3 of their replicas in
place. In that case I would introduce 3 priority levels: blocks that have only
one replica get the highest priority, followed by blocks that have less than 1/3
of their replicas, followed by all remaining blocks.
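A minimal sketch of the three-level scheme, assuming one FIFO set per priority
level and a simple integer rule for "less than 1/3". The class name
PriorityBlockQueues, its methods, and the use of String block ids are
hypothetical illustrations, not the name-node's actual code:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: one insertion-ordered set per priority level.
class PriorityBlockQueues {
    static final int LEVELS = 3;
    private final List<Set<String>> queues = new ArrayList<>();

    PriorityBlockQueues() {
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new LinkedHashSet<>());
        }
    }

    // 0 = only one replica, 1 = fewer than 1/3 of expected, 2 = the rest.
    // current * 3 < expected avoids floating point for "less than 1/3".
    static int priority(int current, int expected) {
        if (current == 1) return 0;
        if (current * 3 < expected) return 1;
        return 2;
    }

    // Returns false if the block is not under-replicated, has no live
    // replica to copy from (assumption: such blocks are handled elsewhere),
    // or is already queued.
    boolean add(String blockId, int current, int expected) {
        if (current < 1 || current >= expected) return false;
        return queues.get(priority(current, expected)).add(blockId);
    }

    // Removes and returns the oldest block from the highest non-empty level,
    // or null if nothing is queued.
    String poll() {
        for (Set<String> q : queues) {
            Iterator<String> it = q.iterator();
            if (it.hasNext()) {
                String block = it.next();
                it.remove();
                return block;
            }
        }
        return null;
    }
}
```

Adding a fourth level only means changing LEVELS and priority(); the polling
loop stays the same. This is the extensibility that head/tail insertion into a
single list does not give.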

The data structures suggested by Konstantin have a performance advantage.
But when I worked through a code design, I found that the data structures and
their manipulation were quite complicated; it's not a simple solution. I am
convinced that it is the right way to go. Besides, adding blocks to the
beginning or the end of the priority list is not an extensible solution when we
have more than 2 priority levels.

> Boost the priority of re-replicating blocks that are far from their replication target
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-659
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.7.2
>            Reporter: Konstantin Shvachko
>         Assigned To: Hairong Kuang
>
> I see two types of replications that should be accelerated compared to all
> others:
> 1. Blocks that have only one remaining copy (but are required to have higher
> replication).
> 2. Blocks that have less than 1/3 of their replicas in place.
> The latter occurs when map/reduce sets the replication of certain files to 10,
> and we want it to happen fast to achieve better performance on the tasks.
> So I think we should distinguish two major groups of under-replicated blocks:
> first-priority (having only 1 copy or less than 1/3 of the required replicas)
> and the rest.
> The name-node places first-priority blocks at the beginning of the
> neededReplication list and the rest at the end. That way the first-priority
> blocks will be replicated first, and then the others.
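A minimal sketch of the two-group proposal above, assuming the
neededReplication list can be modelled as a double-ended queue of block ids.
NeededReplicationList and its methods are hypothetical names, not the actual
name-node code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: first-priority blocks go to the head, the rest to the tail.
class NeededReplicationList {
    private final Deque<String> blocks = new ArrayDeque<>();

    // Only one copy, or fewer than 1/3 of the required replicas.
    static boolean isFirstPriority(int current, int expected) {
        return current == 1 || current * 3 < expected;
    }

    // Returns false if the block needs no replication or has no live replica.
    boolean add(String blockId, int current, int expected) {
        if (current < 1 || current >= expected) return false;
        if (isFirstPriority(current, expected)) {
            blocks.addFirst(blockId);
        } else {
            blocks.addLast(blockId);
        }
        return true;
    }

    // Next block to replicate, or null if the list is empty.
    String next() {
        return blocks.pollFirst();
    }
}
```

Two consequences of head/tail insertion are visible here. First-priority blocks
are served newest-first among themselves. Also, only two levels can be
expressed, because a list has just two ends.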
