[ https://issues.apache.org/jira/browse/HDFS-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747725#action_12747725 ]

Boris Shkolnik commented on HDFS-343:
-------------------------------------

I agree with Enis that with a random policy on a highly heterogeneous cluster 
we may end up with severely underutilized nodes. 
(One extreme use case is adding a new or restored empty node to a running 
cluster.) 
I don't know about a pluggable policy, but some modifications/improvements to 
the existing one could help.
Using a probability based on usage, instead of direct placement, should 
address issue #1; a sketch of what that could look like follows.
We also need to make sure that this overrides only the "random" part of the 
placement policy.
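
As a minimal sketch of the probability-based choice (standalone Java; the 
Node class and choose() method here are hypothetical stand-ins, not the 
actual ReplicationTargetChooser code):

    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    // Sketch: pick a datanode with probability proportional to its free
    // space, using the weight 1 - (percent_usage / 100) suggested in the
    // issue. Rack constraints are assumed to be enforced by the caller,
    // which passes in an already rack-filtered candidate list, so only
    // the uniform-random step of the placement policy is replaced.
    public class UsageWeightedChooser {

        // Minimal stand-in for the real datanode descriptor.
        public static class Node {
            final String name;
            final double percentUsage; // 0.0 .. 100.0
            Node(String name, double percentUsage) {
                this.name = name;
                this.percentUsage = percentUsage;
            }
        }

        private final Random random = new Random();

        // Roulette-wheel selection over the candidates' weights.
        public Node choose(List<Node> candidates) {
            double total = 0.0;
            for (Node n : candidates) {
                total += weight(n);
            }
            double r = random.nextDouble() * total;
            for (Node n : candidates) {
                r -= weight(n);
                if (r <= 0) {
                    return n;
                }
            }
            // Defensive fallback for floating-point rounding.
            return candidates.get(candidates.size() - 1);
        }

        // The weight proposed in the issue description.
        private static double weight(Node n) {
            return 1.0 - (n.percentUsage / 100.0);
        }

        public static void main(String[] args) {
            UsageWeightedChooser chooser = new UsageWeightedChooser();
            List<Node> nodes = Arrays.asList(
                    new Node("dn1", 90.0),  // nearly full:  weight 0.1
                    new Node("dn2", 50.0),  // half full:    weight 0.5
                    new Node("dn3", 10.0)); // mostly empty: weight 0.9
            // dn3 should come back roughly 60% of the time (0.9 / 1.5).
            System.out.println(chooser.choose(nodes).name);
        }
    }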

> Better Target selection for block replication
> ---------------------------------------------
>
>                 Key: HDFS-343
>                 URL: https://issues.apache.org/jira/browse/HDFS-343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>
> The block replication policy tends to balance the number of blocks on each 
> datanode in the long run; however, in heterogeneous clusters with a varying 
> number of disks per node, the nodes with one disk fill up quickly while nodes 
> with 3 disks still have 60% free disk space. This also reduces the advantage 
> of using more than one disk for parallel IO, since machines with multiple 
> disks are not used as much.
> The javadoc of ReplicationTargetChooser reads as follows: 
> The replica placement strategy is that if the writer is on a datanode, the 
> 1st replica is placed on the local machine, otherwise a random datanode. The 
> 2nd replica is placed on a datanode that is on a different rack. The 3rd 
> replica is placed on a datanode which is on the same rack as the first 
> replica.
> I think we should switch to a policy that balances percent disk usage rather 
> than total block count among the datanodes. This can be done by defining the 
> probability of selecting a datanode based on its percent disk usage. A 
> formula like 1 - (percent_usage / 100) seems reasonable. 
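
To make the suggested formula concrete, here is a worked example with 
illustrative numbers (not taken from the issue): nodes A, B, and C at 90%, 
50%, and 10% disk usage get weights 1 - 0.90 = 0.10, 1 - 0.50 = 0.50, and 
1 - 0.10 = 0.90. Normalizing by the total weight of 1.50 gives selection 
probabilities of roughly 6.7%, 33.3%, and 60.0%, so the mostly empty node 
is picked nine times as often as the nearly full one.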

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
