[ https://issues.apache.org/jira/browse/HDFS-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747725#action_12747725 ]
Boris Shkolnik commented on HDFS-343:
-------------------------------------

I agree with Enis that using a random policy in highly heterogeneous clusters may leave some nodes severely underutilized. (One extreme use case is adding a new or restored empty node to a running cluster.)

I don't know about a pluggable policy, but some modifications to the existing one can help. Using a probability (based on usage) instead of direct placement should address issue #1. We also need to make sure that this overrides only the "random" part of the placement policy.

> Better Target selection for block replication
> ---------------------------------------------
>
> Key: HDFS-343
> URL: https://issues.apache.org/jira/browse/HDFS-343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Enis Soztutar
>
> Block replication policy tends to balance the number of blocks in each
> datanode in the long run, however with heterogeneous clusters with varying
> number of disks per node, the nodes with one disk fill quickly while nodes
> with 3 disks still have 60% free disk space. This also reduces the advantage
> of using more than one disk for parallel IO, since machines with multiple
> disks are not used as much.
> The javadoc of the ReplicationTargetChooser reads as :
> The replica placement strategy is that if the writer is on a datanode, the
> 1st replica is placed on the local machine, otherwise a random datanode. The
> 2nd replica is placed on a datanode that is on a different rack. The 3rd
> replica is placed on a datanode which is on the same rack as the first
> replica.
> I think we should switch to a policy that balances the percent of disk usage
> rather than balancing total block count among the datanodes. This can be done
> by defining the probability of selection of a datanode based on its disk
> percent usage. A formula like 1 - (percent_usage / 100 ) seems reasonable.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
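
For illustration only, here is a rough Python sketch of the usage-weighted selection discussed in this issue. It is not code from Hadoop or from any patch on HDFS-343: the names `selection_weight` and `choose_datanode`, the dictionary of per-node usage percentages, and the uniform fallback when every node is full are all assumptions made for the example. Only the weight formula 1 - (percent_usage / 100) comes from the issue description.

```python
import random


def selection_weight(percent_usage):
    """Weight from the formula in the issue: 1 - (percent_usage / 100).

    Clamped at 0 so a node reporting more than 100% never gets a negative weight.
    """
    return max(0.0, 1.0 - percent_usage / 100.0)


def choose_datanode(usage_by_node, rng=random):
    """Pick one node with probability proportional to its selection weight.

    usage_by_node is a hypothetical mapping of node name -> disk usage in percent.
    rng can be the random module or a random.Random instance.
    """
    nodes = list(usage_by_node)
    weights = [selection_weight(usage_by_node[n]) for n in nodes]
    if sum(weights) == 0:
        # Assumption: if every node is full, fall back to a uniform choice.
        return rng.choice(nodes)
    return rng.choices(nodes, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical cluster: one nearly full node, one half-used, one freshly added.
    usage = {"dn1": 90.0, "dn2": 40.0, "dn3": 0.0}
    print(choose_datanode(usage))
```

Under this sketch a new or restored empty node gets the highest weight (1.0) and a nearly full node is rarely chosen, which is the effect both comments are after. As noted above, a change like this would be meant to replace only the "random" step of the placement policy; the local-machine and rack-based choices would stay as they are.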