[ 
https://issues.apache.org/jira/browse/HDFS-14675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909008#comment-16909008
 ] 

Stephen O'Donnell commented on HDFS-14675:
------------------------------------------

Both of these settings are per-datanode limits. The speed of balancing at one 
node is dictated by how many concurrent moves it is allowed (i.e. how many 
blocks it can move in parallel) and the bandwidth that DN is allowed to use. 
Therefore, as the cluster grows, more nodes are involved in balancing and the 
aggregate speed increases. On a smaller cluster the aggregate speed is lower, 
because fewer nodes take part and each one is still throttled by the two 
settings mentioned here.
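
As a rough illustration of the scaling argument above, the following Python 
sketch computes a ceiling on aggregate balancing throughput. It is not taken 
from the balancer code. It assumes the only limit is the per-DN bandwidth 
setting, that 1MB means 1024 * 1024 bytes, and that every sending node can 
saturate its limit. The function names and the example cluster sizes are 
hypothetical. Real throughput will be lower because of the concurrent moves 
limit, small blocks, and other load.

```python
MIB = 1024 * 1024  # assumption: "MB" in the settings means 1024 * 1024 bytes


def aggregate_ceiling_mib_per_s(sending_nodes: int, bandwidth_per_dn_bytes: int) -> float:
    """Upper bound on cluster-wide balancing throughput, in MiB/s.

    Each sending datanode is capped at bandwidth_per_dn_bytes per second,
    so the ceiling grows linearly with the number of nodes taking part.
    """
    if sending_nodes < 0 or bandwidth_per_dn_bytes < 0:
        raise ValueError("inputs must be non-negative")
    return sending_nodes * bandwidth_per_dn_bytes / MIB


def best_case_hours(total_bytes: int, sending_nodes: int, bandwidth_per_dn_bytes: int) -> float:
    """Best-case time to move total_bytes if every sender runs at its cap."""
    rate_bytes_per_s = sending_nodes * bandwidth_per_dn_bytes
    if rate_bytes_per_s <= 0:
        raise ValueError("need at least one sending node and a positive bandwidth")
    return total_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    # Hypothetical clusters: same per-DN limit, different number of senders.
    for nodes in (5, 50):
        for per_dn_mib in (10, 100):
            ceiling = aggregate_ceiling_mib_per_s(nodes, per_dn_mib * MIB)
            print(f"{nodes} senders at {per_dn_mib} MiB/s each: ceiling {ceiling:.0f} MiB/s")
```

The point of the sketch is only that the ceiling is the product of the two 
numbers: raising the per-DN limit helps a small cluster the most, because it 
has few senders to multiply by.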

It is very hard to predict how much of a speed-up changing these parameters 
will bring, as it depends on many factors, such as:
 # How many nodes are under- and over-utilised.
 # The size of the blocks: are they tiny, or close to the configured block size?
 # Other load on the cluster.
 # The general network capacity, number of disks in the DN, etc.

From a support perspective, we see a lot of cases where the customer complains 
"the balancer is running too slowly", and our default answer to that is to 
increase concurrent moves to 250 and bandwidth to 1GB/s. This tends to get the 
balancer running at a fast speed, and has not been seen to cause any major 
issues on the clusters. Compared with the 250 / 1GB/s we have used on many 
clusters, the changes proposed here (100 / 100MB/s) are still fairly 
conservative, but I hope they get performance to a 'fast enough' level without 
stressing the cluster. Experience says that will be the case.

Note that originally these settings were 5 and 1MB/s for concurrent moves and 
bandwidth, which were much too low, so a previous Jira (HDFS-10297) increased 
the defaults to 50 / 10MB/s. We have found that is still not high enough in 
most cases, which is why this Jira was raised to increase them further.

> Increase Balancer Defaults Further
> ----------------------------------
>
>                 Key: HDFS-14675
>                 URL: https://issues.apache.org/jira/browse/HDFS-14675
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14675.001.patch
>
>
> HDFS-10297 increased the balancer defaults to 50 for 
> dfs.datanode.balance.max.concurrent.moves and to 10MB/s for 
> dfs.datanode.balance.bandwidthPerSec.
> We have found that these settings often have to be increased further as users 
> find the balancer operates too slowly with 50 and 10MB/s. We often recommend 
> moving concurrent moves to between 200 and 300 and setting the bandwidth to 
> 100 or even 1000MB/s, and these settings seem to work well in practice.
> I would like to suggest we increase the balancer defaults further. I would 
> suggest 100 for concurrent moves and 100MB/s for the bandwidth, but I would 
> like to know what others think on this topic too.



