[jira] Commented: (HDFS-1105) Balancer improvement

Hairong Kuang (JIRA) Thu, 22 Apr 2010 15:37:17 -0700

    [ https://issues.apache.org/jira/browse/HDFS-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860046#action_12860046 ]


Hairong Kuang commented on HDFS-1105:
-------------------------------------

Dmytro, I really like the improvements you proposed. We observed similar issues 
with the balancer in our clusters and are considering a similar idea: limiting 
the elapsed time of each iteration. I took a quick look at your patch. One 
comment: making the number of blocks to move in parallel to a given node 
configurable may not be useful, because each datanode is also configured to 
move 5 blocks in parallel.

> 3) it can hit namenode and the network pretty hard
This is probably caused by the call to NamenodeProtocol#getBlocks. The number 
of returned blocks is currently limited only by their total size. We should 
also limit the total number of blocks returned, so that the response size can 
be bounded, ideally within 1M bytes.
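
To make the idea of a second limit concrete, here is a minimal sketch of 
capping a block list by both total bytes and block count. It is not the actual 
getBlocks implementation: BlockListLimiter, select, maxBytes and maxBlocks are 
hypothetical names chosen for illustration, and blocks are represented only by 
their sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
final class BlockListLimiter {
    // Collects block sizes in order until either the byte budget has been
    // reached or the block-count cap is hit, whichever comes first.
    static List<Long> select(List<Long> blockSizes, long maxBytes, int maxBlocks) {
        List<Long> picked = new ArrayList<>();
        long totalBytes = 0;
        for (long size : blockSizes) {
            if (totalBytes >= maxBytes || picked.size() >= maxBlocks) {
                break;
            }
            picked.add(size);
            totalBytes += size;
        }
        return picked;
    }
}
```

With only a byte budget, a datanode holding many small blocks could produce a 
very long list; the count cap bounds the list length regardless of block size.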

> Balancer improvement
> --------------------
>
>                 Key: HDFS-1105
>                 URL: https://issues.apache.org/jira/browse/HDFS-1105
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Dmytro Molkov
>         Attachments: HDFS-1105.patch
>
>
> We were seeing some weird issues with the balancer in our cluster:
> 1) it can get stuck during an iteration and only restarting it helps
> 2) the iterations are highly inefficient. With a 20-minute iteration it 
> moves 7K blocks a minute for the first 6 minutes and only hundreds of blocks 
> in the next 14 minutes
> 3) it can hit namenode and the network pretty hard
> A few improvements we came up with as a result:
> Making balancer more deterministic in terms of running time of iteration, 
> improving the efficiency and making the load configurable:
> Make many of the constants configurable command line parameters: Iteration 
> length, number of blocks to move in parallel to a given node and in cluster 
> overall.
> Terminate transfers that are still in progress after iteration is over.
> Previously, the iteration time was the time window in which the balancer was 
> scheduling the moves, and then it would wait indefinitely for the moves to 
> finish. Each scheduling task can run up to the iteration time or even 
> longer. This means that if you have too many of them and they are long, your 
> actual iterations are longer than 20 minutes. Now each scheduling task knows 
> the start time of the iteration and schedules moves only if it has not 
> run out of time. So the tasks that start after the iteration is over 
> will not schedule any moves.
> The number of move threads and dispatch threads is configurable so that 
> depending on the load of the cluster you can run it slower.
> I will attach a patch, please let me know what you think and what can be done 
> better.
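
The deadline check described in the quoted proposal could look roughly like 
the sketch below. This is an assumption about the shape of the logic, not the 
code in HDFS-1105.patch: IterationDeadline and mayScheduleAt are hypothetical 
names, and the current time is passed in explicitly so the check is easy to 
test.

```java
// Hypothetical helper, not taken from the patch.
final class IterationDeadline {
    private final long deadlineMillis;

    IterationDeadline(long iterationStartMillis, long iterationLengthMillis) {
        this.deadlineMillis = iterationStartMillis + iterationLengthMillis;
    }

    // A scheduling task would call this before scheduling each move and
    // stop scheduling once it returns false.
    boolean mayScheduleAt(long nowMillis) {
        return nowMillis < deadlineMillis;
    }
}
```

Because every task shares the same iteration start time, a task that begins 
after the deadline schedules nothing, which keeps the scheduling window within 
the configured iteration length.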

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
