[ 
https://issues.apache.org/jira/browse/HDFS-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996803#comment-15996803
 ] 

Daryn Sharp commented on HDFS-8818:
-----------------------------------

Rather than creating fixed thread pools, which will sit idle as cluster size 
increases, perhaps cached thread pools that spawn threads dynamically would help.
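
As a rough illustration only (not taken from the patch), a pool that spawns 
threads on demand and lets them expire when idle could be sketched with plain 
java.util.concurrent.  The class name, the helper newDynamicPool and the sizes 
used are hypothetical; Executors.newCachedThreadPool() is the unbounded 
variant of the same idea.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Balancer code.
class DynamicPoolSketch {

  /**
   * Builds a pool that starts with zero threads, grows to at most maxThreads
   * as tasks arrive, and lets idle threads exit after idleSeconds.
   */
  static ThreadPoolExecutor newDynamicPool(int maxThreads, long idleSeconds) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads, maxThreads,             // core == max, so the cap is hard
        idleSeconds, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());
    // Without this call, core threads would stay alive forever once started.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = newDynamicPool(8, 60);
    System.out.println("threads before any work: " + pool.getPoolSize());
    for (int i = 0; i < 3; i++) {
      pool.execute(() -> { /* stand-in for one block move */ });
    }
    System.out.println("threads after 3 tasks: " + pool.getPoolSize());
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```

The bounded form is used here, on the assumption that an upper limit on 
threads is still wanted; idle threads cost nothing once they time out.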

The previous balancer was easy to configure.  I don't fully understand the 
previous design, but a simpler approach that achieves the same improvement 
would be to return to a single fixed thread pool with intelligent queuing of 
work: i.e., interleaving work for all targets, with a max queued limit, so that 
replications are distributed evenly across nodes.  I'm assuming it didn't do that.
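
A minimal sketch of that idea, assuming work is grouped per target and that a 
semaphore is an acceptable way to enforce the queue limit.  The class, the 
method names and the datanode labels are made up for illustration and are not 
from the Balancer code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Balancer code.
class InterleavedDispatchSketch {

  /** Round-robin merge: one item per target per pass until all are drained. */
  static <T> List<T> interleave(Map<String, List<T>> workByTarget) {
    List<Deque<T>> queues = new ArrayList<>();
    for (List<T> work : workByTarget.values()) {
      queues.add(new ArrayDeque<>(work));
    }
    List<T> order = new ArrayList<>();
    boolean progressed = true;
    while (progressed) {
      progressed = false;
      for (Deque<T> q : queues) {
        T next = q.poll();
        if (next != null) {
          order.add(next);
          progressed = true;
        }
      }
    }
    return order;
  }

  /**
   * Runs tasks on a single fixed pool.  At most maxQueued tasks are submitted
   * but unfinished (queued or running) at any time; the caller blocks beyond that.
   */
  static void dispatch(List<Runnable> tasks, int threads, int maxQueued)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    Semaphore slots = new Semaphore(maxQueued);
    try {
      for (Runnable task : tasks) {
        slots.acquire();                 // producer waits at the limit
        pool.execute(() -> {
          try {
            task.run();
          } finally {
            slots.release();             // free the slot even on failure
          }
        });
      }
    } finally {
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Map<String, List<String>> moves = new LinkedHashMap<>();
    moves.put("dn1", Arrays.asList("dn1-a", "dn1-b", "dn1-c"));
    moves.put("dn2", Arrays.asList("dn2-a"));
    moves.put("dn3", Arrays.asList("dn3-a", "dn3-b"));

    List<String> order = interleave(moves);
    System.out.println(order);
    // prints [dn1-a, dn2-a, dn3-a, dn1-b, dn3-b, dn1-c]

    List<Runnable> tasks = new ArrayList<>();
    for (String move : order) {
      tasks.add(() -> System.out.println("moving " + move));
    }
    dispatch(tasks, 2, 4);
  }
}
```

With this ordering no single target can fill the queue ahead of the others, 
and the limit keeps the backlog bounded regardless of cluster size.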

{quote}
Do you have HDFS-8824 in your runs? I suspect the first run has it but the 
second one does not.
bq. over time older nodes will end up with only small blocks, if it is set 
permanently? It will look good for quick balancing, but may not be good in long 
term
{quote}

Exactly.  We had to disable the feature because nodes became concentrated with 
small blocks.  getBlocks becomes increasingly expensive as it searches for a 
dwindling number of large blocks on unbalanced nodes.  The client load 
increases on those nodes due to block volume.  Eventually the balancer just 
plays a shell game moving the larger blocks.

The current balancer probably works great when adding nodes, but not as a 
continuous service.  If it is not reverted, something has to be done to restore 
the previous steady-state performance.

> Allow Balancer to run faster
> ----------------------------
>
>                 Key: HDFS-8818
>                 URL: https://issues.apache.org/jira/browse/HDFS-8818
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
>         Attachments: bal1.png, bal2.png, h8818_20150723.patch, 
> h8818_20150727.patch, HDFS-8818-branch-2.7.00.patch
>
>
> The original design of Balancer intentionally makes it run slowly so 
> that the balancing activities won't affect the normal cluster activities and 
> the running jobs.
> There is a new use case: a cluster admin may choose to balance the cluster 
> when the cluster load is low, or in a maintenance window.  So we should 
> have an option to allow Balancer to run faster.



