> ________________________________
>From: Lior Schachter [lior...@gmail.com]
>Sent: Wednesday, November 30, 2011 7:04 PM
>To: hdfs-user@hadoop.apache.org
>Subject: Re: Load balancing HDFS
>
>Thanks Uma.
>So when HDFS writes data, it distributes the blocks only according to
>percentage usage (and not actual utilization)?

Yes. Strictly speaking, it checks several factors while choosing the targets for a block:

1) The xceiver count: the node's number of active transfers should not be much higher than the average across nodes.
2) Whether the node is already decommissioned, or decommissioning is in progress.
3) If blocksize * MIN_BLKS_FOR_WRITE > remaining space, the node is not treated as a good target.
4) Whether the current rack already has too many chosen nodes.

For your case, check the 3rd point. It means the policy does not consider a node's utilization at all until the node's free space drops below that tolerance boundary.
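To make the 3rd point concrete, here is a minimal Java sketch of that style of check. It is only an illustration of the logic described above; the class, field, and method names are invented for the sketch and are not the actual BlockPlacementPolicyDefault internals.

// Simplified sketch of the target-selection checks described above.
// All names here are illustrative, not actual HDFS internals.
public class TargetCheckSketch {

  // Minimum number of full blocks' worth of free space a target must keep.
  static final int MIN_BLKS_FOR_WRITE = 5;

  // Hypothetical snapshot of a datanode's state, for this sketch only.
  static class NodeInfo {
    long remainingBytes;            // free space reported by the datanode
    int xceiverCount;               // active transfer threads on the node
    boolean decommissionInProgress; // node being decommissioned
  }

  // Returns true if the node may be chosen as a target for a new block.
  static boolean isGoodTarget(NodeInfo node, long blockSize,
                              double avgXceiverCount) {
    // 2) skip nodes that are decommissioned or decommissioning
    if (node.decommissionInProgress) {
      return false;
    }
    // 3) the remaining-space check: the node must have room for at least
    //    MIN_BLKS_FOR_WRITE full blocks, regardless of how full it is
    //    percentage-wise.
    if (blockSize * MIN_BLKS_FOR_WRITE > node.remainingBytes) {
      return false;
    }
    // 1) skip nodes whose transfer load is far above the cluster average
    if (node.xceiverCount > 2.0 * avgXceiverCount) {
      return false;
    }
    // 4) the per-rack limit is enforced by the caller over the set of
    //    already-chosen targets, so it is omitted from this sketch.
    return true;
  }
}

The point to notice for your case is that the remaining-space test is an absolute free-space floor, not a percentage: a nearly full 6TB node keeps accepting blocks as long as it can still hold MIN_BLKS_FOR_WRITE more of them. Evening out utilization across the 6TB and 3TB nodes is the balancer's job, e.g. bin/hadoop balancer -threshold 5, where the threshold is the allowed deviation, in percentage points of utilization, from the cluster average.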
>I think that running balancer between every job is overkill. I prefer to
>format the existing nodes and give them 3TB.
>Lior

I agree that running the balancer after every job is overkill.


On Wed, Nov 30, 2011 at 3:02 PM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:

The default block placement policy checks the remaining space as follows: if the remaining space on a node is greater than blksize * MIN_BLKS_FOR_WRITE (default 5), it treats that node as good.

One option may be to run the balancer in between, after some jobs have completed, to move blocks based on DN utilization. I am not sure this will work with your requirements.

Regards,
Uma

________________________________
From: Lior Schachter [lior...@gmail.com]
Sent: Wednesday, November 30, 2011 5:55 PM
To: hdfs-user@hadoop.apache.org
Subject: Load balancing HDFS

Hi all,

We currently have a 10-node cluster with 6TB per machine. We are buying a few more nodes and are considering giving them only 3TB each.

By default, HDFS assigns blocks according to used capacity, percentage-wise. This means that the old nodes will contain more data. We would prefer that the nodes (6TB, 3TB) be balanced by actual used space, so that M/R jobs work better. We don't expect to exceed the 3TB limit (we would buy more machines first).

Thanks,
Lior