Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-12-05 Thread Ajit Ratnaparkhi
Hi,

The DFS data directory on a datanode stores blocks in the following
directory structure. All blocks are stored under:
dfs.data.dir/current/

This directory contains some blocks and some subdirectories named
'subdir*' (e.g. subdir0, subdir1, ..., subdir33, ..., subdir63).

To be precise, each directory in the hierarchy rooted at
dfs.data.dir/current/ contains at most 64 blocks (data + metadata files)
plus at most 64 subdirectories (named subdir0 to subdir63).
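
The layout described above can be checked with a short script before and after moving anything. This is a minimal sketch, not a Hadoop tool. It assumes three things: block data files are named `blk_<id>` (next to `blk_<id>_<genstamp>.meta` metadata files), subdirectories are named `subdir<N>`, and the per-directory limit is 64 as described. `check_layout` is a hypothetical helper name.

```python
import os
import re

# Assumed naming: blk_<id> for block data files (ids may be negative) and
# subdir<N> for subdirectories; .meta files are not counted separately.
BLOCK_RE = re.compile(r"^blk_-?\d+$")
SUBDIR_RE = re.compile(r"^subdir\d+$")


def check_layout(current_dir, limit=64):
    """Return (directory, n_blocks, n_subdirs) for every directory over `limit`.

    Walks the tree rooted at `current_dir` (e.g. <dfs.data.dir>/current) and
    counts block data files and subdir* children in each directory.
    """
    violations = []
    for dirpath, dirnames, filenames in os.walk(current_dir):
        n_blocks = sum(1 for name in filenames if BLOCK_RE.match(name))
        n_subdirs = sum(1 for name in dirnames if SUBDIR_RE.match(name))
        if n_blocks > limit or n_subdirs > limit:
            violations.append((dirpath, n_blocks, n_subdirs))
    return violations
```

For example, calling `check_layout` on a volume's `current/` directory returns an empty list when every directory is within the limit.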

So my question is: whenever I manually transfer blocks across disks to
balance load onto the newly added disks, do I need to preserve this
directory-hierarchy constraint, or will simply putting the blocks in
dfs.data.dir/current/ work?

thanks,
Ajit.

Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-12-05 Thread Harsh J
Ajit,

Just move/merge subdirectories; it's the easiest way to go about it and does
no harm. For confidence, you can also fire up a test cluster and try these
things out :)
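
The move/merge suggestion could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the datanode is stopped, `dst_current` is the `current/` directory of another volume on the same datanode, and an existing file at the destination must never be overwritten. `move_subdir` is a hypothetical helper, not part of Hadoop. Moving whole subdirectories keeps each block file next to its `.meta` file.

```python
import os
import shutil


def move_subdir(src_subdir, dst_current):
    """Move one subdir* tree from one volume into another volume's current/.

    If dst_current has no entry with the same name, the tree is moved whole.
    Otherwise its contents are merged recursively; an existing file at the
    destination is never overwritten (FileExistsError is raised instead).
    Assumes the datanode is stopped while this runs.
    """
    name = os.path.basename(os.path.normpath(src_subdir))
    target = os.path.join(dst_current, name)
    if not os.path.exists(target):
        shutil.move(src_subdir, target)  # simple case: no name clash
        return
    for entry in os.listdir(src_subdir):
        src_path = os.path.join(src_subdir, entry)
        dst_path = os.path.join(target, entry)
        if os.path.isdir(src_path):
            move_subdir(src_path, target)  # merge nested subdir* recursively
        elif os.path.exists(dst_path):
            raise FileExistsError(dst_path)
        else:
            shutil.move(src_path, dst_path)
    os.rmdir(src_subdir)  # everything has been moved out, so it is empty
```

If a clash is found partway through, the function stops with some files already moved. Merging into an existing subdirectory can also leave it with more than 64 entries, which is exactly the kind of thing worth confirming on a test cluster first.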

On 05-Dec-2011, at 2:59 PM, Ajit Ratnaparkhi wrote:

 Hi,
 
 The DFS data directory on a datanode stores blocks in the following
 directory structure. All blocks are stored under:
 dfs.data.dir/current/
 
 This directory contains some blocks and some subdirectories named
 'subdir*' (e.g. subdir0, subdir1, ..., subdir33, ..., subdir63).
 
 To be precise, each directory in the hierarchy rooted at
 dfs.data.dir/current/ contains at most 64 blocks (data + metadata files)
 plus at most 64 subdirectories (named subdir0 to subdir63).
 
 So my question is: whenever I manually transfer blocks across disks to
 balance load onto the newly added disks, do I need to preserve this
 directory-hierarchy constraint, or will simply putting the blocks in
 dfs.data.dir/current/ work?
 
 thanks,
 Ajit.
 

RE: What does hdfs balancer do after adding more disks to existing datanode.

2011-11-22 Thread Uma Maheswara Rao G
Hi,

The current volume-choosing policy is round-robin. Since the DN has new
disks, its overall utilization drops, so the balancer may move some blocks to
this node. But the volume is still chosen the same way when placing each
block. AFAIK, it won't do any special balancing between disks in the same
node. Please correct me if I have misunderstood your question.
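
A small simulation shows why round-robin placement alone does not even out the disks. It is loosely modelled on the round-robin behaviour described above and is not Hadoop's implementation. The `Volume` and `RoundRobinChooser` names, the skip-if-full rule and the sizes (in GB) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    capacity: int  # GB in the example below
    used: int

    @property
    def available(self) -> int:
        return self.capacity - self.used


class RoundRobinChooser:
    """Hand out volumes in turn, skipping any without room for the block."""

    def __init__(self) -> None:
        self._next = 0

    def choose(self, volumes, block_size):
        for _ in range(len(volumes)):
            vol = volumes[self._next % len(volumes)]
            self._next = (self._next + 1) % len(volumes)
            if vol.available >= block_size:
                return vol
        raise OSError("no volume has room for the block")


def place_blocks(volumes, n_blocks, block_size):
    """Place n_blocks blocks one at a time using round-robin volume choice."""
    chooser = RoundRobinChooser()
    for _ in range(n_blocks):
        chooser.choose(volumes, block_size).used += block_size


# The question's node: seven 1000 GB disks at 80%, two empty 2000 GB disks.
volumes = [Volume(f"old{i}", 1000, 800) for i in range(7)]
volumes += [Volume(f"new{i}", 2000, 0) for i in range(2)]
place_blocks(volumes, 90, 1)  # 90 blocks of 1 GB: 10 GB land on every disk
for v in volumes:
    print(f"{v.name}: {v.used / v.capacity:.1%}")
```

Every volume receives the same 10 GB. The old disks go from 80% to 81% full while the new ones only reach 0.5%, so the old disks still fill up first.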





Regards,

Uma



From: Ajit Ratnaparkhi [ajit.ratnapar...@gmail.com]
Sent: Tuesday, November 22, 2011 5:13 PM
To: hdfs-user@hadoop.apache.org; hdfs-...@hadoop.apache.org
Subject: What does hdfs balancer do after adding more disks to existing datanode.


Hi,

If I add disks to an existing datanode (assume the existing datanode has
seven 1 TB disks that are already 80% full, and I add two new 2 TB disks
that are 0% full) and then run the balancer, does the balancer balance data
within a datanode? I.e., will it move data from the existing disks to the
newly added disks so that all disks are approximately equally full?

thanks,
Ajit.
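
Here is the arithmetic for the scenario in the question. Total capacity is 7 × 1 TB + 2 × 2 TB = 11 TB, and total usage is 7 × 0.8 TB = 5.6 TB. An even spread would therefore leave every disk about 5.6 / 11 ≈ 50.9% full. To get there, each old disk would shed about 0.29 TB and each new disk would receive about 1.02 TB, roughly 2 TB moved in total. The sketch below does that calculation. It assumes all capacity is usable by DFS (no reserved space or non-DFS data), and `balance_plan` is a hypothetical helper.

```python
def balance_plan(volumes):
    """Return {name: amount_to_move_out} that gives every volume the same fill.

    `volumes` maps a volume name to (capacity, used) in any one unit.
    Positive results mean "move this much off", negative mean "receive".
    Assumes all capacity is usable by DFS (no reserved or non-DFS space).
    """
    total_capacity = sum(cap for cap, _ in volumes.values())
    total_used = sum(used for _, used in volumes.values())
    target = total_used / total_capacity  # common fill fraction
    return {name: used - target * cap for name, (cap, used) in volumes.items()}


# The question's node, in GB.
disks = {f"old{i}": (1000, 800) for i in range(7)}
disks.update({f"new{i}": (2000, 0) for i in range(2)})
plan = balance_plan(disks)
```

Here `plan["old0"]` comes out at about 290.9 GB and `plan["new0"]` at about -1018.2 GB. Treat these as a rough target for how many block files to move, not an exact schedule.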


Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-11-22 Thread Ajit Ratnaparkhi
Thanks for the help, Joey!

Does simply copying block files from one drive to another work?
Isn't there metadata maintained on the datanode about block locations on that
datanode? If not, how does the datanode know which blocks are stored on it?

-Ajit.

On Tue, Nov 22, 2011 at 5:25 PM, Joey Echeverria j...@cloudera.com wrote:

 The balancer only balances between datanodes. This means the new
 drives won't get used until you start writing new data to them. If you
 want to balance the drives on a node, you need to:

 1) copy a bunch of block files from the old drives to the new drives
 2) shut down the datanode
 3) delete the old block files
 4) configure the datanode to see the new drives
 5) start the datanode

 -Joey

 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434
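
Step 4 in Joey's list (configuring the datanode to see the new drives) means adding the new mount points to the comma-separated `dfs.data.dir` property in hdfs-site.xml. The sketch below appends directories to that property using Python's `xml.etree.ElementTree`. `add_data_dirs` and the `/data/N` paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_data_dirs(hdfs_site_xml, new_dirs, key="dfs.data.dir"):
    """Return hdfs-site.xml text with `new_dirs` appended to property `key`.

    Directories already listed are not added twice; the property is created
    if it does not exist yet.
    """
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == key:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            break
    else:
        # No existing property with this name: create <property><name/><value/>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        value = ET.SubElement(prop, "value")
    dirs = [d.strip() for d in (value.text or "").split(",") if d.strip()]
    for d in new_dirs:
        if d not in dirs:
            dirs.append(d)
    value.text = ",".join(dirs)
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops XML comments and the XML declaration when it re-serializes the file, so a manual edit may be preferable for a hand-maintained hdfs-site.xml.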



Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-11-22 Thread Inder Pall
This is an interesting use case. Based on my understanding, datanodes send
block information to the namenode, so if you move the block files around, the
old datanode should stop reporting those blocks and the new one should start.
Each block is a separate file.

It would be better to try this out first, but I don't think this is
recommended for production use.

inder
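
Ajit also asked how the datanode knows which blocks it holds. As far as I understand the datanode of this era, it does not keep a separate index of block locations. Instead it discovers its blocks by scanning the `current/` directories of its configured volumes at startup, and then reports them to the namenode. The sketch below imitates such a scan. The file-name patterns, the rule that only blocks with both a data file and a `.meta` file are reported, and the `scan_volumes` name are all assumptions; this is not the datanode's actual code.

```python
import os
import re

# Assumed file names: blk_<id> (data) and blk_<id>_<genstamp>.meta (checksums).
DATA_RE = re.compile(r"^blk_(-?\d+)$")
META_RE = re.compile(r"^blk_(-?\d+)_(\d+)\.meta$")


def scan_volumes(data_dirs):
    """Build {block_id: (genstamp, data_path)} by walking each <dir>/current.

    If a block appears on two volumes, this sketch keeps the last one seen.
    """
    data, meta = {}, {}
    for d in data_dirs:
        for dirpath, _, filenames in os.walk(os.path.join(d, "current")):
            for f in filenames:
                m = DATA_RE.match(f)
                if m:
                    data[int(m.group(1))] = os.path.join(dirpath, f)
                    continue
                m = META_RE.match(f)
                if m:
                    meta[int(m.group(1))] = int(m.group(2))
    # Report only blocks that have both a data file and a meta file.
    return {bid: (meta[bid], path) for bid, path in data.items() if bid in meta}
```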

Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-11-22 Thread Harsh J
Ajit / Inder,

Please see 
http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F

-- 
Harsh J


Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-11-22 Thread Ajit Ratnaparkhi
Thanks, Harsh!
