Re: What does hdfs balancer do after adding more disks to existing datanode.
Hi,

The dfs data directory on a datanode stores blocks in the following directory structure. All blocks are stored under dfs.data.dir/current/. This directory contains some blocks and some subdirectories named 'subdir*' (e.g. subdir0, subdir1, ..., subdir33, ..., subdir63). To be precise, each directory in the hierarchy rooted at dfs.data.dir/current/ contains at most 64 blocks (data + metadata) plus at most 64 subdirectories (named subdir0 to subdir63).

So my question is: whenever I manually transfer blocks across disks to balance load onto newly added disks, do I need to maintain this constraint on the directory hierarchy, or will just putting the blocks in dfs.data.dir/current/ work?

thanks,
Ajit.

On Tue, Nov 22, 2011 at 11:04 PM, Ajit Ratnaparkhi ajit.ratnapar...@gmail.com wrote:

Thanks Harsh!

On Tue, Nov 22, 2011 at 10:05 PM, Harsh J ha...@cloudera.com wrote:

Ajit / Inder,
Please see http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F

On Tue, Nov 22, 2011 at 9:44 PM, Ajit Ratnaparkhi ajit.ratnapar...@gmail.com wrote:

Thanks for the help, Joey!
Does just copying block files from one drive to another work? Isn't there metadata maintained at the datanode about block locations on that datanode? If not, then how does the datanode know about the blocks stored on it?
-Ajit.

On Tue, Nov 22, 2011 at 5:25 PM, Joey Echeverria j...@cloudera.com wrote:

The balancer only balances between datanodes. This means the new drives won't get used until you start writing new data to them.
If you want to balance the drives on a node, you need to:
1) copy a bunch of block files from the old drives to the new drives
2) shut down the datanode
3) delete the old block files
4) configure the datanode to see the new drives
5) start the datanode
-Joey

On Tue, Nov 22, 2011 at 6:43 AM, Ajit Ratnaparkhi ajit.ratnapar...@gmail.com wrote:

Hi,
If I add additional disks to an existing datanode (assume the existing datanode has 7 1TB disks which are already 80% full, and I then add two new 2TB disks that are 0% full) and then run the balancer, does the balancer balance data within a datanode? I.e., will it move data from the existing disks to the newly added disks so that all disks are approximately equally full?
thanks,
Ajit.

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434

--
Harsh J
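To put numbers on the example in the original question (seven 1TB disks at 80% and two empty 2TB disks), here is a rough Python sketch of how much data would have to move for every disk to end up equally full. The function name even_fill_plan is made up for illustration; it only does the arithmetic and does not touch HDFS.

```python
def even_fill_plan(disks):
    """disks: list of (capacity_tb, used_tb) tuples.

    Returns (target_fraction, deltas), where deltas[i] is the amount in TB
    to add to disk i (positive) or remove from it (negative) so that every
    disk ends up at the same fill fraction.
    """
    total_capacity = sum(capacity for capacity, _ in disks)
    total_used = sum(used for _, used in disks)
    target = total_used / total_capacity
    deltas = [capacity * target - used for capacity, used in disks]
    return target, deltas


# Figures taken from the question: 7 x 1TB at 80% full, 2 x 2TB at 0% full.
disks = [(1.0, 0.8)] * 7 + [(2.0, 0.0)] * 2
target, deltas = even_fill_plan(disks)

print(f"target fill: {target:.1%}")
for (capacity, used), delta in zip(disks, deltas):
    print(f"{capacity:.0f}TB disk, {used:.1f}TB used: move {delta:+.3f}TB")
```

With these figures the disks would even out at roughly 51% full, which means taking about 0.29TB off each old disk and putting about 1.02TB onto each new one.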
Re: What does hdfs balancer do after adding more disks to existing datanode.
Ajit,

Just move/merge subdirectories - it's the easiest way to go about it and does no harm. For confidence, you can also fire up a test cluster and test these things out :)

On 05-Dec-2011, at 2:59 PM, Ajit Ratnaparkhi wrote:

Hi,

The dfs data directory on a datanode stores blocks in the following directory structure. All blocks are stored under dfs.data.dir/current/. This directory contains some blocks and some subdirectories named 'subdir*' (e.g. subdir0, subdir1, ..., subdir33, ..., subdir63). To be precise, each directory in the hierarchy rooted at dfs.data.dir/current/ contains at most 64 blocks (data + metadata) plus at most 64 subdirectories (named subdir0 to subdir63).

So my question is: whenever I manually transfer blocks across disks to balance load onto newly added disks, do I need to maintain this constraint on the directory hierarchy, or will just putting the blocks in dfs.data.dir/current/ work?

thanks,
Ajit.

--
Harsh J
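The "move/merge subdirectories" advice above could be scripted roughly as follows. This is only a sketch under assumptions: the helper name move_subdir is hypothetical, picking the first unused subdirN name on the destination is my own choice for avoiding a name clash (it is not something stated in this thread), and it should only be run while the datanode is stopped.

```python
import os
import shutil


def move_subdir(src_subdir, dest_current):
    """Move one whole 'subdirN' directory into dest_current.

    src_subdir:   e.g. <old dfs.data.dir>/current/subdir5
    dest_current: e.g. <new dfs.data.dir>/current
    The directory is renamed to the first subdir0..subdir63 name that is not
    already taken on the destination (an assumption, to stay inside the
    layout described in the thread). Run only with the datanode shut down.
    """
    for n in range(64):
        candidate = os.path.join(dest_current, f"subdir{n}")
        if not os.path.exists(candidate):
            # shutil.move renames within a filesystem and falls back to
            # copy-then-delete across filesystems (i.e. across disks).
            shutil.move(src_subdir, candidate)
            return candidate
    raise RuntimeError(f"no free subdir0..subdir63 name in {dest_current}")
```

A call would look like move_subdir("/data1/dfs/current/subdir5", "/data8/dfs/current"), where both paths are placeholders for your own dfs.data.dir entries. Moving a whole subdirectory keeps each block file next to its metadata file, which is why it is simpler than moving individual blocks.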
RE: What does hdfs balancer do after adding more disks to existing datanode.
Hi,

The current volume choosing policy is round robin. Since the DN got new disks, the balancer will move some blocks to this node, but the volume is still chosen the same way when placing a block. AFAIK, it won't do any special balancing between disks in the same node. Please correct me if I have misunderstood your question.

Regards,
Uma

From: Ajit Ratnaparkhi [ajit.ratnapar...@gmail.com]
Sent: Tuesday, November 22, 2011 5:13 PM
To: hdfs-user@hadoop.apache.org; hdfs-...@hadoop.apache.org
Subject: What does hdfs balancer do after adding more disks to existing datanode.

Hi,
If I add additional disks to an existing datanode (assume the existing datanode has 7 1TB disks which are already 80% full, and I then add two new 2TB disks that are 0% full) and then run the balancer, does the balancer balance data within a datanode? I.e., will it move data from the existing disks to the newly added disks so that all disks are approximately equally full?
thanks,
Ajit.
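To illustrate what round-robin volume choosing implies for the new disks, here is a small Python simulation. It is a simplified sketch, not Hadoop's actual Java code: the class name RoundRobinVolumeChooser and the unit-sized blocks are assumptions made for the example.

```python
class RoundRobinVolumeChooser:
    """Pick volumes in rotation, skipping any volume without enough room."""

    def __init__(self, free_space):
        self.free = list(free_space)  # free space per volume, arbitrary units
        self.next_index = 0

    def choose(self, block_size):
        volume_count = len(self.free)
        for _ in range(volume_count):
            index = self.next_index
            self.next_index = (self.next_index + 1) % volume_count
            if self.free[index] >= block_size:
                self.free[index] -= block_size
                return index
        raise OSError("no volume has room for the block")


# Seven old volumes with 200 units free each, two new ones with 2000 each.
chooser = RoundRobinVolumeChooser([200] * 7 + [2000] * 2)
counts = [0] * 9
for _ in range(900):
    counts[chooser.choose(1)] += 1
print(counts)
```

In this run each of the nine volumes receives 100 blocks, so the nearly full old disks keep filling at the same rate as the empty new ones. The new disks only start taking a larger share once the old ones run out of room.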
Re: What does hdfs balancer do after adding more disks to existing datanode.
Thanks for Help Joey!
Does just copying block files from one drive to another work? Isn't there metadata maintained at datanode about block locations on that datanode? If not, then how does datanode know about blocks stored on it?
-Ajit.

On Tue, Nov 22, 2011 at 5:25 PM, Joey Echeverria j...@cloudera.com wrote:

The balancer only balances between datanodes. This means the new drives won't get used until you start writing new data to them.
If you want to balance the drives on a node, you need to
1) copy a bunch of block files from the old drives to the new drives
2) shutdown the datanode
3) delete the old block files
4) configure the datanode to see the new drives
5) start the datanode
-Joey
Re: What does hdfs balancer do after adding more disks to existing datanode.
This is an interesting use case. Based on my understanding, datanodes send block information to the namenode, so if you move the block files around, the old datanode should stop reporting them and the new node would start reporting them. Each block is a separate file. It would be better to try this out, but I don't think it is recommended for production use.

inder

On Nov 22, 2011 9:45 PM, Ajit Ratnaparkhi ajit.ratnapar...@gmail.com wrote:

Thanks for the help, Joey!
Does just copying block files from one drive to another work? Isn't there metadata maintained at the datanode about block locations on that datanode? If not, then how does the datanode know about the blocks stored on it?
-Ajit.
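As a rough picture of why moved block files can be found again, here is a Python sketch that walks every configured data directory and records where each block file sits, the kind of inventory a datanode could rebuild from disk and then report to the namenode. It is an illustration only, not the datanode's real code: the function name scan_volumes is hypothetical, and the blk_<id> / blk_<id>_<stamp>.meta file naming is an assumption about the on-disk layout.

```python
import os
import re

# Assumed naming: data file "blk_<id>", metadata file "blk_<id>_<stamp>.meta".
BLOCK_DATA_FILE = re.compile(r"^blk_-?\d+$")


def scan_volumes(data_dirs):
    """Return {block_file_name: directory} for every block data file found
    anywhere under <data_dir>/current/ in each configured data directory."""
    found = {}
    for data_dir in data_dirs:
        current = os.path.join(data_dir, "current")
        for root, _dirs, files in os.walk(current):
            for name in files:
                if BLOCK_DATA_FILE.match(name):
                    found[name] = root
    return found
```

Because the inventory comes from what is actually on disk, a block that was moved from one drive to another with the datanode stopped simply shows up under its new directory on the next scan.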
Re: What does hdfs balancer do after adding more disks to existing datanode.
Ajit / Inder,
Please see http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F

On Tue, Nov 22, 2011 at 9:44 PM, Ajit Ratnaparkhi ajit.ratnapar...@gmail.com wrote:

Thanks for Help Joey!
Does just copying block files from one drive to another work? Isn't there metadata maintained at datanode about block locations on that datanode? If not, then how does datanode know about blocks stored on it?
-Ajit.

--
Harsh J
Re: What does hdfs balancer do after adding more disks to existing datanode.
Thanks Harsh!

On Tue, Nov 22, 2011 at 10:05 PM, Harsh J ha...@cloudera.com wrote:

Ajit / Inder,
Please see http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F

--
Harsh J