Thanks for the quick reply, Harsh.

So if we can simply use "mv" to move blocks around on a node, as the FAQ entry suggests
(http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F),
I guess we can write a function that spreads the blocks from the directories being
removed round-robin across the remaining directories. That should be handy.
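The round-robin spreading idea could be sketched roughly as below. This is a hypothetical helper (the name `spread_blocks` and the flat `blk_*` layout are assumptions; older DataNodes may nest blocks under `subdir*` directories, which this sketch does not descend into), and the DataNode must be stopped before moving anything:

```python
import itertools
import shutil
from pathlib import Path

def spread_blocks(remove_dir, keep_dirs):
    """Move every block file (and its .meta companion) out of a data
    directory being removed, round-robin across the remaining ones.
    Assumes the DataNode is stopped and blocks sit flat in remove_dir."""
    targets = itertools.cycle(Path(d) for d in keep_dirs)
    moved = 0
    for blk in sorted(Path(remove_dir).glob("blk_*")):
        if blk.name.endswith(".meta"):
            continue  # .meta files travel together with their block file below
        dest = next(targets)
        dest.mkdir(parents=True, exist_ok=True)
        # Move the block file plus any matching blk_<id>_<genstamp>.meta
        for f in [blk, *blk.parent.glob(blk.name + "_*.meta")]:
            shutil.move(str(f), str(dest / f.name))
        moved += 1
    return moved
```

After restarting the DataNode (with the removed directory dropped from `dfs.data.dir`), it should pick the relocated blocks up from the surviving directories on its next scan.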
On Mon, Apr 4, 2011 at 9:32 PM, Harsh Chouraria <ha...@cloudera.com> wrote:
> Ah, that thought completely slipped my mind! You can definitely merge
> the data into another directory (as noted in
> http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F).
> But it could be cumbersome to balance one directory amongst all the
> others. No tool exists for doing this automatically, AFAIK.
>
> You're right, decommissioning could prove costly. I take that suggestion
> back (although the simpler version still stands).
>
> On Mon, Apr 4, 2011 at 4:51 PM, elton sky <eltonsky9...@gmail.com> wrote:
> > Thanks Harsh,
> > I will give it a go as you suggested.
> > But I feel it's not convenient in my case. Decommissioning is for taking
> > down a node. What I am doing here is taking out a directory. In my case,
> > all I need to do is copy the files in the directory I want to remove to
> > the remaining directories on the node, isn't it?
> > Why doesn't hadoop have this functionality?
> >
> > On Mon, Apr 4, 2011 at 5:05 PM, Harsh Chouraria <ha...@cloudera.com> wrote:
> >>
> >> Hello Elton,
> >>
> >> On Mon, Apr 4, 2011 at 11:44 AM, elton sky <eltonsky9...@gmail.com> wrote:
> >> > Now I want to remove 1 disk from each node, say /data4/hdfs-data. What
> >> > should I do to keep data integrity?
> >> > -Elton
> >>
> >> This can be done using the reliable 'decommission' process, by
> >> recommissioning the nodes after having reconfigured them (multiple nodes
> >> may be taken down per decommission round this way, but be wary of your
> >> cluster's actual used data capacity, and your minimum replication
> >> factors).
> >> Read more about the decommission process here:
> >>
> >> http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_user_guide.html#DFSAdmin+Command
> >> and http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission
> >>
> >> You may also have to run a cluster-wide balancer across the DNs after the
> >> entire process is done, to get rid of some skew in the distribution of
> >> data across them.
> >>
> >> (P.s. As an alternative solution, you may bring down one DataNode at a
> >> time, reconfigure it individually, and bring it up again; then repeat
> >> with the next one once the NN's fsck reports a healthy situation again (no
> >> under-replicated blocks). But decommissioning is the guaranteed safe
> >> way and is easier to do for some bulk of nodes.)
> >>
> >> --
> >> Harsh J
> >> Support Engineer, Cloudera
>
> --
> Harsh J
> Support Engineer, Cloudera
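For the one-node-at-a-time alternative Harsh describes, the "wait until fsck is healthy again" step could be automated with a small check like this. It is only a sketch: the helper names (`fsck_is_healthy`, `cluster_ready_for_next_node`) are made up, and it assumes the textual report format that `hadoop fsck /` prints in this era (an "Under-replicated blocks:" counter and a final "HEALTHY" verdict line):

```python
import re
import subprocess

def fsck_is_healthy(report):
    """Return True if an `hadoop fsck /` report shows a healthy
    filesystem with zero under-replicated blocks."""
    under = re.search(r"Under-replicated blocks:\s*(\d+)", report)
    return ("The filesystem under path '/' is HEALTHY" in report
            and under is not None
            and int(under.group(1)) == 0)

def cluster_ready_for_next_node():
    # Hypothetical wrapper: shells out to fsck on the live cluster;
    # adjust the hadoop binary path for your installation.
    result = subprocess.run(["hadoop", "fsck", "/"],
                            capture_output=True, text=True)
    return fsck_is_healthy(result.stdout)
```

A restart loop would then take down a DataNode, drop the directory from its `dfs.data.dir`, bring it back up, and poll `cluster_ready_for_next_node()` before touching the next node.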