Thanks for the quick reply, Harsh.

So if we can simply use "mv" to move blocks around on a node, as the FAQ entry suggests
(http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F),
I guess we can write a function that spreads the blocks from the directories being
removed round-robin across the remaining directories. That should be handy.
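The round-robin spreading idea could be sketched roughly as below. This is a hypothetical helper (the name `spread_blocks` and the flat `blk_*` layout are assumptions; older DataNodes may nest blocks under `subdir*` directories, which this sketch does not descend into), and the DataNode must be stopped before moving anything:

```python
import itertools
import shutil
from pathlib import Path

def spread_blocks(remove_dir, keep_dirs):
    """Move every block file (and its .meta companion) out of a data
    directory being removed, round-robin across the remaining ones.
    Assumes the DataNode is stopped and blocks sit flat in remove_dir."""
    targets = itertools.cycle(Path(d) for d in keep_dirs)
    moved = 0
    for blk in sorted(Path(remove_dir).glob("blk_*")):
        if blk.name.endswith(".meta"):
            continue  # .meta files travel together with their block file below
        dest = next(targets)
        dest.mkdir(parents=True, exist_ok=True)
        # Move the block file plus any matching blk_<id>_<genstamp>.meta
        for f in [blk, *blk.parent.glob(blk.name + "_*.meta")]:
            shutil.move(str(f), str(dest / f.name))
        moved += 1
    return moved
```

After restarting the DataNode (with the removed directory dropped from `dfs.data.dir`), it should pick the relocated blocks up from the surviving directories on its next scan.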
On Mon, Apr 4, 2011 at 9:32 PM, Harsh Chouraria <ha...@cloudera.com> wrote:
> Ah, that thought completely slipped my mind! You can definitely merge
> the data into another directory (as noted in
> http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F).
> But it could be cumbersome to balance one directory amongst all the
> others. No tool exists for doing this automatically, AFAIK.
>
> You're right, decommissioning could prove costly. I take that suggestion
> back (although the simpler version still stands).
>
> On Mon, Apr 4, 2011 at 4:51 PM, elton sky <eltonsky9...@gmail.com> wrote:
> > Thanks Harsh,
> > I will give it a go as you suggested.
> > But I feel it's not convenient in my case. Decommissioning is for taking
> > down a node. What I am doing here is taking out a directory. In my case,
> > all I need to do is copy the files in the directory I want to remove to
> > the remaining directories on the node, isn't it?
> > Why doesn't hadoop have this functionality?
> >
> > On Mon, Apr 4, 2011 at 5:05 PM, Harsh Chouraria <ha...@cloudera.com> wrote:
> >>
> >> Hello Elton,
> >>
> >> On Mon, Apr 4, 2011 at 11:44 AM, elton sky <eltonsky9...@gmail.com> wrote:
> >> > Now I want to remove 1 disk from each node, say /data4/hdfs-data. What
> >> > should I do to keep data integrity?
> >> > -Elton
> >>
> >> This can be done using the reliable 'decommission' process, by
> >> recommissioning the nodes after having reconfigured them (multiple nodes
> >> may be taken down per decommission round this way, but be wary of your
> >> cluster's actual used data capacity, and your minimum replication
> >> factors).
> >> Read more about the decommission process here:
> >>
> >> http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_user_guide.html#DFSAdmin+Command
> >> and http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission
> >>
> >> You may also have to run a cluster-wide balancer across the DNs after the
> >> entire process is done, to get rid of some skew in the distribution of
> >> data across them.
> >>
> >> (P.s. As an alternative solution, you may bring down one DataNode at a
> >> time, reconfigure it individually, and bring it up again; then repeat
> >> with the next one once the NN's fsck reports a healthy situation again (no
> >> under-replicated blocks). But decommissioning is the guaranteed safe
> >> way and is easier to do for some bulk of nodes.)
> >>
> >> --
> >> Harsh J
> >> Support Engineer, Cloudera
>
> --
> Harsh J
> Support Engineer, Cloudera
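For the one-node-at-a-time alternative Harsh describes, the "wait until fsck is healthy again" step could be automated with a small check like this. It is only a sketch: the helper names (`fsck_is_healthy`, `cluster_ready_for_next_node`) are made up, and it assumes the textual report format that `hadoop fsck /` prints in this era (an "Under-replicated blocks:" counter and a final "HEALTHY" verdict line):

```python
import re
import subprocess

def fsck_is_healthy(report):
    """Return True if an `hadoop fsck /` report shows a healthy
    filesystem with zero under-replicated blocks."""
    under = re.search(r"Under-replicated blocks:\s*(\d+)", report)
    return ("The filesystem under path '/' is HEALTHY" in report
            and under is not None
            and int(under.group(1)) == 0)

def cluster_ready_for_next_node():
    # Hypothetical wrapper: shells out to fsck on the live cluster;
    # adjust the hadoop binary path for your installation.
    result = subprocess.run(["hadoop", "fsck", "/"],
                            capture_output=True, text=True)
    return fsck_is_healthy(result.stdout)
```

A restart loop would then take down a DataNode, drop the directory from its `dfs.data.dir`, bring it back up, and poll `cluster_ready_for_next_node()` before touching the next node.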