Also, see the balancer tool that comes with Hadoop. It should be run
periodically (every week or so) to make sure that data is evenly
distributed across the cluster.

http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#Rebalancer
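For reference, invoking it looks roughly like this (sketch for the 0.19-era
scripts in the Hadoop bin/ directory; the -threshold value is the allowed
deviation, in percent, from average cluster utilization before blocks are
moved, and 10 is the default):

```shell
# Run the balancer as a background daemon:
bin/start-balancer.sh -threshold 10

# Or run it in the foreground and watch its progress;
# it exits when the cluster is balanced within the threshold:
bin/hadoop balancer -threshold 10
```

A lower threshold gives a more even distribution but takes longer and moves
more data over the network.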

- Aaron

On Sat, Jan 24, 2009 at 7:40 PM, jason hadoop <jason.had...@gmail.com>wrote:

> The extra blocks will be invalidated on the datanode once it is returned
> to service.
> If you want to save your namenode and network a lot of work, wipe the HDFS
> block storage directory before returning the datanode to service.
> dfs.data.dir will be the directory; most likely the value is
> ${hadoop.tmp.dir}/dfs/data
>
> Jason - Ex Attributor
>
> On Sat, Jan 24, 2009 at 6:19 PM, C G <parallel...@yahoo.com> wrote:
>
> > Hi All:
> >
> > I elected to take a node out of one of our grids for service.  Naturally
> > HDFS recognized the loss of the DataNode and did the right stuff, fixing
> > replication issues and ultimately delivering a clean file system.
> >
> > So now the node I removed is ready to go back in service.  When I return
> > it to service a bunch of files will suddenly have a replication of 4
> > instead of 3.  My questions:
> >
> > 1.  Will HDFS delete a copy of the data to bring replication back to 3?
> > 2.  If (1) above is yes, will it remove the copy by deleting from other
> > nodes, or will it remove files from the returned node, or both?
> >
> > The motivation for asking these questions is that I have a file system
> > which is extremely unbalanced - we recently doubled the size of the grid
> > with a few dozen terabytes already stored on the existing nodes.  I am
> > wondering if an easy way to restore some sense of balance is to cycle
> > through the old nodes, removing each one from service for several hours
> > and then returning it to service.
> >
> > Thoughts?
> >
> > Thanks in Advance,
> > C G
> >
>
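Jason's wipe-before-rejoin suggestion could look roughly like the following,
run on the datanode being returned to service (the data directory path here
is the default he mentions, ${hadoop.tmp.dir}/dfs/data, resolved for a /tmp
hadoop.tmp.dir; check your hadoop-site.xml before deleting anything):

```shell
# Stop the datanode process if it is running:
bin/hadoop-daemon.sh stop datanode

# Wipe the local block storage so the namenode doesn't have to
# invalidate thousands of stale block replicas one by one:
rm -rf /tmp/hadoop-${USER}/dfs/data

# Start the datanode again; it rejoins the cluster empty:
bin/hadoop-daemon.sh start datanode
```

An empty rejoining node also helps the original balancing question, since
new blocks (and balancer moves) will preferentially land on it.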
