Make sure you rebalance soon after adding the new nodes.  Otherwise, you will
have an age bias in file distribution: older files sit only on the original
nodes, while the new nodes hold only recent files.  This can, in some
applications, lead to strange effects.  For example, if you have log files
that you delete when they get too old, disk space will be freed non-uniformly.
This shouldn't affect performance much, but it can lead to a need to rebalance
again (and again) later.  Normal file churn combined with occasional
rebalancing should eventually fix this, but it is nicer to avoid the problem
in the first place.
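
To make the age-bias point concrete, here is a rough toy model in Python.  It
is not HDFS code: the function name simulate, the round-robin placement, the
one-file-per-day rate, and the 80-day cutoff are all assumptions made up for
illustration.  It only counts how many expired files each node gives back
when old logs are deleted, with and without a rebalance after adding nodes.

```python
# Toy model only: each node is a list of file "birth days".
# Real HDFS block placement is more involved; this just shows the age bias.

def simulate(rebalance):
    old_nodes = 8
    nodes = [[] for _ in range(old_nodes)]

    # Days 0-79: one file per day, placed round-robin on the 8 original nodes.
    for day in range(80):
        nodes[day % old_nodes].append(day)

    # Day 80: two empty nodes join the cluster.
    nodes += [[], []]

    if rebalance:
        # Spread all existing files evenly over all 10 nodes.
        files = sorted(f for node in nodes for f in node)
        nodes = [files[i::len(nodes)] for i in range(len(nodes))]

    # Days 80-99: new files are placed round-robin on all 10 nodes.
    for day in range(80, 100):
        nodes[day % len(nodes)].append(day)

    # Log expiry: delete everything born before day 80 and
    # report how many files each node frees.
    return [sum(1 for f in node if f < 80) for node in nodes]


print(simulate(rebalance=False))  # [10, 10, 10, 10, 10, 10, 10, 10, 0, 0]
print(simulate(rebalance=True))   # [8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
```

Without the rebalance, the expiry frees space only on the original eight
nodes, so the cluster drifts out of balance again; with it, every node frees
the same amount.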

On Fri, Aug 7, 2009 at 10:48 AM, Ravi Phulari <[email protected]> wrote:

> Use Rebalancer
>
>
> http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Rebalancer
> -
> Ravi
>
> On 8/7/09 10:38 AM, "prashant ullegaddi" <[email protected]> wrote:
>
> > Hi,
> >
> > We had a cluster of 9 machines with one name node, and 8 data nodes (2 had
> > 220GB hard disk space, rest had 450GB).
> > Most of the space on first machines with 250GB disk space was consumed.
> > Now we added two new machines each with 450GB hard disk space as data nodes.
> >
> > Is there any way to redistribute files on HDFS so that there will be
> > considerable free space left on first two machines without
> > downloading the files to one local machine and then uploading it back on
> > HDFS?
> >
> > ~
> > Prashant,
> > SIEL,
> > IIIT-Hyderabad.
> >
>
>


-- 
Ted Dunning, CTO
DeepDyve
