I have used the balancer to balance the data in the cluster with the
-threshold option. The transfer bandwidth was set to 1MB/sec (I think
that's the default setting) in one of the config files, and it had to
move 500GB of data around. It took some time, but eventually the data
got spread out evenly. In my case I was using one of the machines as
the master node and a DataNode at the same time, which is why this one
machine consumed more space compared to the other DataNodes.
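For reference, the bandwidth cap Usman mentions is controlled by a single property in the Hadoop config (hadoop-site.xml on older releases, hdfs-site.xml on newer ones); its shipped default of 1048576 bytes/sec matches the 1 MB/sec he describes. A minimal fragment:

```xml
<!-- Cap on how much bandwidth each DataNode may use for block
     rebalancing, in bytes per second. 1048576 (1 MB/sec) is the
     default; raising it speeds up a large rebalance such as the
     500GB move described above. -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>1048576</value>
</property>
```

The balancer itself is then started with `start-balancer.sh -threshold 5` (or `hadoop balancer -threshold 5`), where the threshold is the allowed deviation, in percent, of each DataNode's utilization from the cluster average.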
Thanks,
Usman
Hey Alex,
Will the Hadoop balancer utility work in this case?
Pankil
On Mon, Jun 22, 2009 at 4:30 PM, Alex Loddengaard <a...@cloudera.com> wrote:
Are you seeing any exceptions because of the disk being at 99% capacity?
Hadoop should do something sane here and write new data to the disk with
more capacity. That said, it is ideal to be balanced. As far as I know,
there is no way to balance an individual DataNode's hard drives (Hadoop
does round-robin scheduling when writing data).
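Since there is no built-in per-DataNode balancer in this version, one manual workaround (not proposed in this thread, so treat it as an assumption) is to stop the DataNode and move block files, each `blk_*` file together with its `.meta` companion, from the full `dfs.data.dir` directory to the emptier one; the DataNode rescans its data directories on restart. A rough sketch, using temp directories to stand in for the real `/mnt` and `/mnt2` data dirs:

```shell
# Hypothetical sketch only: $mnt1 and $mnt2 stand in for the two
# dfs.data.dir locations (e.g. /mnt/hadoop/dfs/data/current and
# /mnt2/hadoop/dfs/data/current). On a real node the DataNode must be
# stopped before moving anything.
mnt1=$(mktemp -d)
mnt2=$(mktemp -d)

# Simulate a full disk: four block files (plus .meta files) on mnt1 only.
for i in 1 2 3 4; do
  touch "$mnt1/blk_100$i" "$mnt1/blk_100${i}_1001.meta"
done

# Move half of the blocks, keeping each block with its .meta file,
# onto the emptier mount.
for i in 3 4; do
  mv "$mnt1/blk_100$i" "$mnt1/blk_100${i}_1001.meta" "$mnt2/"
done

echo "mnt1: $(ls "$mnt1" | wc -l) files, mnt2: $(ls "$mnt2" | wc -l) files"
```

After restarting the DataNode it should pick up the relocated blocks from the other data directory; the block and meta file naming here is illustrative, not exact.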
Alex
On Mon, Jun 22, 2009 at 10:12 AM, Kris Jirapinyo <kjirapi...@biz360.com> wrote:
> Hi all,
> How does one handle a mount running out of space for HDFS? We have two
> disks mounted on /mnt and /mnt2 respectively on one of the machines that
> are used for HDFS, and /mnt is at 99% while /mnt2 is at 30%. Is there a
> way to tell the machine to balance itself out? I know for the cluster,
> you can balance it using start-balancer.sh, but I don't think that it
> will tell the individual machine to balance itself out. Our "hack" right
> now would be just to delete the data on /mnt; since we have replication
> of 3x, we should be OK. But I'd prefer not to do that. Any thoughts?
>