Ah, crud. Typo on my part. Don't know how I didn't notice that. Thanks!

On 2/6/12 11:30 AM, Harsh J wrote:
You need your dfs.data.dir configured to the bigger disks for data.
That config targets the datanodes.

The one you've overridden is for the namenode's metadata, and hence the
default dfs.data.dir config is writing to /tmp on your root disk
(which is a bad thing, gets wiped after a reboot).
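
For illustration, a minimal hdfs-site.xml sketch of that split, assuming
the Hadoop 1.x property names used in this thread. The dfs.name.dir value
is the one from the original message; the /mnt/hadoop-datanode path is a
hypothetical example, and any separate directory on the big disk would do.

```xml
<!-- Sketch only: /mnt/hadoop-datanode is a hypothetical path, not from the thread -->
<property>
  <!-- namenode metadata (image and edit log) -->
  <name>dfs.name.dir</name>
  <value>/mnt/hadoop-data</value>
</property>
<property>
  <!-- datanode block storage; if unset, it falls back to a directory under hadoop.tmp.dir -->
  <name>dfs.data.dir</name>
  <value>/mnt/hadoop-datanode</value>
</property>
```

The datanode would need a restart to pick up the new directory, and blocks
already written under /tmp would not move there on their own.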

On Mon, Feb 6, 2012 at 9:51 PM, Eli Finkelshteyn <iefin...@gmail.com> wrote:
Hi,
I have a pseudo-distributed Hadoop cluster setup, and I'm currently hoping
to put about 100 gigs of files on it to play around with. I got a unix box
at work no one else is using for this, and running a df -h, I get:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7.9G  2.4G  5.2G  31% /
none                  3.8G     0  3.8G   0% /dev/shm
/dev/sdb              414G  210M  393G   1% /mnt

Alright, so /mnt looks quite big and seems like a good place to store my
hdfs files. I go ahead and create a folder named hadoop-data there and set
the following in hdfs-site.xml:

<property>
<!-- where hadoop stores its files (datanodes only) -->
<name>dfs.name.dir</name>
<value>/mnt/hadoop-data</value>
</property>

After a bit of troubleshooting, I restart the cluster and try to put a
couple of test files onto HDFS. Doing an ls of hadoop-data, I see:

$ ls
current  image  in_use.lock  previous.checkpoint

OK, things look good. Time to try uploading some real data. Now, here's
where the problem arises. If I add a 10 MB dummy file to hadoop-data with
regular Unix tools and run df -h, I see that the used space of /mnt goes up
by exactly 10 MB. But when I start running a big dump of data through:

hadoop fs -put ~/hadoop_playground/data2/data2/ /data/

I notice from running df -h that the data seems to be going to completely
the wrong location! Note that below, only the usage of /dev/sda1 has
increased. The usage of /mnt has not moved.

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7.9G  3.4G  4.2G  45% /
none                  3.8G     0  3.8G   0% /dev/shm
/dev/sdb              414G  210M  393G   1% /mnt

So, what gives? Anyone have any clue how my files are seemingly both put in
the hadoop-data folder, but take up space elsewhere? I could see this likely
being a Unix issue, but I figured I'd ask here just in case it's not, since
I'm pretty stumped.

Cheers,
Eli


