Hey, a while ago we added a new disk (volume) to every datanode in our cluster. We configured the new disks in "dfs.data.dir" in hdfs-site.xml, both on the jobtracker and on each machine. This went successfully for all of the machines except one, where the new disk is not recognized by Hadoop.
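For reference, this is roughly what the relevant property looks like in our hdfs-site.xml (the paths here are illustrative, not our actual mount points):

```xml
<!-- hdfs-site.xml: dfs.data.dir takes a comma-separated list of directories,
     and the datanode stripes blocks across all of them.
     Paths below are illustrative, not our real mount points. -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>
```

Each listed directory must exist and be writable by the user running the datanode, and the datanode has to be restarted before it picks up a newly added directory.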
We cannot figure out what is wrong with it. We know the new disk is not recognized because "http://namenode:50070/" shows a smaller capacity for that machine. The mapred and hdfs directories on that drive exist, but their structure is not identical to the directory structure on the other disks: on the problematic drive there is no "local" directory under "mapred", and no "name" or "namesecondary" directories under "hdfs".

This problem was tolerable until now, when the rest of the disks filled up: the logs started showing errors such as "No space left on device" and "DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/". Some Hadoop jobs fail with the same errors, and the datanode and tasktracker on that machine crash frequently.

How do we add this disk properly? Thanks in advance.

Technical info: hadoop-0.20, CentOS; each machine is a datanode and tasktracker (a separate machine is the jobtracker and namenode).

-- Oded
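P.S. Since the "taskTracker/jobcache" error comes from the tasktracker's local storage, I assume the mapred side is configured via "mapred.local.dir" in mapred-site.xml. A sketch of what I believe that should look like (again, the paths are made up, not our real layout):

```xml
<!-- mapred-site.xml: mapred.local.dir is where the tasktracker keeps its
     local working data, including taskTracker/jobcache. Like dfs.data.dir,
     it takes a comma-separated list, one writable directory per disk.
     Paths below are illustrative. -->
<property>
  <name>mapred.local.dir</name>
  <value>/disk1/mapred/local,/disk2/mapred/local</value>
</property>
```

If the new disk's directory is missing from this list on the broken machine (or is not writable by the mapred user), that would explain the missing "local" directory and the jobcache errors once the other disks filled up.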
