Thanks for the info. We may implement this patch if this continues to be a problem.

- Adam

On 3/24/11 4:08 PM, Bharath Mundlapudi wrote:
Also, you will need this patch.
https://issues.apache.org/jira/browse/HADOOP-7040


------------------------------------------------------------------------
*From:* Bharath Mundlapudi <bharathw...@yahoo.com>
*To:* "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
*Sent:* Thursday, March 24, 2011 4:00 PM
*Subject:* Re: Datanode won't start with bad disk

Hi Adam,

I have posted a patch for this problem for Hadoop version 20. Please
refer to the following JIRA.
https://issues.apache.org/jira/browse/HDFS-1592

-Bharath

------------------------------------------------------------------------
*From:* Adam Phelps <a...@opendns.com>
*To:* "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
*Sent:* Thursday, March 24, 2011 10:30 AM
*Subject:* Re: Datanode won't start with bad disk

We have a bad disk on one of our datanode machines. We have
dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems
while the DataNode process was running, but we hit a problem when we
needed to restart the DataNode process:

2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker:
Incorrect permissions were set on /var/lib/stats/hdfs/4, expected:
rwxr-xr-x, while actual: ---------. Fixing...
2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2011-03-24 16:50:20,091 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not
permitted

In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.
The permission error occurs because we have the mount directory set to
be immutable:

root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
------------------- /var/lib/stats/hdfs/2
----i------------e- /var/lib/stats/hdfs/4
------------------- /var/lib/stats/hdfs/3
------------------- /var/lib/stats/hdfs/1

We set that flag because we'd previously seen HDFS silently write to
the local disk when a data disk couldn't be mounted.
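
For anyone wanting to reproduce the setup, this is the usual way to
make a mount point immutable (a sketch only; the path is our mount
point, and the commands need root on an ext2/3/4 filesystem):

```shell
# With the disk unmounted, mark the empty mount-point directory
# immutable so nothing (including the DataNode) can create files
# under it if the mount ever fails:
umount /var/lib/stats/hdfs/4
chattr +i /var/lib/stats/hdfs/4
mount /var/lib/stats/hdfs/4

# Verify the flag (the 'i' shown in the lsattr output above):
lsattr -d /var/lib/stats/hdfs/4

# To remove the flag later:
chattr -i /var/lib/stats/hdfs/4
```

Note that once the disk is mounted, the flag only lives on the
underlying directory, so the mounted filesystem itself is unaffected.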

HDFS is supposed to be able to handle failed disks, but it doesn't seem
to be doing the right thing in this case. Is this a known problem, or
is there some other way we should configure things so that the DataNode
comes up in this situation?

(Clearly we could remove the mount point from hdfs-site.xml, but that
doesn't feel like the correct solution.)
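
For reference, this is roughly what the relevant part of our
hdfs-site.xml looks like (a sketch; the property names are the 0.20-era
ones, and the paths match our mounts):

```xml
<!-- hdfs-site.xml (excerpt): tolerate up to 2 failed volumes
     before the DataNode refuses to start -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3,/var/lib/stats/hdfs/4</value>
</property>
```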

Thanks
- Adam




