I was able to track this down this morning. The process that ingests the log files into the HDFS cluster was not closing file handles after deleting the temp files it creates during ingest. The deleted-but-still-open files caused df and du to report different usage values. Restarting the ingest process released the file handles, and the Non DFS Used space is now back to normal. Thanks for the help, guys.
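The mechanism behind the df/du discrepancy can be reproduced without Hadoop. Below is a minimal shell sketch (file names and sizes are arbitrary, chosen only for illustration): a process unlinks a file while still holding it open, so `du` no longer counts it but the kernel cannot free the blocks, and `df` still does.

```shell
#!/bin/sh
# Demonstrate the failure mode from this thread: a temp file is deleted
# while a file handle is still open, so the disk blocks stay allocated.
# `du` no longer sees the file, but `df` still counts the space -- the
# gap shows up in Hadoop as "Non DFS Used".

tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1M count=10 2>/dev/null

exec 3<"$tmp"   # hold the file open on fd 3
rm -f "$tmp"    # unlink it; space is NOT reclaimed while fd 3 is open

# The open descriptor can still read all 10 MiB of "deleted" data.
still_open_bytes=$(wc -c <&3 | tr -d ' ')
echo "bytes still pinned by the open handle: $still_open_bytes"

exec 3<&-       # closing the handle finally releases the space
```

On a live system, `lsof +L1` lists open files whose on-disk link count is 0, i.e. exactly these deleted-but-open files, which is a quick way to confirm this diagnosis before restarting the offending process.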
From: suresh srinivas <srini30...@gmail.com>
Reply-To: hdfs-user@hadoop.apache.org
Date: Sat, 14 May 2011 21:20:44 -0700
To: hdfs-user@hadoop.apache.org
Subject: Re: Rapid growth in Non DFS Used disk space

dfs.data.dir/current is used by datanodes to store blocks. This directory
should only have files starting with blk_*.

Things to check:
- Are there other files that are not blk related?
- Did you manually copy the contents of one storage dir to another? (Some
  folks did this when they added new disks.)

On Fri, May 13, 2011 at 1:41 PM, Kester, Scott <skes...@weather.com> wrote:

  We have a job that cleans up the mapred.local directory, so that's not it.
  I have done some further looking at data usage on the datanodes, and 99%
  of the space used is under the dfs.data.dir/current directory. What would
  be under 'current' that wasn't part of HDFS?

  On 5/13/11 3:12 PM, "Allen Wittenauer" <a...@apache.org> wrote:

  > On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
  >>
  >>> 2) Any ideas on what is driving the growth in Non DFS Used space? I
  >>> looked for things like growing log files on the datanodes but didn't
  >>> find anything.
  >>
  >> Logs are one possible culprit. Another is to look for old files that
  >> might be orphaned in your mapred.local.dir - there have been bugs in
  >> the past where we've leaked files. If you shut down the TaskTrackers,
  >> you can safely delete everything from within mapred.local.dirs.
  >
  > Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out.
  > The TT doesn't properly clean up after itself.

--
Regards,
Suresh
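Suresh's check (look for files under dfs.data.dir/current that are not block-related) can be scripted. The sketch below runs against a simulated storage directory so it is self-contained; the file names are made up, and on a real datanode you would point the `find` at your actual dfs.data.dir path instead. Note that besides `blk_*` block files and their `blk_*.meta` checksum files, a storage dir legitimately contains a VERSION file, so that is excluded too.

```shell
#!/bin/sh
# Flag non-HDFS files in a datanode storage dir. DATA_DIR here is a
# throwaway simulated layout for illustration; substitute your real
# dfs.data.dir/current on an actual datanode.
DATA_DIR=$(mktemp -d)

# Legitimate datanode contents: block files, their checksum metadata,
# and the storage-dir VERSION file.
touch "$DATA_DIR/blk_1234567890" \
      "$DATA_DIR/blk_1234567890_1001.meta" \
      "$DATA_DIR/VERSION"

# A stray file, e.g. leftover ingest debris, counts as Non DFS usage.
printf 'stray data' > "$DATA_DIR/ingest.tmp"

# List everything that is neither a blk_* file nor VERSION.
stray=$(find "$DATA_DIR" -type f ! -name 'blk_*' ! -name VERSION)
echo "non-HDFS files: $stray"
```

Piping the same `find` output through `du -ch` would show how much of the "missing" space those stray files account for.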