We have a job that cleans up the mapred.local directory, so that's not it. I looked further at disk usage on the datanodes, and 99% of the space used is under the dfs.data.dir/current directory. What would be under 'current' that isn't part of HDFS?
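
For anyone who wants to repeat the check on their own datanodes, here is a rough sketch of how the space under 'current' could be split into block files and everything else. It assumes block data and its checksum files are named blk_* (blk_<id> and blk_<id>_<genstamp>.meta); the function name and the example path are made up for illustration. It sums apparent file sizes, so the totals can differ somewhat from what du reports.

```python
import os

def usage_by_kind(root):
    """Sum file sizes under root, split into HDFS block files and other files.

    Assumption: block data and checksum files are named blk_*; anything
    else found under the directory is counted as "other".
    """
    totals = {"block": 0, "other": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size  # apparent size, symlinks not followed
            except OSError:
                continue  # file disappeared mid-scan (the datanode is live)
            kind = "block" if name.startswith("blk_") else "other"
            totals[kind] += size
    return totals

# Example call (hypothetical path, substitute your own dfs.data.dir):
# print(usage_by_kind("/data/1/dfs/data/current"))
```

If "other" comes back near zero, then nearly everything under 'current' is block data as far as the local filesystem is concerned, and the question becomes why the namenode does not count it as DFS Used.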

On 5/13/11 3:12 PM, "Allen Wittenauer" <a...@apache.org> wrote:

>On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
>>
>>> 2) Any ideas on what is driving the growth in Non DFS Used space? I
>>> looked for things like growing log files on the datanodes but didn't
>>> find anything.
>>
>> Logs are one possible culprit. Another is to look for old files that
>> might be orphaned in your mapred.local.dir - there have been bugs in the
>> past where we've leaked files. If you shut down the TaskTrackers, you can
>> safely delete everything from within mapred.local.dirs.
>
> Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out.
> The TT doesn't properly clean up after itself.