dfs.data.dir/current is used by datanodes to store blocks. This directory should only contain files whose names start with blk_ (block data and the matching .meta files).
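A quick way to see whether anything other than block files is taking up space there is a small script along these lines. It is only a sketch: the function name `non_block_usage` is made up for illustration, it assumes block and checksum files are named with the prefix `blk_`, and it takes the path to one storage directory's `current` folder as its argument. The datanode may keep a few small bookkeeping files of its own in that directory, so the goal is to spot anything large, not to reach zero.

```python
import os
import sys


def non_block_usage(current_dir):
    """Return (total_bytes, [(path, size), ...]) for files not named blk_*.

    Assumption: HDFS block files and their checksum files start with "blk_".
    """
    total = 0
    offenders = []
    for root, _dirs, files in os.walk(current_dir):
        for name in files:
            if name.startswith("blk_"):
                continue  # block data or blk_*.meta, expected here
            path = os.path.join(root, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file disappeared while we were scanning
            total += size
            offenders.append((path, size))
    # Largest files first, so the likely culprit is at the top.
    offenders.sort(key=lambda item: item[1], reverse=True)
    return total, offenders


if __name__ == "__main__" and len(sys.argv) > 1:
    total, offenders = non_block_usage(sys.argv[1])
    print("non-blk bytes: %d" % total)
    for path, size in offenders[:20]:
        print("%12d  %s" % (size, path))
```

Run it once per directory listed in dfs.data.dir, for example `python check_current.py /path/to/data/current` (the script name and path are placeholders). It only reads file metadata and does not delete anything.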
Things to check:
- Are there other files that are not blk related?
- Did you manually copy the content of one storage dir to another? (some folks did this when they added new disks)

On Fri, May 13, 2011 at 1:41 PM, Kester, Scott <skes...@weather.com> wrote:

> We have a job that cleans up the mapred.local directory, so that's not it.
> I have done some further looking at data usage on the datanodes and 99%
> of the space used is under the dfs.data.dir/current directory. What would
> be under 'current' that wasn't part of HDFS?
>
> On 5/13/11 3:12 PM, "Allen Wittenauer" <a...@apache.org> wrote:
>
> > On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
> >
> > >> 2) Any ideas on what is driving the growth in Non DFS Used space? I
> > >> looked for things like growing log files on the datanodes but didn't
> > >> find anything.
> > >
> > > Logs are one possible culprit. Another is to look for old files that
> > > might be orphaned in your mapred.local.dir - there have been bugs in
> > > the past where we've leaked files. If you shut down the TaskTrackers,
> > > you can safely delete everything from within mapred.local.dirs.
> >
> > Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out.
> > The TT doesn't properly clean up after itself.

--
Regards,
Suresh