We use the DistributedCache class to distribute a few lookup files for our
jobs. We have been aggressively deleting failed task attempts' leftover data,
and our script accidentally deleted the path to our distributed cache files.

Our task attempt leftover data was here [per node]:
/hadoop/hadoop-metadata/cache/mapred/local/
and our distributed cache path was:
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/<nameNode>
We deleted this path by accident.

Does this latter path look normal? I'm not that familiar with
DistributedCache, but I'm up right now investigating the issue, so I thought
I'd ask.

After that deletion, the first two jobs to run (which use the addCacheFile
method to distribute their files) didn't seem to push the files back out to
the cache path, except on one node. Is this expected behavior? Shouldn't
addCacheFile check whether the files are missing and, if so, repopulate
them as needed?
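
For reference, our submission-time usage is roughly the following. This is
just a sketch against the old org.apache.hadoop.mapred API; the class name
and the HDFS path are made-up placeholders, not our real code:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetupSketch {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            // Ask the framework to copy this HDFS file into each node's
            // local cache before any task runs. The path is a made-up
            // placeholder, not one of our real lookup files.
            DistributedCache.addCacheFile(
                new URI("/data/lookup/sample.txt"), conf);
            // ... set mapper, input/output paths, then submit the job ...
        }
    }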

I'm trying to get a handle on whether it's safe to delete the distributed
cache path when the grid is quiet and no jobs are running; that is, whether
addCacheFile is designed to be robust against the files it caches being
absent at each job start.
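
If it isn't robust that way, I may add a guard in the tasks themselves.
A rough sketch of what I have in mind (the class name is made up, and this
is not something our jobs do today): in configure(), ask the framework
where it localized the cache files and fail fast if any are gone.

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class GuardedMapperSketch extends MapReduceBase {
        @Override
        public void configure(JobConf conf) {
            try {
                // Ask the framework where it localized our cache files,
                // then confirm each one actually exists on local disk.
                Path[] localFiles =
                    DistributedCache.getLocalCacheFiles(conf);
                if (localFiles == null) {
                    throw new IOException("no cache files were localized");
                }
                for (Path p : localFiles) {
                    if (!new File(p.toString()).exists()) {
                        throw new IOException("cached file missing: " + p);
                    }
                }
            } catch (IOException e) {
                // Fail the task fast instead of running with stale data.
                throw new RuntimeException(e);
            }
        }
    }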
