TrackerDistributedCacheManager never cleans its input directories
-----------------------------------------------------------------
Key: MAPREDUCE-1914
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1914
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Dick King
Assignee: Dick King
When we localize a file into a node's cache, it is installed in a
subdirectory whose name is a random {{long}}. These {{long}}-named
subdirectories all sit in a single flat directory [per disk, per cluster
node]. When the cached file is no longer needed, its reference count in a
tracking data structure drops to zero. The file then becomes eligible for
deletion once the total space occupied by cached files exceeds 10G [by
default] or the total number of such files exceeds 10K.
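The eligibility rule described above can be sketched roughly as follows. This is only an illustration of the policy as stated in this issue, not the actual TrackerDistributedCacheManager code: the class {{CacheEvictionSketch}}, its {{Entry}} type and {{selectForDeletion}} are hypothetical names, and reading "10G" as 10 * 1024^3 bytes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the eviction policy described in this issue. */
public class CacheEvictionSketch {
    // Limits taken from the issue text; the exact byte value is an assumption.
    public static final long MAX_BYTES = 10L * 1024 * 1024 * 1024; // "10G"
    public static final int MAX_ENTRIES = 10_000;                  // "10K"

    /** One localized file plus its bookkeeping (hypothetical type). */
    public static final class Entry {
        public final String path;
        public final long sizeBytes;
        public final int refCount;

        public Entry(String path, long sizeBytes, int refCount) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.refCount = refCount;
        }
    }

    /**
     * Returns the entries that may be deleted: nothing while both limits
     * are respected, otherwise every entry whose reference count is zero.
     */
    public static List<Entry> selectForDeletion(List<Entry> entries) {
        long totalBytes = 0;
        for (Entry e : entries) {
            totalBytes += e.sizeBytes;
        }
        List<Entry> victims = new ArrayList<>();
        if (totalBytes <= MAX_BYTES && entries.size() <= MAX_ENTRIES) {
            return victims; // under both limits: nothing is eligible yet
        }
        for (Entry e : entries) {
            if (e.refCount == 0) {
                victims.add(e); // unreferenced, so safe to reclaim
            }
        }
        return victims;
    }
}
```

Note that an entry still in use (non-zero reference count) is never selected, even when the cache is over its limits.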
However, when we delete a cached file, we don't delete the directory that
contained it. In particular, the {{long}}-named entries of the flat directory
are never removed, so they accumulate until they reach a system limit, 32K in
some cases, at which point the node stops working.
We need to delete the {{long}}-named subdirectory in the flat directory when
we delete the localized cache file it contains.
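A minimal sketch of the proposed cleanup, using only java.nio.file from the JDK, might look like this. It is not the actual patch: {{CacheDirCleanup}} and {{deleteLocalizedFile}} are hypothetical names, and the sketch assumes the localized file sits directly inside its per-file subdirectory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: remove a cached file together with its subdirectory. */
public class CacheDirCleanup {

    /**
     * Deletes the localized file, then deletes its enclosing subdirectory
     * if that subdirectory is now empty. Returns true when the subdirectory
     * was removed as well.
     */
    public static boolean deleteLocalizedFile(Path cachedFile) throws IOException {
        Path subdir = cachedFile.getParent();
        Files.deleteIfExists(cachedFile);
        if (subdir == null || !Files.isDirectory(subdir)) {
            return false; // no enclosing directory to clean up
        }
        try (DirectoryStream<Path> remaining = Files.newDirectoryStream(subdir)) {
            if (remaining.iterator().hasNext()) {
                return false; // something else still lives here; leave it alone
            }
        }
        Files.delete(subdir); // frees the entry in the flat parent directory
        return true;
    }
}
```

Only the per-file subdirectory is removed; the flat parent directory itself stays in place for other cached files.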