Hi,

We need to expire temporary files that are no longer needed in an HDFS
share.  I have noticed some traffic on this same issue and was wondering how
best to approach the problem and/or contribute.  Specifically, we need to
remove a user-specified subset of files from HDFS based on mtime or atime.

Possible Approaches:
  - Mount HDFS over FUSE and use the standard tmpreaper utility.
  - Implement a Hadoop version of tmpreaper using the FileSystem and
    PathFilter classes.
  - Place temporary files in a .Trash-like directory and use the Trash
    class's checkpoint and expunge methods.  It would be nice here if the
    user could choose to expire all checkpoints except the N most recent, or
    incrementally expire checkpoints to free up space.
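
To make the second and third approaches concrete, here is a rough sketch of
the selection logic only.  It deliberately avoids the Hadoop classes so it
runs standalone: Entry is a hypothetical stand-in for the fields one would
read from FileStatus (getPath(), getModificationTime(), getAccessTime()), and
the names TmpReaperSketch, selectExpired and checkpointsToExpire are
illustrative only, not existing Hadoop APIs.  A real tool would presumably
fill the list from FileSystem.listStatus() and remove the results with
FileSystem.delete().

```java
import java.util.ArrayList;
import java.util.List;

public class TmpReaperSketch {

    /** Hypothetical stand-in for the FileStatus fields this sketch needs. */
    static final class Entry {
        final String path;
        final long mtimeMillis;
        final long atimeMillis;

        Entry(String path, long mtimeMillis, long atimeMillis) {
            this.path = path;
            this.mtimeMillis = mtimeMillis;
            this.atimeMillis = atimeMillis;
        }
    }

    /** Which timestamp the user wants to expire on. */
    enum TimeField { MTIME, ATIME }

    /** Returns the entries whose chosen timestamp is older than maxAgeMillis. */
    static List<Entry> selectExpired(List<Entry> entries, TimeField field,
                                     long maxAgeMillis, long nowMillis) {
        long cutoff = nowMillis - maxAgeMillis;
        List<Entry> expired = new ArrayList<>();
        for (Entry e : entries) {
            long t = (field == TimeField.MTIME) ? e.mtimeMillis : e.atimeMillis;
            if (t < cutoff) {
                expired.add(e);
            }
        }
        return expired;
    }

    /** Returns every checkpoint except the `keep` most recently modified. */
    static List<Entry> checkpointsToExpire(List<Entry> checkpoints, int keep) {
        List<Entry> sorted = new ArrayList<>(checkpoints);
        // Newest first, so the first `keep` entries are the ones to retain.
        sorted.sort((a, b) -> Long.compare(b.mtimeMillis, a.mtimeMillis));
        int from = Math.max(0, Math.min(keep, sorted.size()));
        return new ArrayList<>(sorted.subList(from, sorted.size()));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long day = 24L * 60 * 60 * 1000;
        List<Entry> files = List.of(
            new Entry("/tmp/old", now - 10 * day, now - 10 * day),
            new Entry("/tmp/new", now - day, now - day));
        // Expire anything not modified in the last 7 days.
        for (Entry e : selectExpired(files, TimeField.MTIME, 7 * day, now)) {
            System.out.println("would delete " + e.path);
        }
    }
}
```

One caveat on the atime option: as far as I know, HDFS only records access
times at the granularity the NameNode is configured for, so expiry by atime
is only as accurate as that setting allows.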

Thanks for the feedback,

Michael
