[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016335#comment-14016335
 ] 

Hangjun Ye commented on HDFS-6382:
----------------------------------

Thanks Haohui and Colin.

The balancer or a balancer-like standalone daemon sounds like a feasible 
approach to us. One requirement specific to TTL cleanup is persistent storage 
for all TTL policies set by users, which the balancer and DistCp don't need. 
It would be nice if the namenode could store this information, so that we 
don't have to find somewhere else to keep it.

So we are wondering whether it would be possible to add an "opaque feature" to 
INode that stores arbitrary bytes. The NN would just store the bytes without 
interpreting them. As an analogy, HBase supports "tags" to store arbitrary 
metadata on a cell: 
https://issues.apache.org/jira/browse/HBASE-8496

Then external tools/daemons could let end users set their TTL policies and 
perform the cleanup. The only change to the NN would be adding the new feature 
and exposing APIs to set/get it; the complicated and volatile logic (metadata 
encoding, interpretation, cleanup) would live outside the NN. The change might 
also have much broader uses than TTL.
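
To make the idea concrete, here is a minimal sketch of the split we have in 
mind. All names (OpaqueFeatureStore, setOpaque, getOpaque, TtlCodec) are 
hypothetical and are not existing HDFS APIs. The in-memory map only stands in 
for whatever the NN would persist, and the 8-byte encoding is just one 
possible choice made by the external tool.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NN side: it stores bytes per path and
// never interprets them.
final class OpaqueFeatureStore {
    private final Map<String, byte[]> features = new HashMap<>();

    // Store a copy of the bytes so later changes by the caller are not visible.
    void setOpaque(String path, byte[] value) {
        features.put(path, Arrays.copyOf(value, value.length));
    }

    // Return a copy of the stored bytes, or null if nothing was set.
    byte[] getOpaque(String path) {
        byte[] value = features.get(path);
        return value == null ? null : Arrays.copyOf(value, value.length);
    }
}

// Hypothetical external tool side: it alone knows that the bytes are a TTL
// in milliseconds, encoded as a big-endian long (an assumed format).
final class TtlCodec {
    static byte[] encode(long ttlMillis) {
        return ByteBuffer.allocate(Long.BYTES).putLong(ttlMillis).array();
    }

    static long decode(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }
}
```

With this split, changing the TTL encoding or the cleanup rules would only 
touch the external tool; the store would keep treating the value as a blob.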

Any thoughts?

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In production environments, we often have a scenario like this: we want to 
> back up files on HDFS for some time and then have them deleted 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on HDFS and delete logs that are 
> older than 1 month. This is a typical scenario for HDFS TTL, so we 
> propose that HDFS support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash
> 6. A global configuration is needed to control whether a directory 
> with a TTL is deleted once the TTL mechanism has emptied it.
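
As a rough illustration of rules 1-4 in the quoted proposal, the lookup an 
external cleanup tool would need could be sketched as below. The class name 
TtlPolicies and its methods are hypothetical. The sketch also assumes absolute 
paths and that expiry is measured from the modification time, which the 
proposal does not specify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical policy table: explicit TTLs keyed by absolute path.
final class TtlPolicies {
    private final Map<String, Long> ttlMillisByPath = new HashMap<>();

    void setTtl(String path, long ttlMillis) {
        ttlMillisByPath.put(path, ttlMillis);
    }

    // Walk from the path up to the root; the nearest explicit TTL wins,
    // so a child's setting overrides its parent's (rule 4).
    Long effectiveTtl(String path) {
        String current = path;
        while (true) {
            Long ttl = ttlMillisByPath.get(current);
            if (ttl != null) {
                return ttl;
            }
            if (current.equals("/")) {
                return null; // no TTL set anywhere on the way to the root
            }
            int slash = current.lastIndexOf('/');
            current = slash <= 0 ? "/" : current.substring(0, slash);
        }
    }

    // Assumption: a file expires once its age since last modification
    // exceeds the effective TTL.
    boolean isExpired(String path, long modificationTimeMillis, long nowMillis) {
        Long ttl = effectiveTtl(path);
        return ttl != null && nowMillis - modificationTimeMillis > ttl;
    }
}
```

Rules 5 and 6 (trash and empty-directory handling) are global switches and 
are left out of the sketch.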



--
This message was sent by Atlassian JIRA
(v6.2#6252)
