[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011357#comment-14011357
 ] 

Colin Patrick McCabe commented on HDFS-6382:
--------------------------------------------

I agree with Chris' comments here.  There are so many advantages to running 
outside the NameNode that I think that's the design we should start with.  If 
we later find something that would work better with NN support, we can think 
about it then.

Hangjun Ye wrote:
bq. Another benefit to having it inside the NN is that we don't have to handle 
the authentication/authorization problem in a separate system. For example, we 
have a shared HDFS cluster for many internal users, and we don't want someone 
to set a TTL policy on someone else's files. The NN could handle this easily 
with its own authentication/authorization mechanism.

The client handles authentication/authorization very well, actually.  You can 
choose to run your cleanup job as the superuser (can do anything) or as some 
other, less powerful user who is limited (safer).  But when you run inside the 
NameNode, there are no safeguards... everything effectively runs as superuser.  
And you can destroy or corrupt the entire filesystem very easily that way, 
especially if your cleanup code is buggy.
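The external-cleanup approach can be sketched roughly as follows. This is a 
hypothetical illustration, not code from the JIRA: it walks a local directory 
tree as a stand-in for HDFS (a real job would use the Hadoop FileSystem client 
API), and the point is that it can only delete what the invoking user is 
permitted to delete, so a buggy policy cannot reach past that user's 
privileges.

```python
import os
import time

def ttl_sweep(root, ttl_seconds, now=None):
    """Delete files under root whose modification time is older than
    ttl_seconds, and return the list of deleted paths.

    Because this runs as an ordinary filesystem client, the permissions of
    the user running the job are the safety net: a bug here cannot do more
    damage than that user could do by hand.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) > ttl_seconds:
                os.remove(path)
                removed.append(path)
    return removed
```

Running this as a dedicated low-privilege user (rather than the HDFS 
superuser) is exactly the safeguard the comment above describes.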

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In production environments, we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete them automatically. For 
> example, we keep only 1 day's logs on local disk due to limited disk space, 
> but we need to keep about 1 month's logs in order to debug program bugs, so 
> we keep all the logs on HDFS and delete logs that are older than 1 month. 
> This is a typical scenario for HDFS TTL, so we propose that HDFS support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, its child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL should be deleted when it is emptied by the TTL mechanism or not



--
This message was sent by Atlassian JIRA
(v6.2#6252)
