Yang Yun created HDFS-15829:
-------------------------------

             Summary: Use xattr to support HDFS TTL on Observer namenode
                 Key: HDFS-15829
                 URL: https://issues.apache.org/jira/browse/HDFS-15829
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: dfsclient, namenode
            Reporter: Yang Yun
            Assignee: Yang Yun


h3. Overview
 
HDFS TTL is implemented using the xattr mechanism provided by HDFS. When a user 
sets a TTL on a file or directory, HDFS creates an xattr named "ttl" (in the 
user namespace, i.e. "user.ttl") for that file or directory and stores the 
user-supplied value in it. A service called TtlService runs on the HDFS Standby 
or (recommended) Observer NameNode. It regularly scans the in-memory inode map, 
reads the "ttl" xattr of each INode, and checks whether the TTL has expired. If 
it has, the service resolves the full path of the INode and adds it to an 
expired-file list. After each scan, it creates a DFSClient and deletes the 
expired files in batches. Another option is to trigger a YARN job that deletes 
them in parallel.
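A minimal sketch of one scan pass, under stated assumptions: `InodeRecord` is a hypothetical stand-in for an INode in the in-memory inode map, the "ttl" value is treated as an absolute expiration time in minutes since the epoch (SINCELASTWRITE and nested inheritance are left out), and the batch size is an illustrative parameter. Deletion through a DFSClient is not shown; the sketch only produces the batches that would be deleted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a TtlService scan pass; not the proposed implementation. */
class TtlScanSketch {

  /** Hypothetical snapshot of one inode: its full path and its "ttl" value, or null if unset. */
  static final class InodeRecord {
    final String fullPath;
    final Long ttlMinutes;

    InodeRecord(String fullPath, Long ttlMinutes) {
      this.fullPath = fullPath;
      this.ttlMinutes = ttlMinutes;
    }
  }

  /** Collects the paths whose TTL (absolute, in minutes since the epoch) is at or before nowMinutes. */
  static List<String> collectExpired(List<InodeRecord> inodes, long nowMinutes) {
    List<String> expired = new ArrayList<>();
    for (InodeRecord inode : inodes) {
      if (inode.ttlMinutes != null && inode.ttlMinutes <= nowMinutes) {
        expired.add(inode.fullPath);
      }
    }
    return expired;
  }

  /** Splits the expired list into batches of at most batchSize paths, one delete request each. */
  static List<List<String>> toBatches(List<String> paths, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < paths.size(); i += batchSize) {
      batches.add(new ArrayList<>(paths.subList(i, Math.min(i + batchSize, paths.size()))));
    }
    return batches;
  }
}
```

Batching bounds the size of each delete request; the proposal leaves the actual batch size and the deletion mechanism (DFSClient or a YARN job) open.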
h3. Protocol
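Two xattrs are added:
* "user.ttl": the TTL value in minutes, identifying when the file or directory expires. By default it is an absolute expiration time in minutes since the epoch; with SINCELASTWRITE it is a duration counted from the last write.
* "user.ttlproperty": the TTL type flags, which can include:
** SINCELASTWRITE = 0x1: calculate the TTL from the last write.
** KEEPEMPTYDIR = 0x2: keep the directory if it becomes empty.
** KEEPEMPTYSUBDIR = 0x4: keep empty subdirectories.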
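The xattr names and flag values below come from the proposal; the rest is an assumption, in particular that each value is stored as a UTF-8 decimal string, since the proposal does not specify the byte encoding. `TtlXattrSketch` is a hypothetical helper class.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the two TTL xattrs; the value encoding is an assumption. */
class TtlXattrSketch {
  static final String TTL_XATTR = "user.ttl";
  static final String TTL_PROPERTY_XATTR = "user.ttlproperty";

  static final int SINCELASTWRITE = 0x1;   // count the TTL from the last write
  static final int KEEPEMPTYDIR = 0x2;     // keep the directory if it becomes empty
  static final int KEEPEMPTYSUBDIR = 0x4;  // keep empty subdirectories

  /** Encodes an int xattr value (assumed format: UTF-8 decimal string). */
  static byte[] encode(int value) {
    return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
  }

  /** Decodes a value written by encode(). */
  static int decode(byte[] raw) {
    return Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
  }

  /** Returns true if the given flag is set in the property bit mask. */
  static boolean hasFlag(int property, int flag) {
    return (property & flag) != 0;
  }
}
```

Because the flag values are distinct powers of two, they combine with `|`, e.g. `SINCELASTWRITE | KEEPEMPTYDIR`. With a Hadoop `FileSystem`, each value could then be written with `fs.setXAttr(path, TtlXattrSketch.TTL_XATTR, TtlXattrSketch.encode(value))` and read back with `fs.getXAttr(path, TtlXattrSketch.TTL_XATTR)`.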

 
*Nested TTL*
A TTL can be set on every directory and file along a path, and the setting on 
the lowest-level subdirectory or file takes effect. If a directory or file has 
no TTL of its own, it inherits the setting of its nearest ancestor directory. 
The following example illustrates this. Suppose there is the following 
directory tree:
 
{code:java}
/A/B/E  
/A/C  
/A/D {code}
 
That is, directory A contains B, C and D, and directory B contains file E. 
Suppose the user sets the TTL of A to 2 days, the TTL of B to 3 days and the TTL 
of E to 1 day, and does not set a TTL on C or D. Then file E is cleared after 
1 day. After 2 days, C and D are cleared, using the setting inherited from 
directory A. Note that directory A is not cleared at this point, because it is 
not empty. After 3 days, B is cleared because its own TTL expires. Once B is 
cleared, A's TTL has already expired and A has become empty, so A is cleared 
as well.
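The worked example can be replayed with a small simulation. This is a sketch under assumptions: `Node` is a hypothetical in-memory stand-in for an inode, all TTLs are counted in days from the same starting moment, a node is cleared once its effective TTL (its own, else the nearest ancestor's) has passed and it has no remaining children, and the KEEPEMPTYDIR/KEEPEMPTYSUBDIR flags are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of nested TTL inheritance; not the proposed implementation. */
class NestedTtlSketch {

  static final class Node {
    final String name;
    final Node parent;
    final Integer ownTtlDays; // null when no TTL is set on this node
    final List<Node> children = new ArrayList<>();

    Node(String name, Node parent, Integer ownTtlDays) {
      this.name = name;
      this.parent = parent;
      this.ownTtlDays = ownTtlDays;
      if (parent != null) {
        parent.children.add(this);
      }
    }

    /** Own TTL, else the TTL of the nearest ancestor that has one, else null. */
    Integer effectiveTtlDays() {
      for (Node n = this; n != null; n = n.parent) {
        if (n.ownTtlDays != null) {
          return n.ownTtlDays;
        }
      }
      return null;
    }
  }

  /** Returns, in clearing order, the day on which each node is removed. */
  static Map<String, Integer> simulate(List<Node> nodes, int lastDay) {
    Map<String, Integer> clearedOn = new LinkedHashMap<>();
    List<Node> remaining = new ArrayList<>(nodes);
    for (int day = 0; day <= lastDay; day++) {
      boolean changed = true;
      while (changed) { // removing a child can empty its parent on the same day
        changed = false;
        for (Node n : new ArrayList<>(remaining)) {
          Integer ttl = n.effectiveTtlDays();
          if (ttl != null && ttl <= day && n.children.isEmpty()) {
            remaining.remove(n);
            if (n.parent != null) {
              n.parent.children.remove(n);
            }
            clearedOn.put(n.name, day);
            changed = true;
          }
        }
      }
    }
    return clearedOn;
  }

  /** Builds /A/B/E, /A/C, /A/D with the TTLs from the example and runs five days. */
  static Map<String, Integer> runExample() {
    Node a = new Node("A", null, 2);
    Node b = new Node("B", a, 3);
    Node c = new Node("C", a, null);
    Node d = new Node("D", a, null);
    Node e = new Node("E", b, 1);
    return simulate(List.of(a, b, c, d, e), 5);
  }
}
```

`runExample()` returns `{E=1, C=2, D=2, B=3, A=3}`, matching the timeline described above. The inner loop re-checks nodes within the same day, which is what lets A be cleared on the day that B's removal leaves it empty.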
h3. Usage
The first version provides an API to set the TTL; a command-line interface will be added later.
 
{code:java}
/**
 * Set the TTL of a file or directory.
 * @param fs the file system.
 * @param path the target file or directory.
 * @param value the TTL value, in minutes.
 * @param property the TTL type flags, e.g. SINCELASTWRITE.
 * @throws IOException if the TTL xattrs cannot be set.
 */
public static void setTTl(FileSystem fs, Path path, int value, int property)
    throws IOException;{code}
 
 
h3. Example
 
{code:java}
// The file expires 60 minutes from now. The TTL value is an absolute time in
// minutes since the epoch, so the long result is narrowed to int.
TtlInfo.setTTl(fs, file, (int) (System.currentTimeMillis() / 1000 / 60 + 60), 0);

// The file expires 60 minutes after its last write.
TtlInfo.setTTl(fs, file, 60, TtlInfo.SINCELASTWRITE);{code}
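The two calls above imply two interpretations of the "user.ttl" value. A minimal sketch of the resulting expiry test, assuming the inode's modification time serves as the last-write time; `TtlExpirySketch` and `isExpired` are hypothetical names, not part of the proposed `TtlInfo` API.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical expiry check for the two TTL modes shown in the examples. */
class TtlExpirySketch {
  static final int SINCELASTWRITE = 0x1;

  /**
   * Without SINCELASTWRITE, ttlValue is an absolute expiration time in minutes
   * since the epoch; with it, ttlValue is a duration in minutes after mtimeMillis.
   */
  static boolean isExpired(long ttlValue, int property, long mtimeMillis, long nowMillis) {
    long nowMinutes = TimeUnit.MILLISECONDS.toMinutes(nowMillis);
    long expiresAtMinutes = (property & SINCELASTWRITE) != 0
        ? TimeUnit.MILLISECONDS.toMinutes(mtimeMillis) + ttlValue // relative to the last write
        : ttlValue;                                                // absolute expiration time
    return nowMinutes >= expiresAtMinutes;
  }
}
```

Converting both sides to whole minutes matches the minute granularity of "user.ttl"; an inode written again before expiry gets a later mtime, which pushes a SINCELASTWRITE deadline forward.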
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
