[ https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Bota updated HADOOP-15621:
--------------------------------
    Attachment: HADOOP-15621.002.patch

> S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15621
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Gabor Bota
>            Priority: Major
>         Attachments: HADOOP-15621.001.patch, HADOOP-15621.002.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature
> to the Dynamo metadata store (MS) for S3Guard.
> This is conceptually an "online algorithm" version of the CLI prune()
> function, which is the "offline algorithm".
>
> Why:
> 1. Self-healing (soft state): since we do not implement transactions around
> modification of the two systems (S3 and the metadata store), certain failures
> can lead to inconsistency between S3 and the metadata store (MS) state.
> Having a time to live (TTL) on each entry in S3Guard means that any
> inconsistencies will be time-bounded. Thus "wait and restart your job"
> becomes a valid, if ugly, way to get around any issues with FS client failure
> leaving things in a bad state.
> 2. We could make manual invocation of `hadoop s3guard prune ...`
> unnecessary, depending on the implementation.
> 3. Makes it possible to fix the problem that the Dynamo MS prune() doesn't
> prune directories due to the lack of a true modification time.
>
> How:
> I think we need a new column in the Dynamo table, "entry last written time".
> This is updated each time the entry is written to Dynamo.
> After that we can either:
> 1. Have the client simply ignore / elide any entries that are older than the
> configured TTL.
> 2. Have the client delete entries older than the TTL.
> The issue with #2 is that it will increase latency if done inline in the
> context of an FS operation. We could mitigate this somewhat by using an async
> helper thread, or by probabilistically doing it only sometimes to amortize
> the expense of deleting stale entries (allowing some batching as well).
>
> Caveats:
> - Clock synchronization, as usual, is a concern. Many clusters already keep
> clocks close enough via NTP. We should at least document the requirement
> along with the configuration knob that enables the feature.
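
The two options under "How" can be illustrated with a minimal Python sketch. It is not the S3Guard or DynamoDB API and does not reflect the attached patch: `TtlMetadataStore`, `Entry`, `last_written`, `ttl_seconds`, `prune_probability` and the injectable `clock` are all hypothetical names, and an in-memory dict stands in for the Dynamo table. Reads ignore entries older than the TTL (option 1), and a configurable fraction of reads also deletes stale entries in one batch (the probabilistic variant of option 2).

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    """One metadata row; last_written stands in for the proposed new column."""
    value: str
    last_written: float  # seconds since the epoch, refreshed on every write


@dataclass
class TtlMetadataStore:
    """Hypothetical in-memory stand-in for the metadata store (not the S3Guard API)."""
    ttl_seconds: float
    clock: Callable[[], float] = time.time  # injectable so tests can control time
    prune_probability: float = 0.0          # chance that a read also prunes stale rows
    rng: random.Random = field(default_factory=random.Random)
    entries: Dict[str, Entry] = field(default_factory=dict)

    def put(self, path: str, value: str) -> None:
        # Every write records the "entry last written time".
        self.entries[path] = Entry(value, self.clock())

    def _is_expired(self, entry: Entry) -> bool:
        return self.clock() - entry.last_written > self.ttl_seconds

    def get(self, path: str) -> Optional[str]:
        # Option 2, amortized: only some reads pay the cost of deleting stale rows.
        if self.rng.random() < self.prune_probability:
            self.prune()
        entry = self.entries.get(path)
        # Option 1: an expired entry is treated as absent, so the caller
        # would fall back to asking S3 directly.
        if entry is None or self._is_expired(entry):
            return None
        return entry.value

    def prune(self) -> int:
        """Delete all expired entries in one batch; return how many were removed."""
        stale = [p for p, e in self.entries.items() if self._is_expired(e)]
        for p in stale:
            del self.entries[p]
        return len(stale)
```

With `prune_probability` left at 0.0 the store only ever elides stale entries and never deletes them inline; raising it trades a little read latency for a smaller table. Expiry compares the reader's clock against a timestamp taken by the writer's clock, which is why the clock-synchronization caveat above matters.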