[jira] [Updated] (HADOOP-15621) S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing

Gabor Bota (JIRA) Thu, 27 Sep 2018 16:46:23 -0700


     [ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gabor Bota updated HADOOP-15621:
--------------------------------
    Attachment: HADOOP-15621.002.patch

> S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15621
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Gabor Bota
>            Priority: Major
>         Attachments: HADOOP-15621.001.patch, HADOOP-15621.002.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> Conceptually, this is an "online algorithm" counterpart to the CLI prune() 
> function, which is the "offline algorithm".
> Why: 
>  1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (S3 and the metadata store), certain failures 
> can lead to inconsistency between S3 and MS state. Having a TTL on each entry 
> in S3Guard means that any inconsistency is time-bounded. Thus "wait and 
> restart your job" becomes a valid, if ugly, way to get around any issues with 
> an FS client failure leaving things in a bad state.
>  2. We could make manual invocation of `hadoop s3guard prune ...` 
> unnecessary, depending on the implementation.
>  3. It makes it possible to fix the problem that the Dynamo MS prune() doesn't 
> prune directories, because directories lack a true modification time.
> How:
>  I think we need a new column in the Dynamo table, "entry last written time", 
> which is updated each time the entry is written to Dynamo.
>  After that we can either
>  1. Have the client simply ignore / elide any entries that are older than the 
> configured TTL.
>  2. Have the client delete entries older than the TTL.
> The issue with #2 is that it will increase latency if done inline in the 
> context of an FS operation. We could mitigate this somewhat by using an async 
> helper thread, or by doing it probabilistically ("sometimes") to amortize the 
> expense of deleting stale entries (allowing some batching as well).
> Caveats:
>  - Clock synchronization is, as usual, a concern. Many clusters already keep 
> clocks close enough via NTP. We should at least document the requirement 
> along with the configuration knob that enables the feature.
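
The two client-side options in the "How" section can be illustrated with a minimal Java sketch. It is not taken from the attached patches: the class TtlFilterSketch, the Entry type, and the method names are hypothetical, and the sketch assumes each entry carries the proposed "entry last written time" as epoch milliseconds. elideExpired() corresponds to option 1 (hide stale entries); collectExpired() gathers the stale entries that option 2 would delete, ideally in a batch and off the critical path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from the S3Guard code base. */
public class TtlFilterSketch {

  /** A metadata store entry carrying the proposed "entry last written time". */
  static final class Entry {
    final String path;
    final long lastWrittenMillis;

    Entry(String path, long lastWrittenMillis) {
      this.path = path;
      this.lastWrittenMillis = lastWrittenMillis;
    }
  }

  /** True if the entry was last written more than ttlMillis before nowMillis. */
  static boolean isExpired(Entry e, long nowMillis, long ttlMillis) {
    return nowMillis - e.lastWrittenMillis > ttlMillis;
  }

  /** Option 1: return only the entries still within the TTL; nothing is deleted. */
  static List<Entry> elideExpired(List<Entry> listing, long nowMillis, long ttlMillis) {
    List<Entry> fresh = new ArrayList<>();
    for (Entry e : listing) {
      if (!isExpired(e, nowMillis, ttlMillis)) {
        fresh.add(e);
      }
    }
    return fresh;
  }

  /** Option 2 helper: collect the stale entries so they can be deleted as a batch. */
  static List<Entry> collectExpired(List<Entry> listing, long nowMillis, long ttlMillis) {
    List<Entry> stale = new ArrayList<>();
    for (Entry e : listing) {
      if (isExpired(e, nowMillis, ttlMillis)) {
        stale.add(e);
      }
    }
    return stale;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long ttl = 60_000L; // assumed TTL of one minute, for illustration only
    List<Entry> listing = Arrays.asList(
        new Entry("/bucket/dir/fresh", now - 1_000L),
        new Entry("/bucket/dir/stale", now - 120_000L));

    for (Entry e : elideExpired(listing, now, ttl)) {
      System.out.println("visible: " + e.path);
    }
    for (Entry e : collectExpired(listing, now, ttl)) {
      System.out.println("to delete: " + e.path);
    }
  }
}
```

Both methods compare the client's clock against a timestamp written by a possibly different client, which is why the clock synchronization caveat applies to either option.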



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

