[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624144#comment-16624144
 ] 

Gabor Bota edited comment on HADOOP-15621 at 9/21/18 8:44 PM:
--------------------------------------------------------------

Just a note: {{S3AFileSystem}} never calls {{MetadataStore#put}} 
directly; that call is made only from {{S3Guard}} (via its static methods). 
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see how 
they interact with {{MetadataStore}}, and all [get, listChildren, put, move] 
interactions should be modified to handle TTL.


was (Author: gabor.bota):
Just a note: {{S3AFileSystem}} never calls {{MetadataStore#put}} 
directly; that call is made only from {{S3Guard}} (via its static methods). 
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see how 
they interact with {{MetadataStore}}, and all {get, listChildren, put, move} 
interactions should be modified to handle TTL.

> s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-15621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15621
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Aaron Fabbri
>            Assignee: Gabor Bota
>            Priority: Minor
>         Attachments: HADOOP-15621.001.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> Conceptually, this is an "online algorithm" counterpart to the CLI prune() 
> function, which is the "offline algorithm".
> Why: 
>  1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (S3 and the metadata store), certain failures can 
> lead to inconsistency between S3 and the metadata store (MS) state. Having a 
> time to live (TTL) on each entry in S3Guard means that any inconsistencies 
> will be time-bounded. Thus "wait and restart your job" becomes a valid, if 
> ugly, way to get around any issues with FS client failure leaving things in a 
> bad state.
>  2. We could make manual invocation of `hadoop s3guard prune ...` 
> unnecessary, depending on the implementation.
>  3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
> directories due to the lack of true modification time.
> How:
>  I think we need a new column in the Dynamo table: "entry last written time". 
> This is updated each time the entry is written to Dynamo.
>  After that we can either
>  1. Have the client simply ignore / elide any entries that are older than the 
> configured TTL.
>  2. Have the client delete entries older than the TTL.
> The issue with #2 is that it will increase latency if done inline in the context 
> of an FS operation. We could mitigate this somewhat by using an async helper 
> thread, or by probabilistically doing it only some of the time to amortize the 
> expense of deleting stale entries (allowing some batching as well).
> Caveats:
>  - Clock synchronization as usual is a concern. Many clusters already keep 
> clocks close enough via NTP. We should at least document the requirement 
> along with the configuration knob that enables the feature.
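
To make the two "How" options in the description concrete, here is a minimal in-memory sketch. It is not the S3Guard {{MetadataStore}} API and does not talk to DynamoDB: the class {{TtlMetadataSketch}}, its methods, and the injected clock are all hypothetical names chosen for illustration. It assumes each entry carries an "entry last written time" that is refreshed on every put, that get() elides entries older than the TTL (option 1), and that a separate batched pass deletes them off the read path (option 2).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch only; not the real S3Guard MetadataStore. */
class TtlMetadataSketch {

  /** A stored value plus the proposed "entry last written time" column. */
  static final class Entry {
    final String value;
    final long lastWrittenMillis;

    Entry(String value, long lastWrittenMillis) {
      this.value = value;
      this.lastWrittenMillis = lastWrittenMillis;
    }
  }

  private final Map<String, Entry> table = new ConcurrentHashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock; // injected so tests can control time

  TtlMetadataSketch(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Every write refreshes the last-written timestamp. */
  void put(String path, String value) {
    table.put(path, new Entry(value, clock.getAsLong()));
  }

  /** Option 1: expired entries are ignored (treated as absent), not deleted. */
  String get(String path) {
    Entry e = table.get(path);
    return (e == null || isExpired(e)) ? null : e.value;
  }

  /**
   * Option 2: delete expired entries in one batch, outside any read.
   * The two-argument remove only deletes the exact entry that was checked,
   * so a concurrent fresh put() for the same path is not lost.
   */
  int pruneExpired() {
    int removed = 0;
    for (Map.Entry<String, Entry> me : table.entrySet()) {
      if (isExpired(me.getValue()) && table.remove(me.getKey(), me.getValue())) {
        removed++;
      }
    }
    return removed;
  }

  /** Number of entries physically stored, including expired ones. */
  int size() {
    return table.size();
  }

  private boolean isExpired(Entry e) {
    return clock.getAsLong() - e.lastWrittenMillis > ttlMillis;
  }
}
```

In this sketch pruneExpired() could be called from an async helper thread, or from a small random fraction of FS operations, to amortize the cost of deletes. Because expiry is computed from the local clock, the clock-synchronization caveat in the description applies to it directly.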


