[ 
https://issues.apache.org/jira/browse/HADOOP-18651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18651:
------------------------------------
    Parent: HADOOP-19353  (was: HADOOP-18477)

> Add "versions" tool to s3a command line entry point
> ---------------------------------------------------
>
>                 Key: HADOOP-18651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18651
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.9
>            Reporter: Steve Loughran
>            Priority: Major
>
> Having just implemented some version command support in the cloudstore jar, I 
> can see the benefit of actually implementing it in the hadoop-aws module:
> https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/versioned-objects.md
> https://github.com/steveloughran/cloudstore/blob/trunk/src/main/extra/org/apache/hadoop/fs/s3a/extra/
>  
> The cloudstore code:
> * uses the v1 SDK by asking the s3a fs for it; this will break with the move to 
> the v2 SDK
> * doesn't have any tests
> * hasn't had any review and has no maintenance plan
> * bypasses audit log/referrer header creation
> We could just say "use the AWS CLI", but there are some benefits in using the 
> s3a connector code:
> * support for s3a:// urls
> * can use the s3a auth/signing chain (knox, etc)
> * picks up proxy, region settings, etc.
> * could integrate with other bits of the stack (e.g. a Spark RDD to get at all 
> versions of objects)
> * would be really useful to have a tool to purge all directory delete markers 
> down a path, to speed up listing on versioned buckets.
> * gets bundled everywhere
> For use by downstream code we would want to have a public/evolving API to 
> access operations, e.g. 
> # taking an S3AFileStatus for rename/purge/restore operations
> # listing all versions of objects under a path within a given time range and 
> mapping to RemoteIterator.
> # HADOOP-16387. S3A openFile() options to allow etag/version to be set
> The core code is straightforward (it takes exactly two days to write, *excluding 
> tests*); the public API and tests are more work.
> Note: we should also move the entry point to being "s3a", with "s3guard" 
> retained for compatibility.
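
As a rough illustration of the second API item in the description (listing all versions under a path within a time range and exposing them through a RemoteIterator), here is a minimal, self-contained Java sketch. Everything in it is an assumption made for illustration: `VersionListingSketch`, `ObjectVersion`, `listVersions` and `directoryDeleteMarkers` are hypothetical names, not part of hadoop-aws or cloudstore. The nested `RemoteIterator` is a local stand-in with the same two methods as `org.apache.hadoop.fs.RemoteIterator`, and an in-memory list stands in for the store's version listing. The sketch also assumes that directory markers are keys ending in "/".

```java
import java.io.IOException;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: none of these types exist in hadoop-aws. */
class VersionListingSketch {

  /** Local stand-in with the same two methods as org.apache.hadoop.fs.RemoteIterator. */
  interface RemoteIterator<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /** Hypothetical summary of one version (or delete marker) of an object. */
  static final class ObjectVersion {
    final String key;
    final String versionId;
    final Instant lastModified;
    final boolean deleteMarker;

    ObjectVersion(String key, String versionId, String lastModified, boolean deleteMarker) {
      this.key = key;
      this.versionId = versionId;
      this.lastModified = Instant.parse(lastModified);
      this.deleteMarker = deleteMarker;
    }
  }

  /** Versions under a key prefix with from <= lastModified < to. */
  static RemoteIterator<ObjectVersion> listVersions(
      List<ObjectVersion> listing, String prefix, Instant from, Instant to) {
    List<ObjectVersion> matched = new ArrayList<>();
    for (ObjectVersion v : listing) {
      boolean inRange = !v.lastModified.isBefore(from) && v.lastModified.isBefore(to);
      if (v.key.startsWith(prefix) && inRange) {
        matched.add(v);
      }
    }
    final Iterator<ObjectVersion> it = matched.iterator();
    // A real implementation would fetch pages lazily and could fail with IOException.
    return new RemoteIterator<ObjectVersion>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public ObjectVersion next() { return it.next(); }
    };
  }

  /** Delete markers on directory-style keys (assumed to end in "/") under a prefix. */
  static List<ObjectVersion> directoryDeleteMarkers(List<ObjectVersion> listing, String prefix) {
    List<ObjectVersion> markers = new ArrayList<>();
    for (ObjectVersion v : listing) {
      if (v.deleteMarker && v.key.startsWith(prefix) && v.key.endsWith("/")) {
        markers.add(v);
      }
    }
    return markers;
  }

  /** Made-up listing used only to exercise the sketch. */
  static List<ObjectVersion> sampleListing() {
    return Arrays.asList(
        new ObjectVersion("data/a.csv", "v1", "2023-01-05T00:00:00Z", false),
        new ObjectVersion("data/a.csv", "v2", "2023-01-20T00:00:00Z", false),
        new ObjectVersion("data/a.csv", "v3", "2023-03-01T00:00:00Z", false),
        new ObjectVersion("data/old/", "v4", "2023-01-10T00:00:00Z", true),
        new ObjectVersion("logs/x.log", "v5", "2023-01-15T00:00:00Z", false));
  }

  public static void main(String[] args) throws IOException {
    List<ObjectVersion> listing = sampleListing();
    RemoteIterator<ObjectVersion> it = listVersions(listing, "data/",
        Instant.parse("2023-01-01T00:00:00Z"), Instant.parse("2023-02-01T00:00:00Z"));
    while (it.hasNext()) {
      ObjectVersion v = it.next();
      System.out.println(v.key + " " + v.versionId + (v.deleteMarker ? " (delete marker)" : ""));
    }
    System.out.println("directory delete markers under data/: "
        + directoryDeleteMarkers(listing, "data/").size());
  }
}
```

The iterator shape matters because a real listing would be paged and each page fetch could fail, which is why both methods are declared to throw IOException. The `directoryDeleteMarkers` filter only selects candidates for the purge tool mentioned in the description; it does not delete anything.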



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
