[
https://issues.apache.org/jira/browse/HADOOP-18651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-18651:
------------------------------------
Parent: HADOOP-19353 (was: HADOOP-18477)
> Add "versions" tool to s3a command line entry point
> ---------------------------------------------------
>
> Key: HADOOP-18651
> URL: https://issues.apache.org/jira/browse/HADOOP-18651
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.9
> Reporter: Steve Loughran
> Priority: Major
>
> Having just implemented some version command support in the cloudstore jar, I
> can see the benefit of actually implementing it in the hadoop-aws module:
> https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/versioned-objects.md
> https://github.com/steveloughran/cloudstore/blob/trunk/src/main/extra/org/apache/hadoop/fs/s3a/extra/
>
> The cloudstore code:
> * uses the v1 SDK by asking the s3a fs for it; this will break with the move
> to the v2 SDK
> * doesn't have any tests
> * doesn't have any review or maintenance plan
> * bypasses audit log/referrer header creation
> We could just say "use the AWS CLI", but there are some benefits in using the
> s3a connector code:
> * support for s3a:// URLs
> * can use the s3a auth/signing chain (Knox, etc.)
> * plus proxy, region settings, etc.
> * could integrate with other bits of the stack (e.g. a Spark RDD to get at all
> versions of objects)
> * would be really useful to have a tool to purge all directory delete markers
> down a path, to speed up listing on versioned buckets.
> * gets bundled everywhere
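The delete-marker purge mentioned in the list above could start from a selection step like the one sketched here. This is only an illustration under stated assumptions: `DeleteMarkerPurgeSketch`, `Entry`, and `selectDirectoryDeleteMarkers` are hypothetical names, not existing hadoop-aws classes, and "directory delete marker" is assumed to mean a delete marker whose key ends in "/". A real tool would obtain the listing from the store and then delete each selected marker by its version ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: all names here are hypothetical, not part of hadoop-aws. */
final class DeleteMarkerPurgeSketch {

  /** Hypothetical summary of one entry from a versioned listing. */
  static final class Entry {
    final String key;
    final String versionId;
    final boolean deleteMarker;

    Entry(String key, String versionId, boolean deleteMarker) {
      this.key = key;
      this.versionId = versionId;
      this.deleteMarker = deleteMarker;
    }
  }

  /**
   * Select the delete markers under a prefix whose key looks like a directory.
   * Assumption: a directory entry is any key ending in "/".
   */
  static List<Entry> selectDirectoryDeleteMarkers(List<Entry> listing, String prefix) {
    List<Entry> selected = new ArrayList<>();
    for (Entry e : listing) {
      if (e.deleteMarker && e.key.startsWith(prefix) && e.key.endsWith("/")) {
        selected.add(e);
      }
    }
    return selected;
  }

  /** Small in-memory demonstration; returns the version IDs that would be purged. */
  static List<String> demo() {
    List<Entry> listing = Arrays.asList(
        new Entry("data/dir1/", "v1", true),      // directory delete marker: selected
        new Entry("data/dir1/file", "v2", true),  // file delete marker: skipped
        new Entry("data/dir2/", "v3", false),     // directory entry, not a delete marker
        new Entry("other/dir3/", "v4", true),     // outside the prefix
        new Entry("data/dir1/sub/", "v5", true)); // nested directory delete marker: selected
    List<String> ids = new ArrayList<>();
    for (Entry e : selectDirectoryDeleteMarkers(listing, "data/")) {
      ids.add(e.versionId);
    }
    return ids;
  }
}
```

Keeping selection separate from deletion would let the same predicate back a dry-run mode that only reports what a purge would remove.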
> For use by downstream code we would want to have a public/evolving API to
> access operations, e.g.
> # taking an S3AFileStatus for rename/purge/restore operations
> # listing all versions of objects under a path within a given time range and
> mapping to RemoteIterator.
> # HADOOP-16387. S3A openFile() options to allow etag/version to be set
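As a rough illustration of the second item in the list above, listing versions within a time range could be exposed as an iterator along these lines. Everything here is an assumption for discussion: `VersionListingSketch`, `ObjectVersion`, and `inTimeRange` are hypothetical names, and the nested `RemoteIterator` is a local stand-in that only mirrors the shape of org.apache.hadoop.fs.RemoteIterator so the sketch compiles without Hadoop on the classpath. The time range is assumed to be half-open, start inclusive and end exclusive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch only: all names here are hypothetical, not part of hadoop-aws. */
final class VersionListingSketch {

  /** Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator. */
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  /** Hypothetical summary of one object version or delete marker. */
  static final class ObjectVersion {
    final String key;
    final String versionId;
    final Instant modified;
    final boolean deleteMarker;

    ObjectVersion(String key, String versionId, Instant modified, boolean deleteMarker) {
      this.key = key;
      this.versionId = versionId;
      this.modified = modified;
      this.deleteMarker = deleteMarker;
    }
  }

  /** Lazily filter a listing to versions with start <= modified < end. */
  static RemoteIterator<ObjectVersion> inTimeRange(
      final Iterator<ObjectVersion> source, final Instant start, final Instant end) {
    return new RemoteIterator<ObjectVersion>() {
      private ObjectVersion pending; // next matching entry, if already found

      @Override
      public boolean hasNext() {
        while (pending == null && source.hasNext()) {
          ObjectVersion candidate = source.next();
          if (!candidate.modified.isBefore(start) && candidate.modified.isBefore(end)) {
            pending = candidate;
          }
        }
        return pending != null;
      }

      @Override
      public ObjectVersion next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ObjectVersion result = pending;
        pending = null;
        return result;
      }
    };
  }

  /** Small in-memory demonstration; returns the version IDs inside the range. */
  static List<String> demo() {
    List<ObjectVersion> all = Arrays.asList(
        new ObjectVersion("a", "v1", Instant.parse("2023-01-01T00:00:00Z"), false),
        new ObjectVersion("a", "v2", Instant.parse("2023-02-01T00:00:00Z"), false),
        new ObjectVersion("a", "v3", Instant.parse("2023-02-15T00:00:00Z"), true),
        new ObjectVersion("a", "v4", Instant.parse("2023-03-01T00:00:00Z"), false));
    RemoteIterator<ObjectVersion> it = inTimeRange(all.iterator(),
        Instant.parse("2023-02-01T00:00:00Z"), Instant.parse("2023-03-01T00:00:00Z"));
    List<String> ids = new ArrayList<>();
    try {
      while (it.hasNext()) {
        ids.add(it.next().versionId);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ids;
  }
}
```

In this sketch the filter is lazy, so a caller that stops early does not force the rest of the listing to be read; in a real implementation the source would be paged results from the store rather than an in-memory list.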
> The core code is straightforward (roughly two days to write, *excluding
> tests*); the public API and tests are more work.
> Note: we should also move the entry point to being "s3a", with "s3guard"
> retained for compatibility.