steveloughran commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2186215000
The AWS SDK delete with a version ID actually requires more IAM permissions than an unversioned delete, which always removes the HEAD object, because granting that permission allows the caller to delete backups. Deployments where apps can delete HEAD but not versions are not unusual for this reason. This is why S3A doesn't use it, even in simple listing -> delete calls where the status is known.

You might also need to issue getFileStatus/list calls, which would massively increase the cost if the process didn't have those values already.

A bulk delete with a tuple of (path, version) for each entry could work, if the store could be configured to use that version ID/type. For S3A we would leave it off by default. The tuple would be `Map.Entry` to be reflection friendly.

If you do think version/etag support would be a blocker to use, well, things haven't shipped yet, though @mukund-thakur is preparing a 3.4.1 alpha release. You (and it would be you, sorry) will need to modify the API with

* `Collection<Map.Entry<Path, version>>[]`
* S3A impl to not use version by default, option to turn it on, parameterized testing for this if a versioned bucket is the test bucket.
* WrappedIO changed to match

This isn't that useful for table compaction, as the engines tend to use randomness in their names to spread the S3 store load across shards. But it could have other uses. For example, here's some work to do version printing, recovery and copy within the same bucket, which lets you pull out the layers underneath a directory tree:

https://github.com/steveloughran/cloudstore/tree/main/src/main/extra/org/apache/hadoop/fs/s3a/extra

The question is: do we want to make something that complex part of a broader API with tests, specification, commitments to maintain etc., or do we just say call `S3A.getInternals().getClient()` and then sort it out yourself?
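To make the (path, version) proposal concrete, here is a rough sketch of how a bulk delete taking `Map.Entry` tuples might behave with version IDs off by default. Everything in it is hypothetical: `VersionedBulkDeleteSketch`, `InMemoryStore`, the `useVersionIds` flag and the `entry()` helper are invented for illustration, plain strings stand in for `Path`, and an in-memory map stands in for the bucket. It is not the Hadoop bulk delete API or the S3A implementation.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in Hadoop. */
public class VersionedBulkDeleteSketch {

  /** Build a (path, versionId) tuple; versionId may be null. */
  static Map.Entry<String, String> entry(String path, String versionId) {
    return new SimpleImmutableEntry<>(path, versionId);
  }

  /** Toy stand-in for a versioned bucket. */
  static final class InMemoryStore {
    // path -> version IDs, oldest first; the last one is the HEAD candidate
    private final Map<String, List<String>> versions = new HashMap<>();
    // paths hidden by an unversioned (HEAD) delete
    private final Set<String> deleteMarkers = new HashSet<>();
    // assumed option: off by default, as suggested for S3A
    private final boolean useVersionIds;

    InMemoryStore(boolean useVersionIds) {
      this.useVersionIds = useVersionIds;
    }

    void put(String path, String versionId) {
      versions.computeIfAbsent(path, k -> new ArrayList<>()).add(versionId);
      deleteMarkers.remove(path);
    }

    boolean isVisible(String path) {
      List<String> v = versions.get(path);
      return v != null && !v.isEmpty() && !deleteMarkers.contains(path);
    }

    int versionCount(String path) {
      return versions.getOrDefault(path, Collections.emptyList()).size();
    }

    /** Returns (path, error) tuples for the entries that could not be deleted. */
    List<Map.Entry<String, String>> bulkDelete(
        Collection<Map.Entry<String, String>> entries) {
      List<Map.Entry<String, String>> failures = new ArrayList<>();
      for (Map.Entry<String, String> e : entries) {
        String path = e.getKey();
        String versionId = e.getValue();
        if (!useVersionIds || versionId == null) {
          // HEAD delete: hide the object, older versions (backups) survive
          deleteMarkers.add(path);
          continue;
        }
        // versioned delete: permanently removes that specific version
        List<String> known = versions.get(path);
        if (known == null || !known.remove(versionId)) {
          failures.add(entry(path, "no such version: " + versionId));
        }
      }
      return failures;
    }
  }

  public static void main(String[] args) {
    for (boolean useVersionIds : new boolean[] {false, true}) {
      InMemoryStore store = new InMemoryStore(useVersionIds);
      store.put("data/part-0000", "v1");
      store.put("data/part-0000", "v2");
      List<Map.Entry<String, String>> failures = store.bulkDelete(
          Collections.singletonList(entry("data/part-0000", "v2")));
      System.out.println("useVersionIds=" + useVersionIds
          + " visible=" + store.isVisible("data/part-0000")
          + " versions=" + store.versionCount("data/part-0000")
          + " failures=" + failures);
    }
  }
}
```

The two branches mirror the permission split described above: the default path only ever hides HEAD, so backups stay intact, while the opt-in path destroys a specific version and is the one that would need the extra IAM permission.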