steveloughran commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2186215000

   An AWS SDK delete with a version ID actually requires more IAM permissions 
(`s3:DeleteObjectVersion` as well as `s3:DeleteObject`) than an unversioned 
delete, which only ever removes the HEAD object, because granting that 
permission allows the caller to delete backups. Deployments where apps can 
delete HEAD but not versions are not unusual for this reason.
    
   This is why S3A doesn't use it even in simple listing -> delete calls where 
the status is known.
   
   You might also need to issue getFileStatus/list calls to obtain the version 
IDs, which would massively increase the cost if the process didn't have those 
values already. 
   
   A bulk delete with a tuple of (path, version) for each entry could work, if 
the store could be configured to use that version ID/type. For S3A we would 
leave it off by default. The tuple would be `Map.Entry` to be reflection friendly.
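
   A minimal sketch of what such a tuple could look like, using only JDK types 
so that reflection-based callers need no Hadoop classes. Everything here is 
hypothetical: `VersionedDeleteEntries` and `entry()` are illustrative names, 
and plain `String` stands in for `org.apache.hadoop.fs.Path` to keep the 
example self-contained.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;

/** Hypothetical helper for illustration; not part of any Hadoop API. */
final class VersionedDeleteEntries {

  private VersionedDeleteEntries() {
  }

  /**
   * Pair a path with a version ID.
   * A null version is assumed here to mean "no version: delete HEAD".
   */
  static Map.Entry<String, String> entry(String path, String versionId) {
    // SimpleImmutableEntry works on Java 8 and permits a null value,
    // unlike Map.entry() (Java 9+), which rejects nulls.
    return new AbstractMap.SimpleImmutableEntry<>(path, versionId);
  }

  public static void main(String[] args) {
    Collection<Map.Entry<String, String>> batch = new ArrayList<>();
    batch.add(entry("s3a://bucket/table/part-0001", "v1"));
    batch.add(entry("s3a://bucket/table/part-0002", null));
    for (Map.Entry<String, String> e : batch) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

   Whether a null version or some sentinel string should mean "unversioned" is 
a design choice the real API would have to specify.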
   
   If you do think version/etag support would be a blocker to use, well, things 
haven't shipped yet, though @mukund-thakur is preparing a 3.4.1 alpha release.
   
   You (and it would be you, sorry) will need to modify the API with
   * `Collection<Map.Entry<Path, version>>[]`
   * S3A impl to not use version by default, option to turn it on, 
parameterized testing for this if a versioned bucket is the test bucket.
   * WrappedIO changed to match
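
   The "off by default, option to turn it on" behaviour could be sketched 
roughly as below. This is an assumption-laden illustration, not the S3A 
implementation: `VersionAwareDeletePlanner`, its `useVersions` flag and 
`plan()` are hypothetical names, and `String` again stands in for `Path`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the Hadoop bulk delete API. */
final class VersionAwareDeletePlanner {

  /** Assumed configuration switch; false models the proposed default. */
  private final boolean useVersions;

  VersionAwareDeletePlanner(boolean useVersions) {
    this.useVersions = useVersions;
  }

  /**
   * Work out the (path, version) pairs which would be sent to the store.
   * With versions disabled, every version ID is dropped, so the request
   * is a plain delete of HEAD needing no extra permissions.
   */
  List<Map.Entry<String, String>> plan(
      Collection<Map.Entry<String, String>> entries) {
    List<Map.Entry<String, String>> out = new ArrayList<>();
    for (Map.Entry<String, String> e : entries) {
      String version = useVersions ? e.getValue() : null;
      out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), version));
    }
    return out;
  }
}
```

   A parameterized test could then run the same entries through both settings 
of the flag when the test bucket is versioned.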
   
   This isn't that useful for table compaction, as the engines tend to use 
randomness in their names to spread the S3 store load across shards. But it 
could have other uses.
   
   For example, here's some work to do version printing, recovery and copy 
within the same bucket; it lets you pull out the layers underneath a directory 
tree:
   
   
https://github.com/steveloughran/cloudstore/tree/main/src/main/extra/org/apache/hadoop/fs/s3a/extra
   
   The question is: do we want to make something that complex part of a broader 
API with tests, specification, commitments to maintain etc., or do we just say 
call `S3A.getInternals().getClient()` and then sort it out yourself?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
