[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837722#comment-17837722
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-----------------------------------------

steveloughran commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2059138849

   Commented. I've also opened PR #6738, which tunes the API to work with 
iceberg; I have just written a PoC of the iceberg binding. 
   
   My PR
   * moves the wrapper methods to a new wrappedio.WrappedIO class
   * adds a probe for the API being available
   * also adds an availability probe in the interface. I'm not sure about that, 
as we really should make it available everywhere, always.
   
   Can you cherry-pick this PR onto your branch and then address the review comments?
   
   After that, please do not rebase your PR. That way, it is 
easier for me to keep my own branch in sync with your changes. Thanks.
   
   PoC of iceberg integration, based on their S3FileIO one.
   
   
https://github.com/steveloughran/iceberg/blob/s3/HADOOP-18679-bulk-delete-api/core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java#L208
   
   The iceberg api passes in a collection of paths, *which may span multiple 
filesystems*.
   
   To handle this, 
   * the bulk delete API should take a Collection, not a List
   * it needs to be implemented in every FS, because trying to distinguish 
case-by-case on support would be really complex.
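   
   A minimal sketch of the caller-side problem, not the API in either PR: 
a collection of fully qualified paths is grouped by filesystem (scheme plus 
authority) and each group is handed to a per-filesystem delete call. The 
names BulkDeleteGrouping, BulkDeleter, groupByFileSystem and deleteAll are 
hypothetical, and plain java.net.URI is used in place of Hadoop's Path so 
the example stands alone.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hadoop or Iceberg.
class BulkDeleteGrouping {

    /** Hypothetical stand-in for whatever bulk delete call a filesystem offers. */
    interface BulkDeleter {
        void delete(String fileSystemKey, Collection<URI> paths);
    }

    /**
     * Group paths by filesystem, identified here by "scheme://authority".
     * Assumes every path is fully qualified (has a scheme).
     */
    static Map<String, List<URI>> groupByFileSystem(Collection<String> paths) {
        // LinkedHashMap keeps the order in which filesystems are first seen.
        Map<String, List<URI>> groups = new LinkedHashMap<>();
        for (String p : paths) {
            URI uri = URI.create(p);
            String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
            String key = uri.getScheme() + "://" + authority;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(uri);
        }
        return groups;
    }

    /** Issue one bulk delete per filesystem found in the collection. */
    static void deleteAll(Collection<String> paths, BulkDeleter deleter) {
        groupByFileSystem(paths).forEach(deleter::delete);
    }
}
```

   With this shape, the caller never has to ask whether a given filesystem 
supports bulk delete, which only works if every filesystem implements it.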
   
   
   
   




> Add API for bulk/paged object deletion
> --------------------------------------
>
>                 Key: HADOOP-18679
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18679
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.5
>            Reporter: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> iceberg and hbase could benefit from being able to give a list of individual 
> files to delete: files which may be scattered round the bucket for better 
> read performance. 
> Add some new optional interface for an object store which allows a caller to 
> submit a list of paths to files to delete, where
> the expectation is
> * if a path is a file: delete
> * if a path is a dir, outcome undefined
> For s3 that'd let us build these into DeleteRequest objects, and submit, 
> without any probes first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
