steveloughran commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2117702734
+1

I was about to merge, then I realised that Yetus wasn't ready.

1. As soon as this is in, I will rebase #6686 onto it. That PR extends WrappedIO and adds the reflection utility classes from Parquet to assist in testing.
2. I'll leave you to do the cherry-pick and merge onto 3.4.x.
3. I also want to get a minimal version into 3.3.x, maybe with a page size of 1 even on S3A but without the safety checks, so that it still saves on LIST/HEAD calls.

Here is my draft commit message:

----

Create a BulkDelete implementation from a BulkDeleteSource. The BulkDelete interface provides pageSize(), the maximum number of entries that can be deleted in a single call, and a bulkDelete(Collection<Path> paths) method that accepts a collection of at most pageSize() paths.

This is optimized for object stores with bulk delete APIs. The S3A connector will offer the page size set in fs.s3a.bulk.delete.page.size unless bulk delete has been disabled. Even with a page size of 1, the S3A implementation is more efficient than delete(path), as there are no safety checks for the path being a directory and no probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by all FileSystem implementations, with a page size of 1 and each entry mapped to delete(pathToDelete, false). This means that callers do not need special-case handling for object stores versus classic filesystems.

To aid use through reflection APIs, the class org.apache.hadoop.io.wrappedio.WrappedIO has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran
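A minimal sketch of how a caller might page a long list of paths through the interface described in the commit message, together with the page-size-1 fallback for classic filesystems. To keep it self-contained and runnable without Hadoop on the classpath, `BulkDeleter`, `SinglePathDeleter`, `OnePerPageBulkDeleter` and `BulkDeleteHelper` are hypothetical stand-ins, not Hadoop classes. Paths are plain strings rather than `org.apache.hadoop.fs.Path`, and `bulkDelete()` is assumed to return nothing. The merged API's exact signatures may differ.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// HYPOTHETICAL stand-in for the BulkDelete interface described in the commit message.
// Paths are plain strings here, not org.apache.hadoop.fs.Path; the void return is assumed.
interface BulkDeleter {
  /** Maximum number of paths a single bulkDelete() call will accept. */
  int pageSize();

  /** Delete at most pageSize() paths in one request. */
  void bulkDelete(Collection<String> paths) throws IOException;
}

// HYPOTHETICAL stand-in for FileSystem.delete(path, recursive).
interface SinglePathDeleter {
  boolean delete(String path, boolean recursive) throws IOException;
}

// Fallback for classic filesystems: page size of 1, each entry mapped to delete(path, false).
final class OnePerPageBulkDeleter implements BulkDeleter {
  private final SinglePathDeleter fs;

  OnePerPageBulkDeleter(SinglePathDeleter fs) {
    this.fs = fs;
  }

  @Override
  public int pageSize() {
    return 1;
  }

  @Override
  public void bulkDelete(Collection<String> paths) throws IOException {
    if (paths.size() > pageSize()) {
      throw new IllegalArgumentException(
          "Too many paths: " + paths.size() + " > page size " + pageSize());
    }
    for (String path : paths) {
      fs.delete(path, false); // non-recursive delete of a single entry
    }
  }
}

// HYPOTHETICAL caller-side helper: splits a list into pages no larger than pageSize().
final class BulkDeleteHelper {
  private BulkDeleteHelper() {}

  /** Deletes every path, page by page; returns the number of bulkDelete() calls made. */
  static int deleteAll(BulkDeleter deleter, List<String> paths) throws IOException {
    int pageSize = deleter.pageSize();
    if (pageSize < 1) {
      throw new IllegalStateException("Invalid page size: " + pageSize);
    }
    int calls = 0;
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      // Copy the sublist so the deleter never holds a view of the caller's list.
      deleter.bulkDelete(new ArrayList<>(paths.subList(start, end)));
      calls++;
    }
    return calls;
  }
}
```

Because the classic-filesystem fallback reports a page size of 1, the same paging loop works unchanged against an object store offering larger pages. This is the point the commit message makes about callers not needing special-case handling.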