Re: [PR] Add API for bulk/paged object deletion [hadoop]

2024-04-15 Thread via GitHub


mukund-thakur commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1566241797


##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java:
##
@@ -702,6 +704,54 @@ public void testPartialDeleteSingleDelete() throws Throwable {
     executePartialDelete(createAssumedRoleConfig(), true);
   }
 
+  @Test
+  public void testBulkDelete() throws Throwable {
+    describe("Bulk delete with part of the child tree read only");
+    executeBulkDelete(createAssumedRoleConfig());
+  }
+
+  private void executeBulkDelete(Configuration assumedRoleConfig) throws Exception {
+    Path destDir = methodPath();
+    Path readOnlyDir = new Path(destDir, "readonlyDir");
+
+    // the full FS
+    S3AFileSystem fs = getFileSystem();
+    FileUtil.bulkDelete(fs, destDir, new ArrayList<>());
+
+    bindRolePolicyStatements(assumedRoleConfig, STATEMENT_ALLOW_KMS_RW,
+        statement(true, S3_ALL_BUCKETS, S3_ALL_OPERATIONS),
+        new Statement(Effects.Deny)
+            .addActions(S3_PATH_WRITE_OPERATIONS)
+            .addResources(directory(readOnlyDir))
+    );
+    roleFS = (S3AFileSystem) destDir.getFileSystem(assumedRoleConfig);
+
+    int range = 10;
+    touchFiles(fs, readOnlyDir, range);
+    touchFiles(roleFS, destDir, range);
+    FileStatus[] fileStatuses = roleFS.listStatus(readOnlyDir);
+    List<Path> pathsToDelete = Arrays.stream(fileStatuses)
+        .map(FileStatus::getPath)
+        .collect(Collectors.toList());
+    // bulk delete in the read-only FS should fail.
+    BulkDelete bulkDelete = roleFS.createBulkDelete(readOnlyDir);
+    assertAccessDeniedForEachPath(bulkDelete.bulkDelete(pathsToDelete));
+    BulkDelete bulkDelete2 = roleFS.createBulkDelete(destDir);
+    assertAccessDeniedForEachPath(bulkDelete2.bulkDelete(pathsToDelete));
+    // deleting the files through the original FS should succeed.
+    BulkDelete bulkDelete3 = fs.createBulkDelete(readOnlyDir);
+    assertSuccessfulBulkDelete(bulkDelete3.bulkDelete(pathsToDelete));
+    BulkDelete bulkDelete4 = fs.createBulkDelete(destDir);
+    assertSuccessfulBulkDelete(bulkDelete4.bulkDelete(pathsToDelete));
+    // we could also write a test mixing successful and failed deletions.
+  }
+
+  private void assertAccessDeniedForEachPath(List<Map.Entry<Path, String>> entries) {
+    for (Map.Entry<Path, String> entry : entries) {
+      Assertions.assertThat(entry.getValue()).contains("AccessDenied");

Review Comment:
   add message.
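To illustrate the reviewer's "add message." point: each per-path assertion should carry a message naming the path, so a failure report says which delete unexpectedly succeeded. In the real test this would be AssertJ's `describedAs(...)` on the `assertThat` chain; the sketch below is a hedged, self-contained JDK-only stand-in (class and path names are illustrative, not from the PR) that shows the same idea with a plain `AssertionError`.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the test helper: fail with a message that
// names the offending path, instead of a bare contains() assertion.
public class BulkDeleteAssertSketch {

  /** Throw an AssertionError naming the path if its outcome lacks "AccessDenied". */
  static void assertAccessDeniedForEachPath(List<Map.Entry<String, String>> entries) {
    for (Map.Entry<String, String> entry : entries) {
      if (!entry.getValue().contains("AccessDenied")) {
        throw new AssertionError(
            "Expected AccessDenied outcome for path " + entry.getKey()
                + " but was: " + entry.getValue());
      }
    }
  }

  public static void main(String[] args) {
    // Both outcomes denied: the helper returns silently.
    List<Map.Entry<String, String>> outcomes = List.of(
        new AbstractMap.SimpleEntry<>("s3a://bucket/readonlyDir/file-0", "AccessDenied"),
        new AbstractMap.SimpleEntry<>("s3a://bucket/readonlyDir/file-1", "AccessDenied"));
    assertAccessDeniedForEachPath(outcomes);
    System.out.println("all outcomes were AccessDenied");
  }
}
```

With AssertJ, the equivalent message would hang off the chain itself, e.g. `assertThat(entry.getValue()).describedAs("delete outcome for %s", entry.getKey()).contains("AccessDenied")`.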

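The test above ends with a TODO: "we can write a test for some successful and some failure as well." A hedged, self-contained model of that mixed case is sketched below. All names are illustrative, not from the PR; the toy store denies writes under a read-only prefix and, mirroring how the per-path outcome list appears to work in the test (entries only for failed paths, each value carrying the error), returns one entry per failure.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory model of a bulk delete with mixed outcomes: paths under a
// read-only prefix fail with "AccessDenied", all other paths succeed and
// produce no entry in the returned list.
public class MixedBulkDeleteSketch {

  private final Set<String> readOnlyPrefixes;

  MixedBulkDeleteSketch(Set<String> readOnlyPrefixes) {
    this.readOnlyPrefixes = readOnlyPrefixes;
  }

  /** Delete each path; return one (path, error) entry per failed path. */
  List<Map.Entry<String, String>> bulkDelete(List<String> paths) {
    List<Map.Entry<String, String>> failures = new ArrayList<>();
    for (String path : paths) {
      boolean denied = readOnlyPrefixes.stream().anyMatch(path::startsWith);
      if (denied) {
        failures.add(new AbstractMap.SimpleEntry<>(path, "AccessDenied"));
      }
      // writable paths "succeed" silently: no entry is added.
    }
    return failures;
  }

  public static void main(String[] args) {
    MixedBulkDeleteSketch store =
        new MixedBulkDeleteSketch(Set.of("dest/readonlyDir/"));
    List<Map.Entry<String, String>> outcome = store.bulkDelete(List.of(
        "dest/readonlyDir/file-0",  // under the read-only prefix: fails
        "dest/file-1"));            // writable: succeeds, no entry
    System.out.println("failures: " + outcome);
  }
}
```

A mixed-outcome test against the real API would then assert both halves: the read-only paths appear in the result with "AccessDenied", and the writable paths do not appear at all.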


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsContractBulkDelete.java:
##
@@ -0,0 +1,32 @@
+//package org.apache.hadoop.fs.azurebfs.contract;
+//
+//import org.apache.hadoop.conf.Configuration;
+//import org.apache.hadoop.fs.contract.AbstractContractBulkDeleteTest;
+//import org.apache.hadoop.fs.contract.AbstractFSContract;
+//
+//public class ITestAbfsContractBulkDelete extends AbstractContractBulkDeleteTest {

Review Comment:
delete.



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/S3AStoreImpl.java:
##
@@ -194,6 +189,10 @@ protected void incrementStatistic(Statistic statistic) {
     incrementStatistic(statistic, 1);
   }
 
+  protected void incrementDurationStatistic(Statistic statistic, Duration duration) {

Review Comment:
   remove.



##
hadoop-common-project/hadoop-common/pom.xml:
##
@@ -429,6 +429,12 @@
   lz4-java
   provided
 
+  

Review Comment:
   remove. not sure how this got added. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[PR] Add API for bulk/paged object deletion [hadoop]

2024-04-12 Thread via GitHub


mukund-thakur opened a new pull request, #6726:
URL: https://github.com/apache/hadoop/pull/6726

   
   
   ### Description of PR
   
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   

