[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-05-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846052#comment-17846052
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2108677588

   mukund, if you can do those naming changes then I'm +1




> Add API for bulk/paged object deletion
> --
>
> Key: HADOOP-18679
> URL: https://issues.apache.org/jira/browse/HADOOP-18679
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> iceberg and hbase could benefit from being able to give a list of individual
> files to delete: files which may be scattered around the bucket, for better
> read performance.
> Add some new optional interface for an object store which allows a caller to
> submit a list of paths to files to delete, where the expectation is:
> * if a path is a file: delete it
> * if a path is a dir: outcome undefined
> For S3 that'd let us build these into DeleteRequest objects and submit them
> without any probes first.
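
For illustration, a minimal sketch of how a client such as Iceberg or HBase might drive the API this issue proposes, using the `BulkDelete`/`createBulkDelete()` names from the code quoted later in this thread. The paging loop and the helper class are assumptions for the example, not the committed implementation:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class BulkDeleteExample {

  private BulkDeleteExample() {
  }

  /**
   * Delete a batch of files under base, one page at a time.
   * @return the (path, error) pairs for anything which wasn't deleted.
   */
  static List<Map.Entry<Path, String>> deleteFiles(
      FileSystem fs, Path base, List<Path> files) throws IOException {
    List<Map.Entry<Path, String>> failures = new ArrayList<>();
    try (BulkDelete bulk = fs.createBulkDelete(base)) {
      // pageSize() is 1 for the default implementation; object stores
      // such as S3 may support much larger pages.
      int pageSize = bulk.pageSize();
      for (int i = 0; i < files.size(); i += pageSize) {
        failures.addAll(bulk.bulkDelete(
            files.subList(i, Math.min(i + pageSize, files.size()))));
      }
    }
    return failures;
  }
}
```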






[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844953#comment-17844953
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2102481437

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 07s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 01s |  |  xmllint was not available.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  markdownlint  |   0m 01s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m 01s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 11 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 42s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  | 107m 34s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  48m 30s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   7m 10s |  |  trunk passed  |
   | -1 :x: |  mvnsite  |   5m 19s | 
[/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in trunk failed.  |
   | +1 :green_heart: |  javadoc  |  23m 51s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 225m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  | 228m 42s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 57s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  19m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  45m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  45m 43s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   7m 25s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   5m 28s | 
[/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |  24m 00s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 232m 12s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   7m 14s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 700m 51s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6726 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle 
markdownlint |
   | uname | MINGW64_NT-10.0-17763 296d6abd6fb2 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / e37d88f764665c8530097bbed890a5935a5fd1f0 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/testReport/
 |
   | modules | C: hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws 
hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-05-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844403#comment-17844403
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1592834729


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/wrappedio/WrappedIO.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.wrappedio;
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.BulkDelete;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Reflection-friendly access to APIs which are not available in
+ * some of the older Hadoop versions which libraries still
+ * compile against.
+ * <p>
+ * The intent is to avoid the need for complex reflection operations
+ * including wrapping of parameter classes, direct instantiation of
+ * new classes etc.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Evolving
+public final class WrappedIO {
+
+  private WrappedIO() {
+  }
+
+  /**
+   * Get the maximum number of objects/files to delete in a single request.
+   * @param fs filesystem
+   * @param path path to delete under.
+   * @return a number greater than or equal to zero.
+   * @throws UnsupportedOperationException bulk delete under that path is not 
supported.
+   * @throws IllegalArgumentException path not valid.
+   * @throws IOException problems resolving paths
+   */
+  public static int bulkDeletePageSize(FileSystem fs, Path path) throws 
IOException {
+try (BulkDelete bulk = fs.createBulkDelete(path)) {
+  return bulk.pageSize();
+}
+  }
+
+  /**
+   * Delete a list of files/objects.
+   * <ul>
+   *   <li>Files must be under the path provided in {@code base}.</li>
+   *   <li>The size of the list must be equal to or less than the page size.</li>
+   *   <li>Directories are not supported; the outcome of attempting to delete
+   *   directories is undefined (ignored; undetected, listed as failures...).</li>
+   *   <li>The operation is not atomic.</li>
+   *   <li>The operation is treated as idempotent: network failures may
+   *   trigger resubmission of the request -any new objects created under a
+   *   path in the list may then be deleted.</li>
+   *   <li>There is no guarantee that any parent directories exist after this
+   *   call.</li>
+   * </ul>
+   * @param fs filesystem
+   * @param base path to delete under.
+   * @param paths list of paths which must be absolute and under the base path.
+   * @return a list of all the paths which couldn't be deleted for a reason 
other than "not found" and any associated error message.
+   * @throws UnsupportedOperationException bulk delete under that path is not 
supported.
+   * @throws IOException IO problems including networking, authentication and 
more.
+   * @throws IllegalArgumentException if a path argument is invalid.
+   */
+  public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs,

Review Comment:
   rename to `bulkDelete_delete`


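
For illustration, a hypothetical caller of the static entry points in the diff above, assuming the signatures shown there (before the rename requested in the review comment). The helper class and its names are invented for the example:

```java
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.wrappedio.WrappedIO;

final class WrappedIODemo {

  private WrappedIODemo() {
  }

  /**
   * Delete a single page of paths under base, returning the
   * (path, error) pairs for anything which could not be deleted.
   */
  static List<Map.Entry<Path, String>> deletePage(
      FileSystem fs, Path base, Collection<Path> paths) throws IOException {
    int pageSize = WrappedIO.bulkDeletePageSize(fs, base);
    if (paths.size() > pageSize) {
      throw new IllegalArgumentException(
          "Too many paths for one page: " + paths.size() + " > " + pageSize);
    }
    return WrappedIO.bulkDelete(fs, base, paths);
  }
}
```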


[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-05-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844057#comment-17844057
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2097047455

   > can you do the same here? some style checker will complain but it will 
help us to separate the methods in the new class.
   
   I don't understand what to do here. 
   







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842385#comment-17842385
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1584787287


##
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##
@@ -0,0 +1,136 @@
+
+
+#  interface `BulkDelete`
+



[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842178#comment-17842178
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1583846245


##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractBulkDelete.java:
##
@@ -85,6 +88,9 @@ public ITestS3AContractBulkDelete(boolean enableMultiObjectDelete) {
     protected Configuration createConfiguration() {
         Configuration conf = super.createConfiguration();
         S3ATestUtils.disableFilesystemCaching(conf);
+        conf = propagateBucketOptions(conf, getTestBucketName(conf));
+        skipIfNotEnabled(conf, Constants.ENABLE_MULTI_DELETE,

Review Comment:
   nice catch. tested with gcs bucket as well. 








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842136#comment-17842136
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2083482008

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 06s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 01s |  |  xmllint was not available.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  markdownlint  |   0m 01s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 19s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  91m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  40m 19s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   6m 14s |  |  trunk passed  |
   | -1 :x: |  mvnsite  |   4m 28s | 
[/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in trunk failed.  |
   | +1 :green_heart: |  javadoc  |  19m 58s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 195m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 25s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  16m 00s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  37m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  37m 43s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 00s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   5m 59s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   4m 36s | 
[/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |  19m 48s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 199m 09s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 37s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 597m 32s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6726 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle 
markdownlint |
   | uname | MINGW64_NT-10.0-17763 e57383186604 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 0339eeb5bd4f0a90e5530abb8df9530f582d99b3 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/testReport/
 |
   | modules | C: hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws 
hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842005#comment-17842005
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6738:
URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2082900564

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 05s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 00s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 00s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 00s |  |  xmllint was not available.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  markdownlint  |   0m 01s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 6 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 17s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  90m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  40m 09s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   5m 58s |  |  trunk passed  |
   | -1 :x: |  mvnsite  |   4m 31s | 
[/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in trunk failed.  |
   | +1 :green_heart: |  javadoc  |  13m 57s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 168m 48s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 17s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  10m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  38m 07s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  38m 07s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m 01s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/blanks-eol.txt)
 |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   5m 56s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   4m 27s | 
[/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |  13m 49s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 182m 29s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  asflicense  |   7m 44s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 550m 15s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6738 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle 
markdownlint |
   | uname | MINGW64_NT-10.0-17763 374a372225c9 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 744a643945e9fbf2fd1246c3e48c752789060370 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/testReport/
 |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws 
hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841319#comment-17841319
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2079798916

   iceberg poc pr https://github.com/apache/iceberg/pull/10233







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841304#comment-17841304
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1581288128


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java:
##
@@ -17,61 +17,86 @@
  */
 package org.apache.hadoop.fs;
 
+import java.io.FileNotFoundException;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.List;
 import java.util.Map;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import org.apache.hadoop.util.functional.Tuples;
 
 import static java.util.Objects.requireNonNull;
 import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths;
-import static org.apache.hadoop.util.Preconditions.checkArgument;
 
 /**
  * Default implementation of the {@link BulkDelete} interface.
  */
 public class DefaultBulkDeleteOperation implements BulkDelete {
 
-private final int pageSize;
+private static Logger LOG = 
LoggerFactory.getLogger(DefaultBulkDeleteOperation.class);
+
+/** Default page size for bulk delete. */
+private static final int DEFAULT_PAGE_SIZE = 1;
 
+/** Base path for the bulk delete operation. */
 private final Path basePath;
 
+/** Delegate filesystem which makes the actual delete calls. */
 private final FileSystem fs;
 
-public DefaultBulkDeleteOperation(int pageSize,
-  Path basePath,
+public DefaultBulkDeleteOperation(Path basePath,
   FileSystem fs) {
-checkArgument(pageSize == 1, "Page size must be equal to 1");
-this.pageSize = pageSize;
 this.basePath = requireNonNull(basePath);
 this.fs = fs;
 }
 
 @Override
 public int pageSize() {
-return pageSize;
+return DEFAULT_PAGE_SIZE;
 }
 
 @Override
 public Path basePath() {
 return basePath;
 }
 
+/**
+ * {@inheritDoc}
+ */
 @Override
    public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
            throws IOException, IllegalArgumentException {
-validateBulkDeletePaths(paths, pageSize, basePath);
+validateBulkDeletePaths(paths, DEFAULT_PAGE_SIZE, basePath);
        List<Map.Entry<Path, String>> result = new ArrayList<>();
-// this for loop doesn't make sense as pageSize must be 1.
-for (Path path : paths) {
+if (!paths.isEmpty()) {
+// As the page size is always 1, this should be the only one
+// path in the collection.
+Path pathToDelete = paths.iterator().next();
 try {
-fs.delete(path, false);
-// What to do if this return false?
-// I think we should add the path to the result list with 
value "Not Deleted".
-} catch (IOException e) {
-result.add(Tuples.pair(path, e.toString()));
+boolean deleted = fs.delete(pathToDelete, false);
+if (deleted) {
+return result;
+} else {
+try {
+FileStatus fileStatus = fs.getFileStatus(pathToDelete);
+if (fileStatus.isDirectory()) {
+result.add(Tuples.pair(pathToDelete, "Path is a 
directory"));
+}
+} catch (FileNotFoundException e) {
+// Ignore FNFE and don't add to the result list.
+LOG.debug("Couldn't delete {} - does not exist: {}", 
pathToDelete, e.toString());
+} catch (Exception e) {
+LOG.debug("Couldn't delete {} - exception occurred: 
{}", pathToDelete, e.toString());
+result.add(Tuples.pair(pathToDelete, e.toString()));
+}
+}
+} catch (Exception ex) {

Review Comment:
   make this an IOException


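A sketch of the shape being asked for (illustrative only, not the committed code): narrowing the outer catch to `IOException` records per-path I/O failures while letting RuntimeExceptions propagate to the caller:

```java
} catch (IOException ex) {
    // Record the I/O failure against the path instead of rethrowing;
    // unchecked exceptions now propagate to the caller unchanged.
    LOG.debug("Couldn't delete {}", pathToDelete, ex);
    result.add(Tuples.pair(pathToDelete, ex.toString()));
}
```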


[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841095#comment-17841095
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2078874481

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |  22m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 58s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  39m 38s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m 30s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  20m 16s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   5m 52s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   5m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 59s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   4m 23s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   8m 57s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  43m 25s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 37s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  22m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 12s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  21m 12s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/blanks-eol.txt)
 |  The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   5m 34s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 409 new + 106 unchanged - 0 fixed = 515 total 
(was 106)  |
   | +1 :green_heart: |  mvnsite  |   5m 35s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 52s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0)  |
   | -1 :x: |  javadoc  |   0m 43s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 
with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 
0 unchanged - 0 fixed = 3 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   9m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  40m 41s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   5m 45s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  | 275m  3s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841074#comment-17841074
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2078773861

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |  18m 18s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 48s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  36m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  19m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  18m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 54s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   5m  9s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 59s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   4m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   8m 49s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 33s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 57s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 46s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  18m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  17m 56s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/blanks-eol.txt)
 |  The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 49s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 409 new + 106 unchanged - 0 fixed = 515 total 
(was 106)  |
   | +1 :green_heart: |  mvnsite  |   5m  5s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 11s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0)  |
   | -1 :x: |  javadoc  |   0m 47s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 
with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 
0 unchanged - 0 fixed = 3 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   9m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  40m  5s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   5m 45s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  | 267m  9s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840989#comment-17840989
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1580173720


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefalutBulkDeleteSource.java:
##
@@ -0,0 +1,38 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs;
+
+import java.io.IOException;
+
+/**
+ * Default implementation of {@link BulkDeleteSource}.
+ */
+public class DefalutBulkDeleteSource implements BulkDeleteSource {
+
+private final FileSystem fs;

Review Comment:
   javadoc



##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDeleteUtils.java:
##
@@ -0,0 +1,54 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs;
+
+import java.util.Collection;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Utility class for bulk delete operations.
+ */
+public final class BulkDeleteUtils {
+
+private BulkDeleteUtils() {
+}
+
+public static void validateBulkDeletePaths(Collection<Path> paths, int pageSize, Path basePath) {
+requireNonNull(paths);
+checkArgument(paths.size() <= pageSize,
+"Number of paths (%d) is larger than the page size (%d)", 
paths.size(), pageSize);
+paths.forEach(p -> {
+checkArgument(p.isAbsolute(), "Path %s is not absolute", p);
+checkArgument(validatePathIsUnderParent(p, basePath),
+"Path %s is not under the base path %s", p, basePath);
+});
+}
+
+public static boolean validatePathIsUnderParent(Path p, Path basePath) {

Review Comment:
   javadoc
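
   Presumably something along these lines, an illustrative javadoc only (the exact semantics of the equality case are an assumption, not taken from the PR):

```java
/**
 * Check that a path lies under a given base path.
 * @param p path to check; expected to be absolute.
 * @param basePath base path of the bulk delete operation.
 * @return true if {@code p} equals {@code basePath} or is a descendant of it.
 */
public static boolean validatePathIsUnderParent(Path p, Path basePath) {
```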








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840274#comment-17840274
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2073964537

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 05s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 01s |  |  xmllint was not available.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  markdownlint  |   0m 01s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 6 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 31s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  91m 09s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  40m 32s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   6m 09s |  |  trunk passed  |
   | -1 :x: |  mvnsite  |   4m 42s | 
[/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in trunk failed.  |
   | +1 :green_heart: |  javadoc  |  14m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 171m 45s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 24s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  11m 00s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  38m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  38m 33s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m 00s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/blanks-eol.txt)
 |  The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   6m 31s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   4m 36s | 
[/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |  14m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 185m 02s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  asflicense  |   5m 46s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 555m 55s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6726 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle 
markdownlint |
   | uname | MINGW64_NT-10.0-17763 cfb6e8c364ad 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 741542703607b954851f005514b12af61a98afb6 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/testReport/
 |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws 
hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840157#comment-17840157
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6738:
URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2072921149

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 05s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 00s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 00s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 00s |  |  xmllint was not available.  |
   | +0 :ok: |  spotbugs  |   0m 00s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  markdownlint  |   0m 00s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 6 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   3m 11s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  90m 04s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  39m 11s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   5m 51s |  |  trunk passed  |
   | -1 :x: |  mvnsite  |   4m 20s | 
[/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in trunk failed.  |
   | +1 :green_heart: |  javadoc  |  13m 35s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 167m 43s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 18s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  10m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  37m 05s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  37m 05s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m 00s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/blanks-eol.txt)
 |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   6m 04s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   4m 25s | 
[/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |  13m 59s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 177m 47s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  asflicense  |   5m 31s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 540m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6738 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle 
markdownlint |
   | uname | MINGW64_NT-10.0-17763 b4a02a5f9adc 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 744a643945e9fbf2fd1246c3e48c752789060370 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/testReport/
 |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws 
hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839038#comment-17839038
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1572223236


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java:
##
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Default implementation of the {@link BulkDelete} interface.
+ */
+public class DefaultBulkDeleteOperation implements BulkDelete {
+
+private final int pageSize;

Review Comment:
   this is always 1, isn't it? So much can be simplified here:
   * no need for the field
   * no need to pass it in the constructor
   * pageSize() to return 1
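   A minimal sketch of that simplification, assuming the interface methods quoted elsewhere in this thread (pageSize(), a base path accessor, bulkDelete(), close()) and the "maps to delete(path, false)" behaviour discussed below; this is illustrative, not the committed code:
   
```java
// Hypothetical simplified DefaultBulkDeleteOperation: the page size is a
// constant, so there is no field for it and no constructor argument.
package org.apache.hadoop.fs;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.util.functional.Tuples;

import static java.util.Objects.requireNonNull;

public class DefaultBulkDeleteOperation implements BulkDelete {

    /** The default implementation deletes one object per page. */
    private static final int DEFAULT_PAGE_SIZE = 1;

    private final Path basePath;

    private final FileSystem fs;

    public DefaultBulkDeleteOperation(Path basePath, FileSystem fs) {
        this.basePath = requireNonNull(basePath);
        this.fs = requireNonNull(fs);
    }

    @Override
    public int pageSize() {
        return DEFAULT_PAGE_SIZE;
    }

    @Override
    public Path basePath() {
        return basePath;
    }

    /**
     * Delete the (at most one) path by mapping to delete(path, false);
     * report any failure as a (path, error) pair.
     */
    @Override
    public List<Map.Entry<Path, String>> bulkDelete(List<Path> paths)
            throws IOException {
        List<Map.Entry<Path, String>> errors = new ArrayList<>();
        for (Path path : paths) {
            try {
                fs.delete(path, false);
            } catch (IOException e) {
                errors.add(Tuples.pair(path, e.toString()));
            }
        }
        return errors;
    }

    @Override
    public void close() throws IOException {
        // nothing to release in the default implementation
    }
}
```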








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838974#comment-17838974
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1572230146


##
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##
@@ -161,124 +105,20 @@ store.hasPathCapability(path, 
"fs.capability.bulk.delete")
 
 ### Invocation through Reflection.
 
-The need for many Libraries to compile against very old versions of Hadoop
+The need for many libraries to compile against very old versions of Hadoop
means that most of the cloud-first Filesystem API calls cannot be used except
through reflection, and the more complicated the API and its data types are,
the harder that reflection is to implement.
 
-To assist this, the class `org.apache.hadoop.fs.FileUtil` has two methods
+To assist this, the class `org.apache.hadoop.io.wrappedio.WrappedIO` has a few methods
 which are intended to provide simple access to the API, especially
 through reflection.
 
 ```java
-  /**
-   * Get the maximum number of objects/files to delete in a single request.
-   * @param fs filesystem
-   * @param path path to delete under.
-   * @return a number greater than or equal to zero.
-   * @throws UnsupportedOperationException bulk delete under that path is not 
supported.
-   * @throws IllegalArgumentException path not valid.
-   * @throws IOException problems resolving paths
-   */
+
   public static int bulkDeletePageSize(FileSystem fs, Path path) throws 
IOException;
   
-  /**
-   * Delete a list of files/objects.
-   * <ul>
-   *   <li>Files must be under the path provided in {@code base}.</li>
-   *   <li>The size of the list must be equal to or less than the page size.</li>
-   *   <li>Directories are not supported; the outcome of attempting to delete
-   *   directories is undefined (ignored; undetected, listed as failures...).</li>
-   *   <li>The operation is not atomic.</li>
-   *   <li>The operation is treated as idempotent: network failures may
-   *    trigger resubmission of the request -any new objects created under a
-   *    path in the list may then be deleted.</li>
-   *    <li>There is no guarantee that any parent directories exist after this call.</li>
-   * </ul>
-   * @param fs filesystem
-   * @param base path to delete under.
-   * @param paths list of paths which must be absolute and under the base path.
-   * @return a list of all the paths which couldn't be deleted for a reason other than "not found" and any associated error message.
-   * @throws UnsupportedOperationException bulk delete under that path is not supported.
-   * @throws IOException IO problems including networking, authentication and more.
-   * @throws IllegalArgumentException if a path argument is invalid.
-   */
-  public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs, Path base, List<Path> paths)
-```
+  public static int bulkDeletePageSize(FileSystem fs, Path path) throws 
IOException;
 
-## S3A Implementation
-
-The S3A client exports this API.

Review Comment:
   this needs to be covered, along with the default implementation ("maps to delete(path, false)").
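   As a hedged sketch of the kind of example that section could carry, here is how a caller might drive the API through the WrappedIO entry points quoted above. The method names follow this thread and may not match the final committed names; the bucket and file names are purely illustrative.
   
```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.wrappedio.WrappedIO;

public class BulkDeleteExample {
  public static void main(String[] args) throws Exception {
    // Illustrative location only; any store works, with page size 1
    // for the default implementation and a larger page size on S3A.
    Path base = new Path("s3a://example-bucket/table");
    FileSystem fs = base.getFileSystem(new Configuration());

    // query the page size first so the caller can partition its path list
    int pageSize = WrappedIO.bulkDeletePageSize(fs, base);

    List<Path> page = Arrays.asList(
        new Path(base, "data/file-0001.parquet"),
        new Path(base, "data/file-0002.parquet"));

    if (page.size() <= pageSize) {
      // each entry in the result is a (path, error) pair for a failed deletion
      List<Map.Entry<Path, String>> failures = WrappedIO.bulkDelete(fs, base, page);
      failures.forEach(f -> System.err.println(f.getKey() + ": " + f.getValue()));
    }
  }
}
```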



##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java:
##
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Default implementation of the {@link BulkDelete} interface.
+ */
+public class DefaultBulkDeleteOperation implements BulkDelete {
+
+private final int pageSize;
+
+private final Path basePath;
+
+private final FileSystem fs;
+
+public DefaultBulkDeleteOperation(int pageSize,
+ 

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838835#comment-17838835
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2065627094

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 6 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 35s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  36m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  19m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  18m 20s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 32s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   5m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 46s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   1m 55s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  18m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  17m 56s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/blanks-eol.txt)
 |  The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 53s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 309 new + 41 unchanged - 0 fixed = 350 total (was 
41)  |
   | +1 :green_heart: |  mvnsite  |   3m 28s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 10s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 
unchanged - 0 fixed = 3 total (was 0)  |
   | -1 :x: |  javadoc  |   0m 48s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 
with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 
0 unchanged - 0 fixed = 3 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   5m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  41m 29s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   5m 40s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  |   3m  7s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838382#comment-17838382
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1569566843


##
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_upgrade.md:
##
@@ -324,6 +324,7 @@ They have also been updated to return V2 SDK classes.
 public interface S3AInternals {
   S3Client getAmazonS3V2Client(String reason);
 
+  S3AStore getStore();

Review Comment:
   this is a doc file. 








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838377#comment-17838377
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1569528772


##
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractBulkDeleteTest.java:
##
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.contract;
+
+import org.apache.hadoop.fs.*;
+import org.assertj.core.api.Assertions;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.touch;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+
+public abstract class AbstractContractBulkDeleteTest extends 
AbstractFSContractTestBase {
+
+private static final Logger LOG =
+LoggerFactory.getLogger(AbstractContractBulkDeleteTest.class);
+
+protected int pageSize;
+
+protected Path basePath;
+
+protected FileSystem fs;
+
+@Before
+public void setUp() throws Exception {
+fs = getFileSystem();
+basePath = path(getClass().getName());

Review Comment:
   this is under setup and the path will be created under the contract test directory, so cleanup should work.








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837871#comment-17837871
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1567902648


##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractBulkDelete.java:
##
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.fs.contract.s3a;
+
+import org.apache.hadoop.conf.Configuration;

Review Comment:
   Ah sorry... installed a new IDE on my new Mac, so the old rules are gone.








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837775#comment-17837775
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6738:
URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2059423509

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 6 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 14s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m  0s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   8m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   2m  6s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m  8s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m  2s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 49s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 42s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   8m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 17s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   8m 17s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/blanks-eol.txt)
 |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   2m  2s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 206 new + 41 unchanged - 0 fixed = 247 total (was 
41)  |
   | +1 :green_heart: |  mvnsite  |   2m  9s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 44s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 
unchanged - 0 fixed = 3 total (was 0)  |
   | -1 :x: |  javadoc  |   0m 29s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 
with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 
0 unchanged - 0 fixed = 3 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 48s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   4m 20s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  |   2m 39s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837722#comment-17837722
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2059138849

   commented. I've also done a PR #6738 which tunes the API to work with 
iceberg, having just written a PoC of the iceberg binding. 
   
   My PR
   * moved the wrapper methods to a new wrappedio.WrappedIO class
   * added a probe for the API being available
   * I also added an availability probe in the interface. I'm not sure about that, as we really should make it available everywhere, always.
   
   Can you cherry-pick this PR onto your branch and then address the review comments?
   
   After which, please do not do any rebasing of your PR. That way, it is easier for me to keep my own branch in sync with your changes. Thanks.
   
   PoC of iceberg integration, based on their S3FileIO one.
   
   
https://github.com/steveloughran/iceberg/blob/s3/HADOOP-18679-bulk-delete-api/core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java#L208
   
   The iceberg API passes in a collection of paths, *which may span multiple filesystems*.
   
   To handle this:
   * the bulk delete API should take a Collection, not a List
   * it needs to be implemented in every FS, because trying to distinguish case-by-case on support would be really complex (a reflection-based sketch of such a library binding follows below).
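   A rough sketch of what a reflection-based binding in a library such as Iceberg might look like, assuming the WrappedIO class and method signatures discussed in this thread; this is a hypothetical shim, not the PoC code linked above:
   
```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class BulkDeleteShim {

  private static final String WRAPPED_IO =
      "org.apache.hadoop.io.wrappedio.WrappedIO";

  private BulkDeleteShim() {
  }

  /** @return true if the bulk delete wrapper methods are on the classpath. */
  public static boolean isAvailable() {
    try {
      Class.forName(WRAPPED_IO)
          .getMethod("bulkDeletePageSize", FileSystem.class, Path.class);
      return true;
    } catch (ReflectiveOperationException e) {
      // older Hadoop release: fall back to single delete() calls
      return false;
    }
  }

  /** Invoke the static wrapper method via reflection. */
  @SuppressWarnings("unchecked")
  public static List<Map.Entry<Path, String>> bulkDelete(
      FileSystem fs, Path base, List<Path> paths) throws Exception {
    Method m = Class.forName(WRAPPED_IO)
        .getMethod("bulkDelete", FileSystem.class, Path.class, List.class);
    return (List<Map.Entry<Path, String>>) m.invoke(null, fs, base, paths);
  }
}
```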
   
   
   
   







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837717#comment-17837717
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1566433464


##
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##
@@ -0,0 +1,284 @@
+
+
+#  interface `BulkDelete`

Review Comment:
   needs to be referenced from index.md



##
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##
@@ -0,0 +1,284 @@
+
+
+#  interface `BulkDelete`
+



[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837716#comment-17837716
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #6738:
URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2059110117

   This is #6726 with another commit







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837710#comment-17837710
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran opened a new pull request, #6738:
URL: https://github.com/apache/hadoop/pull/6738

   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837457#comment-17837457
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2057942133

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  17m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 6 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 53s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  37m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  19m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  17m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 47s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 25s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   5m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 37s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  18m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 57s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  17m 57s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/blanks-eol.txt)
 |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 46s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 204 new + 41 unchanged - 0 fixed = 245 total (was 
41)  |
   | +1 :green_heart: |  mvnsite  |   3m 29s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 11s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 
unchanged - 0 fixed = 3 total (was 0)  |
   | -1 :x: |  javadoc  |   0m 46s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 
with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 
0 unchanged - 0 fixed = 3 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   5m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  40m  8s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   5m 44s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  |   3m 11s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836273#comment-17836273
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1561315131


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.AccessDeniedException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
+import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
+import software.amazon.awssdk.services.s3.model.S3Error;
+
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AStore;
+import org.apache.hadoop.fs.store.audit.AuditSpan;
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Collections.emptyList;
+import static java.util.Collections.singletonList;
+import static org.apache.hadoop.fs.s3a.Invoker.once;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+import static org.apache.hadoop.util.functional.Tuples.pair;
+
+/**
+ * Callbacks for the bulk delete operation.
+ */
+public class BulkDeleteOperationCallbacksImpl implements
+BulkDeleteOperation.BulkDeleteOperationCallbacks {
+
+  /**
+   * Path for logging.
+   */
+  private final String path;
+
+  /** Page size for bulk delete. */
+  private final int pageSize;
+
+  /** span for operations. */
+  private final AuditSpan span;
+
+  /**
+   * Store.
+   */
+  private final S3AStore store;
+
+
+  public BulkDeleteOperationCallbacksImpl(final S3AStore store,
+  String path, int pageSize, AuditSpan span) {
+this.span = span;
+this.pageSize = pageSize;
+this.path = path;
+this.store = store;
+  }
+
+  @Override
+  @Retries.RetryTranslated
+  public List<Map.Entry<String, String>> bulkDelete(final List<ObjectIdentifier> keysToDelete)
+  throws IOException, IllegalArgumentException {
+span.activate();
+final int size = keysToDelete.size();
+checkArgument(size <= pageSize,
+"Too many paths to delete in one operation: %s", size);
+if (size == 0) {
+  return emptyList();
+}
+
+if (size == 1) {
+  return deleteSingleObject(keysToDelete.get(0).key());
+}
+
+final DeleteObjectsResponse response = once("bulkDelete", path, () ->
+store.deleteObjects(store.getRequestFactory()
+.newBulkDeleteRequestBuilder(keysToDelete)
+.build())).getValue();
+final List<S3Error> errors = response.errors();
+if (errors.isEmpty()) {
+  // all good.
+  return emptyList();
+} else {
+  return errors.stream()
+  .map(e -> pair(e.key(), e.message()))

Review Comment:
   yes e.toString() sounds better.






[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836215#comment-17836215
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1561126867


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.AccessDeniedException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
+import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
+import software.amazon.awssdk.services.s3.model.S3Error;
+
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AStore;
+import org.apache.hadoop.fs.store.audit.AuditSpan;
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Collections.emptyList;
+import static java.util.Collections.singletonList;
+import static org.apache.hadoop.fs.s3a.Invoker.once;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+import static org.apache.hadoop.util.functional.Tuples.pair;
+
+/**
+ * Callbacks for the bulk delete operation.
+ */
+public class BulkDeleteOperationCallbacksImpl implements
+BulkDeleteOperation.BulkDeleteOperationCallbacks {
+
+  /**
+   * Path for logging.
+   */
+  private final String path;
+
+  /** Page size for bulk delete. */
+  private final int pageSize;
+
+  /** span for operations. */
+  private final AuditSpan span;
+
+  /**
+   * Store.
+   */
+  private final S3AStore store;
+
+
+  public BulkDeleteOperationCallbacksImpl(final S3AStore store,
+  String path, int pageSize, AuditSpan span) {
+this.span = span;
+this.pageSize = pageSize;
+this.path = path;
+this.store = store;
+  }
+
+  @Override
+  @Retries.RetryTranslated
+  public List<Map.Entry<String, String>> bulkDelete(final List<ObjectIdentifier> keysToDelete)
+  throws IOException, IllegalArgumentException {
+span.activate();
+final int size = keysToDelete.size();
+checkArgument(size <= pageSize,
+"Too many paths to delete in one operation: %s", size);
+if (size == 0) {
+  return emptyList();
+}
+
+if (size == 1) {
+  return deleteSingleObject(keysToDelete.get(0).key());
+}
+
+final DeleteObjectsResponse response = once("bulkDelete", path, () ->
+store.deleteObjects(store.getRequestFactory()
+.newBulkDeleteRequestBuilder(keysToDelete)
+.build())).getValue();
+final List<S3Error> errors = response.errors();
+if (errors.isEmpty()) {
+  // all good.
+  return emptyList();
+} else {
+  return errors.stream()
+  .map(e -> pair(e.key(), e.message()))

Review Comment:
   or e.toString()?






[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835889#comment-17835889
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1560055171


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.AccessDeniedException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
+import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
+import software.amazon.awssdk.services.s3.model.S3Error;
+
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AStore;
+import org.apache.hadoop.fs.store.audit.AuditSpan;
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Collections.emptyList;
+import static java.util.Collections.singletonList;
+import static org.apache.hadoop.fs.s3a.Invoker.once;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+import static org.apache.hadoop.util.functional.Tuples.pair;
+
+/**
+ * Callbacks for the bulk delete operation.
+ */
+public class BulkDeleteOperationCallbacksImpl implements
+BulkDeleteOperation.BulkDeleteOperationCallbacks {
+
+  /**
+   * Path for logging.
+   */
+  private final String path;
+
+  /** Page size for bulk delete. */
+  private final int pageSize;
+
+  /** span for operations. */
+  private final AuditSpan span;
+
+  /**
+   * Store.
+   */
+  private final S3AStore store;
+
+
+  public BulkDeleteOperationCallbacksImpl(final S3AStore store,
+  String path, int pageSize, AuditSpan span) {
+this.span = span;
+this.pageSize = pageSize;
+this.path = path;
+this.store = store;
+  }
+
+  @Override
+  @Retries.RetryTranslated
+  public List<Map.Entry<String, String>> bulkDelete(final List<ObjectIdentifier> keysToDelete)
+  throws IOException, IllegalArgumentException {
+span.activate();
+final int size = keysToDelete.size();
+checkArgument(size <= pageSize,
+"Too many paths to delete in one operation: %s", size);
+if (size == 0) {
+  return emptyList();
+}
+
+if (size == 1) {
+  return deleteSingleObject(keysToDelete.get(0).key());
+}
+
+final DeleteObjectsResponse response = once("bulkDelete", path, () ->
+store.deleteObjects(store.getRequestFactory()
+.newBulkDeleteRequestBuilder(keysToDelete)
+.build())).getValue();
+final List<S3Error> errors = response.errors();
+if (errors.isEmpty()) {
+  // all good.
+  return emptyList();
+} else {
+  return errors.stream()
+  .map(e -> pair(e.key(), e.message()))

Review Comment:
   e.code() gives AccessDenied
   and e.message() gives Access Denied. Does it make sense to add both, **e.code() + " " + e.message()**, so the maximum information is returned to the user?






[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835075#comment-17835075
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1556505328


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDelete.java:
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+
+import static java.util.Objects.requireNonNull;
+
+/**
+ * API for bulk deletion of objects/files,
+ * but not directories.
+ * After use, call {@code close()} to release any resources and
+ * to guarantee store IOStatistics are updated.
+ * <p>
+ * Callers MUST have no expectation that parent directories will exist after the
+ * operation completes; if an object store needs to explicitly look for and create
+ * directory markers, that step will be omitted.
+ * <p>
+ * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts
+ * against the write IOPS limit; large page sizes are counterproductive here, as
+ * are attempts at parallel submissions across multiple threads.
+ * @see <a href="https://issues.apache.org/jira/browse/HADOOP-16823">HADOOP-16823.
+ *  Large DeleteObject requests are their own Thundering Herd</a>
+ * <p>
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+public interface BulkDelete extends IOStatisticsSource, Closeable {
+
+  /**
+   * The maximum number of objects/files to delete in a single request.
+   * @return a number greater than or equal to zero.
+   */
+  int pageSize();

Review Comment:
   shouldn't this be greater than 0? Equal to 0 doesn't make sense; also, we have the check in the S3A impl.
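   A tiny sketch of the tightened check being suggested, using the same Preconditions helper as the S3A code quoted in this thread; hypothetical until the javadoc contract is updated:
   
```java
import static org.apache.hadoop.util.Preconditions.checkArgument;

final class PageSizeValidation {

  private PageSizeValidation() {
  }

  /** Reject non-positive page sizes if the contract becomes "greater than zero". */
  static int validatePageSize(int pageSize) {
    checkArgument(pageSize > 0, "Invalid page size: %s", pageSize);
    return pageSize;
  }
}
```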








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835069#comment-17835069
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1556489762


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.AccessDeniedException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
+import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
+import software.amazon.awssdk.services.s3.model.S3Error;
+
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AStore;
+import org.apache.hadoop.fs.store.audit.AuditSpan;
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Collections.emptyList;
+import static java.util.Collections.singletonList;
+import static org.apache.hadoop.fs.s3a.Invoker.once;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+import static org.apache.hadoop.util.functional.Tuples.pair;
+
+/**
+ * Callbacks for the bulk delete operation.
+ */
+public class BulkDeleteOperationCallbacksImpl implements
+BulkDeleteOperation.BulkDeleteOperationCallbacks {
+
+  /**
+   * Path for logging.
+   */
+  private final String path;
+
+  /** Page size for bulk delete. */
+  private final int pageSize;
+
+  /** span for operations. */
+  private final AuditSpan span;
+
+  /**
+   * Store.
+   */
+  private final S3AStore store;
+
+
+  public BulkDeleteOperationCallbacksImpl(final S3AStore store,
+  String path, int pageSize, AuditSpan span) {
+this.span = span;
+this.pageSize = pageSize;
+this.path = path;
+this.store = store;
+  }
+
+  @Override
+  @Retries.RetryTranslated
+  public List<Map.Entry<String, String>> bulkDelete(final List<ObjectIdentifier> keysToDelete)
+  throws IOException, IllegalArgumentException {
+span.activate();
+final int size = keysToDelete.size();
+checkArgument(size <= pageSize,
+"Too many paths to delete in one operation: %s", size);
+if (size == 0) {
+  return emptyList();
+}
+
+if (size == 1) {
+  return deleteSingleObject(keysToDelete.get(0).key());
+}
+
+final DeleteObjectsResponse response = once("bulkDelete", path, () ->
+store.deleteObjects(store.getRequestFactory()
+.newBulkDeleteRequestBuilder(keysToDelete)
+.build())).getValue();
+final List<S3Error> errors = response.errors();
+if (errors.isEmpty()) {
+  // all good.
+  return emptyList();
+} else {
+  return errors.stream()
+  .map(e -> pair(e.key(), e.message()))
+  .collect(Collectors.toList());
+}
+  }
+
+  /**
+   * Delete a single object.
+   * @param key key to delete
+   * @return list of keys which failed to delete: length 0 or 1.
+   * @throws IOException IO problem other than AccessDeniedException
+   */
+  @Retries.RetryTranslated
+  private List<Map.Entry<String, String>> deleteSingleObject(final String key) throws IOException {

Review Comment:
   after checking locally, this is fine. 
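
   For reference, the keysToDelete argument above is a list of AWS SDK v2 
ObjectIdentifier values; a minimal sketch of building one page of them from 
raw key strings (the helper name is illustrative, not part of the patch):

       static List<ObjectIdentifier> toIdentifiers(List<String> keys) {
         // one ObjectIdentifier per key; versionId is deliberately left unset
         return keys.stream()
             .map(k -> ObjectIdentifier.builder().key(k).build())
             .collect(Collectors.toList());
       }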






[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-04-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835070#comment-17835070
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1556490279


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDelete.java:
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+
+import static java.util.Objects.requireNonNull;
+
+/**
+ * API for bulk deletion of objects/files,
+ * but not directories.
+ * After use, call {@code close()} to release any resources and
+ * to guarantee store IOStatistics are updated.
+ * <p>
+ * Callers MUST have no expectation that parent directories will exist after the
+ * operation completes; if an object store needs to explicitly look for and create
+ * directory markers, that step will be omitted.
+ * <p>
+ * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts
+ * against the write IOPS limit; large page sizes are counterproductive here, as
+ * are attempts at parallel submissions across multiple threads.
+ * @see <a href="https://issues.apache.org/jira/browse/HADOOP-16823">HADOOP-16823.
+ *  Large DeleteObject requests are their own Thundering Herd</a>
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+public interface BulkDelete extends IOStatisticsSource, Closeable {
+
+  /**
+   * The maximum number of objects/files to delete in a single request.
+   * @return a number greater than or equal to zero.
+   */
+  int pageSize();
+
+  /**
+   * Base path of a bulk delete operation.
+   * All paths submitted in {@link #bulkDelete(List)} must be under this path.
+   */
+  Path basePath();
+
+  /**
+   * Delete a list of files/objects.
+   * <ul>
+   *   <li>Files must be under the path provided in {@link #basePath()}.</li>

Review Comment:
   writing contract tests for this locally; can't find the implementation of 
this in S3A. 
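
   A sketch of the shape such a contract test takes, assuming a 
filesystem-level factory along the lines discussed in this PR (the 
createBulkDelete() name is an assumption, and intercept() is Hadoop's 
LambdaTestUtils helper):

       // paths handed to bulkDelete() must sit under basePath()
       try (BulkDelete bd = fs.createBulkDelete(base)) {
         assertEquals(base, bd.basePath());
         // a path outside the base should be rejected
         intercept(IllegalArgumentException.class, () ->
             bd.bulkDelete(singletonList(new Path("/outside/base"))));
       }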








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831898#comment-17831898
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-2025696873

   In #6686 I'm creating a new utils class for reflection access, nothing else, 
and proposing that all tests of the API use reflection, to be really confident 
that it works and that there are no accidental changes which break reflection.
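
   A minimal sketch of that reflection style (the class and method names are 
illustrative stand-ins for whatever the #6686 utils class exposes):

       // bind to a static helper with no compile-time dependency on the new API
       Class<?> wrapped = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
       Method pageSize = wrapped.getMethod(
           "bulkDelete_pageSize", FileSystem.class, Path.class);
       int size = (int) pageSize.invoke(null, fs, base);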







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831878#comment-17831878
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-2025591287

   FYI I want to pull the rate limiter API of #6596 in here too; we'd have a 
rate limiter in the s3a store which, if enabled, would limit the number of 
deletes which can be issued against a bucket. Ideally it'd be set at 3000 for 
S3 standard and off for S3 Express and third-party stores, to reduce the load 
this call can generate.
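
   A sketch of the intended effect, using Guava's RateLimiter as a stand-in 
for the #6596 API (numbers and names are illustrative only):

       // cap DeleteObjects traffic; every key in a page consumes one permit
       RateLimiter deleteLimiter = RateLimiter.create(3000);

       void throttledBulkDelete(List<ObjectIdentifier> page) throws IOException {
         deleteLimiter.acquire(page.size());   // blocks until capacity is free
         // ... then issue the DeleteObjects request as in bulkDelete() ...
       }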







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831757#comment-17831757
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

ahmarsuhail commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1542647008


##
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##
@@ -0,0 +1,284 @@
+
+
+#  interface `BulkDelete`
+



[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829847#comment-17829847
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1535463621


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -5457,7 +5421,11 @@ public boolean hasPathCapability(final Path path, final String capability)
 case STORE_CAPABILITY_DIRECTORY_MARKER_AWARE:
   return true;
 
-  // multi object delete flag
+// this is always true, even if multi object
+// delete is disabled -the page size is simply reduced to 1.
+case CommonPathCapabilities.BULK_DELETE:

Review Comment:
   it means the API is present, along with some of its semantics ("parent dir 
existence not guaranteed"). For that reason, it will always be faster than 
before: one DELETE; no LIST/HEAD etc.
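
   A sketch of the client-side probe this enables (the capability constant is 
the one in the hunk above; everything else is illustrative):

       // prefer the bulk API when present; fall back to per-file delete()
       if (fs.hasPathCapability(base, CommonPathCapabilities.BULK_DELETE)) {
         // one DELETE request per page of files
       } else {
         for (Path p : paths) {
           fs.delete(p, false);
         }
       }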








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829846#comment-17829846
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1535462548


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.AccessDeniedException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
+import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
+import software.amazon.awssdk.services.s3.model.S3Error;
+
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AStore;
+import org.apache.hadoop.fs.store.audit.AuditSpan;
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Collections.emptyList;
+import static java.util.Collections.singletonList;
+import static org.apache.hadoop.fs.s3a.Invoker.once;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+import static org.apache.hadoop.util.functional.Tuples.pair;
+
+/**
+ * Callbacks for the bulk delete operation.
+ */
+public class BulkDeleteOperationCallbacksImpl implements
+BulkDeleteOperation.BulkDeleteOperationCallbacks {
+
+  /**
+   * Path for logging.
+   */
+  private final String path;
+
+  /** Page size for bulk delete. */
+  private final int pageSize;
+
+  /** span for operations. */
+  private final AuditSpan span;
+
+  /**
+   * Store.
+   */
+  private final S3AStore store;
+
+
+  public BulkDeleteOperationCallbacksImpl(final S3AStore store,
+  String path, int pageSize, AuditSpan span) {
+this.span = span;
+this.pageSize = pageSize;
+this.path = path;
+this.store = store;
+  }
+
+  @Override
+  @Retries.RetryTranslated
+  public List<Map.Entry<String, String>> bulkDelete(final List<ObjectIdentifier> keysToDelete)
+  throws IOException, IllegalArgumentException {
+span.activate();
+final int size = keysToDelete.size();
+checkArgument(size <= pageSize,
+"Too many paths to delete in one operation: %s", size);
+if (size == 0) {
+  return emptyList();
+}
+
+if (size == 1) {
+  return deleteSingleObject(keysToDelete.get(0).key());
+}
+
+final DeleteObjectsResponse response = once("bulkDelete", path, () ->
+store.deleteObjects(store.getRequestFactory()
+.newBulkDeleteRequestBuilder(keysToDelete)
+.build())).getValue();
+final List<S3Error> errors = response.errors();
+if (errors.isEmpty()) {
+  // all good.
+  return emptyList();
+} else {
+  return errors.stream()
+  .map(e -> pair(e.key(), e.message()))
+  .collect(Collectors.toList());
+}
+  }
+
+  /**
+   * Delete a single object.
+   * @param key key to delete
+   * @return list of keys which failed to delete: length 0 or 1.
+   * @throws IOException IO problem other than AccessDeniedException
+   */
+  @Retries.RetryTranslated
+  private List<Map.Entry<String, String>> deleteSingleObject(final String key) throws IOException {

Review Comment:
   prefer a collection?






[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829676#comment-17829676
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

mukund-thakur commented on code in PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#discussion_r1532928858


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -5457,7 +5421,11 @@ public boolean hasPathCapability(final Path path, final String capability)
 case STORE_CAPABILITY_DIRECTORY_MARKER_AWARE:
   return true;
 
-  // multi object delete flag
+// this is always true, even if multi object
+// delete is disabled -the page size is simply reduced to 1.
+case CommonPathCapabilities.BULK_DELETE:

Review Comment:
   nit: won't this be a bit misleading? 



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.nio.file.AccessDeniedException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse;
+import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
+import software.amazon.awssdk.services.s3.model.S3Error;
+
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AStore;
+import org.apache.hadoop.fs.store.audit.AuditSpan;
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Collections.emptyList;
+import static java.util.Collections.singletonList;
+import static org.apache.hadoop.fs.s3a.Invoker.once;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+import static org.apache.hadoop.util.functional.Tuples.pair;
+
+/**
+ * Callbacks for the bulk delete operation.
+ */
+public class BulkDeleteOperationCallbacksImpl implements
+BulkDeleteOperation.BulkDeleteOperationCallbacks {
+
+  /**
+   * Path for logging.
+   */
+  private final String path;
+
+  /** Page size for bulk delete. */
+  private final int pageSize;
+
+  /** span for operations. */
+  private final AuditSpan span;
+
+  /**
+   * Store.
+   */
+  private final S3AStore store;
+
+
+  public BulkDeleteOperationCallbacksImpl(final S3AStore store,
+  String path, int pageSize, AuditSpan span) {
+this.span = span;
+this.pageSize = pageSize;
+this.path = path;
+this.store = store;
+  }
+
+  @Override
+  @Retries.RetryTranslated
+  public List<Map.Entry<String, String>> bulkDelete(final List<ObjectIdentifier> keysToDelete)
+  throws IOException, IllegalArgumentException {
+span.activate();
+final int size = keysToDelete.size();
+checkArgument(size <= pageSize,
+"Too many paths to delete in one operation: %s", size);
+if (size == 0) {
+  return emptyList();
+}
+
+if (size == 1) {
+  return deleteSingleObject(keysToDelete.get(0).key());
+}
+
+final DeleteObjectsResponse response = once("bulkDelete", path, () ->
+store.deleteObjects(store.getRequestFactory()
+.newBulkDeleteRequestBuilder(keysToDelete)
+.build())).getValue();
+final List<S3Error> errors = response.errors();
+if (errors.isEmpty()) {
+  // all good.
+  return emptyList();
+} else {
+  return errors.stream()
+  .map(e -> pair(e.key(), e.message()))
+  .collect(Collectors.toList());
+}
+  }
+
+  /**
+   * Delete a single object.
+   * @param key key to delete
+   * @return list of keys which failed to delete: length 0 or 1.
+   * @throws IOException IO problem other than AccessDeniedException
+   */
+  @Retries.RetryTranslated
+  private List<Map.Entry<String, String>> deleteSingleObject(final String key) throws IOException {

Review Comment:
   do we need the return to be a List?
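
   For reference, a minimal shape of that single-object path consistent with 
its javadoc ("length 0 or 1"); the store/request-factory method names here 
are assumptions, not quoted from the patch:

       private List<Map.Entry<String, String>> deleteSingleObject(final String key)
           throws IOException {
         try {
           once("delete", path, () ->
               store.deleteObject(store.getRequestFactory()
                   .newDeleteObjectRequestBuilder(key)
                   .build()));
         } catch (AccessDeniedException e) {
           // report the failure as a (key, error) pair instead of raising it
           return singletonList(pair(key, e.toString()));
         }
         return emptyList();
       }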



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to th

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827243#comment-17827243
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1998394225

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 40s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  36m 15s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 57s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  17m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   4m 44s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   2m 33s | 
[/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html)
 |  hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  38m 11s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  18m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  17m 13s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/blanks-eol.txt)
 |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 37s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 23 new + 41 unchanged - 0 fixed = 64 total (was 
41)  |
   | +1 :green_heart: |  mvnsite  |   2m 29s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m  9s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 
unchanged - 0 fixed = 3 total (was 0)  |
   | -1 :x: |  javadoc  |   0m 45s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08 with 
JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 generated 3 new + 0 unchanged 
- 0 fixed = 3 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   4m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 25s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 13s |  |  hadoop-co

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-03-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826893#comment-17826893
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1996080816

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 27s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  36m 46s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  19m 33s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  20m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   5m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 32s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   2m 36s | 
[/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html)
 |  hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  39m  4s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 12s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  18m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  17m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 31s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 18 new + 41 unchanged - 0 fixed = 59 total (was 
41)  |
   | +1 :green_heart: |  mvnsite  |   2m 26s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m  8s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 
unchanged - 0 fixed = 3 total (was 0)  |
   | -1 :x: |  javadoc  |   0m 45s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08 with 
JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 generated 4 new + 0 unchanged 
- 0 fixed = 4 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   4m  9s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 55s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  20m  3s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  |   3m 16s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/patch-

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-02-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817781#comment-17817781
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1947315201

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  18m 18s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 27s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  35m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  17m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   4m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   2m 31s | 
[/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html)
 |  hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  38m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  18m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 14s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  17m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 55s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 39 unchanged - 0 fixed = 40 total (was 
39)  |
   | +1 :green_heart: |  mvnsite  |   2m 38s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 10s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 
unchanged - 0 fixed = 4 total (was 0)  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 48s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  40m  5s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  20m  8s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   2m 51s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 280m 10s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6494 |
   | Optional Tests

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-02-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816171#comment-17816171
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1936382465

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  8s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  36m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  20m  5s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  16m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   4m 42s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   2m 33s | 
[/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html)
 |  hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  38m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  17m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  16m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 32s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 39 unchanged - 0 fixed = 40 total (was 
39)  |
   | +1 :green_heart: |  mvnsite  |   2m 30s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m  7s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 
unchanged - 0 fixed = 4 total (was 0)  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m  7s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   3m  9s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 259m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6494 |
   | Optional Tests

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-02-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816046#comment-17816046
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1935875347

   +add a FileUtils method to assist deletion here, with 
`FileUtils.bulkDeletePageSize(path) -> int` and 
`FileUtils.bulkDelete(path, List<Path>) -> List<Map.Entry<Path, String>>`; each 
will create a bulk delete object, execute the operation/probe and then close. 
   
   why so?
   
   Makes reflection binding straightforward: no new types; just two methods.
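
   A sketch of that proposal (createBulkDelete() is an assumed factory name; 
none of this is merged code):

       public static int bulkDeletePageSize(FileSystem fs, Path path)
           throws IOException {
         // create, probe, close: no new types cross the reflection boundary
         try (BulkDelete bd = fs.createBulkDelete(path)) {
           return bd.pageSize();
         }
       }

       public static List<Map.Entry<Path, String>> bulkDelete(
           FileSystem fs, Path path, List<Path> paths) throws IOException {
         // one-shot delete of a single page of paths, then close
         try (BulkDelete bd = fs.createBulkDelete(path)) {
           return bd.bulkDelete(paths);
         }
       }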







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-01-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811339#comment-17811339
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1912353079

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 57s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 10s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  35m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 13s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  16m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   4m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 48s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 36s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  8s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  18m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  17m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 31s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3)  |
   | +1 :green_heart: |  mvnsite  |   2m 28s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m  9s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  
hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 
unchanged - 0 fixed = 4 total (was 0)  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 38s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m  5s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   3m  7s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 260m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6494 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f605ff408523 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810538#comment-17810538
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #6494:
URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1908768612

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 21s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |   7m  7s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   9m  3s | 
[/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  root in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   8m 32s | 
[/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  root in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. 
 |
   | +1 :green_heart: |  checkstyle  |   4m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 18s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 14s |  |  trunk passed  |
   | -1 :x: |  shadedclient  |  11m 29s |  |  branch has errors when building 
and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 47s |  |  the patch passed  |
   | -1 :x: |  compile  |  12m 33s | 
[/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  root in the patch failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |  12m 33s | 
[/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  root in the patch failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |  12m 23s | 
[/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  root in the patch failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | -1 :x: |  javac  |  12m 23s | 
[/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  root in the patch failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   6m 18s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3)  |
   | +1 :green_heart: |  mvnsite  |   2m 56s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 18s | 
[/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/P

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810520#comment-17810520
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #5993:
URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1908677204

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 17s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |   4m 17s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   3m 53s | 
[/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  root in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   3m 36s | 
[/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  root in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. 
 |
   | +1 :green_heart: |  checkstyle  |   1m 54s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 49s |  |  trunk passed  |
   | -1 :x: |  shadedclient  |   4m 52s |  |  branch has errors when building 
and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 20s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | -1 :x: |  compile  |   3m 48s | 
[/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  root in the patch failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   3m 48s | 
[/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  root in the patch failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   3m 36s | 
[/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  root in the patch failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | -1 :x: |  javac  |   3m 36s | 
[/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  root in the patch failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m 48s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 5 new + 3 unchanged - 0 fixed = 8 total (was 3)  |
   | +1 :green_heart: |  mvnsite  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  the patch passed with JDK 
Privat

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810499#comment-17810499
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran opened a new pull request, #6494:
URL: https://github.com/apache/hadoop/pull/6494

   
   A more minimal design that is easier to use and implement than #5993.
   
   The caller creates a BulkOperation, queries its page size, and then submits 
batches of paths to delete, each no larger than that size.
   
   The outcome of each call contains a list of failures.
   
   An S3A implementation is included to show how straightforward it is.
   
   Even with a single-entry page size, it is still more efficient to use this, 
as it doesn't try to recreate a parent dir or perform any probes to see if the 
path is a directory: it maps straight to a DELETE call.
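   
   A hedged sketch of that caller-side flow; BulkOperation, pageSize(), 
   bulkDelete() and handleFailures() are assumed names for illustration, not 
   the final API:
   
   {code}
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.hadoop.fs.Path;
   
   // Sketch only: batch paths up to the store's page size, submit each batch,
   // and treat the returned list as the paths which could not be deleted.
   void deleteAll(BulkOperation op, Iterable<Path> pathsToDelete)
       throws IOException {
     int pageSize = op.pageSize();            // store's maximum batch size
     List<Path> batch = new ArrayList<>(pageSize);
     for (Path p : pathsToDelete) {
       batch.add(p);
       if (batch.size() == pageSize) {
         List<Path> failures = op.bulkDelete(batch);  // outcome: failures only
         handleFailures(failures);                    // hypothetical handler
         batch.clear();
       }
     }
     if (!batch.isEmpty()) {
       handleFailures(op.bulkDelete(batch));          // final partial batch
     }
   }
   {code}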
   
   
   ### How was this patch tested?
   
   If the design looks good, I'll write some contract tests as well as a 
filesystem api
   specification.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2024-01-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801597#comment-17801597
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #5993:
URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1873467248

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 47s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m 19s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   7m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   2m  4s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m  9s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 50s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 20s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   7m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   7m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m 59s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 5 new + 3 unchanged - 0 fixed = 8 total (was 3)  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 45s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  16m 22s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   2m 10s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 143m 28s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5993 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e75e83010c54 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d69fac0192c14889f0b3aa62bdb76e1d196eec8c |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apac

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-10-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772661#comment-17772661
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on code in PR #5993:
URL: https://github.com/apache/hadoop/pull/5993#discussion_r1349184302


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDelete.java:
##
@@ -0,0 +1,324 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSource;
+
+import static 
org.apache.hadoop.fs.statistics.IOStatisticsLogging.ioStatisticsToPrettyString;
+
+/**
+ * Interface for bulk file delete operations.
+ * 
+ * The expectation is that the iterator-provided list of paths
+ * will be batched into pages and submitted to the remote filesystem/store
+ * for bulk deletion, possibly in parallel.
+ * 
+ * A remote iterator provides the list of paths to delete; all must be under

Review Comment:
   it's for multiple mounted filesystems (viewfs), so the call can be directed 
to the final fs.
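   
   A small sketch of why that matters; the mount mappings and the check below 
   are illustrative assumptions, not the committed API:
   
   {code}
   import org.apache.hadoop.fs.Path;
   
   // Under viewfs, different subtrees resolve to different target stores, so
   // one bulk delete can only be routed to a single mount; hence every path
   // handed in must sit under the operation's base path.
   Path base = new Path("viewfs://cluster/warehouse");      // -> e.g. s3a://bucket/warehouse
   Path candidate = new Path("viewfs://cluster/logs/app");  // -> e.g. hdfs://nn/logs/app
   
   if (!candidate.toUri().getPath().startsWith(base.toUri().getPath() + "/")) {
     throw new IllegalArgumentException(candidate + " is not under " + base);
   }
   {code}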








[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-10-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772154#comment-17772154
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #5993:
URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1748524668

   @ahmarsuhail 
   
   - caller provides a remote iterator, such as the ones we provide for 
listing, or another source/transformation (see RemoteIterators)
   - the build() call returns some result
   - the implementation kicks off a worker thread to process the iterator, 
reading in its values until there are enough to kick off a DELETE request (a 
page, or maybe a parallel set in a thread pool); a sketch of this loop follows 
below
   - after each page/set of deletes, it invokes the supplied callback with the 
results
   - then continues, unless told to stop
   - it finishes only when the iterator is exhausted or raises an exception
   - or maybe on reaching some limit on failures
   - including maybe those considered unrecoverable
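   
   A rough sketch of that loop; DeleteResult and submitDeletePage() are 
   placeholders for whatever the implementation provides:
   
   {code}
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Consumer;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.fs.RemoteIterator;
   
   // Drain the caller's iterator, fill a page, submit the bulk DELETE, then
   // report each page's results through the supplied callback.
   void processDeletes(RemoteIterator<Path> paths, int pageSize,
       Consumer<DeleteResult> callback) throws IOException {
     List<Path> page = new ArrayList<>(pageSize);
     while (paths.hasNext()) {
       page.add(paths.next());
       if (page.size() == pageSize) {
         callback.accept(submitDeletePage(page));  // hypothetical page submit
         page.clear();
       }
     }
     if (!page.isEmpty()) {
       callback.accept(submitDeletePage(page));    // final partial page
     }
   }
   {code}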
   
   







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-08-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760373#comment-17760373
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #5993:
URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1699114687

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  10m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   9m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 39s |  |  trunk passed  |
   | -1 :x: |  shadedclient  |  20m 56s |  |  branch has errors when building 
and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 48s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   9m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   9m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 49s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/2/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 45s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  21m 16s |  |  patch has errors when building 
and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  16m 47s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 145m 56s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5993 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 7926fe6849b6 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 
13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 93174773ea7456127183541cc1c65d8435de98a4 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/2/testReport/ |
   | Max. process+thread count | 1320 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/ha

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759902#comment-17759902
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran commented on PR #5993:
URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1697184517

   Writing up the spec made me decide we should have a .opt() to indicate when 
a bulk delete is a "background" operation, which may be executed at a rate 
which interferes less with live queries, e.g. smaller pages, rate-limited 
buildup of pages, a different throttle retry policy.
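   
   A sketch of how that hint might look to the caller; the factory method and 
   option name below are assumptions for illustration only:
   
   {code}
   // "Background" hint: an implementation could choose smaller pages,
   // rate-limited buildup of pages, and a gentler throttle-retry policy.
   BulkDelete deleter = fs.createBulkDelete(base)      // hypothetical factory
       .opt("fs.option.bulkdelete.background", true)   // assumed option name
       .build();
   {code}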







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-08-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759278#comment-17759278
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

hadoop-yetus commented on PR #5993:
URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1694491160

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 29s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  10m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   9m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 25s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 49s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   9m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   9m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 45s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  16m 49s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 148m 24s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5993 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f297046e6241 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 
13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f25a930b8893f2c9f358fd839e4fb943d250c726 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/1/testReport/ |
   | Max. process+thread count | 1253 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.

[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-08-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759269#comment-17759269
 ] 

Steve Loughran commented on HADOOP-18679:
-

Done a first pass at an API in a PR: no actual time allocated to implement it; 
others are very welcome to!

Minimal API of (basepath, RemoteIterator) for enumerating files; the caller 
gets to implement the iterator of their choice.

Progress report callbacks allow the operation to be aborted.

The final outcome report lists the files not deleted (would that scale? I've 
left out the list of deleted files for that reason), an exception to raise, 
some counters, and any IOStats to return.

https://github.com/steveloughran/hadoop/blob/s3/HADOOP-18679-bulk-delete-api/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDelete.java
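
A rough shape for that outcome report, with assumed field names; only the 
undeleted files are listed, for scalability:

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.statistics.IOStatistics;

// Sketch of the final outcome report described above; names are assumed.
class BulkDeleteOutcome {
  List<Path> filesNotDeleted;    // deleted files deliberately omitted, to scale
  IOException exceptionToRaise;  // null if the whole operation succeeded
  long deleteRequestCount;       // summary counters rather than full lists
  IOStatistics iostatistics;     // IOStats to return to the caller
}
{code}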




[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-08-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759268#comment-17759268
 ] 

ASF GitHub Bot commented on HADOOP-18679:
-

steveloughran opened a new pull request, #5993:
URL: https://github.com/apache/hadoop/pull/5993

   
   Initial pass at writing an API for bulk deletes,
   targeting S3 and any store with paged delete support.
   
   A minimal design: a RemoteIterator provides the list of paths to delete; a 
progress report will be provided after pages are deleted, giving an update of 
files deleted, and a way for the application code to abort an ongoing delete, 
such as after a failure.
   
   ### How was this patch tested?
   
   No tests yet; working on API first.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-04-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714240#comment-17714240
 ] 

Steve Loughran commented on HADOOP-18679:
-

No: the outcome for the caller would be lists of deleted/not-deleted paths 
from a page, so it only needs invoking once per page.
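
A sketch of that per-page result shape, with assumed names:

{code}
// Invoked once per page: the page's paths split into two lists.
class PageOutcome {
  List<Path> deleted;
  List<Path> notDeleted;
}
{code}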




[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-03-27 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705326#comment-17705326
 ] 

Steve Loughran commented on HADOOP-18679:
-

Actually, I'd have the builder take a completion callback and let it handle all 
events, e.g.

{code}
createDeleteOperation(basepath)
  .withReporter(outcome ->
      LOG.info("path {}, outcome {}", outcome.path, outcome.success));
{code}

Then allow the reporter to throw some AbortOperationException (extends 
RuntimeException) to trigger the abort.
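
A hedged sketch of that abort path; AbortOperationException and the failure 
check are assumptions, not an existing class:

{code}
// Sketch: the reporter aborts the whole operation by throwing the proposed
// unchecked AbortOperationException.
createDeleteOperation(basepath)
  .withReporter(outcome -> {
    if (!outcome.success && tooManyFailures()) {  // hypothetical caller check
      throw new AbortOperationException("aborting at " + outcome.path);
    }
  });
{code}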

 






[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-03-24 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704653#comment-17704653
 ] 

Steve Loughran commented on HADOOP-18679:
-

Need to think about throttling here: maybe have a flag to indicate whether 
this is a background cleaner job, and have different rate-limiting options for 
foreground/background deletions.




[jira] [Commented] (HADOOP-18679) Add API for bulk/paged object deletion

2023-03-24 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704648#comment-17704648
 ] 

Steve Loughran commented on HADOOP-18679:
-




h2. Possible API: delete queue

The app creates a "DeleteOperation" from a filesystem which implements the 
DeleteOperationFactory interface; something like:

{code}
DeleteOperationBuilder builder = createDeleteOperation(basepath);
builder.opt();                   // for any options to set
builder.progress(progressable);  // progress callback

// then you get a queue you can submit files to delete to
DeleteOperation deleter = builder.build();

Future<Outcome> oneOutcome = deleter.deleteFile(path);
List<Future<Outcome>> outcomes = deleter.deleteFiles(paths);
{code}

The FS would build up pages of deletions and submit them in batches; once a 
page comes back, it completes each of the outcomes.
If a store only supports single-file delete (third-party stores need this), 
you'd get a few at a time in a thread pool.
A normal store would do it in batches of 200, again across a thread pool, but 
maybe with some rate limiting: it's way too easy to overload S3 with big 
delete requests.

The caller would get to wait for the outcome of every request; a failure of a 
single delete wouldn't halt the remaining deletes, though there'd be some 
other methods on DeleteOperation:

{code}
class DeleteOperation implements Closeable, IOStatisticsSource

void flush()          // wait for everything queued to complete
boolean cancel(path)  // remove a path from the queue if not already active
abort()               // cancel all not-yet-submitted deletes
close()               // flush() then stop, unless abort() was called first
size()                // queue size
pageSize()            // size of a page before a POST; 1 -> single DELETE mode
{code}



S3A would handle retries; permission failures would be reported in the outcome.
The IOStatistics API would have stats on IO (requests made, duration), and in 
close() it'd update the thread context iostats.

Options would include:
* whether to abort on first failure

This is fairly close to what we do in directory delete, though there we also 
queue tombstone markers (which end in /) and abort the delete as soon as one 
page of deletes fails. We could make fail-fast an opt() option, perhaps.

Note, this design doesn't take a RemoteIterator<>, which is a pity: you'd want 
that for wiring up incremental listings.

For that we'd not be able to return a list of Futures<>, as the list length 
isn't known at submission time. That implies a different way of reporting the 
outcome, where the key outcome is probably "did this fail"; providing a 
predicate to call back would be one strategy.

It'd let you do good things like:

{code}
// delete every zero-byte file under the table, logging any failures
deleter.deleteFiles(
    filteringRemoteIterator(fs.listFiles(table), st -> st.getLen() == 0),
    outcome -> {
      if (outcome.failed) {
        LOG.info("failed to delete {}", outcome.getPath());
      }
    });
{code}

  

