Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran closed pull request #6494: HADOOP-18679. Add API for bulk/paged object deletion URL: https://github.com/apache/hadoop/pull/6494 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran closed pull request #5993: HADOOP-18679. Add API for bulk/paged object deletion URL: https://github.com/apache/hadoop/pull/5993 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran closed pull request #6738: HADOOP-18679. Add API for bulk/paged object deletion URL: https://github.com/apache/hadoop/pull/6738 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2108677588 mukund, if you can do those naming changes then I'm +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2102481437 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 07s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 01s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 01s | | xmllint was not available. | | +0 :ok: | spotbugs | 0m 01s | | spotbugs executables are not available. | | +0 :ok: | markdownlint | 0m 01s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 01s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 11 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 2m 42s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 107m 34s | | trunk passed | | +1 :green_heart: | compile | 48m 30s | | trunk passed | | +1 :green_heart: | checkstyle | 7m 10s | | trunk passed | | -1 :x: | mvnsite | 5m 19s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 23m 51s | | trunk passed | | +1 :green_heart: | shadedclient | 225m 30s | | branch has no errors when building and testing our client artifacts. | | -0 :warning: | patch | 228m 42s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 57s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 19m 17s | | the patch passed | | +1 :green_heart: | compile | 45m 43s | | the patch passed | | +1 :green_heart: | javac | 45m 43s | | the patch passed | | +1 :green_heart: | blanks | 0m 01s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 7m 25s | | the patch passed | | -1 :x: | mvnsite | 5m 28s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 24m 00s | | the patch passed | | +1 :green_heart: | shadedclient | 232m 12s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 7m 14s | | The patch does not generate ASF License warnings. | | | | 700m 51s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6726 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint | | uname | MINGW64_NT-10.0-17763 296d6abd6fb2 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / e37d88f764665c8530097bbed890a5935a5fd1f0 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1592834729 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/wrappedio/WrappedIO.java: ## @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.wrappedio; + +import java.io.IOException; +import java.util.Collection; +import java.util.List; +import java.util.Map; + +import org.apache.hadoop.classification.InterfaceAudience; +import org.apache.hadoop.classification.InterfaceStability; +import org.apache.hadoop.fs.BulkDelete; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; + +/** + * Reflection-friendly access to APIs which are not available in + * some of the older Hadoop versions which libraries still + * compile against. + * + * The intent is to avoid the need for complex reflection operations + * including wrapping of parameter classes, direct instatiation of + * new classes etc. + */ +@InterfaceAudience.Public +@InterfaceStability.Evolving +public final class WrappedIO { + + private WrappedIO() { + } + + /** + * Get the maximum number of objects/files to delete in a single request. + * @param fs filesystem + * @param path path to delete under. + * @return a number greater than or equal to zero. + * @throws UnsupportedOperationException bulk delete under that path is not supported. + * @throws IllegalArgumentException path not valid. + * @throws IOException problems resolving paths + */ + public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException { +try (BulkDelete bulk = fs.createBulkDelete(path)) { + return bulk.pageSize(); +} + } + + /** + * Delete a list of files/objects. + * + * Files must be under the path provided in {@code base}. + * The size of the list must be equal to or less than the page size. + * Directories are not supported; the outcome of attempting to delete + * directories is undefined (ignored; undetected, listed as failures...). + * The operation is not atomic. + * The operation is treated as idempotent: network failures may + *trigger resubmission of the request -any new objects created under a + *path in the list may then be deleted. + *There is no guarantee that any parent directories exist after this call. + * + * + * @param fs filesystem + * @param base path to delete under. + * @param paths list of paths which must be absolute and under the base path. + * @return a list of all the paths which couldn't be deleted for a reason other than "not found" and any associated error message. + * @throws UnsupportedOperationException bulk delete under that path is not supported. + * @throws IOException IO problems including networking, authentication and more. + * @throws IllegalArgumentException if a path argument is invalid. + */ + public static List> bulkDelete(FileSystem fs, Review Comment: rename bulkDelete_delete ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/wrappedio/WrappedIO.java: ## @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.wrappedio; + +import java.io.IOException; +import java.util.Collection; +import java.util.List; +import java.util.Map; + +import org.apache.hadoop.classification.Interfac
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2097047455 > can you do the same here? some style checker will complain but it will help us to separate the methods in the new class. I don't understand what to do here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1588274603 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java: ## @@ -4980,4 +4982,17 @@ public MultipartUploaderBuilder createMultipartUploader(Path basePath) methodNotSupported(); return null; } + + /** + * Create a default bulk delete operation to be used for any FileSystem. Review Comment: This doesn't hold for the subclasses. better to say ``` Create a bulk delete operation. The default implementation returns an instance of {@link DefaultBulkDeleteOperation} ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1584787287 ## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ## @@ -0,0 +1,136 @@ + + +# interface `BulkDelete` + + + +The `BulkDelete` interface provides an API to perform bulk delete of files/objects +in an object store or filesystem. + +## Key Features + +* An API for submitting a list of paths to delete. +* This list must be no larger than the "page size" supported by the client; This size is also exposed as a method. +* Triggers a request to delete files at the specific paths. +* Returns a list of which paths were reported as delete failures by the store. +* Does not consider a nonexistent file to be a failure. +* Does not offer any atomicity guarantees. +* Idempotency guarantees are weak: retries may delete files newly created by other clients. +* Provides no guarantees as to the outcome if a path references a directory. +* Provides no guarantees that parent directories will exist after the call. + + +The API is designed to match the semantics of the AWS S3 [Bulk Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) REST API call, but it is not +exclusively restricted to this store. This is why the "provides no guarantees" +restrictions do not state what the outcome will be when executed on other stores. + +### Interface `org.apache.hadoop.fs.BulkDeleteSource` + +The interface `BulkDeleteSource` is offered by a FileSystem/FileContext class if +it supports the API. + +```java +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDeleteSource { + default BulkDelete createBulkDelete(Path path) + throws UnsupportedOperationException, IllegalArgumentException, IOException; + +} + +``` + +### Interface `org.apache.hadoop.fs.BulkDelete` + +This is the bulk delete implementation returned by the `createBulkDelete()` call. + +```java +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + int pageSize(); + Path basePath(); + List> bulkDelete(List paths) + throws IOException, IllegalArgumentException; + +} + +``` + +### `bulkDelete(paths)` + + Preconditions + +```python +if length(paths) > pageSize: throw IllegalArgumentException +``` + + Postconditions + +All paths which refer to files are removed from the set of files. +```python +FS'Files = FS.Files - [paths] +``` + +No other restrictions are placed upon the outcome. + + +### Availability + +The `BulkDeleteSource` interface is exported by `FileSystem` and `FileContext` storage clients +which is available for all FS via `org.apache.hadoop.fs.DefalutBulkDeleteSource`. For the +ICEBERG integration to work seamlessly, all FS which supports delete() MUST leave the Review Comment: say "for integration in applications like Apache Iceberg", all implementations of this interface MUST NOT reject the request but instead return a BulkDelete instance of size >= 1" ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java: ## @@ -0,0 +1,109 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.fs; + +import java.io.FileNotFoundException; +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Objects.requireNonNull; +import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths; + +/** + * Default implementation of the {@link BulkDelete} interface. + */ +public class DefaultBulkDeleteOperation implements BulkDelete { + +private static Logger LOG = LoggerFactory.getLogger(DefaultBulkDeleteOperation.class); + +/** Default page size for bulk delete. */ +private static final int DEFAULT_PAGE_SIZE = 1; + +/** Base path for the bulk delete operation. */ +private final Path basePath; + +/** Delegate File system make actual delete calls. */ +private final Fi
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1583846245 ## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractBulkDelete.java: ## @@ -85,6 +88,9 @@ public ITestS3AContractBulkDelete(boolean enableMultiObjectDelete) { protected Configuration createConfiguration() { Configuration conf = super.createConfiguration(); S3ATestUtils.disableFilesystemCaching(conf); +conf = propagateBucketOptions(conf, getTestBucketName(conf)); +skipIfNotEnabled(conf, Constants.ENABLE_MULTI_DELETE, Review Comment: nice catch. tested with gcs bucket as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2083482008 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 06s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 01s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 01s | | xmllint was not available. | | +0 :ok: | spotbugs | 0m 01s | | spotbugs executables are not available. | | +0 :ok: | markdownlint | 0m 01s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 10 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 2m 19s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 91m 56s | | trunk passed | | +1 :green_heart: | compile | 40m 19s | | trunk passed | | +1 :green_heart: | checkstyle | 6m 14s | | trunk passed | | -1 :x: | mvnsite | 4m 28s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 19m 58s | | trunk passed | | +1 :green_heart: | shadedclient | 195m 53s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 25s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 16m 00s | | the patch passed | | +1 :green_heart: | compile | 37m 43s | | the patch passed | | +1 :green_heart: | javac | 37m 43s | | the patch passed | | +1 :green_heart: | blanks | 0m 00s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 5m 59s | | the patch passed | | -1 :x: | mvnsite | 4m 36s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 19m 48s | | the patch passed | | +1 :green_heart: | shadedclient | 199m 09s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 5m 37s | | The patch does not generate ASF License warnings. | | | | 597m 32s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6726 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint | | uname | MINGW64_NT-10.0-17763 e57383186604 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 0339eeb5bd4f0a90e5530abb8df9530f582d99b3 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6738: URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2082900564 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 05s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 00s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 00s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 00s | | xmllint was not available. | | +0 :ok: | spotbugs | 0m 01s | | spotbugs executables are not available. | | +0 :ok: | markdownlint | 0m 01s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 6 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 2m 17s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 90m 51s | | trunk passed | | +1 :green_heart: | compile | 40m 09s | | trunk passed | | +1 :green_heart: | checkstyle | 5m 58s | | trunk passed | | -1 :x: | mvnsite | 4m 31s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 13m 57s | | trunk passed | | +1 :green_heart: | shadedclient | 168m 48s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 17s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 10m 38s | | the patch passed | | +1 :green_heart: | compile | 38m 07s | | the patch passed | | +1 :green_heart: | javac | 38m 07s | | the patch passed | | -1 :x: | blanks | 0m 01s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/blanks-eol.txt) | The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | +1 :green_heart: | checkstyle | 5m 56s | | the patch passed | | -1 :x: | mvnsite | 4m 27s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 13m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 182m 29s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | asflicense | 7m 44s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/artifact/out/results-asflicense.txt) | The patch generated 1 ASF License warnings. | | | | 550m 15s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6738 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint | | uname | MINGW64_NT-10.0-17763 374a372225c9 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 744a643945e9fbf2fd1246c3e48c752789060370 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/2/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2079798916 iceberg poc pr https://github.com/apache/iceberg/pull/10233 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2079798082 for the iceberg support to work, all filesystems MUST implement the api or we have to modify that PoC to handle the case where they don't. I'd rather the spec says any FS which supports delete() MUST support this since it comes for free. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1581288128 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java: ## @@ -17,61 +17,86 @@ */ package org.apache.hadoop.fs; +import java.io.FileNotFoundException; import java.io.IOException; import java.util.ArrayList; import java.util.Collection; import java.util.List; import java.util.Map; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + import org.apache.hadoop.util.functional.Tuples; import static java.util.Objects.requireNonNull; import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths; -import static org.apache.hadoop.util.Preconditions.checkArgument; /** * Default implementation of the {@link BulkDelete} interface. */ public class DefaultBulkDeleteOperation implements BulkDelete { -private final int pageSize; +private static Logger LOG = LoggerFactory.getLogger(DefaultBulkDeleteOperation.class); + +/** Default page size for bulk delete. */ +private static final int DEFAULT_PAGE_SIZE = 1; +/** Base path for the bulk delete operation. */ private final Path basePath; +/** Delegate File system make actual delete calls. */ private final FileSystem fs; -public DefaultBulkDeleteOperation(int pageSize, - Path basePath, +public DefaultBulkDeleteOperation(Path basePath, FileSystem fs) { -checkArgument(pageSize == 1, "Page size must be equal to 1"); -this.pageSize = pageSize; this.basePath = requireNonNull(basePath); this.fs = fs; } @Override public int pageSize() { -return pageSize; +return DEFAULT_PAGE_SIZE; } @Override public Path basePath() { return basePath; } +/** + * {@inheritDoc} + */ @Override public List> bulkDelete(Collection paths) throws IOException, IllegalArgumentException { -validateBulkDeletePaths(paths, pageSize, basePath); +validateBulkDeletePaths(paths, DEFAULT_PAGE_SIZE, basePath); List> result = new ArrayList<>(); -// this for loop doesn't make sense as pageSize must be 1. -for (Path path : paths) { +if (!paths.isEmpty()) { +// As the page size is always 1, this should be the only one +// path in the collection. +Path pathToDelete = paths.iterator().next(); try { -fs.delete(path, false); -// What to do if this return false? -// I think we should add the path to the result list with value "Not Deleted". -} catch (IOException e) { -result.add(Tuples.pair(path, e.toString())); +boolean deleted = fs.delete(pathToDelete, false); +if (deleted) { +return result; +} else { +try { +FileStatus fileStatus = fs.getFileStatus(pathToDelete); +if (fileStatus.isDirectory()) { +result.add(Tuples.pair(pathToDelete, "Path is a directory")); +} +} catch (FileNotFoundException e) { +// Ignore FNFE and don't add to the result list. +LOG.debug("Couldn't delete {} - does not exist: {}", pathToDelete, e.toString()); +} catch (Exception e) { +LOG.debug("Couldn't delete {} - exception occurred: {}", pathToDelete, e.toString()); +result.add(Tuples.pair(pathToDelete, e.toString())); +} +} +} catch (Exception ex) { Review Comment: make this an IOException ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java: ## @@ -17,61 +17,86 @@ */ package org.apache.hadoop.fs; +import java.io.FileNotFoundException; import java.io.IOException; import java.util.ArrayList; import java.util.Collection; import java.util.List; import java.util.Map; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + import org.apache.hadoop.util.functional.Tuples; import static java.util.Objects.requireNonNull; import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths; -import static org.apache.hadoop.util.Preconditions.checkArgument; /** * Default implementation of the {@link BulkDelete} interface. */ public class DefaultBulkDeleteOperation implements BulkDelete { -private final int pageSize; +private static Logger LOG = LoggerFactory.getLogger(DefaultBulkDeleteOperation.class); + +/** Default page size for bulk delete. */ +private static final int DEFAULT_PAGE_SIZE = 1; +/** Base p
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2078874481 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 22m 31s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 10 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 58s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 39m 38s | | trunk passed | | +1 :green_heart: | compile | 23m 30s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 20m 16s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 5m 52s | | trunk passed | | +1 :green_heart: | mvnsite | 5m 23s | | trunk passed | | +1 :green_heart: | javadoc | 3m 59s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 4m 23s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 8m 57s | | trunk passed | | +1 :green_heart: | shadedclient | 43m 25s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 37s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 3m 23s | | the patch passed | | +1 :green_heart: | compile | 22m 41s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 22m 41s | | the patch passed | | +1 :green_heart: | compile | 21m 12s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 21m 12s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/blanks-eol.txt) | The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 5m 34s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/results-checkstyle-root.txt) | root: The patch generated 409 new + 106 unchanged - 0 fixed = 515 total (was 106) | | +1 :green_heart: | mvnsite | 5m 35s | | the patch passed | | -1 :x: | javadoc | 1m 52s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | -1 :x: | javadoc | 0m 43s | [/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | +1 :green_heart: | spotbugs | 9m 37s | | the patch passed | | +1 :green_heart: | shadedclient | 40m 41s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 5m 45s | | hadoop-common in the patch passed. | | -1 :x: | unit | 275m 3s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | -1 :x: | unit | 3m 23s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multib
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2078773861 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 18m 18s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 10 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 48s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 36m 32s | | trunk passed | | +1 :green_heart: | compile | 19m 45s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 18m 7s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 4m 54s | | trunk passed | | +1 :green_heart: | mvnsite | 5m 9s | | trunk passed | | +1 :green_heart: | javadoc | 3m 59s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 4m 22s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 8m 49s | | trunk passed | | +1 :green_heart: | shadedclient | 39m 33s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 57s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 3m 14s | | the patch passed | | +1 :green_heart: | compile | 18m 46s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 18m 46s | | the patch passed | | +1 :green_heart: | compile | 17m 56s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 17m 56s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/blanks-eol.txt) | The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 4m 49s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/results-checkstyle-root.txt) | root: The patch generated 409 new + 106 unchanged - 0 fixed = 515 total (was 106) | | +1 :green_heart: | mvnsite | 5m 5s | | the patch passed | | -1 :x: | javadoc | 1m 11s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | -1 :x: | javadoc | 0m 47s | [/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | +1 :green_heart: | spotbugs | 9m 24s | | the patch passed | | +1 :green_heart: | shadedclient | 40m 5s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 5m 45s | | hadoop-common in the patch passed. | | -1 :x: | unit | 267m 9s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | -1 :x: | unit | 3m 22s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multib
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1580173720 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefalutBulkDeleteSource.java: ## @@ -0,0 +1,38 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.fs; + +import java.io.IOException; + +/** + * Default implementation of {@link BulkDeleteSource}. + */ +public class DefalutBulkDeleteSource implements BulkDeleteSource { + +private final FileSystem fs; Review Comment: javadoc ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDeleteUtils.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.fs; + +import java.util.Collection; + +import static java.util.Objects.requireNonNull; +import static org.apache.hadoop.util.Preconditions.checkArgument; + +/** + * Utility class for bulk delete operations. + */ +public final class BulkDeleteUtils { + +private BulkDeleteUtils() { +} + +public static void validateBulkDeletePaths(Collection paths, int pageSize, Path basePath) { +requireNonNull(paths); +checkArgument(paths.size() <= pageSize, +"Number of paths (%d) is larger than the page size (%d)", paths.size(), pageSize); +paths.forEach(p -> { +checkArgument(p.isAbsolute(), "Path %s is not absolute", p); +checkArgument(validatePathIsUnderParent(p, basePath), +"Path %s is not under the base path %s", p, basePath); +}); +} + +public static boolean validatePathIsUnderParent(Path p, Path basePath) { Review Comment: javadoc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2073964537 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 05s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 01s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 01s | | xmllint was not available. | | +0 :ok: | spotbugs | 0m 01s | | spotbugs executables are not available. | | +0 :ok: | markdownlint | 0m 01s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 6 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 2m 31s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 91m 09s | | trunk passed | | +1 :green_heart: | compile | 40m 32s | | trunk passed | | +1 :green_heart: | checkstyle | 6m 09s | | trunk passed | | -1 :x: | mvnsite | 4m 42s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 14m 12s | | trunk passed | | +1 :green_heart: | shadedclient | 171m 45s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 24s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 11m 00s | | the patch passed | | +1 :green_heart: | compile | 38m 33s | | the patch passed | | +1 :green_heart: | javac | 38m 33s | | the patch passed | | -1 :x: | blanks | 0m 00s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/blanks-eol.txt) | The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | +1 :green_heart: | checkstyle | 6m 31s | | the patch passed | | -1 :x: | mvnsite | 4m 36s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 14m 19s | | the patch passed | | +1 :green_heart: | shadedclient | 185m 02s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | asflicense | 5m 46s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/artifact/out/results-asflicense.txt) | The patch generated 1 ASF License warnings. | | | | 555m 55s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6726 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint | | uname | MINGW64_NT-10.0-17763 cfb6e8c364ad 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 741542703607b954851f005514b12af61a98afb6 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6738: URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2072921149 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 05s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 00s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 00s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 00s | | xmllint was not available. | | +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. | | +0 :ok: | markdownlint | 0m 00s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 6 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 3m 11s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 90m 04s | | trunk passed | | +1 :green_heart: | compile | 39m 11s | | trunk passed | | +1 :green_heart: | checkstyle | 5m 51s | | trunk passed | | -1 :x: | mvnsite | 4m 20s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 13m 35s | | trunk passed | | +1 :green_heart: | shadedclient | 167m 43s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 18s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 10m 40s | | the patch passed | | +1 :green_heart: | compile | 37m 05s | | the patch passed | | +1 :green_heart: | javac | 37m 05s | | the patch passed | | -1 :x: | blanks | 0m 00s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/blanks-eol.txt) | The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | +1 :green_heart: | checkstyle | 6m 04s | | the patch passed | | -1 :x: | mvnsite | 4m 25s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 13m 59s | | the patch passed | | +1 :green_heart: | shadedclient | 177m 47s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | asflicense | 5m 31s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/artifact/out/results-asflicense.txt) | The patch generated 1 ASF License warnings. | | | | 540m 54s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6738 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint | | uname | MINGW64_NT-10.0-17763 b4a02a5f9adc 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 744a643945e9fbf2fd1246c3e48c752789060370 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6738/1/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1572223236 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java: ## @@ -0,0 +1,84 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.fs; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; + +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Objects.requireNonNull; +import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths; +import static org.apache.hadoop.util.Preconditions.checkArgument; + +/** + * Default implementation of the {@link BulkDelete} interface. + */ +public class DefaultBulkDeleteOperation implements BulkDelete { + +private final int pageSize; Review Comment: this is always 1, isn't it? so much can be simplified here * no need for the field * no need to pass it in the constructor * pageSize() to return 1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1572230146 ## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ## @@ -161,124 +105,20 @@ store.hasPathCapability(path, "fs.capability.bulk.delete") ### Invocation through Reflection. -The need for many Libraries to compile against very old versions of Hadoop +The need for many libraries to compile against very old versions of Hadoop means that most of the cloud-first Filesystem API calls cannot be used except through reflection -And the more complicated The API and its data types are, The harder that reflection is to implement. -To assist this, the class `org.apache.hadoop.fs.FileUtil` has two methods +To assist this, the class `org.apache.hadoop.io.wrappedio.WrappedIO` has few methods which are intended to provide simple access to the API, especially through reflection. ```java - /** - * Get the maximum number of objects/files to delete in a single request. - * @param fs filesystem - * @param path path to delete under. - * @return a number greater than or equal to zero. - * @throws UnsupportedOperationException bulk delete under that path is not supported. - * @throws IllegalArgumentException path not valid. - * @throws IOException problems resolving paths - */ + public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException; - /** - * Delete a list of files/objects. - * - * Files must be under the path provided in {@code base}. - * The size of the list must be equal to or less than the page size. - * Directories are not supported; the outcome of attempting to delete - * directories is undefined (ignored; undetected, listed as failures...). - * The operation is not atomic. - * The operation is treated as idempotent: network failures may - *trigger resubmission of the request -any new objects created under a - *path in the list may then be deleted. - *There is no guarantee that any parent directories exist after this call. - * - * - * @param fs filesystem - * @param base path to delete under. - * @param paths list of paths which must be absolute and under the base path. - * @return a list of all the paths which couldn't be deleted for a reason other than "not found" and any associated error message. - * @throws UnsupportedOperationException bulk delete under that path is not supported. - * @throws IOException IO problems including networking, authentication and more. - * @throws IllegalArgumentException if a path argument is invalid. - */ - public static List> bulkDelete(FileSystem fs, Path base, List paths) -``` + public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException; -## S3A Implementation - -The S3A client exports this API. Review Comment: this needs to be covered, along with the default implementation "maps to delete(path, false)" ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java: ## @@ -0,0 +1,84 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.fs; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; + +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Objects.requireNonNull; +import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths; +import static org.apache.hadoop.util.Preconditions.checkArgument; + +/** + * Default implementation of the {@link BulkDelete} interface. + */ +public class DefaultBulkDeleteOperation implements BulkDelete { + +private final int pageSize; + +private final Path basePath; + +private final FileSystem fs; + +public DefaultBulkDeleteOperation(int pageSize, + Path basePath, + FileSystem fs) { +checkArgument(pageSize == 1, "Page size must be equal to 1"); +this.pageSize = pageSize; +this.basePath = requireNonNull(basePath); +thi
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2065627094 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 6 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 35s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 36m 55s | | trunk passed | | +1 :green_heart: | compile | 19m 38s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 18m 20s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 4m 51s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 32s | | trunk passed | | +1 :green_heart: | javadoc | 2m 44s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 2m 21s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 5m 8s | | trunk passed | | +1 :green_heart: | shadedclient | 39m 46s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 1m 55s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 15s | | the patch passed | | +1 :green_heart: | compile | 18m 59s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 18m 59s | | the patch passed | | +1 :green_heart: | compile | 17m 56s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 17m 56s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/blanks-eol.txt) | The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 4m 53s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/results-checkstyle-root.txt) | root: The patch generated 309 new + 41 unchanged - 0 fixed = 350 total (was 41) | | +1 :green_heart: | mvnsite | 3m 28s | | the patch passed | | -1 :x: | javadoc | 1m 10s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | -1 :x: | javadoc | 0m 48s | [/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | +1 :green_heart: | spotbugs | 5m 40s | | the patch passed | | +1 :green_heart: | shadedclient | 41m 29s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 5m 40s | | hadoop-common in the patch passed. | | -1 :x: | unit | 3m 7s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/3/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | hadoop-aws in the patch passed. | | +1 :green_heart: | unit | 2m 48s | | hadoop-azure in the patch passed. | | -1 :x: | asflicense | 1m 4s | [/results-asflic
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1569566843 ## hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_upgrade.md: ## @@ -324,6 +324,7 @@ They have also been updated to return V2 SDK classes. public interface S3AInternals { S3Client getAmazonS3V2Client(String reason); + S3AStore getStore(); Review Comment: this is a doc file. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1569528772 ## hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractBulkDeleteTest.java: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.contract; + +import org.apache.hadoop.fs.*; +import org.assertj.core.api.Assertions; +import org.junit.Before; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import static org.apache.hadoop.fs.contract.ContractTestUtils.touch; +import static org.apache.hadoop.test.LambdaTestUtils.intercept; + +public abstract class AbstractContractBulkDeleteTest extends AbstractFSContractTestBase { + +private static final Logger LOG = +LoggerFactory.getLogger(AbstractContractBulkDeleteTest.class); + +protected int pageSize; + +protected Path basePath; + +protected FileSystem fs; + +@Before +public void setUp() throws Exception { +fs = getFileSystem(); +basePath = path(getClass().getName()); Review Comment: this is under setup and the path will be created under the contract test directory. so cleanup should work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1567902648 ## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractBulkDelete.java: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.contract.s3a; + +import org.apache.hadoop.conf.Configuration; Review Comment: Ah sorry..installed a new IDE on my new Mac thus the old rules are gone. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6738: URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2059423509 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 6 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 14s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 20m 17s | | trunk passed | | +1 :green_heart: | compile | 9m 0s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 8m 13s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 2m 6s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 8s | | trunk passed | | +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 3m 2s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 49s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 10s | | the patch passed | | +1 :green_heart: | compile | 8m 42s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 8m 42s | | the patch passed | | +1 :green_heart: | compile | 8m 17s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 8m 17s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/blanks-eol.txt) | The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 2m 2s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/results-checkstyle-root.txt) | root: The patch generated 206 new + 41 unchanged - 0 fixed = 247 total (was 41) | | +1 :green_heart: | mvnsite | 2m 9s | | the patch passed | | -1 :x: | javadoc | 0m 44s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | -1 :x: | javadoc | 0m 29s | [/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | +1 :green_heart: | spotbugs | 3m 24s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 48s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 4m 20s | | hadoop-common in the patch passed. | | -1 :x: | unit | 2m 39s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6738/1/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | hadoop-aws in the patch passed. | | +1 :green_heart: | unit | 2m 4s | | hadoop-azure in the patch passed. | | -1 :x: | asflicense | 0m 40s | [/results-asflic
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2059138849 commented. I've also done a PR #6738 which tunes the API to work with iceberg, having just written a PoC of the iceberg binding. My PR * moved the wrapper methods to a new wrappedio.WrappedIO class * add a probe for the api being available * I also added an availability probe in the interface. not sure about that as we really should make it available everywhere, always. Can you cherrypick this PR onto your branch and then do the review comments. After which, please do not do any rebasing of your PR. That way, it is easier for me too keep my own branch in sync with your changes. Thanks. PoC of iceberg integration, based on their S3FileIO one. https://github.com/steveloughran/iceberg/blob/s3/HADOOP-18679-bulk-delete-api/core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java#L208 The iceberg api passes in a collection of paths, *which may span multiple filesystems*. To handle this, * the bulk delete API should take a Collection, not a list * it needs to be implemented in every FS, because trying to distinguish case-by-case on support would be really complex. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1566433464 ## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ## @@ -0,0 +1,284 @@ + + +# interface `BulkDelete` Review Comment: needs to be referenced from index.md ## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ## @@ -0,0 +1,284 @@ + + +# interface `BulkDelete` + + + +The `BulkDelete` interface provides an API to perform bulk delete of files/objects +in an object store or filesystem. + +## Key Features + +* An API for submitting a list of paths to delete. +* This list must be no larger than the "page size" supported by the client; This size is also exposed as a method. +* Triggers a request to delete files at the specific paths. +* Returns a list of which paths were reported as delete failures by the store. +* Does not consider a nonexistent file to be a failure. +* Does not offer any atomicity guarantees. +* Idempotency guarantees are weak: retries may delete files newly created by other clients. +* Provides no guarantees as to the outcome if a path references a directory. +* Provides no guarantees that parent directories will exist after the call. + + +The API is designed to match the semantics of the AWS S3 [Bulk Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) REST API call, but it is not +exclusively restricted to this store. This is why the "provides no guarantees" +restrictions do not state what the outcome will be when executed on other stores. + +### Interface `org.apache.hadoop.fs.BulkDeleteSource` + +The interface `BulkDeleteSource` is offered by a FileSystem/FileContext class if +it supports the API. + +```java +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDeleteSource { + + /** + * Create a bulk delete operation. + * There is no network IO at this point, simply the creation of + * a bulk delete object. + * A path must be supplied to assist in link resolution. + * @param path path to delete under. + * @return the bulk delete. + * @throws UnsupportedOperationException bulk delete under that path is not supported. + * @throws IllegalArgumentException path not valid. + * @throws IOException problems resolving paths + */ + default BulkDelete createBulkDelete(Path path) + throws UnsupportedOperationException, IllegalArgumentException, IOException; + +} + +``` + +### Interface `org.apache.hadoop.fs.BulkDelete` + +This is the bulk delete implementation returned by the `createBulkDelete()` call. + +```java +/** + * API for bulk deletion of objects/files, + * but not directories. + * After use, call {@code close()} to release any resources and + * to guarantee store IOStatistics are updated. + * + * Callers MUST have no expectation that parent directories will exist after the + * operation completes; if an object store needs to explicitly look for and create + * directory markers, that step will be omitted. + * + * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts + * against the write IOPS limit; large page sizes are counterproductive here, as + * are attempts at parallel submissions across multiple threads. + * @see https://issues.apache.org/jira/browse/HADOOP-16823";>HADOOP-16823. + * Large DeleteObject requests are their own Thundering Herd + * + */ +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + + /** + * The maximum number of objects/files to delete in a single request. + * @return a number greater than or equal to zero. + */ + int pageSize(); + + /** + * Base path of a bulk delete operation. + * All paths submitted in {@link #bulkDelete(List)} must be under this path. + */ + Path basePath(); + + /** + * Delete a list of files/objects. + * + * Files must be under the path provided in {@link #basePath()}. + * The size of the list must be equal to or less than the page size + * declared in {@link #pageSize()}. + * Directories are not supported; the outcome of attempting to delete + * directories is undefined (ignored; undetected, listed as failures...). + * The operation is not atomic. + * The operation is treated as idempotent: network failures may + *trigger resubmission of the request -any new objects created under a + *path in the list may then be deleted. + *There is no guarantee that any parent directories exist after this call. + * + * + * @param paths list of paths which must be absolute and under the base path. + * provided in {@link #basePath()}. + * @throws IOException IO problems including networking, authentication and more. + * @throws IllegalArgumentException if a path argument is invalid. + */ + List> bulkDelete(List paths) + throws IOExcept
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6738: URL: https://github.com/apache/hadoop/pull/6738#issuecomment-2059110117 This is #6726 with another commit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran opened a new pull request, #6738: URL: https://github.com/apache/hadoop/pull/6738 ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6726: URL: https://github.com/apache/hadoop/pull/6726#issuecomment-2057942133 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 17m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 6 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 53s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 37m 2s | | trunk passed | | +1 :green_heart: | compile | 19m 47s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 17m 53s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 4m 47s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 31s | | trunk passed | | +1 :green_heart: | javadoc | 2m 42s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 2m 25s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 5m 7s | | trunk passed | | +1 :green_heart: | shadedclient | 40m 13s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 37s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 3s | | the patch passed | | +1 :green_heart: | compile | 18m 59s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 18m 59s | | the patch passed | | +1 :green_heart: | compile | 17m 57s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 17m 57s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/blanks-eol.txt) | The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 4m 46s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/results-checkstyle-root.txt) | root: The patch generated 204 new + 41 unchanged - 0 fixed = 245 total (was 41) | | +1 :green_heart: | mvnsite | 3m 29s | | the patch passed | | -1 :x: | javadoc | 1m 11s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | -1 :x: | javadoc | 0m 46s | [/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | +1 :green_heart: | spotbugs | 5m 37s | | the patch passed | | +1 :green_heart: | shadedclient | 40m 8s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 5m 44s | | hadoop-common in the patch passed. | | -1 :x: | unit | 3m 11s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6726/1/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | hadoop-aws in the patch passed. | | +1 :green_heart: | unit | 2m 47s | | hadoop-azure in the patch passed. | | -1 :x: | asflicense | 1m 4s | [/results-asflic
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1561315131 ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl; + +import java.io.FileNotFoundException; +import java.io.IOException; +import java.nio.file.AccessDeniedException; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse; +import software.amazon.awssdk.services.s3.model.ObjectIdentifier; +import software.amazon.awssdk.services.s3.model.S3Error; + +import org.apache.hadoop.fs.s3a.Retries; +import org.apache.hadoop.fs.s3a.S3AStore; +import org.apache.hadoop.fs.store.audit.AuditSpan; +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Collections.emptyList; +import static java.util.Collections.singletonList; +import static org.apache.hadoop.fs.s3a.Invoker.once; +import static org.apache.hadoop.util.Preconditions.checkArgument; +import static org.apache.hadoop.util.functional.Tuples.pair; + +/** + * Callbacks for the bulk delete operation. + */ +public class BulkDeleteOperationCallbacksImpl implements +BulkDeleteOperation.BulkDeleteOperationCallbacks { + + /** + * Path for logging. + */ + private final String path; + + /** Page size for bulk delete. */ + private final int pageSize; + + /** span for operations. */ + private final AuditSpan span; + + /** + * Store. + */ + private final S3AStore store; + + + public BulkDeleteOperationCallbacksImpl(final S3AStore store, + String path, int pageSize, AuditSpan span) { +this.span = span; +this.pageSize = pageSize; +this.path = path; +this.store = store; + } + + @Override + @Retries.RetryTranslated + public List> bulkDelete(final List keysToDelete) + throws IOException, IllegalArgumentException { +span.activate(); +final int size = keysToDelete.size(); +checkArgument(size <= pageSize, +"Too many paths to delete in one operation: %s", size); +if (size == 0) { + return emptyList(); +} + +if (size == 1) { + return deleteSingleObject(keysToDelete.get(0).key()); +} + +final DeleteObjectsResponse response = once("bulkDelete", path, () -> +store.deleteObjects(store.getRequestFactory() +.newBulkDeleteRequestBuilder(keysToDelete) +.build())).getValue(); +final List errors = response.errors(); +if (errors.isEmpty()) { + // all good. + return emptyList(); +} else { + return errors.stream() + .map(e -> pair(e.key(), e.message())) Review Comment: yes e.toString() sounds better. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1561126867 ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl; + +import java.io.FileNotFoundException; +import java.io.IOException; +import java.nio.file.AccessDeniedException; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse; +import software.amazon.awssdk.services.s3.model.ObjectIdentifier; +import software.amazon.awssdk.services.s3.model.S3Error; + +import org.apache.hadoop.fs.s3a.Retries; +import org.apache.hadoop.fs.s3a.S3AStore; +import org.apache.hadoop.fs.store.audit.AuditSpan; +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Collections.emptyList; +import static java.util.Collections.singletonList; +import static org.apache.hadoop.fs.s3a.Invoker.once; +import static org.apache.hadoop.util.Preconditions.checkArgument; +import static org.apache.hadoop.util.functional.Tuples.pair; + +/** + * Callbacks for the bulk delete operation. + */ +public class BulkDeleteOperationCallbacksImpl implements +BulkDeleteOperation.BulkDeleteOperationCallbacks { + + /** + * Path for logging. + */ + private final String path; + + /** Page size for bulk delete. */ + private final int pageSize; + + /** span for operations. */ + private final AuditSpan span; + + /** + * Store. + */ + private final S3AStore store; + + + public BulkDeleteOperationCallbacksImpl(final S3AStore store, + String path, int pageSize, AuditSpan span) { +this.span = span; +this.pageSize = pageSize; +this.path = path; +this.store = store; + } + + @Override + @Retries.RetryTranslated + public List> bulkDelete(final List keysToDelete) + throws IOException, IllegalArgumentException { +span.activate(); +final int size = keysToDelete.size(); +checkArgument(size <= pageSize, +"Too many paths to delete in one operation: %s", size); +if (size == 0) { + return emptyList(); +} + +if (size == 1) { + return deleteSingleObject(keysToDelete.get(0).key()); +} + +final DeleteObjectsResponse response = once("bulkDelete", path, () -> +store.deleteObjects(store.getRequestFactory() +.newBulkDeleteRequestBuilder(keysToDelete) +.build())).getValue(); +final List errors = response.errors(); +if (errors.isEmpty()) { + // all good. + return emptyList(); +} else { + return errors.stream() + .map(e -> pair(e.key(), e.message())) Review Comment: or e.toString()? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1560055171 ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl; + +import java.io.FileNotFoundException; +import java.io.IOException; +import java.nio.file.AccessDeniedException; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse; +import software.amazon.awssdk.services.s3.model.ObjectIdentifier; +import software.amazon.awssdk.services.s3.model.S3Error; + +import org.apache.hadoop.fs.s3a.Retries; +import org.apache.hadoop.fs.s3a.S3AStore; +import org.apache.hadoop.fs.store.audit.AuditSpan; +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Collections.emptyList; +import static java.util.Collections.singletonList; +import static org.apache.hadoop.fs.s3a.Invoker.once; +import static org.apache.hadoop.util.Preconditions.checkArgument; +import static org.apache.hadoop.util.functional.Tuples.pair; + +/** + * Callbacks for the bulk delete operation. + */ +public class BulkDeleteOperationCallbacksImpl implements +BulkDeleteOperation.BulkDeleteOperationCallbacks { + + /** + * Path for logging. + */ + private final String path; + + /** Page size for bulk delete. */ + private final int pageSize; + + /** span for operations. */ + private final AuditSpan span; + + /** + * Store. + */ + private final S3AStore store; + + + public BulkDeleteOperationCallbacksImpl(final S3AStore store, + String path, int pageSize, AuditSpan span) { +this.span = span; +this.pageSize = pageSize; +this.path = path; +this.store = store; + } + + @Override + @Retries.RetryTranslated + public List> bulkDelete(final List keysToDelete) + throws IOException, IllegalArgumentException { +span.activate(); +final int size = keysToDelete.size(); +checkArgument(size <= pageSize, +"Too many paths to delete in one operation: %s", size); +if (size == 0) { + return emptyList(); +} + +if (size == 1) { + return deleteSingleObject(keysToDelete.get(0).key()); +} + +final DeleteObjectsResponse response = once("bulkDelete", path, () -> +store.deleteObjects(store.getRequestFactory() +.newBulkDeleteRequestBuilder(keysToDelete) +.build())).getValue(); +final List errors = response.errors(); +if (errors.isEmpty()) { + // all good. + return emptyList(); +} else { + return errors.stream() + .map(e -> pair(e.key(), e.message())) Review Comment: e.code() gives AccessDenied and e.message() gives Access Denied. Does it make sense to add both **e.code() + " " + e.message()** to have the max info returned to the user? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1556505328 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDelete.java: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs; + +import java.io.Closeable; +import java.io.IOException; +import java.util.List; +import java.util.Map; + +import org.apache.hadoop.classification.InterfaceAudience; +import org.apache.hadoop.classification.InterfaceStability; +import org.apache.hadoop.fs.statistics.IOStatisticsSource; + +import static java.util.Objects.requireNonNull; + +/** + * API for bulk deletion of objects/files, + * but not directories. + * After use, call {@code close()} to release any resources and + * to guarantee store IOStatistics are updated. + * + * Callers MUST have no expectation that parent directories will exist after the + * operation completes; if an object store needs to explicitly look for and create + * directory markers, that step will be omitted. + * + * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts + * against the write IOPS limit; large page sizes are counterproductive here, as + * are attempts at parallel submissions across multiple threads. + * @see https://issues.apache.org/jira/browse/HADOOP-16823";>HADOOP-16823. + * Large DeleteObject requests are their own Thundering Herd + * + */ +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + + /** + * The maximum number of objects/files to delete in a single request. + * @return a number greater than or equal to zero. + */ + int pageSize(); Review Comment: shouldn't this be greater than 0? equal to 0 doesn't make sense. also we have the check in S3A impl. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1556490279 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDelete.java: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs; + +import java.io.Closeable; +import java.io.IOException; +import java.util.List; +import java.util.Map; + +import org.apache.hadoop.classification.InterfaceAudience; +import org.apache.hadoop.classification.InterfaceStability; +import org.apache.hadoop.fs.statistics.IOStatisticsSource; + +import static java.util.Objects.requireNonNull; + +/** + * API for bulk deletion of objects/files, + * but not directories. + * After use, call {@code close()} to release any resources and + * to guarantee store IOStatistics are updated. + * + * Callers MUST have no expectation that parent directories will exist after the + * operation completes; if an object store needs to explicitly look for and create + * directory markers, that step will be omitted. + * + * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts + * against the write IOPS limit; large page sizes are counterproductive here, as + * are attempts at parallel submissions across multiple threads. + * @see https://issues.apache.org/jira/browse/HADOOP-16823";>HADOOP-16823. + * Large DeleteObject requests are their own Thundering Herd + * + */ +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + + /** + * The maximum number of objects/files to delete in a single request. + * @return a number greater than or equal to zero. + */ + int pageSize(); + + /** + * Base path of a bulk delete operation. + * All paths submitted in {@link #bulkDelete(List)} must be under this path. + */ + Path basePath(); + + /** + * Delete a list of files/objects. + * + * Files must be under the path provided in {@link #basePath()}. Review Comment: writing contract tests for this locally., can't find the implementation of this in S3A. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1556489762 ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl; + +import java.io.FileNotFoundException; +import java.io.IOException; +import java.nio.file.AccessDeniedException; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse; +import software.amazon.awssdk.services.s3.model.ObjectIdentifier; +import software.amazon.awssdk.services.s3.model.S3Error; + +import org.apache.hadoop.fs.s3a.Retries; +import org.apache.hadoop.fs.s3a.S3AStore; +import org.apache.hadoop.fs.store.audit.AuditSpan; +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Collections.emptyList; +import static java.util.Collections.singletonList; +import static org.apache.hadoop.fs.s3a.Invoker.once; +import static org.apache.hadoop.util.Preconditions.checkArgument; +import static org.apache.hadoop.util.functional.Tuples.pair; + +/** + * Callbacks for the bulk delete operation. + */ +public class BulkDeleteOperationCallbacksImpl implements +BulkDeleteOperation.BulkDeleteOperationCallbacks { + + /** + * Path for logging. + */ + private final String path; + + /** Page size for bulk delete. */ + private final int pageSize; + + /** span for operations. */ + private final AuditSpan span; + + /** + * Store. + */ + private final S3AStore store; + + + public BulkDeleteOperationCallbacksImpl(final S3AStore store, + String path, int pageSize, AuditSpan span) { +this.span = span; +this.pageSize = pageSize; +this.path = path; +this.store = store; + } + + @Override + @Retries.RetryTranslated + public List> bulkDelete(final List keysToDelete) + throws IOException, IllegalArgumentException { +span.activate(); +final int size = keysToDelete.size(); +checkArgument(size <= pageSize, +"Too many paths to delete in one operation: %s", size); +if (size == 0) { + return emptyList(); +} + +if (size == 1) { + return deleteSingleObject(keysToDelete.get(0).key()); +} + +final DeleteObjectsResponse response = once("bulkDelete", path, () -> +store.deleteObjects(store.getRequestFactory() +.newBulkDeleteRequestBuilder(keysToDelete) +.build())).getValue(); +final List errors = response.errors(); +if (errors.isEmpty()) { + // all good. + return emptyList(); +} else { + return errors.stream() + .map(e -> pair(e.key(), e.message())) + .collect(Collectors.toList()); +} + } + + /** + * Delete a single object. + * @param key key to delete + * @return list of keys which failed to delete: length 0 or 1. + * @throws IOException IO problem other than AccessDeniedException + */ + @Retries.RetryTranslated + private List> deleteSingleObject(final String key) throws IOException { Review Comment: after checking locally, this is fine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-2025696873 In #6686 I'm creating a new utils class for reflection access, nothing else. And proposing that all tests of the API use reflection to be really confident it works and that there's no accidental changes which break reflection -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-2025591287 FYI i want to pull the rate limiter API of #6596 in here too; we'd have a rate limiter in s3a store which if enabled would limit #of deletes which can be issued on a bucket. Ideally it'd be at 3000 on s3 standard, off for s3 express and third party stores, so reduce load this call can generate. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
ahmarsuhail commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1542647008 ## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ## @@ -0,0 +1,284 @@ + + +# interface `BulkDelete` + + + +The `BulkDelete` interface provides an API to perform bulk delete of files/objects +in an object store or filesystem. + +## Key Features + +* An API for submitting a list of paths to delete. +* This list must be no larger than the "page size" supported by the client; This size is also exposed as a method. +* Triggers a request to delete files at the specific paths. +* Returns a list of which paths were reported as delete failures by the store. +* Does not consider a nonexistent file to be a failure. +* Does not offer any atomicity guarantees. +* Idempotency guarantees are weak: retries may delete files newly created by other clients. +* Provides no guarantees as to the outcome if a path references a directory. +* Provides no guarantees that parent directories will exist after the call. + + +The API is designed to match the semantics of the AWS S3 [Bulk Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) REST API call, but it is not +exclusively restricted to this store. This is why the "provides no guarantees" +restrictions do not state what the outcome will be when executed on other stores. + +### Interface `org.apache.hadoop.fs.BulkDeleteSource` + +The interface `BulkDeleteSource` is offered by a FileSystem/FileContext class if +it supports the API. + +```java +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDeleteSource { + + /** + * Create a bulk delete operation. + * There is no network IO at this point, simply the creation of + * a bulk delete object. + * A path must be supplied to assist in link resolution. + * @param path path to delete under. + * @return the bulk delete. + * @throws UnsupportedOperationException bulk delete under that path is not supported. + * @throws IllegalArgumentException path not valid. + * @throws IOException problems resolving paths + */ + default BulkDelete createBulkDelete(Path path) + throws UnsupportedOperationException, IllegalArgumentException, IOException; + +} + +``` + +### Interface `org.apache.hadoop.fs.BulkDelete` + +This is the bulk delete implementation returned by the `createBulkDelete()` call. + +```java +/** + * API for bulk deletion of objects/files, + * but not directories. + * After use, call {@code close()} to release any resources and + * to guarantee store IOStatistics are updated. + * + * Callers MUST have no expectation that parent directories will exist after the + * operation completes; if an object store needs to explicitly look for and create + * directory markers, that step will be omitted. + * + * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts + * against the write IOPS limit; large page sizes are counterproductive here, as + * are attempts at parallel submissions across multiple threads. + * @see https://issues.apache.org/jira/browse/HADOOP-16823";>HADOOP-16823. + * Large DeleteObject requests are their own Thundering Herd + * + */ +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + + /** + * The maximum number of objects/files to delete in a single request. + * @return a number greater than or equal to zero. + */ + int pageSize(); + + /** + * Base path of a bulk delete operation. + * All paths submitted in {@link #bulkDelete(List)} must be under this path. + */ + Path basePath(); + + /** + * Delete a list of files/objects. + * + * Files must be under the path provided in {@link #basePath()}. + * The size of the list must be equal to or less than the page size + * declared in {@link #pageSize()}. + * Directories are not supported; the outcome of attempting to delete + * directories is undefined (ignored; undetected, listed as failures...). + * The operation is not atomic. + * The operation is treated as idempotent: network failures may + *trigger resubmission of the request -any new objects created under a + *path in the list may then be deleted. + *There is no guarantee that any parent directories exist after this call. + * + * + * @param paths list of paths which must be absolute and under the base path. + * provided in {@link #basePath()}. + * @throws IOException IO problems including networking, authentication and more. + * @throws IllegalArgumentException if a path argument is invalid. + */ + List> bulkDelete(List paths) + throws IOException, IllegalArgumentException; + +} + +``` + +### `bulkDelete(paths)` + + Preconditions + +```python +if length(paths) > pageSize: throw IllegalArgumentException +``` + + Postconditions + +All paths which
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1535463621 ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java: ## @@ -5457,7 +5421,11 @@ public boolean hasPathCapability(final Path path, final String capability) case STORE_CAPABILITY_DIRECTORY_MARKER_AWARE: return true; - // multi object delete flag +// this is always true, even if multi object +// delete is disabled -the page size is simply reduced to 1. +case CommonPathCapabilities.BULK_DELETE: Review Comment: it means the API is present and some of the semantics "parent dir existence not guaranteed". For that reason, it will always be faster than before: one DELETE; no LIST/HEAD etc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1535462548 ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl; + +import java.io.FileNotFoundException; +import java.io.IOException; +import java.nio.file.AccessDeniedException; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse; +import software.amazon.awssdk.services.s3.model.ObjectIdentifier; +import software.amazon.awssdk.services.s3.model.S3Error; + +import org.apache.hadoop.fs.s3a.Retries; +import org.apache.hadoop.fs.s3a.S3AStore; +import org.apache.hadoop.fs.store.audit.AuditSpan; +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Collections.emptyList; +import static java.util.Collections.singletonList; +import static org.apache.hadoop.fs.s3a.Invoker.once; +import static org.apache.hadoop.util.Preconditions.checkArgument; +import static org.apache.hadoop.util.functional.Tuples.pair; + +/** + * Callbacks for the bulk delete operation. + */ +public class BulkDeleteOperationCallbacksImpl implements +BulkDeleteOperation.BulkDeleteOperationCallbacks { + + /** + * Path for logging. + */ + private final String path; + + /** Page size for bulk delete. */ + private final int pageSize; + + /** span for operations. */ + private final AuditSpan span; + + /** + * Store. + */ + private final S3AStore store; + + + public BulkDeleteOperationCallbacksImpl(final S3AStore store, + String path, int pageSize, AuditSpan span) { +this.span = span; +this.pageSize = pageSize; +this.path = path; +this.store = store; + } + + @Override + @Retries.RetryTranslated + public List> bulkDelete(final List keysToDelete) + throws IOException, IllegalArgumentException { +span.activate(); +final int size = keysToDelete.size(); +checkArgument(size <= pageSize, +"Too many paths to delete in one operation: %s", size); +if (size == 0) { + return emptyList(); +} + +if (size == 1) { + return deleteSingleObject(keysToDelete.get(0).key()); +} + +final DeleteObjectsResponse response = once("bulkDelete", path, () -> +store.deleteObjects(store.getRequestFactory() +.newBulkDeleteRequestBuilder(keysToDelete) +.build())).getValue(); +final List errors = response.errors(); +if (errors.isEmpty()) { + // all good. + return emptyList(); +} else { + return errors.stream() + .map(e -> pair(e.key(), e.message())) + .collect(Collectors.toList()); +} + } + + /** + * Delete a single object. + * @param key key to delete + * @return list of keys which failed to delete: length 0 or 1. + * @throws IOException IO problem other than AccessDeniedException + */ + @Retries.RetryTranslated + private List> deleteSingleObject(final String key) throws IOException { Review Comment: prefer a collection? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
mukund-thakur commented on code in PR #6494: URL: https://github.com/apache/hadoop/pull/6494#discussion_r1532928858 ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java: ## @@ -5457,7 +5421,11 @@ public boolean hasPathCapability(final Path path, final String capability) case STORE_CAPABILITY_DIRECTORY_MARKER_AWARE: return true; - // multi object delete flag +// this is always true, even if multi object +// delete is disabled -the page size is simply reduced to 1. +case CommonPathCapabilities.BULK_DELETE: Review Comment: nit: won't this be a bit misleading? ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl; + +import java.io.FileNotFoundException; +import java.io.IOException; +import java.nio.file.AccessDeniedException; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +import software.amazon.awssdk.services.s3.model.DeleteObjectsResponse; +import software.amazon.awssdk.services.s3.model.ObjectIdentifier; +import software.amazon.awssdk.services.s3.model.S3Error; + +import org.apache.hadoop.fs.s3a.Retries; +import org.apache.hadoop.fs.s3a.S3AStore; +import org.apache.hadoop.fs.store.audit.AuditSpan; +import org.apache.hadoop.util.functional.Tuples; + +import static java.util.Collections.emptyList; +import static java.util.Collections.singletonList; +import static org.apache.hadoop.fs.s3a.Invoker.once; +import static org.apache.hadoop.util.Preconditions.checkArgument; +import static org.apache.hadoop.util.functional.Tuples.pair; + +/** + * Callbacks for the bulk delete operation. + */ +public class BulkDeleteOperationCallbacksImpl implements +BulkDeleteOperation.BulkDeleteOperationCallbacks { + + /** + * Path for logging. + */ + private final String path; + + /** Page size for bulk delete. */ + private final int pageSize; + + /** span for operations. */ + private final AuditSpan span; + + /** + * Store. + */ + private final S3AStore store; + + + public BulkDeleteOperationCallbacksImpl(final S3AStore store, + String path, int pageSize, AuditSpan span) { +this.span = span; +this.pageSize = pageSize; +this.path = path; +this.store = store; + } + + @Override + @Retries.RetryTranslated + public List> bulkDelete(final List keysToDelete) + throws IOException, IllegalArgumentException { +span.activate(); +final int size = keysToDelete.size(); +checkArgument(size <= pageSize, +"Too many paths to delete in one operation: %s", size); +if (size == 0) { + return emptyList(); +} + +if (size == 1) { + return deleteSingleObject(keysToDelete.get(0).key()); +} + +final DeleteObjectsResponse response = once("bulkDelete", path, () -> +store.deleteObjects(store.getRequestFactory() +.newBulkDeleteRequestBuilder(keysToDelete) +.build())).getValue(); +final List errors = response.errors(); +if (errors.isEmpty()) { + // all good. + return emptyList(); +} else { + return errors.stream() + .map(e -> pair(e.key(), e.message())) + .collect(Collectors.toList()); +} + } + + /** + * Delete a single object. + * @param key key to delete + * @return list of keys which failed to delete: length 0 or 1. + * @throws IOException IO problem other than AccessDeniedException + */ + @Retries.RetryTranslated + private List> deleteSingleObject(final String key) throws IOException { Review Comment: do we need the return to be a List? ## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BulkDeleteOperationCallbacksImpl.java: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache Li
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1998394225 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 48s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 40s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 36m 15s | | trunk passed | | +1 :green_heart: | compile | 18m 57s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 17m 15s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 4m 44s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 49s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | -1 :x: | spotbugs | 2m 33s | [/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html) | hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings. | | +1 :green_heart: | shadedclient | 38m 11s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 25s | | the patch passed | | +1 :green_heart: | compile | 18m 26s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 18m 26s | | the patch passed | | +1 :green_heart: | compile | 17m 13s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 17m 13s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/blanks-eol.txt) | The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 4m 37s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/results-checkstyle-root.txt) | root: The patch generated 23 new + 41 unchanged - 0 fixed = 64 total (was 41) | | +1 :green_heart: | mvnsite | 2m 29s | | the patch passed | | -1 :x: | javadoc | 1m 9s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | -1 :x: | javadoc | 0m 45s | [/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08 with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | +1 :green_heart: | spotbugs | 4m 5s | | the patch passed | | +1 :green_heart: | shadedclient | 39m 25s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 13s | | hadoop-common in the patch passed. | | -1 :x: | unit | 3m 6s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/6/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | hadoop-aws in the patch passed.
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1996080816 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 48s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 27s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 36m 46s | | trunk passed | | +1 :green_heart: | compile | 19m 33s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 20m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 5m 31s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 32s | | trunk passed | | +1 :green_heart: | javadoc | 2m 10s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 32s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | -1 :x: | spotbugs | 2m 36s | [/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html) | hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings. | | +1 :green_heart: | shadedclient | 39m 4s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 30s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 27s | | the patch passed | | +1 :green_heart: | compile | 18m 12s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 18m 12s | | the patch passed | | +1 :green_heart: | compile | 17m 29s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 17m 29s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 4m 31s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/results-checkstyle-root.txt) | root: The patch generated 18 new + 41 unchanged - 0 fixed = 59 total (was 41) | | +1 :green_heart: | mvnsite | 2m 26s | | the patch passed | | -1 :x: | javadoc | 1m 8s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | -1 :x: | javadoc | 0m 45s | [/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08 with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) | | +1 :green_heart: | spotbugs | 4m 9s | | the patch passed | | +1 :green_heart: | shadedclient | 38m 55s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 20m 3s | | hadoop-common in the patch passed. | | -1 :x: | unit | 3m 16s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 1m 1s | | The patch does not generate ASF License warnings. | | | | 270m 10s | | | | Reason | Tests | |---:|:--|
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1947315201 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 18m 18s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 27s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 35m 25s | | trunk passed | | +1 :green_heart: | compile | 18m 44s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 17m 14s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 4m 38s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 27s | | trunk passed | | +1 :green_heart: | javadoc | 1m 47s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 32s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | -1 :x: | spotbugs | 2m 31s | [/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html) | hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings. | | +1 :green_heart: | shadedclient | 38m 32s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 30s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 25s | | the patch passed | | +1 :green_heart: | compile | 18m 16s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 18m 16s | | the patch passed | | +1 :green_heart: | compile | 17m 14s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 17m 14s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 4m 55s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 39 unchanged - 0 fixed = 40 total (was 39) | | +1 :green_heart: | mvnsite | 2m 38s | | the patch passed | | -1 :x: | javadoc | 1m 10s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) | | +1 :green_heart: | javadoc | 1m 44s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 4m 48s | | the patch passed | | +1 :green_heart: | shadedclient | 40m 5s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 20m 8s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 2m 51s | | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. | | | | 280m 10s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6494 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 28b9081dc0d4 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Buil
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1936382465 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 51s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 8s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 36m 26s | | trunk passed | | +1 :green_heart: | compile | 20m 5s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 16m 37s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 4m 42s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 31s | | trunk passed | | +1 :green_heart: | javadoc | 1m 47s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | -1 :x: | spotbugs | 2m 33s | [/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html) | hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings. | | +1 :green_heart: | shadedclient | 38m 23s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 26s | | the patch passed | | +1 :green_heart: | compile | 17m 29s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 17m 29s | | the patch passed | | +1 :green_heart: | compile | 16m 29s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 16m 29s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 4m 32s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 39 unchanged - 0 fixed = 40 total (was 39) | | +1 :green_heart: | mvnsite | 2m 30s | | the patch passed | | -1 :x: | javadoc | 1m 7s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) | | +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 4m 4s | | the patch passed | | +1 :green_heart: | shadedclient | 38m 19s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 7s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 3m 9s | | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. | | | | 259m 23s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6494 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux d1aa5776a4a0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Buil
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1935875347 +add a FileUtils method to assist deletion here, with `FileUtils.bulkDeletePageSize(path) -> int` and `FileUtils.bulkDelete(path, List) -> List; each will create a bulk delete object, execute the operation/probe and then close. why so? Makes reflection binding straighforward: no new types; just two methods. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1912353079 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 57s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 10s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 35m 50s | | trunk passed | | +1 :green_heart: | compile | 18m 13s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 16m 31s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 4m 37s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 47s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 43s | | trunk passed | | +1 :green_heart: | shadedclient | 39m 48s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 36s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 8s | | the patch passed | | +1 :green_heart: | compile | 18m 41s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 18m 41s | | the patch passed | | +1 :green_heart: | compile | 17m 6s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 17m 6s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 4m 31s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) | | +1 :green_heart: | mvnsite | 2m 28s | | the patch passed | | -1 :x: | javadoc | 1m 9s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) | | +1 :green_heart: | javadoc | 1m 33s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 4m 5s | | the patch passed | | +1 :green_heart: | shadedclient | 38m 38s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 5s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 3m 7s | | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. | | | | 260m 38s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6494 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux f605ff408523 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 5afb6598adc1d81d8dcbbbaaecd2fa28d558e36f | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #6494: URL: https://github.com/apache/hadoop/pull/6494#issuecomment-1908768612 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 49s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 21s | | Maven dependency ordering for branch | | -1 :x: | mvninstall | 7m 7s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. | | -1 :x: | compile | 9m 3s | [/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | root in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | compile | 8m 32s | [/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | root in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | +1 :green_heart: | checkstyle | 4m 38s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 18s | | trunk passed | | +1 :green_heart: | javadoc | 1m 23s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 4m 14s | | trunk passed | | -1 :x: | shadedclient | 11m 29s | | branch has errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 47s | | the patch passed | | -1 :x: | compile | 12m 33s | [/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | javac | 12m 33s | [/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | compile | 12m 23s | [/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | -1 :x: | javac | 12m 23s | [/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 6m 18s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) | | +1 :green_heart: | mvnsite | 2m 56s | | the patch passed | | -1 :x: | javadoc | 1m 18s | [/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0u
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #5993: URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1908677204 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 17s | | Maven dependency ordering for branch | | -1 :x: | mvninstall | 4m 17s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. | | -1 :x: | compile | 3m 53s | [/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | root in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | compile | 3m 36s | [/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | root in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | +1 :green_heart: | checkstyle | 1m 54s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 12s | | trunk passed | | +1 :green_heart: | javadoc | 0m 50s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 49s | | trunk passed | | -1 :x: | shadedclient | 4m 52s | | branch has errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 20s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 0m 45s | | the patch passed | | -1 :x: | compile | 3m 48s | [/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | javac | 3m 48s | [/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. | | -1 :x: | compile | 3m 36s | [/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | -1 :x: | javac | 3m 36s | [/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 48s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/results-checkstyle-root.txt) | root: The patch generated 5 new + 3 unchanged - 0 fixed = 8 total (was 3) | | +1 :green_heart: | mvnsite | 1m 6s | | the patch passed | | +1 :green_heart: | javadoc | 0m 39s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 41s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 1s | | the patch passed | | -1 :x: | shadedclient | 4m 57s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | +
[PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran opened a new pull request, #6494: URL: https://github.com/apache/hadoop/pull/6494 A more minimal design that is easier to use and implement than #5993 Caller creates a BulkOperation; they get the page size of it and then submit batches to delete of less than that size. The outcome of each call contains a list of failures. S3A implementation to show how straightforward it is. Even with the single entry page size, it is still more efficient to use this as it doesn't try to recreate a parent dir or perform any probes to see if it is a directory: it maps straight to a DELETE call. ### How was this patch tested? If the design looks good, I'll write some contract tests as well as a filesystem api specification. ### For code changes: - [X] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
hadoop-yetus commented on PR #5993: URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1873467248 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 21s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 13m 47s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 19m 28s | | trunk passed | | +1 :green_heart: | compile | 8m 19s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 7m 27s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 2m 4s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 24s | | trunk passed | | +1 :green_heart: | javadoc | 1m 4s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 0s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 9s | | trunk passed | | +1 :green_heart: | shadedclient | 19m 50s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 20s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 0m 50s | | the patch passed | | +1 :green_heart: | compile | 7m 52s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 7m 52s | | the patch passed | | +1 :green_heart: | compile | 7m 27s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 7m 27s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 59s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/artifact/out/results-checkstyle-root.txt) | root: The patch generated 5 new + 3 unchanged - 0 fixed = 8 total (was 3) | | +1 :green_heart: | mvnsite | 1m 18s | | the patch passed | | +1 :green_heart: | javadoc | 1m 2s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 20s | | the patch passed | | +1 :green_heart: | shadedclient | 19m 45s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 16m 22s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 2m 10s | | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. | | | | 143m 28s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5993 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux e75e83010c54 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d69fac0192c14889f0b3aa62bdb76e1d196eec8c | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/testReport/ | | Max. process+thread count | 2432 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . | | Console output | https://ci-hadoop.apache.org/job/
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on code in PR #5993: URL: https://github.com/apache/hadoop/pull/5993#discussion_r1349184302 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/BulkDelete.java: ## @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs; + +import java.io.IOException; +import java.util.List; +import java.util.concurrent.CompletableFuture; + +import org.apache.hadoop.classification.InterfaceAudience; +import org.apache.hadoop.classification.InterfaceStability; +import org.apache.hadoop.fs.statistics.IOStatistics; +import org.apache.hadoop.fs.statistics.IOStatisticsSource; + +import static org.apache.hadoop.fs.statistics.IOStatisticsLogging.ioStatisticsToPrettyString; + +/** + * Interface for bulk file delete operations. + * + * The expectation is that the iterator-provided list of paths + * will be batched into pages and submitted to the remote filesystem/store + * for bulk deletion, possibly in parallel. + * + * A remote iterator provides the list of paths to delete; all must be under Review Comment: its for multiple mounted filesystems (viewfs) to direct to the final fs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion [hadoop]
steveloughran commented on PR #5993: URL: https://github.com/apache/hadoop/pull/5993#issuecomment-1748524668 @ahmarsuhail - caller provides a remote iterator, such as the ones we do for listing or another source/transformation (see RemoteIterators) - build() call returns some result - implementation kicks off a worker thread to process the iterator, reading its values in until there's enough to kick off a DELETE request (page or maybe a parallel set in a thread pool) - after each page/set of deletes, invokes the supplied callback of results - then continues, unless told to stop - finish only on: iterator has nothing, iterator raises an exception - or maybe on reaching some limit on failures - including maybe those considered unrecoverable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org