steveloughran commented on code in PR #6726: URL: https://github.com/apache/hadoop/pull/6726#discussion_r1566433464
########## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ########## @@ -0,0 +1,284 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# <a name="BulkDelete"></a> interface `BulkDelete` Review Comment: needs to be referenced from index.md ########## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ########## @@ -0,0 +1,284 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# <a name="BulkDelete"></a> interface `BulkDelete` + +<!-- MACRO{toc|fromDepth=1|toDepth=2} --> + +The `BulkDelete` interface provides an API to perform bulk delete of files/objects +in an object store or filesystem. + +## Key Features + +* An API for submitting a list of paths to delete. +* This list must be no larger than the "page size" supported by the client; This size is also exposed as a method. +* Triggers a request to delete files at the specific paths. +* Returns a list of which paths were reported as delete failures by the store. +* Does not consider a nonexistent file to be a failure. +* Does not offer any atomicity guarantees. +* Idempotency guarantees are weak: retries may delete files newly created by other clients. +* Provides no guarantees as to the outcome if a path references a directory. +* Provides no guarantees that parent directories will exist after the call. + + +The API is designed to match the semantics of the AWS S3 [Bulk Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) REST API call, but it is not +exclusively restricted to this store. This is why the "provides no guarantees" +restrictions do not state what the outcome will be when executed on other stores. + +### Interface `org.apache.hadoop.fs.BulkDeleteSource` + +The interface `BulkDeleteSource` is offered by a FileSystem/FileContext class if +it supports the API. + +```java +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDeleteSource { + + /** + * Create a bulk delete operation. + * There is no network IO at this point, simply the creation of + * a bulk delete object. + * A path must be supplied to assist in link resolution. + * @param path path to delete under. + * @return the bulk delete. + * @throws UnsupportedOperationException bulk delete under that path is not supported. + * @throws IllegalArgumentException path not valid. 
+ * @throws IOException problems resolving paths + */ + default BulkDelete createBulkDelete(Path path) + throws UnsupportedOperationException, IllegalArgumentException, IOException; + +} + +``` + +### Interface `org.apache.hadoop.fs.BulkDelete` + +This is the bulk delete implementation returned by the `createBulkDelete()` call. + +```java +/** + * API for bulk deletion of objects/files, + * <i>but not directories</i>. + * After use, call {@code close()} to release any resources and + * to guarantee store IOStatistics are updated. + * <p> + * Callers MUST have no expectation that parent directories will exist after the + * operation completes; if an object store needs to explicitly look for and create + * directory markers, that step will be omitted. + * <p> + * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts + * against the write IOPS limit; large page sizes are counterproductive here, as + * are attempts at parallel submissions across multiple threads. + * @see <a href="https://issues.apache.org/jira/browse/HADOOP-16823">HADOOP-16823. + * Large DeleteObject requests are their own Thundering Herd</a> + * <p> + */ +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + + /** + * The maximum number of objects/files to delete in a single request. + * @return a number greater than or equal to zero. + */ + int pageSize(); + + /** + * Base path of a bulk delete operation. + * All paths submitted in {@link #bulkDelete(List)} must be under this path. + */ + Path basePath(); + + /** + * Delete a list of files/objects. + * <ul> + * <li>Files must be under the path provided in {@link #basePath()}.</li> + * <li>The size of the list must be equal to or less than the page size + * declared in {@link #pageSize()}.</li> + * <li>Directories are not supported; the outcome of attempting to delete + * directories is undefined (ignored; undetected, listed as failures...).</li> + * <li>The operation is not atomic.</li> + * <li>The operation is treated as idempotent: network failures may + * trigger resubmission of the request -any new objects created under a + * path in the list may then be deleted.</li> + * <li>There is no guarantee that any parent directories exist after this call. + * </li> + * </ul> + * @param paths list of paths which must be absolute and under the base path. + * provided in {@link #basePath()}. + * @throws IOException IO problems including networking, authentication and more. + * @throws IllegalArgumentException if a path argument is invalid. + */ + List<Map.Entry<Path, String>> bulkDelete(List<Path> paths) + throws IOException, IllegalArgumentException; + +} + +``` + +### `bulkDelete(paths)` + +#### Preconditions + +```python +if length(paths) > pageSize: throw IllegalArgumentException +``` + +#### Postconditions + +All paths which refer to files are removed from the set of files. +```python +FS'Files = FS.Files - [paths] +``` + +No other restrictions are placed upon the outcome. + + +### Availability + +The `BulkDeleteSource` interface is exported by `FileSystem` and `FileContext` storage clients +which MAY support the API; it may still be unsupported by the +specific instance. + +Use the `PathCapabilities` probe `fs.capability.bulk.delete`. + +```java +store.hasPathCapability(path, "fs.capability.bulk.delete") +``` + +### Invocation through Reflection. 
+ +The need for many Libraries to compile against very old versions of Hadoop +means that most of the cloud-first Filesystem API calls cannot be used except +through reflection -And the more complicated The API and its data types are, +The harder that reflection is to implement. + +To assist this, the class `org.apache.hadoop.fs.FileUtil` has two methods Review Comment: proposed, use the class/package of #6686 , `hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/wrappedio/WrappedOperations.java` ########## hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractBulkDeleteTest.java: ########## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.contract; + +import org.apache.hadoop.fs.*; +import org.assertj.core.api.Assertions; +import org.junit.Before; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import static org.apache.hadoop.fs.contract.ContractTestUtils.touch; +import static org.apache.hadoop.test.LambdaTestUtils.intercept; + +public abstract class AbstractContractBulkDeleteTest extends AbstractFSContractTestBase { + + private static final Logger LOG = + LoggerFactory.getLogger(AbstractContractBulkDeleteTest.class); + + protected int pageSize; + + protected Path basePath; + + protected FileSystem fs; + + @Before + public void setUp() throws Exception { + fs = getFileSystem(); + basePath = path(getClass().getName()); Review Comment: use methodPath() if it is there. I'm going to have to look at this because it's creating race condition issued with abfs ########## hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractBulkDeleteTest.java: ########## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.fs.contract; + +import org.apache.hadoop.fs.*; +import org.assertj.core.api.Assertions; +import org.junit.Before; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import static org.apache.hadoop.fs.contract.ContractTestUtils.touch; +import static org.apache.hadoop.test.LambdaTestUtils.intercept; + +public abstract class AbstractContractBulkDeleteTest extends AbstractFSContractTestBase { Review Comment: javadocs. ########## hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractBulkDeleteTest.java: ########## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.contract; Review Comment: my PR will reorder this and add a static import of bulkDelete, which is why I'm not worrying about this ########## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ########## @@ -0,0 +1,284 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# <a name="BulkDelete"></a> interface `BulkDelete` + +<!-- MACRO{toc|fromDepth=1|toDepth=2} --> + +The `BulkDelete` interface provides an API to perform bulk delete of files/objects +in an object store or filesystem. + +## Key Features + +* An API for submitting a list of paths to delete. +* This list must be no larger than the "page size" supported by the client; This size is also exposed as a method. +* Triggers a request to delete files at the specific paths. +* Returns a list of which paths were reported as delete failures by the store. +* Does not consider a nonexistent file to be a failure. +* Does not offer any atomicity guarantees. +* Idempotency guarantees are weak: retries may delete files newly created by other clients. +* Provides no guarantees as to the outcome if a path references a directory. +* Provides no guarantees that parent directories will exist after the call. + + +The API is designed to match the semantics of the AWS S3 [Bulk Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) REST API call, but it is not +exclusively restricted to this store. 
This is why the "provides no guarantees" +restrictions do not state what the outcome will be when executed on other stores. + +### Interface `org.apache.hadoop.fs.BulkDeleteSource` + +The interface `BulkDeleteSource` is offered by a FileSystem/FileContext class if +it supports the API. + +```java +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDeleteSource { + + /** + * Create a bulk delete operation. + * There is no network IO at this point, simply the creation of + * a bulk delete object. + * A path must be supplied to assist in link resolution. + * @param path path to delete under. + * @return the bulk delete. + * @throws UnsupportedOperationException bulk delete under that path is not supported. + * @throws IllegalArgumentException path not valid. + * @throws IOException problems resolving paths + */ + default BulkDelete createBulkDelete(Path path) + throws UnsupportedOperationException, IllegalArgumentException, IOException; + +} + +``` + +### Interface `org.apache.hadoop.fs.BulkDelete` + +This is the bulk delete implementation returned by the `createBulkDelete()` call. + +```java +/** + * API for bulk deletion of objects/files, + * <i>but not directories</i>. + * After use, call {@code close()} to release any resources and + * to guarantee store IOStatistics are updated. + * <p> + * Callers MUST have no expectation that parent directories will exist after the + * operation completes; if an object store needs to explicitly look for and create + * directory markers, that step will be omitted. + * <p> + * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts + * against the write IOPS limit; large page sizes are counterproductive here, as + * are attempts at parallel submissions across multiple threads. + * @see <a href="https://issues.apache.org/jira/browse/HADOOP-16823">HADOOP-16823. + * Large DeleteObject requests are their own Thundering Herd</a> + * <p> + */ +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + + /** + * The maximum number of objects/files to delete in a single request. + * @return a number greater than or equal to zero. + */ + int pageSize(); + + /** + * Base path of a bulk delete operation. + * All paths submitted in {@link #bulkDelete(List)} must be under this path. + */ + Path basePath(); + + /** + * Delete a list of files/objects. + * <ul> + * <li>Files must be under the path provided in {@link #basePath()}.</li> + * <li>The size of the list must be equal to or less than the page size + * declared in {@link #pageSize()}.</li> + * <li>Directories are not supported; the outcome of attempting to delete + * directories is undefined (ignored; undetected, listed as failures...).</li> + * <li>The operation is not atomic.</li> + * <li>The operation is treated as idempotent: network failures may + * trigger resubmission of the request -any new objects created under a + * path in the list may then be deleted.</li> + * <li>There is no guarantee that any parent directories exist after this call. + * </li> + * </ul> + * @param paths list of paths which must be absolute and under the base path. + * provided in {@link #basePath()}. + * @throws IOException IO problems including networking, authentication and more. + * @throws IllegalArgumentException if a path argument is invalid. 
+ */ + List<Map.Entry<Path, String>> bulkDelete(List<Path> paths) + throws IOException, IllegalArgumentException; + +} + +``` + +### `bulkDelete(paths)` + +#### Preconditions + +```python +if length(paths) > pageSize: throw IllegalArgumentException +``` + +#### Postconditions + +All paths which refer to files are removed from the set of files. +```python +FS'Files = FS.Files - [paths] +``` + +No other restrictions are placed upon the outcome. + + +### Availability + +The `BulkDeleteSource` interface is exported by `FileSystem` and `FileContext` storage clients +which MAY support the API; it may still be unsupported by the +specific instance. + +Use the `PathCapabilities` probe `fs.capability.bulk.delete`. + +```java +store.hasPathCapability(path, "fs.capability.bulk.delete") +``` + +### Invocation through Reflection. + +The need for many Libraries to compile against very old versions of Hadoop +means that most of the cloud-first Filesystem API calls cannot be used except +through reflection -And the more complicated The API and its data types are, +The harder that reflection is to implement. + +To assist this, the class `org.apache.hadoop.fs.FileUtil` has two methods +which are intended to provide simple access to the API, especially +through reflection. + +```java + /** Review Comment: I'd recommend cutting the javadocs as it only creates more maintenance. just list the methods ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java: ########## @@ -52,6 +52,7 @@ import java.util.concurrent.atomic.AtomicBoolean; import javax.annotation.Nullable; +import org.apache.hadoop.util.*; Review Comment: wrong place and a bulk import ########## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractBulkDelete.java: ########## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.contract.s3a; + +import org.apache.hadoop.conf.Configuration; Review Comment: nit. imports. you know you can set up the import ordering in your IDE... ########## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java: ########## @@ -23,9 +23,14 @@ import java.io.IOException; import java.net.URI; import java.nio.file.AccessDeniedException; +import java.util.ArrayList; +import java.util.Arrays; import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; import java.util.stream.IntStream; +import org.apache.hadoop.fs.*; Review Comment: imports ########## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractBulkDelete.java: ########## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.contract.s3a; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.BulkDelete; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.contract.AbstractContractBulkDeleteTest; +import org.apache.hadoop.fs.contract.AbstractFSContract; +import org.apache.hadoop.fs.s3a.Constants; +import org.apache.hadoop.fs.s3a.S3AFileSystem; +import org.apache.hadoop.fs.s3a.S3ATestUtils; +import org.assertj.core.api.Assertions; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; +import java.util.List; + +import static java.util.stream.Collectors.toList; +import static org.apache.hadoop.fs.s3a.S3ATestUtils.createFiles; +import static org.apache.hadoop.fs.statistics.StoreStatisticNames.STORE_IO_RATE_LIMITED_DURATION; +import static org.apache.hadoop.test.LambdaTestUtils.intercept; + +public class ITestS3AContractBulkDelete extends AbstractContractBulkDeleteTest { + + private static final Logger LOG = LoggerFactory.getLogger(ITestS3AContractBulkDelete.class); + + /** + * Delete Page size: {@value}. + * This is the default page size for bulk delete operation for this contract test. + * All the tests in this class should pass number of paths equal to or less than + * this page size during the bulk delete operation. 
+ */ + private static final int DELETE_PAGE_SIZE = 20; + + @Override + protected Configuration createConfiguration() { + Configuration conf = super.createConfiguration(); + S3ATestUtils.disableFilesystemCaching(conf); + S3ATestUtils.removeBaseAndBucketOverrides(conf, + Constants.BULK_DELETE_PAGE_SIZE); + conf.setInt(Constants.BULK_DELETE_PAGE_SIZE, DELETE_PAGE_SIZE); + return conf; + } + + @Override + protected AbstractFSContract createContract(Configuration conf) { + return new S3AContract(createConfiguration()); + } + + @Override + public void validatePageSize() throws Exception { + Assertions.assertThat(pageSize) + .describedAs("Page size should match the configured page size") + .isEqualTo(DELETE_PAGE_SIZE); + } + + @Test + public void testBulkDeleteZeroPageSizePrecondition() throws Exception { + Configuration conf = getContract().getConf(); + conf.setInt(Constants.BULK_DELETE_PAGE_SIZE, 0); + Path testPath = path(getMethodName()); + try (S3AFileSystem fs = S3ATestUtils.createTestFileSystem(conf)) { + intercept(IllegalArgumentException.class, + () -> fs.createBulkDelete(testPath)); + } + } + + @Test + public void testPageSizeWhenMultiObjectsDisabled() throws Exception { + Configuration conf = getContract().getConf(); + conf.setBoolean(Constants.ENABLE_MULTI_DELETE, false); + Path testPath = path(getMethodName()); + try (S3AFileSystem fs = S3ATestUtils.createTestFileSystem(conf)) { + BulkDelete bulkDelete = fs.createBulkDelete(testPath); + Assertions.assertThat(bulkDelete.pageSize()) + .describedAs("Page size should be 1 when multi-object delete is disabled") + .isEqualTo(1); + } + } + + @Test + public void testRateLimiting() throws Exception { + Configuration conf = getContract().getConf(); + conf.setInt(Constants.S3A_IO_RATE_LIMIT, 5); + Path basePath = path(getMethodName()); + try (S3AFileSystem fs = S3ATestUtils.createTestFileSystem(conf)) { + createFiles(fs, basePath, 1, 20, 0); + FileStatus[] fileStatuses = fs.listStatus(basePath); + List<Path> paths = Arrays.stream(fileStatuses) + .map(FileStatus::getPath) + .collect(toList()); + BulkDelete bulkDelete = fs.createBulkDelete(basePath); + bulkDelete.bulkDelete(paths); + String mean = STORE_IO_RATE_LIMITED_DURATION + ".mean"; + Assertions.assertThat(fs.getIOStatistics().meanStatistics().get(mean).mean()) Review Comment: there are special IOStatisticsAssertions for asserting over stats ########## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractBulkDelete.java: ########## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.fs.contract.s3a; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.BulkDelete; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.contract.AbstractContractBulkDeleteTest; +import org.apache.hadoop.fs.contract.AbstractFSContract; +import org.apache.hadoop.fs.s3a.Constants; +import org.apache.hadoop.fs.s3a.S3AFileSystem; +import org.apache.hadoop.fs.s3a.S3ATestUtils; +import org.assertj.core.api.Assertions; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; +import java.util.List; + +import static java.util.stream.Collectors.toList; +import static org.apache.hadoop.fs.s3a.S3ATestUtils.createFiles; +import static org.apache.hadoop.fs.statistics.StoreStatisticNames.STORE_IO_RATE_LIMITED_DURATION; +import static org.apache.hadoop.test.LambdaTestUtils.intercept; + +public class ITestS3AContractBulkDelete extends AbstractContractBulkDeleteTest { + + private static final Logger LOG = LoggerFactory.getLogger(ITestS3AContractBulkDelete.class); + + /** + * Delete Page size: {@value}. + * This is the default page size for bulk delete operation for this contract test. + * All the tests in this class should pass number of paths equal to or less than + * this page size during the bulk delete operation. + */ + private static final int DELETE_PAGE_SIZE = 20; Review Comment: should be parameterized on bulk delete enabled/disabled, also setting that in createConfiguration. if bulk delete is disabled, page size == 1 ########## hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md: ########## @@ -0,0 +1,284 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# <a name="BulkDelete"></a> interface `BulkDelete` + +<!-- MACRO{toc|fromDepth=1|toDepth=2} --> + +The `BulkDelete` interface provides an API to perform bulk delete of files/objects +in an object store or filesystem. + +## Key Features + +* An API for submitting a list of paths to delete. +* This list must be no larger than the "page size" supported by the client; This size is also exposed as a method. +* Triggers a request to delete files at the specific paths. +* Returns a list of which paths were reported as delete failures by the store. +* Does not consider a nonexistent file to be a failure. +* Does not offer any atomicity guarantees. +* Idempotency guarantees are weak: retries may delete files newly created by other clients. +* Provides no guarantees as to the outcome if a path references a directory. +* Provides no guarantees that parent directories will exist after the call. + + +The API is designed to match the semantics of the AWS S3 [Bulk Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) REST API call, but it is not +exclusively restricted to this store. This is why the "provides no guarantees" +restrictions do not state what the outcome will be when executed on other stores. 
+ +### Interface `org.apache.hadoop.fs.BulkDeleteSource` + +The interface `BulkDeleteSource` is offered by a FileSystem/FileContext class if +it supports the API. + +```java +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDeleteSource { + + /** + * Create a bulk delete operation. + * There is no network IO at this point, simply the creation of + * a bulk delete object. + * A path must be supplied to assist in link resolution. + * @param path path to delete under. + * @return the bulk delete. + * @throws UnsupportedOperationException bulk delete under that path is not supported. + * @throws IllegalArgumentException path not valid. + * @throws IOException problems resolving paths + */ + default BulkDelete createBulkDelete(Path path) + throws UnsupportedOperationException, IllegalArgumentException, IOException; + +} + +``` + +### Interface `org.apache.hadoop.fs.BulkDelete` + +This is the bulk delete implementation returned by the `createBulkDelete()` call. + +```java +/** + * API for bulk deletion of objects/files, + * <i>but not directories</i>. + * After use, call {@code close()} to release any resources and + * to guarantee store IOStatistics are updated. + * <p> + * Callers MUST have no expectation that parent directories will exist after the + * operation completes; if an object store needs to explicitly look for and create + * directory markers, that step will be omitted. + * <p> + * Be aware that on some stores (AWS S3) each object listed in a bulk delete counts + * against the write IOPS limit; large page sizes are counterproductive here, as + * are attempts at parallel submissions across multiple threads. + * @see <a href="https://issues.apache.org/jira/browse/HADOOP-16823">HADOOP-16823. + * Large DeleteObject requests are their own Thundering Herd</a> + * <p> + */ +@InterfaceAudience.Public +@InterfaceStability.Unstable +public interface BulkDelete extends IOStatisticsSource, Closeable { + + /** + * The maximum number of objects/files to delete in a single request. + * @return a number greater than or equal to zero. + */ + int pageSize(); + + /** + * Base path of a bulk delete operation. + * All paths submitted in {@link #bulkDelete(List)} must be under this path. + */ + Path basePath(); + + /** + * Delete a list of files/objects. + * <ul> + * <li>Files must be under the path provided in {@link #basePath()}.</li> + * <li>The size of the list must be equal to or less than the page size + * declared in {@link #pageSize()}.</li> + * <li>Directories are not supported; the outcome of attempting to delete + * directories is undefined (ignored; undetected, listed as failures...).</li> + * <li>The operation is not atomic.</li> + * <li>The operation is treated as idempotent: network failures may + * trigger resubmission of the request -any new objects created under a + * path in the list may then be deleted.</li> + * <li>There is no guarantee that any parent directories exist after this call. + * </li> + * </ul> + * @param paths list of paths which must be absolute and under the base path. + * provided in {@link #basePath()}. + * @throws IOException IO problems including networking, authentication and more. + * @throws IllegalArgumentException if a path argument is invalid. 
+ */ + List<Map.Entry<Path, String>> bulkDelete(List<Path> paths) + throws IOException, IllegalArgumentException; + +} + +``` + +### `bulkDelete(paths)` + +#### Preconditions + +```python +if length(paths) > pageSize: throw IllegalArgumentException +``` + +#### Postconditions + +All paths which refer to files are removed from the set of files. +```python +FS'Files = FS.Files - [paths] +``` + +No other restrictions are placed upon the outcome. + + +### Availability + +The `BulkDeleteSource` interface is exported by `FileSystem` and `FileContext` storage clients +which MAY support the API; it may still be unsupported by the +specific instance. + +Use the `PathCapabilities` probe `fs.capability.bulk.delete`. + +```java +store.hasPathCapability(path, "fs.capability.bulk.delete") +``` + +### Invocation through Reflection. + +The need for many Libraries to compile against very old versions of Hadoop Review Comment: nit: use lower case libraries ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java: ########## @@ -1641,4 +1641,7 @@ private Constants() { */ public static final String AWS_S3_ACCESS_GRANTS_FALLBACK_TO_IAM_ENABLED = "fs.s3a.access.grants.fallback.to.iam"; + public static final int DEFAULT_S3A_IO_RATE_LIMIT = 0; Review Comment: we need javadocs here, and should consider a default of *something* ########## hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_upgrade.md: ########## @@ -324,6 +324,7 @@ They have also been updated to return V2 SDK classes. public interface S3AInternals { S3Client getAmazonS3V2Client(String reason); + S3AStore getStore(); Review Comment: needs a javadoc. as should the one above ########## hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java: ########## @@ -702,6 +704,54 @@ public void testPartialDeleteSingleDelete() throws Throwable { executePartialDelete(createAssumedRoleConfig(), true); } + @Test + public void testBulkDelete() throws Throwable { + describe("Bulk delete with part of the child tree read only"); + executeBulkDelete(createAssumedRoleConfig()); + } + + private void executeBulkDelete(Configuration assumedRoleConfig) throws Exception { + Path destDir = methodPath(); + Path readOnlyDir = new Path(destDir, "readonlyDir"); + + // the full FS + S3AFileSystem fs = getFileSystem(); + FileUtil.bulkDelete(fs, destDir, new ArrayList<>()); + + bindRolePolicyStatements(assumedRoleConfig, STATEMENT_ALLOW_KMS_RW, + statement(true, S3_ALL_BUCKETS, S3_ALL_OPERATIONS), + new Statement(Effects.Deny) + .addActions(S3_PATH_WRITE_OPERATIONS) + .addResources(directory(readOnlyDir)) + ); + roleFS = (S3AFileSystem) destDir.getFileSystem(assumedRoleConfig); + + int range = 10; + touchFiles(fs, readOnlyDir, range); + touchFiles(roleFS, destDir, range); + FileStatus[] fileStatuses = roleFS.listStatus(readOnlyDir); + List<Path> pathsToDelete = Arrays.stream(fileStatuses) + .map(FileStatus::getPath) + .collect(Collectors.toList()); + // bulk delete in the read only FS should fail. + BulkDelete bulkDelete = roleFS.createBulkDelete(readOnlyDir); + assertAccessDeniedForEachPath(bulkDelete.bulkDelete(pathsToDelete)); + BulkDelete bulkDelete2 = roleFS.createBulkDelete(destDir); + assertAccessDeniedForEachPath(bulkDelete2.bulkDelete(pathsToDelete)); + // delete the files in the original FS should succeed. 
+ BulkDelete bulkDelete3 = fs.createBulkDelete(readOnlyDir); + assertSuccessfulBulkDelete(bulkDelete3.bulkDelete(pathsToDelete)); + BulkDelete bulkDelete4 = fs.createBulkDelete(destDir); + assertSuccessfulBulkDelete(bulkDelete4.bulkDelete(pathsToDelete)); + // we can write a test for some successful and some failure as well. Review Comment: yes you can... ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/StoreContextFactory.java: ########## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.fs.s3a.impl; + +import org.apache.hadoop.classification.InterfaceAudience; + +@InterfaceAudience.Private +public interface StoreContextFactory { Review Comment: javadocs
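A minimal sketch of what the requested javadocs on `StoreContextFactory` could look like; the `createStoreContext()` method and the wording are assumptions for illustration, not taken from the PR:

```java
package org.apache.hadoop.fs.s3a.impl;

import org.apache.hadoop.classification.InterfaceAudience;

/**
 * Factory for {@link StoreContext} instances.
 * Implemented by the S3A filesystem so that internal components can
 * obtain a store context without depending on the filesystem class itself.
 */
@InterfaceAudience.Private
public interface StoreContextFactory {

  /**
   * Build an immutable store context for use by internal operations.
   * (Method name assumed for illustration.)
   * @return a store context
   */
  StoreContext createStoreContext();
}
```

The same treatment -- a class-level javadoc plus a short javadoc per method -- would also address the earlier "javadocs" comments on the contract test class and on `S3AInternals`.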