[ https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838974#comment-17838974 ]
ASF GitHub Bot commented on HADOOP-18679:
-----------------------------------------

steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1572230146


##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##########
@@ -161,124 +105,20 @@ store.hasPathCapability(path, "fs.capability.bulk.delete")

 ### Invocation through Reflection.

-The need for many Libraries to compile against very old versions of Hadoop
+The need for many libraries to compile against very old versions of Hadoop
 means that most of the cloud-first Filesystem API calls cannot be used except
 through reflection
-And the more complicated The API and its data types are, The harder that reflection
-is to implement.
-To assist this, the class `org.apache.hadoop.fs.FileUtil` has two methods
+To assist this, the class `org.apache.hadoop.io.wrappedio.WrappedIO` has a few methods
 which are intended to provide simple access to the API, especially
 through reflection.

 ```java
-  /**
-   * Get the maximum number of objects/files to delete in a single request.
-   * @param fs filesystem
-   * @param path path to delete under.
-   * @return a number greater than or equal to zero.
-   * @throws UnsupportedOperationException bulk delete under that path is not supported.
-   * @throws IllegalArgumentException path not valid.
-   * @throws IOException problems resolving paths
-   */
+  public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException;

-  /**
-   * Delete a list of files/objects.
-   * <ul>
-   *   <li>Files must be under the path provided in {@code base}.</li>
-   *   <li>The size of the list must be equal to or less than the page size.</li>
-   *   <li>Directories are not supported; the outcome of attempting to delete
-   *       directories is undefined (ignored; undetected, listed as failures...).</li>
-   *   <li>The operation is not atomic.</li>
-   *   <li>The operation is treated as idempotent: network failures may
-   *       trigger resubmission of the request -any new objects created under a
-   *       path in the list may then be deleted.</li>
-   *   <li>There is no guarantee that any parent directories exist after this call.</li>
-   * </ul>
-   * @param fs filesystem
-   * @param base path to delete under.
-   * @param paths list of paths which must be absolute and under the base path.
-   * @return a list of all the paths which couldn't be deleted for a reason other
-   *         than "not found" and any associated error message.
-   * @throws UnsupportedOperationException bulk delete under that path is not supported.
-   * @throws IOException IO problems including networking, authentication and more.
-   * @throws IllegalArgumentException if a path argument is invalid.
-   */
-  public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs, Path base, List<Path> paths)
+  public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs, Path base, List<Path> paths) throws IOException;
 ```

-## S3A Implementation
-
-The S3A client exports this API.

Review Comment:
   this needs to be covered, along with the default implementation "maps to delete(path, false)"
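For readers pinned to older Hadoop releases, a minimal reflective caller might look like the sketch below. It assumes only the two `WrappedIO` signatures quoted above (including the `List`-typed parameter from that quotation); the class name `ReflectiveBulkDelete` and the error handling are illustrative, not part of the PR.

```java
// Sketch: invoking WrappedIO.bulkDelete reflectively from a library compiled
// against an older Hadoop. Only the two signatures quoted above are assumed.
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ReflectiveBulkDelete {

  @SuppressWarnings("unchecked")
  public static List<Map.Entry<Path, String>> bulkDelete(
      FileSystem fs, Path base, List<Path> paths) throws Exception {
    // Resolve the wrapper class by name; this fails on Hadoop releases which
    // predate the API, letting callers fall back to plain fs.delete().
    Class<?> wrappedIO = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
    Method m = wrappedIO.getMethod("bulkDelete",
        FileSystem.class, Path.class, List.class);
    // Static method, so the receiver argument is null.
    return (List<Map.Entry<Path, String>>) m.invoke(null, fs, base, paths);
  }
}
```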
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java:
##########
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Default implementation of the {@link BulkDelete} interface.
+ */
+public class DefaultBulkDeleteOperation implements BulkDelete {
+
+  private final int pageSize;
+
+  private final Path basePath;
+
+  private final FileSystem fs;
+
+  public DefaultBulkDeleteOperation(int pageSize,
+      Path basePath,
+      FileSystem fs) {
+    checkArgument(pageSize == 1, "Page size must be equal to 1");
+    this.pageSize = pageSize;
+    this.basePath = requireNonNull(basePath);
+    this.fs = fs;
+  }
+
+  @Override
+  public int pageSize() {
+    return pageSize;
+  }
+
+  @Override
+  public Path basePath() {
+    return basePath;
+  }
+
+  @Override
+  public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
+      throws IOException, IllegalArgumentException {
+    validateBulkDeletePaths(paths, pageSize, basePath);
+    List<Map.Entry<Path, String>> result = new ArrayList<>();
+    // this for loop doesn't make sense as pageSize must be 1.
+    for (Path path : paths) {

Review Comment:
   there's only ever going to be 1 entry here
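Since the constructor pins `pageSize` to 1, the validated page holds at most one entry, so the loop can collapse entirely. A hedged sketch of the method rewritten under that constraint follows; it reuses only the names from the class above, and the "Not Deleted" message echoes the PR's own inline comment rather than any committed code.

```java
// Illustrative rewrite under the pageSize == 1 constraint: no loop needed.
@Override
public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
    throws IOException, IllegalArgumentException {
  validateBulkDeletePaths(paths, pageSize, basePath);
  List<Map.Entry<Path, String>> result = new ArrayList<>();
  if (!paths.isEmpty()) {
    // With pageSize fixed at 1, this is the only possible entry.
    Path path = paths.iterator().next();
    if (!fs.delete(path, false)) {
      // "Not Deleted" is the message proposed in the PR's inline comment;
      // treat it as an assumption, not a settled API contract.
      result.add(Tuples.pair(path, "Not Deleted"));
    }
  }
  return result;
}
```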
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java:
##########
@@ -0,0 +1,84 @@
[...]
+  @Override
+  public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
+      throws IOException, IllegalArgumentException {
+    validateBulkDeletePaths(paths, pageSize, basePath);
+    List<Map.Entry<Path, String>> result = new ArrayList<>();
+    // this for loop doesn't make sense as pageSize must be 1.
+    for (Path path : paths) {
+      try {
+        fs.delete(path, false);
+        // What to do if this returns false?
+        // I think we should add the path to the result list with value "Not Deleted".

Review Comment:
   good q. I'd say yes. Or actually do a getFileStatus() call and see what is there for
   * file doesn't exist (not an error)
   * path is a directory: add to result

   key is that file not found isn't escalated
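A sketch of the probe-based handling suggested here: after `delete()` returns false, classify the path with `getFileStatus()`, reporting directories as failures while swallowing "file not found". The helper class name and the failure message string are hypothetical; `Tuples.pair` is the pairing utility the file above already imports.

```java
// Hedged sketch of the review suggestion; not the PR code.
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.functional.Tuples;

final class DeleteOutcomeProbe {

  /**
   * Delete one file and classify the outcome: "not found" is success,
   * a directory is recorded as a failure entry, never thrown.
   */
  static void deleteAndClassify(FileSystem fs, Path path,
      List<Map.Entry<Path, String>> result) throws IOException {
    if (fs.delete(path, false)) {
      return;   // plain success
    }
    try {
      FileStatus st = fs.getFileStatus(path);
      if (st.isDirectory()) {
        // directories are unsupported by bulk delete: record, don't escalate
        result.add(Tuples.pair(path, "path is a directory"));
      }
    } catch (FileNotFoundException e) {
      // file doesn't exist: not an error, so nothing is added to the result
    }
  }
}
```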
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractBulkDeleteTest.java:
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.contract;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import org.assertj.core.api.Assertions;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.CommonPathCapabilities;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.wrappedio.WrappedIO;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.skip;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.touch;
+import static org.apache.hadoop.io.wrappedio.WrappedIO.bulkDelete;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+
+/**
+ * Contract tests for bulk delete operation.
+ */
+public abstract class AbstractContractBulkDeleteTest extends AbstractFSContractTestBase {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(AbstractContractBulkDeleteTest.class);
+
+  protected int pageSize;
+
+  protected Path basePath;
+
+  protected FileSystem fs;
+
+  @Before
+  public void setUp() throws Exception {
+    fs = getFileSystem();
+    basePath = path(getClass().getName());
+    pageSize = WrappedIO.bulkDeletePageSize(getFileSystem(), basePath);
+    fs.mkdirs(basePath);
+  }
+
+  public Path getBasePath() {
+    return basePath;
+  }
+
+  /**
+   * Validate the page size for the bulk delete operation. Different stores can
+   * have different implementations of bulk delete, thus different page sizes.
+   */
+  @Test
+  public void validatePageSize() throws Exception {
+    Assertions.assertThat(pageSize)
+        .describedAs("Page size should be 1 by default for all stores")
+        .isEqualTo(1);
+  }
+
+  @Test
+  public void testPathsSizeEqualsPageSizePrecondition() throws Exception {
+    List<Path> listOfPaths = createListOfPaths(pageSize, basePath);
+    // Bulk delete call should pass with no exception.
+    bulkDelete(getFileSystem(), basePath, listOfPaths);
+  }
+
+  @Test
+  public void testPathsSizeGreaterThanPageSizePrecondition() throws Exception {
+    List<Path> listOfPaths = createListOfPaths(pageSize + 1, basePath);
+    intercept(IllegalArgumentException.class,
+        () -> bulkDelete(getFileSystem(), basePath, listOfPaths));
+  }
+
+  @Test
+  public void testPathsSizeLessThanPageSizePrecondition() throws Exception {
+    List<Path> listOfPaths = createListOfPaths(pageSize - 1, basePath);
+    // Bulk delete call should pass with no exception.
+    bulkDelete(getFileSystem(), basePath, listOfPaths);
+  }
+
+  @Test
+  public void testBulkDeleteSuccessful() throws Exception {
+    List<Path> listOfPaths = createListOfPaths(pageSize, basePath);
+    for (Path path : listOfPaths) {
+      touch(fs, path);
+    }
+    FileStatus[] fileStatuses = fs.listStatus(basePath);
+    Assertions.assertThat(fileStatuses)
+        .describedAs("File count after create")
+        .hasSize(pageSize);
+    assertSuccessfulBulkDelete(
+        bulkDelete(getFileSystem(), basePath, listOfPaths));
+    FileStatus[] fileStatusesAfterDelete = fs.listStatus(basePath);
+    Assertions.assertThat(fileStatusesAfterDelete)
+        .describedAs("File statuses should be empty after delete")
+        .isEmpty();
+  }
+
+  @Test
+  public void validatePathCapabilityDeclared() throws Exception {
+    Assertions.assertThat(fs.hasPathCapability(basePath, CommonPathCapabilities.BULK_DELETE))
+        .describedAs("Path capability BULK_DELETE should be declared")
+        .isTrue();
+  }
+
+  @Test
+  public void testDeletePathsNotUnderBase() throws Exception {
+    List<Path> paths = new ArrayList<>();
+    Path pathNotUnderBase = path("not-under-base");
+    paths.add(pathNotUnderBase);
+    // Should fail as the path is not under the base path.
+    intercept(IllegalArgumentException.class,
+        () -> bulkDelete(getFileSystem(), basePath, paths));
+  }
+
+  @Test
+  public void testDeletePathsNotAbsolute() throws Exception {
+    List<Path> paths = new ArrayList<>();
+    Path pathNotAbsolute = new Path("not-absolute");
+    paths.add(pathNotAbsolute);
+    // Should fail as the path is not absolute.
+    intercept(IllegalArgumentException.class,
+        () -> bulkDelete(getFileSystem(), basePath, paths));
+  }
+
+  @Test
+  public void testDeletePathsNotExists() throws Exception {
+    List<Path> paths = new ArrayList<>();
+    Path pathNotExists = new Path(basePath, "not-exists");
+    paths.add(pathNotExists);
+    // The bulk delete call doesn't verify whether a path exists before deleting.
+    assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+  }
+
+  @Test
+  public void testDeletePathsDirectory() throws Exception {
+    List<Path> paths = new ArrayList<>();
+    Path dirPath = new Path(basePath, "dir");
+    fs.mkdirs(dirPath);
+    paths.add(dirPath);
+    Path filePath = new Path(dirPath, "file");
+    touch(fs, filePath);
+    paths.add(filePath);
+    pageSizePreconditionForTest(paths.size());
+    // Outcome is undefined, but the call shouldn't fail.
+    // In the case of S3, directories will still be present.
+    assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+  }
+
+  @Test
+  public void testDeleteEmptyDirectory() throws Exception {
+    List<Path> paths = new ArrayList<>();
+    Path emptyDirPath = new Path(basePath, "empty-dir");
+    fs.mkdirs(emptyDirPath);
+    paths.add(emptyDirPath);
+    // Should pass as it is an empty directory.
+    assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+  }
+
+  @Test
+  public void testDeleteEmptyList() throws Exception {
+    List<Path> paths = new ArrayList<>();
+    // An empty list should pass.
+    assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+  }
+
+  @Test
+  public void testDeleteSamePathsMoreThanOnce() throws Exception {
+    List<Path> paths = new ArrayList<>();
+    Path path = new Path(basePath, "file");
+    touch(fs, path);
+    paths.add(path);
+    paths.add(path);
+    Path another = new Path(basePath, "another-file");
+    touch(fs, another);
+    paths.add(another);
+    pageSizePreconditionForTest(paths.size());

Review Comment:
   build the paths and this precondition before the touch; saves any network IO before skipping
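The tests above exercise the API through `WrappedIO`; for completeness, a hedged sketch of calling the `BulkDelete` interface directly follows. It assumes the `createBulkDelete(Path)` factory this PR adds to `FileSystem` and that `BulkDelete` is `Closeable`; neither appears in the excerpts above, so treat both as assumptions, and the class name `BulkDeleteExample` is illustrative.

```java
// Hedged usage sketch of the BulkDelete interface shown earlier.
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class BulkDeleteExample {

  public static void deletePage(FileSystem fs, Path base, List<Path> page)
      throws IOException {
    // Assumed factory method; throws UnsupportedOperationException on stores
    // without the capability "fs.capability.bulk.delete".
    try (BulkDelete op = fs.createBulkDelete(base)) {
      // Pages larger than pageSize() fail the precondition check.
      if (page.size() > op.pageSize()) {
        throw new IllegalArgumentException("too many paths: " + page.size());
      }
      // Entries name paths which couldn't be deleted, with an error string;
      // "not found" is not reported as a failure.
      List<Map.Entry<Path, String>> failures = op.bulkDelete(page);
      for (Map.Entry<Path, String> entry : failures) {
        System.err.println("Failed to delete " + entry.getKey()
            + ": " + entry.getValue());
      }
    }
  }
}
```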
##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##########
@@ -0,0 +1,124 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# <a name="BulkDelete"></a> interface `BulkDelete`
+
+<!

> Add API for bulk/paged object deletion
> --------------------------------------
>
>                 Key: HADOOP-18679
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18679
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.5
>            Reporter: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> iceberg and hbase could benefit from being able to give a list of individual
> files to delete -files which may be scattered round the bucket for better
> read performance.
> Add some new optional interface for an object store which allows a caller to
> submit a list of paths to files to delete, where the expectation is
> * if a path is a file: delete
> * if a path is a dir, outcome undefined
> For s3 that'd let us build these into DeleteRequest objects, and submit,
> without any probes first.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org