[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-09-05 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r321248863
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/DeleteOperation.java
 ##
 @@ -207,7 +211,7 @@ public DeleteOperation(final StoreContext context,
 "page size out of range: %d", pageSize);
 this.pageSize = pageSize;
 metadataStore = context.getMetadataStore();
-executor = context.createThrottledExecutor(2);
+executor = context.createThrottledExecutor(1);
 
 Review comment:
   yeah, for now. It means that the delete and list can go in parallel, without 
having to deal with the complexity of multiple parallel deletes and failures 
within them. It's the error handling which scared me. And with the same pool of 
connections to Dynamo, you wouldn't automatically get a speed-up. As the 
Javadoc says: do more experimentation here, but do it on EC2, so that the 
answers are valid.
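
To make that trade-off concrete, here is a minimal, self-contained sketch, 
with hypothetical names rather than the S3A code, of how a single-threaded 
executor serializes the bulk deletes while the listing loop keeps running:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class SingleThreadDeleteSketch {
      // One worker thread: deletes are serialized, so a failure in page N
      // never races with page N+1, yet the listing loop below keeps running.
      private final ExecutorService executor = Executors.newFixedThreadPool(1);

      void deleteAll(Iterator<List<String>> pages) throws Exception {
        List<Future<?>> pending = new ArrayList<>();
        while (pages.hasNext()) {
          List<String> page = pages.next();   // listing continues here...
          pending.add(executor.submit(() -> bulkDelete(page)));  // ...while pages queue
        }
        for (Future<?> f : pending) {
          f.get();   // surfaces the first delete failure, one page at a time
        }
      }

      private void bulkDelete(List<String> keys) {
        // would issue a single bulk delete request for this page of keys
      }
    }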





[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-09-05 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r321247382
 
 

 ##
 File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileContextURIBase.java
 ##
 @@ -418,7 +419,7 @@ public void testDeleteDirectory() throws IOException {
 
   @Test
   public void testDeleteNonExistingDirectory() throws IOException {
-String testDirName = "testFile";
+String testDirName = "testDeleteNonExistingDirectory";
 
 Review comment:
JUnit's test method name rule does this, and when you parameterize it you get 
the extended name including the parameters, so you automatically get isolation. 
But you'd better make sure all those params form valid paths and don't 
contain, say, ':' or '/'. I didn't do the fixup here as this was an emergency 
fix, not a rework.





[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-29 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r319117792
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/OperationCallbacks.java
 ##
 @@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.IOException;
+import java.io.InterruptedIOException;
+import java.util.List;
+
+import com.amazonaws.AmazonClientException;
+import com.amazonaws.services.s3.model.DeleteObjectsRequest;
+import com.amazonaws.services.s3.model.DeleteObjectsResult;
+import com.amazonaws.services.s3.model.MultiObjectDeleteException;
+import com.amazonaws.services.s3.transfer.model.CopyResult;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.InvalidRequestException;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AFileStatus;
+import org.apache.hadoop.fs.s3a.S3ALocatedFileStatus;
+import org.apache.hadoop.fs.s3a.S3AReadOpContext;
+import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
+import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
+
+/**
+ * These are all the callbacks which the {@link RenameOperation}
+ * and {@link DeleteOperation} operations need,
+ * derived from the appropriate S3AFileSystem methods.
+ */
+public interface OperationCallbacks {
+
+  /**
+   * Create the attributes of an object for subsequent use.
+   * @param path path of the request.
+   * @param eTag the eTag of the S3 object
+   * @param versionId S3 object version ID
+   * @param len length of the file
+   * @return attributes to use when building the query.
+   */
+  S3ObjectAttributes createObjectAttributes(
+  Path path,
+  String eTag,
+  String versionId,
+  long len);
+
+  /**
+   * Create the attributes of an object for subsequent use.
+   * @param fileStatus file status to build from.
+   * @return attributes to use when building the query.
+   */
+  S3ObjectAttributes createObjectAttributes(
+  S3AFileStatus fileStatus);
+
+  /**
+   * Create the read context for reading from the referenced file,
+   * using FS state as well as the status.
+   * @param fileStatus file status.
+   * @return a context for read and select operations.
+   */
+  S3AReadOpContext createReadContext(
+  FileStatus fileStatus);
+
+  /**
+   * The rename has finished; perform any store cleanup operations
+   * such as creating/deleting directory markers.
+   * @param sourceRenamed renamed source
+   * @param destCreated destination file created.
+   * @throws IOException failure
+   */
+  void finishRename(Path sourceRenamed, Path destCreated) throws IOException;
+
+  /**
+   * Delete an object, also updating the metastore.
+   * This call does not create any mock parent entries.
+   * Retry policy: retry untranslated; delete considered idempotent.
+   * @param path path to delete
+   * @param key key of entry
+   * @param isFile is the path a file (used for instrumentation only)
+   * @throws AmazonClientException problems working with S3
+   * @throws IOException IO failure in the metastore
+   */
+  @Retries.RetryTranslated
+  void deleteObjectAtPath(Path path, String key, boolean isFile)
+  throws IOException;
+
+  /**
+   * Recursive list of files and empty directories.
+   *
+   * @param path path to list from
+   * @param status optional status of path to list.
+   * @param collectTombstones should tombstones be collected from S3Guard?
+   * @param includeSelf should the listing include this path if present?
+   * @return an iterator.
+   * @throws IOException failure
+   */
+  @Retries.RetryTranslated
+  RemoteIterator<S3ALocatedFileStatus> listFilesAndEmptyDirectories(
+  Path path,
+  S3AFileStatus status,
+  boolean collectTombstones,
+  boolean includeSelf) throws IOException;
+
+  /**
+   * Copy a single object in the bucket via a COPY operation.
+   * There's no update of metadata, directory markers, etc.
+   * Callers must implement.
+   * @param srcKey source obje

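As a hedged illustration of the pattern this interface enables, the 
hypothetical consumer below depends only on the narrow callback surface, so 
it can be unit-tested with a stub OperationCallbacks instead of a full 
S3AFileSystem:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;

    // Hypothetical consumer, not from the PR: one callback invocation both
    // removes the S3 object and updates the metastore entry.
    class SingleObjectDelete {
      private final OperationCallbacks callbacks;

      SingleObjectDelete(OperationCallbacks callbacks) {
        this.callbacks = callbacks;
      }

      void delete(Path path, String key) throws IOException {
        callbacks.deleteObjectAtPath(path, key, true);
      }
    }
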
[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-29 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r319117008
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/DeleteOperation.java
 ##
 @@ -0,0 +1,452 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.amazonaws.services.s3.model.DeleteObjectsRequest;
+import com.amazonaws.services.s3.model.DeleteObjectsResult;
+import com.google.common.base.Preconditions;
+import com.google.common.util.concurrent.ListeningExecutorService;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathIsNotEmptyDirectoryException;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3AFileStatus;
+import org.apache.hadoop.fs.s3a.S3ALocatedFileStatus;
+import org.apache.hadoop.fs.s3a.Tristate;
+import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
+import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
+import org.apache.hadoop.fs.s3a.s3guard.S3Guard;
+import org.apache.hadoop.util.DurationInfo;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.submit;
+import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.waitForCompletion;
+
+/**
+ * Implementation of the delete operation.
+ * For an authoritative S3Guarded store, after the list and delete of the
+ * combined store, we repeat against raw S3.
+ * This will correct for any situation where the authoritative listing is
+ * incomplete.
+ */
+public class DeleteOperation extends AbstractStoreOperation {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+  DeleteOperation.class);
+
+  /**
+   * This is a switch to turn on when trying to debug
+   * deletion problems; it requests the results of
+   * the delete call from AWS then audits them.
+   */
+  private static final boolean AUDIT_DELETED_KEYS = true;
+
+  /**
+   * Used to stop any re-entrancy of the delete.
+   * This is an execute-once operation.
+   */
+  private final AtomicBoolean executed = new AtomicBoolean(false);
+
+  private final S3AFileStatus status;
+
+  private final boolean recursive;
+
+  private final OperationCallbacks callbacks;
+
+  private final int pageSize;
+
+  private final MetadataStore metadataStore;
+
+  private final ListeningExecutorService executor;
+
+  private List<DeleteObjectsRequest.KeyVersion> keys;

+  private List<Path> paths;

+  private CompletableFuture<Void> deleteFuture;
+
+  private long filesDeleted;
+  private long extraFilesDeleted;
+
+  /**
+   * Constructor.
+   * @param context store context
+   * @param status  pre-fetched source status
+   * @param recursive recursive delete?
+   * @param callbacks callback provider
+   * @param pageSize number of entries in a page
+   */
+  public DeleteOperation(final StoreContext context,
+  final S3AFileStatus status,
+  final boolean recursive,
+  final OperationCallbacks callbacks, int pageSize) {
+
+super(context);
+this.status = status;
+this.recursive = recursive;
+this.callbacks = callbacks;
+checkArgument(pageSize > 0
+&& pageSize <= InternalConstants.MAX_ENTRIES_TO_DELETE,
+"page size out of range: %d", pageSize);
+this.pageSize = pageSize;
+metadataStore = context.getMetadataStore();
+executor = context.createThrottledExecutor(2);
+  }
+
+  public long getFilesDeleted() {
+return filesDeleted;
+  }
+
+  public long getExtraFilesDeleted() {
+return extraFilesDeleted;
+  }
+
+  /**
+   * Delete a file or directory tree.
+   * This call does not create any fake parent directory; that is
+   * left to the caller.
+   * The actual delete call is done in a separate thread.
+   * Only one delete at a time is submitted, however, to redu

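A hedged sketch, with hypothetical names rather than the PR's code, of the 
one-delete-in-flight pattern the deleteFuture field supports: each new page 
waits for the previous asynchronous delete to complete before being submitted:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executor;

    class OnePageInFlightSketch {
      private final Executor executor;
      // Completed future as the starting state: the first page never waits.
      private CompletableFuture<Void> deleteFuture =
          CompletableFuture.completedFuture(null);

      OnePageInFlightSketch(Executor executor) {
        this.executor = executor;
      }

      void submitPage(List<String> keys) {
        deleteFuture.join();  // block until the previous page's delete finishes
        deleteFuture = CompletableFuture.runAsync(
            () -> bulkDelete(keys), executor);
      }

      private void bulkDelete(List<String> keys) {
        // would issue one bulk delete request for this page of keys
      }
    }
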
[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-29 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r319115905
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
 ##
 @@ -3770,19 +3731,54 @@ public LocatedFileStatus next() throws IOException {
   @Retries.RetryTranslated
  public RemoteIterator<LocatedFileStatus> listFilesAndEmptyDirectories(
   Path f, boolean recursive) throws IOException {
-return invoker.retry("list", f.toString(), true,
-() -> innerListFiles(f, recursive, new Listing.AcceptAllButS3nDirs()));
+return innerListFiles(f, recursive, Listing.ACCEPT_ALL_BUT_S3N, null, true);
   }
 
-  @Retries.OnceTranslated
-  private RemoteIterator<LocatedFileStatus> innerListFiles(Path f, boolean
-  recursive, Listing.FileStatusAcceptor acceptor) throws IOException {
+  /**
+   * List files under the path.
+   * <ol>
+   *   <li>
+   *     If the path is authoritative, only S3Guard will be queried.
 
 Review comment:
   clarified this is client-side
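
As a usage note, callers drain this listing through the RemoteIterator 
contract; the sketch below assumes an S3AFileSystem handle and the 
pre-existing RemoteIterator<LocatedFileStatus> signature:

    import java.io.IOException;

    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;
    import org.apache.hadoop.fs.s3a.S3AFileSystem;

    class ListingSketch {
      // Drains the incremental listing; hasNext()/next() may perform remote I/O.
      static long countEntries(S3AFileSystem fs, Path dir) throws IOException {
        RemoteIterator<LocatedFileStatus> it =
            fs.listFilesAndEmptyDirectories(dir, true);
        long n = 0;
        while (it.hasNext()) {
          it.next();
          n++;
        }
        return n;
      }
    }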





[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-29 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r319074080
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/DeleteOperation.java
 ##

[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-29 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r319072002
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/OperationCallbacks.java
 ##

[GitHub] [hadoop] steveloughran commented on a change in pull request #1359: HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions

2019-08-29 Thread GitBox
steveloughran commented on a change in pull request #1359: 
HADOOP-16430.S3AFilesystem.delete to incrementally update s3guard with deletions
URL: https://github.com/apache/hadoop/pull/1359#discussion_r319071023
 
 

 ##
 File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/OperationCallbacks.java
 ##