[ 
https://issues.apache.org/jira/browse/HADOOP-18012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704552#comment-17704552
 ] 

ASF GitHub Bot commented on HADOOP-18012:
-----------------------------------------

saxenapranav commented on code in PR #5488:
URL: https://github.com/apache/hadoop/pull/5488#discussion_r1147353317


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########
@@ -55,6 +55,10 @@
 import java.util.concurrent.TimeUnit;
 
 import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants;
+import org.apache.hadoop.fs.azurebfs.constants.FileSystemUriSchemes;
+import org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations;
+import org.apache.hadoop.fs.azurebfs.constants.HttpHeaderConfigurations;

Review Comment:
   nit: let imports be at same place.



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -627,28 +683,46 @@ public boolean renameIdempotencyCheckOp(
       TracingContext tracingContext) {
     Preconditions.checkArgument(op.hasResult(), "Operations has null HTTP 
response");
 
-    if ((op.isARetriedRequest())
-        && (op.getResult().getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND)
-        && isNotEmpty(sourceEtag)) {
-
-      // Server has returned HTTP 404, which means rename source no longer
-      // exists. Check on destination status and if its etag matches
-      // that of the source, consider it to be a success.
-      LOG.debug("rename {} to {} failed, checking etag of destination",
-          source, destination);
+    // removing isDir from debug logs as it can be misleading
+    LOG.debug("rename({}, {}) failure {}; retry={} etag {}",
+              source, destination, op.getResult().getStatusCode(), 
op.isARetriedRequest(), sourceEtag);
+    if (!(op.isARetriedRequest()
+            && (op.getResult().getStatusCode() == 
HttpURLConnection.HTTP_NOT_FOUND))) {
+      // only attempt recovery if the failure was a 404 on a retried rename 
request.
+      return false;
+    }
 
+    if (isNotEmpty(sourceEtag)) {
+      // Server has returned HTTP 404, we have an etag, so see
+      // if the rename has actually taken place,
+      LOG.info("rename {} to {} failed, checking etag of destination",
+              source, destination);
       try {
-        final AbfsRestOperation destStatusOp = getPathStatus(destination,
-            false, tracingContext);
+        final AbfsRestOperation destStatusOp = getPathStatus(destination, 
false, tracingContext);
         final AbfsHttpOperation result = destStatusOp.getResult();
 
-        return result.getStatusCode() == HttpURLConnection.HTTP_OK
-            && sourceEtag.equals(extractEtagHeader(result));
-      } catch (AzureBlobFileSystemException ignored) {
+        final boolean recovered = result.getStatusCode() == 
HttpURLConnection.HTTP_OK
+                && sourceEtag.equals(extractEtagHeader(result));
+        LOG.info("File rename has taken place: recovery {}",
+                recovered ? "succeeded" : "failed");
+        return recovered;
+
+      } catch (AzureBlobFileSystemException ex) {
         // GetFileStatus on the destination failed, the rename did not take 
place
+        // or some other failure. log and swallow.
+        LOG.debug("Failed to get status of path {}", destination, ex);
       }
+    } else {
+      LOG.debug("No source etag; unable to probe for the operation's success");
     }
-    return false;
+      return false;
+  }
+
+

Review Comment:
   extra lines :).



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -578,26 +611,49 @@ public AbfsClientRenameResult renamePath(
           // Extract the sourceEtag, using the status Op, and set it
           // for future rename recovery.
           AbfsHttpOperation sourceStatusResult = sourceStatusOp.getResult();
-          String sourceEtagAfterFailure = 
extractEtagHeader(sourceStatusResult);
-          renamePath(source, destination, continuation, tracingContext,
-              sourceEtagAfterFailure, isMetadataIncompleteState);
-        }
-        // if we get out of the condition without a successful rename, then
-        // it isn't metadata incomplete state issue.
-        isMetadataIncompleteState = false;
-
-        boolean etagCheckSucceeded = renameIdempotencyCheckOp(
-            source,
-            sourceEtag, op, destination, tracingContext);
-        if (!etagCheckSucceeded) {
-          // idempotency did not return different result
-          // throw back the exception
-          throw e;
+          sourceEtagAfterFailure = extractEtagHeader(sourceStatusResult);
         }
+        renamePath(source, destination, continuation, tracingContext,
+                sourceEtagAfterFailure, isMetadataIncompleteState);
+      }
+      // if we get out of the condition without a successful rename, then
+      // it isn't metadata incomplete state issue.
+      isMetadataIncompleteState = false;
+
+      // setting default rename recovery success to false
+      boolean etagCheckSucceeded = false;
+      if (shouldAttemptRecovery) {

Review Comment:
   this depends on `renameResilience` well. This is a new field. For the case 
of ` if (op.getResult().getStorageErrorCode()
                 
.equals(RENAME_DESTINATION_PARENT_PATH_NOT_FOUND.getErrorCode())
                 && !isMetadataIncompleteState`, i believe we still want to 
call `renameIdempotencyCheckOp`. lets make it true when it get into the 
mentioned condition.





> ABFS: Enable config controlled ETag check for Rename idempotency
> ----------------------------------------------------------------
>
>                 Key: HADOOP-18012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18012
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.2
>            Reporter: Sneha Vijayarajan
>            Assignee: Sree Bhattacharyya
>            Priority: Major
>              Labels: pull-request-available
>
> ABFS driver has a handling for rename idempotency which relies on LMT of the 
> destination file to conclude if the rename was successful or not when source 
> file is absent and if the rename request had entered retry loop.
> This handling is incorrect as LMT of the destination does not change on 
> rename. 
> This Jira will track the change to undo the current implementation and add a 
> new one where for an incoming rename operation, source file eTag is fetched 
> first and then rename is done only if eTag matches for the source file.
> As this is going to be a costly operation given an extra HEAD request is 
> added to each rename, this implementation will be guarded over a config and 
> can enabled by customers who have workloads that do multiple renames. 
> Long term plan to handle rename idempotency without HEAD request is being 
> discussed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to