[ https://issues.apache.org/jira/browse/HADOOP-18012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704611#comment-17704611 ]
ASF GitHub Bot commented on HADOOP-18012: ----------------------------------------- steveloughran commented on code in PR #5488: URL: https://github.com/apache/hadoop/pull/5488#discussion_r1147588525 ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java: ########## @@ -55,6 +55,10 @@ import java.util.concurrent.TimeUnit; import org.apache.hadoop.classification.VisibleForTesting; +import org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants; Review Comment: move these back to where they were. You could also move the org.apache entry on L57 down too. It's one of those "guava search and replace" imports, which is why it is in the wrong block and confusing your IDE. ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ########## @@ -556,20 +588,23 @@ public AbfsClientRenameResult renamePath( // isMetadataIncompleteState is used for renameRecovery(as the 2nd param). return new AbfsClientRenameResult(op, isMetadataIncompleteState, isMetadataIncompleteState); } catch (AzureBlobFileSystemException e) { - // If we have no HTTP response, throw the original exception. - if (!op.hasResult()) { - throw e; - } - - // ref: HADOOP-18242. Rename failure occurring due to a rare case of - // tracking metadata being in incomplete state. - if (op.getResult().getStorageErrorCode() - .equals(RENAME_DESTINATION_PARENT_PATH_NOT_FOUND.getErrorCode()) - && !isMetadataIncompleteState) { - //Logging - ABFS_METADATA_INCOMPLETE_RENAME_FAILURE - .info("Rename Failure attempting to resolve tracking metadata state and retrying."); + // If we have no HTTP response, throw the original exception. + if (!op.hasResult()) { + throw e; + } + // ref: HADOOP-18242. Rename failure occurring due to a rare case of + // tracking metadata being in incomplete state. + if (op.getResult().getStorageErrorCode() + .equals(RENAME_DESTINATION_PARENT_PATH_NOT_FOUND.getErrorCode()) + && !isMetadataIncompleteState) { + //Logging + ABFS_METADATA_INCOMPLETE_RENAME_FAILURE + .info("Rename Failure attempting to resolve tracking metadata state and retrying."); + // rename recovery should be attempted in this case also + shouldAttemptRecovery = true; + String sourceEtagAfterFailure = sourceEtag; + if (isEmpty(sourceEtagAfterFailure)) { Review Comment: but the whole purpose of this clause is to trigger the operation which forces the metadata to be consistent, isn't it? so it would need to be called even if we had the source etag, just in the second call of rename(), the initial etag should be used if not empty ########## hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemDelegationSAS.java: ########## @@ -26,6 +26,7 @@ import java.util.List; import java.util.UUID; +import org.apache.hadoop.fs.azurebfs.utils.TracingContext; Review Comment: cut if unused; if it is used then move into the next block ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java: ########## @@ -69,10 +73,6 @@ import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; -import org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants; Review Comment: and reinstate these. ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java: ########## @@ -212,6 +215,8 @@ public void initialize(URI uri, Configuration configuration) } } } + isNamespaceEnabled = abfsStore.getIsNamespaceEnabled(tracingContext); Review Comment: so now every filesystem.initialize() is doing a network call, rather than waiting until the first operation. this can have some adverse consequences, hence HADOOP-17313 ... all spark worker threads calling FileSystem.get() can try to create their own and create congestion around any locks. the current on-demand checking avoided this provided your auth mechanism wasn't trying to do anything at this point. For that reason, I'm unsure about this change. Lazy eval was potentially better. ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ########## @@ -556,20 +588,23 @@ public AbfsClientRenameResult renamePath( // isMetadataIncompleteState is used for renameRecovery(as the 2nd param). return new AbfsClientRenameResult(op, isMetadataIncompleteState, isMetadataIncompleteState); } catch (AzureBlobFileSystemException e) { - // If we have no HTTP response, throw the original exception. - if (!op.hasResult()) { - throw e; - } - - // ref: HADOOP-18242. Rename failure occurring due to a rare case of - // tracking metadata being in incomplete state. - if (op.getResult().getStorageErrorCode() - .equals(RENAME_DESTINATION_PARENT_PATH_NOT_FOUND.getErrorCode()) - && !isMetadataIncompleteState) { - //Logging - ABFS_METADATA_INCOMPLETE_RENAME_FAILURE - .info("Rename Failure attempting to resolve tracking metadata state and retrying."); + // If we have no HTTP response, throw the original exception. + if (!op.hasResult()) { + throw e; + } + // ref: HADOOP-18242. Rename failure occurring due to a rare case of + // tracking metadata being in incomplete state. + if (op.getResult().getStorageErrorCode() + .equals(RENAME_DESTINATION_PARENT_PATH_NOT_FOUND.getErrorCode()) + && !isMetadataIncompleteState) { + //Logging + ABFS_METADATA_INCOMPLETE_RENAME_FAILURE + .info("Rename Failure attempting to resolve tracking metadata state and retrying."); + // rename recovery should be attempted in this case also + shouldAttemptRecovery = true; Review Comment: I think this should only be set if the result on L615 is not a directory ########## hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemDelegationSAS.java: ########## @@ -399,6 +402,7 @@ public void testSignatureMask() throws Exception { final AzureBlobFileSystem fs = getFileSystem(); String src = String.format("/testABC/test%s.xt", UUID.randomUUID()); fs.create(new Path(src)).close(); + Review Comment: just revert this ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ########## @@ -578,26 +613,49 @@ public AbfsClientRenameResult renamePath( // Extract the sourceEtag, using the status Op, and set it // for future rename recovery. AbfsHttpOperation sourceStatusResult = sourceStatusOp.getResult(); - String sourceEtagAfterFailure = extractEtagHeader(sourceStatusResult); - renamePath(source, destination, continuation, tracingContext, - sourceEtagAfterFailure, isMetadataIncompleteState); - } - // if we get out of the condition without a successful rename, then - // it isn't metadata incomplete state issue. - isMetadataIncompleteState = false; - - boolean etagCheckSucceeded = renameIdempotencyCheckOp( - source, - sourceEtag, op, destination, tracingContext); - if (!etagCheckSucceeded) { - // idempotency did not return different result - // throw back the exception - throw e; + sourceEtagAfterFailure = extractEtagHeader(sourceStatusResult); Review Comment: what happens if source is a directory? or is it the case that this never happens. > ABFS: Enable config controlled ETag check for Rename idempotency > ---------------------------------------------------------------- > > Key: HADOOP-18012 > URL: https://issues.apache.org/jira/browse/HADOOP-18012 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure > Affects Versions: 3.3.2 > Reporter: Sneha Vijayarajan > Assignee: Sree Bhattacharyya > Priority: Major > Labels: pull-request-available > > ABFS driver has a handling for rename idempotency which relies on LMT of the > destination file to conclude if the rename was successful or not when source > file is absent and if the rename request had entered retry loop. > This handling is incorrect as LMT of the destination does not change on > rename. > This Jira will track the change to undo the current implementation and add a > new one where for an incoming rename operation, source file eTag is fetched > first and then rename is done only if eTag matches for the source file. > As this is going to be a costly operation given an extra HEAD request is > added to each rename, this implementation will be guarded over a config and > can enabled by customers who have workloads that do multiple renames. > Long term plan to handle rename idempotency without HEAD request is being > discussed. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org