[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722087#comment-17722087 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

steveloughran commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1545445339

   reviewing this; too many other things have got in my way.

   I agree: with create(overwrite=false), we must fail with a concurrency error. What we don't want to do is overreact if we are doing overwrite=true and something does happen partway.

   I'll look at this in more detail, maybe focusing purely on making the errors meaningful; in particular, making sure that if the file is deleted before the error is raised, we keep raising that concurrency error.

> Tune ABFS create() retry logic
> ------------------------------
>
>                 Key: HADOOP-18657
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18657
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.3.5
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> Based on experience trying to debug this happening:
> # add debug statements when create() fails
> # generated exception text to reference a string shared with tests, the path, and the error code
> # generated exception to include the inner exception for the full stack trace
>
> Currently the retry logic is:
> # create(overwrite=false)
> # if HTTP_CONFLICT/409 is raised, call HEAD
> # use the etag in create(path, overwrite=true, etag)
> # special handling of error HTTP_PRECON_FAILED = 412
>
> There's a race condition here: if the file which exists is deleted between 1 and 2, the retry should succeed, but currently a 404 from the HEAD is escalated to a failure.
>
> Proposed changes:
> # if the HEAD returns 404, leave etag == null and continue
> # special handling of 412, also to handle 409

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
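The retry steps and the proposed 404 handling in the issue description can be sketched against a toy in-memory store. Everything below (`CreateRetrySketch`, the store map, `RestException`, `conditionalCreate`) is a hypothetical stand-in for the real AbfsClient/AzureBlobFileSystemStore code, not the hadoop-azure API:

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the create() retry flow, including the proposed change:
// a 404 on the HEAD leaves etag == null and the create continues.
public class CreateRetrySketch {

    static final int HTTP_CONFLICT = 409;      // create(overwrite=false) found a file
    static final int HTTP_NOT_FOUND = 404;     // HEAD raced with a delete
    static final int HTTP_PRECON_FAILED = 412; // etag no longer matches

    // in-memory stand-in for the remote store: path -> current etag
    static final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();
    static long counter = 0;

    static class RestException extends RuntimeException {
        final int status;
        RestException(int status) { super("status=" + status); this.status = status; }
    }

    static void createPath(String path, boolean overwrite, String etag) {
        String current = store.get(path);
        if (current != null && !overwrite) throw new RestException(HTTP_CONFLICT);
        if (etag != null && !etag.equals(current)) throw new RestException(HTTP_PRECON_FAILED);
        store.put(path, "etag-" + (counter++));
    }

    static String headEtag(String path) {
        String etag = store.get(path);
        if (etag == null) throw new RestException(HTTP_NOT_FOUND);
        return etag;
    }

    // steps 1-3 of the retry logic; betweenConflictAndHead lets a caller
    // inject the race (a delete landing between the 409 and the HEAD)
    static void conditionalCreate(String path, Runnable betweenConflictAndHead) {
        try {
            createPath(path, false, null);           // 1. create(overwrite=false)
            return;
        } catch (RestException e) {
            if (e.status != HTTP_CONFLICT) throw e;  // only 409 is recoverable
        }
        if (betweenConflictAndHead != null) betweenConflictAndHead.run();
        String etag;
        try {
            etag = headEtag(path);                   // 2. HEAD to fetch the etag
        } catch (RestException e) {
            if (e.status != HTTP_NOT_FOUND) throw e;
            etag = null;                             // proposed: deleted mid-flight, continue
        }
        // 3. overwrite only when an etag was fetched; with no etag this is
        // a plain create-if-absent, so a racing create still surfaces a 409
        createPath(path, etag != null, etag);
    }

    public static void main(String[] args) {
        store.put("/out", "v1");
        conditionalCreate("/out", null);                           // 409 -> HEAD -> etag overwrite
        store.put("/raced", "v1");
        conditionalCreate("/raced", () -> store.remove("/raced")); // 409 -> 404 -> etag-less create
        System.out.println(store.containsKey("/out") && store.containsKey("/raced")); // true
    }
}
```

With the pre-patch behaviour, the `/raced` call would instead fail on the 404 from the HEAD; here it completes, matching proposed change 1.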
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713439#comment-17713439 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

snvijaya commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1131060581

##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########

```
@@ -621,37 +622,57 @@ private AbfsRestOperation conditionalCreateOverwriteFile(final String relativePa
           isAppendBlob, null, tracingContext);
     } catch (AbfsRestOperationException e) {
+      LOG.debug("Failed to create {}", relativePath, e);
       if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
         // File pre-exists, fetch eTag
+        LOG.debug("Fetching etag of {}", relativePath);
         try {
           op = client.getPathStatus(relativePath, false, tracingContext);
         } catch (AbfsRestOperationException ex) {
+          LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
           if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
```

Review Comment:
   Hi @steveloughran, given that Hadoop has single-writer semantics, would it be correct to expect that, as part of job parallelization, only one worker process should try to create a file?

   As this check for FileNotFound comes after an attempt to create the file with overwrite=false, which in turn failed with a conflict indicating the file was present, a concurrent operation on the file is confirmed. It's quite possible that if we let this create proceed, some other operation such as delete could kick in later as well. Wouldn't code that throws an exception at the first indication of parallel activity be the right thing to do?

   As the workload pattern is not honoring the single-writer semantics, I feel we should retain the logic to throw ConcurrentWriteOperationDetectedException.
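The stricter behaviour argued for in this review (treating a 404 after the 409 as proof of concurrent activity and failing fast) can be sketched as a small decision function. The class and message names below are illustrative stand-ins, not the real hadoop-azure types:

```java
// Sketch of the pre-patch, strict single-writer behaviour: a 404 on the
// HEAD that follows a 409 is itself evidence of a concurrent writer, so
// the request fails rather than continuing.
public class StrictSingleWriter {

    static final int HTTP_NOT_FOUND = 404;

    static class ConcurrentWriteDetected extends RuntimeException {
        ConcurrentWriteDetected(String msg) { super(msg); }
    }

    // map a HEAD failure status to the outcome under single-writer semantics
    static String resolveHeadFailure(int status, String path) {
        if (status == HTTP_NOT_FOUND) {
            // create(overwrite=false) saw the file but HEAD did not: another
            // process deleted it in between, so honour single-writer semantics
            throw new ConcurrentWriteDetected(
                "Parallel access to the create path detected: " + path);
        }
        return "rethrow";   // any other failure propagates unchanged
    }

    public static void main(String[] args) {
        try {
            resolveHeadFailure(HTTP_NOT_FOUND, "/data/out.csv");
        } catch (ConcurrentWriteDetected e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

The PR under discussion replaces this throw with a continue-without-etag path; the debate above is about which of the two contracts callers should get.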
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712435#comment-17712435 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

steveloughran commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1508743152

   any updates on this? The big issue is whether to retry on 409 or not.
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700884#comment-17700884 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

hadoop-yetus commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1470924983

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 53s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 50m 19s | | trunk passed |
   | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | compile | 0m 38s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | checkstyle | 0m 35s | | trunk passed |
   | +1 :green_heart: | mvnsite | 0m 44s | | trunk passed |
   | +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | spotbugs | 1m 16s | | trunk passed |
   | +1 :green_heart: | shadedclient | 20m 16s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 31s | | the patch passed |
   | +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javac | 0m 33s | | the patch passed |
   | +1 :green_heart: | compile | 0m 29s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | javac | 0m 29s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 0m 19s | | the patch passed |
   | +1 :green_heart: | mvnsite | 0m 32s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javadoc | 0m 23s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | spotbugs | 1m 4s | | the patch passed |
   | +1 :green_heart: | shadedclient | 20m 21s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 2m 11s | | hadoop-azure in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
   | | | 105m 15s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5462 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 24e1da3b49dc 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1853a46ecbb41baf82035664e30cf03584b77a64 |
   | Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/2/testReport/ |
   | Max. process+thread count | 554 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus |
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700851#comment-17700851 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

steveloughran commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1137732758

##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########

```
@@ -621,37 +622,57 @@ private AbfsRestOperation conditionalCreateOverwriteFile(final String relativePa
           isAppendBlob, null, tracingContext);
     } catch (AbfsRestOperationException e) {
+      LOG.debug("Failed to create {}", relativePath, e);
       if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
         // File pre-exists, fetch eTag
+        LOG.debug("Fetching etag of {}", relativePath);
         try {
           op = client.getPathStatus(relativePath, false, tracingContext);
         } catch (AbfsRestOperationException ex) {
+          LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
           if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
             // Is a parallel access case, as file which was found to be
             // present went missing by this request.
-            throw new ConcurrentWriteOperationDetectedException(
-                "Parallel access to the create path detected. Failing request "
-                    + "to honor single writer semantics");
+            // this means the other thread deleted it and the conflict
```

Review Comment:
   will do; the text will indicate this may be due to a lease on the parent dir too.
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698299#comment-17698299 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

saxenapranav commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1130757159

##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########

```
@@ -621,37 +622,57 @@ private AbfsRestOperation conditionalCreateOverwriteFile(final String relativePa
           isAppendBlob, null, tracingContext);
     } catch (AbfsRestOperationException e) {
+      LOG.debug("Failed to create {}", relativePath, e);
       if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
         // File pre-exists, fetch eTag
+        LOG.debug("Fetching etag of {}", relativePath);
         try {
           op = client.getPathStatus(relativePath, false, tracingContext);
         } catch (AbfsRestOperationException ex) {
+          LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
           if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
             // Is a parallel access case, as file which was found to be
             // present went missing by this request.
-            throw new ConcurrentWriteOperationDetectedException(
-                "Parallel access to the create path detected. Failing request "
-                    + "to honor single writer semantics");
+            // this means the other thread deleted it and the conflict
```

Review Comment:
   There is a race condition in the job, and the developer should be informed about it. @snvijaya @anmolanmol1234 @sreeb-msft, what do you think?
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698176#comment-17698176 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

saxenapranav commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1130438266

##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########

```
@@ -621,37 +622,57 @@ private AbfsRestOperation conditionalCreateOverwriteFile(final String relativePa
           isAppendBlob, null, tracingContext);
     } catch (AbfsRestOperationException e) {
+      LOG.debug("Failed to create {}", relativePath, e);
       if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
         // File pre-exists, fetch eTag
+        LOG.debug("Fetching etag of {}", relativePath);
         try {
           op = client.getPathStatus(relativePath, false, tracingContext);
         } catch (AbfsRestOperationException ex) {
+          LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
           if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
             // Is a parallel access case, as file which was found to be
             // present went missing by this request.
-            throw new ConcurrentWriteOperationDetectedException(
-                "Parallel access to the create path detected. Failing request "
-                    + "to honor single writer semantics");
+            // this means the other thread deleted it and the conflict
+            // has implicitly been resolved.
+            LOG.debug("File at {} has been deleted; creation can continue", relativePath);
           } else {
             throw ex;
           }
         }
-        String eTag = op.getResult()
-            .getResponseHeader(HttpHeaderConfigurations.ETAG);
+        String eTag = op != null
+            ? op.getResult().getResponseHeader(HttpHeaderConfigurations.ETAG)
+            : null;
+        LOG.debug("Attempting to create file {} with etag of {}", relativePath, eTag);
         try {
-          // overwrite only if eTag matches with the file properties fetched befpre
-          op = client.createPath(relativePath, true, true, permission, umask,
+          // overwrite only if eTag matches with the file properties fetched or the file
+          // was deleted and there is no etag.
+          // if the etag was not retrieved, overwrite is still false, so will fail
+          // if another process has just created the file
+          op = client.createPath(relativePath, true, eTag != null, permission, umask,
               isAppendBlob, eTag, tracingContext);
         } catch (AbfsRestOperationException ex) {
-          if (ex.getStatusCode() == HttpURLConnection.HTTP_PRECON_FAILED) {
+          final int sc = ex.getStatusCode();
+          LOG.debug("Failed to create file {} with etag {}; status code={}",
+              relativePath, eTag, sc, ex);
+          if (sc == HttpURLConnection.HTTP_PRECON_FAILED
+              || sc == HttpURLConnection.HTTP_CONFLICT) {
```

Review Comment:
   Good that we have taken care of the 409, which can come when `eTag == null` makes the overwrite argument to `client.createPath` false. It would be awesome if we could put that in comments, and also log accordingly.

   Log 1: some file is there whose eTag our process holds. When we go back to createPath with that eTag, some other process has replaced the file, which leads to the 412 handled in the added code:
   ```
   final ConcurrentWriteOperationDetectedException ex2 =
       new ConcurrentWriteOperationDetectedException(
           AbfsErrors.ERR_PARALLEL_ACCESS_DETECTED
               + " Path =\"" + relativePath + "\""
               + "; Status code =" + sc
               + "; etag = \"" + eTag + "\""
               + "; error =" + ex.getErrorMessage());
   ```
   Suggested log 2: when we searched for the etag there was no file; now when we try createPath with overwrite=false, it can give a 409 if some other process created a file on the same path in the meantime.

   Also, in the case of a 409, we are back in the situation this method started with. Should we re-enter the 409 handling at https://github.com/apache/hadoop/blob/7f9ca101e2ae057a42829883596085732f8d5fa6/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L624 a bounded number of times? For example, with a threshold of 2: if we get a 409 at this line, we try once more to handle it, and after that we fail. @snvijaya @anmolanmol1234 @sreeb-msft, what do you think?
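The bounded-retry idea suggested in this review can be sketched as a small loop. The `Attempt` interface, the status constants, and the threshold of 2 below are hypothetical stand-ins used for illustration, not the real hadoop-azure code:

```java
// Sketch of a bounded retry: if the recovery create itself fails with
// another 409/412, re-run the whole conflict-resolution sequence a fixed
// number of times before surfacing the error to the caller.
public class BoundedConflictRetry {

    static final int HTTP_CONFLICT = 409;
    static final int HTTP_PRECON_FAILED = 412;

    static class RestException extends RuntimeException {
        final int status;
        RestException(int status) { super("status=" + status); this.status = status; }
    }

    // one pass of the conditional create; supplied by the caller in this sketch
    interface Attempt { void run(); }

    // retry only on the two statuses that signal a concurrent writer;
    // any other failure, or exhausting the threshold, propagates
    static int createWithRetry(Attempt attempt, int maxAttempts) {
        for (int attemptNo = 1; ; attemptNo++) {
            try {
                attempt.run();
                return attemptNo;        // success: report attempts used
            } catch (RestException e) {
                boolean concurrent = e.status == HTTP_CONFLICT
                    || e.status == HTTP_PRECON_FAILED;
                if (!concurrent || attemptNo >= maxAttempts) {
                    throw e;             // give up, surface the error
                }
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // first pass conflicts (another process re-created the file); second succeeds
        int used = createWithRetry(() -> {
            if (calls[0]++ == 0) throw new RestException(HTTP_CONFLICT);
        }, 2);
        System.out.println("attempts used: " + used); // attempts used: 2
    }
}
```

A threshold like this keeps the recovery from looping forever when two writers keep racing, while still absorbing a single transient conflict.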
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698009#comment-17698009 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

hadoop-yetus commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1460491719

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 57s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 38m 13s | | trunk passed |
   | +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | checkstyle | 0m 35s | | trunk passed |
   | +1 :green_heart: | mvnsite | 0m 43s | | trunk passed |
   | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | spotbugs | 1m 16s | | trunk passed |
   | +1 :green_heart: | shadedclient | 20m 31s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 32s | | the patch passed |
   | +1 :green_heart: | compile | 0m 34s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javac | 0m 34s | | the patch passed |
   | +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | javac | 0m 30s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 0m 20s | | the patch passed |
   | +1 :green_heart: | mvnsite | 0m 33s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
   | +1 :green_heart: | shadedclient | 20m 12s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 2m 11s | | hadoop-azure in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
   | | | 93m 52s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5462 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0304206b7a96 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a4122e276ad2264c6303eecc3584b63f865dd353 |
   | Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/1/testReport/ |
   | Max. process+thread count | 627 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus |
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697951#comment-17697951 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

steveloughran commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1460326545

   fyi @saxenapranav @mehakmeet: as well as improving diagnostics, this patch also changes the recovery code by handling a deletion of the target file between the first failure and the retry.
[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic
[ https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697947#comment-17697947 ]

ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

steveloughran opened a new pull request, #5462:
URL: https://github.com/apache/hadoop/pull/5462

   ### Description of PR

   Tunes how abfs handles a failure during create which may be due to concurrency *or* load-related retries happening in the store.

   * better logging
   * happy with the conflict being resolved by the file being deleted
   * more diagnostics in the failure raised

   ### How was this patch tested?

   lease test run already; doing a full hadoop-azure test run

   ### For code changes:

   - [X] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?