[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722087#comment-17722087
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

steveloughran commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1545445339

   reviewing this; too many other things have got in my way.
   
   I agree: with create(overwrite=false), we must fail with a concurrency error.
   
   What we don't want to do is overreact if we are doing overwrite=true and 
something happens partway through.
   
   I'll look at this in more detail; maybe focus purely on making the errors 
meaningful, in particular making sure that if the file is deleted before the 
error is raised, we keep raising that concurrency error.




> Tune ABFS create() retry logic
> --
>
> Key: HADOOP-18657
> URL: https://issues.apache.org/jira/browse/HADOOP-18657
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> Based on experience trying to debug this happening:
> # add debug statements when create() fails
> # generated exception text to reference a string shared with tests, plus the 
> path and error code
> # generated exception to include the inner exception for the full stack trace
> Currently the retry logic is:
> # create(overwrite=false)
> # if HTTP_CONFLICT/409 is raised, call HEAD
> # use the etag in create(path, overwrite=true, etag)
> # special handling of error HTTP_PRECON_FAILED = 412
> There's a race condition here: if the existing file is deleted between steps 
> 1 and 2, the retry should succeed, but currently a 404 from the HEAD is 
> escalated to a failure.
> Proposed changes:
> # if the HEAD returns 404, leave etag == null and continue
> # special handling of 412 extended to also cover 409
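
For orientation, here is a minimal, self-contained Java sketch of the proposed 
flow described above, written against a toy store interface. All names here 
(Store, HttpError, conditionalCreate) are hypothetical; this is not the actual 
AzureBlobFileSystemStore code, and it only illustrates the proposed handling of 
a 404 from the HEAD and of 409/412 on the retried create.

```java
import java.io.IOException;

/** Sketch of the proposed create() retry flow; hypothetical names throughout. */
public class CreateRetrySketch {

  /** Toy view of the store: create and HEAD, reporting failures as HTTP codes. */
  interface Store {
    /** Create the file; returns its etag, throws HttpError(409/412) on conflict. */
    String create(String path, boolean overwrite, String requiredEtag) throws HttpError;

    /** Probe the file; returns its etag, throws HttpError(404) if absent. */
    String head(String path) throws HttpError;
  }

  /** Stand-in for an HTTP-level failure from the store. */
  static class HttpError extends IOException {
    final int status;
    HttpError(int status, String message) {
      super(status + ": " + message);
      this.status = status;
    }
  }

  static String conditionalCreate(Store store, String path) throws IOException {
    try {
      // step 1: plain create with overwrite=false
      return store.create(path, false, null);
    } catch (HttpError e) {
      if (e.status != 409) {
        throw e;                                // not a conflict: propagate
      }
      // step 2: conflict, so probe for the existing file's etag
      String etag = null;
      try {
        etag = store.head(path);
      } catch (HttpError headError) {
        if (headError.status != 404) {
          throw headError;
        }
        // proposed change: file deleted in the meantime; leave etag == null
      }
      try {
        // step 3: overwrite only if there is an etag to match against
        return store.create(path, etag != null, etag);
      } catch (HttpError retryError) {
        // proposed change: treat 409 like 412 -- another writer got in first
        if (retryError.status == 412 || retryError.status == 409) {
          throw new IOException("Parallel access to " + path + " detected", retryError);
        }
        throw retryError;
      }
    }
  }
}
```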






[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-04-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713439#comment-17713439
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

snvijaya commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1131060581


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##
@@ -621,37 +622,57 @@ private AbfsRestOperation 
conditionalCreateOverwriteFile(final String relativePa
   isAppendBlob, null, tracingContext);
 
 } catch (AbfsRestOperationException e) {
+  LOG.debug("Failed to create {}", relativePath, e);
   if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
 // File pre-exists, fetch eTag
+LOG.debug("Fetching etag of {}", relativePath);
 try {
   op = client.getPathStatus(relativePath, false, tracingContext);
 } catch (AbfsRestOperationException ex) {
+  LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
   if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {

Review Comment:
   Hi @steveloughran, given Hadoop's single-writer semantics, would it be 
correct to expect that, as part of job parallelization, only one worker process 
should try to create a file? Since this check for FileNotFound comes after an 
attempt to create the file with overwrite=false, which in turn failed with a 
conflict indicating the file was present, a concurrent operation on the file is 
indeed confirmed.
   
   It's quite possible that if we let this create proceed, some other operation 
such as a delete can kick in later as well. Wouldn't the code below, which 
throws an exception at the first indication of parallel activity, be the right 
thing to do?
   
   
   As the workload pattern is not honoring the single-writer semantics, I feel 
we should retain the logic to throw ConcurrentWriteOperationDetectedException.








[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-04-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712435#comment-17712435
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

steveloughran commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1508743152

   Any updates on this? The big issue is whether to retry on 409 or not.







[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-03-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700884#comment-17700884
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

hadoop-yetus commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1470924983

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  50m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  javac  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  the patch passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  spotbugs  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 21s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 11s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 105m 15s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5462 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 24e1da3b49dc 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 
19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1853a46ecbb41baf82035664e30cf03584b77a64 |
   | Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
 /usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/2/testReport/ |
   | Max. process+thread count | 554 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 

[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-03-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700851#comment-17700851
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

steveloughran commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1137732758


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##
@@ -621,37 +622,57 @@ private AbfsRestOperation 
conditionalCreateOverwriteFile(final String relativePa
   isAppendBlob, null, tracingContext);
 
 } catch (AbfsRestOperationException e) {
+  LOG.debug("Failed to create {}", relativePath, e);
   if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
 // File pre-exists, fetch eTag
+LOG.debug("Fetching etag of {}", relativePath);
 try {
   op = client.getPathStatus(relativePath, false, tracingContext);
 } catch (AbfsRestOperationException ex) {
+  LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
   if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
 // Is a parallel access case, as file which was found to be
 // present went missing by this request.
-throw new ConcurrentWriteOperationDetectedException(
-"Parallel access to the create path detected. Failing request "
-+ "to honor single writer semantics");
+// this means the other thread deleted it and the conflict

Review Comment:
   will do; text will indicate this may be due to a lease on the parent dir too.








[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-03-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698299#comment-17698299
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

saxenapranav commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1130757159


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##
@@ -621,37 +622,57 @@ private AbfsRestOperation 
conditionalCreateOverwriteFile(final String relativePa
   isAppendBlob, null, tracingContext);
 
 } catch (AbfsRestOperationException e) {
+  LOG.debug("Failed to create {}", relativePath, e);
   if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
 // File pre-exists, fetch eTag
+LOG.debug("Fetching etag of {}", relativePath);
 try {
   op = client.getPathStatus(relativePath, false, tracingContext);
 } catch (AbfsRestOperationException ex) {
+  LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
   if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
 // Is a parallel access case, as file which was found to be
 // present went missing by this request.
-throw new ConcurrentWriteOperationDetectedException(
-"Parallel access to the create path detected. Failing request "
-+ "to honor single writer semantics");
+// this means the other thread deleted it and the conflict

Review Comment:
   There is a race condition in the job, and the developer should be informed 
about it. @snvijaya @anmolanmol1234 @sreeb-msft, what do you think?








[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698176#comment-17698176
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

saxenapranav commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1130438266


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##
@@ -621,37 +622,57 @@ private AbfsRestOperation 
conditionalCreateOverwriteFile(final String relativePa
   isAppendBlob, null, tracingContext);
 
 } catch (AbfsRestOperationException e) {
+  LOG.debug("Failed to create {}", relativePath, e);
   if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
 // File pre-exists, fetch eTag
+LOG.debug("Fetching etag of {}", relativePath);
 try {
   op = client.getPathStatus(relativePath, false, tracingContext);
 } catch (AbfsRestOperationException ex) {
+  LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
   if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
 // Is a parallel access case, as file which was found to be
 // present went missing by this request.
-throw new ConcurrentWriteOperationDetectedException(
-"Parallel access to the create path detected. Failing request "
-+ "to honor single writer semantics");
+// this means the other thread deleted it and the conflict
+// has implicitly been resolved.
+LOG.debug("File at {} has been deleted; creation can continue", 
relativePath);
   } else {
 throw ex;
   }
 }
 
-String eTag = op.getResult()
-.getResponseHeader(HttpHeaderConfigurations.ETAG);
+String eTag = op != null
+? op.getResult().getResponseHeader(HttpHeaderConfigurations.ETAG)
+: null;
 
+LOG.debug("Attempting to create file {} with etag of {}", 
relativePath, eTag);
 try {
-  // overwrite only if eTag matches with the file properties fetched 
befpre
-  op = client.createPath(relativePath, true, true, permission, umask,
+  // overwrite only if eTag matches with the file properties fetched 
or the file
+  // was deleted and there is no etag.
+  // if the etag was not retrieved, overwrite is still false, so will 
fail
+  // if another process has just created the file
+  op = client.createPath(relativePath, true, eTag != null, permission, 
umask,
   isAppendBlob, eTag, tracingContext);
 } catch (AbfsRestOperationException ex) {
-  if (ex.getStatusCode() == HttpURLConnection.HTTP_PRECON_FAILED) {
+  final int sc = ex.getStatusCode();
+  LOG.debug("Failed to create file {} with etag {}; status code={}",
+  relativePath, eTag, sc, ex);
+  if (sc == HttpURLConnection.HTTP_PRECON_FAILED
+  || sc == HttpURLConnection.HTTP_CONFLICT) {

Review Comment:
   Good that we have taken care of the 409, which can occur when the overwrite 
argument to `client.createPath` (`eTag != null`) evaluates to false.
   
   It would be great to capture this in comments, and to log accordingly.
   Log 1: a file exists whose eTag our process holds; when we went back to 
createPath with that eTag, some other process had replaced the file, leading to 
a 412. That case is covered by the added code:
   ```
final ConcurrentWriteOperationDetectedException ex2 =
   new ConcurrentWriteOperationDetectedException(
   AbfsErrors.ERR_PARALLEL_ACCESS_DETECTED
   + " Path =\"" + relativePath + "\""
   + "; Status code =" + sc
   + "; etag = \"" + eTag + "\""
   + "; error =" + ex.getErrorMessage());
   ```
   Suggestion to add log 2: when we searched for the eTag there was no file, so 
we retry createPath with overwrite = false; that will give a 409 if some other 
process has since created a file on the same path.
   
   Also, the 409 case is similar to the case this method started with. Should 
we re-enter the 409 handling at 
https://github.com/apache/hadoop/blob/7f9ca101e2ae057a42829883596085732f8d5fa6/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L624
 a bounded number of times? For example, with a threshold of 2: if we get a 409 
at this line, we try once more to handle the 409, and after that we fail (see 
the sketch after this comment). @snvijaya @anmolanmol1234 @sreeb-msft, what do 
you think?
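
For illustration, here is a small, self-contained sketch of the bounded-retry 
idea floated above: retry the conditional create a fixed number of times when 
it fails with a conflict, then give up. The class, interface, and threshold are 
hypothetical; this is not the ABFS implementation.

```java
import java.io.IOException;

/** Toy illustration of retrying a conditional create a bounded number of times. */
public class BoundedConflictRetry {

  /** Models one attempt at a conditional create. */
  @FunctionalInterface
  interface CreateAttempt {
    void run() throws IOException;
  }

  /** Stand-in for a 409/412 conflict failure from the store. */
  static class ConflictException extends IOException {
    ConflictException(String message) { super(message); }
  }

  /** Retry the attempt up to maxConflictRetries extra times on conflict, then fail. */
  static void createWithRetry(CreateAttempt attempt, int maxConflictRetries)
      throws IOException {
    for (int tries = 0; ; tries++) {
      try {
        attempt.run();
        return;                                   // create succeeded
      } catch (ConflictException e) {
        if (tries >= maxConflictRetries) {
          // give up and surface the concurrency problem to the caller
          throw new IOException("Parallel access detected after " + tries + " retries", e);
        }
        // otherwise loop round; a real implementation would re-probe for the latest etag here
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // toy attempt that fails once with a conflict, then succeeds
    final int[] calls = {0};
    createWithRetry(() -> {
      if (calls[0]++ == 0) {
        throw new ConflictException("409: file created by another process");
      }
    }, 2);
    System.out.println("created after " + calls[0] + " attempts");
  }
}
```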






[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698009#comment-17698009
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

hadoop-yetus commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1460491719

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 57s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 31s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09  |
   | +1 :green_heart: |  spotbugs  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 12s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 11s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  93m 52s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5462 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0304206b7a96 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 
18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a4122e276ad2264c6303eecc3584b63f865dd353 |
   | Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1
 /usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/1/testReport/ |
   | Max. process+thread count | 627 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5462/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 

[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697951#comment-17697951
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

steveloughran commented on PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#issuecomment-1460326545

   fyi @saxenapranav @mehakmeet
   
as well as improving diagnostics, this patch also changes the recovery code 
by handling a deletion of the target file between the first failure and the 
retry. 







[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

2023-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697947#comment-17697947
 ] 

ASF GitHub Bot commented on HADOOP-18657:
-

steveloughran opened a new pull request, #5462:
URL: https://github.com/apache/hadoop/pull/5462

   ### Description of PR
   
   Tunes how abfs handles a failure during create which may be due to 
concurrency *or* load-related retries happening in the store.
   
   * better logging
   * happy with the conflict being resolved by the file being deleted
   * more diagnostics in the failure raised
   
   ### How was this patch tested?
   
   lease test run already; doing full hadoop-azure test run
   
   ### For code changes:
   
   - [X] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org