[jira] [Commented] (HADOOP-18657) Tune ABFS create() retry logic

ASF GitHub Bot (Jira) Tue, 18 Apr 2023 00:16:13 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-18657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713439#comment-17713439
 ]


ASF GitHub Bot commented on HADOOP-18657:
-----------------------------------------

snvijaya commented on code in PR #5462:
URL: https://github.com/apache/hadoop/pull/5462#discussion_r1131060581


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########
@@ -621,37 +622,57 @@ private AbfsRestOperation conditionalCreateOverwriteFile(final String relativePa
           isAppendBlob, null, tracingContext);
 
     } catch (AbfsRestOperationException e) {
+      LOG.debug("Failed to create {}", relativePath, e);
       if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) {
         // File pre-exists, fetch eTag
+        LOG.debug("Fetching etag of {}", relativePath);
         try {
           op = client.getPathStatus(relativePath, false, tracingContext);
         } catch (AbfsRestOperationException ex) {
+          LOG.debug("Failed to to getPathStatus {}", relativePath, ex);
           if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {

Review Comment:
   Hi @steveloughran, given that Hadoop has single-writer semantics, would it be 
correct to expect that, as part of job parallelization, only one worker process 
should try to create a file? This check for FileNotFound comes after an attempt 
to create the file with overwrite=false, which in turn failed with a conflict 
indicating that the file was present a moment earlier, so a concurrent operation 
on the file is indeed confirmed. 
   
   It's quite possible that if we let this create proceed, some other operation 
such as a delete could kick in later on as well. Wouldn't the code below, which 
throws an exception at the first indication of parallel activity, be the right 
thing to do? 
   
   
   As the workload pattern is not honoring the single-writer semantics, I feel we 
should retain the logic to throw ConcurrentWriteOperationDetectedException. 
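
   To make the two options concrete, here is a minimal sketch of the decision 
taken when the HEAD that follows the 409 comes back. It is not the code in 
AzureBlobFileSystemStore: the class, the Policy enum, the method name and the 
ConcurrentWriteDetected exception are all hypothetical stand-ins invented for 
illustration. FAIL_FAST corresponds to retaining the exception on a 404; 
CONTINUE_WITHOUT_ETAG corresponds to letting the create proceed with no etag. 

```java
/** Illustrative only: hypothetical names, not the hadoop-azure API. */
final class HeadAfterConflictSketch {
  static final int HTTP_NOT_FOUND = 404;

  /** Hypothetical stand-in for ConcurrentWriteOperationDetectedException. */
  static final class ConcurrentWriteDetected extends RuntimeException {
    ConcurrentWriteDetected(String path) {
      super("Parallel activity detected on " + path);
    }
  }

  /** The two behaviours under discussion for a 404 from HEAD. */
  enum Policy { FAIL_FAST, CONTINUE_WITHOUT_ETAG }

  /**
   * Decide which etag to pass to create(overwrite=true) after
   * create(overwrite=false) failed with 409 and HEAD returned headStatus.
   * Returns null when the overwrite should go ahead without an etag.
   */
  static String etagForOverwrite(int headStatus, String headEtag,
      Policy policy, String path) {
    if (headStatus == HTTP_NOT_FOUND) {
      // The file existed at create() time and is gone now, so some other
      // process touched it in between.
      if (policy == Policy.FAIL_FAST) {
        throw new ConcurrentWriteDetected(path);
      }
      return null;
    }
    // File still exists: overwrite conditionally on the etag just read.
    return headEtag;
  }
}
```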





> Tune ABFS create() retry logic
> ------------------------------
>
>                 Key: HADOOP-18657
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18657
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.3.5
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> Based on experience trying to debug this when it happens:
> # add debug statements when create() fails
> # generated exception text to reference a string shared with tests, plus the 
> path and error code
> # generated exception to include the inner exception for a full stack trace
> Currently the retry logic is
> # create(overwrite=false)
> # if HTTP_CONFLICT/409 raised; call HEAD
> # use etag in create(path, overwrite=true, etag)
> # special handling of error HTTP_PRECON_FAILED = 412
> There's a race condition here: if the existing file is deleted between steps 
> 1 and 2, the retry should succeed, but currently a 404 from the HEAD is 
> escalated to a failure.
> Proposed changes:
> # if HEAD is 404, leave etag == null and continue
> # special handling of 412 also to handle 409



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
