saxenapranav commented on code in PR #5462: URL: https://github.com/apache/hadoop/pull/5462#discussion_r1130438266
########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java: ########## @@ -621,37 +622,57 @@ private AbfsRestOperation conditionalCreateOverwriteFile(final String relativePa isAppendBlob, null, tracingContext); } catch (AbfsRestOperationException e) { + LOG.debug("Failed to create {}", relativePath, e); if (e.getStatusCode() == HttpURLConnection.HTTP_CONFLICT) { // File pre-exists, fetch eTag + LOG.debug("Fetching etag of {}", relativePath); try { op = client.getPathStatus(relativePath, false, tracingContext); } catch (AbfsRestOperationException ex) { + LOG.debug("Failed to to getPathStatus {}", relativePath, ex); if (ex.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) { // Is a parallel access case, as file which was found to be // present went missing by this request. - throw new ConcurrentWriteOperationDetectedException( - "Parallel access to the create path detected. Failing request " - + "to honor single writer semantics"); + // this means the other thread deleted it and the conflict + // has implicitly been resolved. + LOG.debug("File at {} has been deleted; creation can continue", relativePath); } else { throw ex; } } - String eTag = op.getResult() - .getResponseHeader(HttpHeaderConfigurations.ETAG); + String eTag = op != null + ? op.getResult().getResponseHeader(HttpHeaderConfigurations.ETAG) + : null; + LOG.debug("Attempting to create file {} with etag of {}", relativePath, eTag); try { - // overwrite only if eTag matches with the file properties fetched befpre - op = client.createPath(relativePath, true, true, permission, umask, + // overwrite only if eTag matches with the file properties fetched or the file + // was deleted and there is no etag. + // if the etag was not retrieved, overwrite is still false, so will fail + // if another process has just created the file + op = client.createPath(relativePath, true, eTag != null, permission, umask, isAppendBlob, eTag, tracingContext); } catch (AbfsRestOperationException ex) { - if (ex.getStatusCode() == HttpURLConnection.HTTP_PRECON_FAILED) { + final int sc = ex.getStatusCode(); + LOG.debug("Failed to create file {} with etag {}; status code={}", + relativePath, eTag, sc, ex); + if (sc == HttpURLConnection.HTTP_PRECON_FAILED + || sc == HttpURLConnection.HTTP_CONFLICT) { Review Comment: good that we have taken care of 409 which can come when due to `etag!=null` -> overwrite argument to `client.createPath` = false. would be awesome if we can put it in comments, and also have log according to it. log1: about some file is there whose eTag is with our process. When we went back to createPath with the same eTag, some other process had replaced that file which would lead to 412, which is present in the added code: ``` final ConcurrentWriteOperationDetectedException ex2 = new ConcurrentWriteOperationDetectedException( AbfsErrors.ERR_PARALLEL_ACCESS_DETECTED + " Path =\"" + relativePath+ "\"" + "; Status code =" + sc + "; etag = \"" + eTag + "\"" + "; error =" + ex.getErrorMessage()); ``` suggestion to add log2: where in when we searched for etag, there was no file, now when we will try to createPath with overWrite = false, if it will give 409 in case some other process created a file on same path. Also, in case of 409, it is similar to the case we started with in this method. Should we get into 409 control as in https://github.com/apache/hadoop/blob/7f9ca101e2ae057a42829883596085732f8d5fa6/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L624 for a number of times. Like if we keep threshold as 2. If it happens that it gets 409 at this line, we will try once again to handle 409, post that we fail. @snvijaya @anmolanmol1234 @sreeb-msft, what you feel. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org