[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=514174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514174 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 19/Nov/20 15:59
Start Date: 19/Nov/20 15:59
Worklog Time Spent: 10m

Work Description: dongjoon-hyun commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-730470829

Thank you, @steveloughran and guys!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 514174)
Time Spent: 8h 10m (was: 8h)

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-17318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17318
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Reported failure of magic committer block uploads as the pending upload ID is
> unknown. Likely cause: it has been aborted by another job.
> # Make it possible to turn off cleanup of pending uploads in the magic committer
> # Log more about uploads being deleted in committers
> # Log the upload ID in S3ABlockOutputStream errors
>
> There are other concurrency issues when you look closely; see SPARK-33230:
> * the magic committer uses the app attempt ID as the path under __magic; if there are
>   duplicates then they will conflict
> * the staging committer's local temp dir uses the app attempt ID
>
> The fix will be to have a job UUID which, for Spark, will be picked up from the
> SPARK-33230 changes (with an option to self-generate in job setup for Hadoop 3.3.1+
> on older Spark builds); fall back to the app attempt ID *unless that fallback has been
> disabled*.
> MR: configure to use the app attempt ID.
> Spark: configure to fail job setup if the app attempt ID is the source of the job
> UUID.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
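The UUID-selection order in the fix plan above (explicit committer UUID, then the Spark write UUID, then optional self-generation, then the app attempt ID unless that fallback is disabled) can be sketched in plain Java. This is an illustrative model only: the `Map` stands in for a Hadoop `Configuration`, and the option-name strings are hypothetical stand-ins for the constants named in this thread (`FS_S3A_COMMITTER_UUID`, `SPARK_WRITE_UUID`, `FS_S3A_COMMITTER_REQUIRE_UUID`), not verified Hadoop keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public final class JobUuidResolution {

  // Illustrative stand-ins for the committer constants named in the issue.
  static final String COMMITTER_UUID = "fs.s3a.committer.uuid";
  static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";
  static final String SELF_GENERATE = "committer.generate.uuid";
  static final String REQUIRE_UUID = "committer.require.uuid";

  /**
   * Resolve a job UUID using the precedence described in the issue:
   * explicit committer UUID, then Spark's write UUID, then (if enabled)
   * a locally generated UUID, then the app attempt ID unless that
   * fallback has been disabled.
   */
  static String resolve(Map<String, String> conf, String appAttemptId) {
    String uuid = conf.get(COMMITTER_UUID);
    if (uuid != null) {
      return uuid;                           // explicit option always wins
    }
    uuid = conf.get(SPARK_WRITE_UUID);
    if (uuid != null) {
      return uuid;                           // UUID passed down by Spark
    }
    if ("true".equals(conf.get(SELF_GENERATE))) {
      return UUID.randomUUID().toString();   // self-generate in job setup
    }
    if ("true".equals(conf.get(REQUIRE_UUID))) {
      // fallback to the (possibly duplicated) app attempt ID is disabled
      throw new IllegalStateException("no job UUID and fallback disabled");
    }
    return appAttemptId;                     // last resort: app attempt ID
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(SPARK_WRITE_UUID, "d9f51ea5-demo");
    System.out.println(resolve(conf, "appattempt_0001"));  // prints d9f51ea5-demo
    System.out.println(resolve(new HashMap<>(), "appattempt_0001"));  // prints appattempt_0001
  }
}
```

Failing fast when only the fallback remains is what lets Spark jobs refuse to run with a potentially duplicated app attempt ID, while MR jobs can keep using it.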
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=514138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514138 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 19/Nov/20 14:22
Start Date: 19/Nov/20 14:22
Worklog Time Spent: 10m

Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-730407746

Merged to trunk, not yet 3.3. See #2473 for the test failure caused in code from a different PR *which this patch goes nowhere near*.

Issue Time Tracking
-------------------

Worklog Id: (was: 514138)
Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=514137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514137 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 19/Nov/20 14:22
Start Date: 19/Nov/20 14:22
Worklog Time Spent: 10m

Work Description: steveloughran closed pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399

Issue Time Tracking
-------------------

Worklog Id: (was: 514137)
Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=513079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-513079 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 17/Nov/20 18:23
Start Date: 17/Nov/20 18:23
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-729115191

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 1m 11s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 11 new or modified test files. |

_ trunk Compile Tests _

| +0 :ok: | mvndep | 15m 10s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 23m 21s | | trunk passed |
| +1 :green_heart: | compile | 21m 38s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | compile | 18m 13s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | checkstyle | 2m 53s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 13s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 48s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 26s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 2m 6s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +0 :ok: | spotbugs | 1m 11s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 27s | | trunk passed |

_ Patch Compile Tests _

| +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 28s | | the patch passed |
| +1 :green_heart: | compile | 20m 42s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javac | 20m 42s | | the patch passed |
| +1 :green_heart: | compile | 18m 5s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | javac | 18m 5s | | the patch passed |
| +1 :green_heart: | checkstyle | 2m 46s | | root: The patch generated 0 new + 48 unchanged - 1 fixed = 48 total (was 49) |
| +1 :green_heart: | mvnsite | 2m 12s | | the patch passed |
| -1 :x: | whitespace | 0m 0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/artifact/out/whitespace-eol.txt) | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | shadedclient | 17m 9s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 24s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 2m 6s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | findbugs | 3m 42s | | the patch passed |

_ Other Tests _

| +1 :green_heart: | unit | 9m 49s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 1m 35s | | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 48s | | The patch does not generate ASF License warnings. |
| | | 196m 23s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
| uname | Linux 965a3f2ebeb9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / a7b923c80c6 |
| Default Java | Private Build-
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512492 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 16/Nov/20 18:19
Start Date: 16/Nov/20 18:19
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-728238352

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 34m 5s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 11 new or modified test files. |

_ trunk Compile Tests _

| +0 :ok: | mvndep | 15m 2s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 28m 39s | | trunk passed |
| +1 :green_heart: | compile | 27m 52s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | compile | 24m 10s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | checkstyle | 3m 33s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 38s | | trunk passed |
| +1 :green_heart: | shadedclient | 27m 1s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 2m 5s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 2m 43s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +0 :ok: | spotbugs | 1m 42s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 4m 40s | | trunk passed |

_ Patch Compile Tests _

| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 12s | | the patch passed |
| +1 :green_heart: | compile | 23m 37s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javac | 23m 37s | | the patch passed |
| +1 :green_heart: | compile | 18m 12s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | javac | 18m 12s | | the patch passed |
| -0 :warning: | checkstyle | 2m 46s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/diff-checkstyle-root.txt) | root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49) |
| +1 :green_heart: | mvnsite | 2m 12s | | the patch passed |
| -1 :x: | whitespace | 0m 0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/whitespace-eol.txt) | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | shadedclient | 16m 58s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 25s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 2m 5s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | findbugs | 3m 43s | | the patch passed |

_ Other Tests _

| +1 :green_heart: | unit | 9m 43s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 1m 35s | | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 49s | | The patch does not generate ASF License warnings. |
| | | 257m 18s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
| uname | Linux 4ee065eae8d6 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | mav
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512381 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 16/Nov/20 14:03
Start Date: 16/Nov/20 14:03
Worklog Time Spent: 10m

Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-728070916

Pushed up an iteration with all the feedback addressed.

testing: s3 london, unguarded, markers=keep
downstream testing (which now includes a test to generate 10K Job IDs through the spark API and verify they are different): s3 ireland, unguarded, markers = delete

Issue Time Tracking
-------------------

Worklog Id: (was: 512381)
Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512301 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 16/Nov/20 12:11
Start Date: 16/Nov/20 12:11
Worklog Time Spent: 10m

Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-727939820

@rdblue yes, I did a bit more than was needed because I had to also let > 1 magic committer commit work side-by-side (all that active upload warning), and the IDE was trying to keep me in check too, on a piece of code which hasn't been revisited for a while.

While I had the files open in the IDE, I moved to passing FileStatus down to line up with the changes in #2168 - if you open a file through the JsonSerializer by passing in the FileStatus, that will be handed off to the FileSystem's implementation of openFile(status.path).withFileStatus(status), and so be used by the S3A FS to skip the initial HEAD request. Means if we are reading 1000 .pendingset files in S3A, we eliminate 1000 HEAD calls, which should have tangible benefits for committers using S3 as the place to keep those files.

Issue Time Tracking
-------------------

Worklog Id: (was: 512301)
Time Spent: 7h 10m (was: 7h)
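A toy sketch of why that HEAD elision matters (plain Java; the class and method names here are invented for illustration and are not Hadoop's `openFile()` builder API): opening a file without its status costs one metadata probe per file, while reusing the status returned by a directory listing costs none.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the HEAD-elision described above. The "store" charges one
 * simulated HEAD request whenever a file is opened without a known status;
 * opening with a status carried over from a listing costs nothing.
 */
class PendingSetReader {
  private final Map<String, Long> store = new HashMap<>(); // path -> length
  int headRequests;                                        // simulated HEAD count

  void put(String path, long length) { store.put(path, length); }

  /** List the store, returning each path with its status (length). */
  Map<String, Long> list() { return new HashMap<>(store); }

  /** Open without a status: needs a HEAD to learn the object's length. */
  long open(String path) {
    headRequests++;
    return store.get(path);
  }

  /** Open with a status from the listing: no further HEAD needed. */
  long open(String path, long knownLength) { return knownLength; }

  public static void main(String[] args) {
    PendingSetReader r = new PendingSetReader();
    for (int i = 0; i < 1000; i++) {
      r.put("job/task-" + i + ".pendingset", 100L + i);
    }

    // naive: open each file by path -> one HEAD per file
    for (String p : r.list().keySet()) {
      r.open(p);
    }
    System.out.println("without status: " + r.headRequests + " HEADs"); // 1000 HEADs

    // status-aware: reuse the listing's statuses -> no further HEADs
    r.headRequests = 0;
    for (Map.Entry<String, Long> e : r.list().entrySet()) {
      r.open(e.getKey(), e.getValue());
    }
    System.out.println("with status: " + r.headRequests + " HEADs"); // 0 HEADs
  }
}
```

With 1000 .pendingset files, the status-aware path performs the listing and no per-file probes, matching the saving described in the comment above.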
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512283 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 16/Nov/20 11:29
Start Date: 16/Nov/20 11:29
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r523130440

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java

@@ -411,26 +464,63 @@ protected void maybeCreateSuccessMarker(JobContext context,
   * be deleted; creating it now ensures there is something at the end
   * while the job is in progress -and if nothing is created, that
   * it is still there.
+  *
+  * The option {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}
+  * is set to the job UUID; if generated locally
+  * {@link InternalCommitterConstants#SPARK_WRITE_UUID} is also patched.
+  * The field {@link #jobSetup} is set to true to note that
+  * this specific committer instance was used to set up a job.
+  *
   * @param context context
   * @throws IOException IO failure
   */
  @Override
  public void setupJob(JobContext context) throws IOException {
-    try (DurationInfo d = new DurationInfo(LOG, "preparing destination")) {
+    try (DurationInfo d = new DurationInfo(LOG,
+        "Job %s setting up", getUUID())) {
+      // record that the job has been set up
+      jobSetup = true;
+      // patch job conf with the job UUID.
+      Configuration c = context.getConfiguration();
+      c.set(FS_S3A_COMMITTER_UUID, this.getUUID());
+      if (getUUIDSource() == JobUUIDSource.GeneratedLocally) {
+        // we set the UUID up locally. Save it back to the job configuration
+        c.set(SPARK_WRITE_UUID, this.getUUID());

Review comment: I was just trying to be rigorous. will roll back. While I'm there I think I'll add the source attribute - I can then probe for it in the tests. I'm already saving it in the _SUCCESS file

## File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md

@@ -248,6 +247,47 @@ As an example, the endpoint for S3 Frankfurt is `s3.eu-central-1.amazonaws.com`:
 ```
+### `Class does not implement AWSCredentialsProvider`

Review comment: going to add that specific bit about spark hive classloaders here too, which is where this is coming from

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java

@@ -1044,6 +1166,155 @@ protected void abortPendingUploads( } }
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * In MapReduce jobs, the application ID is issued by YARN, and
+   * unique across all jobs.
+   *
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * Value of
+   * {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.
+   * Value of
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID}.
+   * If enabled: Self-generated uuid.
+   * If not disabled: Application ID
+   *
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   *
+   * In MapReduce jobs, the application ID is issued by YARN, and
+   * unique across all jobs.
+   *
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511377 ]

ASF GitHub Bot logged work on HADOOP-17318:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 13/Nov/20 14:06
Start Date: 13/Nov/20 14:06
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522970306

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java

@@ -1044,6 +1166,155 @@ protected void abortPendingUploads( } }
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * In MapReduce jobs, the application ID is issued by YARN, and
+   * unique across all jobs.
+   *
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * Value of
+   * {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.
+   * Value of
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID}.
+   * If enabled: Self-generated uuid.
+   * If not disabled: Application ID

Review comment: added the extra details

Issue Time Tracking
-------------------

Worklog Id: (was: 511377)
Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511370 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 13/Nov/20 13:49
Start Date: 13/Nov/20 13:49
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522961040

## File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
## @@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable { }

+  /**
+   * Run two jobs with the same destination and different output paths.
+   *
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   *
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+    describe("Run two jobs to the same destination, assert they both complete");
+    Configuration conf = getConfiguration();
+    conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+    // this job has a job ID generated and set as the spark UUID;
+    // the config is also set to require it.
+    // This mimics the Spark setup process.
+
+    String stage1Id = UUID.randomUUID().toString();
+    conf.set(SPARK_WRITE_UUID, stage1Id);
+    conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+    // create the job and write data in its task attempt
+    JobData jobData = startJob(true);
+    Job job1 = jobData.job;
+    AbstractS3ACommitter committer1 = jobData.committer;
+    JobContext jContext1 = jobData.jContext;
+    TaskAttemptContext tContext1 = jobData.tContext;
+    Path job1TaskOutputFile = jobData.writtenTextPath;
+
+    // the write path
+    Assertions.assertThat(committer1.getWorkPath().toString())
+        .describedAs("Work path path of %s", committer1)
+        .contains(stage1Id);
+    // now build up a second job
+    String jobId2 = randomJobId();
+
+    // second job will use same ID
+    String attempt2 = taskAttempt0.toString();
+    TaskAttemptID taskAttempt2 = taskAttempt0;
+
+    // create the second job
+    Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+    c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+    Job job2 = newJob(outDir, c2, attempt2);
+    Configuration conf2 = job2.getConfiguration();
+    conf2.set("mapreduce.output.basename", "task2");
+    String stage2Id = UUID.randomUUID().toString();
+    conf2.set(SPARK_WRITE_UUID, stage2Id);
+
+    JobContext jContext2 = new JobContextImpl(conf2,
+        taskAttempt2.getJobID());
+    TaskAttemptContext tContext2 =
+        new TaskAttemptContextImpl(conf2, taskAttempt2);
+    AbstractS3ACommitter committer2 = createCommitter(outDir, tContext2);
+    Assertions.assertThat(committer2.getJobAttemptPath(jContext2))
+        .describedAs("Job attempt path of %s", committer2)
+        .isNotEqualTo(committer1.getJobAttemptPath(jContext1));
+    Assertions.assertThat(committer2.getTaskAttemptPath(tContext2))
+        .describedAs("Task attempt path of %s", committer2)
+        .isNotEqualTo(committer1.getTaskAttemptPath(tContext1));
+    Assertions.assertThat(committer2.getWorkPath().toString())
+        .describedAs("Work path path of %s", committer2)
+        .isNotEqualTo(committer1.getWorkPath().toString())
+        .contains(stage2Id);
+    Assertions.assertThat(committer2.getUUIDSource())
+        .describedAs("UUID source of %s", committer2)
+        .isEqualTo(AbstractS3ACommitter.JobUUIDSource.SparkWriteUUID);
+    JobData jobData2 = new JobData(job2, jContext2, tContext2, committer2);
+    setup(jobData2);
+    abortInTeardown(jobData2);
+
+    // the sequence is designed to ensure that job2 has active multipart
+    // uploads during/after job1's work
+
+    // if the committer is a magic committer, MPUs start in the write,
+    // otherwise in task commit.
+    boolean multipartInitiatedInWrite =
+        committer2 instanceof MagicS3GuardCommitter;
+
+    // job2. Here we start writing a file and have that write in progress
+    // when job 1 commits.
+
+    LoggingTextOutputFormat.LoggingLineRecordWriter recordWriter2 =
+        new LoggingTextOutputFormat<>().getRecordWriter(tContext2);
+
+    LOG.info("Commit Task 1");
+    commitTask(committer1, tContext1);
+
+    if (multipartInitiatedInWrite) {
+      // magic committer runs - commit job1 while a job2 TA has an open
+      // writer (and hence: open MP Upload)
+      LOG.info("Commit Job 1");

Review comment: done
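The test above hinges on one property: two committers configured with different Spark write UUIDs must get disjoint job-attempt and work paths under the same destination. As a toy illustration of that property (the `__magic` path layout and method name below are simplified stand-ins for illustration, not the real S3A committer code):

```java
import java.util.UUID;

// Toy model of per-job UUID path separation: each job's attempt dir hangs
// off the destination under its own UUID, so two jobs writing to the same
// destination never share a directory tree.
public class MagicPathModel {
    static final String MAGIC = "__magic";

    // Simplified stand-in for the committer's job attempt path computation.
    static String jobAttemptPath(String dest, String jobUuid) {
        return dest + "/" + MAGIC + "/job-" + jobUuid;
    }

    public static void main(String[] args) {
        String dest = "s3a://bucket/output";
        String p1 = jobAttemptPath(dest, UUID.randomUUID().toString());
        String p2 = jobAttemptPath(dest, UUID.randomUUID().toString());
        if (p1.equals(p2)) {
            throw new AssertionError("job attempt paths must differ per job");
        }
        System.out.println("distinct job attempt paths: OK");
    }
}
```

With a shared app attempt ID instead of a UUID, both jobs would compute the same path, which is exactly the conflict this issue fixes.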
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511369 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 13/Nov/20 13:44
Start Date: 13/Nov/20 13:44
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522957970

## File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
## @@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable { }

+    // create the second job
+    Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+    c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+    Job job2 = newJob(outDir, c2, attempt2);
+    Configuration conf2 = job2.getConfiguration();

Review comment: done

Issue Time Tracking
---
Worklog Id: (was: 511369)
Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511355 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 13/Nov/20 13:27
Start Date: 13/Nov/20 13:27
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522948661

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
## @@ -147,6 +173,11 @@ protected AbstractS3ACommitter(
     this.jobContext = context;
     this.role = "Task committer " + context.getTaskAttemptID();
     setConf(context.getConfiguration());
+    Pair<String, JobUUIDSource> id = buildJobUUID(
+        conf, context.getJobID());
+    uuid = id.getLeft();
+    uuidSource = id.getRight();

Review comment: Makes sense in the constructor. Done

Issue Time Tracking
---
Worklog Id: (was: 511355)
Time Spent: 6h 10m (was: 6h)
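The `buildJobUUID` call above returns both the UUID and its source. A self-contained sketch of the resolution order this issue describes (Spark's `spark.sql.sources.writeJobUUID` first, then optional local generation, then the job/app attempt ID unless that fallback is disabled); class, method, and enum names here are illustrative, not the actual AbstractS3ACommitter API:

```java
import java.util.Map;
import java.util.UUID;

// Illustrative model of job-UUID resolution with configurable fallback.
public class JobUuidResolver {
    enum Source { SPARK_WRITE_UUID, GENERATED_LOCALLY, JOB_ID }

    static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

    // Returns {uuid, sourceName}. Throws if only the (possibly duplicated)
    // job ID is available and the caller requires a real UUID.
    static String[] buildJobUuid(Map<String, String> conf,
                                 String jobId,
                                 boolean generateIfMissing,
                                 boolean requireUuid) {
        String spark = conf.get(SPARK_WRITE_UUID);
        if (spark != null && !spark.isEmpty()) {
            // passed down by the Spark driver: shared by all committers of the job
            return new String[] {spark, Source.SPARK_WRITE_UUID.name()};
        }
        if (generateIfMissing) {
            // self-generate; only safe if done once in job setup and
            // propagated to tasks via the job configuration
            return new String[] {UUID.randomUUID().toString(),
                                 Source.GENERATED_LOCALLY.name()};
        }
        if (requireUuid) {
            throw new IllegalStateException(
                "no job UUID passed down and fallback to the job ID is disabled");
        }
        // last resort: the job ID, which may collide across concurrent jobs
        return new String[] {jobId, Source.JOB_ID.name()};
    }

    public static void main(String[] args) {
        String[] r = buildJobUuid(Map.of(), "job_202011131327_0001", true, false);
        System.out.println(r[1] + ": " + r[0]);
    }
}
```

The "fail job setup" mode for Spark in the issue description corresponds to `requireUuid=true` with `generateIfMissing=false` here.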
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511356 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 13/Nov/20 13:27
Start Date: 13/Nov/20 13:27
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522948976

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
## @@ -202,24 +233,24 @@ protected final void setOutputPath(Path outputPath) {
    * @return the working path.
    */
   @Override
-  public Path getWorkPath() {
+  public final Path getWorkPath() {
     return workPath;
   }

   /**
    * Set the work path for this committer.
    * @param workPath the work path to use.
    */
-  protected void setWorkPath(Path workPath) {
+  protected final void setWorkPath(Path workPath) {
     LOG.debug("Setting work path to {}", workPath);
     this.workPath = workPath;
   }

-  public Configuration getConf() {
+  public final Configuration getConf() {
     return conf;
   }

-  protected void setConf(Configuration conf) {
+  protected final void setConf(Configuration conf) {

Review comment: The IDE was whining about calling an override point in the constructor, so I turned it off at the same time. Sorry.

Issue Time Tracking
---
Worklog Id: (was: 511356)
Time Spent: 6h 20m (was: 6h 10m)
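The IDE warning discussed above is a real hazard: a constructor that calls an overridable method can run a subclass override before the subclass's own fields are initialized, which is why the patch makes the setters final. A minimal demonstration (class names are illustrative):

```java
// Why calling an overridable method from a constructor is dangerous:
// Base's constructor runs before Sub's field initializers, so the
// overridden init() observes Sub's fields in their default state.
public class CtorOverrideDemo {
    static String observed;

    static class Base {
        Base() {
            init();  // invokes the subclass override during Base's constructor
        }
        void init() { }
    }

    static class Sub extends Base {
        String name = "ready";  // initialized only AFTER Base() returns
        @Override
        void init() {
            observed = String.valueOf(name);  // sees null, not "ready"
        }
    }

    public static void main(String[] args) {
        new Sub();
        System.out.println(observed);  // prints "null"
    }
}
```

Marking `setConf` and friends `final` (or `private`) guarantees the constructor only runs code whose behavior the base class controls.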
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511353 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 13/Nov/20 13:24
Start Date: 13/Nov/20 13:24
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522947215

## File path: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
## @@ -1925,20 +1925,13 @@

-  <name>fs.s3a.committer.staging.abort.pending.uploads</name>
+  <name>fs.s3a.committer.abort.pending.uploads</name>
   <value>true</value>
   <description>
-    Should the staging committers abort all pending uploads to the destination
+    Should the committers abort all pending uploads to the destination
     directory?
-    Changing this if more than one partitioned committer is
-    writing to the same destination tree simultaneously; otherwise
-    the first job to complete will cancel all outstanding uploads from the
-    others. However, it may lead to leaked outstanding uploads from failed
-    tasks. If disabled, configure the bucket lifecycle to remove uploads
-    after a time period, and/or set up a workflow to explicitly delete
-    entries. Otherwise there is a risk that uncommitted uploads may run up
-    bills.
+    Set to false if more than one job is writing to the same directory tree.
   </description>

Review comment: In taskAbort, yes. JobAbort/cleanup is where things are more trouble, because the job doesn't know which specific task attempts have uploaded.

With the staging committer, there are no files uploaded until task commit. Tasks which fail before that moment don't have any pending uploads to cancel. With the magic committer, because the files are written direct to S3, there is more risk of pending uploads collecting.

I'm not sure about Spark here, but on MR, when a task is considered to have failed, abortTask is called in the AM to abort that specific task; for the magic committer, the task's set of .pending files is determined by listing the task attempt dir, and those operations are cancelled. If that operation is called reliably, only the current upload is pending. Of course, if an entire job fails: no cleanup at all. The best thing to do is simply to tell everyone to have a scheduled cleanup.

FWIW, the most leakage I see in the real world is actually from incomplete S3ABlockOutputStream writes, as again, they accrue bills. Everyone needs a lifecycle rule to delete old ones. The sole exception is a bucket our QE team used which (unknown to them) I'd use for testing the scalability of the "hadoop s3guard uploads" command: how well does it work when there are many, many incomplete uploads, can it still delete them all, etc. If they had a rule then it'd screw up my test runs.

Issue Time Tracking
---
Worklog Id: (was: 511353)
Time Spent: 6h (was: 5h 50m)
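The scheduled-cleanup advice above is usually implemented as an S3 bucket lifecycle rule; S3 has a dedicated AbortIncompleteMultipartUpload action for exactly this. A minimal rule document (the rule ID and the 7-day window are arbitrary choices, not recommendations from this patch):

```json
{
  "Rules": [
    {
      "ID": "abort-stale-multipart-uploads",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
```

This can be applied with, for example, `aws s3api put-bucket-lifecycle-configuration --bucket BUCKET --lifecycle-configuration file://rule.json`. Note the caveat in the comment above: on a bucket used deliberately to accumulate pending uploads for testing, such a rule would interfere.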
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510916 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 17:27
Start Date: 12/Nov/20 17:27
Worklog Time Spent: 10m

Work Description: rdblue commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522277893

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
## @@ -147,6 +173,11 @@ protected AbstractS3ACommitter(
     this.jobContext = context;
     this.role = "Task committer " + context.getTaskAttemptID();
     setConf(context.getConfiguration());
+    Pair<String, JobUUIDSource> id = buildJobUUID(
+        conf, context.getJobID());
+    uuid = id.getLeft();
+    uuidSource = id.getRight();

Review comment: Other places use `this.` as a prefix when setting fields. I find that helpful when reading to know that an instance field is being set, vs a local variable.

## File path: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
## @@ -1925,20 +1925,13 @@

-  <name>fs.s3a.committer.staging.abort.pending.uploads</name>
+  <name>fs.s3a.committer.abort.pending.uploads</name>
   <value>true</value>
   <description>
-    Should the staging committers abort all pending uploads to the destination
+    Should the committers abort all pending uploads to the destination
     directory?
+    Set to false if more than one job is writing to the same directory tree.
   </description>

Review comment: Committers don't cancel just their own pending uploads?

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
## @@ -411,26 +464,63 @@ protected void maybeCreateSuccessMarker(JobContext context,
    * be deleted; creating it now ensures there is something at the end
    * while the job is in progress - and if nothing is created, that
    * it is still there.
+   *
+   * The option {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}
+   * is set to the job UUID; if generated locally
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID} is also patched.
+   * The field {@link #jobSetup} is set to true to note that
+   * this specific committer instance was used to set up a job.
+   *
    * @param context context
    * @throws IOException IO failure
    */
   @Override
   public void setupJob(JobContext context) throws IOException {
-    try (DurationInfo d = new DurationInfo(LOG, "preparing destination")) {
+    try (DurationInfo d = new DurationInfo(LOG,
+        "Job %s setting up", getUUID())) {
+      // record that the job has been set up
+      jobSetup = true;
+      // patch job conf with the job UUID.
+      Configuration c = context.getConfiguration();
+      c.set(FS_S3A_COMMITTER_UUID, this.getUUID());
+      if (getUUIDSource() == JobUUIDSource.GeneratedLocally) {
+        // we set the UUID up locally. Save it back to the job configuration
+        c.set(SPARK_WRITE_UUID, this.getUUID());

Review comment: It seems odd to set the Spark property. Does anything else use this?

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
## @@ -1044,6 +1166,155 @@ protected void abortPendingUploads( }

+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (sh
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510792 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 13:18
Start Date: 12/Nov/20 13:18
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522098602

## File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
## @@ -535,20 +535,28 @@ Conflict management is left to the execution engine itself.

 | Option | Magic | Directory | Partitioned | Meaning | Default |
 |--------|-------|-----------|-------------|---------|---------|
-| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file at the end of each job | `true` |
+| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file on the successful completion of the job. | `true` |
+| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | `${hadoop.tmp.dir}/s3a` |
+| `fs.s3a.committer.magic.enabled` | X | | | Enable "magic committer" support in the filesystem. | `false` |
+| `fs.s3a.committer.abort.pending.uploads` | X | X | X | List and abort all pending uploads under the destination path when the job is committed or aborted. | `true` |
 | `fs.s3a.committer.threads` | X | X | X | Number of threads in committers for parallel operations on files. | 8 |
-| `fs.s3a.committer.staging.conflict-mode` | | X | X | Conflict resolution: `fail`, `append` or `replace` | `append` |
-| `fs.s3a.committer.staging.unique-filenames` | | X | X | Generate unique filenames | `true` |
-| `fs.s3a.committer.magic.enabled` | X | | | Enable "magic committer" support in the filesystem | `false` |
+| `fs.s3a.committer.generate.uuid` | | X | X | Generate a Job UUID if none is passed down from Spark | `false` |
+| `fs.s3a.committer.require.uuid` | | X | X | Require the Job UUID to be passed down from Spark | `false` |

+Staging committer (Directory and Partitioned) options

 | Option | Magic | Directory | Partitioned | Meaning | Default |
 |--------|-------|-----------|-------------|---------|---------|
-| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | |
-| `fs.s3a.committer.staging.tmp.path` | | X | X | Path in the cluster filesystem for temporary data | `tmp/staging` |
+| `fs.s3a.committer.staging.conflict-mode` | | X | X | Conflict resolution: `fail`, `append` or `replace` | `append` |

Review comment: done. Also reviewed both tables, removed those columns about which committer supports what option; now they are split into common and staging.

Issue Time Tracking
---
Worklog Id: (was: 510792)
Time Spent: 5h 40m (was: 5.5h)
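As a worked example of how the options in these tables combine, a site configuration for the scenario this issue targets (multiple jobs writing concurrently to one destination tree) might look like the following; the values shown are one possible choice for that scenario, not defaults:

```xml
<!-- Illustrative s3a committer settings for concurrent jobs sharing a
     destination. Property names are from the committers.md tables above;
     values here are scenario-specific choices, not shipped defaults. -->
<property>
  <name>fs.s3a.committer.abort.pending.uploads</name>
  <!-- do not let one job's commit abort another job's in-flight uploads -->
  <value>false</value>
</property>
<property>
  <name>fs.s3a.committer.require.uuid</name>
  <!-- fail job setup rather than fall back to a possibly duplicated ID -->
  <value>true</value>
</property>
<property>
  <name>fs.s3a.committer.generate.uuid</name>
  <!-- rely on Spark to pass the UUID down -->
  <value>false</value>
</property>
```

With `abort.pending.uploads` disabled, a bucket lifecycle rule becomes the backstop for leaked uploads, as discussed elsewhere in this thread.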
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510790 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 13:14
Start Date: 12/Nov/20 13:14
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522096138

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
## @@ -118,15 +114,14 @@ public StagingCommitter(Path outputPath,
     Configuration conf = getConf();
     this.uploadPartSize = conf.getLongBytes(
         MULTIPART_SIZE, DEFAULT_MULTIPART_SIZE);
-    this.uuid = getUploadUUID(conf, context.getJobID());
     this.uniqueFilenames = conf.getBoolean(
         FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES,
         DEFAULT_STAGING_COMMITTER_UNIQUE_FILENAMES);
-    setWorkPath(buildWorkPath(context, uuid));
+    setWorkPath(buildWorkPath(context, this.getUUID()));

Review comment: relic of wrapping/pulling up the old code. Fixed. Also clarified the uuid javadocs now that SPARK-33402 is generating more unique job IDs.

Issue Time Tracking
---
Worklog Id: (was: 510790)
Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510788 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 13:10
Start Date: 12/Nov/20 13:10
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522093816

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java
@@ -68,7 +68,7 @@
   /**
    * Serialization ID: {@value}.
    */
-  private static final long serialVersionUID = 507133045258460084L;
+  private static final long serialVersionUID = 507133045258460083L + VERSION;

Review comment: This is only for Java serialization, obviously. It's to make sure anyone (me) who might pass these around in Spark RDDs won't create serialization problems. FWIW, I use the JSON format in those cloud committer tests, primarily to verify the correctness of the committer name.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Issue Time Tracking
---
Worklog Id: (was: 510788)
Time Spent: 5h 20m (was: 5h 10m)
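The `base + VERSION` pattern discussed in that hunk can be sketched as follows. This is an illustrative stand-in, not the actual `SuccessData` source; the class name and the `VERSION` value here are assumptions:

```java
import java.io.Serializable;

// Sketch: tying serialVersionUID to a format-version constant, so that
// bumping VERSION deliberately invalidates Java deserialization of older
// serialized copies (e.g. instances passed around inside Spark RDDs)
// instead of silently mismatching fields.
public class SuccessDataSketch implements Serializable {

  /** Hypothetical structure version; the real value is an assumption. */
  public static final int VERSION = 1;

  /** Base chosen so that BASE_UID + VERSION matches the previously published UID. */
  private static final long BASE_UID = 507133045258460083L;

  private static final long serialVersionUID = BASE_UID + VERSION;

  /** Expose the computed UID so the relationship can be checked. */
  public static long uidFor(int version) {
    return BASE_UID + version;
  }

  public static void main(String[] args) {
    // With VERSION == 1 this reproduces the UID the patch replaced.
    System.out.println(uidFor(VERSION));
  }
}
```

Any future bump of `VERSION` changes the UID, so deserializing a pre-change instance fails fast with an `InvalidClassException` rather than producing a subtly incompatible object.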
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510786 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 13:09
Start Date: 12/Nov/20 13:09
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522092840

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/InternalCommitterConstants.java
@@ -97,4 +97,30 @@ private InternalCommitterConstants() {
   /** Error message for a path without a magic element in the list: {@value}. */
   public static final String E_NO_MAGIC_PATH_ELEMENT =
       "No " + MAGIC + " element in path";
+
+  /**
+   * The UUID for jobs: {@value}.
+   * This was historically created in Spark 1.x's SQL queries, but "went away".
+   */
+  public static final String SPARK_WRITE_UUID =
+      "spark.sql.sources.writeJobUUID";
+
+  /**
+   * The App ID for jobs: {@value}.
+   */
+  public static final String SPARK_APP_ID = "spark.app.id";

Review comment: Cut it. This was a very old property passed down by Spark.

Issue Time Tracking
---
Worklog Id: (was: 510786)
Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510784 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 13:07
Start Date: 12/Nov/20 13:07
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522091672

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitConstants.java
@@ -264,4 +283,25 @@ private CommitConstants() {
   /** Extra Data key for task attempt in pendingset files. */
   public static final String TASK_ATTEMPT_ID = "task.attempt.id";

+  /**
+   * Require the spark UUID to be passed down: {@value}.
+   * This is to verify that SPARK-33230 has been applied to spark, and that
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID} is set.
+   *
+   * MUST ONLY BE SET WITH SPARK JOBS.
+   *
+   */

Review comment: +1. Adding two new constants and referring to them in the production code.

Issue Time Tracking
---
Worklog Id: (was: 510784)
Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510782 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 13:03
Start Date: 12/Nov/20 13:03
Worklog Time Spent: 10m

Work Description: steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522089565

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
@@ -585,7 +589,8 @@ public BulkOperationState initiateOperation(final Path path,
   @Retries.RetryTranslated
   public UploadPartResult uploadPart(UploadPartRequest request)
       throws IOException {
-    return retry("upload part",
+    return retry("upload part #" + request.getPartNumber()
+        + " upload " + request.getUploadId(),

Review comment: This is the S3 multipart upload ID, so I'll use "upload ID" for it... it's also used in BlockOutputStream.

Issue Time Tracking
---
Worklog Id: (was: 510782)
Time Spent: 4h 50m (was: 4h 40m)
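The retry-message change above can be sketched as a small helper. The method and class names here are illustrative, and `partNumber`/`uploadId` are plain parameters standing in for the fields of the AWS SDK's `UploadPartRequest`; the final wording ("upload ID", per the review nit) is an assumption:

```java
// Sketch of the retry-description string built in WriteOperationHelper:
// including both the part number and the S3 multipart upload ID means a
// retried part upload can be correlated with abort/commit log entries
// for the same upload elsewhere in the logs.
public class RetryMessageSketch {

  static String uploadPartRetryText(int partNumber, String uploadId) {
    return "upload part #" + partNumber + " upload ID " + uploadId;
  }

  public static void main(String[] args) {
    System.out.println(uploadPartRetryText(3, "examplePendingUploadId"));
  }
}
```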
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510781 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 13:01
Start Date: 12/Nov/20 13:01
Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724951069

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 1m 22s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 11 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 54s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 23m 21s | | trunk passed |
| +1 :green_heart: | compile | 21m 32s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | compile | 18m 5s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | checkstyle | 2m 46s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 14s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 26s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 23s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javadoc | 2m 3s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +0 :ok: | spotbugs | 1m 10s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 23s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 27s | | the patch passed |
| +1 :green_heart: | compile | 22m 33s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javac | 22m 33s | | the patch passed |
| +1 :green_heart: | compile | 19m 34s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | javac | 19m 34s | | the patch passed |
| +1 :green_heart: | checkstyle | 2m 51s | | root: The patch generated 0 new + 48 unchanged - 1 fixed = 48 total (was 49) |
| +1 :green_heart: | mvnsite | 2m 14s | | the patch passed |
| -1 :x: | whitespace | 0m 0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/artifact/out/whitespace-eol.txt) | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 :green_heart: | xml | 0m 2s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | shadedclient | 19m 47s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javadoc | 2m 21s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | findbugs | 4m 34s | | the patch passed |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 12m 32s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 1m 54s | | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 206m 11s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
| uname | Linux fa46a7df8a67 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 2522bf2f9b0 |
| Default Java | Priv
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510775 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 12:30
Start Date: 12/Nov/20 12:30
Worklog Time Spent: 10m

Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-726048699

Thanks. Will go through the comments and apply them before merging.

Issue Time Tracking
---
Worklog Id: (was: 510775)
Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510774 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 12:29
Start Date: 12/Nov/20 12:29
Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724917048

Issue Time Tracking
---
Worklog Id: (was: 510774)
Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510773 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 12:29
Start Date: 12/Nov/20 12:29
Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724104455

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 1m 9s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 11 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 46s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 23m 1s | | trunk passed |
| +1 :green_heart: | compile | 21m 27s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | compile | 18m 11s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | checkstyle | 2m 51s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 15s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 3s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 24s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javadoc | 2m 2s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +0 :ok: | spotbugs | 1m 10s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 24s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 26s | | the patch passed |
| +1 :green_heart: | compile | 22m 16s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javac | 22m 16s | | the patch passed |
| +1 :green_heart: | compile | 19m 30s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | javac | 19m 30s | | the patch passed |
| -0 :warning: | checkstyle | 2m 56s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/diff-checkstyle-root.txt) | root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49) |
| +1 :green_heart: | mvnsite | 2m 42s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | xml | 0m 2s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | shadedclient | 19m 52s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 36s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javadoc | 2m 15s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | findbugs | 5m 24s | | the patch passed |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 11m 40s | | hadoop-common in the patch passed. |
| -1 :x: | unit | 1m 49s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. |
| | | 205m 19s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.fs.s3a.commit.staging.TestStagingCommitter |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
| uname | Linux c32f6d9525bc 4.15.0-112-generic #113-U
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510661 ]

ASF GitHub Bot logged work on HADOOP-17318:
---
Author: ASF GitHub Bot
Created on: 12/Nov/20 08:17
Start Date: 12/Nov/20 08:17
Worklog Time Spent: 10m

Work Description: mehakmeet commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r521770766

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads( } }

+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * In MapReduce jobs, the application ID is issued by YARN, and
+   * unique across all jobs.
+   *
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * Value of
+   * {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.
+   * Value of
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID}.
+   * If enabled: Self-generated uuid.
+   * If not disabled: Application ID

Review comment: nit: Would this be "If disabled"? Also, what is the property we are talking about that is enabled or not — is it FS_S3A_COMMITTER_GENERATE_UUID? Then we should mention it here too, I think.

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java (same hunk, continued)

+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   *
+   * In MapReduce jobs, the application ID is issued by YARN, and
+   * unique across all jobs.
+   *
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in.
+   *
+   * If the option {@link CommitConst
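The resolution order that javadoc describes can be sketched as a fallback chain. The property strings below are taken from the constants named in this thread (`FS_S3A_COMMITTER_UUID`, `SPARK_WRITE_UUID`, `FS_S3A_COMMITTER_REQUIRE_UUID`, `FS_S3A_COMMITTER_GENERATE_UUID`); treat the exact keys and the `Map`-based "configuration" as assumptions of this sketch, not the real `AbstractS3ACommitter` code:

```java
import java.util.Map;
import java.util.UUID;

// Sketch of the job-UUID fallback chain: explicit UUID, then Spark's
// write-job UUID, then (optionally) a self-generated UUID, then the app
// attempt ID — unless a real UUID has been made mandatory.
public class JobUuidSketch {

  static String buildJobUUID(Map<String, String> conf, String appAttemptId) {
    // 1. An explicitly supplied committer UUID always wins.
    String uuid = conf.get("fs.s3a.committer.uuid");
    if (uuid != null) {
      return uuid;
    }
    // 2. Spark's write-job UUID, passed down once SPARK-33230 is in place.
    uuid = conf.get("spark.sql.sources.writeJobUUID");
    if (uuid != null) {
      return uuid;
    }
    // 3. If a UUID is required (recommended for Spark), fail job setup
    //    rather than silently fall back to a possibly shared app attempt ID.
    if (Boolean.parseBoolean(conf.get("fs.s3a.committer.require.uuid"))) {
      throw new IllegalStateException("No job UUID supplied");
    }
    // 4. Optionally self-generate, for older Spark builds on Hadoop 3.3.1+.
    if (Boolean.parseBoolean(conf.get("fs.s3a.committer.generate.uuid"))) {
      return UUID.randomUUID().toString();
    }
    // 5. Final fallback: the app attempt ID, which YARN guarantees unique
    //    for MapReduce jobs (Spark fakes it from the current time).
    return appAttemptId;
  }

  public static void main(String[] args) {
    System.out.println(buildJobUUID(
        Map.of("spark.sql.sources.writeJobUUID", "job-0001"), "attempt_0"));
  }
}
```

The ordering is the interesting design point: the only unconditional sources are ones guaranteed unique per job, and the single non-unique source (the app attempt ID) sits last, behind two opt-in switches.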
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510129 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 11/Nov/20 06:49 Start Date: 11/Nov/20 06:49 Worklog Time Spent: 10m Work Description: liuml07 commented on a change in pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#discussion_r521130866

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java

```
@@ -585,7 +589,8 @@ public BulkOperationState initiateOperation(final Path path,
   @Retries.RetryTranslated
   public UploadPartResult uploadPart(UploadPartRequest request)
       throws IOException {
-    return retry("upload part",
+    return retry("upload part #" + request.getPartNumber()
+        + " upload " + request.getUploadId(),
```

Review comment: nit: s/upload/upload ID/. I was thinking of consistent log keywords, so that for any retry log we can search "upload ID" or "commit ID".

## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java

```
@@ -131,6 +131,8 @@ protected WriteOperationHelper(S3AFileSystem owner, Configuration conf) {
    */
   void operationRetried(String text, Exception ex, int retries,
       boolean idempotent) {
+    LOG.info("{}: Retried {}: {}", retries, text, ex.toString());
```

Review comment: the order of the parameters is wrong.

## File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java

```
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
   }

+  /**
+   * Run two jobs with the same destination and different output paths.
+   *
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   *
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+    describe("Run two jobs to the same destination, assert they both complete");
+    Configuration conf = getConfiguration();
+    conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+    // this job has a job ID generated and set as the spark UUID;
+    // the config is also set to require it.
+    // This mimics the Spark setup process.
+
+    String stage1Id = UUID.randomUUID().toString();
+    conf.set(SPARK_WRITE_UUID, stage1Id);
+    conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+    // create the job and write data in its task attempt
+    JobData jobData = startJob(true);
+    Job job1 = jobData.job;
+    AbstractS3ACommitter committer1 = jobData.committer;
+    JobContext jContext1 = jobData.jContext;
+    TaskAttemptContext tContext1 = jobData.tContext;
+    Path job1TaskOutputFile = jobData.writtenTextPath;
+
+    // the write path
+    Assertions.assertThat(committer1.getWorkPath().toString())
+        .describedAs("Work path path of %s", committer1)
+        .contains(stage1Id);
+    // now build up a second job
+    String jobId2 = randomJobId();
+
+    // second job will use same ID
+    String attempt2 = taskAttempt0.toString();
+    TaskAttemptID taskAttempt2 = taskAttempt0;
+
+    // create the second job
+    Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+    c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+    Job job2 = newJob(outDir,
+        c2,
+        attempt2);
+    Configuration conf2 = job2.getConfiguration();
```

Review comment: nit: maybe rename `conf2` to something like `jobConf2` to make it a bit clearer.

## File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md

```
@@ -535,20 +535,28 @@ Conflict management is left to the execution engine itself.

 | Option | Magic | Directory | Partitioned | Meaning | Default |
 |--------|-------|-----------|-------------|---------|---------|
-| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file at the end of each job | `true` |
+| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file on the successful completion of the job. | `true` |
+| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | `${hadoop.tmp.dir}/s3a` |
+| `fs.s3a.committer.magic.enabled` | X | | | Enable "magic committer" support in the filesystem. | `false` |
+| `fs.s3a.committer.abort.pending.uploads` | X | X | X | list and abort all pending uploads under the destination path when the job is committed or aborted. | `true` |
 | `fs.s3a.committer.threads` | X | X | X | Number of threads in committers for parallel operations on files. | 8 |
-| `fs.s3a.committer.staging.conflict-mode` | | X | X | Conflict resolution: `fa
```
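One of the review comments above flags the argument order in the new `LOG.info("{}: Retried {}: {}", retries, text, ex.toString())` call. SLF4J fills `{}` placeholders strictly left to right, so with `retries` passed first the message comes out as `3: Retried upload part: ...` rather than the intended `upload part: Retried 3: ...`. A small self-contained sketch (using a hand-rolled formatter in place of the real SLF4J class, so it runs without any dependency) makes the failure mode concrete:

```java
// Minimal stand-in for SLF4J's "{}" substitution, to show why argument
// order matters: placeholders are filled strictly left to right.
public class Slf4jOrderDemo {

    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0;
        int i = 0;
        while (i < pattern.length()) {
            if (i + 1 < pattern.length()
                    && pattern.charAt(i) == '{' && pattern.charAt(i + 1) == '}'
                    && argIdx < args.length) {
                // next placeholder consumes the next unused argument
                sb.append(args[argIdx++]);
                i += 2;
            } else {
                sb.append(pattern.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] unused) {
        String text = "upload part";
        int retries = 3;
        String ex = "java.io.IOException: connection reset";
        // Argument order as in the patch: retries lands in the first slot,
        // printing "3: Retried upload part: ..."
        System.out.println(format("{}: Retried {}: {}", retries, text, ex));
        // Intended order: operation text first, then the retry count,
        // printing "upload part: Retried 3: ..."
        System.out.println(format("{}: Retried {}: {}", text, retries, ex));
    }
}
```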
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509919 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 10/Nov/20 20:34 Start Date: 10/Nov/20 20:34 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724951030 Issue Time Tracking --- Worklog Id: (was: 509919) Time Spent: 3h 40m (was: 3.5h)
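The `testParallelJobsToSameDestination` test quoted earlier disables cleanup of outstanding uploads and requires a job UUID. As a sketch of what the equivalent site configuration might look like: `fs.s3a.committer.abort.pending.uploads` appears in the committers.md table in the review above, while the require-UUID key is inferred from the `FS_S3A_COMMITTER_REQUIRE_UUID` constant in the test and should be checked against the released documentation.

```xml
<!-- Illustrative settings for concurrent jobs writing to one destination. -->
<property>
  <name>fs.s3a.committer.abort.pending.uploads</name>
  <value>false</value>
  <!-- do not abort other jobs' pending uploads under the destination -->
</property>
<property>
  <name>fs.s3a.committer.require.uuid</name>
  <value>true</value>
  <!-- inferred key name: fail job setup if no unique job UUID is supplied -->
</property>
```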
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509897 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 10/Nov/20 19:28 Start Date: 10/Nov/20 19:28 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724917048 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 44s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 11 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 55s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 27m 52s | | trunk passed | | +1 :green_heart: | compile | 25m 20s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | compile | 18m 52s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | checkstyle | 3m 7s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 29s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 14s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 1m 24s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | javadoc | 2m 13s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +0 :ok: | spotbugs | 1m 36s | | Used deprecated FindBugs config; considering switching to SpotBugs. 
| | +1 :green_heart: | findbugs | 4m 6s | | trunk passed | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 44s | | the patch passed | | +1 :green_heart: | compile | 24m 37s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | javac | 24m 37s | | the patch passed | | +1 :green_heart: | compile | 20m 13s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | javac | 20m 13s | | the patch passed | | -0 :warning: | checkstyle | 2m 53s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/diff-checkstyle-root.txt) | root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49) | | +1 :green_heart: | mvnsite | 2m 12s | | the patch passed | | -1 :x: | whitespace | 0m 0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/whitespace-eol.txt) | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. | | +1 :green_heart: | shadedclient | 16m 51s | | patch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | javadoc | 2m 10s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | findbugs | 4m 44s | | the patch passed | _ Other Tests _ | | +1 :green_heart: | unit | 10m 44s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 1m 37s | | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 0m 55s | | The patch does not generate ASF License warnings. 
| | | | 215m 31s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/2399 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint | | uname | Linux 667abdab7b52 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool |
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509774 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 10/Nov/20 15:52 Start Date: 10/Nov/20 15:52 Worklog Time Spent: 10m Work Description: steveloughran commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724791317 latest test run against s3 london, no s3guard; markers deleted (classic config). Everything, even the flaky read() tests passed! -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete -Dfs.s3a.directory.marker.audit=true -Dscale Issue Time Tracking --- Worklog Id: (was: 509774) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509771 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 10/Nov/20 15:45 Start Date: 10/Nov/20 15:45 Worklog Time Spent: 10m Work Description: steveloughran commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724786879 some more detail for the watchers from my testing (hadoop-trunk + CDP spark 2.4). I could not get spark master and hadoop trunk to build together this week. * RDD.saveAs needs to pass down the setting too [https://issues.apache.org/jira/browse/SPARK-33402](https://issues.apache.org/jira/browse/SPARK-33402) * I'm getting errors with FileSystem instantiation in Hive and the isolated classloader [https://issues.apache.org/jira/browse/HADOOP-17372](https://issues.apache.org/jira/browse/HADOOP-17372). I'm not going near that other than to add a para in troubleshooting.md saying "you're in classloader hell". Will need to be testing against spark master before worrying about WTF is going on there I'm also now worried that if anyone does >1 job with the same dest dir and overwrite=true, then there's a risk that you get the same duplicate app attempt ID race condition. It's tempting just to do something ambitious like use a random number to generate a timestamp for the cluster launch, or some random(year-month-day)+ seconds-of-day, so that this problem goes away almost completely This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 509771) Time Spent: 3h 10m (was: 3h)
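The fix described in the issue is a fallback chain for the job UUID: use an engine-supplied UUID (`spark.sql.sources.writeJobUUID`) when present, optionally self-generate one at job setup, and otherwise fall back to the app attempt ID unless that fallback has been disabled. The sketch below shows that chain in isolation; it is not the actual `AbstractS3ACommitter` implementation, and the option names `fs.s3a.committer.generate.uuid` and `fs.s3a.committer.require.uuid` are inferred from the constants in the test code, so treat them as illustrative.

```java
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of the HADOOP-17318 job-UUID fallback chain.
public class JobUuidFallback {

    static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";
    // Option names inferred from FS_S3A_COMMITTER_* constants; illustrative only.
    static final String GENERATE_UUID = "fs.s3a.committer.generate.uuid";
    static final String REQUIRE_UUID = "fs.s3a.committer.require.uuid";

    static String resolveJobUuid(Map<String, String> conf, String appAttemptId) {
        String sparkId = conf.get(SPARK_WRITE_UUID);
        if (sparkId != null && !sparkId.isEmpty()) {
            return sparkId;                      // engine-supplied UUID wins
        }
        if (Boolean.parseBoolean(conf.getOrDefault(GENERATE_UUID, "false"))) {
            return UUID.randomUUID().toString(); // self-generate at job setup
        }
        if (Boolean.parseBoolean(conf.getOrDefault(REQUIRE_UUID, "false"))) {
            // Spark deployments can fail fast rather than risk two jobs
            // sharing one app attempt ID.
            throw new IllegalStateException(
                "Job UUID required, but only the app attempt ID is available");
        }
        return appAttemptId;                     // legacy MR-style fallback
    }
}
```

With distinct UUIDs, two concurrent jobs get distinct paths under `__magic` and distinct local staging temp directories even when their app attempt IDs collide.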
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509393 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 09/Nov/20 22:28 Start Date: 09/Nov/20 22:28 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724316222 Thank you for sharing, @steveloughran ! Issue Time Tracking --- Worklog Id: (was: 509393) Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509312 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 09/Nov/20 19:17 Start Date: 09/Nov/20 19:17 Worklog Time Spent: 10m Work Description: steveloughran edited a comment on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724222044 Running integration tests on this with spark + patch and the 3.4.0-SNAPSHOT builds. Ignoring compilation issues with spark trunk, hadoop-trunk, scala versions and scalatest, I'm running tests in [cloud-integration](https://github.com/hortonworks-spark/cloud-integration) ``` S3AParquetPartitionSuite: 2020-11-09 10:55:36,664 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3ACommitter (AbstractS3ACommitter.java:(180)) - Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID 2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO output.FileOutputCommitter (FileOutputCommitter.java:(141)) - File Output Committer Algorithm version is 1 2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO output.FileOutputCommitter (FileOutputCommitter.java:(156)) - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3ACommitterFactory (S3ACommitterFactory.java:createTaskCommitter(83)) - Using committer directory to output data to s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo 2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3ACommitterFactory (AbstractS3ACommitterFactory.java:createOutputCommitter(54)) - Using Committer StagingCommitter{AbstractS3ACommitter{role=Task committer 
attempt_20201109105536__m_00_0, name=directory, outputPath=s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo, workPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536__m_00_0/_temporary/0/_temporary/attempt_20201109105536__m_00_0, uuid='d6b6cd70-0303-46a6-8ff4-240dd14511d6', uuid source=JobUUIDSource{text='spark.sql.sources.writeJobUUID'}}, commitsDirectory=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, uniqueFilenames=true, conflictResolution=APPEND. uploadPartSize=67108864, wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job_20201109105536_}; taskId=attempt_20201109105536__m_00_0, status=''}; org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@759c53e5}; outputPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, workPath=null, algorithmVersion=1, skipCleanup=false, ignoreCleanupFailures=false}} for s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo 2020-11-09 10:55:36,736 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO staging.DirectoryStagingCommitter (DirectoryStagingCommitter.java:setupJob(71)) - Conflict Resolution mode is APPEND 2020-11-09 10:55:36,879 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3AC ``` 1. Spark is passing down a unique job ID (committer is configured to require it) ` Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID` 1. 
This used for the local fs work path of the staging committer `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536__m_00_0/_temporary/0/_temporary/attempt_20201109105536__m_00_0,` 1. And for the cluster FS (which is file:// here) `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads` that is: spark is setting the UUID and the committer is picking it up and using as appropriate This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Wor
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509311 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 09/Nov/20 19:16 Start Date: 09/Nov/20 19:16 Worklog Time Spent: 10m Work Description: steveloughran commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724222044 Issue Time Tracking --- Worklog Id: (was: 509311) Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509232 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 09/Nov/20 15:59 Start Date: 09/Nov/20 15:59 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724104455 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 9s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 11 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 46s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 23m 1s | | trunk passed | | +1 :green_heart: | compile | 21m 27s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | compile | 18m 11s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | checkstyle | 2m 51s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 15s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 3s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 1m 24s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | javadoc | 2m 2s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +0 :ok: | spotbugs | 1m 10s | | Used deprecated FindBugs config; considering switching to SpotBugs. 
| | +1 :green_heart: | findbugs | 3m 24s | | trunk passed | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 26s | | the patch passed | | +1 :green_heart: | compile | 22m 16s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | javac | 22m 16s | | the patch passed | | +1 :green_heart: | compile | 19m 30s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | javac | 19m 30s | | the patch passed | | -0 :warning: | checkstyle | 2m 56s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/diff-checkstyle-root.txt) | root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49) | | +1 :green_heart: | mvnsite | 2m 42s | | the patch passed | | +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. | | +1 :green_heart: | xml | 0m 2s | | The patch has no ill-formed XML file. | | +1 :green_heart: | shadedclient | 19m 52s | | patch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 1m 36s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 | | +1 :green_heart: | javadoc | 2m 15s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 | | +1 :green_heart: | findbugs | 5m 24s | | the patch passed | _ Other Tests _ | | +1 :green_heart: | unit | 11m 40s | | hadoop-common in the patch passed. | | -1 :x: | unit | 1m 49s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | hadoop-aws in the patch passed. | | +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. 
| | | | 205m 19s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.fs.s3a.commit.staging.TestStagingCommitter | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/2399 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint | | uname | Linux c32f6d9525bc 4.15.0-112-generic #113-Ubuntu SM
[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509150 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 09/Nov/20 13:47 Start Date: 09/Nov/20 13:47 Worklog Time Spent: 10m Work Description: steveloughran commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724023990

Test run with: -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Ds3guard -Ddynamo -Dfs.s3a.directory.marker.audit=true -Dscale

```
[INFO]
[ERROR] Failures:
[ERROR] ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferBeforeRead:63->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 failed to read expected number of bytes from stream. This may be transient expected:<1024> but was:<93>
[ERROR] ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferOnClosedFile:83->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 failed to read expected number of bytes from stream. This may be transient expected:<1024> but was:<605>
[INFO]
[ERROR] Tests run: 1379, Failures: 2, Errors: 0, Skipped: 153
[INFO]
```

My next big bit of work is to do tests in Spark itself.

Issue Time Tracking --- Worklog Id: (was: 509150) Time Spent: 2h 20m (was: 2h 10m)
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509149 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 09/Nov/20 13:46 Start Date: 09/Nov/20 13:46 Worklog Time Spent: 10m Work Description: hadoop-yetus removed a comment on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720614677
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=507048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-507048 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 03/Nov/20 14:15 Start Date: 03/Nov/20 14:15 Worklog Time Spent: 10m Work Description: hadoop-yetus removed a comment on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-719856113

Issue Time Tracking --- Worklog Id: (was: 507048) Time Spent: 2h (was: 1h 50m)
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=506927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506927 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 03/Nov/20 14:01 Start Date: 03/Nov/20 14:01 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720614677

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 1m 56s | | Docker mode activated. |
|||| _ Prechecks _ ||
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 11 new or modified test files. |
|||| _ trunk Compile Tests _ ||
| +0 :ok: | mvndep | 14m 58s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 28m 27s | | trunk passed |
| +1 :green_heart: | compile | 26m 16s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | compile | 20m 17s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | checkstyle | 2m 51s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 22s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 18s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javadoc | 2m 7s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +0 :ok: | spotbugs | 1m 31s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 4m 14s | | trunk passed |
|||| _ Patch Compile Tests _ ||
| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 48s | | the patch passed |
| +1 :green_heart: | compile | 25m 39s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 :green_heart: | javac | 25m 39s | | the patch passed |
| +1 :green_heart: | compile | 22m 42s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 :green_heart: | javac | 22m 42s | | the patch passed |
| -0 :warning: | checkstyle | 3m 23s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-checkstyle-root.txt) | root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49) |
| +1 :green_heart: | mvnsite | 2m 56s | | the patch passed |
| -1 :x: | whitespace | 0m 0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/whitespace-eol.txt) | The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 :green_heart: | xml | 0m 3s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | shadedclient | 18m 16s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 43s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| -1 :x: | javadoc | 0m 38s | [/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 generated 4 new + 88 unchanged - 0 fixed = 92 total (was 88) |
| +1 :green_heart: | findbugs | 4m 18s | | the patch passed |
|||| _ Other Tests _ ||
| -1 :x: | unit | 11m 6s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. |
| -1 :x: | unit | 1m 48s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) | had
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=506867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506867 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 03/Nov/20 13:54 Start Date: 03/Nov/20 13:54 Worklog Time Spent: 10m Work Description: steveloughran commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720638320

Issue Time Tracking --- Worklog Id: (was: 506867) Time Spent: 1h 40m (was: 1.5h)
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=505395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505395 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 27/Oct/20 20:16 Start Date: 27/Oct/20 20:16 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-717513018

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 1m 7s | | Docker mode activated. |
|||| _ Prechecks _ ||
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 8 new or modified test files. |
|||| _ trunk Compile Tests _ ||
| +0 :ok: | mvndep | 11m 28s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 22m 35s | | trunk passed |
| +1 :green_heart: | compile | 21m 19s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | compile | 17m 59s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | checkstyle | 2m 49s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 11s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 2s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 24s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javadoc | 2m 3s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 :ok: | spotbugs | 1m 10s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 22s | | trunk passed |
|||| _ Patch Compile Tests _ ||
| +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 27s | | the patch passed |
| +1 :green_heart: | compile | 20m 42s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 :green_heart: | javac | 20m 42s | | the patch passed |
| +1 :green_heart: | compile | 18m 11s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 :green_heart: | javac | 18m 11s | | the patch passed |
| -0 :warning: | checkstyle | 2m 46s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/diff-checkstyle-root.txt) | root: The patch generated 8 new + 30 unchanged - 1 fixed = 38 total (was 31) |
| +1 :green_heart: | mvnsite | 2m 12s | | the patch passed |
| -1 :x: | whitespace | 0m 0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/whitespace-eol.txt) | The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | shadedclient | 16m 37s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 23s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| -1 :x: | javadoc | 0m 34s | [/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt) | hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 generated 6 new + 88 unchanged - 0 fixed = 94 total (was 88) |
| +1 :green_heart: | findbugs | 3m 41s | | the patch passed |
|||| _ Other Tests _ ||
| -1 :x: | unit | 9m 42s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 1m 36s | | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 46s | | The patch does not generate ASF License warnings. |
[ https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=505327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505327 ] ASF GitHub Bot logged work on HADOOP-17318: --- Author: ASF GitHub Bot Created on: 27/Oct/20 17:16 Start Date: 27/Oct/20 17:16 Worklog Time Spent: 10m Work Description: steveloughran commented on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-717395438

@dongjoon-hyun thanks... doing a bit more on this, as the more tests I write, the more corner cases surface. Think I'm in control now.

Issue Time Tracking --- Worklog Id: (was: 505327) Time Spent: 1h 20m (was: 1h 10m)
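The issue description proposes a resolution order for the job identifier: prefer a UUID supplied by the engine (for Spark, via the SPARK-33230 changes), optionally self-generate one at job setup, and otherwise fall back to the app attempt ID unless that fallback has been disabled. A minimal sketch of that ordering follows; the configuration key names here are illustrative placeholders, not the actual keys added by the patch.

```java
import java.util.Map;
import java.util.UUID;

/**
 * Hedged sketch of the job-UUID resolution order described in the issue:
 * 1. an engine-supplied UUID (e.g. the Spark-side property from SPARK-33230),
 * 2. else, if self-generation is enabled, a fresh UUID minted at job setup,
 * 3. else the app attempt ID, unless that fallback has been disabled.
 * All property names below are hypothetical placeholders.
 */
public final class JobUuidResolver {

    static final String ENGINE_UUID_KEY = "spark.sql.sources.writeJobUUID"; // assumed Spark property
    static final String GENERATE_KEY = "committer.generate.uuid";           // placeholder key
    static final String REQUIRE_UUID_KEY = "committer.require.uuid";        // placeholder key

    private JobUuidResolver() { }

    static String resolveJobUuid(Map<String, String> conf, String appAttemptId) {
        String engineUuid = conf.get(ENGINE_UUID_KEY);
        if (engineUuid != null && !engineUuid.isEmpty()) {
            return engineUuid;                    // engine-supplied UUID wins
        }
        if (Boolean.parseBoolean(conf.getOrDefault(GENERATE_KEY, "false"))) {
            return UUID.randomUUID().toString();  // self-generate at job setup
        }
        if (Boolean.parseBoolean(conf.getOrDefault(REQUIRE_UUID_KEY, "false"))) {
            // Spark deployments can be configured to fail job setup rather than
            // risk two jobs sharing an app attempt ID and destination dir.
            throw new IllegalStateException("No job UUID and app-attempt fallback disabled");
        }
        return appAttemptId;                      // MR-style fallback
    }
}
```

The point of the ordering is that two concurrent jobs with the same app attempt ID still get distinct `__magic` paths and staging temp dirs whenever a real UUID is available, and a deployment can opt out of the ambiguous fallback entirely.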