[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=514174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514174
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 19/Nov/20 15:59
Start Date: 19/Nov/20 15:59
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-730470829


   Thank you, @steveloughran and guys!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 514174)
Time Spent: 8h 10m  (was: 8h)

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> 
>
> Key: HADOOP-17318
> URL: https://issues.apache.org/jira/browse/HADOOP-17318
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> A failure of magic committer block uploads was reported because the pending
> upload ID was unknown. Likely cause: the upload was aborted by another job.
> # Make it possible to turn off cleanup of pending uploads in the magic committer
> # Log more about uploads being deleted in committers
> # Include the upload ID in S3ABlockOutputStream errors
> There are other concurrency issues on closer inspection; see SPARK-33230:
> * the magic committer uses the app attempt ID as its path under __magic; duplicate
> IDs will conflict
> * the staging committer's local temp dir uses the app attempt ID
> The fix will be to have a job UUID, which for Spark will be picked up from the
> SPARK-33230 changes (with an option to self-generate it in job setup, for Hadoop
> 3.3.1+ with older Spark builds); fall back to the app attempt ID *unless that
> fallback has been disabled*.
> MR: configure to use the app attempt ID.
> Spark: configure to fail job setup if the app attempt ID is the source of the job
> UUID.
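The fallback order described above can be sketched as follows. This is a hypothetical illustration of the selection logic only, not the actual Hadoop code; the method and parameter names are invented for the example.

```java
import java.util.UUID;

public class Main {

  // Sketch of the UUID resolution order described above (hypothetical,
  // not the real AbstractS3ACommitter code): an explicit committer UUID
  // wins, then Spark's write UUID, then optional self-generation, and
  // finally the app attempt ID unless that fallback has been disabled.
  static String resolveJobUuid(String committerUuid,
      String sparkWriteUuid,
      boolean selfGenerate,
      boolean appAttemptFallbackEnabled,
      String appAttemptId) {
    if (committerUuid != null) {
      return committerUuid;
    }
    if (sparkWriteUuid != null) {
      return sparkWriteUuid;
    }
    if (selfGenerate) {
      return UUID.randomUUID().toString();
    }
    if (appAttemptFallbackEnabled) {
      return appAttemptId;
    }
    // Spark deployments can be configured to fail here rather than risk
    // two jobs sharing a non-unique app attempt ID.
    throw new IllegalStateException(
        "no job UUID available and app attempt fallback is disabled");
  }

  public static void main(String[] args) {
    // A Spark-provided UUID wins over the app attempt ID.
    System.out.println(
        resolveJobUuid(null, "spark-uuid-123", false, true, "appattempt_1"));
    // With nothing else available, fall back to the app attempt ID.
    System.out.println(
        resolveJobUuid(null, null, false, true, "appattempt_1"));
  }
}
```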



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=514138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514138
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 19/Nov/20 14:22
Start Date: 19/Nov/20 14:22
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-730407746


   Merged to trunk, not yet 3.3. See #2473 for the test failure caused in code 
from a different PR *which this patch goes nowhere near*.





Issue Time Tracking
---

Worklog Id: (was: 514138)
Time Spent: 8h  (was: 7h 50m)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=514137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-514137
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 19/Nov/20 14:22
Start Date: 19/Nov/20 14:22
Worklog Time Spent: 10m 
  Work Description: steveloughran closed pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399


   





Issue Time Tracking
---

Worklog Id: (was: 514137)
Time Spent: 7h 50m  (was: 7h 40m)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=513079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-513079
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 17/Nov/20 18:23
Start Date: 17/Nov/20 18:23
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-729115191


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: |  reexec  |   1m 11s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 10s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 38s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |  18m 13s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   2m 53s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 48s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   1m 11s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 27s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 42s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |  20m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m  5s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |  18m  5s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 46s |  |  root: The patch generated 0 new + 48 unchanged - 1 fixed = 48 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/artifact/out/whitespace-eol.txt) |  The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  17m  9s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m 42s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   9m 49s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 35s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 196m 23s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/10/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 965a3f2ebeb9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a7b923c80c6 |
   | Default Java | Private Build-

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512492
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 16/Nov/20 18:19
Start Date: 16/Nov/20 18:19
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-728238352


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: |  reexec  |  34m  5s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 11 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  2s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  27m 52s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |  24m 10s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   3m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 38s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  27m  1s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m 43s |  |  trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   1m 42s |  |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m 40s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 37s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |  23m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 12s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |  18m 12s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 46s | [/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/diff-checkstyle-root.txt) |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | [/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/whitespace-eol.txt) |  The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  16m 58s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m 43s |  |  the patch passed  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   9m 43s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 35s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 257m 18s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/9/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 4ee065eae8d6 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | mav

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512381
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 16/Nov/20 14:03
Start Date: 16/Nov/20 14:03
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-728070916


   Pushed up an iteration with all the feedback addressed
   
   testing: s3 london, unguarded, markers=keep
   downstream testing (which now includes a test to generate 10K Job IDs 
through the spark API and verify they are different): s3 ireland, unguarded, 
markers = delete
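The uniqueness check described above can be modelled in a few lines (a sketch of the idea only, not the actual downstream Spark test): generate a large batch of IDs and assert there are no collisions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class Main {
  public static void main(String[] args) {
    // Generate 10,000 job IDs and verify they are all distinct,
    // mirroring the shape of the downstream test described above.
    Set<String> ids = new HashSet<>();
    for (int i = 0; i < 10_000; i++) {
      if (!ids.add(UUID.randomUUID().toString())) {
        throw new AssertionError("duplicate job ID at iteration " + i);
      }
    }
    System.out.println("generated " + ids.size() + " unique job IDs");
  }
}
```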





Issue Time Tracking
---

Worklog Id: (was: 512381)
Time Spent: 7h 20m  (was: 7h 10m)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512301
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 16/Nov/20 12:11
Start Date: 16/Nov/20 12:11
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-727939820


   @rdblue
   
   yes, I did a bit more than was needed because I had to also let > 1 magic 
committer commit work side-by-side (all that active upload warning), and the 
IDE was trying to keep me in check too, on a piece of code which hasn't been 
revisited for a while.
   
   While I had the files open in the IDE, I moved to passing FileStatus down to 
line up with the changes in #2168 - if you open a file through the 
JsonSerializer by passing in the FileStatus, that will be handed off to the 
FileSystem's implementation of openFile(status.path).withFileStatus(status), 
and so be used by S3A FS to skip the initial HEAD request. Means if we are 
reading 1000 .pendingset files in S3A, we eliminate 1000 HEAD calls, which 
should have tangible benefits for committers using S3 as the place to keep 
those files. 
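The HEAD-elimination argument can be illustrated with a toy model. This is not the Hadoop `openFile()` API, just a self-contained sketch of the accounting: when the caller already holds a file's status (e.g. from a LIST), the per-file probe can be skipped.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
  static int headCalls = 0;
  static final Map<String, Long> store = new HashMap<>();

  // Simulated HEAD request: look up the object's length by key.
  static long head(String path) {
    headCalls++;
    return store.get(path);
  }

  // Opening without a known status costs one HEAD per file.
  static long open(String path) {
    return head(path);
  }

  // Opening with a status already in hand costs no extra HEAD.
  static long open(String path, long knownLength) {
    return knownLength;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1000; i++) {
      store.put("file-" + i, (long) i);
    }
    // Without statuses: one HEAD per file read.
    for (int i = 0; i < 1000; i++) {
      open("file-" + i);
    }
    System.out.println("HEAD calls without status: " + headCalls);

    // With statuses from a prior listing: zero extra HEADs.
    headCalls = 0;
    for (int i = 0; i < 1000; i++) {
      open("file-" + i, i);
    }
    System.out.println("HEAD calls with status: " + headCalls);
  }
}
```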





Issue Time Tracking
---

Worklog Id: (was: 512301)
Time Spent: 7h 10m  (was: 7h)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=512283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-512283
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 16/Nov/20 11:29
Start Date: 16/Nov/20 11:29
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r523130440



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -411,26 +464,63 @@ protected void maybeCreateSuccessMarker(JobContext 
context,
* be deleted; creating it now ensures there is something at the end
* while the job is in progress -and if nothing is created, that
* it is still there.
+   * <p>
+   *   The option {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}
+   *   is set to the job UUID; if generated locally
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID} is also patched.
+   *   The field {@link #jobSetup} is set to true to note that
+   *   this specific committer instance was used to set up a job.
+   * </p>
* @param context context
* @throws IOException IO failure
*/
 
   @Override
   public void setupJob(JobContext context) throws IOException {
-try (DurationInfo d = new DurationInfo(LOG, "preparing destination")) {
+try (DurationInfo d = new DurationInfo(LOG,
+"Job %s setting up", getUUID())) {
+  // record that the job has been set up
+  jobSetup = true;
+  // patch job conf with the job UUID.
+  Configuration c = context.getConfiguration();
+  c.set(FS_S3A_COMMITTER_UUID, this.getUUID());
+  if (getUUIDSource() == JobUUIDSource.GeneratedLocally) {
+// we set the UUID up locally. Save it back to the job configuration
+c.set(SPARK_WRITE_UUID, this.getUUID());

Review comment:
   I was just trying to be rigorous; will roll back. While I'm there I 
think I'll add the source attribute - I can then probe for it in the tests. I'm 
already saving it in the _SUCCESS file

##
File path: 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md
##
@@ -248,6 +247,47 @@ As an example, the endpoint for S3 Frankfurt is 
`s3.eu-central-1.amazonaws.com`:
 
 ```
 
+### `Class does not implement AWSCredentialsProvider`

Review comment:
   going to add that specific bit about spark hive classloaders here too, 
which is where this is coming from

##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
 }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+List<MultipartUpload> pending;
+try {
+  pending = getCommitOperations()
+  .listPendingUploadsUnderPath(path);
+} catch (IOException e) {
+  LOG.debug("Failed to list uploads under {}",
+  path, e);
+  return;
+}
+if (!pending.isEmpty()) {
+  // log a warning
+  LOG.warn("{} active upload(s) in progress under {}",
+  pending.size(),
+  path);
+  LOG.warn("Either jobs are running concurrently"
+  + " or failed jobs are not being cleaned up");
+  // and the paths + timestamps
+  DateFormat df = DateFormat.getDateTimeInstance();
+  pending.forEach(u ->
+  LOG.info("[{}] {}",
+  df.format(u.getInitiated()),
+  u.getKey()));
+  if (shouldAbortUploadsInCleanup()) {
+LOG.warn("This committer will abort these uploads in job cleanup");
+  }
+}
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   * 
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * 
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511377
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 13/Nov/20 14:06
Start Date: 13/Nov/20 14:06
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522970306



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
 }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+List<MultipartUpload> pending;
+try {
+  pending = getCommitOperations()
+  .listPendingUploadsUnderPath(path);
+} catch (IOException e) {
+  LOG.debug("Failed to list uploads under {}",
+  path, e);
+  return;
+}
+if (!pending.isEmpty()) {
+  // log a warning
+  LOG.warn("{} active upload(s) in progress under {}",
+  pending.size(),
+  path);
+  LOG.warn("Either jobs are running concurrently"
+  + " or failed jobs are not being cleaned up");
+  // and the paths + timestamps
+  DateFormat df = DateFormat.getDateTimeInstance();
+  pending.forEach(u ->
+  LOG.info("[{}] {}",
+  df.format(u.getInitiated()),
+  u.getKey()));
+  if (shouldAbortUploadsInCleanup()) {
+LOG.warn("This committer will abort these uploads in job cleanup");
+  }
+}
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>

Review comment:
   added the extra details







Issue Time Tracking
---

Worklog Id: (was: 511377)
Time Spent: 6h 50m  (was: 6h 40m)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511370
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 13/Nov/20 13:49
Start Date: 13/Nov/20 13:49
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522961040



##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
##
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
 
   }
 
+
+  /**
+   * Run two jobs with the same destination and different output paths.
+   * <p>
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   * </p>
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+describe("Run two jobs to the same destination, assert they both complete");
+Configuration conf = getConfiguration();
+conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+// this job has a job ID generated and set as the spark UUID;
+// the config is also set to require it.
+// This mimics the Spark setup process.
+
+String stage1Id = UUID.randomUUID().toString();
+conf.set(SPARK_WRITE_UUID, stage1Id);
+conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+// create the job and write data in its task attempt
+JobData jobData = startJob(true);
+Job job1 = jobData.job;
+AbstractS3ACommitter committer1 = jobData.committer;
+JobContext jContext1 = jobData.jContext;
+TaskAttemptContext tContext1 = jobData.tContext;
+Path job1TaskOutputFile = jobData.writtenTextPath;
+
+// the write path
+Assertions.assertThat(committer1.getWorkPath().toString())
+.describedAs("Work path of %s", committer1)
+.contains(stage1Id);
+// now build up a second job
+String jobId2 = randomJobId();
+
+// second job will use same ID
+String attempt2 = taskAttempt0.toString();
+TaskAttemptID taskAttempt2 = taskAttempt0;
+
+// create the second job
+Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+Job job2 = newJob(outDir,
+c2,
+attempt2);
+Configuration conf2 = job2.getConfiguration();
+conf2.set("mapreduce.output.basename", "task2");
+String stage2Id = UUID.randomUUID().toString();
+conf2.set(SPARK_WRITE_UUID,
+stage2Id);
+
+JobContext jContext2 = new JobContextImpl(conf2,
+taskAttempt2.getJobID());
+TaskAttemptContext tContext2 =
+new TaskAttemptContextImpl(conf2, taskAttempt2);
+AbstractS3ACommitter committer2 = createCommitter(outDir, tContext2);
+Assertions.assertThat(committer2.getJobAttemptPath(jContext2))
+.describedAs("Job attempt path of %s", committer2)
+.isNotEqualTo(committer1.getJobAttemptPath(jContext1));
+Assertions.assertThat(committer2.getTaskAttemptPath(tContext2))
+.describedAs("Task attempt path of %s", committer2)
+.isNotEqualTo(committer1.getTaskAttemptPath(tContext1));
+Assertions.assertThat(committer2.getWorkPath().toString())
+.describedAs("Work path of %s", committer2)
+.isNotEqualTo(committer1.getWorkPath().toString())
+.contains(stage2Id);
+Assertions.assertThat(committer2.getUUIDSource())
+.describedAs("UUID source of %s", committer2)
+.isEqualTo(AbstractS3ACommitter.JobUUIDSource.SparkWriteUUID);
+JobData jobData2 = new JobData(job2, jContext2, tContext2, committer2);
+setup(jobData2);
+abortInTeardown(jobData2);
+
+// the sequence is designed to ensure that job2 has active multipart
+// uploads during/after job1's work
+
+// if the committer is a magic committer, MPUs start in the write,
+// otherwise in task commit.
+boolean multipartInitiatedInWrite =
+committer2 instanceof MagicS3GuardCommitter;
+
+// job2. Here we start writing a file and have that write in progress
+// when job 1 commits.
+
+LoggingTextOutputFormat.LoggingLineRecordWriter
+recordWriter2 = new LoggingTextOutputFormat<>().getRecordWriter(
+tContext2);
+
+LOG.info("Commit Task 1");
+commitTask(committer1, tContext1);
+
+if (multipartInitiatedInWrite) {
+  // magic committer runs -commit job1 while a job2 TA has an open
+  // writer (and hence: open MP Upload)
+  LOG.info("Commit Job 1");

Review comment:
   done






[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511369
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 13/Nov/20 13:44
Start Date: 13/Nov/20 13:44
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522957970



##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
##
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
 
   }
 
+
+  /**
+   * Run two jobs with the same destination and different output paths.
+   * <p>
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   * </p>
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+describe("Run two jobs to the same destination, assert they both complete");
+Configuration conf = getConfiguration();
+conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+// this job has a job ID generated and set as the spark UUID;
+// the config is also set to require it.
+// This mimics the Spark setup process.
+
+String stage1Id = UUID.randomUUID().toString();
+conf.set(SPARK_WRITE_UUID, stage1Id);
+conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+// create the job and write data in its task attempt
+JobData jobData = startJob(true);
+Job job1 = jobData.job;
+AbstractS3ACommitter committer1 = jobData.committer;
+JobContext jContext1 = jobData.jContext;
+TaskAttemptContext tContext1 = jobData.tContext;
+Path job1TaskOutputFile = jobData.writtenTextPath;
+
+// the write path
+Assertions.assertThat(committer1.getWorkPath().toString())
+.describedAs("Work path of %s", committer1)
+.contains(stage1Id);
+// now build up a second job
+String jobId2 = randomJobId();
+
+// second job will use same ID
+String attempt2 = taskAttempt0.toString();
+TaskAttemptID taskAttempt2 = taskAttempt0;
+
+// create the second job
+Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+Job job2 = newJob(outDir,
+c2,
+attempt2);
+Configuration conf2 = job2.getConfiguration();

Review comment:
   done







Issue Time Tracking
---

Worklog Id: (was: 511369)
Time Spent: 6.5h  (was: 6h 20m)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511355
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 13/Nov/20 13:27
Start Date: 13/Nov/20 13:27
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522948661



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -147,6 +173,11 @@ protected AbstractS3ACommitter(
 this.jobContext = context;
 this.role = "Task committer " + context.getTaskAttemptID();
 setConf(context.getConfiguration());
+Pair<String, JobUUIDSource> id = buildJobUUID(
+conf, context.getJobID());
+uuid = id.getLeft();
+uuidSource = id.getRight();

Review comment:
   Makes sense in the constructor. Done







Issue Time Tracking
---

Worklog Id: (was: 511355)
Time Spent: 6h 10m  (was: 6h)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511356
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 13/Nov/20 13:27
Start Date: 13/Nov/20 13:27
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522948976



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -202,24 +233,24 @@ protected final void setOutputPath(Path outputPath) {
* @return the working path.
*/
   @Override
-  public Path getWorkPath() {
+  public final Path getWorkPath() {
 return workPath;
   }
 
   /**
* Set the work path for this committer.
* @param workPath the work path to use.
*/
-  protected void setWorkPath(Path workPath) {
+  protected final void setWorkPath(Path workPath) {
 LOG.debug("Setting work path to {}", workPath);
 this.workPath = workPath;
   }
 
-  public Configuration getConf() {
+  public final Configuration getConf() {
 return conf;
   }
 
-  protected void setConf(Configuration conf) {
+  protected final void setConf(Configuration conf) {

Review comment:
   The IDE was whining about calling an override point in the constructor, 
so I turned it off at the same time. sorry







Issue Time Tracking
---

Worklog Id: (was: 511356)
Time Spent: 6h 20m  (was: 6h 10m)




[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=511353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511353
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 13/Nov/20 13:24
Start Date: 13/Nov/20 13:24
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522947215



##
File path: 
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
##
@@ -1925,20 +1925,13 @@
 
 
 
-  <name>fs.s3a.committer.staging.abort.pending.uploads</name>
+  <name>fs.s3a.committer.abort.pending.uploads</name>
    <value>true</value>
   
-Should the staging committers abort all pending uploads to the destination
+Should the committers abort all pending uploads to the destination
 directory?
 
-Changing this if more than one partitioned committer is
-writing to the same destination tree simultaneously; otherwise
-the first job to complete will cancel all outstanding uploads from the
-others. However, it may lead to leaked outstanding uploads from failed
-tasks. If disabled, configure the bucket lifecycle to remove uploads
-after a time period, and/or set up a workflow to explicitly delete
-entries. Otherwise there is a risk that uncommitted uploads may run up
-bills.
+Set to false if more than one job is writing to the same directory tree.

Review comment:
   In taskAbort, yes. JobAbort/cleanup is where things are more trouble, 
because the job doesn't know which specific task attempts have uploaded.
   
   with the staging committer, there are no files uploaded until task commit. 
Tasks which fail before that moment don't have any pending uploads to cancel. 
   with the magic committer, because the files are written directly to S3, there 
is more risk of pending uploads accumulating. 
   
   I'm not sure about spark here, but on MR when a task is considered to have 
failed, abortTask is called in the AM to abort that specific task; for the 
magic committer the task's set of .pending files is determined by listing the 
task attempt dir, and those operations cancelled. If that operation is called 
reliably, only the current upload is pending. 
   
   Of course, if an entire job fails: no cleanup at all.
   
   The best thing to do is simply to tell everyone to have a scheduled cleanup.
   
   FWIW, the most leakage I see in the real world is actually from incomplete 
S3ABlockOutputStream writes as, again, they accrue bills. Everyone needs a 
lifecycle rule to delete old ones. The sole exception is one bucket our QE 
team used which (unknown to them) I'd use for testing the scalability of the 
"hadoop s3guard uploads" command: how well does it work when there are many, 
many incomplete uploads, can it still delete them all, etc. If they had a rule 
then it'd screw up my test runs.
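The scheduled cleanup recommended above is usually an S3 bucket lifecycle rule that aborts incomplete multipart uploads after a set age. A minimal sketch of such a lifecycle configuration; the rule ID and the seven-day window are illustrative choices, not values from this PR:

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>abort-incomplete-multipart-uploads</ID>
    <Filter/>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>7</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>
```

Pick a window longer than the longest job you expect to run, so in-flight uploads of live jobs are never aborted.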







Issue Time Tracking
---

Worklog Id: (was: 511353)
Time Spent: 6h  (was: 5h 50m)


[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510916
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 17:27
Start Date: 12/Nov/20 17:27
Worklog Time Spent: 10m 
  Work Description: rdblue commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522277893



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -147,6 +173,11 @@ protected AbstractS3ACommitter(
 this.jobContext = context;
 this.role = "Task committer " + context.getTaskAttemptID();
 setConf(context.getConfiguration());
+Pair<String, JobUUIDSource> id = buildJobUUID(
+conf, context.getJobID());
+uuid = id.getLeft();
+uuidSource = id.getRight();

Review comment:
   Other places use `this.` as a prefix when setting fields. I find that 
helpful when reading to know that an instance field is being set, vs a local 
variable.

##
File path: 
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
##
@@ -1925,20 +1925,13 @@
 
 
 
-  <name>fs.s3a.committer.staging.abort.pending.uploads</name>
+  <name>fs.s3a.committer.abort.pending.uploads</name>
    <value>true</value>
   
-Should the staging committers abort all pending uploads to the destination
+Should the committers abort all pending uploads to the destination
 directory?
 
-Changing this if more than one partitioned committer is
-writing to the same destination tree simultaneously; otherwise
-the first job to complete will cancel all outstanding uploads from the
-others. However, it may lead to leaked outstanding uploads from failed
-tasks. If disabled, configure the bucket lifecycle to remove uploads
-after a time period, and/or set up a workflow to explicitly delete
-entries. Otherwise there is a risk that uncommitted uploads may run up
-bills.
+Set to false if more than one job is writing to the same directory tree.

Review comment:
   Committers don't cancel just their own pending uploads?

##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -411,26 +464,63 @@ protected void maybeCreateSuccessMarker(JobContext context,
* be deleted; creating it now ensures there is something at the end
* while the job is in progress -and if nothing is created, that
* it is still there.
+   * <ol>
+   *   <li>The option {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}
+   *   is set to the job UUID; if generated locally
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID} is also patched.</li>
+   *   <li>The field {@link #jobSetup} is set to true to note that
+   *   this specific committer instance was used to set up a job.</li>
+   * </ol>
* @param context context
* @throws IOException IO failure
*/
 
   @Override
   public void setupJob(JobContext context) throws IOException {
-try (DurationInfo d = new DurationInfo(LOG, "preparing destination")) {
+try (DurationInfo d = new DurationInfo(LOG,
+"Job %s setting up", getUUID())) {
+  // record that the job has been set up
+  jobSetup = true;
+  // patch job conf with the job UUID.
+  Configuration c = context.getConfiguration();
+  c.set(FS_S3A_COMMITTER_UUID, this.getUUID());
+  if (getUUIDSource() == JobUUIDSource.GeneratedLocally) {
+// we set the UUID up locally. Save it back to the job configuration
+c.set(SPARK_WRITE_UUID, this.getUUID());

Review comment:
   It seems odd to set the Spark property. Does anything else use this?

##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
 }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+List<MultipartUpload> pending;
+try {
+  pending = getCommitOperations()
+  .listPendingUploadsUnderPath(path);
+} catch (IOException e) {
+  LOG.debug("Failed to list uploads under {}",
+  path, e);
+  return;
+}
+if (!pending.isEmpty()) {
+  // log a warning
+  LOG.warn("{} active upload(s) in progress under {}",
+  pending.size(),
+  path);
+  LOG.warn("Either jobs are running concurrently"
+  + " or failed jobs are not being cleaned up");
+  // and the paths + timestamps
+  DateFormat df = DateFormat.getDateTimeInstance();
+  pending.forEach(u ->
+  LOG.info("[{}] {}",
+  df.format(u.getInitiated()),
+  u.getKey()));
+  if (shouldAbortUploadsInCleanup()) {
+LOG.warn("This committer will abort these uploads in job cleanup");
+  }
+}
+  }

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510792
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 13:18
Start Date: 12/Nov/20 13:18
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522098602



##
File path: 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
##
@@ -535,20 +535,28 @@ Conflict management is left to the execution engine itself.
 
 | Option | Magic | Directory | Partitioned | Meaning | Default |
 ||---|---|-|-|-|
-| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file  at the end of each job | `true` |
+| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file on the successful completion of the job. | `true` |
+| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | `${hadoop.tmp.dir}/s3a` |
+| `fs.s3a.committer.magic.enabled` | X |  | | Enable "magic committer" support in the filesystem. | `false` |
+| `fs.s3a.committer.abort.pending.uploads` | X | X | X | List and abort all pending uploads under the destination path when the job is committed or aborted. | `true` |
 | `fs.s3a.committer.threads` | X | X | X | Number of threads in committers for parallel operations on files. | 8 |
-| `fs.s3a.committer.staging.conflict-mode` |  | X | X | Conflict resolution: `fail`, `append` or `replace` | `append` |
-| `fs.s3a.committer.staging.unique-filenames` |  | X | X | Generate unique filenames | `true` |
-| `fs.s3a.committer.magic.enabled` | X |  | | Enable "magic committer" support in the filesystem | `false` |
+| `fs.s3a.committer.generate.uuid` |  | X | X | Generate a Job UUID if none is passed down from Spark | `false` |
+| `fs.s3a.committer.require.uuid` |  | X | X | Require the Job UUID to be passed down from Spark | `false` |
 
 
+Staging committer (Directory and Partitioned) options
 
 
 | Option | Magic | Directory | Partitioned | Meaning | Default |
 ||---|---|-|-|-|
-| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | |
-| `fs.s3a.committer.staging.tmp.path` |  | X | X | Path in the cluster filesystem for temporary data | `tmp/staging` |
 
+| `fs.s3a.committer.staging.conflict-mode` |  | X | X | Conflict resolution: `fail`, `append` or `replace` | `append` |

Review comment:
   done. Also reviewed both tables and removed the columns about which 
committer supports what option; the tables are now split into common and 
staging options.






Issue Time Tracking
---

Worklog Id: (was: 510792)
Time Spent: 5h 40m  (was: 5.5h)


[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510790
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 13:14
Start Date: 12/Nov/20 13:14
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522096138



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
##
@@ -118,15 +114,14 @@ public StagingCommitter(Path outputPath,
 Configuration conf = getConf();
 this.uploadPartSize = conf.getLongBytes(
 MULTIPART_SIZE, DEFAULT_MULTIPART_SIZE);
-this.uuid = getUploadUUID(conf, context.getJobID());
 this.uniqueFilenames = conf.getBoolean(
 FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES,
 DEFAULT_STAGING_COMMITTER_UNIQUE_FILENAMES);
-setWorkPath(buildWorkPath(context, uuid));
+setWorkPath(buildWorkPath(context, this.getUUID()));

Review comment:
   relic of wrapping/pulling up the old code. Fixed. Also clarified the 
uuid javadocs now that SPARK-33402 is generating more unique job IDs







Issue Time Tracking
---

Worklog Id: (was: 510790)
Time Spent: 5.5h  (was: 5h 20m)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510788
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 13:10
Start Date: 12/Nov/20 13:10
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522093816



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java
##
@@ -68,7 +68,7 @@
   /**
* Serialization ID: {@value}.
*/
-  private static final long serialVersionUID = 507133045258460084L;
+  private static final long serialVersionUID = 507133045258460083L + VERSION;

Review comment:
   This is only for Java serialization, obviously. It's to make sure anyone 
(me) who might pass them around in Spark RDDs won't create serialization 
problems. FWIW, I use the JSON format in those cloud committer tests, primarily 
to verify committer name correctness
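The VERSION-derived serialVersionUID trick being discussed can be sketched like this (a minimal, hypothetical illustration; `SuccessDataSketch` and `uid()` are not the real `SuccessData` class):

```java
import java.io.Serializable;

// Minimal sketch: tie serialVersionUID to a format VERSION constant so that
// deliberately bumping VERSION also breaks Java serialization compatibility,
// making stale serialized copies fail fast instead of deserializing wrongly.
public class SuccessDataSketch implements Serializable {
    // Assumed format version; illustrative only.
    public static final int VERSION = 1;

    private static final long serialVersionUID = 507133045258460083L + VERSION;

    // Accessor so the derived value can be inspected.
    public static long uid() {
        return serialVersionUID;
    }

    public static void main(String[] args) {
        System.out.println(uid()); // prints 507133045258460084
    }
}
```

Any change to VERSION yields a different serialVersionUID, so deserializing an old copy throws `InvalidClassException` instead of silently producing a mismatched object.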







Issue Time Tracking
---

Worklog Id: (was: 510788)
Time Spent: 5h 20m  (was: 5h 10m)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510786
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 13:09
Start Date: 12/Nov/20 13:09
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522092840



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/InternalCommitterConstants.java
##
@@ -97,4 +97,30 @@ private InternalCommitterConstants() {
   /** Error message for a path without a magic element in the list: {@value}. 
*/
   public static final String E_NO_MAGIC_PATH_ELEMENT
   = "No " + MAGIC + " element in path";
+
+  /**
+   * The UUID for jobs: {@value}.
+   * This was historically created in Spark 1.x's SQL queries, but "went away".
+   */
+  public static final String SPARK_WRITE_UUID =
+  "spark.sql.sources.writeJobUUID";
+
+  /**
+   * The App ID for jobs: {@value}.
+   */
+  public static final String SPARK_APP_ID = "spark.app.id";

Review comment:
   Cut it. This was a very old property passed down by Spark.
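The surviving `spark.sql.sources.writeJobUUID` property above can be read roughly like this (a hedged sketch; `resolveUuid` and the `Properties` carrier are illustrative, not the committer's actual configuration API):

```java
import java.util.Optional;
import java.util.Properties;
import java.util.UUID;

// Sketch: resolve a job UUID from Spark's writeJobUUID property,
// generating a random UUID when it is absent or empty.
public class JobUuidSketch {
    static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

    static String resolveUuid(Properties conf) {
        return Optional.ofNullable(conf.getProperty(SPARK_WRITE_UUID))
            .filter(s -> !s.isEmpty())
            .orElseGet(() -> UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SPARK_WRITE_UUID, "test-uuid-0001");
        System.out.println(resolveUuid(conf)); // prints test-uuid-0001
    }
}
```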







Issue Time Tracking
---

Worklog Id: (was: 510786)
Time Spent: 5h 10m  (was: 5h)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510784
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 13:07
Start Date: 12/Nov/20 13:07
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522091672



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitConstants.java
##
@@ -264,4 +283,25 @@ private CommitConstants() {
   /** Extra Data key for task attempt in pendingset files. */
   public static final String TASK_ATTEMPT_ID = "task.attempt.id";
 
+  /**
+   * Require the spark UUID to be passed down: {@value}.
+   * This is to verify that SPARK-33230 has been applied to spark, and that
+   * {@link InternalCommitterConstants#SPARK_WRITE_UUID} is set.
+   * 
+   *   MUST ONLY BE SET WITH SPARK JOBS.
+   * 
+   */

Review comment:
   +1. adding two new constants and referring to them in the production 
code 







Issue Time Tracking
---

Worklog Id: (was: 510784)
Time Spent: 5h  (was: 4h 50m)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510782
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 13:03
Start Date: 12/Nov/20 13:03
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on a change in pull request 
#2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522089565



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
##
@@ -585,7 +589,8 @@ public BulkOperationState initiateOperation(final Path path,
   @Retries.RetryTranslated
   public UploadPartResult uploadPart(UploadPartRequest request)
   throws IOException {
-return retry("upload part",
+return retry("upload part #" + request.getPartNumber()
++ " upload "+ request.getUploadId(),

Review comment:
   This is the S3 multipart upload ID, so I'll use "upload ID" for it... it's 
also used in BlockOutputStream
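The enriched retry label amounts to simple string assembly like the following (a trivial sketch; `retryLabel` is an illustrative name, not the actual `WriteOperationHelper` API):

```java
// Sketch of the enriched retry/error label: include both the part number and
// the S3 multipart upload ID so a failed part can be correlated with its upload.
public class UploadLabelSketch {
    static String retryLabel(int partNumber, String uploadId) {
        return "upload part #" + partNumber + " upload " + uploadId;
    }

    public static void main(String[] args) {
        // prints: upload part #3 upload abc123
        System.out.println(retryLabel(3, "abc123"));
    }
}
```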







Issue Time Tracking
---

Worklog Id: (was: 510782)
Time Spent: 4h 50m  (was: 4h 40m)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510781
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 13:01
Start Date: 12/Nov/20 13:01
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724951069


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 11 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 54s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 32s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m  5s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 26s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 23s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  22m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  19m 34s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  root: The patch generated 
0 new + 48 unchanged - 1 fixed = 48 total (was 49)  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | 
[/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/artifact/out/whitespace-eol.txt)
 |  The patch has 2 line(s) that end in whitespace. Use git apply 
--whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  shadedclient  |  19m 47s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   4m 34s |  |  the patch passed  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  12m 32s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   1m 54s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 206m 11s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux fa46a7df8a67 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2522bf2f9b0 |
   | Default Java | Priv

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510775
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 12:30
Start Date: 12/Nov/20 12:30
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-726048699


   thanks. will go through comments and apply before merging





Issue Time Tracking
---

Worklog Id: (was: 510775)
Time Spent: 4.5h  (was: 4h 20m)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510774
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 12:29
Start Date: 12/Nov/20 12:29
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724917048









Issue Time Tracking
---

Worklog Id: (was: 510774)
Time Spent: 4h 20m  (was: 4h 10m)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510773
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 12:29
Start Date: 12/Nov/20 12:29
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724104455


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  9s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 11 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  2s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 24s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  22m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  19m 30s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 56s | 
[/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/diff-checkstyle-root.txt)
 |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 
49)  |
   | +1 :green_heart: |  mvnsite  |   2m 42s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  shadedclient  |  19m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   5m 24s |  |  the patch passed  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  11m 40s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  |   1m 49s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 205m 19s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.fs.s3a.commit.staging.TestStagingCommitter |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux c32f6d9525bc 4.15.0-112-generic #113-U

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510661
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 12/Nov/20 08:17
Start Date: 12/Nov/20 08:17
Worklog Time Spent: 10m 
  Work Description: mehakmeet commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r521770766



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
 }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+List<MultipartUpload> pending;
+try {
+  pending = getCommitOperations()
+  .listPendingUploadsUnderPath(path);
+} catch (IOException e) {
+  LOG.debug("Failed to list uploads under {}",
+  path, e);
+  return;
+}
+if (!pending.isEmpty()) {
+  // log a warning
+  LOG.warn("{} active upload(s) in progress under {}",
+  pending.size(),
+  path);
+  LOG.warn("Either jobs are running concurrently"
+  + " or failed jobs are not being cleaned up");
+  // and the paths + timestamps
+  DateFormat df = DateFormat.getDateTimeInstance();
+  pending.forEach(u ->
+  LOG.info("[{}] {}",
+  df.format(u.getInitiated()),
+  u.getKey()));
+  if (shouldAbortUploadsInCleanup()) {
+LOG.warn("This committer will abort these uploads in job cleanup");
+  }
+}
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * 
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * 
+   * 
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * 
+   * 
+   *   Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.
+   *   Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.
+   *   If enabled: Self-generated uuid.
+   *   If not disabled: Application ID

Review comment:
   nit: Would this be "If disabled"? Also, which property is being enabled or 
disabled here? If it is FS_S3A_COMMITTER_GENERATE_UUID, then I think we should 
mention it here too.
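The priority order under discussion can be sketched roughly as follows (hedged: the option names `fs.s3a.committer.uuid`, `fs.s3a.committer.generate.uuid`, and `fs.s3a.committer.require.uuid`, and the `buildJobUuid` helper, illustrate the described fallback chain rather than reproducing the production code):

```java
import java.util.Map;
import java.util.UUID;

// Sketch of the UUID resolution order under review: explicit committer UUID,
// then Spark's writeJobUUID, then (if generation is enabled) a random UUID,
// then the application ID unless that fallback has been disabled.
public class UuidPrioritySketch {
    static String buildJobUuid(Map<String, String> conf, String appId) {
        String v = conf.get("fs.s3a.committer.uuid");
        if (v != null) {
            return v;                          // 1. explicitly set job UUID
        }
        v = conf.get("spark.sql.sources.writeJobUUID");
        if (v != null) {
            return v;                          // 2. UUID passed down by Spark
        }
        if (Boolean.parseBoolean(
                conf.getOrDefault("fs.s3a.committer.generate.uuid", "false"))) {
            return UUID.randomUUID().toString();  // 3. self-generated
        }
        if (Boolean.parseBoolean(
                conf.getOrDefault("fs.s3a.committer.require.uuid", "false"))) {
            // 4a. app-ID fallback disabled: fail job setup
            throw new IllegalStateException(
                "No job UUID and the application-ID fallback is disabled");
        }
        return appId;                          // 4b. fall back to the app ID
    }

    public static void main(String[] args) {
        System.out.println(buildJobUuid(
            Map.of("spark.sql.sources.writeJobUUID", "job-42"), "app-1"));
        // prints job-42
    }
}
```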

##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
 }
   }
 
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+List pending;
+try {
+  pending = getCommitOperations()
+  .listPendingUploadsUnderPath(path);
+} catch (IOException e) {
+  LOG.debug("Failed to list uploads under {}",
+  path, e);
+  return;
+}
+if (!pending.isEmpty()) {
+  // log a warning
+  LOG.warn("{} active upload(s) in progress under {}",
+  pending.size(),
+  path);
+  LOG.warn("Either jobs are running concurrently"
+  + " or failed jobs are not being cleaned up");
+  // and the paths + timestamps
+  DateFormat df = DateFormat.getDateTimeInstance();
+  pending.forEach(u ->
+  LOG.info("[{}] {}",
+  df.format(u.getInitiated()),
+  u.getKey()));
+  if (shouldAbortUploadsInCleanup()) {
+LOG.warn("This committer will abort these uploads in job cleanup");
+  }
+}
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
+   * </ol>
+   * The UUID bonding takes place during construction;
+   * the staging committers use it to set up their wrapped
+   * committer to a path in the cluster FS which is unique to the
+   * job.
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * In {@link #setupJob(JobContext)} the job context's configuration
+   * will be patched
+   * be valid in all sequences where the job has been set up for the
+   * configuration passed in.
+   * <ol>
+   *   <li>If the option {@link CommitConst

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=510129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510129
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 11/Nov/20 06:49
Start Date: 11/Nov/20 06:49
Worklog Time Spent: 10m 
  Work Description: liuml07 commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r521130866



##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
##
@@ -585,7 +589,8 @@ public BulkOperationState initiateOperation(final Path path,
   @Retries.RetryTranslated
   public UploadPartResult uploadPart(UploadPartRequest request)
   throws IOException {
-return retry("upload part",
+return retry("upload part #" + request.getPartNumber()
++ " upload "+ request.getUploadId(),

Review comment:
   nit: s/upload/upload ID/
   
   I was thinking of consistent log keywords, so that for any retry log we can 
search for "upload ID" or "commit ID".

##
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/WriteOperationHelper.java
##
@@ -131,6 +131,8 @@ protected WriteOperationHelper(S3AFileSystem owner, 
Configuration conf) {
*/
   void operationRetried(String text, Exception ex, int retries,
   boolean idempotent) {
+LOG.info("{}: Retried {}: {}", retries, text, ex.toString());

Review comment:
   the order of parameters is wrong: `retries` fills the first `{}` where 
`text` was intended.

##
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java
##
@@ -1430,6 +1450,255 @@ public void testParallelJobsToAdjacentPaths() throws Throwable {
 
   }
 
+
+  /**
+   * Run two jobs with the same destination and different output paths.
+   * <p>
+   * This only works if the jobs are set to NOT delete all outstanding
+   * uploads under the destination path.
+   * <p>
+   * See HADOOP-17318.
+   */
+  @Test
+  public void testParallelJobsToSameDestination() throws Throwable {
+
+describe("Run two jobs to the same destination, assert they both complete");
+Configuration conf = getConfiguration();
+conf.setBoolean(FS_S3A_COMMITTER_ABORT_PENDING_UPLOADS, false);
+
+// this job has a job ID generated and set as the spark UUID;
+// the config is also set to require it.
+// This mimics the Spark setup process.
+
+String stage1Id = UUID.randomUUID().toString();
+conf.set(SPARK_WRITE_UUID, stage1Id);
+conf.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+
+// create the job and write data in its task attempt
+JobData jobData = startJob(true);
+Job job1 = jobData.job;
+AbstractS3ACommitter committer1 = jobData.committer;
+JobContext jContext1 = jobData.jContext;
+TaskAttemptContext tContext1 = jobData.tContext;
+Path job1TaskOutputFile = jobData.writtenTextPath;
+
+// the write path
+Assertions.assertThat(committer1.getWorkPath().toString())
+.describedAs("Work path of %s", committer1)
+.contains(stage1Id);
+// now build up a second job
+String jobId2 = randomJobId();
+
+// second job will use same ID
+String attempt2 = taskAttempt0.toString();
+TaskAttemptID taskAttempt2 = taskAttempt0;
+
+// create the second job
+Configuration c2 = unsetUUIDOptions(new JobConf(conf));
+c2.setBoolean(FS_S3A_COMMITTER_REQUIRE_UUID, true);
+Job job2 = newJob(outDir,
+c2,
+attempt2);
+Configuration conf2 = job2.getConfiguration();

Review comment:
   nit: maybe rename `conf2` to something like `jobConf2` to make it a bit 
clearer.

##
File path: 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
##
@@ -535,20 +535,28 @@ Conflict management is left to the execution engine 
itself.
 
 | Option | Magic | Directory | Partitioned | Meaning | Default |
 ||---|---|-|-|-|
-| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file at the end of each job | `true` |
+| `mapreduce.fileoutputcommitter.marksuccessfuljobs` | X | X | X | Write a `_SUCCESS` file on the successful completion of the job. | `true` |
+| `fs.s3a.buffer.dir` | X | X | X | Local filesystem directory for data being written and/or staged. | `${hadoop.tmp.dir}/s3a` |
+| `fs.s3a.committer.magic.enabled` | X |  | | Enable "magic committer" support in the filesystem. | `false` |
+| `fs.s3a.committer.abort.pending.uploads` | X | X | X | list and abort all pending uploads under the destination path when the job is committed or aborted. | `true` |
 | `fs.s3a.committer.threads` | X | X | X | Number of threads in committers for parallel operations on files. | 8 |
-| `fs.s3a.committer.staging.conflict-mode` |  | X | X | Conflict resolution: `fa

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509919
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 10/Nov/20 20:34
Start Date: 10/Nov/20 20:34
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724951030







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509919)
Time Spent: 3h 40m  (was: 3.5h)

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> 
>
> Key: HADOOP-17318
> URL: https://issues.apache.org/jira/browse/HADOOP-17318
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reported failure of magic committer block uploads as pending upload ID is 
> unknown. Likely cause: it's been aborted by another job
> # Make it possible to turn off cleanup of pending uploads in magic committer
> # log more about uploads being deleted in committers
> # and upload ID in the S3aBlockOutputStream errors
> There are other concurrency issues when you look close, see SPARK-33230
> * magic committer uses app attempt ID as path under __magic; if there are 
> duplicate then they will conflict
> * staging committer local temp dir uses app attempt id
> Fix will be to have a job UUID which for spark will be picked up from the 
> SPARK-33230 changes, (option to self-generate in job setup for hadoop 3.3.1+ 
> older spark builds); fall back to app-attempt *unless that fallback has been 
> disabled*
> MR: configure to use app attempt ID
> Spark: configure to fail job setup if app attempt ID is the source of a job 
> uuid



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509897
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 10/Nov/20 19:28
Start Date: 10/Nov/20 19:28
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724917048


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 11 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 55s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  27m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  25m 20s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m 52s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   3m  7s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 14s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 36s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m  6s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  24m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  24m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  20m 13s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 53s | 
[/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/diff-checkstyle-root.txt)
 |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 
49)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | 
[/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/whitespace-eol.txt)
 |  The patch has 2 line(s) that end in whitespace. Use git apply 
--whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  shadedclient  |  16m 51s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   4m 44s |  |  the patch passed  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  10m 44s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   1m 37s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 215m 31s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux 667abdab7b52 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool |

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509774
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 10/Nov/20 15:52
Start Date: 10/Nov/20 15:52
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724791317


   latest test run against s3 london, no s3guard; markers deleted (classic 
config). Everything, even the flaky read() tests passed!
   
-Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete  
-Dfs.s3a.directory.marker.audit=true -Dscale





Issue Time Tracking
---

Worklog Id: (was: 509774)
Time Spent: 3h 20m  (was: 3h 10m)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509771
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 10/Nov/20 15:45
Start Date: 10/Nov/20 15:45
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724786879


   some more detail for the watchers from my testing (hadoop-trunk + CDP spark 
2.4). I could not get spark master and hadoop trunk to build together this week.
   
   * RDD.saveAs needs to pass down the setting too 
[https://issues.apache.org/jira/browse/SPARK-33402](https://issues.apache.org/jira/browse/SPARK-33402)
   * I'm getting errors with FileSystem instantiation in Hive and the isolated 
classloader 
[https://issues.apache.org/jira/browse/HADOOP-17372](https://issues.apache.org/jira/browse/HADOOP-17372).
 
   
   I'm not going near that other than to add a para in troubleshooting.md 
saying "you're in classloader hell". Will need to be testing against spark 
master before worrying about WTF is going on there
   
   I'm also now worried that if anyone does >1 job with the same dest dir and 
overwrite=true, then there's a risk that you get the same duplicate app attempt 
ID race condition. It's tempting just to do something ambitious like use a 
random number to generate a timestamp for the cluster launch, or some 
random(year-month-day)+ seconds-of-day, so that this problem goes away almost 
completely
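   The collision risk described here comes from app IDs derived from wall-clock time: two drivers launched in the same second mint the same ID, while random job UUIDs do not collide in practice. A minimal sketch, where `fakeAppId` is an illustration of a time-derived ID scheme and not Spark's actual code:

```java
import java.time.Instant;
import java.util.UUID;

public class AppIdCollision {
    /** Illustrative time-based app ID, similar in spirit to Spark's fake IDs. */
    static String fakeAppId(Instant launchTime) {
        return "app-" + launchTime.getEpochSecond() + "-0001";
    }

    public static void main(String[] args) {
        Instant launch = Instant.ofEpochSecond(1_604_950_000L);
        // Two jobs launched in the same second: identical IDs, hence the race.
        System.out.println(fakeAppId(launch).equals(fakeAppId(launch)));
        // Random job UUIDs sidestep the problem almost entirely.
        System.out.println(UUID.randomUUID().equals(UUID.randomUUID()));
    }
}
```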





Issue Time Tracking
---

Worklog Id: (was: 509771)
Time Spent: 3h 10m  (was: 3h)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509393
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 09/Nov/20 22:28
Start Date: 09/Nov/20 22:28
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724316222


   Thank you for sharing, @steveloughran !





Issue Time Tracking
---

Worklog Id: (was: 509393)
Time Spent: 3h  (was: 2h 50m)







[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509312
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 09/Nov/20 19:17
Start Date: 09/Nov/20 19:17
Worklog Time Spent: 10m 
  Work Description: steveloughran edited a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724222044


   Running integration tests on this with spark + patch and the 3.4.0-SNAPSHOT 
builds. Ignoring compilation issues with spark trunk, hadoop-trunk, scala 
versions and scalatest, I'm running tests in 
[cloud-integration](https://github.com/hortonworks-spark/cloud-integration)
   
   ```
   S3AParquetPartitionSuite:
   2020-11-09 10:55:36,664 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3ACommitter (AbstractS3ACommitter.java:(180)) - Job 
UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  output.FileOutputCommitter (FileOutputCommitter.java:(141)) - File 
Output Committer Algorithm version is 1
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  output.FileOutputCommitter (FileOutputCommitter.java:(156)) - 
FileOutputCommitter skip cleanup _temporary folders under output 
directory:false, ignore cleanup failures: false
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3ACommitterFactory 
(S3ACommitterFactory.java:createTaskCommitter(83)) - Using committer directory 
to output data to 
s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3ACommitterFactory 
(AbstractS3ACommitterFactory.java:createOutputCommitter(54)) - Using Committer 
StagingCommitter{AbstractS3ACommitter{role=Task committer 
attempt_20201109105536__m_00_0, name=directory, 
outputPath=s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo,
 
workPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536__m_00_0/_temporary/0/_temporary/attempt_20201109105536__m_00_0,
 uuid='d6b6cd70-0303-46a6-8ff4-240dd14511d6', uuid 
source=JobUUIDSource{text='spark.sql.sources.writeJobUUID'}}, 
commitsDirectory=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads,
 uniqueFilenames=true, conflictResolution=APPEND. uploadPartSize=67108864, 
wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job_20201109105536_};
 taskId=attempt_20201109105536__m_00_0, status=''}; 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@759c53e5}; 
outputPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads,
 workPath=null, algorithmVersion=1, skipCleanup=false, 
ignoreCleanupFailures=false}} for 
s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,736 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  staging.DirectoryStagingCommitter 
(DirectoryStagingCommitter.java:setupJob(71)) - Conflict Resolution mode is 
APPEND
   2020-11-09 10:55:36,879 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3AC
   ```
   
   1. Spark is passing down a unique job ID (committer is configured to require 
it) ` Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source 
spark.sql.sources.writeJobUUID`
   1. This used for the local fs work path of the staging committer 
`file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536__m_00_0/_temporary/0/_temporary/attempt_20201109105536__m_00_0,`
  
   1. And for the cluster FS (which is file:// here)
   
`file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads`
   
   that is: spark is setting the UUID and the committer is picking it up and 
using as appropriate
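   The conclusion above can be cross-checked mechanically: every path the committer derives should embed the job UUID. The snippet below tests abbreviated forms of the two paths quoted from the log (the full paths continue past what is shown here):

```java
public class UuidInPaths {
    public static void main(String[] args) {
        String uuid = "d6b6cd70-0303-46a6-8ff4-240dd14511d6";
        // Abbreviated paths copied from the committer log above:
        // the staging committer's local work path and its cluster-FS
        // staging-uploads directory.
        String[] paths = {
            "file:/Users/stevel/Projects/sparkwork/cloud-integration/"
                + "cloud-examples/target/test/s3a/"
                + "d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536",
            "file:/Users/stevel/Projects/sparkwork/cloud-integration/"
                + "cloud-examples/tmp/staging/stevel/"
                + "d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads"
        };
        for (String p : paths) {
            System.out.println(p.contains(uuid));
        }
    }
}
```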





Issue Time Tracking
---

Wor

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509311
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 09/Nov/20 19:16
Start Date: 09/Nov/20 19:16
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724222044


   Running integration tests on this with spark + patch and the 3.4.0-SNAPSHOT 
builds. Ignoring compilation issues with spark trunk, hadoop-trunk, scala 
versions and scalatest, I'm running tests in 
[cloud-integration](https://github.com/hortonworks-spark/cloud-integration)
   
   ```
   S3AParquetPartitionSuite:
   2020-11-09 10:55:36,664 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3ACommitter (AbstractS3ACommitter.java:(180)) - Job 
UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  output.FileOutputCommitter (FileOutputCommitter.java:(141)) - File 
Output Committer Algorithm version is 1
   2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  output.FileOutputCommitter (FileOutputCommitter.java:(156)) - 
FileOutputCommitter skip cleanup _temporary folders under output 
directory:false, ignore cleanup failures: false
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3ACommitterFactory 
(S3ACommitterFactory.java:createTaskCommitter(83)) - Using committer directory 
to output data to 
s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3ACommitterFactory 
(AbstractS3ACommitterFactory.java:createOutputCommitter(54)) - Using Committer 
StagingCommitter{AbstractS3ACommitter{role=Task committer 
attempt_20201109105536__m_00_0, name=directory, 
outputPath=s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo,
 
workPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536__m_00_0/_temporary/0/_temporary/attempt_20201109105536__m_00_0,
 uuid='d6b6cd70-0303-46a6-8ff4-240dd14511d6', uuid 
source=JobUUIDSource{text='spark.sql.sources.writeJobUUID'}}, 
commitsDirectory=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads,
 uniqueFilenames=true, conflictResolution=APPEND. uploadPartSize=67108864, 
wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job_20201109105536_};
 taskId=attempt_20201109105536__m_00_0, status=''}; 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@759c53e5}; 
outputPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads,
 workPath=null, algorithmVersion=1, skipCleanup=false, 
ignoreCleanupFailures=false}} for 
s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
   2020-11-09 10:55:36,736 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  staging.DirectoryStagingCommitter 
(DirectoryStagingCommitter.java:setupJob(71)) - Conflict Resolution mode is 
APPEND
   2020-11-09 10:55:36,879 [ScalaTest-main-running-S3AParquetPartitionSuite] 
INFO  commit.AbstractS3AC
   ```
   
   1. Spark is passing down a unique job ID, and the committer is configured to require it: `Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID`
   1. This is used for the local FS work path of the staging committer: `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536__m_00_0/_temporary/0/_temporary/attempt_20201109105536__m_00_0`
   1. And for the cluster FS (which is `file://` here): `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads`
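Both paths embed the job UUID. As a rough sketch (illustrative helper names, not the actual Hadoop committer classes), the layout amounts to:

```java
// Illustrative sketch only: these helpers are NOT the Hadoop committer
// classes; they just show how embedding a per-job UUID in both paths
// keeps two jobs with the same app attempt ID from colliding.
public class WorkPathSketch {

    /** Local-FS work dir for a task attempt, keyed by job UUID (hypothetical). */
    static String localWorkPath(String baseDir, String jobUuid, String taskAttemptId) {
        return baseDir + "/" + jobUuid + "-" + taskAttemptId
            + "/_temporary/0/_temporary/" + taskAttemptId;
    }

    /** Cluster-FS dir for pending commit data, keyed by job UUID (hypothetical). */
    static String commitsDirectory(String stagingRoot, String user, String jobUuid) {
        return stagingRoot + "/" + user + "/" + jobUuid + "/staging-uploads";
    }

    public static void main(String[] args) {
        // Two jobs sharing a task attempt ID still get distinct directories.
        String a = localWorkPath("/tmp/s3a", "uuid-job-1", "attempt_0_m_00_0");
        String b = localWorkPath("/tmp/s3a", "uuid-job-2", "attempt_0_m_00_0");
        assert !a.equals(b);
    }
}
```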



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509311)
Time Spent: 2h 40m  (was: 2.5h)


[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509232
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 09/Nov/20 15:59
Start Date: 09/Nov/20 15:59
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724104455


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  9s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 11 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  23m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  2s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 24s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  22m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  19m 30s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 56s | 
[/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/diff-checkstyle-root.txt)
 |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 
49)  |
   | +1 :green_heart: |  mvnsite  |   2m 42s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  shadedclient  |  19m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  findbugs  |   5m 24s |  |  the patch passed  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  11m 40s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  |   1m 49s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 205m 19s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.fs.s3a.commit.staging.TestStagingCommitter |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2399 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
   | uname | Linux c32f6d9525bc 4.15.0-112-generic #113-Ubuntu SM

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509150
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 09/Nov/20 13:47
Start Date: 09/Nov/20 13:47
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724023990


   Test run with: -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep 
-Ds3guard -Ddynamo  -Dfs.s3a.directory.marker.audit=true -Dscale
   
   ```
   [INFO] 
   [ERROR] Failures: 
   [ERROR]   
ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferBeforeRead:63->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88
 failed to read expected number of bytes from stream. This may be transient 
expected:<1024> but was:<93>
   [ERROR]   
ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferOnClosedFile:83->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88
 failed to read expected number of bytes from stream. This may be transient 
expected:<1024> but was:<605>
   [INFO] 
   [ERROR] Tests run: 1379, Failures: 2, Errors: 0, Skipped: 153
   [INFO] 
   ```
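   Both failures report a single read returning fewer bytes than requested, which is legal `InputStream` behaviour; a sketch of the loop a validation path can use to tolerate such transient short reads (not the actual contract-test code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadFullySketch {

    /**
     * Read until len bytes arrive or EOF, looping over short reads;
     * returns the number of bytes actually read.
     */
    static int readFully(InputStream in, byte[] buf, int off, int len) {
        int total = 0;
        try {
            while (total < len) {
                int n = in.read(buf, off + total, len - total);
                if (n < 0) {
                    break;          // EOF before the buffer filled
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[1024];
        int got = readFully(new ByteArrayInputStream(new byte[1024]), buf, 0, 1024);
        assert got == 1024;         // a single read() call gives no such guarantee
    }
}
```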
   
   My next big piece of work is to run the tests in Spark itself





Issue Time Tracking
---

Worklog Id: (was: 509150)
Time Spent: 2h 20m  (was: 2h 10m)

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> 
>
> Key: HADOOP-17318
> URL: https://issues.apache.org/jira/browse/HADOOP-17318
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Reported failure of magic committer block uploads as pending upload ID is 
> unknown. Likely cause: it's been aborted by another job
> # Make it possible to turn off cleanup of pending uploads in magic committer
> # log more about uploads being deleted in committers
> # and upload ID in the S3aBlockOutputStream errors
> There are other concurrency issues when you look close, see SPARK-33230
> * magic committer uses app attempt ID as path under __magic; if there are 
> duplicate then they will conflict
> * staging committer local temp dir uses app attempt id
> Fix will be to have a job UUID which for spark will be picked up from the 
> SPARK-33230 changes, (option to self-generate in job setup for hadoop 3.3.1+ 
> older spark builds); fall back to app-attempt *unless that fallback has been 
> disabled*
> MR: configure to use app attempt ID
> Spark: configure to fail job setup if app attempt ID is the source of a job 
> uuid
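
The fallback order described above (Spark-supplied UUID, then optional self-generation, then app attempt ID unless that fallback is disabled) can be sketched as follows; the names are illustrative, not the real configuration keys:

```java
import java.util.UUID;

public class JobUuidSketch {

    /** Result pair: the chosen UUID and a label for where it came from. */
    static String[] pickJobUuid(String sparkWriteJobUuid,
                                boolean selfGenerate,
                                String appAttemptId,
                                boolean appAttemptFallbackDisabled) {
        if (sparkWriteJobUuid != null) {
            // Preferred: unique ID passed down by Spark (SPARK-33230).
            return new String[]{sparkWriteJobUuid, "spark.sql.sources.writeJobUUID"};
        }
        if (selfGenerate) {
            // Option for engines that do not supply a UUID of their own.
            return new String[]{UUID.randomUUID().toString(), "self-generated"};
        }
        if (appAttemptFallbackDisabled) {
            // Spark deployments can be configured to fail job setup here.
            throw new IllegalStateException(
                "No job UUID supplied and app attempt ID fallback is disabled");
        }
        // Legacy behaviour: app attempt ID, which may not be unique.
        return new String[]{appAttemptId, "app attempt ID"};
    }
}
```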



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=509149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509149
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 09/Nov/20 13:46
Start Date: 09/Nov/20 13:46
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720614677


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 56s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 11 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 58s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  26m 16s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  compile  |  20m 17s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 18s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +0 :ok: |  spotbugs  |   1m 31s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m 14s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  25m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | +1 :green_heart: |  javac  |  25m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10  |
   | +1 :green_heart: |  javac  |  22m 42s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   3m 23s | 
[/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-checkstyle-root.txt)
 |  root: The patch generated 4 new + 48 unchanged - 1 fixed = 52 total (was 
49)  |
   | +1 :green_heart: |  mvnsite  |   2m 56s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | 
[/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/whitespace-eol.txt)
 |  The patch has 5 line(s) that end in whitespace. Use git apply 
--whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  3s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  shadedclient  |  18m 16s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1  |
   | -1 :x: |  javadoc  |   0m 38s | 
[/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 
with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 generated 4 new + 
88 unchanged - 0 fixed = 92 total (was 88)  |
   | +1 :green_heart: |  findbugs  |   4m 18s |  |  the patch passed  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  11m  6s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |   1m 48s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/4/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=507048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-507048
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 03/Nov/20 14:15
Start Date: 03/Nov/20 14:15
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-719856113









Issue Time Tracking
---

Worklog Id: (was: 507048)
Time Spent: 2h  (was: 1h 50m)

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> 
>
> Key: HADOOP-17318
> URL: https://issues.apache.org/jira/browse/HADOOP-17318
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=506927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506927
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 03/Nov/20 14:01
Start Date: 03/Nov/20 14:01
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720614677



[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=506867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506867
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 03/Nov/20 13:54
Start Date: 03/Nov/20 13:54
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-720638320









Issue Time Tracking
---

Worklog Id: (was: 506867)
Time Spent: 1h 40m  (was: 1.5h)

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> 
>
> Key: HADOOP-17318
> URL: https://issues.apache.org/jira/browse/HADOOP-17318
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=505395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505395
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 27/Oct/20 20:16
Start Date: 27/Oct/20 20:16
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-717513018


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  7s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 8 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  11m 28s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 19s |  |  trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  compile  |  17m 59s |  |  trunk passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   2m 49s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  2s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +0 :ok: |  spotbugs  |   1m 10s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 22s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 42s |  |  the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | +1 :green_heart: |  javac  |  20m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01  |
   | +1 :green_heart: |  javac  |  18m 11s |  |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 46s | 
[/diff-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/diff-checkstyle-root.txt)
 |  root: The patch generated 8 new + 30 unchanged - 1 fixed = 38 total (was 
31)  |
   | +1 :green_heart: |  mvnsite  |   2m 12s |  |  the patch passed  |
   | -1 :x: |  whitespace  |   0m  0s | 
[/whitespace-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/whitespace-eol.txt)
 |  The patch has 5 line(s) that end in whitespace. Use git apply 
--whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  shadedclient  |  16m 37s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1  |
   | -1 :x: |  javadoc  |   0m 34s | 
[/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt)
 |  
hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 
with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 generated 6 new + 
88 unchanged - 0 fixed = 94 total (was 88)  |
   | +1 :green_heart: |  findbugs  |   3m 41s |  |  the patch passed  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   9m 42s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2399/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 36s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.

[jira] [Work logged] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-10-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?focusedWorklogId=505327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-505327
 ]

ASF GitHub Bot logged work on HADOOP-17318:
---

Author: ASF GitHub Bot
Created on: 27/Oct/20 17:16
Start Date: 27/Oct/20 17:16
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#issuecomment-717395438


   @dongjoon-hyun thanks... doing a bit more on this, as the more tests I write, the more corner cases surface. Think I'm in control now.





Issue Time Tracking
---

Worklog Id: (was: 505327)
Time Spent: 1h 20m  (was: 1h 10m)

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> 
>
> Key: HADOOP-17318
> URL: https://issues.apache.org/jira/browse/HADOOP-17318
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Reported failure of magic committer block uploads as pending upload ID is 
> unknown. Likely cause: it's been aborted by another job
> # Make it possible to turn off cleanup of pending uploads in magic committer
> # log more about uploads being deleted in committers
> # and upload ID in the S3aBlockOutputStream errors
> There are other concurrency issues when you look close, see SPARK-33230
> * magic committer uses app attempt ID as path under __magic; if there are 
> duplicate then they will conflict
> * staging committer local temp dir uses app attempt id
> Fix will be to have a job UUID which for spark will be picked up from the 
> SPARK-33230 changes, (option to self-generate in job setup for hadoop 3.3.1+ 
> older spark builds); fall back to app-attempt *unless that fallback has been 
> disabled*
> MR: configure to use app attempt ID
> Spark: configure to fail job setup if app attempt ID is the source of a job 
> uuid


