[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760980#comment-17760980 ]
ASF GitHub Bot commented on HADOOP-18797:
-----------------------------------------

hadoop-yetus commented on PR #6006:
URL: https://github.com/apache/hadoop/pull/6006#issuecomment-1701170289

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 59s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 7 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 48m 19s | | trunk passed |
| +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 31s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 29s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 31s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 6s | | trunk passed |
| -1 :x: | shadedclient | 37m 24s | | branch has errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 27s | | the patch passed |
| +1 :green_heart: | compile | 0m 31s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 25s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 18s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 29s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 15s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 4s | | the patch passed |
| -1 :x: | shadedclient | 37m 23s | | patch has errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 43s | | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
| | | 139m 43s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6006/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6006 |
| JIRA Issue | HADOOP-18797 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 8d0521911b6b 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d99eb99afc7f2c7c599ad09b466f340e3812bcbe |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6006/2/testReport/ |
| Max. process+thread count | 456 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6006/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> S3A committer fix lost data on concurrent jobs
> ----------------------------------------------
>
>                 Key: HADOOP-18797
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18797
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Emanuel Velzi
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>
> There is a failure in the commit process when multiple jobs are writing to an
> S3 directory *concurrently* using {*}magic committers{*}.
> This issue is closely related to HADOOP-17318.
> When multiple Spark jobs write to the same S3A directory, they upload files
> simultaneously using "__magic" as the base directory for staging. Inside this
> directory there are multiple "/job-some-uuid" directories, each representing
> a concurrently running job.
> To fix some problems related to concurrency, a property was introduced in
> the previous fix: "spark.hadoop.fs.s3a.committer.abort.pending.uploads". When
> set to false, it ensures that during the cleanup stage, finalizing jobs do
> not abort pending uploads from other jobs.
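> For illustration, a job could opt into this behaviour programmatically along
> the lines of the sketch below. The abort.pending.uploads key is the
> Hadoop-side name of the Spark property quoted above, and
> "fs.s3a.committer.name" selects the magic committer; the wrapper class and
> method are made up for this example:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class ConcurrentCommitterConf {
>   /** Build a configuration for jobs that share one output directory. */
>   public static Configuration concurrentSafeConf() {
>     Configuration conf = new Configuration();
>     // Use the magic committer for s3a output.
>     conf.set("fs.s3a.committer.name", "magic");
>     // Do not abort other jobs' pending multipart uploads during cleanup.
>     conf.setBoolean("fs.s3a.committer.abort.pending.uploads", false);
>     return conf;
>   }
> } {code}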
> So we see this line in the logs:
> {code:java}
> DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up
> pending uploads to s3a ...{code}
> (from
> [AbstractS3ACommitter.java#L952|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952])
> However, in the next step, the {*}"__magic" directory is recursively
> deleted{*}:
> {code:java}
> INFO [main] o.a.h.fs.s3a.commit.magic.MagicS3GuardCommitter (98): Deleting
> magic directory s3a://my-bucket/my-table/__magic: duration 0:00.560s {code}
> (from
> [AbstractS3ACommitter.java#L1112|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1112]
> and
> [MagicS3GuardCommitter.java#L137|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L137])
> This deletion operation *affects the second job*, which is still running,
> because it loses its pending uploads (i.e., the ".pendingset" and ".pending"
> files). The consequences can range from an exception in the best case to a
> silent loss of data in the worst case. The latter occurs when Job_1 deletes
> these files just before Job_2 executes "listPendingUploadsToCommit" to list
> the ".pendingset" files in its job attempt directory, before completing the
> uploads with POST requests.
> To resolve this issue, it's important {*}to ensure that only the prefix
> associated with the job currently finalizing is cleaned{*}.
> Here's a possible solution:
> {code:java}
> /**
>  * Delete only this job's magic directory, not the shared "__magic" root.
>  */
> public void cleanupStagingDirs() {
>   final Path out = getOutputPath();
>   // Previously the whole magic subdirectory was deleted:
>   // Path path = magicSubdir(getOutputPath());
>   // Restrict the delete to this job's own prefix under __magic:
>   Path path = new Path(magicSubdir(out), formatJobDir(getUUID()));
>   try (DurationInfo ignored = new DurationInfo(LOG, true,
>       "Deleting magic directory %s", path)) {
>     Invoker.ignoreIOExceptions(LOG, "cleanup magic directory",
>         path.toString(),
>         () -> deleteWithWarning(getDestFS(), path, true));
>   }
> } {code}
> The side effect of this change is that the "__magic" directory itself is never
> cleaned up. However, I believe this is a minor concern, given that other
> artifacts such as "_SUCCESS" also persist after jobs end.
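> To make the scope of that per-job cleanup concrete, here is a small sketch of
> the prefix it targets. The "__magic" and "job-<uuid>" names follow the layout
> described above; the helper class below is only illustrative:
> {code:java}
> import org.apache.hadoop.fs.Path;
>
> public class MagicPathSketch {
>   /** Compute the per-job magic prefix that cleanup should remove. */
>   public static Path perJobMagicDir(Path outputPath, String jobUuid) {
>     // Shared by every concurrent job writing to this output path:
>     Path magicRoot = new Path(outputPath, "__magic");
>     // Owned by one job only; holds its ".pending"/".pendingset" files.
>     // Cleanup should delete this prefix, never magicRoot, so the other
>     // jobs' pending uploads survive until they commit.
>     return new Path(magicRoot, "job-" + jobUuid);
>   }
> } {code}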