[ https://issues.apache.org/jira/browse/HADOOP-18793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741189#comment-17741189 ]
Emanuel Velzi edited comment on HADOOP-18793 at 7/7/23 8:54 PM:
----------------------------------------------------------------

Hi [~ste...@apache.org]

I have a similar issue related to the method *cleanupStagingDirs()* when using *magic committers*.

We have two Spark jobs writing to the same s3a directory, and we set the property *spark.hadoop.fs.s3a.committer.abort.pending.uploads=false*.

So we see this line in the logs:

  DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up pending uploads to s3a ...

(from [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952])

But we also see in the logs that when the first job finishes, *the __magic directory is deleted*:

  INFO [main] o.a.h.fs.s3a.commit.magic.MagicS3GuardCommitter (98): Deleting magic directory s3a://my-bucket/my-table/__magic: duration 0:00.560s

(from [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L137])

I'm not sure, but I think this is affecting the second job, which is still running.

Could the fact that /__magic *is deleted recursively* (including all subdirectories such as /__magic/job-1, /__magic/job-2, ...) *be a problem?*


> S3A StagingCommitter does not clean up staging-uploads directory
> ----------------------------------------------------------------
>
>                 Key: HADOOP-18793
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18793
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.2
>            Reporter: Harunobu Daikoku
>            Priority: Minor
>              Labels: pull-request-available
>
> When setting up StagingCommitter and its internal FileOutputCommitter, a
> temporary directory that holds MPU information will be created on the default
> FS, which by default is to be
> /user/${USER}/tmp/staging/${USER}/${UUID}/staging-uploads.
> On a successful job commit, its child directory (_temporary) will be [cleaned
> up|https://github.com/apache/hadoop/blob/a36d8adfd18e88f2752f4387ac4497aadd3a74e7/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java#L516]
> properly, but ${UUID}/staging-uploads will remain.
> This will result in having too many empty ${UUID}/staging-uploads directories
> under /user/${USER}/tmp/staging/${USER}, and will eventually cause an issue
> in an environment where the max number of items in a directory is capped (e.g.
> by dfs.namenode.fs-limits.max-directory-items in HDFS).
> {noformat}
> The directory item limit of /user/${USER}/tmp/staging/${USER} is exceeded: limit=1048576 items=1048576
>       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1205)
> {noformat}
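
To make the question in the comment about the recursive delete of /__magic concrete, here is a minimal sketch that models the bucket as an in-memory dictionary. It is not Hadoop code: `delete_recursive`, `new_store`, and the `.pending` file names are hypothetical, and the key layout only mirrors the /__magic/job-1 and /__magic/job-2 paths mentioned in the comment. It shows what a recursive delete of the shared parent would do to a second job's files, compared with a delete scoped to one job's subdirectory. Whether the real committer behaves this way for concurrent jobs is exactly what the comment asks, so treat this as an illustration of the concern, not a confirmation.

```python
# Hypothetical in-memory model of an object store: key -> payload.
# None of these names are Hadoop APIs; the layout is an assumption.


def delete_recursive(store, prefix):
    """Remove every key under prefix, like a recursive directory delete."""
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    return len(doomed)


def new_store():
    """Two jobs with pending work under the same __magic parent."""
    return {
        "my-table/__magic/job-1/task-0.pending": "upload metadata of job 1",
        "my-table/__magic/job-2/task-0.pending": "upload metadata of job 2",
        "my-table/part-0000.parquet": "committed data",
    }


# Case 1: job 1 finishes and deletes the whole shared __magic directory.
shared = new_store()
delete_recursive(shared, "my-table/__magic/")
job2_survives_shared_cleanup = any("/job-2/" in key for key in shared)

# Case 2: job 1 finishes and deletes only its own subdirectory.
scoped = new_store()
delete_recursive(scoped, "my-table/__magic/job-1/")
job2_survives_scoped_cleanup = any("/job-2/" in key for key in scoped)

print(job2_survives_shared_cleanup, job2_survives_scoped_cleanup)
```

In this model the first case removes job 2's pending file along with job 1's, while the second leaves it in place; committed data outside /__magic is untouched in both cases.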