[jira] [Commented] (HADOOP-19189) ITestS3ACommitterFactory failing
[ https://issues.apache.org/jira/browse/HADOOP-19189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880507#comment-17880507 ]

Syed Shameerur Rahman commented on HADOOP-19189:

[~ste...@apache.org] - I notice that the PR is merged, yet the Jira is unresolved.

> ITestS3ACommitterFactory failing
>
> Key: HADOOP-19189
> URL: https://issues.apache.org/jira/browse/HADOOP-19189
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, test
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Priority: Minor
> Labels: pull-request-available
>
> We've had ITestS3ACommitterFactory failing for a while; it looks like
> changed committer settings aren't being picked up.
> {code}
> [ERROR]
> ITestS3ACommitterFactory.testEverything:115->testInvalidFileBinding:165
> Expected a org.apache.hadoop.fs.s3a.commit.PathCommitException to be thrown,
> but got the result: :
> FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl
> {code}
> I've spent some time looking at it, and it is happening because the test sets
> the filesystem ref for the local test fs, not that of the filesystem
> created by the committer, which is where the option is picked up.
> I've tried to parameterize it, but things are still playing up and I'm not
> sure how hard to try to fix.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19221) S3A: Unable to recover from failure of multipart block upload attempt "Status Code: 400; Error Code: RequestTimeout"
[ https://issues.apache.org/jira/browse/HADOOP-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868533#comment-17868533 ]

Syed Shameerur Rahman commented on HADOOP-19221:

[~ste...@apache.org] - Great analysis and a good catch. Sure, I will review the PR.

> S3A: Unable to recover from failure of multipart block upload attempt "Status
> Code: 400; Error Code: RequestTimeout"
>
> Key: HADOOP-19221
> URL: https://issues.apache.org/jira/browse/HADOOP-19221
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> If a multipart PUT request fails for some reason (e.g. network error), then
> all subsequent retry attempts fail with a 400 response and error code
> RequestTimeout.
> {code}
> Your socket connection to the server was not read from or written to within
> the timeout period. Idle connections will be closed. (Service: Amazon S3;
> Status Code: 400; Error Code: RequestTimeout; Request ID:; S3 Extended
> Request ID:
> {code}
> The list of suppressed exceptions contains the root cause (the initial
> failure was a 500); all retries failed to upload properly from the source
> input stream {{RequestBody.fromInputStream(fileStream, size)}}.
> Hypothesis: the mark/reset stuff doesn't work for input streams. On the v1
> SDK we would build a multipart block upload request passing in (file, offset,
> length); the way we are now doing this doesn't recover.
> Probably fixable by providing our own {{ContentStreamProvider}}
> implementations for
> # file + offset + length
> # bytebuffer
> # byte array
> The SDK does have explicit support for the memory ones, but they copy the
> data blocks first. We don't want that, as it would double the memory
> requirements of active blocks.
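The recovery idea above (replace the one-shot input stream with something that can be replayed per attempt) can be sketched in isolation. Everything below is illustrative: `ReplayableStreamProvider` and `FileBlockStreamProvider` are hypothetical names, not the AWS SDK's actual `ContentStreamProvider` interface, and a production version would stream from the file rather than buffer the whole block in memory:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/**
 * Illustrative sketch only: a provider that can hand out a fresh
 * InputStream over (file, offset, length) each time it is asked, so a
 * retried upload re-reads the block from its start instead of resuming
 * a half-consumed stream.
 */
interface ReplayableStreamProvider {
    InputStream newStream() throws IOException;
}

final class FileBlockStreamProvider implements ReplayableStreamProvider {
    private final String path;
    private final long offset;
    private final int length;

    FileBlockStreamProvider(String path, long offset, int length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public InputStream newStream() throws IOException {
        // Re-open and re-seek on every call: each retry sees the full block.
        byte[] block = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            raf.readFully(block);
        }
        return new ByteArrayInputStream(block);
    }
}
```

Because `newStream()` always re-reads the block from its start, a retried PUT never sees a partially consumed stream, which is the non-recoverable failure mode described in the report.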
[jira] [Commented] (HADOOP-18708) AWS SDK V2 - Implement CSE
[ https://issues.apache.org/jira/browse/HADOOP-18708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854368#comment-17854368 ]

Syed Shameerur Rahman commented on HADOOP-18708:

[~ste...@apache.org] - I have created a first-cut PR and would like to get your review: https://github.com/apache/hadoop/pull/6884

> AWS SDK V2 - Implement CSE
>
> Key: HADOOP-18708
> URL: https://issues.apache.org/jira/browse/HADOOP-18708
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Ahmar Suhail
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
>
> The S3 Encryption Client for SDK V2 is now available, so add client-side
> encryption back in.
[jira] [Assigned] (HADOOP-18708) AWS SDK V2 - Implement CSE
[ https://issues.apache.org/jira/browse/HADOOP-18708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Shameerur Rahman reassigned HADOOP-18708:

Assignee: Syed Shameerur Rahman (was: Ahmar Suhail)
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827768#comment-17827768 ]

Syed Shameerur Rahman commented on HADOOP-19091:

Ok, [~vnarayanan7], please feel free to raise a PR for the same.

> Add support for Tez to MagicS3GuardCommitter
>
> Key: HADOOP-19091
> URL: https://issues.apache.org/jira/browse/HADOOP-19091
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.3.6
> Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0
> Reporter: Venkatasubrahmanian Narayanan
> Assignee: Venkatasubrahmanian Narayanan
> Priority: Major
> Attachments: 0001-AWS-Hive-Changes.patch,
> 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch,
> HADOOP-19091-HIVE-WIP.patch
>
> The MagicS3GuardCommitter assumes that the JobID of the task is the same as
> that of the job's application master when writing/reading the .pendingset
> file. This assumption is not valid when running with Tez, which creates
> slightly different JobIDs for tasks and the application master.
>
> While the MagicS3GuardCommitter is intended only for MRv2, it mostly works
> fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run
> in MR mode. This issue only crops up when running queries with the Tez
> execution engine. I can upload a patch to Hive 3.1 to reproduce this error on
> EMR if needed.
>
> Fixing this will probably require work from both Tez and Hadoop; wanted to
> start a discussion here so we can figure out how exactly we go about this.
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823486#comment-17823486 ]

Syed Shameerur Rahman commented on HADOOP-19091:

[~vnarayanan7] - Thanks for the logs with the example. Please feel free to raise the PR with the required changes; I can help with the review. Is it possible to scope the changes down to Tez only by setting `fs.s3a.committer.uuid` appropriately?
[jira] [Comment Edited] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823030#comment-17823030 ]

Syed Shameerur Rahman edited comment on HADOOP-19091 at 3/4/24 4:31 AM:

[~vnarayanan7] - Could you please share the complete error stacktrace? As I can see from the code implementation, during the commitJob operation the [listPendingUploadToCommit|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L124] method is invoked, which lists all the files under the jobAttemptPath with the suffix `.pendingset`. So, as per that logic, my understanding is that the individual file name under the jobAttemptPath should not be a concern here.

was (Author: srahman):
[~vnarayanan7] - Could you please share the complete error stacktrace? As I can see from the code implementation, during the commitJob operation the [listPendingUploadToCommit|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L124] method is invoked, which lists all the files under the jobAttemptPath with the suffix `.pendingset`. If so, what is the value returned by getJobAttemptPath? What I understand from your comment is that getJobAttemptPath is not returning the correct value (for Hive/Pig with Tez), and hence commitJob is not able to read the commit metadata. Is my understanding correct?
[jira] [Comment Edited] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823030#comment-17823030 ]

Syed Shameerur Rahman edited comment on HADOOP-19091 at 3/4/24 4:30 AM:

[~vnarayanan7] - Could you please share the complete error stacktrace? As I can see from the code implementation, during the commitJob operation the [listPendingUploadToCommit|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L124] method is invoked, which lists all the files under the jobAttemptPath with the suffix `.pendingset`. If so, what is the value returned by getJobAttemptPath? What I understand from your comment is that getJobAttemptPath is not returning the correct value (for Hive/Pig with Tez), and hence commitJob is not able to read the commit metadata. Is my understanding correct?

was (Author: srahman):
[~vnarayanan7] - Could you please share the complete error stacktrace? As I can see from the code implementation, during the commitJob operation the [listPendingUploadToCommit|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L124] method is invoked, which lists all the files under the jobAttemptPath with the suffix `.pendingset`. What I understand from your comment is that getJobAttemptPath is not returning the correct value (for Hive/Pig with Tez), and hence commitJob is not able to read the commit metadata. Is my understanding correct?
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823030#comment-17823030 ]

Syed Shameerur Rahman commented on HADOOP-19091:

[~vnarayanan7] - Could you please share the complete error stacktrace? As I can see from the code implementation, during the commitJob operation the [listPendingUploadToCommit|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L124] method is invoked, which lists all the files under the jobAttemptPath with the suffix `.pendingset`. What I understand from your comment is that getJobAttemptPath is not returning the correct value (for Hive/Pig with Tez), and hence commitJob is not able to read the commit metadata. Is my understanding correct?
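The commitJob behaviour discussed in the comments above boils down to a suffix-filtered scan rooted at the job attempt path. A minimal local-filesystem sketch (illustrative only; the real committer lists S3 objects through the Hadoop FileSystem API, not java.nio, and `PendingSetLister` is a hypothetical name):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Illustrative only: collect every ".pendingset" file under a job attempt dir. */
final class PendingSetLister {
    static List<Path> listPendingSets(Path jobAttemptPath) throws IOException {
        List<Path> result = new ArrayList<>();
        // Walk the tree and keep only entries with the ".pendingset" suffix,
        // mirroring the suffix filter used by listPendingUploadsToCommit.
        try (Stream<Path> entries = Files.walk(jobAttemptPath)) {
            entries.filter(p -> p.toString().endsWith(".pendingset"))
                   .forEach(result::add);
        }
        return result;
    }
}
```

Because the scan is rooted at the job attempt path, a task whose (Tez-generated) JobID resolves to a different path writes its `.pendingset` file where the application master never looks, which matches the mismatch described in the issue.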
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821935#comment-17821935 ]

Syed Shameerur Rahman commented on HADOOP-19091:

[~vnarayanan7] - Can you share the required logs (with DEBUG enabled) if possible? It will give some more clarity.
[jira] [Comment Edited] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820967#comment-17820967 ]

Syed Shameerur Rahman edited comment on HADOOP-19091 at 2/27/24 5:35 AM:

[~vnarayanan7] - I am not sure why MagicS3GuardCommitter won't work with Tez. In the past, I vaguely remember running MagicS3GuardCommitter with Hive 3.1.3 + Tez 0.9.2 (by incorporating the changes mentioned in https://issues.apache.org/jira/browse/HIVE-16295). It would be really helpful if you could share the replication steps for the same.

was (Author: srahman):
[~vnarayanan7] - I am not sure why MagicS3GuardCommitter won't work with Tez. In the past, I vaguely remember running MagicS3GuardCommitter with Hive 3.1.3 and Tez 0.9.2 (by incorporating the changes mentioned in https://issues.apache.org/jira/browse/HIVE-16295). It would be really helpful if you could share the replication steps for the same.
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820967#comment-17820967 ]

Syed Shameerur Rahman commented on HADOOP-19091:

[~vnarayanan7] - I am not sure why MagicS3GuardCommitter won't work with Tez. In the past, I vaguely remember running MagicS3GuardCommitter with Hive 3.1.3 and Tez 0.9.2 (by incorporating the changes mentioned in https://issues.apache.org/jira/browse/HIVE-16295). It would be really helpful if you could share the replication steps for the same.
[jira] [Commented] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits
[ https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812770#comment-17812770 ]

Syed Shameerur Rahman commented on HADOOP-19047:

[~ste...@apache.org] - Gentle reminder: could you please review the changes?

> Support InMemory Tracking Of S3A Magic Commits
>
> Key: HADOOP-19047
> URL: https://issues.apache.org/jira/browse/HADOOP-19047
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
>
> The following are the operations which happen within a Task when it uses the
> S3A Magic Committer.
>
> *During closing of stream*
> 1. A 0-byte file with the same name as the original file is uploaded to S3
> using a PUT operation. Refer
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152]
> for more information. This is done so that a downstream application like
> Spark can get the size of the file which is being written.
> 2. MultiPartUpload (MPU) metadata is uploaded to S3. Refer
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176]
> for more information.
>
> *During TaskCommit*
> 1. All the MPU metadata which the task wrote to S3 (there will be 'x'
> metadata files in S3 if a single task writes to 'x' files) are read and
> rewritten to S3 as a single metadata file. Refer
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201]
> for more information.
>
> Since these operations happen within the Task JVM, we could optimize as well
> as save cost by storing this information in memory when Task memory usage is
> not a constraint. Hence the proposal here is to introduce a new MagicCommit
> Tracker called "InMemoryMagicCommitTracker" which will:
> 1. Store the metadata of the MPU in memory till the Task is committed.
> 2. Store the size of the file, which can be used by the downstream
> application to get the file size before it is committed/visible at the
> output path.
> This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call
> given a Task writes only 1 file.
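The proposal above can be sketched as a small in-memory registry held in the task JVM. All names below are illustrative assumptions (this is not the actual InMemoryMagicCommitTracker, and the real tracker stores richer MPU metadata than upload IDs):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Illustrative sketch: keep per-path MPU metadata and file length in
 * task-JVM memory instead of writing marker/metadata objects to S3.
 */
final class InMemoryCommitTrackerSketch {
    // path -> upload ids recorded for that path, held until task commit
    private static final Map<String, List<String>> uploads = new ConcurrentHashMap<>();
    // path -> bytes written, so callers can see the length pre-commit
    private static final Map<String, Long> lengths = new ConcurrentHashMap<>();

    static void recordUpload(String path, String uploadId, long bytes) {
        uploads.computeIfAbsent(path, k -> new CopyOnWriteArrayList<>()).add(uploadId);
        lengths.put(path, bytes);
    }

    static long lengthOf(String path) {
        return lengths.getOrDefault(path, 0L);
    }

    static Map<String, List<String>> drainForTaskCommit() {
        // At task commit, all tracked uploads are aggregated in one pass,
        // saving the per-file PUT/LIST/GET round trips described above.
        Map<String, List<String>> snapshot = new ConcurrentHashMap<>(uploads);
        uploads.clear();
        lengths.clear();
        return snapshot;
    }
}
```

A real implementation also has to keep the tracked uploads abortable on task failure, and it only works under the constraint noted above: the committing code must run in the same JVM (and within the memory budget) of the task that wrote the data.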
[jira] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits
[ https://issues.apache.org/jira/browse/HADOOP-19047 ]

Syed Shameerur Rahman deleted comment on HADOOP-19047:

was (Author: srahman):
[~ste...@apache.org] I have converted the draft PR to the final version. Could you please review the same?
[jira] [Commented] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits
[ https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812104#comment-17812104 ]

Syed Shameerur Rahman commented on HADOOP-19047:

[~ste...@apache.org] I have converted the draft PR to the final version. Could you please review the same?
[jira] [Updated] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits
[ https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-19047: --- Description: The following are the operations that happen within a Task when it uses the S3A Magic Committer. *During closing of stream* 1. A 0-byte file with the same name as the original file is uploaded to S3 using a PUT operation. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152] for more information. This is done so that a downstream application such as Spark can get the size of the file being written. 2. MultiPartUpload (MPU) metadata is uploaded to S3. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176] for more information. *During TaskCommit* 1. All the MPU metadata which the task wrote to S3 (there will be 'x' metadata files in S3 if a single task writes to 'x' files) are read and rewritten to S3 as a single metadata file. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201] for more information. Since these operations happen within the Task JVM, we can optimize them as well as save cost by storing this information in memory when Task memory usage is not a constraint. Hence the proposal here is to introduce a new MagicCommit Tracker called "InMemoryMagicCommitTracker" which will: 1. Store the MPU metadata in memory until the Task is committed. 2. Store the size of the file, which can be used by a downstream application to get the file size before it is committed/visible at the output path. This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call given a Task writes only 1 file. 
was: The following are the operations which happens within a Task when it uses S3A Magic Committer. *During the closing of stream* 1. A 0-byte file with a same name of the original file is uploaded to S3 using PUT operation. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152] for more information. This is done so that the downstream application like Spark could get the size of the file which is being written. 2. MultiPartUpload(MPU) metadata is uploaded to S3. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176] for more information. *During TaskCommit* 1. All the MPU metadata which the task wrote to S3 (There will be 'x' number of metadata file in S3 if a single task writes to 'x' files) are read and rewritten to S3 as a single metadata file. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201] for more information Since these operations happens with the Task JVM, We could optimize as well as save cost by storing these information in memory when Task memory usage is not a constraint. Hence the proposal here is to introduce a new MagicCommit Tracker called "InMemoryMagicCommitTracker" which will store the 1. Metadata of MPU in memory till the Task is committed 2. Store the size of the file which can be used by the downstream application to get the file size before it is committed/visible to the output path. This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call given a Task writes only 1 file. 
> Support InMemory Tracking Of S3A Magic Commits > -- > > Key: HADOOP-19047 > URL: https://issues.apache.org/jira/browse/HADOOP-19047 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available
[jira] [Commented] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits
[ https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808588#comment-17808588 ] Syed Shameerur Rahman commented on HADOOP-19047: I have created a draft PR [https://github.com/apache/hadoop/pull/6468/files] for the approach. [~ste...@apache.org] Could you please review the approach?
[jira] [Created] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits
Syed Shameerur Rahman created HADOOP-19047: -- Summary: Support InMemory Tracking Of S3A Magic Commits Key: HADOOP-19047 URL: https://issues.apache.org/jira/browse/HADOOP-19047 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Reporter: Syed Shameerur Rahman Assignee: Syed Shameerur Rahman The following are the operations that happen within a Task when it uses the S3A Magic Committer. *During the closing of stream* 1. A 0-byte file with the same name as the original file is uploaded to S3 using a PUT operation. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152] for more information. This is done so that a downstream application such as Spark can get the size of the file being written. 2. MultiPartUpload (MPU) metadata is uploaded to S3. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176] for more information. *During TaskCommit* 1. All the MPU metadata which the task wrote to S3 (there will be 'x' metadata files in S3 if a single task writes to 'x' files) are read and rewritten to S3 as a single metadata file. Refer [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201] for more information. Since these operations happen within the Task JVM, we can optimize them as well as save cost by storing this information in memory when Task memory usage is not a constraint. Hence the proposal here is to introduce a new MagicCommit Tracker called "InMemoryMagicCommitTracker" which will: 1. Store the MPU metadata in memory until the Task is committed. 2. Store the size of the file, which can be used by a downstream application to get the file size before it is committed/visible at the output path. This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call given a Task writes only 1 file.
[jira] [Commented] (HADOOP-18797) Support Concurrent Writes With S3A Magic Committer
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769643#comment-17769643 ] Syed Shameerur Rahman commented on HADOOP-18797: PR for branch-3.3: https://github.com/apache/hadoop/pull/6122 > Support Concurrent Writes With S3A Magic Committer > -- > > Key: HADOOP-18797 > URL: https://issues.apache.org/jira/browse/HADOOP-18797 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Reporter: Emanuel Velzi >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > There is a failure in the commit process when multiple jobs write to an > S3 directory *concurrently* using {*}magic committers{*}. > This issue is closely related to HADOOP-17318. > When multiple Spark jobs write to the same S3A directory, they upload files > simultaneously using "__magic" as the base directory for staging. Inside this > directory there are multiple "/job-some-uuid" directories, each representing > a concurrently running job. > To fix some problems related to concurrency, a property was introduced in > the previous fix: "spark.hadoop.fs.s3a.committer.abort.pending.uploads". When > set to false, it ensures that during the cleanup stage, finalizing jobs do > not abort pending uploads from other jobs. 
So we see this line in the logs: > {code:java} > DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up > pending uploads to s3a ...{code} > (from > [AbstractS3ACommitter.java#L952|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952]) > However, in the next step the {*}"__magic" directory is recursively > deleted{*}: > {code:java} > INFO [main] o.a.h.fs.s3a.commit.magic.MagicS3GuardCommitter (98): Deleting > magic directory s3a://my-bucket/my-table/__magic: duration 0:00.560s {code} > (from [AbstractS3ACommitter.java#L1112|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1112] and > [MagicS3GuardCommitter.java#L137|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L137]) > This deletion operation *affects the second job* that is still running, > because it loses its pending uploads (i.e., ".pendingset" and ".pending" files). > The consequences range from an exception in the best case to silent > data loss in the worst case. The latter occurs when Job_1 deletes files > just before Job_2 executes "listPendingUploadsToCommit" to list ".pendingset" > files in the job attempt directory, prior to completing the uploads with POST > requests. > To resolve this issue, it is important {*}to ensure that only the prefix > associated with the job currently finalizing is cleaned{*}. > Here's a possible solution: > {code:java} > /** > * Delete the magic directory. 
> */ > public void cleanupStagingDirs() { > final Path out = getOutputPath(); > // previously: Path path = magicSubdir(getOutputPath()); > Path path = new Path(magicSubdir(out), formatJobDir(getUUID())); > try (DurationInfo ignored = new DurationInfo(LOG, true, > "Deleting magic directory %s", path)) { > Invoker.ignoreIOExceptions(LOG, "cleanup magic directory", > path.toString(), > () -> deleteWithWarning(getDestFS(), path, true)); > } > } {code} > > The side effect of this change is that the "__magic" directory itself is never > cleaned up. However, I believe this is a minor concern, even considering that > other artifacts such as "_SUCCESS" also persist after jobs end.
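The essence of the proposed fix is path scoping: a finalizing job deletes only its own job subdirectory under "__magic" rather than the shared staging root. A stripped-down, string-based sketch of that path computation (the helper names mirror the snippet above, but this is an illustration, not the actual Hadoop code):

```java
/**
 * Sketch of the path scoping behind the proposed fix: compute the
 * per-job staging prefix so cleanup never touches the shared
 * "__magic" root that other concurrent jobs are still using.
 */
public class MagicCleanupSketch {

    // Shared staging root under the job's output path.
    static String magicSubdir(String outputPath) {
        return outputPath + "/__magic";
    }

    // Per-job staging directory, named after the job UUID.
    static String formatJobDir(String jobUUID) {
        return "job-" + jobUUID;
    }

    // The only prefix a finalizing job may safely delete: its own job
    // directory, leaving other jobs' .pending/.pendingset files intact.
    static String cleanupPath(String outputPath, String jobUUID) {
        return magicSubdir(outputPath) + "/" + formatJobDir(jobUUID);
    }
}
```

With this scoping, two jobs writing to the same table clean up "__magic/job-&lt;uuid1&gt;" and "__magic/job-&lt;uuid2&gt;" independently, which is why the shared "__magic" root is left behind as noted above.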
[jira] [Updated] (HADOOP-18797) Support Concurrent Writes With S3A Magic Committer
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-18797: --- Issue Type: Improvement (was: Bug)
[jira] [Updated] (HADOOP-18797) Support Concurrent Writes With S3A Magic Committer
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-18797: --- Fix Version/s: 3.4.0
[jira] [Resolved] (HADOOP-18797) Support Concurrent Writes With S3A Magic Committer
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman resolved HADOOP-18797. Resolution: Fixed PR merged to trunk branch
[jira] [Updated] (HADOOP-18797) Support Concurrent Writes With S3A Magic Committer
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-18797: --- Summary: Support Concurrent Writes With S3A Magic Committer (was: S3A committer fix lost data on concurrent jobs)
[jira] [Commented] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762288#comment-17762288 ] Syed Shameerur Rahman commented on HADOOP-18797: [~ste...@apache.org] Could you please review the changes? > S3A committer fix lost data on concurrent jobs > -- > > Key: HADOOP-18797 > URL: https://issues.apache.org/jira/browse/HADOOP-18797 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Reporter: Emanuel Velzi >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > > There is a failure in the commit process when multiple jobs are writing to a > s3 directory *concurrently* using {*}magic committers{*}. > This issue is closely related HADOOP-17318. > When multiple Spark jobs write to the same S3A directory, they upload files > simultaneously using "__magic" as the base directory for staging. Inside this > directory, there are multiple "/job-some-uuid" directories, each representing > a concurrently running job. > To fix some preoblems related to concunrrency a property was introduced in > the previous fix: "spark.hadoop.fs.s3a.committer.abort.pending.uploads". When > set to false, it ensures that during the cleanup stage, finalizing jobs do > not abort pending uploads from other jobs. 
So we see in logs this line: > {code:java} > DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up > pending uploads to s3a ...{code} > (from > [AbstractS3ACommitter.java#L952|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952]) > However, in the next step, the {*}"__magic" directory is recursively > deleted{*}: > {code:java} > INFO [main] o.a.h.fs.s3a.commit.magic.MagicS3GuardCommitter (98): Deleting > magic directory s3a://my-bucket/my-table/__magic: duration 0:00.560s {code} > (from [AbstractS3ACommitter.java#L1112 > |https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1112]and > > [MagicS3GuardCommitter.java#L137)|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L137)] > This deletion operation *affects the second job* that is still running > because it loses pending uploads (i.e., ".pendingset" and ".pending" files). > The consequences can range from an exception in the best case to a silent > loss of data in the worst case. The latter occurs when Job_1 deletes files > just before Job_2 executes "listPendingUploadsToCommit" to list ".pendingset" > files in the job attempt directory previous to complete the uploads with POST > requests. > To resolve this issue, it's important {*}to ensure that only the prefix > associated with the job currently finalizing is cleaned{*}. > Here's a possible solution: > {code:java} > /** > * Delete the magic directory. 
> */ > public void cleanupStagingDirs() { > final Path out = getOutputPath(); > //Path path = magicSubdir(getOutputPath()); > Path path = new Path(magicSubdir(out), formatJobDir(getUUID())); > try(DurationInfo ignored = new DurationInfo(LOG, true, > "Deleting magic directory %s", path)) { > Invoker.ignoreIOExceptions(LOG, "cleanup magic directory", > path.toString(), > () -> deleteWithWarning(getDestFS(), path, true)); > } > } {code} > > The side effect of this issue is that the "__magic" directory is never > cleaned up. However, I believe this is a minor concern, even considering that > other folders such as "_SUCCESS" also persist after jobs end. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
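The key idea of the proposed fix above — deleting only this job's subtree under "__magic" rather than the whole directory — can be sketched in isolation. This is a minimal sketch with plain strings standing in for the real Path, magicSubdir() and formatJobDir() helpers; the exact "job-<uuid>" naming is an assumption for illustration, not the committed layout.

```java
// Sketch only: illustrates the prefix-scoped cleanup from the proposal.
// Hypothetical string helpers stand in for magicSubdir()/formatJobDir().
public class JobScopedCleanup {

    static final String MAGIC = "__magic";

    // Shared magic root under the job's output path.
    static String magicSubdir(String outputPath) {
        return outputPath + "/" + MAGIC;
    }

    // Per-job staging directory under the magic root (assumed naming).
    static String jobMagicPath(String outputPath, String jobUUID) {
        return magicSubdir(outputPath) + "/job-" + jobUUID;
    }

    public static void main(String[] args) {
        // The old behaviour deleted magicSubdir(out) -- the shared root --
        // destroying other jobs' .pending/.pendingset files.
        // The proposed fix deletes only this job's prefix:
        System.out.println(jobMagicPath("s3a://my-bucket/my-table", "uuid-1"));
    }
}
```

Deleting only this prefix leaves the pending files of concurrently running jobs untouched, at the cost of the shared "__magic" root itself never being removed.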
[jira] [Commented] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760719#comment-17760719 ] Syed Shameerur Rahman commented on HADOOP-18797: [~ste...@apache.org], Please review the PR: https://github.com/apache/hadoop/pull/6006
[jira] [Updated] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-18797: --- Affects Version/s: (was: 3.3.6)
[jira] [Assigned] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman reassigned HADOOP-18797: -- Assignee: Syed Shameerur Rahman
[jira] [Commented] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759922#comment-17759922 ] Syed Shameerur Rahman commented on HADOOP-18797: Yes, I noticed; MagicCommitPaths#isMagicPath and MagicCommitPaths#magicElementIndex need to be changed as well.
[jira] [Commented] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759626#comment-17759626 ] Syed Shameerur Rahman commented on HADOOP-18797: [~ste...@apache.org] - I am more inclined towards Approach 3. BTW, FileOutputCommitter also faces the same issue of not cleaning up after failed jobs: the temporary files created in the Spark or Hive staging directory are left untouched if the job/driver crashes. Let me know your thoughts; I am happy to contribute by creating a PR and running all the tests.
[jira] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797 ] Syed Shameerur Rahman deleted comment on HADOOP-18797: was (Author: srahman): [~ste...@apache.org] - I am more inclined towards Approach 3 , BTW FileOutputCommitters also faces the same issue of not cleaning failed jobs. The temporary files created in spark or hive staging directory will be left untouched if the job/Driver crashes.
[jira] [Comment Edited] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759575#comment-17759575 ] Syed Shameerur Rahman edited comment on HADOOP-18797 at 8/28/23 3:58 PM: - [~ste...@apache.org] - I am more inclined towards Approach 3 , BTW FileOutputCommitters also faces the same issue of not cleaning failed jobs. The temporary files created in spark or hive staging directory will be left untouched if the job/Driver crashes. was (Author: srahman): [~ste...@apache.org] - I am more inclined towards Approach 1 (as mentioned by Emanuel Velzi) , BTW FileOutputCommitters also faces the same issue of not cleaning failed jobs. The temporary files created in spark or hive staging directory will be left untouched if the job/Driver crashes.
[jira] [Comment Edited] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759527#comment-17759527 ] Syed Shameerur Rahman edited comment on HADOOP-18797 at 8/28/23 3:57 PM: - This looks like a valid use-case when multiple job writes to same table but different partitions, The MPU metadata (pendingset) of slower running jobs might be deleted by the the jobs which completes first. I could think of three approaches here Approach 1: Do job level magic directory deletion ie (__magic/job_/) (as mentioned by [~emanuelvelzi]) 1. After the job is completed delete the path __magic/job_/ Pros 1. Concurrent writes will be supported Cons 1. __magic directory will be visible in the table path even though it won't be considered 2. The remains of failed job which stay forever unless manually deleted or via some S3 policies Inorder to solve [HADOOP-18568|https://issues.apache.org/jira/browse/HADOOP-18568] we can put this behind a config similar to fs.s3a.cleanup.magic.enabled Approach 2: Optional delete of __magic directory as mentioned in [HADOOP-18568|https://issues.apache.org/jira/browse/HADOOP-18568] 1. Based on the config we can choose to delete or not delete the magic directory Pros 1. Solves both concurrent and scaling issues. Cons 1. Say we have two spark clusters, One with config enabled to delete the __magic and another with config disabled, If they simultaneously hit the same table but different partition we will again hit the same concurrency issue as mentioned in this Jira. Approach 3: Have unique magic directory for each job i.e __magic_job (similar to staging directory in FileOutputCommitter) 1. Each job will write pendingset to its specified __magic_job 2. The directory will be deleted after successful commit of the job. Pros 1. Concurrent writes will be supported 2. if all the jobs are successful no __magic_* directory will be visible Cons 1. 
The remains of failed job which stay forever unless manually deleted or via some S3 policies which is similar to FileOutputCommitter was (Author: srahman): This looks like a valid use-case when multiple job writes to same table but different partitions, The MPU metadata (pendingset) of slower running jobs might be deleted by the the jobs which completes first. I could think of two approaches here Approach 1: Do job level magic directory deletion ie (__magic/job_/) (as mentioned by [~emanuelvelzi]) 1. After the job is completed delete the path __magic/job_/ Pros 1. Concurrent writes will be supported Cons 1. __magic directory will be visible in the table path even though it won't be considered 2. The remains of failed job which stay forever unless manually deleted or via some S3 policies Inorder to solve [HADOOP-18568|https://issues.apache.org/jira/browse/HADOOP-18568] we can put this behind a config similar to fs.s3a.cleanup.magic.enabled Approach 2: Optional delete of __magic directory as mentioned in [HADOOP-18568|https://issues.apache.org/jira/browse/HADOOP-18568] 1. Based on the config we can choose to delete or not delete the magic directory Pros 1. Solves both concurrent and scaling issues. Cons 1. Say we have two spark clusters, One with config enabled to delete the __magic and another with config disabled, If they simultaneously hit the same table but different partition we will again hit the same concurrency issue as mentioned in this Jira. > S3A committer fix lost data on concurrent jobs > -- > > Key: HADOOP-18797 > URL: https://issues.apache.org/jira/browse/HADOOP-18797 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Emanuel Velzi >Priority: Major > > There is a failure in the commit process when multiple jobs are writing to a > s3 directory *concurrently* using {*}magic committers{*}. > This issue is closely related HADOOP-17318. 
> When multiple Spark jobs write to the same S3A directory, they upload files
> simultaneously using "__magic" as the base directory for staging. Inside this
> directory, there are multiple "/job-some-uuid" directories, each representing
> a concurrently running job.
> To fix some problems related to concurrency, a property was introduced in
> the previous fix: "spark.hadoop.fs.s3a.committer.abort.pending.uploads". When
> set to false, it ensures that during the cleanup stage, finalizing jobs do
> not abort pending uploads from other jobs. So we see in logs this line:
> {code:java}
> DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up
> pending uploads to s3a ...{code}
> (from
> [AbstractS3ACommitter.java#L952|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952])
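Approach 3 above hinges on giving each job a magic directory of its own, so that a job's cleanup can only ever touch its own prefix. A minimal sketch of such a naming scheme in plain Java; the class, constant, and method names here are hypothetical illustrations, not actual S3A code:

```java
// Sketch of Approach 3: a job-unique magic directory name.
// MAGIC_PREFIX and magicJobDir are hypothetical, not actual S3A identifiers.
final class MagicDirNaming {
    static final String MAGIC_PREFIX = "__magic_job-";

    /** Build a job-unique magic directory path under the output/table path. */
    static String magicJobDir(String outputPath, String jobUuid) {
        return outputPath + "/" + MAGIC_PREFIX + jobUuid;
    }

    /** True if a path lies inside this job's magic directory (safe to delete on commit). */
    static boolean ownedByJob(String path, String outputPath, String jobUuid) {
        return path.startsWith(magicJobDir(outputPath, jobUuid) + "/");
    }
}
```

With this layout, a job's cleanup deletes only its own `__magic_job-` prefix, so concurrent jobs cannot destroy each other's pendingset files; the cost, as noted in the Cons, is that a crashed job leaves its directory behind.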
[jira] [Commented] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759575#comment-17759575 ] Syed Shameerur Rahman commented on HADOOP-18797:
[~ste...@apache.org] - I am more inclined towards Approach 1 (as mentioned by Emanuel Velzi). BTW, FileOutputCommitter also faces the same issue: the temporary files created in the Spark or Hive staging directory are left untouched if the job/driver crashes.

> S3A committer fix lost data on concurrent jobs
> --
>
> Key: HADOOP-18797
> URL: https://issues.apache.org/jira/browse/HADOOP-18797
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Emanuel Velzi
>Priority: Major
>
> There is a failure in the commit process when multiple jobs are writing to a
> s3 directory *concurrently* using {*}magic committers{*}.
> This issue is closely related to HADOOP-17318.
> When multiple Spark jobs write to the same S3A directory, they upload files
> simultaneously using "__magic" as the base directory for staging. Inside this
> directory, there are multiple "/job-some-uuid" directories, each representing
> a concurrently running job.
> To fix some problems related to concurrency, a property was introduced in
> the previous fix: "spark.hadoop.fs.s3a.committer.abort.pending.uploads". When
> set to false, it ensures that during the cleanup stage, finalizing jobs do
> not abort pending uploads from other jobs. So we see in logs this line:
> {code:java}
> DEBUG [main] o.a.h.fs.s3a.commit.AbstractS3ACommitter (819): Not cleanup up
> pending uploads to s3a ...{code}
> (from
> [AbstractS3ACommitter.java#L952|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L952])
> However, in the next step, the {*}"__magic" directory is recursively
> deleted{*}:
> {code:java}
> INFO [main] o.a.h.fs.s3a.commit.magic.MagicS3GuardCommitter (98): Deleting
> magic directory s3a://my-bucket/my-table/__magic: duration 0:00.560s {code}
> (from [AbstractS3ACommitter.java#L1112|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1112] and
> [MagicS3GuardCommitter.java#L137|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L137])
> This deletion operation *affects the second job* that is still running
> because it loses pending uploads (i.e., ".pendingset" and ".pending" files).
> The consequences can range from an exception in the best case to a silent
> loss of data in the worst case. The latter occurs when Job_1 deletes files
> just before Job_2 executes "listPendingUploadsToCommit" to list ".pendingset"
> files in the job attempt directory, prior to completing the uploads with POST
> requests.
> To resolve this issue, it's important {*}to ensure that only the prefix
> associated with the job currently finalizing is cleaned{*}.
> Here's a possible solution:
> {code:java}
> /**
>  * Delete the magic directory.
>  */
> public void cleanupStagingDirs() {
>   final Path out = getOutputPath();
>   // Path path = magicSubdir(getOutputPath());
>   Path path = new Path(magicSubdir(out), formatJobDir(getUUID()));
>   try (DurationInfo ignored = new DurationInfo(LOG, true,
>       "Deleting magic directory %s", path)) {
>     Invoker.ignoreIOExceptions(LOG, "cleanup magic directory",
>         path.toString(),
>         () -> deleteWithWarning(getDestFS(), path, true));
>   }
> } {code}
>
> The side effect of this issue is that the "__magic" directory is never
> cleaned up. However, I believe this is a minor concern, even considering that
> other folders such as "_SUCCESS" also persist after jobs end.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759575#comment-17759575 ] Syed Shameerur Rahman edited comment on HADOOP-18797 at 8/28/23 12:38 PM:
--
[~ste...@apache.org] - I am more inclined towards Approach 1 (as mentioned by Emanuel Velzi). BTW, FileOutputCommitter also faces the same issue of not cleaning up after failed jobs: the temporary files created in the Spark or Hive staging directory are left untouched if the job/driver crashes.

> S3A committer fix lost data on concurrent jobs
> --
>
> Key: HADOOP-18797
> URL: https://issues.apache.org/jira/browse/HADOOP-18797
[jira] [Commented] (HADOOP-18797) S3A committer fix lost data on concurrent jobs
[ https://issues.apache.org/jira/browse/HADOOP-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759527#comment-17759527 ] Syed Shameerur Rahman commented on HADOOP-18797:
This looks like a valid use case: when multiple jobs write to the same table but different partitions, the MPU metadata (pendingset) of slower-running jobs might be deleted by the jobs that complete first. I can think of two approaches here.

Approach 1: Do job-level magic directory deletion, i.e. (__magic/job_/) (as mentioned by [~emanuelvelzi])
1. After the job is completed, delete the path __magic/job_/
Pros
1. Concurrent writes will be supported
Cons
1. The __magic directory will be visible in the table path even though it won't be considered
2. The remains of a failed job stay forever unless manually deleted or removed via some S3 policies
In order to solve [HADOOP-18568|https://issues.apache.org/jira/browse/HADOOP-18568] we can put this behind a config similar to fs.s3a.cleanup.magic.enabled

Approach 2: Optional delete of the __magic directory, as mentioned in [HADOOP-18568|https://issues.apache.org/jira/browse/HADOOP-18568]
1. Based on the config we can choose to delete or not delete the magic directory
Pros
1. Solves both the concurrency and scaling issues.
Cons
1. Say we have two Spark clusters, one with the config enabled to delete __magic and another with it disabled. If they simultaneously hit the same table but different partitions, we will again hit the same concurrency issue as mentioned in this Jira.

> S3A committer fix lost data on concurrent jobs
> --
>
> Key: HADOOP-18797
> URL: https://issues.apache.org/jira/browse/HADOOP-18797
[jira] [Commented] (HADOOP-18842) Support Overwrite Directory On Commit For S3A Committers
[ https://issues.apache.org/jira/browse/HADOOP-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759497#comment-17759497 ] Syed Shameerur Rahman commented on HADOOP-18842:
> The decision to use disk is made by a config option, and would only need
> enabling if scale problems were encountered. Use of the same marshalled
> format in both forms of storage ensures consistent code coverage, gives us
> efficient storage.
Yes, this makes sense!

> Support Overwrite Directory On Commit For S3A Committers
> --
>
> Key: HADOOP-18842
> URL: https://issues.apache.org/jira/browse/HADOOP-18842
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Labels: pull-request-available
>
> The goal is to add a new kind of commit mechanism in which the destination
> directory is cleared off before committing the file.
> *Use Case*
> In the case of dynamic-partition insert overwrite queries, the destination
> directories which need to be overwritten are not known before the execution,
> and hence it becomes a challenge to clear off the destination directory.
> One approach to handle this is that the underlying engines/clients clear off
> all the destination directories before calling the commitJob operation, but
> the issue with this approach is that, in case of failures while committing
> the files, we might end up with the whole of the previous data being deleted,
> making the recovery process difficult or time consuming.
> *Solution*
> Based on the mode of the commit operation, either *INSERT* or *OVERWRITE*:
> during commitJob, the committer will map each destination directory to the
> commits which need to be added to that directory, and if the mode is
> *OVERWRITE*, the committer will delete the directory recursively and then
> commit each of the files in the directory.
> So in the case of failures (worst case), the number of destination
> directories which will be deleted will be equal to the number of threads if
> we do it in a multi-threaded way, as compared to the whole data if it was
> done on the engine side.
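The OVERWRITE flow described above starts by mapping each destination directory to the commits destined for it; only then is a directory deleted and its files committed, which bounds the blast radius of a mid-commit failure. A hedged sketch of the grouping step; the class and method names are illustrative, not the real committer API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed OVERWRITE commit: group pending commits by their
// destination directory so each directory can be deleted and re-committed as
// a unit. Class and method names are hypothetical, not actual S3A code.
final class OverwriteCommitSketch {
    /** Map each destination directory to the pending files targeting it. */
    static Map<String, List<String>> groupByDestDir(List<String> pendingFiles) {
        Map<String, List<String>> byDir = new HashMap<>();
        for (String file : pendingFiles) {
            String dir = file.substring(0, file.lastIndexOf('/'));
            byDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(file);
        }
        return byDir;
    }
}
```

In OVERWRITE mode the committer would then, per directory, delete the directory and complete each mapped commit; a failure therefore costs at most the directories currently in flight rather than every partition the query touched.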
[jira] [Commented] (HADOOP-18842) Support Overwrite Directory On Commit For S3A Committers
[ https://issues.apache.org/jira/browse/HADOOP-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752759#comment-17752759 ] Syed Shameerur Rahman commented on HADOOP-18842:
[~ste...@apache.org] Thanks a lot for the pointers. The following are some of my observations with regard to your comments:
# Yes, this is similar to the staging committer's partitioned overwrite. But what I could see is that the staging committers, during pre-commit in the commitJob operation, clear all the directories/partitions if the conflict resolution is "{color:#871094}REPLACE{color}". The issue with this approach is that, in the worst-case scenario where the job fails after pre-commit, the whole data will be lost, which might not be desirable.
# I agree that storing all the SinglePendingCommit objects in memory puts extra memory pressure on the driver. For instance, in my setup, storing ~1400 pendingset files in memory took an extra 7MB (this number will differ based on your S3 bucket or destination name length). So I guess it is not that much.
# For highly write-intensive jobs which commit tens of thousands of files, the memory pressure will be higher, but for such cases it is recommended to have a larger driver memory size anyway.
# Streaming the SinglePendingCommit to the local filesystem is a great idea, but it causes extra delay for serialization/deserialization and extra overhead to read and write files, which may not be desirable in all cases.

*Proposal*
# Stream the SinglePendingCommit to the local filesystem only if there is a large number of pending files. We can use an approximation: one pendingset will be about 'x' bytes, and the user is willing to hold up to 'y' such commits ('y' * 'x' bytes) in memory.
# Otherwise, let's store it in memory.

For (1), i.e. streaming the commits to the local filesystem:
1. Read the pendingset files in a multi-threaded way
2. For each pendingset, extract the single commits and the corresponding destination directory, and store the destination in memory
3. Stream the single commits to a file with a unique path for each destination directory
4. For each destination directory:
4.1 Delete the destination directory
4.2 Read the commits from the unique path and call commit

A unique path for each destination directory will help us limit the number of directories/partitions which will be lost in case of failures.
[~ste...@apache.org] Any thoughts on this? Thanks

> Support Overwrite Directory On Commit For S3A Committers
> --
>
> Key: HADOOP-18842
> URL: https://issues.apache.org/jira/browse/HADOOP-18842
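The memory-vs-disk decision in the proposal above can be sketched with the 'x'/'y' approximation from the comment. The class, method, and parameter names below are hypothetical, not real S3A configuration:

```java
// Sketch of the proposed spill heuristic: keep SinglePendingCommit data in
// memory while the estimated footprint fits the budget, else stream it to
// the local filesystem. All names here are hypothetical, not S3A options.
final class SpillDecision {
    /**
     * @param pendingCommitCount number of pending commits
     * @param bytesPerCommit     approximate size of one commit ('x' bytes)
     * @param memoryBudgetBytes  how much the user will hold in driver memory ('y' * 'x')
     */
    static boolean spillToDisk(long pendingCommitCount,
                               long bytesPerCommit,
                               long memoryBudgetBytes) {
        return pendingCommitCount * bytesPerCommit > memoryBudgetBytes;
    }
}
```

With the ~1400 pendingset files / ~7MB figure quoted above, a budget of even a few tens of megabytes keeps everything in memory; only write-heavy jobs with tens of thousands of commits would pay the serialization cost of the disk path.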
[jira] [Commented] (HADOOP-18842) Support Overwrite Directory On Commit For S3A Committers
[ https://issues.apache.org/jira/browse/HADOOP-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751656#comment-17751656 ] Syed Shameerur Rahman commented on HADOOP-18842:
[~ste...@apache.org] It would be great if you could review the above PR or the proposed changes.
Note: it is a WIP PR (unit tests and integration tests still need to be added). I would like to get your thoughts on this before taking it forward. Thanks

> Support Overwrite Directory On Commit For S3A Committers
> --
>
> Key: HADOOP-18842
> URL: https://issues.apache.org/jira/browse/HADOOP-18842
[jira] [Created] (HADOOP-18842) Support Overwrite Directory On Commit For S3A Committers
Syed Shameerur Rahman created HADOOP-18842:
--
Summary: Support Overwrite Directory On Commit For S3A Committers
Key: HADOOP-18842
URL: https://issues.apache.org/jira/browse/HADOOP-18842
Project: Hadoop Common
Issue Type: New Feature
Reporter: Syed Shameerur Rahman

The goal is to add a new kind of commit mechanism in which the destination directory is cleared off before committing the file.

*Use Case*
In the case of dynamic-partition insert overwrite queries, the destination directories which need to be overwritten are not known before the execution, and hence it becomes a challenge to clear off the destination directory.

One approach to handle this is that the underlying engines/clients clear off all the destination directories before calling the commitJob operation, but the issue with this approach is that, in case of failures while committing the files, we might end up with the whole of the previous data being deleted, making the recovery process difficult or time consuming.

*Solution*
Based on the mode of the commit operation, either *INSERT* or *OVERWRITE*: during commitJob, the committer will map each destination directory to the commits which need to be added to that directory, and if the mode is *OVERWRITE*, the committer will delete the directory recursively and then commit each of the files in the directory. So in the case of failures (worst case), the number of destination directories which will be deleted will be equal to the number of threads if we do it in a multi-threaded way, as compared to the whole data if it was done on the engine side.
[jira] [Commented] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743161#comment-17743161 ] Syed Shameerur Rahman commented on HADOOP-18776:
[~ste...@apache.org] - If I understood your comment correctly, you are proposing something like: even if this committer (which completes the MPU in commitTask) is enabled, we are okay when the task-attempt retry count is 1; if not, there should be some mechanism to fail the job when this committer is used, the task-attempt retry count is > 1, and the task which failed had already called the commitTask operation. Am I correct?

> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
> --
>
> Key: HADOOP-18776
> URL: https://issues.apache.org/jira/browse/HADOOP-18776
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
>Reporter: Syed Shameerur Rahman
>Priority: Major
> Labels: pull-request-available
>
> The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter*,
> another type of S3 magic committer with better performance, achieved by
> taking on a few tradeoffs.
> The following are the differences between MagicCommitter and
> OptimizedMagicCommitter:
>
> ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*||
> |commitTask|1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, the commit operation is called (complete multiPartUpload).|
> |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory. 2. Then every pending commit in the job will be committed. 3. The "SUCCESS" marker is created (if the config is enabled). 4. The "__magic" directory is cleaned up.|1. The "SUCCESS" marker is created (if the config is enabled). 2. The "__magic" directory is cleaned up.|
>
> *Performance Benefits :-*
> # The primary performance boost is due to the distributed complete multiPartUpload calls being made in the task attempts (task containers/executors) rather than in a single job driver. In the case of the MagicCommitter it is O(files/threads).
> # It also saves a couple of S3 calls needed to PUT the "{{.pendingset}}" files and the READ calls to read them in the job driver.
>
> *TradeOffs :-*
> The tradeoffs are similar to those of the FileOutputCommitter V2 version. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will not see a behavioral change as such.
> # During execution, intermediate data becomes visible after the commitTask operation
> # On a failure, all output must be deleted and the job needs to be restarted.
>
> *Performance Benchmark :-*
> Cluster: c4.8xlarge (EC2 instance)
> Instances: 1 (primary) + 5 (core)
> Data Size: 3TB partitioned (TPC-DS store_sales data)
> Engine: Apache Spark 3.3.1 / Hadoop 3.3.3
> Query: The following query inserts around 3000+ files into the table directory (ran for 3 iterations)
> {code:java}
> insert into select ss_quantity from store_sales; {code}
> ||Committer||Iteration 1||Iteration 2||Iteration 3||
> |Magic|126|127|122|
> |OptimizedMagic|50|51|58|
> So on average, the OptimizedMagicCommitter was *~2.3x* faster compared to the MagicCommitter.
>
> _*Note: Unlike the MagicCommitter, the OptimizedMagicCommitter is not suitable for all cases where the user requires the guarantee of files not being visible in failure scenarios. Given the performance benefit, users may choose to use it if they don't require any such guarantees or have some mechanism to clean up the data before retrying.*_
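The committer comparison above reduces to two commitTask strategies: aggregate the task's pending uploads into a pendingset for the driver, or complete each multipart upload inside the task itself. A hedged sketch of that contrast; `PendingUpload` and the `completeMpu` callback are stand-ins, not the S3A implementation:

```java
import java.util.List;
import java.util.function.Consumer;

// Contrast of the two commitTask strategies described in the table above.
// PendingUpload and completeMpu are hypothetical stand-ins, not S3A types.
final class CommitTaskSketch {
    record PendingUpload(String key) {}

    /** Magic committer: save the task's pending uploads as one pendingset for commitJob. */
    static List<PendingUpload> commitTaskMagic(List<PendingUpload> pending) {
        // In the real committer this list is marshalled to a .pendingset file.
        return List.copyOf(pending);
    }

    /** Optimized variant: complete every multipart upload inside the task. */
    static int commitTaskOptimized(List<PendingUpload> pending,
                                   Consumer<PendingUpload> completeMpu) {
        pending.forEach(completeMpu); // distributed completion; commitJob has nothing left to commit
        return pending.size();
    }
}
```

The speedup comes from parallelism: N tasks complete their own uploads concurrently instead of one driver doing O(files/threads) completions, at the price of intermediate output becoming visible as each task commits.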
[jira] [Resolved] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman resolved HADOOP-18776.
Target Version/s: (was: 3.4.0)
Resolution: Won't Fix

Thanks, Steve, for your pointers. Sure, I will do that; it will help. I am closing this Jira as "won't fix".

> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
> --
>
> Key: HADOOP-18776
> URL: https://issues.apache.org/jira/browse/HADOOP-18776
[jira] [Updated] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-18776: --- Description: The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* which is an another type of S3 Magic committer but with a better performance by taking in few tradeoffs. The following are the differences in MagicCommitter vs OptimizedMagicCommitter ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| |commitTask|1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, commit operation is called (complete multiPartUpload)| |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory 2. Then every pending commit in the job will be committed. 3. "SUCCESS" marker is created (if config is enabled) 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled) 2. "__magic" directory is cleaned up.| *Performance Benefits :-* # The primary performance boost due to distributed complete multiPartUpload call being made in the taskAttempts(Task containers/Executors) rather than a single job driver. In case of MagicCommitter it is O(files/threads). # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" files and READ call to read them in the Job Driver. *TradeOffs :-* The tradeoffs are similar to the one in FileOutputCommitter V2 version. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no see behavioral change as such # During execution, intermediate data becomes visible after commitTask operation # On a failure, all output must be deleted and the job needs to be restarted. 
*Performance Benchmark :-* Cluster : c4.8x large (ec2-instance) Instance : 1 (primary) + 5 (core) Data Size : 3TB Partitioned(TPC-DS store_sales data) Engine : Apache Spark 3.3.1 / Hadoop 3.3.3 Query: The following query inserts around 3000+ files into the table directory (ran for 3 iterations) {code:java} insert into select ss_quantity from store_sales; {code} ||Committer||Iteration 1||Iteration 2||Iteration 3|| |Magic|126|127|122| |OptimizedMagic|50|51|58| So on an average, OptimizedMagicCommitter was *~2.3x* faster as compared to MagicCommitter. _*Note: Unlike MagicCommitter , OptimizedMagicCommitter is not suitable for all the cases where in user requires the guarantees of file not being visible in failure scenarios. Given the performance benefit, user can may choose to use this if they don't require any guarantees or have some mechanism to clean up the data before retrying.*_ was: The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* which is an another type of S3 Magic committer but with a better performance by taking in few tradeoffs. The following are the differences in MagicCommitter vs OptimizedMagicCommitter ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| |commitTask|1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, commit operation is called (complete multiPartUpload)| |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory 2. Then every pending commit in the job will be committed. 3. "SUCCESS" marker is created (if config is enabled) 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled) 2. 
"__magic" directory is cleaned up.| *Performance Benefits :-* # The primary performance boost due to distributed complete multiPartUpload call being made in the taskAttempts(Task containers/Executors) rather than a single job driver. In case of MagicCommitter it is O(files/threads). # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" files and READ call to read them in the Job Driver. *TradeOffs :-* The tradeoffs are similar to the one in FileOutputCommitter V2 version. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no see behavioral change as such # During execution, intermediate data becomes visible after commitTask operation # On a failure, all output must be deleted and the job needs to be restarted. *Performance Benchmark :-* Cluster : c4.8x large (ec2-instance) Instance : 1 (primary) + 5 (core) Data Size : 3TB Partitioned(TPC-DS store_sales data) Engine : Apache S
[jira] [Comment Edited] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735980#comment-17735980 ] Syed Shameerur Rahman edited comment on HADOOP-18776 at 6/22/23 5:57 AM: - [~ste...@apache.org] - Thanks a lot for taking a look at this. I fully understand your concerns, and I am also aware of the same. > "it lacks the ability to recover from task failure" Yes, this is true. When a task fails or the task JVM crashes during the commitTask operation, some files get committed (visible) in the final path and some may not. If task re-attempts are enabled, a new task will come up and write the files again, leading to some duplicate data in the final path. This issue can be solved by using this type of committer only for the use case where there are no task re-attempts and where, if any of the taskAttempts fails, the job also fails. The final path can still contain files written by the failed taskAttempts, but since the job has failed, the user can clear off the data manually and re-run the same job. I guess the same issue is still possible with MagicS3ACommitter as well, since commitJob is not atomic: if the Job Driver JVM crashes during the commitJob operation, it can also leave some files visible in the final path. > Finally, I'd love to know size of jobs where you hit problems, use etc. If > there's anything you can say publicly, that'd be great My use case was that I had to write a large number of files in a single query, and since commitJob is a single process (multi-threaded, as opposed to distributed in the proposed use-case) which needs to call complete MPU for all these files, it can become a bottleneck, hence I explored other options ({*}~2.3x{*} faster compared to MagicCommitter). So my understanding is that when there is at most 1 taskAttempt, this committer tends to behave similarly (with the same guarantees) to MagicCommitter and hence can be used for specific use-cases. 
was (Author: srahman): [~ste...@apache.org] - Thanks a lot for taking a took at this. I fully understand your concerns. I am also aware of the same. > "it lacks the ability to recover from task failure" Yes this is true. When a task fails or the task JVM crashes in commitTask operation. Some files gets committed(visible) in the final path and some may not. If task re-attempts are enabled, A new task will come up and will write the files leading to duplicate(some) data in the final path. This issue can be solved by using this type of committer only for the use case where there is no task attempts and if any of the taskAttempts fails the job will also fail. This can still have files written by the failed taskAttempts in the final path but then since the job had failed, The user can clear off the data manually and re-run the same job. I guess the same issue is still possible with MagicS3ACommitter as well, Since commitJob is not atomic and if the Job Driver JVM crashes in commitJob operation it can also lead to some files being visible in the final path. > Finally, I'd love to know size of jobs where you hit problems, use etc. If > there's anything you can say publicly, that'd be great My use case was, I had to write large number of files in a single query and since commitJob is single process which needs to call complete MPU for all these files it can become a bottleneck and hence explored other options ({*}~2.3x{*} faster as compared to MagicCommitter.) So my understanding is that when there is max 1 taskAttempt, This committer tend to behave similar (with same grantees) as MagicCommitter and hence can be used on specific use-cases. 
> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints > -- > > Key: HADOOP-18776 > URL: https://issues.apache.org/jira/browse/HADOOP-18776 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Reporter: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > > The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* > which is an another type of S3 Magic committer but with a better performance > by taking in few tradeoffs. > The following are the differences in MagicCommitter vs OptimizedMagicCommitter > > ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| > |commitTask|1. Lists all {{.pending}} files in its attempt directory. > > 2. The contents are loaded into a list of single pending uploads. > > 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all > {{.pending}} files in its attempt directory > > 2. The contents are loaded into a list of single pending uploads. > > 3. For each pending upload, commit operation is called (complete > mult
[jira] [Commented] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735980#comment-17735980 ] Syed Shameerur Rahman commented on HADOOP-18776: [~ste...@apache.org] - Thanks a lot for taking a look at this. I fully understand your concerns, and I am also aware of the same. > "it lacks the ability to recover from task failure" Yes, this is true. When a task fails or the task JVM crashes during the commitTask operation, some files get committed (visible) in the final path and some may not. If task re-attempts are enabled, a new task will come up and write the files again, leading to some duplicate data in the final path. This issue can be solved by using this type of committer only for the use case where there are no task re-attempts and where, if any of the taskAttempts fails, the job also fails. The final path can still contain files written by the failed taskAttempts, but since the job has failed, the user can clear off the data manually and re-run the same job. I guess the same issue is still possible with MagicS3ACommitter as well, since commitJob is not atomic: if the Job Driver JVM crashes during the commitJob operation, it can also leave some files visible in the final path. > Finally, I'd love to know size of jobs where you hit problems, use etc. If > there's anything you can say publicly, that'd be great My use case was that I had to write a large number of files in a single query, and since commitJob is a single process which needs to call complete MPU for all these files, it can become a bottleneck, hence I explored other options ({*}~2.3x{*} faster compared to MagicCommitter). So my understanding is that when there is at most 1 taskAttempt, this committer tends to behave similarly (with the same guarantees) to MagicCommitter and hence can be used for specific use-cases. 
> Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints > -- > > Key: HADOOP-18776 > URL: https://issues.apache.org/jira/browse/HADOOP-18776 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Reporter: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > > The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* > which is an another type of S3 Magic committer but with a better performance > by taking in few tradeoffs. > The following are the differences in MagicCommitter vs OptimizedMagicCommitter > > ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| > |commitTask|1. Lists all {{.pending}} files in its attempt directory. > > 2. The contents are loaded into a list of single pending uploads. > > 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all > {{.pending}} files in its attempt directory > > 2. The contents are loaded into a list of single pending uploads. > > 3. For each pending upload, commit operation is called (complete > multiPartUpload)| > |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory > > 2. Then every pending commit in the job will be committed. > > 3. "SUCCESS" marker is created (if config is enabled) > > 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if > config is enabled) > > 2. "__magic" directory is cleaned up.| > > *Performance Benefits :-* > # The primary performance boost due to distributed complete multiPartUpload > call being made in the taskAttempts(Task containers/Executors) rather than a > single job driver. In case of MagicCommitter it is O(files/threads). > # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" > files and READ call to read them in the Job Driver. > > *TradeOffs :-* > The tradeoffs are similar to the one in FileOutputCommitter V2 version. 
Users > migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no > see behavioral change as such > # During execution, intermediate data becomes visible after commitTask > operation > # On a failure, all output must be deleted and the job needs to be restarted. > > *Performance Benchmark :-* > Cluster : c4.8x large (ec2-instance) > Instance : 1 (primary) + 5 (core) > Data Size : 3TB Partitioned(TPC-DS store_sales data) > Engine : Apache Spark 3.3.1 > Query: The following query inserts around 3000+ files into the table > directory (ran for 3 iterations) > {code:java} > insert into select ss_quantity from store_sales; {code} > ||Committer||Iteration 1||Iteration 2||Iteration 3|| > |Magic|126|127|122| > |OptimizedMagic|50|51|58| > So on an average, OptimizedMagicCommitter was *~2.3x* faster as compared to > MagicCommitter. > > _*Note: Unlike MagicCommitter , OptimizedMagicCommitter is not suitable for > all
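The commitJob bottleneck argument in the comment above (a single driver completing all multipart uploads versus completion distributed across task attempts) can be put into rough numbers. This is only a back-of-envelope sketch; the thread and task counts below are assumptions for illustration, not measurements from the benchmark:

```java
/**
 * Back-of-envelope model of the commitJob bottleneck: with N files, a
 * driver completing multipart uploads using T threads performs roughly
 * ceil(N/T) sequential rounds of S3 calls, while pushing completion into
 * E concurrently running task attempts takes roughly ceil(N/E) rounds
 * (plus the saved .pendingset PUT/READ round trips). The constants in
 * main() are illustrative assumptions only.
 */
public class CommitCostModel {
    static long driverCommitRounds(int files, int driverThreads) {
        return (long) Math.ceil((double) files / driverThreads);
    }
    static long distributedCommitRounds(int files, int parallelTasks) {
        return (long) Math.ceil((double) files / parallelTasks);
    }
    public static void main(String[] args) {
        int files = 3000;        // ~3000+ files, as in the benchmark query
        int driverThreads = 32;  // assumed driver-side commit thread pool size
        int parallelTasks = 180; // assumed concurrently running task attempts
        System.out.println("driver rounds: " + driverCommitRounds(files, driverThreads));           // 94
        System.out.println("distributed rounds: " + distributedCommitRounds(files, parallelTasks)); // 17
    }
}
```

Under these assumed constants the driver-side commit needs several times more sequential S3 rounds, which is consistent with the "O(files/threads)" bottleneck described in the issue.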
[jira] [Updated] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-18776: --- Description: The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* which is an another type of S3 Magic committer but with a better performance by taking in few tradeoffs. The following are the differences in MagicCommitter vs OptimizedMagicCommitter ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| |commitTask|1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, commit operation is called (complete multiPartUpload)| |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory 2. Then every pending commit in the job will be committed. 3. "SUCCESS" marker is created (if config is enabled) 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled) 2. "__magic" directory is cleaned up.| *Performance Benefits :-* # The primary performance boost due to distributed complete multiPartUpload call being made in the taskAttempts(Task containers/Executors) rather than a single job driver. In case of MagicCommitter it is O(files/threads). # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" files and READ call to read them in the Job Driver. *TradeOffs :-* The tradeoffs are similar to the one in FileOutputCommitter V2 version. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no see behavioral change as such # During execution, intermediate data becomes visible after commitTask operation # On a failure, all output must be deleted and the job needs to be restarted. 
*Performance Benchmark :-* Cluster : c4.8x large (ec2-instance) Instance : 1 (primary) + 5 (core) Data Size : 3TB Partitioned(TPC-DS store_sales data) Engine : Apache Spark 3.3.1 Query: The following query inserts around 3000+ files into the table directory (ran for 3 iterations) {code:java} insert into select ss_quantity from store_sales; {code} ||Committer||Iteration 1||Iteration 2||Iteration 3|| |Magic|126|127|122| |OptimizedMagic|50|51|58| So on an average, OptimizedMagicCommitter was *~2.3x* faster as compared to MagicCommitter. _*Note: Unlike MagicCommitter , OptimizedMagicCommitter is not suitable for all the cases where in user requires the guarantees of file not being visible in failure scenarios. Given the performance benefit, user can may choose to use this if they don't require any guarantees or have some mechanism to clean up the data before retrying.*_ was: The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* which is an another type of S3 Magic committer but with a better performance by taking in few tradeoffs. The following are the differences in MagicCommitter vs OptimizedMagicCommitter ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| |commitTask |1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, commit operation is called (complete multiPartUpload)| |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory 2. Then every pending commit in the job will be committed. 3. "SUCCESS" marker is created (if config is enabled) 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled) 2. 
"__magic" directory is cleaned up.| *Performance Benefits :-* # The primary performance boost due to distributed complete multiPartUpload call being made in the taskAttempts(Task containers/Executors) rather than a single job driver. In case of MagicCommitter it is O(files/threads). # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" files and READ call to read them in the Job Driver. *TradeOffs :-* The tradeoffs are similar to the one in FileOutputCommitter V2 version. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no see behavioral change as such # During execution, intermediate data becomes visible after commitTask operation # On a failure, all output must be deleted and the job needs to be restarted. *Performance Benchmark :-* Cluster : c4.8x large (ec2-instance) Instance : 1 (primary) + 5 (core) Data Size : 3TB (TPC-DS store_sales data) Engine : Apache Spark 3.3.1 Query: The fo
[jira] [Commented] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17734071#comment-17734071 ] Syed Shameerur Rahman commented on HADOOP-18776: [~ste...@apache.org] , It would be great if you review the above PR. Note: It is a WIP PR (need to add unit tests and integration tests). I would like to get the communities though on this before taking this forward. Thanks > Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints > -- > > Key: HADOOP-18776 > URL: https://issues.apache.org/jira/browse/HADOOP-18776 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Reporter: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > > The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* > which is an another type of S3 Magic committer but with a better performance > by taking in few tradeoffs. > The following are the differences in MagicCommitter vs OptimizedMagicCommitter > > ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| > |commitTask |1. Lists all {{.pending}} files in its attempt directory. > > 2. The contents are loaded into a list of single pending uploads. > > 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all > {{.pending}} files in its attempt directory > > 2. The contents are loaded into a list of single pending uploads. > > 3. For each pending upload, commit operation is called (complete > multiPartUpload)| > |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory > > 2. Then every pending commit in the job will be committed. > > 3. "SUCCESS" marker is created (if config is enabled) > > 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if > config is enabled) > > 2. 
"__magic" directory is cleaned up.| > > *Performance Benefits :-* > # The primary performance boost due to distributed complete multiPartUpload > call being made in the taskAttempts(Task containers/Executors) rather than a > single job driver. In case of MagicCommitter it is O(files/threads). > # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" > files and READ call to read them in the Job Driver. > > *TradeOffs :-* > The tradeoffs are similar to the one in FileOutputCommitter V2 version. Users > migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no > see behavioral change as such > # During execution, intermediate data becomes visible after commitTask > operation > # On a failure, all output must be deleted and the job needs to be restarted. > > *Performance Benchmark :-* > Cluster : c4.8x large (ec2-instance) > Instance : 1 (primary) + 5 (core) > Data Size : 3TB (TPC-DS store_sales data) > Engine : Apache Spark 3.3.1 > Query: The following query inserts around 3000+ files into the table > directory (ran for 3 iterations) > {code:java} > insert into select ss_quantity from store_sales; {code} > ||Committer||Iteration 1||Iteration 2||Iteration 3|| > |Magic|126|127|122| > |OptimizedMagic|50|51|58| > So on an average, OptimizedMagicCommitter was *~2.3x* faster as compared to > MagicCommitter. > > _*Note: Unlike MagicCommitter , OptimizedMagicCommitter is not suitable for > all the cases where in user requires the guarantees of file not being visible > in failure scenarios. Given the performance benefit, user can may choose to > use this if they don't require any guarantees or have some mechanism to clean > up the data before retrying.*_ > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
Syed Shameerur Rahman created HADOOP-18776: -- Summary: Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints Key: HADOOP-18776 URL: https://issues.apache.org/jira/browse/HADOOP-18776 Project: Hadoop Common Issue Type: New Feature Components: fs/s3 Reporter: Syed Shameerur Rahman The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* which is an another type of S3 Magic committer but with a better performance by taking in few tradeoffs. The following are the differences in MagicCommitter vs OptimizedMagicCommitter ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| |commitTask|1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, commit operation is called (complete multiPartUpload)| |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory 2. Then every pending commit in the job will be committed. 3. "SUCCESS" marker is created (if config is enabled) 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled) 2. "__magic" directory is cleaned up.| *Performance Benefits :-* # The primary performance boost due to distributed complete multiPartUpload call being made in the taskAttempts(Task containers/Executors) rather than a single job driver. In case of MagicCommitter it is O(files/threads). # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" files and READ call to read them in the Job Driver. *TradeOffs :-* The tradeoffs are similar to the one in FileOutputCommitter V2 version. 
Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will not see any behavioral change as such # During execution, intermediate data becomes visible after the commitTask operation # On a failure, all output must be deleted and the job needs to be restarted. *Performance Benchmark :-* Cluster : c4.8x large (ec2-instance) Instance : 1 (primary) + 5 (core) Data Size : 3TB (TPC-DS store_sales data) Engine : Apache Spark 3.3.1 Query: The following query inserts around 3000+ files into the table directory (ran for 3 iterations) {code:java} insert into select ss_quantity from store_sales; {code} ||Committer||Iteration 1||Iteration 2||Iteration 3|| |Magic|126|127|122| |OptimizedMagic|50|51|58| So on average, OptimizedMagicCommitter was *~2.3x* faster than MagicCommitter. _*Note: Unlike MagicCommitter, OptimizedMagicCommitter is not suitable for cases wherein users require the guarantee that files are not visible in failure scenarios. Given the performance benefit, users may choose to use it if they don't require those guarantees or have some mechanism to clean up the data before retrying.*_ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
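The ~2.3x figure can be recomputed directly from the benchmark table above (assuming the per-iteration numbers are runtimes in a common unit, e.g. seconds):

```java
/** Recomputes the average speedup claimed above from the benchmark table. */
public class SpeedupCheck {
    static double mean(int[] xs) {
        int sum = 0;
        for (int x : xs) sum += x;
        return (double) sum / xs.length;
    }
    public static void main(String[] args) {
        int[] magic = {126, 127, 122};   // Magic committer, iterations 1-3
        int[] optimized = {50, 51, 58};  // OptimizedMagic committer, iterations 1-3
        double speedup = mean(magic) / mean(optimized);  // 125 / 53 ≈ 2.36
        System.out.printf("average speedup ≈ %.2fx%n", speedup);
    }
}
```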
[jira] [Updated] (HADOOP-18776) Add OptimizedS3AMagicCommitter For Zero Rename Commits to S3 Endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HADOOP-18776: --- Description: The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* which is an another type of S3 Magic committer but with a better performance by taking in few tradeoffs. The following are the differences in MagicCommitter vs OptimizedMagicCommitter ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| |commitTask |1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, commit operation is called (complete multiPartUpload)| |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory 2. Then every pending commit in the job will be committed. 3. "SUCCESS" marker is created (if config is enabled) 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled) 2. "__magic" directory is cleaned up.| *Performance Benefits :-* # The primary performance boost due to distributed complete multiPartUpload call being made in the taskAttempts(Task containers/Executors) rather than a single job driver. In case of MagicCommitter it is O(files/threads). # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" files and READ call to read them in the Job Driver. *TradeOffs :-* The tradeoffs are similar to the one in FileOutputCommitter V2 version. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no see behavioral change as such # During execution, intermediate data becomes visible after commitTask operation # On a failure, all output must be deleted and the job needs to be restarted. 
*Performance Benchmark :-* Cluster : c4.8x large (ec2-instance) Instance : 1 (primary) + 5 (core) Data Size : 3TB (TPC-DS store_sales data) Engine : Apache Spark 3.3.1 Query: The following query inserts around 3000+ files into the table directory (ran for 3 iterations) {code:java} insert into select ss_quantity from store_sales; {code} ||Committer||Iteration 1||Iteration 2||Iteration 3|| |Magic|126|127|122| |OptimizedMagic|50|51|58| So on an average, OptimizedMagicCommitter was *~2.3x* faster as compared to MagicCommitter. _*Note: Unlike MagicCommitter , OptimizedMagicCommitter is not suitable for all the cases where in user requires the guarantees of file not being visible in failure scenarios. Given the performance benefit, user can may choose to use this if they don't require any guarantees or have some mechanism to clean up the data before retrying.*_ was: The goal is to add a new S3A committer named *OptimizedS3AMagicCommitter* which is an another type of S3 Magic committer but with a better performance by taking in few tradeoffs. The following are the differences in MagicCommitter vs OptimizedMagicCommitter ||Operation||Magic Committer||*OptimizedS3AMagicCommitter*|| |commitTask|1. Lists all {{.pending}} files in its attempt directory. 2. The contents are loaded into a list of single pending uploads. 3. Saved to a {{.pendingset}} file in the job attempt directory.|1. Lists all {{.pending}} files in its attempt directory 2. The contents are loaded into a list of single pending uploads. 3. For each pending upload, commit operation is called (complete multiPartUpload)| |commitJob|1. Loads all {{.pendingset}} files in its job attempt directory 2. Then every pending commit in the job will be committed. 3. "SUCCESS" marker is created (if config is enabled) 4. "__magic" directory is cleaned up.|1. "SUCCESS" marker is created (if config is enabled) 2. 
"__magic" directory is cleaned up.| *Performance Benefits :-* # The primary performance boost due to distributed complete multiPartUpload call being made in the taskAttempts(Task containers/Executors) rather than a single job driver. In case of MagicCommitter it is O(files/threads). # It also saves a couple of S3 calls needed to PUT the "{{{}.pendingset{}}}" files and READ call to read them in the Job Driver. *TradeOffs :-* The tradeoffs are similar to the one in FileOutputCommitter V2 version. Users migrating from FileOutputCommitter V2 to OptimizedS3AMagicCommitter will no see behavioral change as such # During execution, intermediate data becomes visible after commitTask operation # On a failure, all output must be deleted and the job needs to be restarted. *Performance Benchmark :-* Cluster : c4.8x large (ec2-instance) Instance : 1 (primary) + 5 (core) Data Size : 3TB (TPC-DS store_sales data) Engine : Apache Spark 3.3.1 Query: The following query i
[jira] [Comment Edited] (HADOOP-16963) HADOOP-16582 changed mkdirs() behavior
[ https://issues.apache.org/jira/browse/HADOOP-16963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142033#comment-17142033 ]

Syed Shameerur Rahman edited comment on HADOOP-16963 at 6/22/20, 1:23 PM:
--

[~ste...@apache.org] Yes, Hive uses *ProxyFileSystem* for running qtests. As you said, we need to override FilterFS.mkdirs(path) in Hive to keep the qtests from failing. I verified this locally by overriding mkdirs(path) in ProxyFileSystem. I will raise a corresponding jira in Hive.

Sample failure:
{code:java}
Caused by: java.lang.IllegalArgumentException: Wrong FS: pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1, expected: file:///
{code}

cc: [~kgyrtkirk]

was (Author: srahman):
[~ste...@apache.org] Yes, Hive uses *ProxyFileSystem* for running qtests. As you said, we need to override FilterFS.mkdirs(path) in Hive to keep the qtests from failing. I verified this locally by overriding mkdirs(path) in ProxyFileSystem. I will raise a corresponding jira in Hive.

cc: [~kgyrtkirk]

> HADOOP-16582 changed mkdirs() behavior
> --
>
> Key: HADOOP-16963
> URL: https://issues.apache.org/jira/browse/HADOOP-16963
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.10.0, 3.3.0, 2.8.6, 2.9.3, 3.1.3, 3.2.2
> Reporter: Wei-Chiu Chuang
> Priority: Critical
>
> HADOOP-16582 changed the behavior of {{mkdirs()}}.
> Some Hive tests depend on the old behavior and they fail miserably.
> {quote}
> earlier:
> all plain mkdirs(somePath) calls were fast-tracked to FileSystem.mkdirs, which rerouted them to the mkdirs(somePath, somePerm) method with some static defaults.
> An implementation of FileSystem only needed to implement mkdirs(somePath, somePerm), because the other variant was not necessarily called if the filesystem was wrapped in a FilterFileSystem or something like that.
> now:
> FilterFileSystem in particular forwards the call of mkdirs(p) to the actual fs implementation, which may skip overridden mkdirs(somePath, somePerm) methods and could cause issues for existing FileSystem implementations.
> {quote}
> Filed this jira to address the problem.
> [~kgyrtkirk] [~ste...@apache.org] [~kihwal]

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
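The dispatch change quoted above can be reproduced with a self-contained sketch (hypothetical minimal classes, not the real org.apache.hadoop.fs API): a FilterFileSystem-style wrapper that forwards the one-argument mkdirs(path) straight to the wrapped filesystem will bypass a subclass that only overrides mkdirs(path, perm), which is exactly why Hive's ProxyFileSystem must now override the one-argument variant as well.

```java
// Hypothetical minimal classes modelling the dispatch described in the quote;
// these are NOT the real Hadoop FileSystem / FilterFileSystem classes.
class BaseFs {
    // Before HADOOP-16582: mkdirs(path) always routed through mkdirs(path, perm).
    public boolean mkdirs(String path) {
        return mkdirs(path, "default-perm");
    }
    public boolean mkdirs(String path, String perm) {
        return true;
    }
}

class FilterFs extends BaseFs {
    protected final BaseFs inner;
    FilterFs(BaseFs inner) { this.inner = inner; }

    // After the change: the one-argument call is forwarded straight to the
    // wrapped filesystem, so the two-argument override of a subclass of this
    // wrapper is never consulted.
    @Override
    public boolean mkdirs(String path) {
        return inner.mkdirs(path);
    }
    @Override
    public boolean mkdirs(String path, String perm) {
        return inner.mkdirs(path, perm);
    }
}

class ProxyFs extends FilterFs {
    boolean sawTwoArgCall = false;
    ProxyFs(BaseFs inner) { super(inner); }

    // Hive-style override: only the two-argument variant is overridden,
    // which is where the path rewriting would happen.
    @Override
    public boolean mkdirs(String path, String perm) {
        sawTwoArgCall = true;
        return super.mkdirs(path, perm);
    }
}

public class MkdirsDispatchDemo {
    public static void main(String[] args) {
        ProxyFs fs = new ProxyFs(new BaseFs());
        fs.mkdirs("/warehouse/dest1"); // forwarded to the inner fs directly
        // The override is skipped, so the path rewrite never runs:
        System.out.println("two-arg override invoked: " + fs.sawTwoArgCall);  // prints "two-arg override invoked: false"
    }
}
```

The fix discussed in the comment corresponds to ProxyFs also overriding the one-argument mkdirs(path), so both entry points pass through the subclass.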