[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876373#comment-17876373 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:

[~ste...@apache.org] Question about the Hadoop ITests. Because of how my AWS account is set up, I'd like to use the ProfileCredentialsProvider to authenticate for the ITests, but I see

org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by ProfileCredentialsProvider : software.amazon.awssdk.core.exception.SdkClientException: Profile file contained no credentials for profile 'default': ProfileFile(sections=[])

despite the default profile existing in my ~/.aws/credentials file (and this persists even if I explicitly define the corresponding environment variables). Do the ITests do anything unusual with configuration that precludes using the ProfileCredentialsProvider? The failures are in tests that have nothing to do with my changes, so I'm confident this is just a Hadoop config issue.

If anybody else has insights about this, I'm happy to take their suggestions. Running the ITests is the only thing standing in the way of me putting my Hadoop PR up at this point.

> Add support for Tez to MagicS3GuardCommitter
>
>                 Key: HADOOP-19091
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19091
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.6
>         Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Major
>         Attachments: 0001-AWS-Hive-Changes.patch,
> 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch,
> HADOOP-19091-HIVE-WIP.patch
>
> The MagicS3GuardCommitter assumes that the JobID of the task is the same as
> that of the job's application master when writing/reading the .pendingset
> file. This assumption is not valid when running with Tez, which creates
> slightly different JobIDs for tasks and the application master.
>
> While the MagicS3GuardCommitter is intended only for MRv2, it mostly works
> fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run
> in MR mode. This issue only crops up when running queries with the Tez
> execution engine. I can upload a patch to Hive 3.1 to reproduce this error on
> EMR if needed.
>
> Fixing this will probably require work from both Tez and Hadoop, wanted to
> start a discussion here so we can figure out how exactly we go about this.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872416#comment-17872416 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:

[~AnmolSun] Yes. The Hadoop implementation is already done; I've just been working on running the ITests as per the hadoop-aws contribution guidelines. Once I manage to run those, and maybe write a couple more tests, I will put the Hadoop PR up.

I haven't created a Hive Jira yet. I have an implementation for both HCat and Hive, but there are some more things that I need to iron out before those can be merged.
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866491#comment-17866491 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:

[~ste...@apache.org] The Tez PR is almost merged; we just want a Hadoop-side opinion on one minor thing. Right now the Tez PR sets a property "job.committer.uuid", and my Hadoop implementation checks it around here:

[https://github.com/apache/hadoop/blob/51cb858cc8c23d873d4adfc21de5f2c1c22d346f/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1372]

similar to how it checks for the SPARK_WRITE_UUID property. If Hadoop has any strong preference about the property name, I can change it; otherwise the Tez PR can be merged and I can put up my Hadoop PR next. If anybody else should be looped in for this discussion, let me know.

Tez PR: https://github.com/apache/tez/pull/339
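To make the property lookup described above concrete, here is a minimal sketch of how a job UUID could be resolved from configuration before falling back to the job ID. This is not the hadoop-aws code: the lookup order, the helper name resolve_job_uuid, and the literal value used for SPARK_WRITE_UUID are assumptions for illustration. Only "job.committer.uuid" and "fs.s3a.committer.uuid" are names taken from this thread.

```python
# Illustrative sketch only; the real logic lives in AbstractS3ACommitter.
FS_S3A_COMMITTER_UUID = "fs.s3a.committer.uuid"      # named elsewhere in this thread
SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID"  # assumed value of the constant
TEZ_COMMITTER_UUID = "job.committer.uuid"            # name set by the Tez PR


def resolve_job_uuid(conf, job_id):
    """Return (uuid, source): the first configured property wins (assumed order)."""
    for key in (FS_S3A_COMMITTER_UUID, SPARK_WRITE_UUID, TEZ_COMMITTER_UUID):
        value = conf.get(key, "").strip()
        if value:
            return value, key
    # Fallback: use the job ID itself, which is what diverges under Tez.
    return job_id, "JobID"


# With a shared property, the AM and the task agree even though their job IDs differ.
shared = {TEZ_COMMITTER_UUID: "dag-1234"}
am = resolve_job_uuid(shared, "job_1708973874189_0073")
task = resolve_job_uuid(shared, "job_17089738741890_0073")
```

Under these assumptions both sides resolve the same UUID whenever the engine sets one shared property, so the mismatched job IDs no longer matter.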
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824851#comment-17824851 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:

[~srahman] Yes, by setting fs.s3a.committer.uuid and having the magic committer pick that up, I was able to run my Hive test case without needing to modify Hadoop (3.3.3).

However, I still intend to put up a Hadoop PR with my MRv1 wrapper of MagicS3GuardCommitter (it is implemented like the MRv1 FileOutputCommitter, which just delegates calls to the existing MRv2 version). That requires one minor change to the existing MRv2 MagicS3GuardCommitter: I need to add a constructor that takes a JobContext, since the MRv1 types require it.
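The delegation pattern mentioned above can be sketched as follows. Both classes are hypothetical stand-ins written in Python for brevity, not the Hadoop types. The sketch assumes that an MRv1-style committer is created without any context and therefore builds the wrapped MRv2-style committer lazily from the first job context it receives.

```python
class MagicCommitterV2:
    """Hypothetical stand-in for the existing MRv2 committer."""

    def __init__(self, output_path, job_context):
        # Mirrors the idea of a constructor that accepts a job-level context.
        self.output_path = output_path
        self.job_context = job_context
        self.calls = []

    def setup_job(self, job_context):
        self.calls.append(("setup_job", job_context["job_id"]))

    def commit_job(self, job_context):
        self.calls.append(("commit_job", job_context["job_id"]))


class MagicCommitterV1:
    """Hypothetical MRv1-style wrapper: no logic of its own, only delegation."""

    def __init__(self):
        self._wrapped = None

    def _delegate(self, job_context):
        # Create the wrapped committer on first use, from the job context.
        if self._wrapped is None:
            self._wrapped = MagicCommitterV2(job_context["output_path"], job_context)
        return self._wrapped

    def setup_job(self, job_context):
        self._delegate(job_context).setup_job(job_context)

    def commit_job(self, job_context):
        self._delegate(job_context).commit_job(job_context)


ctx = {"job_id": "job_1708973874189_0073", "output_path": "s3a://bucket/table"}
wrapper = MagicCommitterV1()
wrapper.setup_job(ctx)
wrapper.commit_job(ctx)
```

Keeping the wrapper free of commit logic means any fix to the MRv2 committer is picked up by MRv1 callers automatically.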
[jira] [Comment Edited] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823282#comment-17823282 ]

Venkatasubrahmanian Narayanan edited comment on HADOOP-19091 at 3/4/24 6:23 PM:

[~srahman] While that's true, the problem is that the jobAttemptPath itself is different between the task and the AM. If you look at my previous message, you'll see the difference is in the "directory" name of the jobAttemptPath:

Task: s3a://hive-east-1-bucket/emblembasic/__magic/__magic/job-job_17089738741890_0073/

vs

AM: s3a://hive-east-1-bucket/emblembasic/__magic/__magic/job-job_1708973874189_0073/

The extra 0 in the JobID causes them to look at different "directories", so the AM doesn't find the pending set. There isn't a stack trace per se: the commitJob operation just doesn't find any pending data to commit, so it goes straight to the cleanup() code (and since my tests are with Hadoop 3.3.3, that code just looks under __magic and deletes everything it finds there).
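The path divergence described above can be reproduced with a small sketch. The function job_attempt_path is a simplified, hypothetical model of how the committer derives the job directory from the job UUID; it is not the hadoop-aws implementation. The bucket, destination, and job IDs are the ones quoted in this thread.

```python
MAGIC = "__magic"


def job_attempt_path(dest, job_uuid):
    """Simplified model (assumption): <dest>/__magic/job-<uuid>/"""
    return f"{dest.rstrip('/')}/{MAGIC}/job-{job_uuid}/"


# The destination in the thread already ends in __magic, hence the doubled segment.
dest = "s3a://hive-east-1-bucket/emblembasic/__magic"

task_path = job_attempt_path(dest, "job_17089738741890_0073")  # UUID seen by the task
am_path = job_attempt_path(dest, "job_1708973874189_0073")     # UUID seen by the AM

same_directory = task_path == am_path
```

Because the UUID is embedded in the directory name, a one-character difference is enough to make the writer and the reader use disjoint prefixes.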
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822709#comment-17822709 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:

[~ste...@apache.org] Tez is where the different jobID is generated, but the extra digit doesn't seem to be the vertex index, even though Tez does append the vertex index to the generated jobID in MROutput. I'll look into the history of that code to see if I'm missing something.

Yes, the magic committer is picking up the jobID from the config. I'll keep the parallel job change in mind.

The patches I uploaded in the JIRA are patches to Hive, not Hadoop (they were just meant to replicate the behavior). I'll run the tests and add those details when I put up the Hadoop PR. Unless you were referring to a Hive PR and I'm misunderstanding?
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822274#comment-17822274 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:

[~ste...@apache.org] Will keep all that in mind when I work on the Hadoop patches, thanks. Hive uses MRv1, and migrating it to MRv2 so that it uses PathOutputCommitter etc. internally would be a lot of effort. However, I will see if something similar to the factory design can be done with the committer class, since that is configured explicitly for this design.

[~srahman] From the AM logs:

Job UUID job_1708973874189_0073 source JobID
Starting: Task committer attempt_1708973874189_0073_r_00_1: commitJob(job_1708973874189_0073)
2024-02-29 18:05:05,766 [DEBUG] [App Shared Pool - #2] |impl.IOStatisticsStoreImpl|: Incrementing counter op_list_files by 1 with final value 1
2024-02-29 18:05:05,766 [DEBUG] [App Shared Pool - #2] |s3a.S3AFileSystem|: listFiles(s3a://hive-east-1-bucket/emblembasic/__magic/__magic/job-job_1708973874189_0073, false)
2024-02-29 18:05:05,767 [DEBUG] [App Shared Pool - #2] |s3a.S3AFileSystem|: Requesting all entries under emblembasic/__magic/__magic/job-job_1708973874189_0073/ with delimiter '/'

From the task logs:

Job UUID job_17089738741890_0073 source JobID
Saving work of attempt_17089738741890_0073_r_00_0 to s3a://hive-east-1-bucket/emblembasic/__magic/__magic/job-job_17089738741890_0073/task_17089738741890_0073_r_00.pendingset

It's a very subtle difference (there's an extra 0 in the ID/path used by the task). The post-commitJob cleanup does delete the files, since it deletes everything under the __magic directory instead of looking only under the job directory. commitJob itself just fails to find the pending set when it lists the files, so it doesn't commit the results.
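The behavior in these logs (an empty listing followed by a cleanup that removes the uncommitted data) can be simulated with a toy in-memory object store. The dict-backed store and the helpers list_files and commit_job are hypothetical and only model the sequence described in the comment; they are not S3A or committer APIs.

```python
def list_files(store, prefix):
    """Return the keys under a prefix, like a flat object-store listing."""
    return sorted(key for key in store if key.startswith(prefix))


def commit_job(store, job_dir, magic_root):
    """Count the pending sets found under job_dir, then clean up magic_root."""
    pending = [k for k in list_files(store, job_dir) if k.endswith(".pendingset")]
    committed = len(pending)
    # Cleanup as described for Hadoop 3.3.3: everything under __magic is
    # deleted, not only the entries under this job's directory.
    for key in list_files(store, magic_root):
        del store[key]
    return committed


# The task wrote its pending set under the ID with the extra 0.
store = {
    "emblembasic/__magic/__magic/job-job_17089738741890_0073/"
    "task_17089738741890_0073_r_00.pendingset": b"pending uploads",
}

# The AM lists under its own ID, finds nothing, and then cleans up.
committed = commit_job(
    store,
    job_dir="emblembasic/__magic/__magic/job-job_1708973874189_0073/",
    magic_root="emblembasic/__magic/",
)
```

In this model nothing is committed and nothing fails loudly, which matches the report that there is no stack trace to look at.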
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821417#comment-17821417 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:

Actually, a follow-up thing I remembered: the MagicS3GuardCommitter also does a correctness check that compares the jobID recorded by the task commit with the jobID seen by the job commit, and I've had to patch that as well. Even if we do get Tez to set fs.s3a.committer.uuid, we'll need to patch the committer to use that value for its correctness check.
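A minimal sketch of the kind of correctness check referred to above: the job ID stored with a pending set is compared against the UUID the job committer is using. The exception class, the function name, and the rule that an empty stored ID skips the check are assumptions for illustration, not the hadoop-aws implementation.

```python
class JobIdMismatch(Exception):
    """Hypothetical error for a pending set that belongs to another job."""


def validate_pending_set(pending_set_job_id, committer_uuid):
    """Raise if the pending set was written under a different job UUID."""
    # Assumption: an empty stored ID is treated as "nothing to validate".
    if pending_set_job_id and pending_set_job_id != committer_uuid:
        raise JobIdMismatch(
            f"pending set job ID {pending_set_job_id!r} does not match "
            f"committer UUID {committer_uuid!r}"
        )


# Same UUID on both sides (for example one shared, explicitly configured value).
validate_pending_set("dag-1234", "dag-1234")
```

If the check compares raw job IDs while the paths use a shared UUID, it would still reject Tez output, which is why both places need to use the same value.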
[jira] [Comment Edited] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821372#comment-17821372 ]

Venkatasubrahmanian Narayanan edited comment on HADOOP-19091 at 2/27/24 7:38 PM:

[~srahman] I've uploaded my WIP Hive patch (there are a couple of other open-sourced patches that need to be backported to Hive 3.1, which I've uploaded as well). I still need to clean up a couple of things (which is why the patch hardcodes an expectation that tables are on S3), but the basic idea is to add an MRv1 wrapper of the MagicS3GuardCommitter, similar to how the FileOutputCommitter for MRv1 is implemented. Since Hive uses MRv1, it only requires incidental changes to treat paths the way the magic committer expects.

I was able to reproduce the behavior on EMR 6-12.0 with a simple Pig script that loads from CSV and stores into a table with HCatStorer. In the task and AM logs you can see the behavior I described, where the path the task container writes the pending set to is subtly different from the path the AM tries to read it from (in my tests it differed by a single 0 appended after the first part of the jtIdentifier). The path is derived from the UUID, which in the default case is derived from the jobId. When I patch hadoop-aws to manually drop that extra digit from the jtIdentifier string, the data is committed successfully (proving that no other factor is at play), but obviously that approach would not work in a real solution.
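To show where the single extra digit sits, the two job IDs from the logs can be split into their parts. The parser below assumes the usual job_<jtIdentifier>_<sequence> layout and is a hypothetical helper, not a Hadoop or Tez API.

```python
def parse_job_id(job_id):
    """Split 'job_<jtIdentifier>_<sequence>' into (jtIdentifier, sequence)."""
    prefix, jt_identifier, sequence = job_id.split("_")
    if prefix != "job":
        raise ValueError(f"not a job ID: {job_id!r}")
    return jt_identifier, int(sequence)


am_jt, am_seq = parse_job_id("job_1708973874189_0073")      # from the AM logs
task_jt, task_seq = parse_job_id("job_17089738741890_0073")  # from the task logs

# The sequence numbers agree; only the jtIdentifier differs, by a trailing "0".
extra_digit_only = task_seq == am_seq and task_jt == am_jt + "0"
```

Stripping that digit by hand confirms the diagnosis, but a shared, explicitly configured UUID avoids depending on how either side formats its job ID.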
[jira] [Comment Edited] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821372#comment-17821372 ] Venkatasubrahmanian Narayanan edited comment on HADOOP-19091 at 2/27/24 7:34 PM: - [~srahman] I've uploaded my WIP Hive patch (there are a couple of other open sourced patches which need to be backported to Hive 3.1 that I've uploaded as well). I still need to clean up a couple of things (hence why the patch hardcodes an expectation that tables are on S3), but the basic idea is to add an MRv1 wrapper of the MagicS3GuardCommitter similar to how the FileOutputCommitter for MRv1 is implemented, and since Hive uses MRv1 it only requires incidental changes to treat paths the way the magic committer expects. was (Author: vnarayanan7): [~srahman] I've uploaded my WIP Hive patch (there are a couple of other open sourced patches which need to be backported to Hive 3.1 that I've uploaded as well). I still need to clean up a couple of things (hence why the patch hardcodes an expectation that tables are on. S3), but the basic idea is to add an MRv1 wrapper of the MagicS3GuardCommitter similar to how the FileOutputCommitter for MRv1 is implemented, and since Hive uses MRv1 it only requires incidental changes to treat paths the way the magic committer expects. > Add support for Tez to MagicS3GuardCommitter > > > Key: HADOOP-19091 > URL: https://issues.apache.org/jira/browse/HADOOP-19091 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.4.0, 3.3.6 > Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0 >Reporter: Venkatasubrahmanian Narayanan >Priority: Major > Attachments: 0001-AWS-Hive-Changes.patch, > 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch, > HADOOP-19091-HIVE-WIP.patch > > > The MagicS3GuardCommitter assumes that the JobID of the task is the same as > that of the job's application master when writing/reading the .pendingset > file. 
This assumption is not valid when running with Tez, which creates > slightly different JobIDs for tasks and the application master. > > While the MagicS3GuardCommitter is intended only for MRv2, it mostly works > fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run > in MR mode. This issue only crops up when running queries with the Tez > execution engine. I can upload a patch to Hive 3.1 to reproduce this error on > EMR if needed. > > Fixing this will probably require work from both Tez and Hadoop, wanted to > start a discussion here so we can figure out how exactly we go about this.
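As a rough illustration of the wrapper idea described in the comment above (this is not the attached patch), the sketch below shows the delegation pattern only: a committer exposing an old-style API forwards each call to a wrapped new-style committer. `NewApiCommitter`, `OldApiCommitter`, `RecordingMagicCommitter`, and `WrappedCommitter` are hypothetical stand-ins so that the example compiles without Hadoop on the classpath; the real committer classes live in `org.apache.hadoop.mapred` and `org.apache.hadoop.mapreduce` and take context objects rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;

public class Mrv1WrapperSketch {

    /** Hypothetical stand-in for a new-API (mapreduce) committer. */
    public interface NewApiCommitter {
        void setupJob(String jobId);
        void commitTask(String taskAttemptId);
        void commitJob(String jobId);
    }

    /** Hypothetical stand-in for the old-API (mapred) committer contract. */
    public abstract static class OldApiCommitter {
        public abstract void setupJob(String jobId);
        public abstract void commitTask(String taskAttemptId);
        public abstract void commitJob(String jobId);
    }

    /** Records calls instead of talking to S3; stands in for the magic committer. */
    public static class RecordingMagicCommitter implements NewApiCommitter {
        public final List<String> calls = new ArrayList<>();

        @Override public void setupJob(String jobId) { calls.add("setupJob:" + jobId); }
        @Override public void commitTask(String taskAttemptId) { calls.add("commitTask:" + taskAttemptId); }
        @Override public void commitJob(String jobId) { calls.add("commitJob:" + jobId); }
    }

    /** Old-API wrapper that forwards every call to the wrapped new-API committer. */
    public static class WrappedCommitter extends OldApiCommitter {
        private final NewApiCommitter wrapped;

        public WrappedCommitter(NewApiCommitter wrapped) { this.wrapped = wrapped; }

        @Override public void setupJob(String jobId) { wrapped.setupJob(jobId); }
        @Override public void commitTask(String taskAttemptId) { wrapped.commitTask(taskAttemptId); }
        @Override public void commitJob(String jobId) { wrapped.commitJob(jobId); }
    }

    public static void main(String[] args) {
        RecordingMagicCommitter magic = new RecordingMagicCommitter();
        OldApiCommitter committer = new WrappedCommitter(magic);

        // The caller only sees the old-style API; the IDs here are made up.
        committer.setupJob("job_0001");
        committer.commitTask("attempt_0001_m_000000_0");
        committer.commitJob("job_0001");

        System.out.println(magic.calls);
    }
}
```

The wrapper holds no commit logic of its own, so any path handling the magic committer expects would still have to be satisfied by the caller, which matches the "incidental changes" to Hive mentioned in the comment.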
[jira] [Commented] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821372#comment-17821372 ] Venkatasubrahmanian Narayanan commented on HADOOP-19091: [~srahman] I've uploaded my WIP Hive patch, along with a couple of other open-source patches that need to be backported to Hive 3.1. I still need to clean up a couple of things, which is why the patch hardcodes an expectation that tables are on S3. The basic idea is to add an MRv1 wrapper around the MagicS3GuardCommitter, similar to how the FileOutputCommitter for MRv1 is implemented. Since Hive uses MRv1, it then requires only incidental changes to treat paths the way the magic committer expects. > Add support for Tez to MagicS3GuardCommitter > > > Key: HADOOP-19091 > URL: https://issues.apache.org/jira/browse/HADOOP-19091 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.4.0, 3.3.6 > Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0 >Reporter: Venkatasubrahmanian Narayanan >Priority: Major > Attachments: 0001-AWS-Hive-Changes.patch, > 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch, > HADOOP-19091-HIVE-WIP.patch > > > The MagicS3GuardCommitter assumes that the JobID of the task is the same as > that of the job's application master when writing/reading the .pendingset > file. This assumption is not valid when running with Tez, which creates > slightly different JobIDs for tasks and the application master. > > While the MagicS3GuardCommitter is intended only for MRv2, it mostly works > fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run > in MR mode. This issue only crops up when running queries with the Tez > execution engine. I can upload a patch to Hive 3.1 to reproduce this error on > EMR if needed. 
> > Fixing this will probably require work from both Tez and Hadoop, wanted to > start a discussion here so we can figure out how exactly we go about this.
[jira] [Updated] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkatasubrahmanian Narayanan updated HADOOP-19091: Attachment: 0001-AWS-Hive-Changes.patch 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch > Add support for Tez to MagicS3GuardCommitter > > > Key: HADOOP-19091 > URL: https://issues.apache.org/jira/browse/HADOOP-19091 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.4.0, 3.3.6 > Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0 >Reporter: Venkatasubrahmanian Narayanan >Priority: Major > Attachments: 0001-AWS-Hive-Changes.patch, > 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch, > HADOOP-19091-HIVE-WIP.patch > > > The MagicS3GuardCommitter assumes that the JobID of the task is the same as > that of the job's application master when writing/reading the .pendingset > file. This assumption is not valid when running with Tez, which creates > slightly different JobIDs for tasks and the application master. > > While the MagicS3GuardCommitter is intended only for MRv2, it mostly works > fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run > in MR mode. This issue only crops up when running queries with the Tez > execution engine. I can upload a patch to Hive 3.1 to reproduce this error on > EMR if needed. > > Fixing this will probably require work from both Tez and Hadoop, wanted to > start a discussion here so we can figure out how exactly we go about this.
[jira] [Updated] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkatasubrahmanian Narayanan updated HADOOP-19091: Attachment: HADOOP-19091-HIVE-WIP.patch > Add support for Tez to MagicS3GuardCommitter > > > Key: HADOOP-19091 > URL: https://issues.apache.org/jira/browse/HADOOP-19091 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.4.0, 3.3.6 > Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0 >Reporter: Venkatasubrahmanian Narayanan >Priority: Major > Attachments: HADOOP-19091-HIVE-WIP.patch > > > The MagicS3GuardCommitter assumes that the JobID of the task is the same as > that of the job's application master when writing/reading the .pendingset > file. This assumption is not valid when running with Tez, which creates > slightly different JobIDs for tasks and the application master. > > While the MagicS3GuardCommitter is intended only for MRv2, it mostly works > fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run > in MR mode. This issue only crops up when running queries with the Tez > execution engine. I can upload a patch to Hive 3.1 to reproduce this error on > EMR if needed. > > Fixing this will probably require work from both Tez and Hadoop, wanted to > start a discussion here so we can figure out how exactly we go about this.
[jira] [Created] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter
Venkatasubrahmanian Narayanan created HADOOP-19091: Summary: Add support for Tez to MagicS3GuardCommitter Key: HADOOP-19091 URL: https://issues.apache.org/jira/browse/HADOOP-19091 Project: Hadoop Common Issue Type: Bug Components: tools Affects Versions: 3.3.3 Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0 Reporter: Venkatasubrahmanian Narayanan The MagicS3GuardCommitter assumes that the JobID of the task is the same as that of the job's application master when writing/reading the .pendingset file. This assumption is not valid when running with Tez, which creates slightly different JobIDs for tasks and the application master. While the MagicS3GuardCommitter is intended only for MRv2, it mostly works fine through an MRv1 wrapper when Hive/Pig (with some minor changes to Hive) are run in MR mode. The issue only crops up when running queries with the Tez execution engine. I can upload a patch to Hive 3.1 to reproduce this error on EMR if needed. Fixing this will probably require work from both Tez and Hadoop, so I wanted to start a discussion here to figure out how exactly we go about this.
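To make the failure mode in the description concrete, here is a minimal, self-contained sketch of what happens when the writer and the reader of a .pendingset file derive its location from different job IDs. Everything in it is an assumption for illustration: the `<dest>/__magic/<jobId>/<taskId>.pendingset` layout is a simplification rather than the committer's actual path scheme, the in-memory map stands in for S3, and the job IDs are made up rather than real Tez or YARN identifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PendingsetLookupSketch {

    /** Simplified, hypothetical layout: <dest>/__magic/<jobId>/<taskId>.pendingset */
    public static String pendingsetPath(String dest, String jobId, String taskId) {
        return dest + "/__magic/" + jobId + "/" + taskId + ".pendingset";
    }

    /** Task side: save the task's pending uploads under the job ID the task sees. */
    public static void commitTask(Map<String, String> store, String dest,
                                  String jobIdSeenByTask, String taskId) {
        store.put(pendingsetPath(dest, jobIdSeenByTask, taskId), "pending uploads of " + taskId);
    }

    /** Job side: list .pendingset files under the job ID the application master sees. */
    public static List<String> listPendingsets(Map<String, String> store, String dest,
                                               String jobIdSeenByAm) {
        String prefix = dest + "/__magic/" + jobIdSeenByAm + "/";
        List<String> found = new ArrayList<>();
        for (String path : store.keySet()) {
            if (path.startsWith(prefix) && path.endsWith(".pendingset")) {
                found.add(path);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String dest = "s3a://example-bucket/table";

        // Matching IDs: the job commit finds the file the task wrote.
        Map<String, String> sameId = new TreeMap<>();
        commitTask(sameId, dest, "job_0001", "task_000000");
        System.out.println("same ID, files found: "
            + listPendingsets(sameId, dest, "job_0001").size());

        // Differing IDs (made up): the job commit looks in a directory the task never wrote to.
        Map<String, String> differentId = new TreeMap<>();
        commitTask(differentId, dest, "job_0001_task_view", "task_000000");
        System.out.println("different ID, files found: "
            + listPendingsets(differentId, dest, "job_0001_am_view").size());
    }
}
```

In the second case nothing fails loudly on the task side: the file is written successfully, but under a prefix the job-side listing never visits, so the job commit would see no pending uploads to complete.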