[ https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824851#comment-17824851 ]

Venkatasubrahmanian Narayanan commented on HADOOP-19091:
--------------------------------------------------------

[~srahman] Yes, by setting fs.s3a.committer.uuid and having the magic committer pick that up, I was able to run my Hive test case without needing to modify Hadoop (3.3.3).

However, I still intend to put up a Hadoop PR with my MRv1 wrapper of MagicS3GuardCommitter (it's implemented similarly to the MRv1 FileOutputCommitter where it just delegates the calls to the existing MRv2 version), and I will need to make one minor change to the existing MRv2 MagicS3GuardCommitter - I need to add a constructor that takes a JobContext since MRv1 types require it.

> Add support for Tez to MagicS3GuardCommitter
> --------------------------------------------
>
>                 Key: HADOOP-19091
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19091
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.6
>         Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Major
>         Attachments: 0001-AWS-Hive-Changes.patch,
> 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch,
> HADOOP-19091-HIVE-WIP.patch
>
>
> The MagicS3GuardCommitter assumes that the JobID of the task is the same as
> that of the job's application master when writing/reading the .pendingset
> file. This assumption is not valid when running with Tez, which creates
> slightly different JobIDs for tasks and the application master.
>
> While the MagicS3GuardCommitter is intended only for MRv2, it mostly works
> fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run
> in MR mode. This issue only crops up when running queries with the Tez
> execution engine. I can upload a patch to Hive 3.1 to reproduce this error on
> EMR if needed.
>
> Fixing this will probably require work from both Tez and Hadoop, wanted to
> start a discussion here so we can figure out how exactly we go about this.
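
As a rough illustration of why setting fs.s3a.committer.uuid sidesteps the JobID mismatch described in this issue, here is a minimal sketch in plain Java. It does not use Hadoop or Tez classes: the class CommitterUuidSketch, the helper pendingSetDir, the job IDs, and the directory layout are all hypothetical stand-ins, and only the property name comes from the comment. The assumption is simply that a shared identifier read from the job configuration takes precedence over whatever JobID each side sees.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only. Plain Java stand-ins, not Hadoop or Tez classes;
 * the directory layout below is hypothetical.
 */
public class CommitterUuidSketch {

    /** Property name taken from the comment in this thread. */
    static final String COMMITTER_UUID = "fs.s3a.committer.uuid";

    /**
     * Hypothetical helper: choose the identifier that names the directory
     * holding .pendingset files. An explicitly configured UUID wins over
     * the caller's own job ID.
     */
    static String pendingSetDir(Map<String, String> conf, String jobId) {
        String uuid = conf.get(COMMITTER_UUID);
        String id = (uuid == null || uuid.isEmpty()) ? jobId : uuid;
        return "__magic/job-" + id;
    }

    public static void main(String[] args) {
        // Hypothetical IDs: under Tez the task and the application master
        // see slightly different job IDs.
        String taskJobId = "job_1700000000000_0001";
        String amJobId = "job_1700000000000_0002";

        Map<String, String> conf = new HashMap<>();

        // Without a shared UUID, the writer (task) and the reader
        // (application master) derive different directories.
        System.out.println(pendingSetDir(conf, taskJobId)
                .equals(pendingSetDir(conf, amJobId))); // prints false

        // With the UUID set once in the job configuration, both sides agree.
        conf.put(COMMITTER_UUID, "example-uuid-1234");
        System.out.println(pendingSetDir(conf, taskJobId)
                .equals(pendingSetDir(conf, amJobId))); // prints true
    }
}
```

The sketch only models the lookup order; how the real committer resolves the UUID and lays out its paths should be checked against the Hadoop source for the version in use.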