[ 
https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822709#comment-17822709
 ] 

Venkatasubrahmanian Narayanan commented on HADOOP-19091:
--------------------------------------------------------

[~ste...@apache.org] Tez is where the different jobID is generated, but the 
difference doesn't seem to come from the vertex index, even though Tez does 
append the vertex index to the generated jobID in MROutput. I'll look into the 
history of that code to see if I'm missing something.

Yes, the magic committer is picking up the jobID from the config.
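
To make the failure mode concrete, here is a minimal sketch of how a jobID 
mismatch can hide .pendingset files from the job commit. It does not use the 
real committer classes: the in-memory map stands in for S3, and the directory 
layout, class name, method names and jobID strings are all assumptions made up 
for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: not the MagicS3GuardCommitter code.
class JobIdMismatchSketch {
    // In-memory stand-in for the object store: path -> contents.
    private final Map<String, String> store = new TreeMap<>();

    // Hypothetical layout: <dest>/__magic/<jobId>/<taskId>.pendingset
    static String jobDir(String dest, String jobId) {
        return dest + "/__magic/" + jobId + "/";
    }

    // Task side: save a .pendingset file under the jobID the task sees.
    void taskCommit(String dest, String taskJobId, String taskId) {
        String path = jobDir(dest, taskJobId) + taskId + ".pendingset";
        store.put(path, "pending uploads of " + taskId);
    }

    // Job side: list .pendingset files under the jobID the AM sees.
    List<String> jobCommitListing(String dest, String amJobId) {
        String prefix = jobDir(dest, amJobId);
        List<String> found = new ArrayList<>();
        for (String path : store.keySet()) {
            if (path.startsWith(prefix) && path.endsWith(".pendingset")) {
                found.add(path);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String dest = "s3a://example-bucket/output";   // made-up destination
        String amJobId = "job_example_0001";           // made-up AM jobID
        String taskJobId = "job_exampleX_0001";        // made-up, differs from the AM's

        JobIdMismatchSketch sketch = new JobIdMismatchSketch();
        sketch.taskCommit(dest, taskJobId, "task_000000");

        // The AM looks under its own jobID and misses the task's file.
        System.out.println("AM sees: " + sketch.jobCommitListing(dest, amJobId));
        // Looking under the task's jobID does find it.
        System.out.println("Task dir has: " + sketch.jobCommitListing(dest, taskJobId));
    }
}
```

In this sketch the listing under the AM's jobID comes back empty while the 
listing under the task's jobID finds the file, which is the kind of mismatch 
the issue describes.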

I'll keep the parallel job change in mind.

The patches I uploaded to the JIRA are patches to Hive, not Hadoop (they were 
only meant to reproduce the behavior). I'll run the tests and add those 
details when I put up the Hadoop PR. Unless you were referring to a Hive PR 
and I'm misunderstanding?

> Add support for Tez to MagicS3GuardCommitter
> --------------------------------------------
>
>                 Key: HADOOP-19091
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19091
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.6
>         Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Major
>         Attachments: 0001-AWS-Hive-Changes.patch, 
> 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch, 
> HADOOP-19091-HIVE-WIP.patch
>
>
> The MagicS3GuardCommitter assumes that the JobID of the task is the same as 
> that of the job's application master when writing/reading the .pendingset 
> file. This assumption is not valid when running with Tez, which creates 
> slightly different JobIDs for tasks and the application master.
>  
> While the MagicS3GuardCommitter is intended only for MRv2, it mostly works 
> fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run 
> in MR mode. This issue only crops up when running queries with the Tez 
> execution engine. I can upload a patch to Hive 3.1 to reproduce this error on 
> EMR if needed.
>  
> Fixing this will probably require work from both Tez and Hadoop. I wanted to 
> start a discussion here so we can figure out how exactly we go about this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
