[ https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376355#comment-15376355 ]

Mithun Radhakrishnan commented on TEZ-3336:
-------------------------------------------

Ok, here's what's happening:

{{HiveSplitGenerator}} is only in play if Hive uses {{HiveInputFormat}} when
generating splits on the AM. It's not built to handle
{{CombineHiveInputFormat}} at all; I suppose regrouping already-grouped splits
would be silly.
If the user chooses {{CombineHiveInputFormat}}, then Hive's
[{{DagUtils.createVertex()}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L612-L618]
does the following:

{code:java|title=DagUtils.java#L612-L618|borderStyle=solid}
        // Not HiveInputFormat, or a custom VertexManager will take care of grouping splits
        if (vertexHasCustomInput) {
          dataSource =
              MultiMRInput.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
        } else {
          dataSource =
              MRInputLegacy.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
        }
        }
{code}

So, for a vertex without a custom input, Hive delegates to Tez's
{{MRInputLegacy.createConfigBuilder()}}, which eventually puts {{MRInput}} and
{{MRInputAMSplitGenerator}} in play.
I'm still curious about the nature of the events sent to
{{MRInputAMSplitGenerator}} and who's sending them. That'll help convince me
that this is indeed a Hive bug. :]

> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> -------------------------------------------------------------------
>
>                 Key: TEZ-3336
>                 URL: https://issues.apache.org/jira/browse/TEZ-3336
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.1
>            Reporter: Jason Lowe
>
> When Hive does a map-side join, it can generate a DAG where a vertex has two
> inputs: one from an upstream vertex and another using MRInputAMSplitGenerator.
> If MRInputAMSplitGenerator takes a while to compute the splits and one of the
> tasks of the other upstream vertex completes in the meantime, the job can fail
> with an error, since MRInputAMSplitGenerator does not expect to receive any
> events.
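To make the failure mode in the description concrete, here is a toy Java model of the race, offered as a sketch under stated assumptions: {{RootInputRaceSketch}} and {{EventRejectingInitializer}} are hypothetical stand-ins, not Tez's actual initializer API, and the returned strings only mimic the vertex outcome. A latch keeps split computation from finishing while an event from the other upstream vertex arrives; because the stand-in initializer rejects every event (as the description says {{MRInputAMSplitGenerator}} does), that run ends in ROOT_INPUT_INIT_FAILURE, while a run with no event during initialization succeeds.

{code:java}
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of the TEZ-3336 race. All names here are hypothetical, not Tez APIs. */
public final class RootInputRaceSketch {

    /** Hypothetical stand-in for an AM-side split generator that accepts no events. */
    static final class EventRejectingInitializer {
        private final CountDownLatch splitsMayFinish;

        EventRejectingInitializer(CountDownLatch splitsMayFinish) {
            this.splitsMayFinish = splitsMayFinish;
        }

        /** Simulates slow split computation: blocks until the harness releases the latch. */
        List<String> computeSplits() throws InterruptedException {
            splitsMayFinish.await();
            return List.of("split-0", "split-1");
        }

        /** Models the behavior described in the issue: any incoming event is unexpected. */
        void handleEvent(String event) {
            throw new UnsupportedOperationException("Not expecting any events, got: " + event);
        }
    }

    /** Runs one simulated vertex and returns "SUCCEEDED" or "ROOT_INPUT_INIT_FAILURE". */
    static String runVertex(boolean upstreamTaskFinishesDuringInit) {
        CountDownLatch splitsMayFinish = new CountDownLatch(1);
        EventRejectingInitializer init = new EventRejectingInitializer(splitsMayFinish);
        ExecutorService am = Executors.newSingleThreadExecutor();
        try {
            // Split computation is submitted on a separate thread and cannot
            // finish until the latch is released.
            Future<List<String>> splits = am.submit(init::computeSplits);
            String outcome = "SUCCEEDED";
            if (upstreamTaskFinishesDuringInit) {
                // A task of the other upstream vertex completes while splits are
                // still being computed, and its event reaches the split generator.
                try {
                    init.handleEvent("upstream task completed");
                } catch (UnsupportedOperationException e) {
                    outcome = "ROOT_INPUT_INIT_FAILURE";
                }
            }
            splitsMayFinish.countDown(); // let split computation finish
            splits.get(5, TimeUnit.SECONDS);
            return outcome;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException("simulation did not complete", e);
        } finally {
            am.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runVertex(false)); // prints SUCCEEDED
        System.out.println(runVertex(true));  // prints ROOT_INPUT_INIT_FAILURE
    }
}
{code}

The timing dependence is reduced to a boolean flag here; in a real DAG it depends on whether an upstream task finishes before split generation does, which is why the job only fails sometimes.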



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
