[ https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376355#comment-15376355 ]
Mithun Radhakrishnan commented on TEZ-3336:
-------------------------------------------

Ok, here's what's happening: {{HiveSplitGenerator}} is only in play if Hive uses the {{HiveInputFormat}} when generating splits on the AM. It's not built to handle {{CombineHiveInputFormat}} at all. I suppose regrouping grouped splits is silly.

If the user chooses {{CombineHiveInputFormat}}, then Hive's [{{DagUtils.createVertex()}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L612-L618] does the following:

{code:java|title=DagUtils.java#L612-L618|borderStyle=solid}
// Not HiveInputFormat, or a custom VertexManager will take care of grouping splits
if (vertexHasCustomInput) {
  dataSource = MultiMRInput.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
} else {
  dataSource = MRInputLegacy.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
}
{code}

So Hive delegates to Tez's {{MRInputLegacy.createConfigBuilder()}}, which eventually puts {{MRInput}} and {{MRInputAMSplitGenerator}} in play.

I'm still curious about the nature of the events sent to {{MRInputAMSplitGenerator}}, and who's sending them. That'll help convince me that this is indeed a Hive bug. :]

> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> -------------------------------------------------------------------
>
>                 Key: TEZ-3336
>                 URL: https://issues.apache.org/jira/browse/TEZ-3336
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.1
>            Reporter: Jason Lowe
>
> When Hive does a map-side join it can generate a DAG where a vertex has two
> inputs, one from an upstream task and another using MRInputAMSplitGenerator.
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one
> of the tasks for the other upstream vertex completes then the job can fail
> with an error since MRInputAMSplitGenerator does not expect to receive any
> events.
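
For illustration only, the failure mode in the issue description can be modelled with a small, self-contained sketch. Nothing below is Tez or Hive code: {{InputEvent}}, {{SplitInitializer}}, {{StrictSplitInitializer}} and {{RaceSketch}} are hypothetical names, and the exception type and message are assumptions chosen for the sketch. It only shows why an initializer that rejects every event fails if an upstream task completes while splits are still being computed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an event routed to a root-input initializer.
final class InputEvent {
    final String sourceVertex;

    InputEvent(String sourceVertex) {
        this.sourceVertex = sourceVertex;
    }
}

// Hypothetical contract for this sketch; NOT the real Tez InputInitializer API.
interface SplitInitializer {
    List<String> initialize();

    void handleEvents(List<InputEvent> events);
}

// Models an initializer that only computes splits and rejects every event.
final class StrictSplitInitializer implements SplitInitializer {
    @Override
    public List<String> initialize() {
        // Placeholder split names; a real generator would consult the InputFormat.
        return Arrays.asList("split-0", "split-1");
    }

    @Override
    public void handleEvents(List<InputEvent> events) {
        // Assumed behaviour for the sketch: any event is treated as an error.
        throw new UnsupportedOperationException(
            "not expecting events, got " + events.size());
    }
}

final class RaceSketch {
    // Simulates an upstream task finishing before split computation is done:
    // the event is delivered to the initializer, which then fails.
    static boolean failsOnUpstreamEvent(SplitInitializer initializer) {
        try {
            initializer.handleEvents(
                Collections.singletonList(new InputEvent("upstream-vertex")));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Calling {{RaceSketch.failsOnUpstreamEvent(new StrictSplitInitializer())}} returns {{true}} in this model. Whether the real fix belongs in the event sender or in the receiver is exactly the open question in the comment above; the sketch takes no position on that.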
-- This message was sent by Atlassian JIRA (v6.3.4#6332)