[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089001#comment-14089001 ]
Rui Li commented on HIVE-7624:
------------------------------

Thanks very much [~csun]. After some debugging, I found this issue originates in GenMapRedUtils.setKeyAndValueDescForTaskTree, which is called after the task has been compiled. In that method we always set the keyDesc of the leaf reduce work according to the root map work. I believe this is both incorrect and redundant, because when a reduce work is created we already call GenSparkUtils.setupReduceSink to set the keyDesc. I removed this code and the exception is gone. However, I then hit another problem: no result is returned for the multi-MR query. (I cloned the jobConf and set a new plan path for the clone; two illustrative sketches of both points follow the quoted issue below.)

> Reduce operator initialization failed when running multiple MR query on spark
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7624
>                 URL: https://issues.apache.org/jira/browse/HIVE-7624
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> The following error occurs when I try to run a query with multiple reduce works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
>         …
> {quote}
> I suspect we're applying the reduce function in the wrong order.
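For readers following the fix: here is a minimal sketch of the keyDesc problem described in the comment above. Everything except the names GenMapRedUtils.setKeyAndValueDescForTaskTree and GenSparkUtils.setupReduceSink is a hypothetical stand-in for Hive's plan classes; this is not the actual Hive source or patch.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Hive's plan classes -- not the real API.
class TableDesc { }

class ReduceWork {
  private TableDesc keyDesc;                       // installed by GenSparkUtils.setupReduceSink
  final List<ReduceWork> children = new ArrayList<>();
  void setKeyDesc(TableDesc d) { keyDesc = d; }
  TableDesc getKeyDesc() { return keyDesc; }
}

public class KeyDescSketch {
  // Sketch of the problem: forcing every ReduceWork in an M->R->R chain to the
  // root map work's keyDesc clobbers the keyDesc of the second (leaf) reducer,
  // whose shuffle key schema differs. This loop is what the comment proposes
  // removing, since setupReduceSink already set the keyDesc at creation time.
  static void overwriteKeyDescRecursively(ReduceWork rw, TableDesc rootKeyDesc) {
    rw.setKeyDesc(rootKeyDesc);                    // the redundant overwrite
    for (ReduceWork child : rw.children) {
      overwriteKeyDescRecursively(child, rootKeyDesc);
    }
  }

  public static void main(String[] args) {
    TableDesc firstShuffleKey = new TableDesc();
    TableDesc secondShuffleKey = new TableDesc();

    ReduceWork first = new ReduceWork();
    first.setKeyDesc(firstShuffleKey);             // as setupReduceSink would do
    ReduceWork leaf = new ReduceWork();
    leaf.setKeyDesc(secondShuffleKey);             // correct key desc for the second shuffle
    first.children.add(leaf);

    overwriteKeyDescRecursively(first, firstShuffleKey);
    // The leaf now wrongly carries the first shuffle's key desc:
    System.out.println(leaf.getKeyDesc() == secondShuffleKey);  // prints false
  }
}
{code}

A leaf reducer deserializing rows with the wrong key schema would plausibly produce the "cannot find field reducesinkkey0 from [0:_col0]" failure quoted above.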
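And a similarly hedged sketch of the parenthetical jobConf handling. new JobConf(Configuration) is Hadoop's real copy constructor; the "hive.exec.plan" property name and the idea of a per-stage plan path are assumptions about this code path, so treat the snippet as illustrative only.

{code:java}
import org.apache.hadoop.mapred.JobConf;

public class CloneConfSketch {
  // Give each reduce stage its own JobConf so each HiveReduceFunction can
  // deserialize its own ReduceWork instead of sharing a single plan path.
  static JobConf confForReduceStage(JobConf baseConf, String stagePlanPath) {
    JobConf cloned = new JobConf(baseConf);  // Hadoop's copy constructor
    // Assumption: "hive.exec.plan" is the property Hive reads to locate the
    // serialized plan; pointing it at a per-stage path is the idea sketched here.
    cloned.set("hive.exec.plan", stagePlanPath);
    return cloned;
  }
}
{code}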