[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090452#comment-14090452 ]
Rui Li commented on HIVE-7624:
------------------------------

Finally I found that this is because we don't set an output collector for the ReduceSink (RS) operator in ExecReducer. While this is natural for MR, where ExecReducer shouldn't contain an RS, we have to do it for Spark. The added code just looks for RS operators and sets a collector on them, so there shouldn't be any regression. (A sketch illustrating the idea follows the quoted issue below.)

> Reduce operator initialization failed when running multiple MR query on spark
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7624
>                 URL: https://issues.apache.org/jira/browse/HIVE-7624
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>     at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>     at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
>     at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>     at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
>     …
> {quote}
> I suspect we're applying the reduce function in the wrong order.
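For context, a minimal sketch of the idea described in the comment above, in Java. This is an illustration, not the attached patch: the class name and the use of OperatorUtils.findOperators and Operator.setOutputCollector are assumptions based on Hive's operator API of that era.

{code:java}
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.exec.OperatorUtils;
import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
import org.apache.hadoop.mapred.OutputCollector;

// Hypothetical helper, not the HIVE-7624 patch itself.
public class ReduceSinkCollectorSketch {

  // In plain MR the reduce-side operator tree never ends in a ReduceSink,
  // so ExecReducer never wired a collector into one. A multi-stage Spark
  // plan (M->R->R) does end a reduce stage with an RS, so we walk the tree,
  // find every ReduceSinkOperator, and hand it the stage's output collector.
  public static void setCollectorOnReduceSinks(Operator<?> reducer,
      OutputCollector collector) {
    Set<ReduceSinkOperator> sinks =
        OperatorUtils.findOperators(reducer, ReduceSinkOperator.class);
    for (ReduceSinkOperator rs : sinks) {
      // Without a collector, the RS has nowhere to emit its shuffle rows,
      // and initializing the next reduce stage fails as in the trace above.
      rs.setOutputCollector(collector);
    }
  }
}
{code}

Invoked after the reduce plan is deserialized (e.g. from ExecReducer.configure), this is a no-op for MR plans, whose reduce side contains no RS, which is why no regression is expected.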