[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090452#comment-14090452 ]

Rui Li commented on HIVE-7624:
------------------------------

Finally I found that this happens because we don't set an output collector for 
the RS (ReduceSink) operator in ExecReducer. While this is natural for MR, 
where ExecReducer shouldn't contain an RS, we have to do it for Spark. The 
added code just looks for RS operators and sets a collector for them, so there 
shouldn't be any regression.
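
For illustration, here's a minimal sketch of the kind of traversal described 
above. This is not the attached patch; Operator, ReduceSinkOperator and 
OutputCollector below are simplified stand-ins for the corresponding Hive 
classes, and the helper name is hypothetical:

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified stand-ins for the Hive classes of the same names.
interface OutputCollector {
  void collect(Object key, Object value);
}

class Operator {
  private final List<Operator> children = new ArrayList<>();
  protected OutputCollector out;

  List<Operator> getChildOperators() {
    return children;
  }

  void setOutputCollector(OutputCollector collector) {
    this.out = collector;
  }
}

class ReduceSinkOperator extends Operator {
}

class CollectorSetter {
  // Walk the operator tree rooted at the reducer and hand every
  // ReduceSink an output collector. In MR the reduce-side tree never
  // contains an RS, but Spark plans (M->R->R) can, so without this
  // the RS fails to initialize.
  static void setReduceSinkCollectors(Operator root, OutputCollector collector) {
    Deque<Operator> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Operator op = stack.pop();
      if (op instanceof ReduceSinkOperator) {
        op.setOutputCollector(collector);
      }
      for (Operator child : op.getChildOperators()) {
        stack.push(child);
      }
    }
  }
}
{code}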

> Reduce operator initialization failed when running multiple MR query on spark
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7624
>                 URL: https://issues.apache.org/jira/browse/HIVE-7624
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce functions in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
