[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089922#comment-14089922
 ] 

Xuefu Zhang commented on HIVE-7624:
-----------------------------------

{quote}
However, no result is returned. I checked the log and found the second reduce 
work got nothing to process. Not sure what is missing here...
{quote}

I think the problem is caused by the FileSinkOperator in the reduce-side operator 
tree. In the MR world, that tree probably always ends with a FileSinkOperator 
(which writes to disk). If we have MRR, the first R should not write to disk. 
Instead, it should end with a ReduceSinkOperator, which outputs the result to the 
SparkCollector. The result RDD is based on the SparkCollector, so it can be 
picked up by the second R.
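To make the chaining idea concrete, here is a minimal, self-contained sketch (not 
actual Hive code; the Collector class below merely plays the SparkCollector role, 
and runReduceStage is a hypothetical stand-in for a reduce stage): the first R 
emits rows into an in-memory collector whose contents feed the second R, instead 
of sinking to a file the second stage never reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class MrrSketch {
    // Stage output buffer, standing in for the SparkCollector.
    static class Collector {
        final List<String> rows = new ArrayList<>();
        void collect(String row) { rows.add(row); }
    }

    // Hypothetical reduce stage: apply fn to each input row, emit into a collector.
    static Collector runReduceStage(List<String> input, Function<String, String> fn) {
        Collector out = new Collector();
        for (String row : input) {
            out.collect(fn.apply(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mapOutput = Arrays.asList("a", "b");
        // First R: does NOT sink to disk; its collector feeds the second R.
        Collector r1 = runReduceStage(mapOutput, row -> row + "-r1");
        // Second R: consumes the first stage's collected rows, so it has input.
        Collector r2 = runReduceStage(r1.rows, row -> row + "-r2");
        System.out.println(r2.rows); // prints [a-r1-r2, b-r1-r2]
    }
}
```

If the first R ends in a FileSinkOperator instead, its output goes to disk and the 
RDD backing the second R is empty, which matches the observed "nothing to process" 
behavior.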

I think we need to modify the operator tree a bit for this to work correctly. 
Please follow Tez's approach here.

> Reduce operator initialization failed when running multiple MR query on spark
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7624
>                 URL: https://issues.apache.org/jira/browse/HIVE-7624
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
>         at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
>         at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
>         at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
>         at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
> [0:_col0]
>         at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>         at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce functions in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)