[
https://issues.apache.org/jira/browse/HIVE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088162#comment-14088162
]
Chao commented on HIVE-7569:
----------------------------
(Not sure whether this is related.)
Sometimes when I run a multi-insert job in Spark mode, I get an exception like
the following.
If I run the SAME query in MR mode AND THEN in Spark mode, the query
succeeds and produces the correct result.
{{code}}
2014-08-06 12:58:53,168 INFO [Executor task launch worker-0]: exec.GroupByOperator (Operator.java:initialize(389)) - Initialization Done 35 GBY
2014-08-06 12:58:53,169 ERROR [Executor task launch worker-0]: ExecReducer (ExecReducer.java:reduce(272)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x1x1x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x0x0x255 with properties {columns=_col0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=map<string,string>}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:212)
    at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:60)
    at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
    at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:191)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:210)
    ... 15 more
Caused by: java.io.EOFException
    at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
    at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:201)
    at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:491)
    at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
    ... 16 more
{{code}}
> Make sure multi-MR queries work
> -------------------------------
>
> Key: HIVE-7569
> URL: https://issues.apache.org/jira/browse/HIVE-7569
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Chao
>
> With the latest dev effort, queries that would involve multiple MR jobs
> should now be supported by Spark, except for sorting, multi-insert, union,
> and join (map join and SMB join might just work). However, this hasn't been
> verified or tested. This task is to ensure this is the case. Please create
> JIRAs for any problems found.