Xin Hao created HIVE-13277:
-------------------------------
Summary: Exception "Unable to create serializer
'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' "
occurred during query execution on spark engine when vectorized execution is
switched on
Key: HIVE-13277
URL: https://issues.apache.org/jira/browse/HIVE-13277
Project: Hive
Issue Type: Bug
Environment: Hive on Spark engine
Hive Version: Apache Hive 2.0.0
Spark Version: Apache Spark 1.6.0
Reporter: Xin Hao
Found during TPCx-BB query2 execution on the Spark engine when vectorized
execution is switched on:
(1) set hive.vectorized.execution.enabled=true;
(2) set hive.vectorized.execution.reduce.enabled=true; (the default value in
Apache Hive 2.0.0)
The query runs fine on the Spark engine when hive.vectorized.execution.enabled is switched off:
(1) set hive.vectorized.execution.enabled=false;
(2) set hive.vectorized.execution.reduce.enabled=true;
On the MR engine, the query passes without exception whether vectorized
execution is switched on or off.
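
For anyone scripting the repro, here is a minimal sketch over HiveServer2 JDBC. The host, port, credentials, and the final query are placeholders, not from this report; the actual trigger is TPCx-BB query2 (text omitted here).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of toggling the failing configuration; requires hive-jdbc on the
// classpath. Host, port, user, and the sample query are placeholders.
public class Hive13277Repro {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("set hive.execution.engine=spark");
            // Failing combination from this report:
            stmt.execute("set hive.vectorized.execution.enabled=true");
            stmt.execute("set hive.vectorized.execution.reduce.enabled=true");
            // Any query with a vectorized reduce stage should do; the
            // classic src test table stands in for the TPCx-BB data.
            stmt.execute("select key, count(*) from src group by key");
        }
    }
}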
The detailed error message is below:
2016-03-14T10:09:33,692 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO
spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 bytes
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN
scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3):
java.lang.RuntimeException: Failed to load plan:
hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml:
org.apache.hive.com.esotericsoftware.kryo.KryoException:
java.lang.IllegalArgumentException: Unable to create serializer
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for
class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - Serialization trace:
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - childOperators
(org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - childOperators
(org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - childOperators
(org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - reducer
(org.apache.hadoop.hive.ql.plan.ReduceWork)
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl
(SparkClientImpl.java:run(593)) - at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.sca
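
For context: per the trace, Utilities.getBaseWork fails while Kryo is lazily building a FieldSerializer for VectorFileSinkOperator inside the ReduceWork graph read from reduce.xml, so the whole reduce-side plan cannot be rebuilt. Below is a simplified illustration of that pattern with stock Kryo, not Hive's shaded copy or its actual plan classes; PlanStub is a stand-in for ReduceWork.

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Sketch of the failing pattern: Kryo creates a FieldSerializer on demand
// for every class in the object graph. If that creation throws (as for
// VectorFileSinkOperator above), deserializing the plan fails as a whole.
public class KryoPlanSketch {
    static class PlanStub {
        String reducerName = "GBY_3"; // stand-in field
    }

    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        kryo.setRegistrationRequired(false); // resolve serializers on demand

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Output out = new Output(bytes)) {
            // A FieldSerializer for PlanStub is built reflectively here.
            kryo.writeObject(out, new PlanStub());
        }
        try (Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()))) {
            // Deserialization needs the same serializer; this is the step
            // that blows up in the trace above for the vectorized operators.
            PlanStub plan = kryo.readObject(in, PlanStub.class);
            System.out.println(plan.reducerName);
        }
    }
}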