While investigating this issue (described at the end of this email), I took a look at HiveContext's code and found this change (https://github.com/apache/spark/commit/64945f868443fbc59cb34b34c16d782dda0fb63d#diff-ff50aea397a607b79df9bec6f2a841db):
  - @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])
  - @transient protected[hive] lazy val sessionState = {
  -   val ss = new SessionState(hiveconf)
  -   setConf(hiveconf.getAllProperties) // Have SQLConf pick up the initial set of HiveConf.
  -   ss
  - }
  + @transient protected[hive] lazy val (hiveconf, sessionState) =
  +   Option(SessionState.get())
  +     .orElse {

With this change, the Scala compiler always generates a Tuple2 field in HiveContext, as the decompiled fields show:

  private Tuple2 x$3;
  private transient OutputStream outputBuffer;
  private transient HiveConf hiveconf;
  private transient SessionState sessionState;
  private transient HiveMetastoreCatalog catalog;

The _1 element of that x$3 tuple is the HiveConf object, which cannot be serialized, and unlike the hiveconf and sessionState fields, x$3 itself is not transient (see the minimal sketch at the end of this mail). Can you suggest how to resolve this issue? Thank you very much!

================================

I have a streaming application which registers a temp table on a HiveContext for each batch duration. The application runs fine on Spark 1.1.0, but I get the error below on 1.1.1. Do you have any suggestions to resolve it? Thank you!

java.io.NotSerializableException: org.apache.hadoop.hive.conf.HiveConf
  - field (class "scala.Tuple2", name: "_1", type: "class java.lang.Object")
  - object (class "scala.Tuple2", (Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2158ce23, org.apache.hadoop.hive.ql.session.SessionState@49b6eef9))
  - field (class "org.apache.spark.sql.hive.HiveContext", name: "x$3", type: "class scala.Tuple2")
  - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4e6e66a4)
  - field (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", name: "sqlContext$1", type: "class org.apache.spark.sql.SQLContext")
  - object (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", <function1>)
  - field (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", name: "foreachFunc$1", type: "interface scala.Function1")
  - object (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", <function2>)
  - field (class "org.apache.spark.streaming.dstream.ForEachDStream", name: "org$apache$spark$streaming$dstream$ForEachDStream$$foreachFunc", type: "interface scala.Function2")
  - object (class "org.apache.spark.streaming.dstream.ForEachDStream", org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20)
  - element of array (index: 0)
  - array (class "[Ljava.lang.Object;", size: 16)
  - field (class "scala.collection.mutable.ArrayBuffer", name: "array", type: "class [Ljava.lang.Object;")
  - object (class "scala.collection.mutable.ArrayBuffer", ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20))
  - field (class "org.apache.spark.streaming.DStreamGraph", name: "outputStreams", type: "class scala.collection.mutable.ArrayBuffer")
  - custom writeObject data (class "org.apache.spark.streaming.DStreamGraph")
  - object (class "org.apache.spark.streaming.DStreamGraph", org.apache.spark.streaming.DStreamGraph@776ae7da)
  - field (class "org.apache.spark.streaming.Checkpoint", name: "graph", type: "class org.apache.spark.streaming.DStreamGraph")
  - root object (class "org.apache.spark.streaming.Checkpoint", org.apache.spark.streaming.Checkpoint@5eade065)
  at java.io.ObjectOutputStream.writeObject0(Unknown Source)
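For reference, the synthetic tuple field comes from how the Scala compiler desugars a pattern-bound lazy val: the pattern's result is stored in a hidden Tuple2 field, and the @transient annotation does not end up on that field. Below is a minimal, self-contained sketch (class and field names are mine, not Spark's) that reproduces the same NotSerializableException:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}

    // Stand-in for HiveConf: a class that is not Serializable.
    class Unserializable

    class Holder extends Serializable {
      // The compiler desugars this pattern binding into a hidden Tuple2 field
      // (like HiveContext's x$3) plus one field per bound name. @transient
      // lands on the `conf` and `state` fields, but not on the hidden tuple.
      @transient lazy val (conf, state) = {
        val u = new Unserializable
        (u, "session-state")
      }
    }

    object LazyTupleDemo extends App {
      val h = new Holder
      h.conf // force the lazy val so the hidden tuple field gets populated
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(h) // throws NotSerializableException: Unserializable,
                         // reached through Tuple2._1, as in the trace above
    }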
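As for a possible fix on the application side: this is only a workaround I am considering, not an official solution, but moving the HiveContext out of the foreachRDD closure into a lazily initialized singleton should keep it out of the checkpointed DStream graph entirely. A sketch (names such as HiveContextSingleton and Record are hypothetical):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.dstream.DStream

    case class Record(word: String, count: Int)

    // The context is created lazily on the driver when the first batch runs,
    // so it is never captured by the serialized foreachRDD closure.
    object HiveContextSingleton {
      @transient private var instance: HiveContext = _
      def getInstance(rdd: RDD[_]): HiveContext = synchronized {
        if (instance == null) instance = new HiveContext(rdd.sparkContext)
        instance
      }
    }

    // Look the context up inside the closure instead of closing over it,
    // unlike my current registerTempTable, which captures sqlContext$1.
    def registerTempTable(stream: DStream[(String, Int)], table: String): Unit =
      stream.foreachRDD { rdd =>
        val sqlContext = HiveContextSingleton.getInstance(rdd)
        import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD
        rdd.map { case (w, c) => Record(w, c) }.registerTempTable(table)
      }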