While investigating this issue (described at the end of this email), I took a look at HiveContext's code and found this change (https://github.com/apache/spark/commit/64945f868443fbc59cb34b34c16d782dda0fb63d#diff-ff50aea397a607b79df9bec6f2a841db):
  - @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])
  - @transient protected[hive] lazy val sessionState = {
  -   val ss = new SessionState(hiveconf)
  -   setConf(hiveconf.getAllProperties) // Have SQLConf pick up the initial set of HiveConf.
  -   ss
  - }
  + @transient protected[hive] lazy val (hiveconf, sessionState) =
  +   Option(SessionState.get())
  +     .orElse {

With this change, the Scala compiler always generates a Tuple2 field in HiveContext, as the decompiled fields show:

  private Tuple2 x$3;
  private transient OutputStream outputBuffer;
  private transient HiveConf hiveconf;
  private transient SessionState sessionState;
  private transient HiveMetastoreCatalog catalog;

The _1 element of that x$3 tuple is the HiveConf object, which cannot be serialized, and unlike the hiveconf and sessionState fields, x$3 itself is not transient (see the minimal sketch at the end of this mail). Can you suggest how to resolve this issue? Thank you very much!

================================

I have a streaming application which registers a temp table on a HiveContext for each batch duration. The application runs fine on Spark 1.1.0, but I get the error below on 1.1.1. Do you have any suggestions to resolve it? Thank you!

java.io.NotSerializableException: org.apache.hadoop.hive.conf.HiveConf
  - field (class "scala.Tuple2", name: "_1", type: "class java.lang.Object")
  - object (class "scala.Tuple2", (Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2158ce23, org.apache.hadoop.hive.ql.session.SessionState@49b6eef9))
  - field (class "org.apache.spark.sql.hive.HiveContext", name: "x$3", type: "class scala.Tuple2")
  - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4e6e66a4)
  - field (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", name: "sqlContext$1", type: "class org.apache.spark.sql.SQLContext")
  - object (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", <function1>)
  - field (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", name: "foreachFunc$1", type: "interface scala.Function1")
  - object (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", <function2>)
  - field (class "org.apache.spark.streaming.dstream.ForEachDStream", name: "org$apache$spark$streaming$dstream$ForEachDStream$$foreachFunc", type: "interface scala.Function2")
  - object (class "org.apache.spark.streaming.dstream.ForEachDStream", org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20)
  - element of array (index: 0)
  - array (class "[Ljava.lang.Object;", size: 16)
  - field (class "scala.collection.mutable.ArrayBuffer", name: "array", type: "class [Ljava.lang.Object;")
  - object (class "scala.collection.mutable.ArrayBuffer", ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20))
  - field (class "org.apache.spark.streaming.DStreamGraph", name: "outputStreams", type: "class scala.collection.mutable.ArrayBuffer")
  - custom writeObject data (class "org.apache.spark.streaming.DStreamGraph")
  - object (class "org.apache.spark.streaming.DStreamGraph", org.apache.spark.streaming.DStreamGraph@776ae7da)
  - field (class "org.apache.spark.streaming.Checkpoint", name: "graph", type: "class org.apache.spark.streaming.DStreamGraph")
  - root object (class "org.apache.spark.streaming.Checkpoint", org.apache.spark.streaming.Checkpoint@5eade065)
  at java.io.ObjectOutputStream.writeObject0(Unknown Source)
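For reference, the synthetic tuple field comes from how the Scala compiler desugars a pattern-bound lazy val: the pattern's result is stored in a hidden Tuple2 field, and the @transient annotation does not end up on that field. Below is a minimal, self-contained sketch (class and field names are mine, not Spark's) that reproduces the same NotSerializableException:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}

    // Stand-in for HiveConf: a class that is not Serializable.
    class Unserializable

    class Holder extends Serializable {
      // The compiler desugars this pattern binding into a hidden Tuple2 field
      // (like HiveContext's x$3) plus one field per bound name. @transient
      // lands on the `conf` and `state` fields, but not on the hidden tuple.
      @transient lazy val (conf, state) = {
        val u = new Unserializable
        (u, "session-state")
      }
    }

    object LazyTupleDemo extends App {
      val h = new Holder
      h.conf // force the lazy val so the hidden tuple field gets populated
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(h) // throws NotSerializableException: Unserializable,
                         // reached through Tuple2._1, as in the trace above
    }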
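As for a possible fix on the application side: this is only a workaround I am considering, not an official solution, but moving the HiveContext out of the foreachRDD closure into a lazily initialized singleton should keep it out of the checkpointed DStream graph entirely. A sketch (names such as HiveContextSingleton and Record are hypothetical):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.dstream.DStream

    case class Record(word: String, count: Int)

    // The context is created lazily on the driver when the first batch runs,
    // so it is never captured by the serialized foreachRDD closure.
    object HiveContextSingleton {
      @transient private var instance: HiveContext = _
      def getInstance(rdd: RDD[_]): HiveContext = synchronized {
        if (instance == null) instance = new HiveContext(rdd.sparkContext)
        instance
      }
    }

    // Look the context up inside the closure instead of closing over it,
    // unlike my current registerTempTable, which captures sqlContext$1.
    def registerTempTable(stream: DStream[(String, Int)], table: String): Unit =
      stream.foreachRDD { rdd =>
        val sqlContext = HiveContextSingleton.getInstance(rdd)
        import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD
        rdd.map { case (w, c) => Record(w, c) }.registerTempTable(table)
      }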