I submitted a patch: https://github.com/apache/spark/pull/4628
On Mon, Feb 16, 2015 at 10:59 AM, Michael Armbrust <mich...@databricks.com> wrote:
> I was suggesting you mark the variable that is holding the HiveContext
> '@transient' since the Scala compiler is not correctly propagating this
> through the tuple extraction. This is only a workaround. We can also
> remove the tuple extraction.
>
> On Mon, Feb 16, 2015 at 10:47 AM, Reynold Xin <r...@databricks.com> wrote:
>> Michael - it is already transient. This should probably be considered a
>> bug in the Scala compiler, but we can easily work around it by removing
>> the use of destructuring binding.
>>
>> On Mon, Feb 16, 2015 at 10:41 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>> I'd suggest marking the HiveContext as @transient since it's not valid
>>> to use it on the slaves anyway.
>>>
>>> On Mon, Feb 16, 2015 at 4:27 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>>>> While investigating this issue (see the end of this email), I took a
>>>> look at HiveContext's code and found this change
>>>> (https://github.com/apache/spark/commit/64945f868443fbc59cb34b34c16d782dda0fb63d#diff-ff50aea397a607b79df9bec6f2a841db):
>>>>
>>>>   - @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])
>>>>   - @transient protected[hive] lazy val sessionState = {
>>>>   -   val ss = new SessionState(hiveconf)
>>>>   -   setConf(hiveconf.getAllProperties) // Have SQLConf pick up the initial set of HiveConf.
>>>>   -   ss
>>>>   - }
>>>>   + @transient protected[hive] lazy val (hiveconf, sessionState) =
>>>>   +   Option(SessionState.get())
>>>>   +     .orElse {
>>>>
>>>> With the new change, the Scala compiler always generates a Tuple2 field
>>>> in HiveContext, as shown below:
>>>>
>>>>   private Tuple2 x$3;
>>>>   private transient OutputStream outputBuffer;
>>>>   private transient HiveConf hiveconf;
>>>>   private transient SessionState sessionState;
>>>>   private transient HiveMetastoreCatalog catalog;
>>>>
>>>> The first element of that "x$3" field is a HiveConf object, which
>>>> cannot be serialized. Can you suggest how to resolve this issue? Thank
>>>> you very much!
>>>>
>>>> ================================
>>>>
>>>> I have a streaming application which registers a temp table on a
>>>> HiveContext for each batch duration. The application runs well in
>>>> Spark 1.1.0, but I get the error below with 1.1.1. Do you have any
>>>> suggestions to resolve it? Thank you!
>>>> java.io.NotSerializableException: org.apache.hadoop.hive.conf.HiveConf
>>>>     - field (class "scala.Tuple2", name: "_1", type: "class java.lang.Object")
>>>>     - object (class "scala.Tuple2", (Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2158ce23, org.apache.hadoop.hive.ql.session.SessionState@49b6eef9))
>>>>     - field (class "org.apache.spark.sql.hive.HiveContext", name: "x$3", type: "class scala.Tuple2")
>>>>     - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4e6e66a4)
>>>>     - field (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", name: "sqlContext$1", type: "class org.apache.spark.sql.SQLContext")
>>>>     - object (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", <function1>)
>>>>     - field (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", name: "foreachFunc$1", type: "interface scala.Function1")
>>>>     - object (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", <function2>)
>>>>     - field (class "org.apache.spark.streaming.dstream.ForEachDStream", name: "org$apache$spark$streaming$dstream$ForEachDStream$$foreachFunc", type: "interface scala.Function2")
>>>>     - object (class "org.apache.spark.streaming.dstream.ForEachDStream", org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20)
>>>>     - element of array (index: 0)
>>>>     - array (class "[Ljava.lang.Object;", size: 16)
>>>>     - field (class "scala.collection.mutable.ArrayBuffer", name: "array", type: "class [Ljava.lang.Object;")
>>>>     - object (class "scala.collection.mutable.ArrayBuffer", ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20))
>>>>     - field (class "org.apache.spark.streaming.DStreamGraph", name: "outputStreams", type: "class scala.collection.mutable.ArrayBuffer")
>>>>     - custom writeObject data (class "org.apache.spark.streaming.DStreamGraph")
>>>>     - object (class "org.apache.spark.streaming.DStreamGraph", org.apache.spark.streaming.DStreamGraph@776ae7da)
>>>>     - field (class "org.apache.spark.streaming.Checkpoint", name: "graph", type: "class org.apache.spark.streaming.DStreamGraph")
>>>>     - root object (class "org.apache.spark.streaming.Checkpoint", org.apache.spark.streaming.Checkpoint@5eade065)
>>>>   at java.io.ObjectOutputStream.writeObject0(Unknown Source)
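[Editor's note] Both the compiler-generated tuple field that the trace blames and the workaround Reynold describes (removing the destructuring binding) can be sketched outside of Spark. This is a minimal illustration, not Spark code: `FakeConf`, `BadContext`, and `SafeContext` are hypothetical stand-ins for HiveConf and HiveContext.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for org.apache.hadoop.hive.conf.HiveConf: deliberately NOT Serializable.
class FakeConf

// Mirrors the problem: a plain (non-transient) Tuple2 field holding the
// non-serializable value, like the synthetic `x$3` field the compiler
// generates for `lazy val (hiveconf, sessionState) = ...`.
class BadContext extends Serializable {
  private val x = (new FakeConf, "session-state") // like `private Tuple2 x$3;`
}

// The workaround: drop the destructuring binding and bind each value
// separately, so every generated field can actually carry @transient.
class SafeContext extends Serializable {
  @transient private lazy val initialized: (FakeConf, String) = {
    val conf = new FakeConf
    (conf, "session-state")
  }
  @transient lazy val hiveconf: FakeConf   = initialized._1
  @transient lazy val sessionState: String = initialized._2
}

// Helper: returns true if Java serialization of `o` succeeds.
def trySerialize(o: AnyRef): Boolean =
  try {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream)
    oos.writeObject(o)
    oos.close()
    true
  } catch { case _: NotSerializableException => false }

println(trySerialize(new BadContext))  // false: serialization walks into Tuple2._1
println(trySerialize(new SafeContext)) // true: all FakeConf-bearing fields are transient
```

Whether the annotation on a destructured `lazy val` propagates to the synthetic tuple field depends on the Scala compiler version, so binding the values separately is the robust form.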