Re: Got NotSerializableException when access broadcast variable

2014-08-21 Thread tianyi
Thanks for the help.



On Aug 21, 2014, at 10:56, Yin Huai huaiyin@gmail.com wrote:

 If you want to filter the table name, you can use 
 
 hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))
 
 Seems making functionRegistry transient can fix the error.

Re: Got NotSerializableException when access broadcast variable

2014-08-20 Thread tianyi
Thanks for the help.

I ran this script again with bin/spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"

In the console, I can see:

scala> sc.getConf.getAll.foreach(println)
(spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
(spark.driver.host,10.1.51.127)
(spark.executor.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.repl.class.uri,http://10.1.51.127:51319)
(spark.app.name,Spark shell)
(spark.driver.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
(spark.fileserver.uri,http://10.1.51.127:51322)
(spark.jars,)
(spark.driver.port,51320)
(spark.master,local[*])

But it fails again with the same error.




On Aug 20, 2014, at 15:59, Fengyun RAO raofeng...@gmail.com wrote:

 try: 
 
 sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
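 
 For context, a minimal self-contained sketch of where that setting goes when building the context yourself (the app name and master below are illustrative placeholders, not from this thread). Note that in Spark 1.x this setting governs data serialization (shuffle and cached blocks); task closures are still serialized with Java serialization, which is why the error above persists even with Kryo enabled:
 
 import org.apache.spark.{SparkConf, SparkContext}
 
 object KryoConfigDemo {
   def main(args: Array[String]): Unit = {
     // Kryo replaces Java serialization for data,
     // but not for the closures shipped with each task.
     val sparkConf = new SparkConf()
       .setAppName("kryo-config-demo") // illustrative
       .setMaster("local[*]")          // illustrative
       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     val sc = new SparkContext(sparkConf)
     println(sc.getConf.get("spark.serializer"))
     sc.stop()
   }
 }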
 
 
 2014-08-20 14:27 GMT+08:00 田毅 tia...@asiainfo.com:
 Hi everyone!
 
 I got an exception when I ran my script with spark-shell:
 
 I added 
 
 SPARK_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
 
 in spark-env.sh to show the following stack:
 
 
 org.apache.spark.SparkException: Task not serializable
   at 
 org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
   at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
   at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
   at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
   at $iwC$$iwC$$iwC.<init>(<console>:23)
   at $iwC$$iwC.<init>(<console>:25)
   at $iwC.<init>(<console>:27)
   at <init>(<console>:29)
   at .<init>(<console>:33)
   at .<clinit>(<console>)
   at .<init>(<console>:7)
   at .<clinit>(<console>)
   at $print(<console>)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at 
 org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
   at 
 org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
 ……
 Caused by: java.io.NotSerializableException: 
 org.apache.spark.sql.hive.HiveContext$$anon$3
   - field (class org.apache.spark.sql.hive.HiveContext, name: 
 functionRegistry, type: class 
 org.apache.spark.sql.hive.HiveFunctionRegistry)
   - object (class org.apache.spark.sql.hive.HiveContext, 
 org.apache.spark.sql.hive.HiveContext@4648e685)
   - field (class $iwC$$iwC$$iwC$$iwC, name: hc, type: class 
 org.apache.spark.sql.hive.HiveContext)
   - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@23d652ef)
   - field (class $iwC$$iwC$$iwC, name: $iw, type: class 
 $iwC$$iwC$$iwC$$iwC)
   - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@71cc14f1)
   - field (class $iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC)
   - object (class $iwC$$iwC, $iwC$$iwC@74eca89e)
   - field (class $iwC, name: $iw, type: class $iwC$$iwC)
   - object (class $iwC, $iwC@685c4cc4)
   - field (class $line9.$read, name: $iw, type: class $iwC)
   - object (class $line9.$read, $line9.$read@519f9aae)
   - field (class $iwC$$iwC$$iwC, name: $VAL7, type: class 
 $line9.$read)
   - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@4b996858)
   - field (class $iwC$$iwC$$iwC$$iwC, name: $outer, type: class 
 $iwC$$iwC$$iwC)
   - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@31d646d4)
   - field (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, name: $outer, type: 
 class $iwC$$iwC$$iwC$$iwC)
   - root object (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, function1)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
 
 I wrote a simple script to reproduce this problem.
 
 case 1:
 val barr1 = sc.broadcast("test")
 val sret = sc.parallelize(1 to 10, 2)
 val ret = sret.filter(row => !barr1.equals("test"))
 ret.collect.foreach(println)
 
 It works fine in local mode and yarn-client mode.
 
 case 2:
 val barr1 = sc.broadcast("test")
 val hc = new org.apache.spark.sql.hive.HiveContext(sc)
 val sret = hc.sql("show tables")
 val ret = sret.filter(row => !barr1.equals("test"))
 ret.collect.foreach(println)
 
 It throws java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext
 in both local mode and yarn-client mode.
 
 But it works fine if I write the same code in a Scala file and run it in
 IntelliJ IDEA.
 
 import org.apache.spark.{SparkConf, SparkContext}
 
 object TestBroadcast2 {
   def main(args: Array[String]) {
 val sparkConf = new 

Re: Got NotSerializableException when access broadcast variable

2014-08-20 Thread Vida Ha
Hi,

I doubt that the broadcast variable is your problem, since you are seeing:

org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3

We have a knowledgebase article that explains why this happens - it's a
very common error I see users triggering on the mailing list:

https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md

Are you using the HiveContext within a transformation that is called on an
RDD? That will definitely create a problem.
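
To make that concrete, here is a minimal self-contained sketch of the pattern the article describes (the class and variable names are illustrative, not from this thread): a closure that references a field of a non-serializable object forces Spark to serialize that whole object, while copying the needed value into a local val first avoids the capture.

import org.apache.spark.{SparkConf, SparkContext}

// Stand-in for a non-serializable helper such as a context object.
class NotSerializableHolder(val suffix: String)

object CaptureDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("capture-demo").setMaster("local[*]"))
    val holder = new NotSerializableHolder("!")

    // Fails with "Task not serializable": the closure references `holder`,
    // so Spark must ship the whole non-serializable object to executors.
    // sc.parallelize(1 to 3).map(i => i + holder.suffix).collect()

    // Works: copy just the needed value into a local, serializable val.
    val suffix = holder.suffix
    sc.parallelize(1 to 3).map(i => i + suffix).collect().foreach(println)

    sc.stop()
  }
}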

-Vida

Re: Got NotSerializableException when access broadcast variable

2014-08-20 Thread Yin Huai
If you want to filter the table name, you can use

hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))

Seems making functionRegistry transient can fix the error.
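
For readers unfamiliar with the mechanism, here is a minimal sketch of why @transient helps, using stand-in classes rather than Spark's actual internals: Java serialization skips transient fields, so a non-serializable member no longer blocks serialization of the object that holds it, and a transient lazy val is simply rebuilt on first use after deserialization.

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stand-in for a registry-like field type that is not Serializable.
class Registry {
  def lookup(name: String): String = s"function:$name"
}

class Context extends Serializable {
  // Without @transient, writeObject below would throw
  // java.io.NotSerializableException: Registry.
  @transient lazy val registry = new Registry
}

object TransientDemo {
  def main(args: Array[String]): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(new Context) // succeeds: the registry field is skipped
    println("Context serialized without error")
  }
}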



RE: Got NotSerializableException when access broadcast variable

2014-08-20 Thread Yin Huai
The PR is https://github.com/apache/spark/pull/2074.
--
From: Yin Huai huaiyin@gmail.com
Sent: 8/20/2014 10:56 PM
To: Vida Ha v...@databricks.com
Cc: tianyi tia...@asiainfo.com; Fengyun RAO raofeng...@gmail.com;
user@spark.apache.org
Subject: Re: Got NotSerializableException when access broadcast variable

If you want to filter the table name, you can use

hc.sql("show tables").filter(row => !"test".equals(row.getString(0)))

Seems making functionRegistry transient can fix the error.

