Re: Got NotSerializableException when access broadcast variable
Thanks for help.

On Aug 21, 2014, at 10:56, Yin Huai huaiyin@gmail.com wrote:

If you want to filter the table name, you can use hc.sql("show tables").filter(row => !"test".equals(row.getString(0))). Seems making functionRegistry transient can fix the error.
Re: Got NotSerializableException when access broadcast variable
Thanks for help. I ran this script again with bin/spark-shell --conf spark.serializer=org.apache.spark.serializer.KryoSerializer. In the console, I can see:

scala> sc.getConf.getAll.foreach(println)
(spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
(spark.driver.host,10.1.51.127)
(spark.executor.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.repl.class.uri,http://10.1.51.127:51319)
(spark.app.name,Spark shell)
(spark.driver.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
(spark.fileserver.uri,http://10.1.51.127:51322)
(spark.jars,)
(spark.driver.port,51320)
(spark.master,local[*])

But it fails again with the same error.

On Aug 20, 2014, at 15:59, Fengyun RAO raofeng...@gmail.com wrote:

try: sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

2014-08-20 14:27 GMT+08:00 田毅 tia...@asiainfo.com:

Hi everyone!

I got an exception when I ran my script with spark-shell. I added SPARK_JAVA_OPTS=-Dsun.io.serialization.extendedDebugInfo=true in spark-env.sh to show the following stack:

org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
    at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
    at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC$$iwC.<init>(<console>:23)
    at $iwC$$iwC.<init>(<console>:25)
    at $iwC.<init>(<console>:27)
    at <init>(<console>:29)
    at .<init>(<console>:33)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
    ……
Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3
    - field (class org.apache.spark.sql.hive.HiveContext, name: functionRegistry, type: class org.apache.spark.sql.hive.HiveFunctionRegistry)
    - object (class org.apache.spark.sql.hive.HiveContext, org.apache.spark.sql.hive.HiveContext@4648e685)
    - field (class $iwC$$iwC$$iwC$$iwC, name: hc, type: class org.apache.spark.sql.hive.HiveContext)
    - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@23d652ef)
    - field (class $iwC$$iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC$$iwC)
    - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@71cc14f1)
    - field (class $iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC)
    - object (class $iwC$$iwC, $iwC$$iwC@74eca89e)
    - field (class $iwC, name: $iw, type: class $iwC$$iwC)
    - object (class $iwC, $iwC@685c4cc4)
    - field (class $line9.$read, name: $iw, type: class $iwC)
    - object (class $line9.$read, $line9.$read@519f9aae)
    - field (class $iwC$$iwC$$iwC, name: $VAL7, type: class $line9.$read)
    - object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@4b996858)
    - field (class $iwC$$iwC$$iwC$$iwC, name: $outer, type: class $iwC$$iwC$$iwC)
    - object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@31d646d4)
    - field (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, name: $outer, type: class $iwC$$iwC$$iwC$$iwC)
    - root object (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, <function1>)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)

I wrote some simple scripts to reproduce this problem.

case 1:

val barr1 = sc.broadcast("test")
val sret = sc.parallelize(1 to 10, 2)
val ret = sret.filter(row => !barr1.equals("test"))
ret.collect.foreach(println)

It's working fine in local mode and yarn-client mode.

case 2:

val barr1 = sc.broadcast("test")
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val sret = hc.sql("show tables")
val ret = sret.filter(row => !barr1.equals("test"))
ret.collect.foreach(println)

It throws java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext in local mode and yarn-client mode, but it works fine if I write the same code in a Scala file and run it in IntelliJ IDEA:

import org.apache.spark.{SparkConf, SparkContext}

object TestBroadcast2 {
  def main(args: Array[String]) {
    val sparkConf = new
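(The quoted snippet above breaks off at `val sparkConf = new`; the rest of the sender's file was not preserved in the archive. Purely as an illustrative sketch, a standalone app of this shape would typically continue along these lines — everything after the truncation point is an assumption on my part, not the original code:)

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object TestBroadcast2 {
  def main(args: Array[String]) {
    // Typical setup for a compiled Spark 1.x app; app name and master are guesses
    val sparkConf = new SparkConf().setAppName("TestBroadcast2").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)

    val barr1 = sc.broadcast("test")
    val hc = new HiveContext(sc)

    // In a compiled object there is no REPL wrapper ($iwC) holding `hc`,
    // so the filter closure captures only the broadcast reference and serializes fine
    val ret = hc.sql("show tables").filter(row => !barr1.equals("test"))
    ret.collect.foreach(println)

    sc.stop()
  }
}
```

This is consistent with the observation in the thread that the same code fails only in spark-shell: the REPL wraps every line in nested objects, and a closure defined there drags its enclosing wrapper (and thus the HiveContext field) into serialization.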
Re: Got NotSerializableException when access broadcast variable
Hi,

I doubt the broadcast variable is your problem, since you are seeing:

org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3

We have a knowledge base article that explains why this happens - it's a very common error I see users triggering on the mailing list:

https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md

Are you using the HiveContext within a transformation that is called on an RDD? That will definitely create a problem.

-Vida
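(To illustrate the point about using a HiveContext inside a transformation — this is a generic sketch for Spark 1.x, not the original poster's code: anything a closure references must be serialized and shipped to executors, so referencing the context itself fails, while collecting the needed values on the driver first keeps the closure small and serializable.)

```scala
val hc = new org.apache.spark.sql.hive.HiveContext(sc)

// Problematic: the closure references `hc`, so the ClosureCleaner must
// serialize the whole HiveContext and throws NotSerializableException
val bad = sc.parallelize(1 to 10).map(i => hc.sql("show tables").count)

// Better: run the query once on the driver, then ship only plain data
val tableNames = hc.sql("show tables").collect().map(_.getString(0))
val good = sc.parallelize(1 to 10).map(i => tableNames.length)
```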
Re: Got NotSerializableException when access broadcast variable
If you want to filter the table name, you can use hc.sql("show tables").filter(row => !"test".equals(row.getString(0))). Seems making functionRegistry transient can fix the error.
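(Why marking the field transient helps, sketched in isolation — this is a minimal standalone example of the mechanism, not Spark's actual HiveContext code: Java serialization skips `@transient` fields, so the non-serializable registry is simply never written out, and a `lazy val` can rebuild it on first access after deserialization.)

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

class Registry // deliberately NOT Serializable

class Context extends Serializable {
  // Without @transient, serializing Context fails with
  // NotSerializableException because it tries to write the Registry too
  @transient lazy val functionRegistry: Registry = new Registry
}

object TransientDemo {
  def main(args: Array[String]): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(new Context()) // succeeds: the transient field is skipped
    println("serialized OK")
  }
}
```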
RE: Got NotSerializableException when access broadcast variable
PR is https://github.com/apache/spark/pull/2074.

--
From: Yin Huai huaiyin@gmail.com
Sent: 8/20/2014 10:56 PM
To: Vida Ha v...@databricks.com
Cc: tianyi tia...@asiainfo.com; Fengyun RAO raofeng...@gmail.com; user@spark.apache.org
Subject: Re: Got NotSerializableException when access broadcast variable