Check for any variables you've declared in your class. Even if you're not referencing them inside the function, they are captured and shipped to the worker nodes as part of the closure. Consequently, if the closure drags in anything without a serializer (such as an instance of an imported class, or the SparkConf in your stack trace), serialization will fail.
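To illustrate the capture problem and the usual workaround, here is a minimal sketch (class and variable names are hypothetical; assumes an existing SparkContext):

```scala
import org.apache.spark.SparkContext

// Hypothetical example: Job, sc, and multiplier are illustrative names only.
class Job(sc: SparkContext) {

  val multiplier = 3  // a plain field of the (non-serializable) enclosing class

  def bad() = {
    val rdd = sc.parallelize(1 to 10)
    // Referencing `multiplier` captures `this` -- the whole Job instance,
    // including the SparkContext -- so Spark tries to serialize all of it
    // and throws "Task not serializable".
    rdd.map(x => x * multiplier)
  }

  def good() = {
    val rdd = sc.parallelize(1 to 10)
    // Copy the field into a local val first; only the Int is captured.
    val m = multiplier
    rdd.map(x => x * m)
  }
}
```

Copying the field into a local val inside the method is often the least invasive fix, since only the local value ends up in the closure.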
To fix this you can either move that variable out of the class (make it global) or implement Kryo serialization (see the Spark tuning guide for this).

On Oct 17, 2014 6:37 AM, "shahab" <shahab.mok...@gmail.com> wrote:

> Hi,
>
> Probably I am missing a very simple principle, but something is wrong with
> my filter; I get an "org.apache.spark.SparkException: Task not serializable"
> exception.
>
> Here is my filter function:
>
> object OBJ {
>   def f1(): Boolean = {
>     var i = 1
>     for (j <- 1 to 10) i = i + 1
>     true
>   }
> }
>
> rdd.filter(row => OBJ.f1())
>
> And when I run it, I get the following exception:
>
> org.apache.spark.SparkException: Task not serializable
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>     at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>     at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
>     .......
> Caused by: java.io.NotSerializableException: org.apache.spark.SparkConf
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     ...........
>
> best,
> /Shahab
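P.S. For the Kryo route mentioned above, the setup described in the tuning guide is a one-time SparkConf configuration. A sketch, assuming a recent Spark version (`MyClass` is a placeholder for whatever non-serializable type you need shipped):

```scala
import org.apache.spark.SparkConf

// Sketch only: MyClass stands in for your own type.
val conf = new SparkConf()
  .setAppName("kryo-example")
  // Switch the data serializer from Java serialization to Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Register the classes Kryo should know about up front.
  .registerKryoClasses(Array(classOf[MyClass]))
```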