Check for any variables you've declared in your class. Even if you're not
referencing them from the function, the closure captures the enclosing
instance, so they get shipped to the worker nodes along with it. Consequently,
if the class holds something without a default serializer (like an imported
class, or here the SparkConf), that gets dragged along too and the task fails
to serialize.
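For illustration, here is a rough sketch of a setup that produces exactly this
error. The class and field names are made up, and I'm assuming the helper
object actually ends up nested inside a class that also holds the SparkConf:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    class Job {
      val conf = new SparkConf().setAppName("demo")  // field on the class
      val sc = new SparkContext(conf)

      object OBJ {                // nested object, a member of the Job instance
        def f1(): Boolean = true
      }

      def run(rdd: RDD[String]): Long =
        // OBJ.f1() is really this.OBJ.f1(), so serializing the closure pulls
        // in the whole Job instance (conf and sc included) and fails with
        // "Task not serializable"
        rdd.filter(row => OBJ.f1()).count()
    }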

To fix this you can either move that variable out of the class (make it a
top-level object) or implement Kryo serialization for it (see the Spark tuning
guide for this).
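Something like this should work (again just a sketch, using the same made-up
names): define the helper at the top level so the closure no longer references
the enclosing instance:

    // Top-level object: nothing from the driver class gets captured.
    object OBJ {
      def f1(): Boolean = {
        var i = 1
        for (j <- 1 to 10) i = i + 1
        true
      }
    }

    // In the driver:
    // val kept = rdd.filter(row => OBJ.f1())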
On Oct 17, 2014 6:37 AM, "shahab" <shahab.mok...@gmail.com> wrote:

> Hi,
>
> Probably I am missing a very simple principle, but something is wrong with
> my filter:
> I get an "org.apache.spark.SparkException: Task not serializable" exception.
>
> here is my filter function:
> object OBJ {
>   def f1(): Boolean = {
>     var i = 1
>     for (j <- 1 to 10) i = i + 1
>     true
>   }
> }
>
> rdd.filter(row => OBJ.f1())
>
>
> And when I run, I get the following exception:
>
> org.apache.spark.SparkException: Task not serializable
> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
> at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
> .......
> Caused by: java.io.NotSerializableException: org.apache.spark.SparkConf
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> ...........
>
>
>
> best,
> /Shahab
>
>
