I'm trying to define a class that has some of Spark's objects as attributes, and I'm running into a problem that I think would be solved if I could find Python's equivalent of Scala's `extends Serializable`.
Here's a simple class that has a Spark RDD as one of its attributes:

    class Foo:
        def __init__(self):
            self.rdd = sc.parallelize([1, 2, 3, 4, 5])

        def combine(self, first, second):
            return first + second

        def f1(self):
            return self.rdd.reduce(lambda a, b: self.combine(a, b))

When I try

    b = Foo()
    b.f1()

I get the error:

    PicklingError: Can't pickle builtin <type 'method_descriptor'>

My guess is that this has to do with serialization of the class I created, and an error somewhere in that process. So how can I use Spark's RDD methods (such as reduce()) in conjunction with the methods of the class I've created (combine() here)?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-equivalent-to-Extends-Serializable-tp23933.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.