Currently, PySpark cannot pickle a class defined in the current
script ('__main__'). The workaround is to put the implementation
of the class into a separate module, then deploy it with
"bin/spark-submit --py-files xxx.py".

in xxx.py:

class test(object):
  def __init__(self, a, b):
    self.total = a + b

in job.py:

from xxx import test
a = sc.parallelize([(True, False), (False, False)])
objs = a.map(lambda t: test(*t))
objs.collect()  # action to force evaluation; works because test is importable on the workers
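For the reduce step (#3 in the quoted message below), here is a
minimal sketch. It assumes you give test a combine() method; the
method name and merge logic are mine, not from the original post:

in xxx.py:

class test(object):
  def __init__(self, a, b):
    self.total = a + b

  def combine(self, other):
    # merge two instances by summing their totals
    return test(self.total, other.total)

in job.py:

combined = objs.reduce(lambda x, y: x.combine(y))
print(combined.total)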

run it by:

bin/spark-submit --py-files xxx.py job.py
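
Alternatively (an option not mentioned above, but it should work the
same way): if you are in an interactive shell rather than
spark-submit, you can distribute the module at runtime with
sc.addPyFile:

sc.addPyFile("xxx.py")  # ships xxx.py to the executors
from xxx import test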


On Wed, Feb 18, 2015 at 1:48 PM, Guillaume Guy
<guillaume.c....@gmail.com> wrote:
> Hi,
>
> This is a duplicate of the Stack Overflow question here. I hope to generate
> more interest on this mailing list.
>
>
> The problem:
>
> I am running into attribute lookup problems when trying to instantiate a
> class within my RDD.
>
> My workflow is quite standard:
>
> 1- Start with an RDD
>
> 2- Take each element of the RDD and instantiate an object for each
>
> 3- Reduce (I will write a method that will define the reduce operation later
> on)
>
> Here is #2:
>
> class test(object):
>     def __init__(self, a, b):
>         self.total = a + b
>
> a = sc.parallelize([(True,False),(False,False)])
> a.map(lambda (x,y): test(x,y))
>
> Here is the error I get:
>
> PicklingError: Can't pickle <class '__main__.test'>: attribute lookup
> __main__.test failed
>
> I'd like to know if there is any way around it. Please answer with a
> working example that achieves the intended result (i.e. creating an RDD of
> objects of class "test").
>
> Thanks in advance!
>
> Related question:
>
> https://groups.google.com/forum/#!topic/edx-code/9xzRJFyQwn
>
>
> GG
>
