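Python pickles instances, not class definitions: the pickle stores only
a reference to the class (its module and name) together with the
instance state. This means that SimpleClass has to be importable on
every worker node for deserialization to work. Typically that means
keeping class definitions in a separate module and distributing it to
the workers, for example with --py-files.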
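
To illustrate the point without a cluster, here is a minimal sketch in
plain Python (no Spark involved). The module name simple_class is
hypothetical and is built in memory only to stand in for a file that
would be shipped to the workers. Removing it from sys.modules is an
assumption used to imitate a worker that cannot import it.

```python
import pickle
import sys
import types

# Build a throwaway module in memory; "simple_class" is a hypothetical
# name standing in for a real simple_class.py file.
mod = types.ModuleType("simple_class")
exec(
    "class SimpleClass:\n"
    "    def __init__(self):\n"
    "        self.val = 5\n"
    "    def get(self):\n"
    "        return self.val\n",
    mod.__dict__,
)
sys.modules["simple_class"] = mod

# Pickle an instance. The payload holds the module name, the class name
# and the instance state, but not the class body.
payload = pickle.dumps(mod.SimpleClass())
has_module_name = b"simple_class" in payload
has_class_name = b"SimpleClass" in payload

# Imitate a worker where the module cannot be imported.
del sys.modules["simple_class"]
try:
    pickle.loads(payload)
    failed = False
except (ImportError, AttributeError):
    failed = True

# Imitate a worker where the module is importable again.
sys.modules["simple_class"] = mod
restored = pickle.loads(payload)
value = restored.get()
```

In the quoted script the class lives in the driver's __main__ module, so
the pickle refers to __main__.SimpleClass. On a worker, __main__ is
presumably a different module, which would explain the AttributeError in
the traceback. Assuming the class is moved to a file such as
simple_class.py (hypothetical name), the script would import it from
there and the file would be passed to spark-submit via --py-files.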


On 01/19/2016 12:34 AM, efwalkermit wrote:
> Should I be able to broadcast a fairly simple user-defined class?  I'm having
> no success in 1.6.0 (or 1.5.2):
>
> $ cat test_spark.py
> import pyspark
>
>
> class SimpleClass:
>     def __init__(self):
>         self.val = 5
>     def get(self):
>         return self.val
>
>
> def main():
>     sc = pyspark.SparkContext()
>     b = sc.broadcast(SimpleClass())
>     results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x +
> b.value.get()).collect() 
>
> if __name__ == '__main__':
>     main()
>
>
> $ spark-submit --master local[1] test_spark.py
> [snip]
>   File "/Users/ed/src/mrspark/examples/fortyler/test_spark.py", line 14, in
> <lambda>
>     results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x +
> b.value.get()).collect()
>   File
> "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py",
> line 97, in value
>     self._value = self.load(self._path)
>   File
> "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py",
> line 88, in load
>     return pickle.load(f)
> AttributeError: 'module' object has no attribute 'SimpleClass'
> [snip]

