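Python can pickle only objects, not classes: an instance is stored together with a reference to its class (module name and class name), but the class definition itself is not included. It means that SimpleClass has to be importable on every worker node to enable correct deserialization. In your script SimpleClass is defined in the driver's __main__ module, so the workers look up __main__.SimpleClass, cannot find it, and fail with the AttributeError shown below. Typically it means keeping class definitions in a separate module and distributing that module, for example with --py-files.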
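A minimal sketch of the mechanism in plain Python, without Spark. It builds a throwaway module at runtime (the name demo_classes is hypothetical), pickles an instance, and then deletes the class from that module to mimic a worker that cannot import it. Python 3 raises the same AttributeError as in the traceback below, although it words the message differently.

```python
import pickle
import sys
import types

CLASS_SOURCE = """
class SimpleClass:
    def __init__(self):
        self.val = 5

    def get(self):
        return self.val
"""

# Hypothetical module created at runtime so the example does not depend on
# how this file is executed; it stands in for a real module on disk.
mod = types.ModuleType("demo_classes")
exec(CLASS_SOURCE, mod.__dict__)
sys.modules["demo_classes"] = mod

payload = pickle.dumps(mod.SimpleClass())

# The class is recorded only by reference (module name plus class name).
# The instance state (val = 5) is in the payload; the method code is not.
print(b"demo_classes" in payload, b"SimpleClass" in payload)  # True True
print(b"return" in payload)  # False

# Loading works while demo_classes.SimpleClass can be resolved...
print(pickle.loads(payload).get())  # 5

# ...and fails, as on the Spark worker, once the class is not importable.
del mod.SimpleClass
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc
print(type(load_error).__name__)  # AttributeError
```

On a cluster the remedy is the one described above: move SimpleClass into its own module (say, a hypothetical simple_class.py), import it in the driver script, and ship it with something like `spark-submit --master local[1] --py-files simple_class.py test_spark.py`.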
On 01/19/2016 12:34 AM, efwalkermit wrote:
> Should I be able to broadcast a fairly simple user-defined class? I'm having
> no success in 1.6.0 (or 1.5.2):
>
> $ cat test_spark.py
> import pyspark
>
>
> class SimpleClass:
>     def __init__(self):
>         self.val = 5
>     def get(self):
>         return self.val
>
>
> def main():
>     sc = pyspark.SparkContext()
>     b = sc.broadcast(SimpleClass())
>     results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x + b.value.get()).collect()
>
> if __name__ == '__main__':
>     main()
>
>
> $ spark-submit --master local[1] test_spark.py
> [snip]
>   File "/Users/ed/src/mrspark/examples/fortyler/test_spark.py", line 14, in <lambda>
>     results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x + b.value.get()).collect()
>   File "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py", line 97, in value
>     self._value = self.load(self._path)
>   File "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py", line 88, in load
>     return pickle.load(f)
> AttributeError: 'module' object has no attribute 'SimpleClass'
> [snip]
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-Broadcast-of-User-Defined-Class-No-Work-tp26000.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.