Thomas Graves created SPARK-21753:
-------------------------------------

             Summary: running pi example with pypy on spark fails to serialize 
                 Key: SPARK-21753
                 URL: https://issues.apache.org/jira/browse/SPARK-21753
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.1.1
            Reporter: Thomas Graves


I'm trying to run the pi example 
(https://github.com/apache/spark/blob/master/examples/src/main/python/pi.py) 
on PySpark using PyPy 2.5.1, but everything I've tried results in a 
serialization error:

Traceback (most recent call last):
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 147, in dump
    return Pickler.dump(self, obj)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 254, in save_function
    self.save_function_tuple(obj)
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 291, in save_function_tuple
    save((code, closure, base_globals))
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 633, in _batch_appends
    save(x)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 254, in save_function
    self.save_function_tuple(obj)
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 291, in save_function_tuple
    save((code, closure, base_globals))
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 636, in _batch_appends
    save(tmp[0])
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 248, in save_function
    self.save_function_tuple(obj)
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 296, in save_function_tuple
    save(f_globals)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 653, in save_dict
    self._batch_setitems(obj.iteritems())
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 690, in _batch_setitems
    save(v)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 447, in save_instancemethod
    obj=obj)
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 581, in save_reduce
    save(args)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "//home/tgraves/pypy-my-own-package-name/lib-python/2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 246, in save_function
    if islambda(obj) or obj.__code__.co_filename == '<stdin>' or themodule is None:
AttributeError: 'builtin-code' object has no attribute 'co_filename'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 834, in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 808, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 2440, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 2373, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/home/tgraves/y-spark-git/python/pyspark/rdd.py", line 2359, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/home/tgraves/y-spark-git/python/pyspark/serializers.py", line 460, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 703, in dumps
    cp.dump(obj)
  File "/home/tgraves/y-spark-git/python/pyspark/cloudpickle.py", line 160, in dump
    raise pickle.PicklingError(msg)

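It looks like the issue is with serializing random(): the first traceback 
ends in cloudpickle's save_function reading obj.__code__.co_filename on a 
'builtin-code' object, which has no such attribute. If you remove random() 
from the function, everything works fine.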
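
For context, the function being serialized can be sketched without Spark 
roughly as below. This is only a local stand-in for pi.py: the name 
estimate_pi and the use of functools.reduce in place of the RDD map/reduce 
calls are assumptions made for illustration. The point is that f refers to 
random through its module globals, so a pickler that serializes f by value 
also has to serialize random.

```python
from functools import reduce
from operator import add
from random import random


def f(_):
    # Pick a point in the square [-1, 1] x [-1, 1]; both coordinates come
    # from the module-level name `random`, which lives in f's globals.
    x = random() * 2 - 1
    y = random() * 2 - 1
    # Count 1 when the point falls inside the unit circle.
    return 1 if x ** 2 + y ** 2 <= 1 else 0


def estimate_pi(n):
    # Hypothetical local replacement for parallelize(...).map(f).reduce(add).
    count = reduce(add, map(f, range(n)))
    return 4.0 * count / n


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100000))
```

Run locally this involves no pickling at all, so it should behave the same 
under CPython and PyPy; it only shows which names the function depends on.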

I'm just running 
PYSPARK_PYTHON=//home/tgraves/pypy-my-own-package-name/bin/pypy ./bin/pyspark

I've tried multiple versions of PyPy, from 2.5.1 to 5.8.0. I tried the 
portable version and also built PyPy from source.
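
A small check along these lines could be run under each interpreter to 
compare how random() looks to a pickler. It is a sketch only: 
describe_callable is a hypothetical helper, not part of PySpark or 
cloudpickle, and the result is expected to differ between CPython and PyPy, 
so no output is shown.

```python
from random import random


def describe_callable(obj):
    # Bound methods expose the underlying function as __func__; plain
    # functions and builtins are inspected directly.
    func = getattr(obj, "__func__", obj)
    code = getattr(func, "__code__", None)
    return {
        "type": type(obj).__name__,
        "has_code": code is not None,
        # The first traceback fails because a code object exists but
        # lacks co_filename, so report that attribute separately.
        "has_co_filename": code is not None and hasattr(code, "co_filename"),
    }


if __name__ == "__main__":
    print(describe_callable(random))
```

If has_code is True while has_co_filename is False, that would match the 
AttributeError in the first traceback.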

If it works for others, perhaps I have a setup issue. Any hints on that would 
be appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

