Hi, Tom.

Which version of PyPy are you using?

In the Jenkins environment, `pypy` consistently passes the PySpark tests, just like Python 2.7 and Python 3.4 (a minimal sanity-check sketch you could run against your own PyPy follows the log below).

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull

========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python2.7): pyspark.mllib.tests
Starting test(pypy): pyspark.sql.tests
Starting test(pypy): pyspark.tests
Starting test(pypy): pyspark.streaming.tests
Finished test(pypy): pyspark.tests (181s)
…

Tests passed in 1130 seconds
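
If you want a quick sanity check against your own PyPy, here is a minimal script (just a sketch I put together for this thread, not part of the Jenkins suite; the file name and app name are made up, and I'm assuming you point Spark at PyPy via PYSPARK_PYTHON):

# pypy_smoke_test.py -- illustrative sketch only.
# Run it with the PyPy binary you want to test, e.g.:
#   PYSPARK_PYTHON=pypy bin/spark-submit pypy_smoke_test.py
from operator import add

from pyspark import SparkContext

sc = SparkContext(appName="PyPySmokeTest")

# Shipping a lambda to the executors forces cloudpickle to serialize a Python
# function, which is the same code path your traceback goes through.
total = sc.parallelize(range(100), 4).map(lambda x: x * x).reduce(add)
print(total)

sc.stop()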


Best,
Dongjoon.


From: Tom Graves <tgraves...@yahoo.com.INVALID>
Date: Monday, August 14, 2017 at 1:55 PM
To: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: spark pypy support?

Does anyone know if PyPy works with Spark? I saw a JIRA saying it was supported back in Spark 1.2, but I'm getting an error when I try it, and I'm not sure whether it's a problem with my PyPy version or just something Spark doesn't support.


AttributeError: 'builtin-code' object has no attribute 'co_filename'
Traceback (most recent call last):
  File "<builtin>/app_main.py", line 75, in run_toplevel
  File "/homes/tgraves/mbe.py", line 40, in <module>
    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 834, 
in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 808, 
in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 
2440, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 
2373, in _wrap_function
    pickled_command, broadcast_vars, env, includes = 
_prepare_for_python_RDD(sc, command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 
2359, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", 
line 460, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", 
line 703, in dumps
    cp.dump(obj)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", 
line 160, in dump
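
For reference, the attribute it is tripping over is the function code object's co_filename, which cloudpickle reads while pickling the function. A tiny standalone illustration of that kind of introspection (not code from cloudpickle, just a sketch):

# Illustrative sketch only, not taken from cloudpickle: cloudpickle records
# where a function was defined by reading attributes from its __code__ object.
# The error above suggests that on this PyPy build some functions carry a
# 'builtin-code' object that has no co_filename.
def where_defined(func):
    return func.__code__.co_filename

def f(x):
    return x * x

print(where_defined(f))  # fine for an ordinary Python-level function

try:
    print(where_defined(len))  # builtins may lack __code__ or co_filename
except AttributeError as e:
    print("builtin introspection failed:", e)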

Thanks,
Tom
