Hi, Tom. What version of PyPy are you using?
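As a quick sanity check, here is a minimal sketch of how to confirm which interpreter the executors actually run. It assumes `pypy` is on your PATH; PYSPARK_PYTHON is the standard environment variable PySpark uses to pick the worker interpreter, and it must be set before the SparkContext starts:

    import os
    # Assumption: `pypy` is resolvable on PATH; adjust to your install location.
    os.environ["PYSPARK_PYTHON"] = "pypy"

    import platform
    from pyspark import SparkContext

    sc = SparkContext(appName="pypy-smoke-test")
    # Run a trivial one-partition job and report the interpreter an executor sees.
    impls = sc.parallelize(range(1), 1) \
              .map(lambda _: platform.python_implementation()) \
              .collect()
    print("driver: %s, executor: %s" % (platform.python_implementation(), impls[0]))
    sc.stop()

Even this trivial map goes through the same cloudpickle serialization path as any other PySpark job, so it should surface the same error if your PyPy build is the problem.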
In the Jenkins environment, `pypy` always passes, just like Python 2.7 and Python 3.4.

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull

========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python2.7): pyspark.mllib.tests
Starting test(pypy): pyspark.sql.tests
Starting test(pypy): pyspark.tests
Starting test(pypy): pyspark.streaming.tests
Finished test(pypy): pyspark.tests (181s)
…
Tests passed in 1130 seconds

Bests,
Dongjoon.

From: Tom Graves <tgraves...@yahoo.com.INVALID>
Date: Monday, August 14, 2017 at 1:55 PM
To: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: spark pypy support?

Does anyone know if PyPy works with Spark? I saw a JIRA saying it was supported back in Spark 1.2, but I'm getting an error when I try it, and I'm not sure whether it's something with my PyPy version or just something Spark doesn't support.

AttributeError: 'builtin-code' object has no attribute 'co_filename'
Traceback (most recent call last):
  File "<builtin>/app_main.py", line 75, in run_toplevel
  File "/homes/tgraves/mbe.py", line 40, in <module>
    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2440, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2373, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2359, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 703, in dumps
    cp.dump(obj)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 160, in dump

Thanks,
Tom
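For anyone trying to reproduce: the failing line in the traceback (mbe.py, line 40) matches Spark's Monte Carlo Pi example. A minimal sketch of a script that exercises the same code path follows; the definitions of n, partitions, and f here are assumptions patterned on that example, not Tom's actual code:

    from operator import add
    from random import random

    from pyspark import SparkContext

    sc = SparkContext(appName="PythonPi")
    partitions = 2  # assumed value; Tom's script may differ
    n = 100000 * partitions

    def f(_):
        # Sample a point in the unit square; score 1 if it lands in the quarter circle.
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    # The line from the traceback. Per the traceback, the failure happens while
    # cloudpickle serializes the function for shipping to executors, where PyPy
    # raises AttributeError: 'builtin-code' object has no attribute 'co_filename'.
    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    sc.stop()

Note that, per the traceback, the error occurs on the driver at job-submission time while cloudpickle is serializing the function, before any executor runs, which is why even a trivial job can trigger it and why the PyPy version in use matters.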