Re: spark pypy support?
Just curious, is this using the portable version of pypy or standard version (ubuntu?)?

Tom
Re: spark pypy support?
Ah interesting, looking at our latest docs we imply that it should work with PyPy 2.3+ -- we might want to update that to 2.5+ since we aren't testing with 2.3 anymore?
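For anyone following along, the interpreter PySpark runs under is chosen via the `PYSPARK_PYTHON` (workers) and `PYSPARK_DRIVER_PYTHON` (driver) environment variables. A minimal sketch of pointing a job at PyPy -- the install path here is hypothetical, substitute your own:

```python
import os

# PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON are Spark's documented knobs for
# selecting the worker and driver interpreters. Set them before creating
# the SparkContext. "/opt/pypy/bin/pypy" is a placeholder location.
os.environ["PYSPARK_PYTHON"] = "/opt/pypy/bin/pypy"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/opt/pypy/bin/pypy"

print(os.environ["PYSPARK_PYTHON"])
```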
Re: spark pypy support?
I tried 5.7 and 2.5.1, so it's probably something in my setup. I'll investigate that more; I wanted to make sure it was still supported because I didn't see anything about it since the original jira that added it.

Thanks,
Tom
Re: spark pypy support?
actually, we *have* locked on a particular pypy version for the jenkins workers: 2.5.1

this applies to both the 2.7 and 3.5 conda environments.

(py3k)-bash-4.1$ pypy --version
Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015, 02:17:39)
[PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
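Since the mismatch here is between the PyPy build Jenkins tests against (2.5.1) and whatever a user happens to have installed, a quick local sanity check can be written against `sys.pypy_version_info`, which PyPy exposes and CPython does not. This is just a sketch of that check, not anything Spark itself performs; the `(2, 5)` floor is an assumption taken from this thread:

```python
import sys

def pypy_ok(min_version=(2, 5)):
    """Return True when not running PyPy, or when PyPy is at least min_version."""
    info = getattr(sys, "pypy_version_info", None)
    if info is None:
        return True  # plain CPython: the PyPy minimum does not apply
    # pypy_version_info is a named tuple like sys.version_info
    return tuple(info[:2]) >= min_version

print(pypy_ok())
```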
Re: spark pypy support?
As Dong says, yes, we do test with PyPy in our CI env, but we expect a "newer" version of PyPy (although I don't think we ever bothered to write down what the exact version requirements are for the PyPy support, unlike regular Python).

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Re: spark pypy support?
Hi, Tom.

What version of PyPy do you use?

In the Jenkins environment, `pypy` always passes like Python 2.7 and Python 3.4:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull

Running PySpark tests
Running PySpark tests. Output is in /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python2.7): pyspark.mllib.tests
Starting test(pypy): pyspark.sql.tests
Starting test(pypy): pyspark.tests
Starting test(pypy): pyspark.streaming.tests
Finished test(pypy): pyspark.tests (181s)
…
Tests passed in 1130 seconds

Bests,
Dongjoon.

From: Tom Graves
Date: Monday, August 14, 2017 at 1:55 PM
To: "dev@spark.apache.org"
Subject: spark pypy support?

Anyone know if PyPy works with Spark? Saw a jira that it was supported back in Spark 1.2, but I'm getting an error when trying and I'm not sure if it's something with my PyPy version or just something Spark doesn't support.

AttributeError: 'builtin-code' object has no attribute 'co_filename'
Traceback (most recent call last):
  File "/app_main.py", line 75, in run_toplevel
  File "/homes/tgraves/mbe.py", line 40, in <module>
    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2440, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2373, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2359, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 703, in dumps
    cp.dump(obj)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 160, in dump

Thanks,
Tom
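The traceback above dies inside cloudpickle's dump, and the AttributeError suggests the likely cause: on that PyPy build, some function code objects are 'builtin-code' objects that lack the co_filename attribute cloudpickle reads when serializing a function on CPython. A minimal illustration of the attribute being probed -- not Spark's actual code path:

```python
def f(x):
    return x * x

code = f.__code__

# On CPython (and newer PyPy) a plain Python function's code object
# carries co_filename, the source file the function was defined in.
print(code.co_filename)

# The failure in the traceback corresponds to this probe coming back
# False for a 'builtin-code' object on the older PyPy build.
print(hasattr(code, "co_filename"))
```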