Re: spark pypy support?

2017-08-15 Thread Tom Graves
Just curious, is this using the portable version of PyPy or the standard
(Ubuntu?) version?
Tom

On Monday, August 14, 2017, 5:27:11 PM CDT, Holden Karau  
wrote:

Ah interesting, looking at our latest docs we imply that it should work with 
PyPy 2.3+ -- we might want to update that to 2.5+ since we aren't testing with 
2.3 anymore?

Re: spark pypy support?

2017-08-14 Thread Holden Karau
Ah interesting, looking at our latest docs we imply that it should work
with PyPy 2.3+ -- we might want to update that to 2.5+ since we aren't
testing with 2.3 anymore?

On Mon, Aug 14, 2017 at 3:09 PM, Tom Graves 
wrote:

> I tried 5.7 and 2.5.1 so its probably something in my setup.  I'll
> investigate that more, wanted to make sure it was still supported because I
> didn't see anything about it since the original jira that added it.
>
> Thanks,
> Tom


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: spark pypy support?

2017-08-14 Thread Tom Graves
I tried 5.7 and 2.5.1, so it's probably something in my setup. I'll investigate
that more; I wanted to make sure it was still supported because I didn't see
anything about it since the original JIRA that added it.
Thanks,
Tom

On Monday, August 14, 2017, 4:29:01 PM CDT, shane knapp  
wrote:

actually, we *have* locked on a particular pypy versions for the
jenkins workers:  2.5.1

this applies to both the 2.7 and 3.5 conda environments.

(py3k)-bash-4.1$ pypy --version
Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015, 02:17:39)
[PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]



Re: spark pypy support?

2017-08-14 Thread shane knapp
actually, we *have* locked on a particular pypy version for the
jenkins workers: 2.5.1

this applies to both the 2.7 and 3.5 conda environments.

(py3k)-bash-4.1$ pypy --version
Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015, 02:17:39)
[PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
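
For reference, a quick way to confirm from inside the interpreter which PyPy a
given worker actually picks up (just a sketch; sys.pypy_version_info is
PyPy-specific):

import platform, sys
print(platform.python_implementation())      # prints 'PyPy' under PyPy
if hasattr(sys, "pypy_version_info"):
    # e.g. (2, 5, 1, 'final', 0) on the Jenkins workers above
    print(".".join(str(v) for v in sys.pypy_version_info[:3]))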

On Mon, Aug 14, 2017 at 2:24 PM, Holden Karau  wrote:
> As Dong says yes we do test with PyPy in our CI env; but we expect a "newer"
> version of PyPy (although I don't think we ever bothered to write down what
> the exact version requirements are for the PyPy support unlike regular
> Python).
>



Re: spark pypy support?

2017-08-14 Thread Holden Karau
As Dong says, yes, we do test with PyPy in our CI env, but we expect a
"newer" version of PyPy (although I don't think we ever bothered to write
down what the exact version requirements are for PyPy support, unlike
regular Python).

On Mon, Aug 14, 2017 at 2:06 PM, Dong Joon Hyun 
wrote:

> Hi, Tom.
>
>
>
> What version of PyPy do you use?



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: spark pypy support?

2017-08-14 Thread Dong Joon Hyun
Hi, Tom.

What version of PyPy do you use?

In the Jenkins environment, `pypy` always passes, just like Python 2.7 and Python 3.4.

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull


Running PySpark tests

Running PySpark tests. Output is in 
/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.4', 
'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 
'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python2.7): pyspark.mllib.tests
Starting test(pypy): pyspark.sql.tests
Starting test(pypy): pyspark.tests
Starting test(pypy): pyspark.streaming.tests
Finished test(pypy): pyspark.tests (181s)
…

Tests passed in 1130 seconds


Bests,
Dongjoon.


From: Tom Graves 
Date: Monday, August 14, 2017 at 1:55 PM
To: "dev@spark.apache.org" 
Subject: spark pypy support?

Anyone know if PyPy works with Spark? I saw a JIRA that it was supported back in
Spark 1.2, but I'm getting an error when trying it and I'm not sure if it's
something with my PyPy version or just something Spark doesn't support.


AttributeError: 'builtin-code' object has no attribute 'co_filename'
Traceback (most recent call last):
  File "/app_main.py", line 75, in run_toplevel
  File "/homes/tgraves/mbe.py", line 40, in 
count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 834, 
in reduce
vals = self.mapPartitions(func).collect()
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 808, 
in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 
2440, in _jrdd
self._jrdd_deserializer, profiler)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 
2373, in _wrap_function
pickled_command, broadcast_vars, env, includes = 
_prepare_for_python_RDD(sc, command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 
2359, in _prepare_for_python_RDD
pickled_command = ser.dumps(command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", 
line 460, in dumps
return cloudpickle.dumps(obj, 2)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", 
line 703, in dumps
cp.dump(obj)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", 
line 160, in dump

Thanks,
Tom
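
For context, the parallelize/map/reduce line in the traceback looks like the
standard PySpark pi example. A minimal sketch along those lines (the app name,
partition count, and f below are assumptions for illustration, not the actual
mbe.py) that exercises the same cloudpickle path under PyPy:

from operator import add
from random import random
from pyspark import SparkContext

sc = SparkContext(appName="pypy-pi-check")  # hypothetical app name
partitions = 2
n = 100000 * partitions

def f(_):
    # sample a point in the unit square and test whether it falls in the circle
    x, y = random() * 2 - 1, random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0

# This reduce() is the call that fails in the traceback above, when
# cloudpickle tries to serialize f for shipping to the executors.
count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
sc.stop()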