You could try running PySpark's own unit tests; see ./python/run-tests --help for instructions.
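For example, something like the following should exercise much more of PySpark than just launching the shell (the flags vary between releases, so do check --help first; the paths here are placeholders):

$ cd /path/to/spark-1.5.1
$ ./python/run-tests --python-executables=/path/to/pypy/bin/pypy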
On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <sumti...@gmail.com> wrote:

> I've tested the following PyPy versions against spark-1.5.1:
>
> pypy-2.2.1
> pypy-2.3
> pypy-2.3.1
> pypy-2.4.0
> pypy-2.5.0
> pypy-2.5.1
> pypy-2.6.0
> pypy-2.6.1
>
> I ran
>
> $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy /path/to/spark-1.5.1/bin/pyspark
>
> and only pypy-2.2.1 failed.
>
> Any suggestions for running more advanced tests?
>
> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>
>> Thanks for your quick reply.
>>
>> I will test several PyPy versions and report the results later.
>>
>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
>>
>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>>> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
>>> version to see if that works?
>>>
>>> I just checked and it looks like our Jenkins tests are running against
>>> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
>>> minimum supported PyPy version is. Would you be interested in helping to
>>> investigate so that we can update the documentation or produce a fix to
>>> restore compatibility with earlier PyPy builds?
>>>
>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <sumti...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to run pyspark with pypy. It works with spark-1.3.1 but
>>>> fails with spark-1.4.1 and spark-1.5.1.
>>>>
>>>> My pypy version:
>>>>
>>>> $ /usr/bin/pypy --version
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>
>>>> It works with spark-1.3.1:
>>>>
>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a
>>>> loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
>>>> another address
>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>       /_/
>>>>
>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>> And now for something completely different: ``Armin: "Prolog is a
>>>> mess.", CF: "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>>
>>>> Error message for spark-1.5.1:
>>>>
>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> Traceback (most recent call last):
>>>>   File "app_main.py", line 72, in run_toplevel
>>>>   File "app_main.py", line 614, in run_it
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
>>>>     import pyspark
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
>>>>     from pyspark.context import SparkContext
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py", line 26, in <module>
>>>>     from pyspark import accumulators
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 400, in <module>
>>>>     _hijack_namedtuple()
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 378, in _hijack_namedtuple
>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 376, in _copy_func
>>>>     f.__defaults__, f.__closure__)
>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>> And now for something completely different: ``the traces don't lie''
>>>>
>>>> Is this a known issue? Any suggestions for resolving it? Or how can I
>>>> help to fix this problem?
>>>>
>>>> Thanks.
>>>>
>>>
>>
>> --
>> -- 張雅軒
>
>
> --
> -- 張雅軒
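For reference, the failure above is in _copy_func in pyspark/serializers.py, which clones collections.namedtuple by reading the function's attributes (__code__, __globals__, __defaults__, __closure__). PyPy 2.2.1 apparently implements only the legacy Python 2 names (func_code, func_closure, and so on) and not the newer aliases, which is why the __closure__ lookup fails. Below is a minimal sketch of a compatibility shim, assuming the legacy func_* attributes are all present on PyPy 2.2.1; only the missing __closure__ is confirmed by the traceback, and I haven't tested this against that interpreter:

import types

def _copy_func_compat(f):
    # Read a function attribute by its new-style name, falling back to
    # the legacy Python 2 func_* name on interpreters (e.g. PyPy 2.2.1)
    # that do not expose the newer aliases. (The fallback names are an
    # assumption; only the __closure__ failure is confirmed above.)
    def attr(new, old):
        try:
            return getattr(f, new)
        except AttributeError:
            return getattr(f, old)
    return types.FunctionType(attr('__code__', 'func_code'),
                              attr('__globals__', 'func_globals'),
                              f.__name__,
                              attr('__defaults__', 'func_defaults'),
                              attr('__closure__', 'func_closure'))

# The same call that fails in _hijack_namedtuple():
import collections
_old_namedtuple = _copy_func_compat(collections.namedtuple)

If something along these lines restores the spark-1.3.1 behavior on PyPy 2.2.1, the real change would presumably go into _copy_func itself; otherwise we should probably just document PyPy 2.3+ as the minimum.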