pyspark with pypy does not work with spark-1.5.1

Chang Ya-Hsuan Wed, 04 Nov 2015 23:58:14 -0800

Hi all,

I am trying to run pyspark with pypy. It works with spark-1.3.1 but fails
with spark-1.4.1 and spark-1.5.1.


My pypy version:

$ /usr/bin/pypy --version
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4]

It works with spark-1.3.1:

$ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
SparkContext available as sc, HiveContext available as sqlContext.
And now for something completely different: ``Armin: "Prolog is a mess.", CF: "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>

Error message with spark-1.5.1:

$ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "app_main.py", line 614, in run_it
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py", line 26, in <module>
    from pyspark import accumulators
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 400, in <module>
    _hijack_namedtuple()
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 378, in _hijack_namedtuple
    _old_namedtuple = _copy_func(collections.namedtuple)
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 376, in _copy_func
    f.__defaults__, f.__closure__)
AttributeError: 'function' object has no attribute '__closure__'
And now for something completely different: ``the traces don't lie''
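
The traceback ends in _copy_func in serializers.py, which reads
f.__closure__; function objects in this PyPy build apparently do not expose
that attribute. Below is a minimal sketch of a more tolerant copy helper. It
assumes the older func_* attribute names are available as a fallback, which
I have not verified on PyPy 2.2.1. The names _func_attr and copy_func are
hypothetical and are not Spark's code.

```python
import types


def _func_attr(f, dunder_name, legacy_name):
    # Hypothetical helper: prefer the dunder attribute (e.g. __closure__),
    # and fall back to the legacy Python 2 alias (e.g. func_closure).
    try:
        return getattr(f, dunder_name)
    except AttributeError:
        return getattr(f, legacy_name)


def copy_func(f):
    # Build a new function object that shares f's code, globals,
    # default arguments and closure cells.
    return types.FunctionType(
        _func_attr(f, "__code__", "func_code"),
        _func_attr(f, "__globals__", "func_globals"),
        f.__name__,
        _func_attr(f, "__defaults__", "func_defaults"),
        _func_attr(f, "__closure__", "func_closure"),
    )
```

On an interpreter that already has the dunder attributes, the fallback is
never used and the helper acts as a plain function copy.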

Is this a known issue? Any suggestions for resolving it? Or how can I help
fix this problem?

Thanks.
