You could try running PySpark's own unit tests; see ./python/run-tests --help for instructions.
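For example, something like the following should exercise much more of PySpark than just launching the shell (the flags vary between releases, so do check --help first; the paths here are placeholders):

$ cd /path/to/spark-1.5.1
$ ./python/run-tests --python-executables=/path/to/pypy/bin/pypy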
On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <sumti...@gmail.com> wrote:

> I've tested the following PyPy versions against spark-1.5.1:
>
> pypy-2.2.1
> pypy-2.3
> pypy-2.3.1
> pypy-2.4.0
> pypy-2.5.0
> pypy-2.5.1
> pypy-2.6.0
> pypy-2.6.1
>
> I ran
>
> $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy /path/to/spark-1.5.1/bin/pyspark
>
> and only pypy-2.2.1 failed.
>
> Any suggestions for running more advanced tests?
>
> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>
>> Thanks for your quick reply.
>>
>> I will test several PyPy versions and report the results later.
>>
>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
>>
>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>>> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
>>> version to see if that works?
>>>
>>> I just checked and it looks like our Jenkins tests are running against
>>> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
>>> minimum supported PyPy version is. Would you be interested in helping to
>>> investigate so that we can update the documentation or produce a fix to
>>> restore compatibility with earlier PyPy builds?
>>>
>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <sumti...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to run pyspark with pypy. It works with spark-1.3.1 but
>>>> fails with spark-1.4.1 and spark-1.5.1.
>>>>
>>>> My pypy version:
>>>>
>>>> $ /usr/bin/pypy --version
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>
>>>> It works with spark-1.3.1:
>>>>
>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a
>>>> loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
>>>> another address
>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>       /_/
>>>>
>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>> And now for something completely different: ``Armin: "Prolog is a
>>>> mess.", CF: "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>>
>>>> Error message for spark-1.5.1:
>>>>
>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> Traceback (most recent call last):
>>>>   File "app_main.py", line 72, in run_toplevel
>>>>   File "app_main.py", line 614, in run_it
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
>>>>     import pyspark
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
>>>>     from pyspark.context import SparkContext
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py", line 26, in <module>
>>>>     from pyspark import accumulators
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 400, in <module>
>>>>     _hijack_namedtuple()
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 378, in _hijack_namedtuple
>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 376, in _copy_func
>>>>     f.__defaults__, f.__closure__)
>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>> And now for something completely different: ``the traces don't lie''
>>>>
>>>> Is this a known issue? Any suggestions for resolving it? Or how can I
>>>> help to fix this problem?
>>>>
>>>> Thanks.
>>>>
>>>
>>
>> --
>> -- 張雅軒
>
>
> --
> -- 張雅軒
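For reference, the failure above is in _copy_func in pyspark/serializers.py, which clones collections.namedtuple by reading the function's attributes (__code__, __globals__, __defaults__, __closure__). PyPy 2.2.1 apparently implements only the legacy Python 2 names (func_code, func_closure, and so on) and not the newer aliases, which is why the __closure__ lookup fails. Below is a minimal sketch of a compatibility shim, assuming the legacy func_* attributes are all present on PyPy 2.2.1; only the missing __closure__ is confirmed by the traceback, and I haven't tested this against that interpreter:

import types

def _copy_func_compat(f):
    # Read a function attribute by its new-style name, falling back to
    # the legacy Python 2 func_* name on interpreters (e.g. PyPy 2.2.1)
    # that do not expose the newer aliases. (The fallback names are an
    # assumption; only the __closure__ failure is confirmed above.)
    def attr(new, old):
        try:
            return getattr(f, new)
        except AttributeError:
            return getattr(f, old)
    return types.FunctionType(attr('__code__', 'func_code'),
                              attr('__globals__', 'func_globals'),
                              f.__name__,
                              attr('__defaults__', 'func_defaults'),
                              attr('__closure__', 'func_closure'))

# The same call that fails in _hijack_namedtuple():
import collections
_old_namedtuple = _copy_func_compat(collections.namedtuple)

If something along these lines restores the spark-1.3.1 behavior on PyPy 2.2.1, the real change would presumably go into _copy_func itself; otherwise we should probably just document PyPy 2.3+ as the minimum.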