Hi everyone,

We just merged Python 3 support for PySpark into Spark's master branch (which will become Spark 1.4.0). This means that PySpark now supports Python 2.6+, PyPy 2.5+, and Python 3.4+.
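As a rough illustration of what running the same PySpark driver code on both interpreter lines involves, here are a few of the classic Python 2 vs. 3 semantic differences any 2/3-compatible codebase has to handle (generic Python examples, not taken from the patch itself):

```python
import sys

# In Python 3, / is true division; Python 2 code that relied on
# floor division of ints must use // explicitly.
assert 7 / 2 == 3.5
assert 7 // 2 == 3

# Text and binary data are distinct types in Python 3: data read in
# binary mode arrives as bytes and must be decoded explicitly.
raw = b"spark"
assert raw.decode("utf-8") == "spark"

# dict.items() returns a view, not a list, in Python 3; code that
# needs a real list (e.g. for indexing) must wrap it.
counts = {"a": 1, "b": 2}
items = list(counts.items())
assert sorted(items) == [("a", 1), ("b", 2)]

print("Python %d.%d: all checks passed" % sys.version_info[:2])
```

These checks pass unchanged on Python 3.4+, which is the kind of cross-version behavior the test suite has to exercise.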
To run with Python 3, download and build Spark from the master branch, then set the PYSPARK_PYTHON environment variable to point to a Python 3.4 executable. For example:

    PYSPARK_PYTHON=python3.4 ./bin/pyspark

For more details on this feature, see the pull request and JIRA:

- https://github.com/apache/spark/pull/5173
- https://issues.apache.org/jira/browse/SPARK-4897

For Spark contributors, this change means that any open PySpark pull requests are now likely to have merge conflicts. Even if a pull request has no merge conflicts, we should still re-test it with Jenkins to check that it still works under Python 3. When backporting Python patches, committers may wish to run the PySpark unit tests locally to make sure that the changes still work correctly in older branches. I can also help with backports / fixing conflicts.

Thanks to Davies Liu, Shane Knapp, Thom Neale, Xiangrui Meng, and everyone else who helped with this patch.

- Josh