[ https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307463#comment-14307463 ]
Josh Rosen commented on SPARK-4897:
-----------------------------------

Hi [~ianozsvald],

Until now, the main motivation for Python 2.6 support was that it's the default system Python on a few Linux distributions. So far, I think the overhead of supporting 2.6 has been fairly minimal, mostly involving a handful of small changes such as not treating certain objects as context managers (e.g. ZipFile objects).

Let's try porting to 2.7 / 3.4 and then re-assess how hard Python 2.6 support will be. If it's really easy (a couple hours of work, max) then I don't see a reason to drop it, but if we have to go to increasingly convoluted lengths to keep it then it's probably not worth it if we're gaining 3.4 support in return.

I think the main blocker to Python 3.4 support is the fact that nobody has really had time to work on it. I'd be happy to work with anyone who is interested in taking this on.

> Python 3 support
> ----------------
>
>                 Key: SPARK-4897
>                 URL: https://issues.apache.org/jira/browse/SPARK-4897
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Josh Rosen
>            Priority: Minor
>
> It would be nice to have Python 3 support in PySpark, provided that we can do
> it in a way that maintains backwards-compatibility with Python 2.6.
> I started looking into porting this; my WIP work can be found at
> https://github.com/JoshRosen/spark/compare/python3
> I was able to use the
> [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1]
> tool to handle the basic conversion of things like {{print}} statements, etc.
> and had to manually fix up a few imports for packages that moved / were
> renamed, but the major blocker that I hit was {{cloudpickle}}:
> {code}
> [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
> Python 3.4.2 (default, Oct 19 2014, 17:52:17)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/shell.py", line 28, in <module>
>     import pyspark
>   File "/Users/joshrosen/Documents/spark/python/pyspark/__init__.py", line 41, in <module>
>     from pyspark.context import SparkContext
>   File "/Users/joshrosen/Documents/spark/python/pyspark/context.py", line 26, in <module>
>     from pyspark import accumulators
>   File "/Users/joshrosen/Documents/spark/python/pyspark/accumulators.py", line 97, in <module>
>     from pyspark.cloudpickle import CloudPickler
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 120, in <module>
>     class CloudPickler(pickle.Pickler):
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 122, in CloudPickler
>     dispatch = pickle.Pickler.dispatch.copy()
> AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
> {code}
> This code looks like it will be difficult to port to Python 3, so this
> might be a good reason to switch to
> [Dill|https://github.com/uqfoundation/dill] for Python serialization.
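
For anyone picking this up, here is a minimal sketch of one possible way around the {{AttributeError}} in the traceback above. It is not taken from the WIP branch. It assumes CPython 3, where {{pickle.Pickler}} is the C implementation (hence no {{dispatch}} attribute) while the pure-Python class is still reachable under the private, undocumented name {{pickle._Pickler}}. The names {{DispatchPickler}}, {{save_module}} and {{dumps_with_dispatch}} are hypothetical and exist only for this illustration; the module handler is just a small stand-in for the kind of per-type hooks that cloudpickle registers.

```python
import importlib
import io
import pickle
import types


# Assumption: pickle._Pickler is CPython 3's pure-Python pickler. It is a
# private name, so it may change between versions, and it is slower than
# the C-accelerated pickle.Pickler.
class DispatchPickler(pickle._Pickler):
    # Copy the table so the base class's handlers are left untouched.
    dispatch = pickle._Pickler.dispatch.copy()

    def save_module(self, obj):
        # Pickle a module by name and re-import it when unpickling.
        self.save_reduce(importlib.import_module, (obj.__name__,), obj=obj)

    dispatch[types.ModuleType] = save_module


def dumps_with_dispatch(obj, protocol=2):
    """Serialize obj to bytes using the dispatch-based pickler above."""
    buf = io.BytesIO()
    DispatchPickler(buf, protocol).dump(obj)
    return buf.getvalue()
```

With this in place, {{pickle.loads(dumps_with_dispatch(math))}} should give back the {{math}} module, which the stock pickler refuses to serialize. Whether relying on a private class is acceptable for PySpark is an open question; it is only meant to show that the {{dispatch}}-table approach is not necessarily a dead end on Python 3.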