[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307463#comment-14307463
 ] 

Josh Rosen commented on SPARK-4897:
-----------------------------------

Hi [~ianozsvald],

Until now, the main motivation for Python 2.6 support was that it's the default 
system Python on a few Linux distributions.  So far, I think the overhead of 
supporting 2.6 has been fairly minimal, mostly involving a handful of small 
changes such as not treating certain objects as context managers (e.g. ZipFile 
objects).
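
To illustrate the kind of change this means: zipfile.ZipFile only gained 
context-manager support in Python 2.7, so 2.6-compatible code has to wrap it in 
contextlib.closing (or use try/finally) instead of a bare "with". A minimal 
sketch follows; the helper names read_zip_member and make_zip are hypothetical 
and not taken from PySpark.

```python
import contextlib
import io
import zipfile


def read_zip_member(zip_bytes, name):
    """Return the bytes stored under `name` in an in-memory zip archive.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    # contextlib.closing() calls .close() on exit, so this also works on
    # Python 2.6, where ZipFile itself cannot be used in a with statement.
    with contextlib.closing(zipfile.ZipFile(io.BytesIO(zip_bytes))) as zf:
        return zf.read(name)


def make_zip(name, data):
    """Build a one-member zip archive in memory (hypothetical helper)."""
    buf = io.BytesIO()
    zf = zipfile.ZipFile(buf, "w")
    # try/finally is the other 2.6-safe way to guarantee close().
    try:
        zf.writestr(name, data)
    finally:
        zf.close()
    return buf.getvalue()
```

Both forms behave the same on 2.7 and 3.x, so keeping them costs little.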

Let's try porting to 2.7 / 3.4 and then re-assess how hard Python 2.6 support 
will be.  If it's really easy (a couple hours of work, max) then I don't see a 
reason to drop it, but if we have to go to increasingly convoluted lengths to 
keep it then it's probably not worth it if we're gaining 3.4 support in return.

I think the main blocker to Python 3.4 support is the fact that nobody has 
really had time to work on it.  I'd be happy to work with anyone who is 
interested in taking this on.



> Python 3 support
> ----------------
>
>                 Key: SPARK-4897
>                 URL: https://issues.apache.org/jira/browse/SPARK-4897
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Josh Rosen
>            Priority: Minor
>
> It would be nice to have Python 3 support in PySpark, provided that we can do 
> it in a way that maintains backwards-compatibility with Python 2.6.
> I started looking into porting this; my WIP work can be found at 
> https://github.com/JoshRosen/spark/compare/python3
> I was able to use the 
> [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
> tool to handle the basic conversion of things like {{print}} statements, etc. 
> and had to manually fix up a few imports for packages that moved / were 
> renamed, but the major blocker that I hit was {{cloudpickle}}:
> {code}
> [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
> Python 3.4.2 (default, Oct 19 2014, 17:52:17)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/shell.py", line 28, 
> in <module>
>     import pyspark
>   File "/Users/joshrosen/Documents/spark/python/pyspark/__init__.py", line 
> 41, in <module>
>     from pyspark.context import SparkContext
>   File "/Users/joshrosen/Documents/spark/python/pyspark/context.py", line 26, 
> in <module>
>     from pyspark import accumulators
>   File "/Users/joshrosen/Documents/spark/python/pyspark/accumulators.py", 
> line 97, in <module>
>     from pyspark.cloudpickle import CloudPickler
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 
> 120, in <module>
>     class CloudPickler(pickle.Pickler):
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 
> 122, in CloudPickler
>     dispatch = pickle.Pickler.dispatch.copy()
> AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
> {code}
> This code looks like it will be difficult to port to Python 3, so this 
> might be a good reason to switch to 
> [Dill|https://github.com/uqfoundation/dill] for Python serialization.
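
The AttributeError in the traceback above arises because, on CPython 3, 
pickle.Pickler is normally the C-accelerated _pickle.Pickler, which has no 
class-level "dispatch" dict. The pure-Python implementation is still reachable 
as pickle._Pickler. The sketch below shows one possible workaround under that 
assumption. pickle._Pickler is a private name that could change, and 
SketchPickler, save_module and dumps are hypothetical names for illustration, 
not the actual CloudPickler code. Whether this is preferable to switching to 
Dill is a separate question.

```python
import importlib
import io
import pickle
import types

# Pick a base class that exposes a `dispatch` dict.
if hasattr(pickle.Pickler, "dispatch"):
    BasePickler = pickle.Pickler    # pure-Python Pickler (e.g. Python 2)
else:
    BasePickler = pickle._Pickler   # private pure-Python class on Python 3


class SketchPickler(BasePickler):
    """Hypothetical stand-in for CloudPickler; illustration only."""

    # Copy the table so the handlers of the base class stay untouched.
    dispatch = BasePickler.dispatch.copy()

    def save_module(self, obj):
        # Pickle a module by name; unpickling re-imports it.
        self.save_reduce(importlib.import_module, (obj.__name__,), obj=obj)

    dispatch[types.ModuleType] = save_module


def dumps(obj):
    """Serialize `obj` with SketchPickler and return the bytes."""
    buf = io.BytesIO()
    SketchPickler(buf, protocol=2).dump(obj)
    return buf.getvalue()
```

The standard pickler refuses to pickle module objects, so a round trip such as 
pickle.loads(dumps(some_module)) only succeeds because the extra dispatch entry 
is consulted. The trade-off is speed: the pure-Python pickler is slower than 
the C one.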



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
