[jira] [Commented] (SPARK-4897) Python 3 support

2015-04-14 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495439#comment-14495439
 ] 

Nicholas Chammas commented on SPARK-4897:
-

[~watsonix] You can follow the [active 
PR|https://github.com/apache/spark/pull/5173] linked above in this JIRA issue.

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Assignee: Davies Liu
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-04-14 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495443#comment-14495443
 ] 

Davies Liu commented on SPARK-4897:
---

That PR is pretty close to merge, we are targeting this for 1.4 release. It 
will be helpful if you guy can test this in your environments. Currently, it's 
only covered by unit tests.

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Assignee: Davies Liu
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-04-14 Thread watson xi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495152#comment-14495152
 ] 

watson xi commented on SPARK-4897:
--

Hi guys, whats the status of this project? I know a few people (including 
myself) who are ready to wave goodbye to Python 2 (its been 6.5 years now!)... 
from an outside perspective looking it, Python 3 compatibility appears close!

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Assignee: Davies Liu
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-03-26 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382722#comment-14382722
 ] 

Josh Rosen commented on SPARK-4897:
---

There's now an open pull request for this, which is now passing tests, and I'm 
beginning to review it now: https://github.com/apache/spark/pull/5173.  If 
anyone is interested in helping, it would be great to get more eyes on this PR.

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Assignee: Davies Liu
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-03-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378626#comment-14378626
 ] 

Apache Spark commented on SPARK-4897:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/5173

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Assignee: Davies Liu
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-02-12 Thread Ryan Ovas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318935#comment-14318935
 ] 

Ryan Ovas commented on SPARK-4897:
--

I'm interested in using Spark in my startup, but everything we do is in Python 
3.4 which makes adopting Spark difficult for me as well. I was surprised and 
disappointed (since I will have trouble using it myself) to see that there is 
no Python 3.x support when (as [~ianozsvald] suggested) the community as a 
whole is moving towards Python 3.4.

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-02-06 Thread Ian Ozsvald (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308901#comment-14308901
 ] 

Ian Ozsvald commented on SPARK-4897:


Hi Josh. After my post I went to do some digging, I believe (now) that Py2.6 
support isn't hard relative to Py2.7 and that only Py2.5 makes it a real pain. 
Apologies for confusing the issue.

I'll stand by my position that Py3.4 support is the way to go as the whole 
community is marching in that direction. I do think that the intersection of 
Python  Spark users will increasingly be around Py3.4+ over the coming year.

Cheers, i.

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-02-05 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307463#comment-14307463
 ] 

Josh Rosen commented on SPARK-4897:
---

Hi [~ianozsvald],

Until now, the main motivation for Python 2.6 support was that it's the default 
system Python on a few Linux distributions.  So far, I think the overhead of 
supporting 2.6 has been fairly minimal, mostly involving a handful of small 
changes such as not treating certain object as context managers (e.g. Zipfile 
objects).

Let's try porting to 2.7 / 3.4 and then re-assess how hard Python 2.6 support 
will be.  If it's really easy (a couple hours of work, max) then I don't see a 
reason to drop it, but if we have to go to increasingly convoluted lengths to 
keep it then it's probably not worth it if we're gaining 3.4 support in return.

I think the main blocker to Python 3.4 support is the fact that nobody has 
really had time to work on it.  I'd be happy to work with anyone who is 
interested in taking this on.



 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-02-05 Thread thom neale (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307474#comment-14307474
 ] 

thom neale commented on SPARK-4897:
---

I'm still very interested in helping with the 3.4 port, have only been
prohibited by lack of free time. I'll ask if work will give me a half day
to work on it.



 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-02-05 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307482#comment-14307482
 ] 

Josh Rosen commented on SPARK-4897:
---

By the way, it might be nice to see if we can figure out a good way of 
subdividing this task across multiple PRs so that the pieces that we have 
already figured out don't end up bitrotting / becoming merge-conflicts.  For 
instance, if we can test the `cloudpickle.py` file separately from the other 
modules, then we could submit a PR that only adds 3.4 support to that file.  If 
you can spot any other natural subproblems here, leave a comment or create a 
sub-task on this JIRA ticket.

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-02-03 Thread Ian Ozsvald (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303154#comment-14303154
 ] 

Ian Ozsvald commented on SPARK-4897:


If I can cast a vote...

I note that Python 2.6 is the lowest version of Python that's supported, some 
recent data might suggest that Python 2.6 support isn't so useful in the wider 
ecosystem and so might be slowing Spark development. A Python 2 vs 3 survey 
was conducted before Christmas, the results are recently in:
http://www.randalolson.com/2015/01/30/python-usage-survey-2014/

Of 6,746 respondents less than 10% use Python 2.6 day-to-day. 81% use Python 
2.7 (and 43% Python 3.4 - including me) for day-to-day use (presumably for 
work), there's an approximate 50/50 split between Python 2  3 for personal 
projects. I'd humbly suggest that supporting Python 2.6 will slow development 
and avoiding Python 3.4 will hinder winder adoption.

The same survey a year back had 4,790 respondents, the second diagram on 
randalolson's site compares 2013 to 2014 - fewer people now are writing Python 
2 day-to-day and more people are writing Python 3 (though Python 2.7 is still 
significantly dominant). Given that Python 2.7 will be deprecated by 2020 the 
trend to Python 3.4 is clear. Core scientific libraries (e.g. scipy, numpy, 
pandas, matplotlib) all work in Python 3.4 and have done for several years.

The survey doesn't ask respondents whether they are web-devs, data scientists, 
ETL-folk, dev-ops etc so it is hard to extrapolate whether Spark-users are 
predominantly Python 2.6/2.7/3.4 but I'd suggest that a local survey in this 
community might provide useful guidance. 

Although it is on a longer cycle the major Linux distros like Ubuntu are 
switching away from Python 2.7 to Python 3+:
https://www.archlinux.org/news/python-is-now-python-3/  # switched 2010
http://www.phoronix.com/scan.php?page=news_itempx=Fedora-22-Python-3-Status  # 
Fedora to Python 3 around May 2015
https://wiki.ubuntu.com/Python/3  # work on-going, maybe the switch occurs in 
2015?

What is the use case for Python 2.6 support? Personally I'd vote for supporting 
2.7 as a minimum with a strong push for Python 3.4 compatibility to reduce 
wasted hours supporting older Python versions. Supporting older Pythons will 
also hinder the creation of a Python 2.7/3.4 compatible code-base due to 
cross-language complications.

About me - long-time speaker/teacher at Python conferences, O'Reilly author 
(High Performance Python), co-org of the 1000+ member PyDataLondon meetup and 
conference series, Python3.4 proponent since April 2014. At my PyData meetup I 
regularly query my usergroup (approx. 100 attendees each month), 1% use Python 
2.6, the majority use Python 2.7, each month more people switch up to Python 
3.4 (mainly to get away from unicode errors during text processing).

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This 

[jira] [Commented] (SPARK-4897) Python 3 support

2015-01-05 Thread Matthew Cornell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264643#comment-14264643
 ] 

Matthew Cornell commented on SPARK-4897:


Please!!

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2014-12-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262459#comment-14262459
 ] 

Josh Rosen commented on SPARK-4897:
---

I've merged [~twneale]'s PR but there are still a bunch of things that don't 
actually run.  It looks like this is going to be quite a bit of work.  My 
{{python3}] branch is up-to-date, in case anyone wants to pick this up.

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2014-12-19 Thread thom neale (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254071#comment-14254071
 ] 

thom neale commented on SPARK-4897:
---

Thank you for looking into this, Josh! Was able to get the module itself to 
import/run with some changes, but can't run the tests yet because I don't have 
spark built. But here's the pull request to your fork: 
https://github.com/JoshRosen/spark/pull/1. Some of these changes merit further 
investigation, but it's a start. 

 Python 3 support
 

 Key: SPARK-4897
 URL: https://issues.apache.org/jira/browse/SPARK-4897
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Josh Rosen
Priority: Minor

 It would be nice to have Python 3 support in PySpark, provided that we can do 
 it in a way that maintains backwards-compatibility with Python 2.6.
 I started looking into porting this; my WIP work can be found at 
 https://github.com/JoshRosen/spark/compare/python3
 I was able to use the 
 [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
 tool to handle the basic conversion of things like {{print}} statements, etc. 
 and had to manually fix up a few imports for packages that moved / were 
 renamed, but the major blocker that I hit was {{cloudpickle}}:
 {code}
 [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
 Python 3.4.2 (default, Oct 19 2014, 17:52:17)
 [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
 Type help, copyright, credits or license for more information.
 Traceback (most recent call last):
   File /Users/joshrosen/Documents/Spark/python/pyspark/shell.py, line 28, 
 in module
 import pyspark
   File /Users/joshrosen/Documents/spark/python/pyspark/__init__.py, line 
 41, in module
 from pyspark.context import SparkContext
   File /Users/joshrosen/Documents/spark/python/pyspark/context.py, line 26, 
 in module
 from pyspark import accumulators
   File /Users/joshrosen/Documents/spark/python/pyspark/accumulators.py, 
 line 97, in module
 from pyspark.cloudpickle import CloudPickler
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 120, in module
 class CloudPickler(pickle.Pickler):
   File /Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py, line 
 122, in CloudPickler
 dispatch = pickle.Pickler.dispatch.copy()
 AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
 {code}
 This code looks like it will be hard difficult to port to Python 3, so this 
 might be a good reason to switch to 
 [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org