Re: PySpark: preference for Python 2.7 or Python 3.5?

2016-09-02 Thread Ian Stokes Rees
On 9/2/16 3:47 AM, Felix Cheung wrote: There is an Anaconda parcel one could readily install on CDH (https://docs.continuum.io/anaconda/cloudera). As Sean says, it is Python 2.7.x. Spark should work for both 2.7 and 3.5.
Yes, I'm actually an engineer at Continuum, so I know the Anaconda parcel

Re: PySpark: preference for Python 2.7 or Python 3.5?

2016-09-02 Thread Felix Cheung
Sent: Friday, September 2, 2016 12:41 AM
Subject: Re: PySpark: preference for Python 2.7 or Python 3.5?
To: Ian Stokes Rees <ijsto...@continuum.io>
Cc: user @spark <user@spark.apache.org>
Spark should work fine with Python

Re: PySpark: preference for Python 2.7 or Python 3.5?

2016-09-02 Thread Sean Owen
Spark should work fine with Python 3. I'm not a Python person, but all else equal I'd use 3.5 too. I assume the issue could be libraries you want that don't support Python 3. I don't think that changes with CDH. It includes a version of Anaconda from Continuum, but that lays down Python 2.7.11. I

PySpark: preference for Python 2.7 or Python 3.5?

2016-09-01 Thread Ian Stokes Rees
I have the option of running PySpark with Python 2.7 or Python 3.5. I am fairly expert with Python and know the Python-side history of the differences. All else being the same, I have a preference for Python 3.5. I'm using CDH 5.8 and I'm wondering if that biases whether I should proceed