Re: [discuss] dropping Python 2.6 support

2016-01-11 Thread shane knapp
(this is a build system-specific reply, but quite pertinent to the conversation)

we currently test spark on a centos 6.X deployment, but in the next
~month will be bumping everything to centos 7.  by default, centos 7
comes w/python 2.7.5 installed as the system python.  for any builds
that need python 2.6 (spark 1.5 and earlier), we'll be using anaconda
environments to manage them.

i'm generally VERY happy with how easy it is to manage our three
different python environments with anaconda, and don't plan on
changing that at all in the foreseeable future.

shane
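
(a minimal sketch of that kind of per-version setup, for anyone curious --
the env name and path below are made up and follow the default anaconda
layout; this is one way to script it, not a description of our jenkins
config)

    # create an isolated python 2.6 environment without touching the
    # system python, then point pyspark at its interpreter
    import os
    import subprocess

    subprocess.check_call(
        ["conda", "create", "--yes", "--name", "py26", "python=2.6"])
    os.environ["PYSPARK_PYTHON"] = os.path.expanduser(
        "~/anaconda/envs/py26/bin/python")  # hypothetical install location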

On Mon, Jan 11, 2016 at 8:52 AM, David Chin  wrote:
> FWIW, RHEL 6 still uses Python 2.6, although 2.7.8 and 3.3.2 are available
> through Red Hat Software Collections. See:
> https://www.softwarecollections.org/en/
>
> I run an academic compute cluster on RHEL 6. We do, however, provide Python
> 2.7.x and 3.5.x via modulefiles.
>
> On Tue, Jan 5, 2016 at 8:45 AM, Nicholas Chammas
>  wrote:
>>
>> +1
>>
>> Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes,
>> Python 2.6 is ancient history and the core Python developers stopped
>> supporting it in 2013. RHEL 5 is not a good enough reason to continue
>> support for Python 2.6 IMO.
>>
>> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
>> currently do).
>>
>> Nick
>>
>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang  wrote:
>>>
>>> plus 1,
>>>
>>> we are currently using python 2.7.2 in our production environment.
>>>
>>>
>>>
>>>
>>>
>>> On 2016-01-05 18:11:45, "Meethu Mathew"  wrote:
>>>
>>> +1
>>> We use Python 2.7
>>>
>>> Regards,
>>>
>>> Meethu Mathew
>>>
>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin  wrote:

 Does anybody here care about us dropping support for Python 2.6 in Spark
 2.0?

 Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
 parsing) when compared with Python 2.7. Some libraries that Spark depends on
 stopped supporting 2.6. We can still convince the library maintainers to
 support 2.6, but it will be extra work. I'm curious if anybody still uses
 Python 2.6 to run Spark.

 Thanks.


>>>
>
>
>
> --
> David Chin, Ph.D.
> david.c...@drexel.edu
> Sr. Systems Administrator, URCF, Drexel U.
> http://www.drexel.edu/research/urcf/
> https://linuxfollies.blogspot.com/
> +1.215.221.4747 (mobile)
> https://github.com/prehensilecode
>




Re: [discuss] dropping Python 2.6 support

2016-01-11 Thread David Chin
FWIW, RHEL 6 still uses Python 2.6, although 2.7.8 and 3.3.2 are available
through Red Hat Software Collections. See:
https://www.softwarecollections.org/en/

I run an academic compute cluster on RHEL 6. We do, however, provide Python
2.7.x and 3.5.x via modulefiles.

On Tue, Jan 5, 2016 at 8:45 AM, Nicholas Chammas  wrote:

> +1
>
> Red Hat supports Python 2.6 on RHEL 5 until 2020, but
> otherwise yes, Python 2.6 is ancient history and the core Python developers
> stopped supporting it in 2013. RHEL 5 is not a good enough reason to
> continue support for Python 2.6 IMO.
>
> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
> currently do).
>
> Nick
>
> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang  wrote:
>
>> plus 1,
>>
>> we are currently using python 2.7.2 in our production environment.
>>
>>
>>
>>
>>
>> On 2016-01-05 18:11:45, "Meethu Mathew"  wrote:
>>
>> +1
>> We use Python 2.7
>>
>> Regards,
>>
>> Meethu Mathew
>>
>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin  wrote:
>>
>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>> 2.0?
>>>
>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>> parsing) when compared with Python 2.7. Some libraries that Spark depends on
>>> stopped supporting 2.6. We can still convince the library maintainers to
>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>> Python 2.6 to run Spark.
>>>
>>> Thanks.
>>>
>>>
>>>
>>


-- 
David Chin, Ph.D.
david.c...@drexel.edu
Sr. Systems Administrator, URCF, Drexel U.
http://www.drexel.edu/research/urcf/
https://linuxfollies.blogspot.com/
+1.215.221.4747 (mobile)
https://github.com/prehensilecode


Re: [discuss] dropping Python 2.6 support

2016-01-10 Thread Dmitry Kniazev
Sasha, it is more complicated than that: many RHEL 6 OS utilities rely on 
Python 2.6. Upgrading it to 2.7 breaks the system. For large enterprises,
migrating to another server OS means re-certifying (re-testing) hundreds of
applications, so yes, they do prefer to stay where they are until the benefits
of migrating outweigh the overhead. Long story short: you cannot simply upgrade the
built-in Python 2.6 in RHEL 6, and it will take years for enterprises to migrate
to RHEL 7.

Having said that, I don't think it is a problem, because Python 2.6
and Python 2.7 can easily co-exist in the same environment. For example, we use 
virtualenv to run Spark with Python 2.7 and do not touch system Python 2.6.

Thank you,
Dmitry
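
(a minimal sketch of the virtualenv pattern described above -- the venv
path and job script are hypothetical, and it assumes spark-submit is on
the PATH)

    # launch a pyspark job under a virtualenv's python 2.7, leaving the
    # system python 2.6 untouched
    import os
    import subprocess

    env = dict(os.environ)
    env["PYSPARK_PYTHON"] = "/opt/venvs/spark27/bin/python"  # hypothetical venv
    subprocess.check_call(["spark-submit", "my_job.py"], env=env)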

09.01.2016, 06:36, "Sasha Kacanski" :
> +1
> Companies that use the stock Python 2.6 in Red Hat will need to upgrade or install
> a fresh version, which takes a total of 3.5 minutes, so no issues ...
>
> On Tue, Jan 5, 2016 at 2:17 AM, Reynold Xin  wrote:
>> Does anybody here care about us dropping support for Python 2.6 in Spark 2.0?
>>
>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json 
>> parsing) when compared with Python 2.7. Some libraries that Spark depends on
>> stopped supporting 2.6. We can still convince the library maintainers to 
>> support 2.6, but it will be extra work. I'm curious if anybody still uses 
>> Python 2.6 to run Spark.
>>
>> Thanks.
>
> --
> Aleksandar Kacanski




Re: [discuss] dropping Python 2.6 support

2016-01-09 Thread Jacek Laskowski
On Sat, Jan 9, 2016 at 1:48 PM, Sean Owen  wrote:

> (For similar reasons I personally don't favor supporting Java 7 or
> Scala 2.10 in Spark 2.x.)

That reflects my sentiments as well. Thanks, Sean, for bringing that up!

Jacek




Re: [discuss] dropping Python 2.6 support

2016-01-09 Thread Sean Owen
Chiming in late, but my take on this line of argument is: these
companies are welcome to keep using Spark 1.x. If anything, the
argument here is about how long to maintain 1.x, and indeed, it's
going to go dormant quite soon.

But using RHEL 6 (or any older version of any platform) and not
wanting to update already means you prefer stability over change.
I don't see an expectation that major releases of major things
support older major releases of other things.

Conversely: supporting something in Spark 2.x means making sure
nothing breaks compatibility with it for a couple of years. This is
effort that could be spent elsewhere; that has to be weighed.

(For similar reasons I personally don't favor supporting Java 7 or
Scala 2.10 in Spark 2.x.)

On Tue, Jan 5, 2016 at 7:07 PM, Koert Kuipers  wrote:
> rhel/centos 6 ships with python 2.6, doesnt it?
>
> if so, i still know plenty of large companies where python 2.6 is the only
> option. asking them for python 2.7 is not going to work
>
> so i think its a bad idea
>
> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland 
> wrote:
>>
>> I don't see a reason Spark 2.0 would need to support Python 2.6. At this
>> point, Python 3 should be the default that is encouraged.
>> Most organizations acknowledge that 2.7 is common, but lagging behind the
>> version they should theoretically use. Dropping python 2.6
>> support sounds very reasonable to me.




Re: [discuss] dropping Python 2.6 support

2016-01-09 Thread Sasha Kacanski
+1
Companies that use the stock Python 2.6 in Red Hat will need to upgrade or
install a fresh version, which takes a total of 3.5 minutes, so no issues ...

On Tue, Jan 5, 2016 at 2:17 AM, Reynold Xin  wrote:

> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
> parsing) when compared with Python 2.7. Some libraries that Spark depends on
> stopped supporting 2.6. We can still convince the library maintainers to
> support 2.6, but it will be extra work. I'm curious if anybody still uses
> Python 2.6 to run Spark.
>
> Thanks.
>
>
>


-- 
Aleksandar Kacanski


Re: [discuss] dropping Python 2.6 support

2016-01-09 Thread Steve Loughran

> On 7 Jan 2016, at 19:55, Juliet Hougland  wrote:
> 
> @ Reynold Xin @Josh Rosen: What is current maintenance burden of supporting 
> Python 2.6? What libraries are no longer supporting Python 2.6 and where does 
> Spark use them?
> 

generally the cost comes in the test matrix: one more thing to test against.
You can test for the extremes with the right VMs (me: kerberos-java7-linux)
(windows-server+java-8), but you still need to keep the number of combinations
down, and be set up to locally debug/replicate problems.
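
(to make that cost concrete, a toy sketch of what one more interpreter in
the matrix looks like -- the interpreter names and the suite runner are
assumptions, not anything from spark's actual build)

    # the same suite, run once per interpreter; every entry here is one
    # more interpreter to provision, install dependencies for, and debug
    import subprocess

    for exe in ("python2.6", "python2.7"):  # assumed to be on the PATH
        rc = subprocess.call([exe, "run_tests.py"])  # hypothetical suite runner
        print("%s -> %d" % (exe, rc))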




Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Jeff Zhang
+1

On Wed, Jan 6, 2016 at 9:18 AM, Juliet Hougland 
wrote:

> Most admins I talk to about python and spark are already actively (or on
> their way to) managing their cluster python installations. Even if people
> begin using the system python with pyspark, there is eventually a user who
> needs a complex dependency (like pandas or sklearn) on the cluster. No
> admin would muck around installing libs into system python, so you end up
> with other python installations.
>
> Installing a non-system python is something users intending to use pyspark
> on a real cluster should be thinking about, eventually, anyway. It would
> work in situations where people are running pyspark locally or actively
> managing python installations on a cluster. There is an awkward middle
> point where someone has installed spark but not configured their cluster
> (by installing non default python) in any other way. Most clusters I see
> are RHEL/CentOS and have something other than system python used by spark.
>
> What libraries stopped supporting python 2.6 and where does spark use
> them? The "ease of transitioning to pyspark onto a cluster" problem may be
> an easier pill to swallow if it only affected something like mllib or spark
> sql and not parts of the core api. You end up hoping numpy or pandas are
> installed in the runtime components of spark anyway. At that point people
> really should just go install a non system python. There are tradeoffs to
> using pyspark and I feel pretty fine explaining to people that managing
> their cluster's python installations is something that comes with using
> pyspark.
>
> RHEL/CentOS is so common that this would probably be a little work for a
> lot of people.
>
> --Juliet
>
> On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers  wrote:
>
>> hey evil admin:)
>> i think the bit about java was from me?
>> if so, i meant to indicate that the reality for us is java is 1.7 on most
>> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
>> even though java 1.7 is getting old as well, it would be a major issue for
>> me if spark dropped java 1.7 support.
>>
>> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken 
>> wrote:
>>
>>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>>> provide quite a few different versions of python on our cluster pretty darn
>>> easily. All you need is a separate install directory and to set the
>>> PYTHON_HOME environment variable to point to the correct python, then have
>>> the users make sure the correct python is in their PATH. I understand that
>>> other administrators may not be so compliant.
>>>
>>> Saw a small bit about the java version in there; does Spark currently
>>> prefer Java 1.8.x?
>>>
>>> —Ken
>>>
>>> On Jan 5, 2016, at 6:08 PM, Josh Rosen  wrote:
>>>
>>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
 while continuing to use a vanilla `python` executable on the executors
>>>
>>>
>>> Whoops, just to be clear, this should actually read "while continuing to
>>> use a vanilla `python` 2.7 executable".
>>>
>>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen 
>>> wrote:
>>>
 Yep, the driver and executors need to have compatible Python versions.
 I think that there are some bytecode-level incompatibilities between 2.6
 and 2.7 which would impact the deserialization of Python closures, so I
 think you need to be running the same 2.x version for all communicating
 Spark processes. Note that you _can_ use a Python 2.7 `ipython` executable
 on the driver while continuing to use a vanilla `python` executable on the
 executors (we have environment variables which allow you to control these
 separately).

 On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
 nicholas.cham...@gmail.com> wrote:

> I think all the slaves need the same (or a compatible) version of
> Python installed since they run Python code in PySpark jobs natively.
>
> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers 
> wrote:
>
>> interesting i didnt know that!
>>
>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> even if python 2.7 was needed only on this one machine that launches
>>> the app we can not ship it with our software because its gpl licensed
>>>
>>> Not to nitpick, but maybe this is important. The Python license is
>>> GPL-compatible but not GPL:
>>>
>>> Note GPL-compatible doesn’t mean that we’re distributing Python
>>> under the GPL. All Python licenses, unlike the GPL, let you distribute a
>>> modified version without making your changes open source. The
>>> GPL-compatible licenses make it possible to combine Python with other
>>> software that is released under the GPL; the others don’t.
>>>
>>> Nick
>>> ​
>>>
>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers 
>>> wrote:

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Juliet Hougland
Most admins I talk to about python and spark are already actively (or on
their way to) managing their cluster python installations. Even if people
begin using the system python with pyspark, there is eventually a user who
needs a complex dependency (like pandas or sklearn) on the cluster. No
admin would muck around installing libs into system python, so you end up
with other python installations.

Installing a non-system python is something users intending to use pyspark
on a real cluster should be thinking about, eventually, anyway. It would
work in situations where people are running pyspark locally or actively
managing python installations on a cluster. There is an awkward middle
point where someone has installed spark but not configured their cluster
(by installing non default python) in any other way. Most clusters I see
are RHEL/CentOS and have something other than system python used by spark.

What libraries stopped supporting python 2.6 and where does spark use them?
The "ease of transitioning to pyspark onto a cluster" problem may be an
easier pill to swallow if it only affected something like mllib or spark
sql and not parts of the core api. You end up hoping numpy or pandas are
installed in the runtime components of spark anyway. At that point people
really should just go install a non system python. There are tradeoffs to
using pyspark and I feel pretty fine explaining to people that managing
their cluster's python installations is something that comes with using
pyspark.

RHEL/CentOS is so common that this would probably be a little work for a
lot of people.

--Juliet
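
(not an answer to Juliet's question about specific libraries, but for
flavor, a few 2.7-only constructs of the sort that quietly break 2.6 --
illustrative only, not a claim about what spark or its dependencies use)

    # each of these fails on python 2.6 (SyntaxError for the first two,
    # ValueError for the third) and is fine on 2.7+
    squares = {n: n * n for n in range(5)}  # dict comprehensions: 2.7+
    unique = {1, 2, 3}                      # set literals: 2.7+
    msg = "{} of {}".format(1, 3)           # auto-numbered format fields: 2.7+
    print(squares, unique, msg)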

On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers  wrote:

> hey evil admin:)
> i think the bit about java was from me?
> if so, i meant to indicate that the reality for us is java is 1.7 on most
> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
> even though java 1.7 is getting old as well, it would be a major issue for
> me if spark dropped java 1.7 support.
>
> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken 
> wrote:
>
>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>> provide quite a few different versions of python on our cluster pretty darn
>> easily. All you need is a separate install directory and to set the
>> PYTHON_HOME environment variable to point to the correct python, then have
>> the users make sure the correct python is in their PATH. I understand that
>> other administrators may not be so compliant.
>>
>> Saw a small bit about the java version in there; does Spark currently
>> prefer Java 1.8.x?
>>
>> —Ken
>>
>> On Jan 5, 2016, at 6:08 PM, Josh Rosen  wrote:
>>
>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>> while continuing to use a vanilla `python` executable on the executors
>>
>>
>> Whoops, just to be clear, this should actually read "while continuing to
>> use a vanilla `python` 2.7 executable".
>>
>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen 
>> wrote:
>>
>>> Yep, the driver and executors need to have compatible Python versions. I
>>> think that there are some bytecode-level incompatibilities between 2.6 and
>>> 2.7 which would impact the deserialization of Python closures, so I think
>>> you need to be running the same 2.x version for all communicating Spark
>>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>>> driver while continuing to use a vanilla `python` executable on the
>>> executors (we have environment variables which allow you to control these
>>> separately).
>>>
>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 I think all the slaves need the same (or a compatible) version of
 Python installed since they run Python code in PySpark jobs natively.

 On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers  wrote:

> interesting i didnt know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> even if python 2.7 was needed only on this one machine that launches
>> the app we can not ship it with our software because its gpl licensed
>>
>> Not to nitpick, but maybe this is important. The Python license is
>> GPL-compatible but not GPL:
>>
>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>> the GPL. All Python licenses, unlike the GPL, let you distribute a 
>> modified
>> version without making your changes open source. The GPL-compatible
>> licenses make it possible to combine Python with other software that is
>> released under the GPL; the others don’t.
>>
>> Nick
>> ​
>>
>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers 
>> wrote:
>>
>>> i do not think so.
>>>
>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>> not have direct access to those.
>>>
>>> also, spark is

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
I don't think that we're planning to drop Java 7 support for Spark 2.0.

Personally, I would recommend using Java 8 if you're running Spark 1.5.0+
and are using SQL/DataFrames so that you can benefit from improvements to
code cache flushing in the Java 8 JVMs. Spark SQL's generated classes can
fill up the JVM's code cache, which causes the JIT to stop working for new
bytecode. Empirically, it looks like the Java 8 JVMs have an improved
ability to flush this code cache, thereby avoiding this problem.

TL;DR: I'd prefer to run Java 8 with Spark if given the choice.
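
(if you are stuck on java 7, a hedged mitigation sketch -- the code cache
flags are standard hotspot options, but the size below is made up and
worth tuning for your workload; this is not an official recommendation)

    # give the JIT more code cache headroom on the executors
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().set(
        "spark.executor.extraJavaOptions",
        "-XX:ReservedCodeCacheSize=512m -XX:+UseCodeCacheFlushing")
    sc = SparkContext(conf=conf)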

On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers  wrote:

> hey evil admin:)
> i think the bit about java was from me?
> if so, i meant to indicate that the reality for us is java is 1.7 on most
> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
> even though java 1.7 is getting old as well, it would be a major issue for
> me if spark dropped java 1.7 support.
>
> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken 
> wrote:
>
>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>> provide quite a few different versions of python on our cluster pretty darn
>> easily. All you need is a separate install directory and to set the
>> PYTHON_HOME environment variable to point to the correct python, then have
>> the users make sure the correct python is in their PATH. I understand that
>> other administrators may not be so compliant.
>>
>> Saw a small bit about the java version in there; does Spark currently
>> prefer Java 1.8.x?
>>
>> —Ken
>>
>> On Jan 5, 2016, at 6:08 PM, Josh Rosen  wrote:
>>
>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>> while continuing to use a vanilla `python` executable on the executors
>>
>>
>> Whoops, just to be clear, this should actually read "while continuing to
>> use a vanilla `python` 2.7 executable".
>>
>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen 
>> wrote:
>>
>>> Yep, the driver and executors need to have compatible Python versions. I
>>> think that there are some bytecode-level incompatibilities between 2.6 and
>>> 2.7 which would impact the deserialization of Python closures, so I think
>>> you need to be running the same 2.x version for all communicating Spark
>>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>>> driver while continuing to use a vanilla `python` executable on the
>>> executors (we have environment variables which allow you to control these
>>> separately).
>>>
>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 I think all the slaves need the same (or a compatible) version of
 Python installed since they run Python code in PySpark jobs natively.

 On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers  wrote:

> interesting i didnt know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> even if python 2.7 was needed only on this one machine that launches
>> the app we can not ship it with our software because its gpl licensed
>>
>> Not to nitpick, but maybe this is important. The Python license is
>> GPL-compatible but not GPL:
>>
>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>> the GPL. All Python licenses, unlike the GPL, let you distribute a 
>> modified
>> version without making your changes open source. The GPL-compatible
>> licenses make it possible to combine Python with other software that is
>> released under the GPL; the others don’t.
>>
>> Nick
>> ​
>>
>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers 
>> wrote:
>>
>>> i do not think so.
>>>
>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>> not have direct access to those.
>>>
>>> also, spark is easy for us to ship with our software since its
>>> apache 2 licensed, and it only needs to be present on the machine that
>>> launches the app (thanks to yarn).
>>> even if python 2.7 was needed only on this one machine that launches
>>> the app we can not ship it with our software because its gpl licensed, 
>>> so
>>> the client would have to download it and install it themselves, and this
>>> would mean its an independent install which has to be audited and 
>>> approved
>>> and now you are in for a lot of fun. basically it will never happen.
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen  wrote:
>>>
 If users are able to install Spark 2.0 on their RHEL clusters, then
 I imagine that they're also capable of installing a standalone Python
 alongside that Spark version (without changing Python systemwide). For
 instance, Anaconda/Miniconda make it really easy to install Python
 2.7.x/3.x without impacting / changing th

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Koert Kuipers
hey evil admin:)
i think the bit about java was from me?
if so, i meant to indicate that the reality for us is java is 1.7 on most
(all?) clusters. i do not believe spark prefers java 1.8. my point was that
even though java 1.7 is getting old as well, it would be a major issue for
me if spark dropped java 1.7 support.

On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken 
wrote:

> As one of the evil administrators that runs a RHEL 6 cluster, we already
> provide quite a few different versions of python on our cluster pretty darn
> easily. All you need is a separate install directory and to set the
> PYTHON_HOME environment variable to point to the correct python, then have
> the users make sure the correct python is in their PATH. I understand that
> other administrators may not be so compliant.
>
> Saw a small bit about the java version in there; does Spark currently
> prefer Java 1.8.x?
>
> —Ken
>
> On Jan 5, 2016, at 6:08 PM, Josh Rosen  wrote:
>
> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>> while continuing to use a vanilla `python` executable on the executors
>
>
> Whoops, just to be clear, this should actually read "while continuing to
> use a vanilla `python` 2.7 executable".
>
> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen 
> wrote:
>
>> Yep, the driver and executors need to have compatible Python versions. I
>> think that there are some bytecode-level incompatibilities between 2.6 and
>> 2.7 which would impact the deserialization of Python closures, so I think
>> you need to be running the same 2.x version for all communicating Spark
>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>> driver while continuing to use a vanilla `python` executable on the
>> executors (we have environment variables which allow you to control these
>> separately).
>>
>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> I think all the slaves need the same (or a compatible) version of Python
>>> installed since they run Python code in PySpark jobs natively.
>>>
>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers  wrote:
>>>
 interesting i didnt know that!

 On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
 nicholas.cham...@gmail.com> wrote:

> even if python 2.7 was needed only on this one machine that launches
> the app we can not ship it with our software because its gpl licensed
>
> Not to nitpick, but maybe this is important. The Python license is
> GPL-compatible but not GPL:
>
> Note GPL-compatible doesn’t mean that we’re distributing Python under
> the GPL. All Python licenses, unlike the GPL, let you distribute a 
> modified
> version without making your changes open source. The GPL-compatible
> licenses make it possible to combine Python with other software that is
> released under the GPL; the others don’t.
>
> Nick
> ​
>
> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers 
> wrote:
>
>> i do not think so.
>>
>> does the python 2.7 need to be installed on all slaves? if so, we do
>> not have direct access to those.
>>
>> also, spark is easy for us to ship with our software since its apache
>> 2 licensed, and it only needs to be present on the machine that launches
>> the app (thanks to yarn).
>> even if python 2.7 was needed only on this one machine that launches
>> the app we can not ship it with our software because its gpl licensed, so
>> the client would have to download it and install it themselves, and this
>> would mean its an independent install which has to be audited and 
>> approved
>> and now you are in for a lot of fun. basically it will never happen.
>>
>>
>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen 
>> wrote:
>>
>>> If users are able to install Spark 2.0 on their RHEL clusters, then
>>> I imagine that they're also capable of installing a standalone Python
>>> alongside that Spark version (without changing Python systemwide). For
>>> instance, Anaconda/Miniconda make it really easy to install Python
>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>> require any special permissions to install (you don't need root / sudo
>>> access). Does this address the Python versioning concerns for RHEL 
>>> users?
>>>
>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers 
>>> wrote:
>>>
 yeah, the practical concern is that we have no control over java or
 python version on large company clusters. our current reality for the 
 vast
 majority of them is java 7 and python 2.6, no matter how outdated that 
 is.

 i dont like it either, but i cannot change it.

 we currently don't use pyspark so i have no stake in this, but if
 we did i can assure you we would

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
>
> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
> while continuing to use a vanilla `python` executable on the executors


Whoops, just to be clear, this should actually read "while continuing to
use a vanilla `python` 2.7 executable".

On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen  wrote:

> Yep, the driver and executors need to have compatible Python versions. I
> think that there are some bytecode-level incompatibilities between 2.6 and
> 2.7 which would impact the deserialization of Python closures, so I think
> you need to be running the same 2.x version for all communicating Spark
> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
> driver while continuing to use a vanilla `python` executable on the
> executors (we have environment variables which allow you to control these
> separately).
>
> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> I think all the slaves need the same (or a compatible) version of Python
>> installed since they run Python code in PySpark jobs natively.
>>
>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers  wrote:
>>
>>> interesting i didnt know that!
>>>
>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 even if python 2.7 was needed only on this one machine that launches
 the app we can not ship it with our software because its gpl licensed

 Not to nitpick, but maybe this is important. The Python license is
 GPL-compatible but not GPL:

 Note GPL-compatible doesn’t mean that we’re distributing Python under
 the GPL. All Python licenses, unlike the GPL, let you distribute a modified
 version without making your changes open source. The GPL-compatible
 licenses make it possible to combine Python with other software that is
 released under the GPL; the others don’t.

 Nick
 ​

 On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers  wrote:

> i do not think so.
>
> does the python 2.7 need to be installed on all slaves? if so, we do
> not have direct access to those.
>
> also, spark is easy for us to ship with our software since its apache
> 2 licensed, and it only needs to be present on the machine that launches
> the app (thanks to yarn).
> even if python 2.7 was needed only on this one machine that launches
> the app we can not ship it with our software because its gpl licensed, so
> the client would have to download it and install it themselves, and this
> would mean its an independent install which has to be audited and approved
> and now you are in for a lot of fun. basically it will never happen.
>
>
> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen 
> wrote:
>
>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>> imagine that they're also capable of installing a standalone Python
>> alongside that Spark version (without changing Python systemwide). For
>> instance, Anaconda/Miniconda make it really easy to install Python
>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>> require any special permissions to install (you don't need root / sudo
>> access). Does this address the Python versioning concerns for RHEL users?
>>
>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers 
>> wrote:
>>
>>> yeah, the practical concern is that we have no control over java or
>>> python version on large company clusters. our current reality for the 
>>> vast
>>> majority of them is java 7 and python 2.6, no matter how outdated that 
>>> is.
>>>
>>> i dont like it either, but i cannot change it.
>>>
>>> we currently don't use pyspark so i have no stake in this, but if we
>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>> dropped. no point in developing something that doesnt run for majority 
>>> of
>>> customers.
>>>
>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 As I pointed out in my earlier email, RHEL will support Python 2.6
 until 2020. So I'm assuming these large companies will have the option 
 of
 riding out Python 2.6 until then.

 Are we seriously saying that Spark should likewise support Python
 2.6 for the next several years? Even though the core Python devs 
 stopped
 supporting it in 2013?

 If that's not what we're suggesting, then when, roughly, can we
 drop support? What are the criteria?

 I understand the practical concern here. If companies are stuck
 using 2.6, it doesn't matter to them that it is deprecated. But 
 balancing
 that concern against the maintenance burden on this project, I would 
>>>

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
Yep, the driver and executors need to have compatible Python versions. I
think that there are some bytecode-level incompatibilities between 2.6 and
2.7 which would impact the deserialization of Python closures, so I think
you need to be running the same 2.x version for all communicating Spark
processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
driver while continuing to use a vanilla `python` executable on the
executors (we have environment variables which allow you to control these
separately).
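
(a minimal sketch of that setup, assuming the variables in question are
PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON; the paths are hypothetical)

    # driver runs ipython while executors run a plain interpreter from
    # the same 2.x line; set these before any pyspark process starts
    import os

    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/bin/ipython"
    os.environ["PYSPARK_PYTHON"] = "/usr/local/bin/python2.7"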

On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas  wrote:

> I think all the slaves need the same (or a compatible) version of Python
> installed since they run Python code in PySpark jobs natively.
>
> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers  wrote:
>
>> interesting i didnt know that!
>>
>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> even if python 2.7 was needed only on this one machine that launches the
>>> app we can not ship it with our software because its gpl licensed
>>>
>>> Not to nitpick, but maybe this is important. The Python license is
>>> GPL-compatible but not GPL:
>>>
>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>> version without making your changes open source. The GPL-compatible
>>> licenses make it possible to combine Python with other software that is
>>> released under the GPL; the others don’t.
>>>
>>> Nick
>>> ​
>>>
>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers  wrote:
>>>
 i do not think so.

 does the python 2.7 need to be installed on all slaves? if so, we do
 not have direct access to those.

 also, spark is easy for us to ship with our software since its apache 2
 licensed, and it only needs to be present on the machine that launches the
 app (thanks to yarn).
 even if python 2.7 was needed only on this one machine that launches
 the app we can not ship it with our software because its gpl licensed, so
 the client would have to download it and install it themselves, and this
 would mean its an independent install which has to be audited and approved
 and now you are in for a lot of fun. basically it will never happen.


 On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen 
 wrote:

> If users are able to install Spark 2.0 on their RHEL clusters, then I
> imagine that they're also capable of installing a standalone Python
> alongside that Spark version (without changing Python systemwide). For
> instance, Anaconda/Miniconda make it really easy to install Python
> 2.7.x/3.x without impacting / changing the system Python and doesn't
> require any special permissions to install (you don't need root / sudo
> access). Does this address the Python versioning concerns for RHEL users?
>
> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers 
> wrote:
>
>> yeah, the practical concern is that we have no control over java or
>> python version on large company clusters. our current reality for the 
>> vast
>> majority of them is java 7 and python 2.6, no matter how outdated that 
>> is.
>>
>> i dont like it either, but i cannot change it.
>>
>> we currently don't use pyspark so i have no stake in this, but if we
>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>> dropped. no point in developing something that doesnt run for majority of
>> customers.
>>
>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>> until 2020. So I'm assuming these large companies will have the option 
>>> of
>>> riding out Python 2.6 until then.
>>>
>>> Are we seriously saying that Spark should likewise support Python
>>> 2.6 for the next several years? Even though the core Python devs stopped
>>> supporting it in 2013?
>>>
>>> If that's not what we're suggesting, then when, roughly, can we drop
>>> support? What are the criteria?
>>>
>>> I understand the practical concern here. If companies are stuck
>>> using 2.6, it doesn't matter to them that it is deprecated. But 
>>> balancing
>>> that concern against the maintenance burden on this project, I would say
>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>> position to take. There are many tiny annoyances one has to put up with 
>>> to
>>> support 2.6.
>>>
>>> I suppose if our main PySpark contributors are fine putting up with
>>> those annoyances, then maybe we don't need to drop support just yet...
>>>
>>> Nick
>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>> ju...@esbet.es> wrote:
>>>

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
I think all the slaves need the same (or a compatible) version of Python
installed since they run Python code in PySpark jobs natively.
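
(an easy way to see what the executors actually run -- a small pyspark
snippet, assuming an existing SparkContext named sc)

    # collect the distinct python versions reported by the executors;
    # anything other than one value means a mismatched cluster
    import platform

    versions = (sc.parallelize(range(100), 20)
                  .map(lambda _: platform.python_version())
                  .distinct()
                  .collect())
    print(versions)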

On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers  wrote:

> interesting i didnt know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> even if python 2.7 was needed only on this one machine that launches the
>> app we can not ship it with our software because its gpl licensed
>>
>> Not to nitpick, but maybe this is important. The Python license is
>> GPL-compatible but not GPL:
>>
>> Note GPL-compatible doesn’t mean that we’re distributing Python under the
>> GPL. All Python licenses, unlike the GPL, let you distribute a modified
>> version without making your changes open source. The GPL-compatible
>> licenses make it possible to combine Python with other software that is
>> released under the GPL; the others don’t.
>>
>> Nick
>> ​
>>
>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers  wrote:
>>
>>> i do not think so.
>>>
>>> does the python 2.7 need to be installed on all slaves? if so, we do not
>>> have direct access to those.
>>>
>>> also, spark is easy for us to ship with our software since its apache 2
>>> licensed, and it only needs to be present on the machine that launches the
>>> app (thanks to yarn).
>>> even if python 2.7 was needed only on this one machine that launches the
>>> app we can not ship it with our software because its gpl licensed, so the
>>> client would have to download it and install it themselves, and this would
>>> mean its an independent install which has to be audited and approved and
>>> now you are in for a lot of fun. basically it will never happen.
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen 
>>> wrote:
>>>
 If users are able to install Spark 2.0 on their RHEL clusters, then I
 imagine that they're also capable of installing a standalone Python
 alongside that Spark version (without changing Python systemwide). For
 instance, Anaconda/Miniconda make it really easy to install Python
 2.7.x/3.x without impacting / changing the system Python and doesn't
 require any special permissions to install (you don't need root / sudo
 access). Does this address the Python versioning concerns for RHEL users?

 On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers 
 wrote:

> yeah, the practical concern is that we have no control over java or
> python version on large company clusters. our current reality for the vast
> majority of them is java 7 and python 2.6, no matter how outdated that is.
>
> i dont like it either, but i cannot change it.
>
> we currently don't use pyspark so i have no stake in this, but if we
> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
> dropped. no point in developing something that doesnt run for majority of
> customers.
>
> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> As I pointed out in my earlier email, RHEL will support Python 2.6
>> until 2020. So I'm assuming these large companies will have the option of
>> riding out Python 2.6 until then.
>>
>> Are we seriously saying that Spark should likewise support Python 2.6
>> for the next several years? Even though the core Python devs stopped
>> supporting it in 2013?
>>
>> If that's not what we're suggesting, then when, roughly, can we drop
>> support? What are the criteria?
>>
>> I understand the practical concern here. If companies are stuck using
>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>> concern against the maintenance burden on this project, I would say that
>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position 
>> to
>> take. There are many tiny annoyances one has to put up with to support 
>> 2.6.
>>
>> I suppose if our main PySpark contributors are fine putting up with
>> those annoyances, then maybe we don't need to drop support just yet...
>>
>> Nick
>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>> ju...@esbet.es> wrote:
>>
>>> Unfortunately, Koert is right.
>>>
>>> I've been in a couple of projects using Spark (banking industry)
>>> where CentOS + Python 2.6 is the toolbox available.
>>>
>>> That said, I believe it should not be a concern for Spark. Python
>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>> IMO.
>>>
>>>
>>> On 5 Jan 2016, at 20:07, Koert Kuipers
>>> wrote:
>>>
>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>
>>> if so, i still know plenty of large companies where python 2.6 is
>>> the only option. asking them for python 2.7 is not going to work
>>>
>>> so i think its a bad idea
>>>
>>>

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Koert Kuipers
if python 2.7 only has to be present on the node that launches the app
(does it?) then that could be important indeed.

On Tue, Jan 5, 2016 at 6:02 PM, Koert Kuipers  wrote:

> interesting i didnt know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> even if python 2.7 was needed only on this one machine that launches the
>> app we can not ship it with our software because its gpl licensed
>>
>> Not to nitpick, but maybe this is important. The Python license is
>> GPL-compatible but not GPL:
>>
>> Note GPL-compatible doesn’t mean that we’re distributing Python under the
>> GPL. All Python licenses, unlike the GPL, let you distribute a modified
>> version without making your changes open source. The GPL-compatible
>> licenses make it possible to combine Python with other software that is
>> released under the GPL; the others don’t.
>>
>> Nick
>> ​
>>
>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers  wrote:
>>
>>> i do not think so.
>>>
>>> does the python 2.7 need to be installed on all slaves? if so, we do not
>>> have direct access to those.
>>>
>>> also, spark is easy for us to ship with our software since its apache 2
>>> licensed, and it only needs to be present on the machine that launches the
>>> app (thanks to yarn).
>>> even if python 2.7 was needed only on this one machine that launches the
>>> app we can not ship it with our software because its gpl licensed, so the
>>> client would have to download it and install it themselves, and this would
>>> mean its an independent install which has to be audited and approved and
>>> now you are in for a lot of fun. basically it will never happen.
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen 
>>> wrote:
>>>
 If users are able to install Spark 2.0 on their RHEL clusters, then I
 imagine that they're also capable of installing a standalone Python
 alongside that Spark version (without changing Python systemwide). For
 instance, Anaconda/Miniconda make it really easy to install Python
 2.7.x/3.x without impacting / changing the system Python and doesn't
 require any special permissions to install (you don't need root / sudo
 access). Does this address the Python versioning concerns for RHEL users?

 On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers 
 wrote:

> yeah, the practical concern is that we have no control over java or
> python version on large company clusters. our current reality for the vast
> majority of them is java 7 and python 2.6, no matter how outdated that is.
>
> i dont like it either, but i cannot change it.
>
> we currently don't use pyspark so i have no stake in this, but if we
> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
> dropped. no point in developing something that doesnt run for majority of
> customers.
>
> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> As I pointed out in my earlier email, RHEL will support Python 2.6
>> until 2020. So I'm assuming these large companies will have the option of
>> riding out Python 2.6 until then.
>>
>> Are we seriously saying that Spark should likewise support Python 2.6
>> for the next several years? Even though the core Python devs stopped
>> supporting it in 2013?
>>
>> If that's not what we're suggesting, then when, roughly, can we drop
>> support? What are the criteria?
>>
>> I understand the practical concern here. If companies are stuck using
>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>> concern against the maintenance burden on this project, I would say that
>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position 
>> to
>> take. There are many tiny annoyances one has to put up with to support 
>> 2.6.
>>
>> I suppose if our main PySpark contributors are fine putting up with
>> those annoyances, then maybe we don't need to drop support just yet...
>>
>> Nick
>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>> ju...@esbet.es> wrote:
>>
>>> Unfortunately, Koert is right.
>>>
>>> I've been in a couple of projects using Spark (banking industry)
>>> where CentOS + Python 2.6 is the toolbox available.
>>>
>>> That said, I believe it should not be a concern for Spark. Python
>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>> IMO.
>>>
>>>
>>> On 5 Jan 2016, at 20:07, Koert Kuipers
>>> wrote:
>>>
>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>
>>> if so, i still know plenty of large companies where python 2.6 is
>>> the only option. asking them for python 2.7 is not going to work
>>>
>>> so i think its a bad idea
>>>
>>> On Tue, Jan 5, 

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Koert Kuipers
interesting i didnt know that!

On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas  wrote:

> even if python 2.7 was needed only on this one machine that launches the
> app we can not ship it with our software because its gpl licensed
>
> Not to nitpick, but maybe this is important. The Python license is
> GPL-compatible but not GPL:
>
> Note GPL-compatible doesn’t mean that we’re distributing Python under the
> GPL. All Python licenses, unlike the GPL, let you distribute a modified
> version without making your changes open source. The GPL-compatible
> licenses make it possible to combine Python with other software that is
> released under the GPL; the others don’t.
>
> Nick
> ​
>
> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers  wrote:
>
>> i do not think so.
>>
>> does the python 2.7 need to be installed on all slaves? if so, we do not
>> have direct access to those.
>>
>> also, spark is easy for us to ship with our software since its apache 2
>> licensed, and it only needs to be present on the machine that launches the
>> app (thanks to yarn).
>> even if python 2.7 was needed only on this one machine that launches the
>> app we can not ship it with our software because its gpl licensed, so the
>> client would have to download it and install it themselves, and this would
>> mean its an independent install which has to be audited and approved and
>> now you are in for a lot of fun. basically it will never happen.
>>
>>
>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen 
>> wrote:
>>
>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>> imagine that they're also capable of installing a standalone Python
>>> alongside that Spark version (without changing Python systemwide). For
>>> instance, Anaconda/Miniconda make it really easy to install Python
>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>> require any special permissions to install (you don't need root / sudo
>>> access). Does this address the Python versioning concerns for RHEL users?
>>>
>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers  wrote:
>>>
 yeah, the practical concern is that we have no control over java or
 python version on large company clusters. our current reality for the vast
 majority of them is java 7 and python 2.6, no matter how outdated that is.

 i dont like it either, but i cannot change it.

 we currently don't use pyspark so i have no stake in this, but if we
 did i can assure you we would not upgrade to spark 2.x if python 2.6 was
 dropped. no point in developing something that doesnt run for majority of
 customers.

 On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
 nicholas.cham...@gmail.com> wrote:

> As I pointed out in my earlier email, RHEL will support Python 2.6
> until 2020. So I'm assuming these large companies will have the option of
> riding out Python 2.6 until then.
>
> Are we seriously saying that Spark should likewise support Python 2.6
> for the next several years? Even though the core Python devs stopped
> supporting it in 2013?
>
> If that's not what we're suggesting, then when, roughly, can we drop
> support? What are the criteria?
>
> I understand the practical concern here. If companies are stuck using
> 2.6, it doesn't matter to them that it is deprecated. But balancing that
> concern against the maintenance burden on this project, I would say that
> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
> take. There are many tiny annoyances one has to put up with to support 
> 2.6.
>
> I suppose if our main PySpark contributors are fine putting up with
> those annoyances, then maybe we don't need to drop support just yet...
>
> Nick
> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente
> wrote:
>
>> Unfortunately, Koert is right.
>>
>> I've been in a couple of projects using Spark (banking industry)
>> where CentOS + Python 2.6 is the toolbox available.
>>
>> That said, I believe it should not be a concern for Spark. Python 2.6
>> is old and busted, which is totally opposite to the Spark philosophy IMO.
>>
>>
>> On 5 Jan 2016, at 20:07, Koert Kuipers
>> wrote:
>>
>> rhel/centos 6 ships with python 2.6, doesnt it?
>>
>> if so, i still know plenty of large companies where python 2.6 is the
>> only option. asking them for python 2.7 is not going to work
>>
>> so i think its a bad idea
>>
>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>> juliet.hougl...@gmail.com> wrote:
>>
>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>> this point, Python 3 should be the default that is encouraged.
>>> Most organizations acknowledge that 2.7 is common, but lagging behind
>>> the version they should theoretically us

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
even if python 2.7 was needed only on this one machine that launches the
app we can not ship it with our software because its gpl licensed

Not to nitpick, but maybe this is important. The Python license is
GPL-compatible but not GPL:

Note GPL-compatible doesn’t mean that we’re distributing Python under the
GPL. All Python licenses, unlike the GPL, let you distribute a modified
version without making your changes open source. The GPL-compatible
licenses make it possible to combine Python with other software that is
released under the GPL; the others don’t.

Nick

On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers  wrote:

> i do not think so.
>
> does the python 2.7 need to be installed on all slaves? if so, we do not
> have direct access to those.
>
> also, spark is easy for us to ship with our software since its apache 2
> licensed, and it only needs to be present on the machine that launches the
> app (thanks to yarn).
> even if python 2.7 was needed only on this one machine that launches the
> app we can not ship it with our software because its gpl licensed, so the
> client would have to download it and install it themselves, and this would
> mean its an independent install which has to be audited and approved and
> now you are in for a lot of fun. basically it will never happen.
>
>
> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen 
> wrote:
>
>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>> imagine that they're also capable of installing a standalone Python
>> alongside that Spark version (without changing Python systemwide). For
>> instance, Anaconda/Miniconda make it really easy to install Python
>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>> require any special permissions to install (you don't need root / sudo
>> access). Does this address the Python versioning concerns for RHEL users?
>>
>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers  wrote:
>>
>>> yeah, the practical concern is that we have no control over java or
>>> python version on large company clusters. our current reality for the vast
>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>
>>> i dont like it either, but i cannot change it.
>>>
>>> we currently don't use pyspark so i have no stake in this, but if we did
>>> i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>> dropped. no point in developing something that doesnt run for majority of
>>> customers.
>>>
>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 As I pointed out in my earlier email, RHEL will support Python 2.6
 until 2020. So I'm assuming these large companies will have the option of
 riding out Python 2.6 until then.

 Are we seriously saying that Spark should likewise support Python 2.6
 for the next several years? Even though the core Python devs stopped
 supporting it in 2013?

 If that's not what we're suggesting, then when, roughly, can we drop
 support? What are the criteria?

 I understand the practical concern here. If companies are stuck using
 2.6, it doesn't matter to them that it is deprecated. But balancing that
 concern against the maintenance burden on this project, I would say that
 "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
 take. There are many tiny annoyances one has to put up with to support 2.6.

 I suppose if our main PySpark contributors are fine putting up with
 those annoyances, then maybe we don't need to drop support just yet...

 Nick
 On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente
 wrote:

> Unfortunately, Koert is right.
>
> I've been in a couple of projects using Spark (banking industry) where
> CentOS + Python 2.6 is the toolbox available.
>
> That said, I believe it should not be a concern for Spark. Python 2.6
> is old and busted, which is totally opposite to the Spark philosophy IMO.
>
>
> On 5 Jan 2016, at 20:07, Koert Kuipers
> wrote:
>
> rhel/centos 6 ships with python 2.6, doesnt it?
>
> if so, i still know plenty of large companies where python 2.6 is the
> only option. asking them for python 2.7 is not going to work
>
> so i think its a bad idea
>
> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
> juliet.hougl...@gmail.com> wrote:
>
>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>> this point, Python 3 should be the default that is encouraged.
>> Most organizations acknowledge that 2.7 is common, but lagging behind
>> the version they should theoretically use. Dropping python 2.6
>> support sounds very reasonable to me.
>>
>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> +1
>>>
>>> Red Hat supp

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Davies Liu
Created JIRA: https://issues.apache.org/jira/browse/SPARK-12661


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Koert Kuipers
I do not think so.

Does Python 2.7 need to be installed on all slaves? If so, we do not have
direct access to those.

Also, Spark is easy for us to ship with our software since it's Apache 2
licensed, and it only needs to be present on the machine that launches the
app (thanks to YARN). Even if Python 2.7 were needed only on this one
machine that launches the app, we cannot ship it with our software because
it's GPL licensed, so the client would have to download and install it
themselves; this would mean an independent install which has to be audited
and approved, and now you are in for a lot of fun. Basically, it will never
happen.

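(For the mechanics behind the question above: PySpark launches a Python
worker process on every executor node, so the interpreter named by
PYSPARK_PYTHON must exist on all slaves, not only on the machine that
launches the app. A minimal sketch for checking which Python the executors
actually run; the interpreter path below is only an example:)

    import os

    # Must be set before the SparkContext is created, and the path must
    # exist on every worker node (example path; adjust to your install).
    os.environ["PYSPARK_PYTHON"] = "/opt/python27/bin/python"

    from pyspark import SparkContext
    sc = SparkContext(appName="python-version-check")

    def interpreter_version(_):
        import sys  # imported on the worker, not the driver
        return sys.version

    # One entry per distinct Python version found across the executors.
    print(sc.parallelize(range(100), 10)
            .map(interpreter_version)
            .distinct()
            .collect())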

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
If users are able to install Spark 2.0 on their RHEL clusters, then I
imagine that they're also capable of installing a standalone Python
alongside that Spark version (without changing Python systemwide). For
instance, Anaconda/Miniconda make it really easy to install Python
2.7.x/3.x without impacting or changing the system Python, and they don't
require any special permissions to install (you don't need root/sudo
access). Does this address the Python versioning concerns for RHEL users?

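(A concrete illustration of the Miniconda route Josh describes; the
installer filename and all paths are examples, and nothing below needs root
or sudo:)

    # Installing a private Python first (two shell commands, no root needed):
    #   bash Miniconda2-latest-Linux-x86_64.sh -b -p $HOME/miniconda2
    #   $HOME/miniconda2/bin/conda create -y -n spark python=2.7
    import os

    env_python = os.path.expanduser("~/miniconda2/envs/spark/bin/python")

    # Both variables are honored by the pyspark/spark-submit launchers,
    # so set them before the SparkContext is created (or export them in
    # the shell that launches the job) to keep driver and executors in sync.
    os.environ["PYSPARK_PYTHON"] = env_python
    os.environ["PYSPARK_DRIVER_PYTHON"] = env_python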


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Koert Kuipers
Yeah, the practical concern is that we have no control over the Java or
Python version on large company clusters. Our current reality for the vast
majority of them is Java 7 and Python 2.6, no matter how outdated that is.

I don't like it either, but I cannot change it.

We currently don't use PySpark, so I have no stake in this, but if we did,
I can assure you we would not upgrade to Spark 2.x if Python 2.6 were
dropped. There is no point in developing something that doesn't run for the
majority of customers.

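(One way applications in that position cope is an explicit version guard at
startup, so an unsupported interpreter fails with a clear message rather
than an obscure syntax error later; a minimal sketch:)

    import sys

    # Fail fast if the interpreter predates the supported floor.
    if sys.version_info < (2, 7):
        sys.exit("This application requires Python 2.7 or later; found %s"
                 % sys.version.split()[0])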


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
As I pointed out in my earlier email, RHEL will support Python 2.6 until
2020. So I'm assuming these large companies will have the option of riding
out Python 2.6 until then.

Are we seriously saying that Spark should likewise support Python 2.6 for
the next several years? Even though the core Python devs stopped supporting
it in 2013?

If that's not what we're suggesting, then when, roughly, can we drop
support? What are the criteria?

I understand the practical concern here. If companies are stuck using 2.6,
it doesn't matter to them that it is deprecated. But balancing that concern
against the maintenance burden on this project, I would say that "upgrade
to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to take.
There are many tiny annoyances one has to put up with to support 2.6.

I suppose if our main PySpark contributors are fine putting up with those
annoyances, then maybe we don't need to drop support just yet...

Nick
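
(A few concrete examples of the kind of 2.6 annoyance meant here --
illustrative only, not a list drawn from the PySpark codebase; each
construct below requires Python 2.7+:)

    # Auto-numbered format fields: 2.6 requires explicit indices.
    "{} and {}".format("python", "spark")    # 2.7+ only
    "{0} and {1}".format("python", "spark")  # 2.6-compatible spelling

    # Set and dict comprehensions arrived in 2.7; on 2.6 one falls back
    # to dict(...) wrapped around a generator expression.
    squares = {n: n * n for n in range(5)}   # SyntaxError on 2.6

    # collections.OrderedDict and the richer unittest assertions
    # (assertIn, assertIsNone, ...) were also added in 2.7, so keeping
    # 2.6 support means backports or hand-rolled equivalents.
    from collections import OrderedDict      # ImportError on 2.6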


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Julio Antonio Soto de Vicente
Unfortunately, Koert is right.

I've been in a couple of projects using Spark (banking industry) where
CentOS + Python 2.6 is the toolbox available.

That said, I believe it should not be a concern for Spark. Python 2.6 is
old and busted, which is the total opposite of the Spark philosophy, IMO.


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Ted Yu
+1




Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Koert Kuipers
RHEL/CentOS 6 ships with Python 2.6, doesn't it?

If so, I still know plenty of large companies where Python 2.6 is the only
option. Asking them for Python 2.7 is not going to work.

So I think it's a bad idea.



Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Juliet Hougland
I don't see a reason Spark 2.0 would need to support Python 2.6. At this
point, Python 3 should be the default that is encouraged. Most organizations
acknowledge that 2.7 is common but lags behind the version they should
theoretically use. Dropping Python 2.6 support sounds very reasonable to me.



Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Davies Liu
+1




Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
+1

Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes,
Python 2.6 is ancient history and the core Python developers stopped
supporting it in 2013. RHEL 5 is not a good enough reason to continue
supporting Python 2.6, IMO.

We should aim to support Python 2.7 and Python 3.3+ (which I believe we
currently do).

Nick

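(Supporting 2.7 and 3.3+ simultaneously is mostly a matter of __future__
imports and avoiding 2-only idioms; a minimal sketch of code that behaves
identically under both interpreter lines:)

    from __future__ import absolute_import, division, print_function

    def mean(values):
        values = list(values)
        # __future__ division: 45 / 10 is 4.5 on 2.7, exactly as on 3.3+
        return sum(values) / len(values)

    print(mean(range(10)))  # prints 4.5 under both 2.7 and 3.x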


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Allen Zhang
Plus 1.

We are currently using Python 2.7.2 in our production environment.


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Meethu Mathew
+1
We use Python 2.7

Regards,

Meethu Mathew

>


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Sean Owen
+juliet for an additional opinion, but FWIW I think it's safe to say
that future CDH will have a more consistent Python story and that
story will support 2.7 rather than 2.6.




Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread yash datta
+1



-- 
When events unfold with calm and ease
When the winds that blow are merely breeze
Learn from nature, from birds and bees
Live your life in love, and let joy not cease.


Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Jian Feng Zhang
+1

We use Python 2.7+ and 3.4+ to call PySpark.



-- 
Best,
Jian Feng


Re: [discuss] dropping Python 2.6 support

2016-01-04 Thread Kushal Datta
+1


Dr. Kushal Datta
Senior Research Scientist
Big Data Research & Pathfinding
Intel Corporation, USA.



Re: [discuss] dropping Python 2.6 support

2016-01-04 Thread Jean-Baptiste Onofré

+1

No problem for me to remove Python 2.6 in 2.0.

Thanks
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com




[discuss] dropping Python 2.6 support

2016-01-04 Thread Reynold Xin
Does anybody here care about us dropping support for Python 2.6 in Spark
2.0?

Python 2.6 is ancient, and is pretty slow in many respects (e.g. JSON
parsing) when compared with Python 2.7. Some libraries that Spark depends on
have stopped supporting 2.6. We could still convince the library maintainers
to support 2.6, but it would be extra work. I'm curious whether anybody still
uses Python 2.6 to run Spark.

Thanks.
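
(For anyone curious about the JSON gap Reynold mentions: the Python 2.7
release notes describe the stdlib json module being upgraded to a newer
simplejson with a C extension for faster encoding and decoding, which 2.6
predates. A rough, illustrative way to see the difference is to run the
same snippet under each interpreter; absolute numbers will vary by machine:)

    from __future__ import print_function

    import json
    import sys
    import timeit

    doc = json.dumps({"id": 1, "name": "spark", "tags": list(range(50))})

    # Decode the same document many times and report the interpreter used.
    elapsed = timeit.timeit(lambda: json.loads(doc), number=100000)
    print(sys.version.split()[0], "->", round(elapsed, 3), "seconds")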