Re: Should python-2 be supported in Spark 3.0?

2019-06-03 Thread Xiangrui Meng
I updated the Spark website and announced the plan for dropping Python 2
support there:
http://spark.apache.org/news/plan-for-dropping-python-2-support.html. I
will send an announcement email to user@ and dev@. -Xiangrui


Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Felix Cheung
Very subtle, but someone might read

“We will drop Python 2 support in a future release in 2020”

to mean any (i.e., the first) release in 2020, whereas the next statement
indicates that patch releases are not included. It might help to reorder
the items or clarify the wording.





Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread shane knapp
+1000  ;)


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Denny Lee
+1



Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Holden Karau
+1


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Bryan Cutler
+1 and the draft sounds good



Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Xiangrui Meng
Here is the draft announcement:

===
Plan for dropping Python 2 support

As many of you already know, the Python core development team and many
widely used Python packages such as pandas and NumPy will drop Python 2
support on or before 2020/01/01. Apache Spark has supported both Python 2
and 3 since the Spark 1.4 release in 2015. However, maintaining Python 2/3
compatibility is an increasing burden, and it essentially limits the use of
Python 3 features in Spark. Given that the end of life (EOL) of Python 2 is
approaching, we plan to eventually drop Python 2 support as well. The
current plan is as follows:

* In the next major release in 2019, we will deprecate Python 2 support.
PySpark users will see a deprecation warning if Python 2 is used. We will
publish a migration guide to help PySpark users migrate to Python 3.
* We will drop Python 2 support in a future release in 2020, after the
Python 2 EOL on 2020/01/01. PySpark users will see an error if Python 2 is
used.
* For releases that support Python 2 (e.g., Spark 2.4), patch releases
will continue to support Python 2. However, after the Python 2 EOL, we
might not accept patches that are specific to Python 2.
===

Sean helped make an editing pass. If it looks good, I'll upload it to the
Spark website and announce it here. Let me know if you think we should
hold a VOTE instead.
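
For illustration, the deprecation warning in the first bullet could come
from a simple interpreter-version check at PySpark startup. This is only a
hedged sketch; the function name and the warning text are hypothetical, not
the actual PySpark implementation:

```python
import sys
import warnings

def warn_if_python2():
    # Hypothetical startup check: emit a deprecation warning when
    # PySpark is launched under a Python 2 interpreter.
    if sys.version_info[0] < 3:
        warnings.warn(
            "Support for Python 2 is deprecated and will be removed "
            "in a future release. Please migrate to Python 3.",
            DeprecationWarning,
        )

warn_if_python2()  # no-op under Python 3
```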



Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Xiangrui Meng
I created https://issues.apache.org/jira/browse/SPARK-27884 to track the
work.

On Thu, May 30, 2019 at 2:18 AM Felix Cheung 
wrote:

> We don’t usually reference a future release on website
>
> > Spark website and state that Python 2 is deprecated in Spark 3.0
>
> I suspect people will then ask when is Spark 3.0 coming out then. Might
> need to provide some clarity on that.
>

We can say "the next major release in 2019" instead of Spark 3.0. The
Spark 3.0 timeline certainly requires a new thread to discuss.


>
>
> --
> *From:* Reynold Xin 
> *Sent:* Thursday, May 30, 2019 12:59:14 AM
> *To:* shane knapp
> *Cc:* Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen; Wenchen
> Fen; Xiangrui Meng; dev; user
> *Subject:* Re: Should python-2 be supported in Spark 3.0?
>
> +1 on Xiangrui’s plan.
>
> On Thu, May 30, 2019 at 7:55 AM shane knapp  wrote:
>
>> I don't have a good sense of the overhead of continuing to support
>>> Python 2; is it large enough to consider dropping it in Spark 3.0?
>>>
>>> from the build/test side, it will actually be pretty easy to continue
>> support for python2.7 for spark 2.x as the feature sets won't be expanding.
>>
>
>> that being said, i will be cracking a bottle of champagne when i can
>> delete all of the ansible and anaconda configs for python2.x.  :)
>>
>
On the development side, in a future release that drops Python 2 support we
can remove the code that maintains Python 2/3 compatibility and start using
Python 3-only features, which is also quite exciting.


>
>> shane
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
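
The Python 2/3 compatibility code mentioned above typically looks like the
following. This is a hypothetical example of the general pattern, not
actual PySpark code; once Python 2 support is dropped, the branch collapses
and Python 3-only features such as f-strings become available:

```python
import sys

# Compatibility shim of the kind that becomes unnecessary once only
# Python 3 is supported (hypothetical example, not PySpark code).
if sys.version_info[0] < 3:
    string_types = (str, unicode)  # noqa: F821 -- Python 2 branch only
else:
    string_types = (str,)

def describe(name, value):
    # With Python 3 only, this could simply use isinstance(value, str)
    # and drop the shim entirely; the f-string is a Python 3-only feature.
    if isinstance(value, string_types):
        return f"{name} is a string"
    return f"{name} is {type(value).__name__}"
```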


Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Felix Cheung
We don’t usually reference a future release on the website.

> Spark website and state that Python 2 is deprecated in Spark 3.0

I suspect people will then ask when Spark 3.0 is coming out. We might need
to provide some clarity on that.





Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Reynold Xin
+1 on Xiangrui’s plan.



Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread shane knapp
>
> I don't have a good sense of the overhead of continuing to support
> Python 2; is it large enough to consider dropping it in Spark 3.0?
>

from the build/test side, it will actually be pretty easy to continue
support for python2.7 for spark 2.x as the feature sets won't be expanding.

that being said, i will be cracking a bottle of champagne when i can delete
all of the ansible and anaconda configs for python2.x.  :)

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread Jules Damji
Here’s the tweet from the horse’s mouth: 

https://twitter.com/gvanrossum/status/1133496146700058626?s=21

Cheers 
Jules 
—
Sent from my iPhone
Pardon the dumb thumb typos :)



Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread Sean Owen
Deprecated -- certainly and sooner than later.
I don't have a good sense of the overhead of continuing to support
Python 2; is it large enough to consider dropping it in Spark 3.0?

On Wed, May 29, 2019 at 11:47 PM Xiangrui Meng  wrote:
>
> Hi all,
>
> I want to revive this old thread since no action was taken so far. If we plan 
> to mark Python 2 as deprecated in Spark 3.0, we should do it as early as 
> possible and let users know ahead. PySpark depends on Python, numpy, pandas, 
> and pyarrow, all of which are sunsetting Python 2 support by 2020/01/01 per 
> https://python3statement.org/. At that time we cannot really support Python 2 
> because the dependent libraries do not plan to make new releases, even for 
> security reasons. So I suggest the following:
>
> 1. Update Spark website and state that Python 2 is deprecated in Spark 3.0 
> and its support will be removed in a release after 2020/01/01.
> 2. Make a formal announcement to dev@ and users@.
> 3. Add Apache Spark project to https://python3statement.org/ timeline.
> 4. Update PySpark, check python version and print a deprecation warning if 
> version < 3.
>
> Any thoughts and suggestions?
>
> Best,
> Xiangrui

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread Xiangrui Meng
Hi all,

I want to revive this old thread since no action was taken so far. If we
plan to mark Python 2 as deprecated in Spark 3.0, we should do it as early
as possible and let users know ahead. PySpark depends on Python, numpy,
pandas, and pyarrow, all of which are sunsetting Python 2 support by
2020/01/01 per https://python3statement.org/. At that time we cannot really
support Python 2 because the dependent libraries do not plan to make new
releases, even for security reasons. So I suggest the following:

1. Update Spark website and state that Python 2 is deprecated in Spark 3.0
and its support will be removed in a release after 2020/01/01.
2. Make a formal announcement to dev@ and users@.
3. Add Apache Spark project to https://python3statement.org/ timeline.
4. Update PySpark, check python version and print a deprecation warning if
version < 3.

Any thoughts and suggestions?

Best,
Xiangrui
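For reference, item 4 could look roughly like the following. This is a hypothetical sketch, not the actual PySpark patch; the helper name `warn_if_python2` and the exact message are assumptions:

```python
import sys
import warnings

def warn_if_python2():
    # Hypothetical sketch of suggestion 4: emit a deprecation warning
    # when PySpark starts under a Python interpreter older than 3.
    if sys.version_info[0] < 3:
        warnings.warn(
            "Python 2 support is deprecated in Spark 3.0 and will be "
            "removed in a release after 2020/01/01. "
            "Please migrate to Python 3.",
            DeprecationWarning,
        )
        return True
    return False

if __name__ == "__main__":
    warn_if_python2()
```

The check is deliberately a no-op under Python 3, so it changes no behavior for users who have already migrated.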

On Mon, Sep 17, 2018 at 6:54 PM Erik Erlandson  wrote:

>
> I think that makes sense. The main benefit of deprecating *prior* to 3.0
> would be informational - making the community aware of the upcoming
> transition earlier. But there are other ways to start informing the
> community between now and 3.0, besides formal deprecation.
>
> I have some residual curiosity about what it might mean for a release like
> 2.4 to still be in its support lifetime after Py2 goes EOL. I asked Apache
> Legal  to comment. It is
> possible there are no issues with this at all.
>
>
> On Mon, Sep 17, 2018 at 4:26 PM, Reynold Xin  wrote:
>
>> i'd like to second that.
>>
>> if we want to communicate timeline, we can add to the release notes
>> saying py2 will be deprecated in 3.0, and removed in a 3.x release.
>>
>> --
>> excuse the brevity and lower case due to wrist injury
>>
>>
>> On Mon, Sep 17, 2018 at 4:24 PM Matei Zaharia 
>> wrote:
>>
>>> That’s a good point — I’d say there’s just a risk of creating a
>>> perception issue. First, some users might feel that this means they have to
>>> migrate now, which is before Python itself drops support; they might also
>>> be surprised that we did this in a minor release (e.g. might we drop Python
>>> 2 altogether in a Spark 2.5 if that later comes out?). Second, contributors
>>> might feel that this means new features no longer have to work with Python
>>> 2, which would be confusing. Maybe it’s OK on both fronts, but it just
>>> seems scarier for users to do this now if we do plan to have Spark 3.0 in
>>> the next 6 months anyway.
>>>
>>> Matei
>>>
>>> > On Sep 17, 2018, at 1:04 PM, Mark Hamstra 
>>> wrote:
>>> >
>>> > What is the disadvantage to deprecating now in 2.4.0? I mean, it
>>> doesn't change the code at all; it's just a notification that we will
>>> eventually cease supporting Py2. Wouldn't users prefer to get that
>>> notification sooner rather than later?
>>> >
>>> > On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia <
>>> matei.zaha...@gmail.com> wrote:
>>> > I’d like to understand the maintenance burden of Python 2 before
>>> deprecating it. Since it is not EOL yet, it might make sense to only
>>> deprecate it once it’s EOL (which is still over a year from now).
>>> Supporting Python 2+3 seems less burdensome than supporting, say, multiple
>>> Scala versions in the same codebase, so what are we losing out?
>>> >
>>> > The other thing is that even though Python core devs might not support
>>> 2.x later, it’s quite possible that various Linux distros will if moving
>>> from 2 to 3 remains painful. In that case, we may want Apache Spark to
>>> continue releasing for it despite the Python core devs not supporting it.
>>> >
>>> > Basically, I’d suggest to deprecate this in Spark 3.0 and then remove
>>> it later in 3.x instead of deprecating it in 2.4. I’d also consider looking
>>> at what other data science tools are doing before fully removing it: for
>>> example, if Pandas and TensorFlow no longer support Python 2 past some
>>> point, that might be a good point to remove it.
>>> >
>>> > Matei
>>> >
>>> > > On Sep 17, 2018, at 11:01 AM, Mark Hamstra 
>>> wrote:
>>> > >
>>> > > If we're going to do that, then we need to do it right now, since
>>> 2.4.0 is already in release candidates.
>>> > >
>>> > > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson 
>>> wrote:
>>> > > I like Mark’s concept for deprecating Py2 starting with 2.4: It may
>>> seem like a ways off but even now there may be some spark versions
>>> supporting Py2 past the point where Py2 is no longer receiving security
>>> patches
>>> > >
>>> > >
>>> > > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra <
>>> m...@clearstorydata.com> wrote:
>>> > > We could also deprecate Py2 already in the 2.4.0 release.
>>> > >
>>> > > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
>>> wrote:
>>> > > In case this didn't make it onto this thread:
>>> > >
>>> > > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
>>> remove it entirely on a later 3.x release.
>>> > >
>>> > > On 

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Erik Erlandson
I think that makes sense. The main benefit of deprecating *prior* to 3.0
would be informational - making the community aware of the upcoming
transition earlier. But there are other ways to start informing the
community between now and 3.0, besides formal deprecation.

I have some residual curiosity about what it might mean for a release like
2.4 to still be in its support lifetime after Py2 goes EOL. I asked Apache
Legal  to comment. It is
possible there are no issues with this at all.


On Mon, Sep 17, 2018 at 4:26 PM, Reynold Xin  wrote:

> i'd like to second that.
>
> if we want to communicate timeline, we can add to the release notes saying
> py2 will be deprecated in 3.0, and removed in a 3.x release.
>
> --
> excuse the brevity and lower case due to wrist injury
>
>
> On Mon, Sep 17, 2018 at 4:24 PM Matei Zaharia 
> wrote:
>
>> That’s a good point — I’d say there’s just a risk of creating a
>> perception issue. First, some users might feel that this means they have to
>> migrate now, which is before Python itself drops support; they might also
>> be surprised that we did this in a minor release (e.g. might we drop Python
>> 2 altogether in a Spark 2.5 if that later comes out?). Second, contributors
>> might feel that this means new features no longer have to work with Python
>> 2, which would be confusing. Maybe it’s OK on both fronts, but it just
>> seems scarier for users to do this now if we do plan to have Spark 3.0 in
>> the next 6 months anyway.
>>
>> Matei
>>
>> > On Sep 17, 2018, at 1:04 PM, Mark Hamstra 
>> wrote:
>> >
>> > What is the disadvantage to deprecating now in 2.4.0? I mean, it
>> doesn't change the code at all; it's just a notification that we will
>> eventually cease supporting Py2. Wouldn't users prefer to get that
>> notification sooner rather than later?
>> >
>> > On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia 
>> wrote:
>> > I’d like to understand the maintenance burden of Python 2 before
>> deprecating it. Since it is not EOL yet, it might make sense to only
>> deprecate it once it’s EOL (which is still over a year from now).
>> Supporting Python 2+3 seems less burdensome than supporting, say, multiple
>> Scala versions in the same codebase, so what are we losing out?
>> >
>> > The other thing is that even though Python core devs might not support
>> 2.x later, it’s quite possible that various Linux distros will if moving
>> from 2 to 3 remains painful. In that case, we may want Apache Spark to
>> continue releasing for it despite the Python core devs not supporting it.
>> >
>> > Basically, I’d suggest to deprecate this in Spark 3.0 and then remove
>> it later in 3.x instead of deprecating it in 2.4. I’d also consider looking
>> at what other data science tools are doing before fully removing it: for
>> example, if Pandas and TensorFlow no longer support Python 2 past some
>> point, that might be a good point to remove it.
>> >
>> > Matei
>> >
>> > > On Sep 17, 2018, at 11:01 AM, Mark Hamstra 
>> wrote:
>> > >
>> > > If we're going to do that, then we need to do it right now, since
>> 2.4.0 is already in release candidates.
>> > >
>> > > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson 
>> wrote:
>> > > I like Mark’s concept for deprecating Py2 starting with 2.4: It may
>> seem like a ways off but even now there may be some spark versions
>> supporting Py2 past the point where Py2 is no longer receiving security
>> patches
>> > >
>> > >
>> > > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra <
>> m...@clearstorydata.com> wrote:
>> > > We could also deprecate Py2 already in the 2.4.0 release.
>> > >
>> > > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
>> wrote:
>> > > In case this didn't make it onto this thread:
>> > >
>> > > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
>> remove it entirely on a later 3.x release.
>> > >
>> > > On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson 
>> wrote:
>> > > On a separate dev@spark thread, I raised a question of whether or
>> not to support python 2 in Apache Spark, going forward into Spark 3.0.
>> > >
>> > > Python-2 is going EOL at the end of 2019. The upcoming release of
>> Spark 3.0 is an opportunity to make breaking changes to Spark's APIs, and
>> so it is a good time to consider support for Python-2 on PySpark.
>> > >
>> > > Key advantages to dropping Python 2 are:
>> > >   • Support for PySpark becomes significantly easier.
>> > >   • Avoid having to support Python 2 until Spark 4.0, which is
>> likely to imply supporting Python 2 for some time after it goes EOL.
>> > > (Note that supporting python 2 after EOL means, among other things,
>> that PySpark would be supporting a version of python that was no longer
>> receiving security patches)
>> > >
>> > > The main disadvantage is that PySpark users who have legacy python-2
>> code would have to migrate their code to python 3 to take advantage of
>> Spark 3.0
>> > >
>> > > This decision obviously has 

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Reynold Xin
i'd like to second that.

if we want to communicate timeline, we can add to the release notes saying
py2 will be deprecated in 3.0, and removed in a 3.x release.

--
excuse the brevity and lower case due to wrist injury


On Mon, Sep 17, 2018 at 4:24 PM Matei Zaharia 
wrote:

> That’s a good point — I’d say there’s just a risk of creating a perception
> issue. First, some users might feel that this means they have to migrate
> now, which is before Python itself drops support; they might also be
> surprised that we did this in a minor release (e.g. might we drop Python 2
> altogether in a Spark 2.5 if that later comes out?). Second, contributors
> might feel that this means new features no longer have to work with Python
> 2, which would be confusing. Maybe it’s OK on both fronts, but it just
> seems scarier for users to do this now if we do plan to have Spark 3.0 in
> the next 6 months anyway.
>
> Matei
>
> > On Sep 17, 2018, at 1:04 PM, Mark Hamstra 
> wrote:
> >
> > What is the disadvantage to deprecating now in 2.4.0? I mean, it doesn't
> change the code at all; it's just a notification that we will eventually
> cease supporting Py2. Wouldn't users prefer to get that notification sooner
> rather than later?
> >
> > On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia 
> wrote:
> > I’d like to understand the maintenance burden of Python 2 before
> deprecating it. Since it is not EOL yet, it might make sense to only
> deprecate it once it’s EOL (which is still over a year from now).
> Supporting Python 2+3 seems less burdensome than supporting, say, multiple
> Scala versions in the same codebase, so what are we losing out?
> >
> > The other thing is that even though Python core devs might not support
> 2.x later, it’s quite possible that various Linux distros will if moving
> from 2 to 3 remains painful. In that case, we may want Apache Spark to
> continue releasing for it despite the Python core devs not supporting it.
> >
> > Basically, I’d suggest to deprecate this in Spark 3.0 and then remove it
> later in 3.x instead of deprecating it in 2.4. I’d also consider looking at
> what other data science tools are doing before fully removing it: for
> example, if Pandas and TensorFlow no longer support Python 2 past some
> point, that might be a good point to remove it.
> >
> > Matei
> >
> > > On Sep 17, 2018, at 11:01 AM, Mark Hamstra 
> wrote:
> > >
> > > If we're going to do that, then we need to do it right now, since
> 2.4.0 is already in release candidates.
> > >
> > > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson 
> wrote:
> > > I like Mark’s concept for deprecating Py2 starting with 2.4: It may
> seem like a ways off but even now there may be some spark versions
> supporting Py2 past the point where Py2 is no longer receiving security
> patches
> > >
> > >
> > > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra 
> wrote:
> > > We could also deprecate Py2 already in the 2.4.0 release.
> > >
> > > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
> wrote:
> > > In case this didn't make it onto this thread:
> > >
> > > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
> remove it entirely on a later 3.x release.
> > >
> > > On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson 
> wrote:
> > > On a separate dev@spark thread, I raised a question of whether or not
> to support python 2 in Apache Spark, going forward into Spark 3.0.
> > >
> > > Python-2 is going EOL at the end of 2019. The upcoming release of
> Spark 3.0 is an opportunity to make breaking changes to Spark's APIs, and
> so it is a good time to consider support for Python-2 on PySpark.
> > >
> > > Key advantages to dropping Python 2 are:
> > >   • Support for PySpark becomes significantly easier.
> > >   • Avoid having to support Python 2 until Spark 4.0, which is
> likely to imply supporting Python 2 for some time after it goes EOL.
> > > (Note that supporting python 2 after EOL means, among other things,
> that PySpark would be supporting a version of python that was no longer
> receiving security patches)
> > >
> > > The main disadvantage is that PySpark users who have legacy python-2
> code would have to migrate their code to python 3 to take advantage of
> Spark 3.0
> > >
> > > This decision obviously has large implications for the Apache Spark
> community and we want to solicit community feedback.
> > >
> > >
> >
>
>


Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Matei Zaharia
That’s a good point — I’d say there’s just a risk of creating a perception 
issue. First, some users might feel that this means they have to migrate now, 
which is before Python itself drops support; they might also be surprised that 
we did this in a minor release (e.g. might we drop Python 2 altogether in a 
Spark 2.5 if that later comes out?). Second, contributors might feel that this 
means new features no longer have to work with Python 2, which would be 
confusing. Maybe it’s OK on both fronts, but it just seems scarier for users to 
do this now if we do plan to have Spark 3.0 in the next 6 months anyway.

Matei

> On Sep 17, 2018, at 1:04 PM, Mark Hamstra  wrote:
> 
> What is the disadvantage to deprecating now in 2.4.0? I mean, it doesn't 
> change the code at all; it's just a notification that we will eventually 
> cease supporting Py2. Wouldn't users prefer to get that notification sooner 
> rather than later?
> 
> On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia  
> wrote:
> I’d like to understand the maintenance burden of Python 2 before deprecating 
> it. Since it is not EOL yet, it might make sense to only deprecate it once 
> it’s EOL (which is still over a year from now). Supporting Python 2+3 seems 
> less burdensome than supporting, say, multiple Scala versions in the same 
> codebase, so what are we losing out?
> 
> The other thing is that even though Python core devs might not support 2.x 
> later, it’s quite possible that various Linux distros will if moving from 2 
> to 3 remains painful. In that case, we may want Apache Spark to continue 
> releasing for it despite the Python core devs not supporting it.
> 
> Basically, I’d suggest to deprecate this in Spark 3.0 and then remove it 
> later in 3.x instead of deprecating it in 2.4. I’d also consider looking at 
> what other data science tools are doing before fully removing it: for 
> example, if Pandas and TensorFlow no longer support Python 2 past some point, 
> that might be a good point to remove it.
> 
> Matei
> 
> > On Sep 17, 2018, at 11:01 AM, Mark Hamstra  wrote:
> > 
> > If we're going to do that, then we need to do it right now, since 2.4.0 is 
> > already in release candidates.
> > 
> > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson  wrote:
> > I like Mark’s concept for deprecating Py2 starting with 2.4: It may seem 
> > like a ways off but even now there may be some spark versions supporting 
> > Py2 past the point where Py2 is no longer receiving security patches 
> > 
> > 
> > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra  
> > wrote:
> > We could also deprecate Py2 already in the 2.4.0 release.
> > 
> > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson  wrote:
> > In case this didn't make it onto this thread:
> > 
> > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and remove 
> > it entirely on a later 3.x release.
> > 
> > On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson  
> > wrote:
> > On a separate dev@spark thread, I raised a question of whether or not to 
> > support python 2 in Apache Spark, going forward into Spark 3.0.
> > 
> > Python-2 is going EOL at the end of 2019. The upcoming release of Spark 3.0 
> > is an opportunity to make breaking changes to Spark's APIs, and so it is a 
> > good time to consider support for Python-2 on PySpark.
> > 
> > Key advantages to dropping Python 2 are:
> >   • Support for PySpark becomes significantly easier.
> >   • Avoid having to support Python 2 until Spark 4.0, which is likely 
> > to imply supporting Python 2 for some time after it goes EOL.
> > (Note that supporting python 2 after EOL means, among other things, that 
> > PySpark would be supporting a version of python that was no longer 
> > receiving security patches)
> > 
> > The main disadvantage is that PySpark users who have legacy python-2 code 
> > would have to migrate their code to python 3 to take advantage of Spark 3.0
> > 
> > This decision obviously has large implications for the Apache Spark 
> > community and we want to solicit community feedback.
> > 
> > 
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Erik Erlandson
FWIW, Pandas is dropping Py2 support at the end of this year.  Tensorflow is
less clear. They only support py3 on windows, but there is no reference to
any policy about py2 on their roadmap or the TF 2.0 announcement.


Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Mark Hamstra
What is the disadvantage to deprecating now in 2.4.0? I mean, it doesn't
change the code at all; it's just a notification that we will eventually
cease supporting Py2. Wouldn't users prefer to get that notification sooner
rather than later?

On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia 
wrote:

> I’d like to understand the maintenance burden of Python 2 before
> deprecating it. Since it is not EOL yet, it might make sense to only
> deprecate it once it’s EOL (which is still over a year from now).
> Supporting Python 2+3 seems less burdensome than supporting, say, multiple
> Scala versions in the same codebase, so what are we losing out?
>
> The other thing is that even though Python core devs might not support 2.x
> later, it’s quite possible that various Linux distros will if moving from 2
> to 3 remains painful. In that case, we may want Apache Spark to continue
> releasing for it despite the Python core devs not supporting it.
>
> Basically, I’d suggest to deprecate this in Spark 3.0 and then remove it
> later in 3.x instead of deprecating it in 2.4. I’d also consider looking at
> what other data science tools are doing before fully removing it: for
> example, if Pandas and TensorFlow no longer support Python 2 past some
> point, that might be a good point to remove it.
>
> Matei
>
> > On Sep 17, 2018, at 11:01 AM, Mark Hamstra 
> wrote:
> >
> > If we're going to do that, then we need to do it right now, since 2.4.0
> is already in release candidates.
> >
> > On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson 
> wrote:
> > I like Mark’s concept for deprecating Py2 starting with 2.4: It may seem
> like a ways off but even now there may be some spark versions supporting
> Py2 past the point where Py2 is no longer receiving security patches
> >
> >
> > On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra 
> wrote:
> > We could also deprecate Py2 already in the 2.4.0 release.
> >
> > On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
> wrote:
> > In case this didn't make it onto this thread:
> >
> > There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
> remove it entirely on a later 3.x release.
> >
> > On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson 
> wrote:
> > On a separate dev@spark thread, I raised a question of whether or not
> to support python 2 in Apache Spark, going forward into Spark 3.0.
> >
> > Python-2 is going EOL at the end of 2019. The upcoming release of Spark
> 3.0 is an opportunity to make breaking changes to Spark's APIs, and so it
> is a good time to consider support for Python-2 on PySpark.
> >
> > Key advantages to dropping Python 2 are:
> >   • Support for PySpark becomes significantly easier.
> >   • Avoid having to support Python 2 until Spark 4.0, which is
> likely to imply supporting Python 2 for some time after it goes EOL.
> > (Note that supporting python 2 after EOL means, among other things, that
> PySpark would be supporting a version of python that was no longer
> receiving security patches)
> >
> > The main disadvantage is that PySpark users who have legacy python-2
> code would have to migrate their code to python 3 to take advantage of
> Spark 3.0
> >
> > This decision obviously has large implications for the Apache Spark
> community and we want to solicit community feedback.
> >
> >
>
>


Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Matei Zaharia
I’d like to understand the maintenance burden of Python 2 before deprecating 
it. Since it is not EOL yet, it might make sense to only deprecate it once it’s 
EOL (which is still over a year from now). Supporting Python 2+3 seems less 
burdensome than supporting, say, multiple Scala versions in the same codebase, 
so what are we losing out?

The other thing is that even though Python core devs might not support 2.x 
later, it’s quite possible that various Linux distros will if moving from 2 to 
3 remains painful. In that case, we may want Apache Spark to continue releasing 
for it despite the Python core devs not supporting it.

Basically, I’d suggest to deprecate this in Spark 3.0 and then remove it later 
in 3.x instead of deprecating it in 2.4. I’d also consider looking at what 
other data science tools are doing before fully removing it: for example, if 
Pandas and TensorFlow no longer support Python 2 past some point, that might be 
a good point to remove it.

Matei

> On Sep 17, 2018, at 11:01 AM, Mark Hamstra  wrote:
> 
> If we're going to do that, then we need to do it right now, since 2.4.0 is 
> already in release candidates.
> 
> On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson  wrote:
> I like Mark’s concept for deprecating Py2 starting with 2.4: It may seem like 
> a ways off but even now there may be some spark versions supporting Py2 past 
> the point where Py2 is no longer receiving security patches 
> 
> 
> On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra  wrote:
> We could also deprecate Py2 already in the 2.4.0 release.
> 
> On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson  wrote:
> In case this didn't make it onto this thread:
> 
> There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and remove it 
> entirely on a later 3.x release.
> 
> On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson  wrote:
> On a separate dev@spark thread, I raised a question of whether or not to 
> support python 2 in Apache Spark, going forward into Spark 3.0.
> 
> Python-2 is going EOL at the end of 2019. The upcoming release of Spark 3.0 
> is an opportunity to make breaking changes to Spark's APIs, and so it is a 
> good time to consider support for Python-2 on PySpark.
> 
> Key advantages to dropping Python 2 are:
>   • Support for PySpark becomes significantly easier.
>   • Avoid having to support Python 2 until Spark 4.0, which is likely to 
> imply supporting Python 2 for some time after it goes EOL.
> (Note that supporting python 2 after EOL means, among other things, that 
> PySpark would be supporting a version of python that was no longer receiving 
> security patches)
> 
> The main disadvantage is that PySpark users who have legacy python-2 code 
> would have to migrate their code to python 3 to take advantage of Spark 3.0
> 
> This decision obviously has large implications for the Apache Spark community 
> and we want to solicit community feedback.
> 
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Mark Hamstra
If we're going to do that, then we need to do it right now, since 2.4.0 is
already in release candidates.

On Mon, Sep 17, 2018 at 10:57 AM Erik Erlandson  wrote:

> I like Mark’s concept for deprecating Py2 starting with 2.4: It may seem
> like a ways off but even now there may be some spark versions supporting
> Py2 past the point where Py2 is no longer receiving security patches
>
>
> On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra 
> wrote:
>
>> We could also deprecate Py2 already in the 2.4.0 release.
>>
>> On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
>> wrote:
>>
>>> In case this didn't make it onto this thread:
>>>
>>> There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
>>> remove it entirely on a later 3.x release.
>>>
>>> On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson 
>>> wrote:
>>>
 On a separate dev@spark thread, I raised a question of whether or not
 to support python 2 in Apache Spark, going forward into Spark 3.0.

 Python-2 is going EOL  at
 the end of 2019. The upcoming release of Spark 3.0 is an opportunity to
 make breaking changes to Spark's APIs, and so it is a good time to consider
 support for Python-2 on PySpark.

 Key advantages to dropping Python 2 are:

- Support for PySpark becomes significantly easier.
- Avoid having to support Python 2 until Spark 4.0, which is likely
to imply supporting Python 2 for some time after it goes EOL.

 (Note that supporting python 2 after EOL means, among other things,
 that PySpark would be supporting a version of python that was no longer
 receiving security patches)

 The main disadvantage is that PySpark users who have legacy python-2
 code would have to migrate their code to python 3 to take advantage of
 Spark 3.0

 This decision obviously has large implications for the Apache Spark
 community and we want to solicit community feedback.


>>>


Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Erik Erlandson
I like Mark’s concept for deprecating Py2 starting with 2.4: It may seem
like a ways off but even now there may be some spark versions supporting
Py2 past the point where Py2 is no longer receiving security patches


On Sun, Sep 16, 2018 at 12:26 PM Mark Hamstra 
wrote:

> We could also deprecate Py2 already in the 2.4.0 release.
>
> On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
> wrote:
>
>> In case this didn't make it onto this thread:
>>
>> There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
>> remove it entirely on a later 3.x release.
>>
>> On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson 
>> wrote:
>>
>>> On a separate dev@spark thread, I raised a question of whether or not
>>> to support python 2 in Apache Spark, going forward into Spark 3.0.
>>>
>>> Python-2 is going EOL  at
>>> the end of 2019. The upcoming release of Spark 3.0 is an opportunity to
>>> make breaking changes to Spark's APIs, and so it is a good time to consider
>>> support for Python-2 on PySpark.
>>>
>>> Key advantages to dropping Python 2 are:
>>>
>>>- Support for PySpark becomes significantly easier.
>>>- Avoid having to support Python 2 until Spark 4.0, which is likely
>>>to imply supporting Python 2 for some time after it goes EOL.
>>>
>>> (Note that supporting python 2 after EOL means, among other things, that
>>> PySpark would be supporting a version of python that was no longer
>>> receiving security patches)
>>>
>>> The main disadvantage is that PySpark users who have legacy python-2
>>> code would have to migrate their code to python 3 to take advantage of
>>> Spark 3.0
>>>
>>> This decision obviously has large implications for the Apache Spark
>>> community and we want to solicit community feedback.
>>>
>>>
>>


Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Hyukjin Kwon
I think we can deprecate it in 3.x.0 and remove it in Spark 4.0.0. Many
people still use Python 2. Also, technically 2.7 support is not officially
dropped yet - https://pythonclock.org/
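The countdown that pythonclock.org displays can be reproduced in a few lines; the 2020/01/01 EOL date is the one cited elsewhere in this thread, and the function name here is just illustrative:

```python
from datetime import date

# Python 2 end-of-life date cited elsewhere in this thread (2020/01/01).
PY2_EOL = date(2020, 1, 1)

def days_until_py2_eol(today):
    # Days remaining until Python 2 EOL; negative once the date has passed.
    return (PY2_EOL - today).days

print(days_until_py2_eol(date(2018, 9, 16)))  # date of this message: 472
```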


On Mon, Sep 17, 2018 at 9:31 AM, Aakash Basu  wrote:

> Removing support for an API in a major release makes poor sense;
> deprecating is always better. Removal can always be done two or three minor
> releases later.
>
> On Mon 17 Sep, 2018, 6:49 AM Felix Cheung, 
> wrote:
>
>> I don’t think we should remove any API even in a major release without
>> deprecating it first...
>>
>>
>> --
>> *From:* Mark Hamstra 
>> *Sent:* Sunday, September 16, 2018 12:26 PM
>> *To:* Erik Erlandson
>> *Cc:* u...@spark.apache.org; dev
>> *Subject:* Re: Should python-2 be supported in Spark 3.0?
>>
>> We could also deprecate Py2 already in the 2.4.0 release.
>>
>> On Sat, Sep 15, 2018 at 11:46 AM Erik Erlandson 
>> wrote:
>>
>>> In case this didn't make it onto this thread:
>>>
>>> There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and
>>> remove it entirely on a later 3.x release.
>>>
>>> On Sat, Sep 15, 2018 at 11:09 AM, Erik Erlandson 
>>> wrote:
>>>
>>>> On a separate dev@spark thread, I raised a question of whether or not
>>>> to support python 2 in Apache Spark, going forward into Spark 3.0.
>>>>
>>>> Python-2 is going EOL <https://github.com/python/devguide/pull/344> at
>>>> the end of 2019. The upcoming release of Spark 3.0 is an opportunity to
>>>> make breaking changes to Spark's APIs, and so it is a good time to consider
>>>> support for Python-2 on PySpark.
>>>>
>>>> Key advantages to dropping Python 2 are:
>>>>
>>>>- Support for PySpark becomes significantly easier.
>>>>- Avoid having to support Python 2 until Spark 4.0, which is likely
>>>>to imply supporting Python 2 for some time after it goes EOL.
>>>>
>>>> (Note that supporting python 2 after EOL means, among other things,
>>>> that PySpark would be supporting a version of python that was no longer
>>>> receiving security patches)
>>>>
>>>> The main disadvantage is that PySpark users who have legacy python-2
>>>> code would have to migrate their code to python 3 to take advantage of
>>>> Spark 3.0
>>>>
>>>> This decision obviously has large implications for the Apache Spark
>>>> community and we want to solicit community feedback.
>>>>
>>>>
>>>


Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Felix Cheung
I don’t think we should remove any API even in a major release without 
deprecating it first...



From: Mark Hamstra 
Sent: Sunday, September 16, 2018 12:26 PM
To: Erik Erlandson
Cc: u...@spark.apache.org; dev
Subject: Re: Should python-2 be supported in Spark 3.0?





Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Mark Hamstra
We could also deprecate Py2 already in the 2.4.0 release.



Re: Should python-2 be supported in Spark 3.0?

2018-09-15 Thread Erik Erlandson
In case this didn't make it onto this thread:

There is a 3rd option, which is to deprecate Py2 for Spark-3.0, and remove
it entirely on a later 3.x release.



Re: Should python-2 be supported in Spark 3.0?

2018-09-15 Thread Nicholas Chammas
As Reynold pointed out, we don't have to drop Python 2 support right off
the bat. We can just deprecate it with Spark 3.0, which would allow us to
actually drop it at a later 3.x release.
