I’m going to drink a celebratory afternoon coffee :)

On Tue, Jul 14, 2020 at 12:26 PM shane knapp ☠ <skn...@berkeley.edu> wrote:
> this is seriously great news! let's all take a moment and welcome apache
> spark's python support to the present. ;)
>
> On Mon, Jul 13, 2020 at 7:26 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> Awesome, thank you for driving this forward :)
>>
>> On Mon, Jul 13, 2020 at 7:25 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Thank you all. Python 2, 3.4 and 3.5 are dropped now in the master
>>> branch at https://github.com/apache/spark/pull/28957
>>>
>>> On Fri, Jul 3, 2020 at 10:01 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> Thanks Dongjoon. That makes much more sense now!
>>>>
>>>> On Fri, Jul 3, 2020 at 12:11 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>
>>>>> Thank you, Hyukjin.
>>>>>
>>>>> According to the Python community, Python 3.5 is also EOL on
>>>>> 2020-09-13 (only two months left).
>>>>>
>>>>> - https://www.python.org/downloads/
>>>>>
>>>>> So, targeting live Python versions at Apache Spark 3.1.0 (December
>>>>> 2020) looks reasonable to me.
>>>>>
>>>>> For old Python versions, we still have Apache Spark 2.4 LTS, and
>>>>> Apache Spark 3.0.x will also work.
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li <xyliyuanj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1, especially Python 2
>>>>>>
>>>>>> On Thu, Jul 2, 2020 at 10:20 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>
>>>>>>> I’m ok with us dropping Python 2, 3.4, and 3.5 from Spark 3.1 forward.
>>>>>>> It will be exciting to get to use more recent Python features. The most
>>>>>>> recent Ubuntu LTS ships with 3.7, and while the previous LTS ships with
>>>>>>> 3.5, if folks really can’t upgrade there’s conda.
>>>>>>>
>>>>>>> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>>>>>>>
>>>>>>> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yeah, sure. They will be dropped from Spark 3.1 onwards. I don't think
>>>>>>>> we should make such changes in maintenance releases.
>>>>>>>>
>>>>>>>> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>
>>>>>>>>> To be clear, the plan is to drop them in Spark 3.1 onwards, yes?
>>>>>>>>>
>>>>>>>>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I would like to discuss dropping deprecated Python versions 2,
>>>>>>>>>> 3.4 and 3.5 at https://github.com/apache/spark/pull/28957. I
>>>>>>>>>> assume people support it in general,
>>>>>>>>>> but I am writing this to make sure everybody is happy.
>>>>>>>>>>
>>>>>>>>>> Fokko did a very good investigation of it, see
>>>>>>>>>> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
>>>>>>>>>> Judging from the statistics, I think we're pretty safe to drop
>>>>>>>>>> them.
>>>>>>>>>> Also note that dropping Python 2 was actually declared at
>>>>>>>>>> https://python3statement.org/
>>>>>>>>>>
>>>>>>>>>> Roughly speaking, there are several main advantages to dropping them:
>>>>>>>>>> 1. It removes a bunch of hacks we added, around 700 lines in
>>>>>>>>>> PySpark.
>>>>>>>>>> 2. Given my testing and investigation, PyPy2 has a critical bug
>>>>>>>>>> that causes a flaky test,
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28358.
>>>>>>>>>> 3. Users can use Python type hints with Pandas UDFs without
>>>>>>>>>> thinking about the Python version.
>>>>>>>>>> 4. Users can leverage a single, latest cloudpickle,
>>>>>>>>>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it
>>>>>>>>>> can also leverage C pickle.
>>>>>>>>>> 5. ...
>>>>>>>>>>
>>>>>>>>>> So it benefits both users and devs. WDYT guys?
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu