That seems like a really good reason for a major version change given the % of PySpark users and the fact we are (effectively) tied to pandas APIs.
On Tue, Jun 20, 2023 at 12:24 PM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:

> One big thing for 4.0 will be that pandas API on Spark will support pandas
> version 2.0
>
> With the major release of pandas 2.0.0 on April 3, 2023, numerous breaking
> changes have been introduced. So, we have made the decision to postpone
> addressing these breaking changes until the next major release of Spark,
> version 4.0.0 to minimize disruptions for our users and provide a more
> seamless upgrade experience.
>
> The pandas 2.0.0 release includes a significant number of updates, such as
> API removals, changes in API behavior, parameter removals, parameter
> behavior changes, and bug fixes. We have planned the following approach for
> each item:
>
> - *API Removals*: Removed APIs will remain deprecated in Spark 3.5.0,
> provide appropriate warnings, and will be removed in Spark 4.0.0.
>
> - *API Behavior Changes*: APIs with changed behavior will retain the
> behavior in Spark 3.5.0, provide appropriate warnings, and will align the
> behavior with pandas in Spark 4.0.0.
>
> - *Parameter Removals*: Removed parameters will remain deprecated in
> Spark 3.5.0, provide appropriate warnings, and will be removed in Spark
> 4.0.0.
>
> - *Parameter Behavior Changes*: Parameters with changed behavior will
> retain the behavior in Spark 3.5.0, provide appropriate warnings, and will
> align the behavior with pandas in Spark 4.0.0.
>
> - *Bug Fixes*: Bug fixes mainly related to correctness issues will be
> fixed in Spark 3.5.0.
>
> *To recap, all breaking changes related to pandas 2.0.0 will be supported
> in Spark 4.0.0,* *and will remain deprecated with appropriate warnings in
> Spark 3.5.0.*
>
> https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
>
> On Tue, Jun 20, 2023 at 06:18, Dongjoon Hyun <dongj...@apache.org> wrote:
>
>> Hi, Herman.
>>
>> This is a series of discussions as I re-summarized here.
>>
>> You can find some context in the previous timeline thread.
>>
>> 2023-05-30 Apache Spark 4.0 Timeframe?
>> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>
>> Could you reply there to collect your timeline suggestions? We can
>> discuss more there.
>>
>> Dongjoon.
>>
>> On Mon, Jun 19, 2023 at 1:58 PM Herman van Hovell <her...@databricks.com>
>> wrote:
>>
>>> Dongjoon, I am not sure if I follow the line of thought here.
>>>
>>> Multiple people have asked for clarification on what Spark 4.0 would
>>> mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me
>>> to this list. However, you choose to single out Xiao because he asks this
>>> question and wants to do a preview release as well? So again, what does
>>> Spark 4 mean, and why does it need to take almost a year? Historically
>>> major Spark releases tend to break APIs, but if it only entails changing to
>>> Scala 2.13 and dropping support for JDK 8, then we could also just release
>>> a month after 3.5.
>>>
>>> How about we do this? We get 3.5 released, and afterwards we do a couple
>>> of meetings where we build this roadmap. Using that, we can - hopefully -
>>> have a grounded discussion.
>>>
>>> Cheers,
>>> Herman
>>>
>>> On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun <dongj...@apache.org>
>>> wrote:
>>>
>>>> Thank you. I reviewed the threads, vote and result once more.
>>>>
>>>> I found that I missed the binding vote mark on Holden in the vote
>>>> result email. The following should be "-0: Holden Karau *". Sorry for this
>>>> mistake, Holden and all.
>>>>
>>>> > -0: Holden Karau
>>>>
>>>> To Hyukjin, I disagree with you at the following point because the
>>>> thread started clearly with your and Sean's Apache Spark 4.0 requirement in
>>>> order to move away from Scala 2.12. In addition, we also discussed another
>>>> item (dropping Java 8) from another current dev thread. The vote scope and
>>>> goal are clear and specific.
>>>>
>>>> > we're unclear on the picture of Spark 4.0.0.
>>>>
>>>> Instead of the vote scope and result, what is really unclear is what
>>>> you propose here. If Xiao wants a preview, Xiao can propose the preview
>>>> plan in more detail. It's welcome. If you have many 4.0 dev ideas which are
>>>> not exposed to the community yet, please share them with the community.
>>>> It's welcome, too. Apache Spark is an open source community. If you don't
>>>> share it, there is no way for us to know what you want.
>>>>
>>>> Dongjoon
>>>>
>>>> On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
>>>> > The major concerns raised in the thread were that we should initiate the
>>>> > discussion for the below first:
>>>> > - Apache Spark 4.0.0 Preview (and Dates)
>>>> > - Apache Spark 4.0.0 Items
>>>> > - Apache Spark 4.0.0 Plan Adjustment
>>>> >
>>>> > before setting the timeline for Spark 4.0.0 because we're unclear on the
>>>> > picture of Spark 4.0.0. So discussing the 4.0.0 timeline first is the
>>>> > opposite order procedurally.
>>>> > The vote passed as a procedural issue, but I would prefer to consider this
>>>> > as a tentative date, and we would probably need another vote to adjust the
>>>> > date considering the plans, preview dates, and items we aim for 4.0.0.
>>>> >
>>>> > On Sat, 17 Jun 2023 at 04:33, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>> >
>>>> > > This was a part of the following on-going discussions.
>>>> > >
>>>> > > 2023-05-28 Apache Spark 3.5.0 Expectations (?)
>>>> > > https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>>>> > >
>>>> > > 2023-05-30 Apache Spark 4.0 Timeframe?
>>>> > > https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>>> > >
>>>> > > 2023-06-05 ASF policy violation and Scala version issues
>>>> > > https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb
>>>> > >
>>>> > > 2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
>>>> > > https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb
>>>> > >
>>>> > > I'm looking forward to seeing the upcoming detailed discussions including
>>>> > > the following
>>>> > > - Apache Spark 4.0.0 Preview (and Dates)
>>>> > > - Apache Spark 4.0.0 Items
>>>> > > - Apache Spark 4.0.0 Plan Adjustment
>>>> > >
>>>> > > Please initiate the discussion.
>>>> > >
>>>> > > Thanks,
>>>> > > Dongjoon.
>>>> > >
>>>> > > On 2023/06/16 19:30:42 Dongjoon Hyun wrote:
>>>> > > > The vote passes with 6 +1s (4 binding +1s), one -0, and one -1.
>>>> > > > Thank you all for your participation and
>>>> > > > especially your additional comments during this voting,
>>>> > > > Mridul, Hyukjin, and Jungtaek.
>>>> > > >
>>>> > > > (* = binding)
>>>> > > > +1:
>>>> > > > - Dongjoon Hyun *
>>>> > > > - Huaxin Gao *
>>>> > > > - Liang-Chi Hsieh *
>>>> > > > - Kazuyuki Tanimura
>>>> > > > - Chao Sun *
>>>> > > > - Jia Fan
>>>> > > >
>>>> > > > -0: Holden Karau
>>>> > > >
>>>> > > > -1: Xiao Li *
>>>> > > >
>>>> > >
>>>> > > ---------------------------------------------------------------------
>>>> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>> > >
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau