That seems like a really good reason for a major version change given the %
of PySpark users and the fact we are (effectively) tied to pandas APIs.

On Tue, Jun 20, 2023 at 12:24 PM Bjørn Jørgensen <bjornjorgen...@gmail.com>
wrote:

> One big thing for 4.0 will be that pandas API on Spark will support pandas
> version 2.0.
>
> With the major release of pandas 2.0.0 on April 3, 2023, numerous breaking
> changes have been introduced. So, we have made the decision to postpone
> addressing these breaking changes until the next major release of Spark,
> version 4.0.0, to minimize disruptions for our users and provide a more
> seamless upgrade experience.
>
> The pandas 2.0.0 release includes a significant number of updates, such as
> API removals, changes in API behavior, parameter removals, parameter
> behavior changes, and bug fixes. We have planned the following approach for
> each item:
>
> - *API Removals*: Removed APIs will remain deprecated in Spark 3.5.0,
> provide appropriate warnings, and will be removed in Spark 4.0.0.
>
> - *API Behavior Changes*: APIs with changed behavior will retain the
> behavior in Spark 3.5.0, provide appropriate warnings, and will align the
> behavior with pandas in Spark 4.0.0.
>
> - *Parameter Removals*: Removed parameters will remain deprecated in
> Spark 3.5.0, provide appropriate warnings, and will be removed in Spark
> 4.0.0.
>
> - *Parameter Behavior Changes*: Parameters with changed behavior will
> retain the behavior in Spark 3.5.0, provide appropriate warnings, and will
> align the behavior with pandas in Spark 4.0.0.
>
> - *Bug Fixes*: Bug fixes mainly related to correctness issues will be
> fixed in Spark 3.5.0.
>
> *To recap, all breaking changes related to pandas 2.0.0 will be supported
> in Spark 4.0.0,* *and will remain deprecated with appropriate warnings in
> Spark 3.5.0.*
>
>
>
> https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
>
> On Tue, Jun 20, 2023 at 06:18, Dongjoon Hyun <dongj...@apache.org> wrote:
>
>> Hi, Herman.
>>
>> This is a series of discussions as I re-summarized here.
>>
>> You can find some context in the previous timeline thread.
>>
>> 2023-05-30 Apache Spark 4.0 Timeframe?
>> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>
>> Could you reply there to collect your timeline suggestions? We can
>> discuss more there.
>>
>> Dongjoon.
>>
>>
>>
>> On Mon, Jun 19, 2023 at 1:58 PM Herman van Hovell <her...@databricks.com>
>> wrote:
>>
>>> Dongjoon, I am not sure if I follow the line of thought
>>> here.
>>>
>>> Multiple people have asked for clarification on what Spark 4.0 would
>>> mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me
>>> to this list. However, you choose to single out Xiao because he asks this
>>> question and wants to do a preview release as well? So again, what does
>>> Spark 4 mean, and why does it need to take almost a year? Historically
>>> major Spark releases tend to break APIs, but if it only entails changing to
>>> Scala 2.13 and dropping support for JDK 8, then we could also just release
>>> a month after 3.5.
>>>
>>> How about we do this? We get 3.5 released, and afterwards we do a couple
>>> of meetings where we build this roadmap. Using that, we can - hopefully -
>>> have a grounded discussion.
>>>
>>> Cheers,
>>> Herman
>>>
>>> On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun <dongj...@apache.org>
>>> wrote:
>>>
>>>> Thank you. I reviewed the threads, vote and result once more.
>>>>
>>>> I found that I missed the binding vote mark on Holden in the vote
>>>> result email. The following should be "-0: Holden Karau *". Sorry for this
>>>> mistake, Holden and all.
>>>>
>>>> > -0: Holden Karau
>>>>
>>>> To Hyukjin, I disagree with you on the following point because the
>>>> thread clearly started with your and Sean's Apache Spark 4.0 requirement in
>>>> order to move away from Scala 2.12. In addition, we also discussed another
>>>> item (dropping Java 8) in another current dev thread. The vote scope and
>>>> goal are clear and specific.
>>>>
>>>> > we're unclear on the picture of Spark 4.0.0.
>>>>
>>>> Rather than the vote scope and result, what is really unclear is what
>>>> you propose here. If Xiao wants a preview, Xiao can propose the preview
>>>> plan in more detail. It's welcome. If you have many 4.0 dev ideas which are
>>>> not exposed to the community yet, please share them with the community.
>>>> It's welcome, too. Apache Spark is an open source community. If you don't
>>>> share them, there is no way for us to know what you want.
>>>>
>>>> Dongjoon
>>>>
>>>> On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
>>>> > The major concerns raised in the thread were that we should initiate the
>>>> > discussion for the below first:
>>>> > - Apache Spark 4.0.0 Preview (and Dates)
>>>> > - Apache Spark 4.0.0 Items
>>>> > - Apache Spark 4.0.0 Plan Adjustment
>>>> >
>>>> > before setting the timeline for Spark 4.0.0 because we're unclear on the
>>>> > picture of Spark 4.0.0. So discussing the 4.0.0 timeline first is the
>>>> > opposite order procedurally.
>>>> > The vote passed as a procedural issue, but I would prefer to consider this
>>>> > a tentative date, and we would probably need another vote to adjust the
>>>> > date considering the plans, preview dates, and items we aim for in 4.0.0.
>>>> >
>>>> >
>>>> > On Sat, 17 Jun 2023 at 04:33, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>> >
>>>> > > This was a part of the following on-going discussions.
>>>> > >
>>>> > > 2023-05-28  Apache Spark 3.5.0 Expectations (?)
>>>> > > https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>>>> > >
>>>> > > 2023-05-30 Apache Spark 4.0 Timeframe?
>>>> > > https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>>> > >
>>>> > > 2023-06-05 ASF policy violation and Scala version issues
>>>> > > https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb
>>>> > >
>>>> > > 2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
>>>> > > https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb
>>>> > >
>>>> > > I'm looking forward to seeing the upcoming detailed discussions including
>>>> > > the following:
>>>> > > - Apache Spark 4.0.0 Preview (and Dates)
>>>> > > - Apache Spark 4.0.0 Items
>>>> > > - Apache Spark 4.0.0 Plan Adjustment
>>>> > >
>>>> > > Please initiate the discussion.
>>>> > >
>>>> > > Thanks,
>>>> > > Dongjoon.
>>>> > >
>>>> > >
>>>> > > On 2023/06/16 19:30:42 Dongjoon Hyun wrote:
>>>> > > > The vote passes with 6 +1s (4 binding +1s), one -0, and one -1.
>>>> > > > Thank you all for your participation and
>>>> > > > especially your additional comments during this voting,
>>>> > > > Mridul, Hyukjin, and Jungtaek.
>>>> > > >
>>>> > > > (* = binding)
>>>> > > > +1:
>>>> > > > - Dongjoon Hyun *
>>>> > > > - Huaxin Gao *
>>>> > > > - Liang-Chi Hsieh *
>>>> > > > - Kazuyuki Tanimura
>>>> > > > - Chao Sun *
>>>> > > > - Jia Fan
>>>> > > >
>>>> > > > -0: Holden Karau
>>>> > > >
>>>> > > > -1: Xiao Li *
>>>> > > >
>>>> > >
>>>> > >
>>>> ---------------------------------------------------------------------
>>>> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>> > >
>>>> > >
>>>> >
>>>>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
>


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
