One big item for 4.0 will be that the pandas API on Spark will support
pandas 2.0.

pandas 2.0.0, released on April 3, 2023, introduced numerous breaking
changes. We have decided to postpone addressing these breaking changes
until the next major release of Spark, version 4.0.0, to minimize
disruption for our users and provide a smoother upgrade experience.

The pandas 2.0.0 release includes a significant number of updates, such as
API removals, changes in API behavior, parameter removals, parameter
behavior changes, and bug fixes. We have planned the following approach for
each item:

- *API Removals*: Removed APIs will remain deprecated in Spark 3.5.0,
provide appropriate warnings, and will be removed in Spark 4.0.0.

- *API Behavior Changes*: APIs with changed behavior will retain their
current behavior in Spark 3.5.0, provide appropriate warnings, and will
align their behavior with pandas in Spark 4.0.0.

- *Parameter Removals*: Removed parameters will remain deprecated in Spark
3.5.0, provide appropriate warnings, and will be removed in Spark 4.0.0.

- *Parameter Behavior Changes*: Parameters with changed behavior will
retain their current behavior in Spark 3.5.0, provide appropriate warnings,
and will align their behavior with pandas in Spark 4.0.0.

- *Bug Fixes*: Bug fixes, mainly for correctness issues, will be applied
in Spark 3.5.0.

*To recap, all breaking changes related to pandas 2.0.0 will be supported
in Spark 4.0.0, and the affected APIs will remain deprecated with
appropriate warnings in Spark 3.5.0.*
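
As a rough illustration of the 3.5.0 side of this plan, here is a minimal
Python sketch. It is not the actual pandas API on Spark code: the names
deprecated_until, old_api, sort_values and the inplace parameter are
hypothetical, and the use of FutureWarning is my assumption. The old
behavior is kept, and a warning announces the removal in 4.0.0.

```python
import functools
import warnings


def deprecated_until(removed_in="4.0.0"):
    """Hypothetical decorator: keep the old behavior, but warn about removal."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in "
                f"Spark {removed_in}.",
                FutureWarning,  # assumption: warning category chosen for illustration
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_until()
def old_api(values):
    # Stand-in for an API removed in pandas 2.0.0; the body is unchanged.
    return sorted(values)


def sort_values(values, inplace=None):
    # Stand-in for a removed parameter: still accepted, but warns when passed.
    if inplace is not None:
        warnings.warn(
            "The 'inplace' parameter is deprecated and will be removed in "
            "Spark 4.0.0.",
            FutureWarning,
            stacklevel=2,
        )
    return sorted(values)
```

Calling old_api([3, 1, 2]) still returns the sorted list and emits one
FutureWarning. In the 4.0.0 line the decorated function and the inplace
parameter would simply be deleted.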


https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel

On Tue, Jun 20, 2023 at 06:18, Dongjoon Hyun <dongj...@apache.org> wrote:

> Hi, Herman.
>
> This is a series of discussions as I re-summarized here.
>
> You can find some context in the previous timeline thread.
>
> 2023-05-30 Apache Spark 4.0 Timeframe?
> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>
> Could you reply there to collect your timeline suggestions? We can discuss
> more there.
>
> Dongjoon.
>
>
>
> On Mon, Jun 19, 2023 at 1:58 PM Herman van Hovell <her...@databricks.com>
> wrote:
>
>> Dongjoon, I am not sure if I follow the line of thought here.
>>
>> Multiple people have asked for clarification on what Spark 4.0 would mean
>> (Holden, Mridul, Jia & Xiao). You can - for the record - also add me to
>> this list. However, you choose to single out Xiao because he asks this question
>> and wants to do a preview release as well? So again, what does Spark 4
>> mean, and why does it need to take almost a year? Historically major Spark
>> releases tend to break APIs, but if it only entails changing to Scala 2.13
>> and dropping support for JDK 8, then we could also just release a month
>> after 3.5.
>>
>> How about we do this? We get 3.5 released, and afterwards we do a couple
>> of meetings where we build this roadmap. Using that, we can - hopefully -
>> have a grounded discussion.
>>
>> Cheers,
>> Herman
>>
>> On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun <dongj...@apache.org>
>> wrote:
>>
>>> Thank you. I reviewed the threads, vote and result once more.
>>>
>>> I found that I missed the binding vote mark on Holden in the vote result
>>> email. The following should be "-0: Holden Karau *". Sorry for this
>>> mistake, Holden and all.
>>>
>>> > -0: Holden Karau
>>>
>>> To Hyukjin, I disagree with you on the following point because the
>>> thread started clearly with your and Sean's Apache Spark 4.0 requirement in
>>> order to move away from Scala 2.12. In addition, we also discussed another
>>> item (dropping Java 8) from another current dev thread. The vote scope and
>>> goal are clear and specific.
>>>
>>> > we're unclear on the picture of Spark 4.0.0.
>>>
>>> Rather than the vote scope and result, what is really unclear is what
>>> you propose here. If Xiao wants a preview, Xiao can propose the preview
>>> plan in more detail. It's welcome. If you have many 4.0 dev ideas which are
>>> not exposed to the community yet, please share them with the community.
>>> It's welcome, too. Apache Spark is an open source community. If you don't
>>> share them, there is no way for us to know what you want.
>>>
>>> Dongjoon
>>>
>>> On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
>>> > The major concerns raised in the thread were that we should initiate the
>>> > discussion for the below first:
>>> > - Apache Spark 4.0.0 Preview (and Dates)
>>> > - Apache Spark 4.0.0 Items
>>> > - Apache Spark 4.0.0 Plan Adjustment
>>> >
>>> > before setting the timeline for Spark 4.0.0 because we're unclear on the
>>> > picture of Spark 4.0.0. So discussing the timeline for 4.0.0 first is the
>>> > opposite order procedurally.
>>> > The vote passed as a procedural issue, but I would prefer to consider this
>>> > as a tentative date, and we would probably need another vote to adjust the
>>> > date considering the plans, preview dates, and items we aim for in 4.0.0.
>>> >
>>> >
>>> > On Sat, 17 Jun 2023 at 04:33, Dongjoon Hyun <dongj...@apache.org> wrote:
>>> >
>>> > > This was a part of the following on-going discussions.
>>> > >
>>> > > 2023-05-28  Apache Spark 3.5.0 Expectations (?)
>>> > > https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>>> > >
>>> > > 2023-05-30 Apache Spark 4.0 Timeframe?
>>> > > https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>> > >
>>> > > 2023-06-05 ASF policy violation and Scala version issues
>>> > > https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb
>>> > >
>>> > > 2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
>>> > > https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb
>>> > >
>>> > > I'm looking forward to seeing the upcoming detailed discussions,
>>> > > including the following:
>>> > > - Apache Spark 4.0.0 Preview (and Dates)
>>> > > - Apache Spark 4.0.0 Items
>>> > > - Apache Spark 4.0.0 Plan Adjustment
>>> > >
>>> > > Please initiate the discussion.
>>> > >
>>> > > Thanks,
>>> > > Dongjoon.
>>> > >
>>> > >
>>> > > On 2023/06/16 19:30:42 Dongjoon Hyun wrote:
>>> > > > The vote passes with 6 +1s (4 binding +1s), one -0, and one -1.
>>> > > > Thank you all for your participation and
>>> > > > especially your additional comments during this voting,
>>> > > > Mridul, Hyukjin, and Jungtaek.
>>> > > >
>>> > > > (* = binding)
>>> > > > +1:
>>> > > > - Dongjoon Hyun *
>>> > > > - Huaxin Gao *
>>> > > > - Liang-Chi Hsieh *
>>> > > > - Kazuyuki Tanimura
>>> > > > - Chao Sun *
>>> > > > - Jia Fan
>>> > > >
>>> > > > -0: Holden Karau
>>> > > >
>>> > > > -1: Xiao Li *
>>> > > >
>>> > >
>>> > > ---------------------------------------------------------------------
>>> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >
>>> > >
>>> >
>>>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297
