Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-20 Thread yangjie01
+1


On 2023/6/21 13:20, "L. C. Hsieh" <vii...@gmail.com> wrote:


+1


On Tue, Jun 20, 2023 at 8:48 PM Dongjoon Hyun <dongj...@apache.org> wrote:
>
> +1
>
> Dongjoon
>
> On 2023/06/20 02:51:32 Jia Fan wrote:
> > +1
> >
> > Dongjoon Hyun <dongj...@apache.org>
> > wrote on Tue, Jun 20, 2023 at 10:41:
> >
> > > Please vote on releasing the following candidate as Apache Spark version
> > > 3.4.1.
> > >
> > > The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
> > > votes are cast, with a minimum of 3 +1 votes.
> > >
> > > [ ] +1 Release this package as Apache Spark 3.4.1
> > > [ ] -1 Do not release this package because ...
> > >
> > > To learn more about Apache Spark, please see https://spark.apache.org/ 
> > > 
> > >
> > > The tag to be voted on is v3.4.1-rc1 (commit
> > > 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
> > > https://github.com/apache/spark/tree/v3.4.1-rc1 
> > > 
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/ 
> > > 
> > >
> > > Signatures used for Spark RCs can be found in this file:
> > > https://dist.apache.org/repos/dist/dev/spark/KEYS 
> > > 
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapachespark-1443/ 
> > > 
> > >
> > > The documentation corresponding to this release can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/ 
> > > 
> > >
> > > The list of bug fixes going into 3.4.1 can be found at the following URL:
> > > https://issues.apache.org/jira/projects/SPARK/versions/12352874 
> > > 
> > >
> > > This release is using the release script of the tag v3.4.1-rc1.
> > >
> > > FAQ
> > >
> > > =
> > > How can I help test this release?
> > > =
> > >
> > > If you are a Spark user, you can help us test this release by taking
> > > an existing Spark workload and running on this release candidate, then
> > > reporting any regressions.
> > >
> > > If you're working in PySpark you can set up a virtual env and install
> > > the current RC and see if anything important breaks; in Java/Scala,
> > > you can add the staging repository to your project's resolvers and test
> > > with the RC (make sure to clean up the artifact cache before/after so
> > > you don't end up building with an out-of-date RC going forward).
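
As an illustration of the PySpark path above, a minimal smoke test against this
RC might look like the sketch below. The install commands in the comments and
the pyspark tarball name under the -bin/ directory are assumptions, not part of
the vote email; for Java/Scala builds, the staging repository listed above can
be added as an extra resolver as described.

# Assumed setup (adjust the artifact name to what the -bin/ directory contains):
#   python -m venv rc-test && source rc-test/bin/activate
#   pip install https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .master("local[2]")
    .appName("spark-3.4.1-rc1-smoke-test")
    .getOrCreate()
)

# Confirm the installed version is the RC under vote.
assert spark.version.startswith("3.4.1"), spark.version

# Tiny sanity check: projection, group-by, and aggregation over 1,000 rows.
df = spark.range(1000).withColumn("bucket", F.col("id") % 10)
counts = {r["bucket"]: r["count"] for r in df.groupBy("bucket").count().collect()}
assert counts == {b: 100 for b in range(10)}, counts

spark.stop()
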
> > >
> > > ===
> > > What should happen to JIRA tickets still targeting 3.4.1?
> > > ===
> > >
> > > The current list of open tickets targeted at 3.4.1 can be found at:
> > > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > > Version/s" = 3.4.1
> > >
> > > Committers should look at those and triage. Extremely important bug
> > > fixes, documentation, and API tweaks that impact compatibility should
> > > be worked on immediately. Everything else please retarget to an
> > > appropriate release.
> > >
> > > ==
> > > But my bug isn't fixed?
> > > ==
> > >
> > > In order to make timely releases, we will typically not hold the
> > > release unless the bug in question is a regression from the previous
> > > release. That being said, if there is something which is a regression
> > > that has not been correctly targeted please ping me or a committer to
> > > help target the issue.
> > >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> 
>


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 








Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-20 Thread L. C. Hsieh
+1

On Tue, Jun 20, 2023 at 8:48 PM Dongjoon Hyun  wrote:
>
> +1
>
> Dongjoon
>
> On 2023/06/20 02:51:32 Jia Fan wrote:
> > +1
> >
> > Dongjoon Hyun wrote on Tue, Jun 20, 2023 at 10:41:
> >
> > > Please vote on releasing the following candidate as Apache Spark version
> > > 3.4.1.
> > >
> > > The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
> > > votes are cast, with a minimum of 3 +1 votes.
> > >
> > > [ ] +1 Release this package as Apache Spark 3.4.1
> > > [ ] -1 Do not release this package because ...
> > >
> > > To learn more about Apache Spark, please see https://spark.apache.org/
> > >
> > > The tag to be voted on is v3.4.1-rc1 (commit
> > > 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
> > > https://github.com/apache/spark/tree/v3.4.1-rc1
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/
> > >
> > > Signatures used for Spark RCs can be found in this file:
> > > https://dist.apache.org/repos/dist/dev/spark/KEYS
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapachespark-1443/
> > >
> > > The documentation corresponding to this release can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/
> > >
> > > The list of bug fixes going into 3.4.1 can be found at the following URL:
> > > https://issues.apache.org/jira/projects/SPARK/versions/12352874
> > >
> > > This release is using the release script of the tag v3.4.1-rc1.
> > >
> > > FAQ
> > >
> > > =
> > > How can I help test this release?
> > > =
> > >
> > > If you are a Spark user, you can help us test this release by taking
> > > an existing Spark workload and running on this release candidate, then
> > > reporting any regressions.
> > >
> > > If you're working in PySpark you can set up a virtual env and install
> > > the current RC and see if anything important breaks; in Java/Scala,
> > > you can add the staging repository to your project's resolvers and test
> > > with the RC (make sure to clean up the artifact cache before/after so
> > > you don't end up building with an out-of-date RC going forward).
> > >
> > > ===
> > > What should happen to JIRA tickets still targeting 3.4.1?
> > > ===
> > >
> > > The current list of open tickets targeted at 3.4.1 can be found at:
> > > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > > Version/s" = 3.4.1
> > >
> > > Committers should look at those and triage. Extremely important bug
> > > fixes, documentation, and API tweaks that impact compatibility should
> > > be worked on immediately. Everything else please retarget to an
> > > appropriate release.
> > >
> > > ==
> > > But my bug isn't fixed?
> > > ==
> > >
> > > In order to make timely releases, we will typically not hold the
> > > release unless the bug in question is a regression from the previous
> > > release. That being said, if there is something which is a regression
> > > that has not been correctly targeted please ping me or a committer to
> > > help target the issue.
> > >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-20 Thread Dongjoon Hyun
+1

Dongjoon

On 2023/06/20 02:51:32 Jia Fan wrote:
> +1
> 
> Dongjoon Hyun wrote on Tue, Jun 20, 2023 at 10:41:
> 
> > Please vote on releasing the following candidate as Apache Spark version
> > 3.4.1.
> >
> > The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
> > votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.4.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/
> >
> > The tag to be voted on is v3.4.1-rc1 (commit
> > 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
> > https://github.com/apache/spark/tree/v3.4.1-rc1
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1443/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/
> >
> > The list of bug fixes going into 3.4.1 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12352874
> >
> > This release is using the release script of the tag v3.4.1-rc1.
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks; in Java/Scala,
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.4.1?
> > ===
> >
> > The current list of open tickets targeted at 3.4.1 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> > Version/s" = 3.4.1
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Apache Spark 4.0.0 Dev Item Planning (SPARK-44111)

2023-06-20 Thread Dongjoon Hyun
Thank you, Yuming.

Please update SPARK-44111 by adding links to those JIRAs for visibility.
Otherwise, we may miss them during the upcoming discussion.

Dongjoon.



On Tue, Jun 20, 2023 at 6:40 PM Yuming Wang  wrote:

> Thank you Dongjoon. I'd like to add these items.
>
> *Support for more SQL syntax*
> SPARK-31561  Add
> QUALIFY clause
> SPARK-24497  Support
> recursive SQL
> SPARK-32064  Support
> temporary table
>
> *Improve Query performance in specific scenarios*
> SPARK-8682  Range Join
> for Spark SQL. We have a blog in Chinese
>  about this
> optimization.
> SPARK-38506  Push
> partial aggregation through join
>
>
> On Wed, Jun 21, 2023 at 4:42 AM Dongjoon Hyun  wrote:
>
>> Hi, All.
>>
>> As a continuation of our previous discussion, the official Apache Spark
>> 4.0 Plan JIRA is created today in order to collect the community dev items.
>> Feel free to add your work items, ideas, suggestions, aspirations and
>> interests. We will moderate together.
>>
>> https://issues.apache.org/jira/browse/SPARK-44111
>> Prepare Apache Spark 4.0.0
>>
>> In addition, we are going to include all left-over items that cannot make
>> it into Apache Spark 3.5 by July 16th (Feature Freeze,
>> https://spark.apache.org/versioning-policy.html).
>>
>>
>>
>> === PREVIOUS THREADS ===
>>
>> 2023-05-28 Apache Spark 3.5.0 Expectations (?)
>> https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>>
>> 2023-05-30 Apache Spark 4.0 Timeframe?
>> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>
>> 2023-06-05 ASF policy violation and Scala version issues
>> https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb
>>
>> 2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
>> https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb
>>
>> 2023-06-16 [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)
>> https://lists.apache.org/thread/5vfof0nm82gt5b2k2o0ws944hofz232g
>>
>


Re: Apache Spark 4.0.0 Dev Item Planning (SPARK-44111)

2023-06-20 Thread Yuming Wang
Thank you Dongjoon. I'd like to add these items.

*Support for more SQL syntax*
SPARK-31561  Add QUALIFY
clause
SPARK-24497  Support
recursive SQL
SPARK-32064  Support
temporary table

*Improve Query performance in specific scenarios*
SPARK-8682  Range Join
for Spark SQL. We have a blog in Chinese
 about this optimization.
SPARK-38506  Push
partial aggregation through join


On Wed, Jun 21, 2023 at 4:42 AM Dongjoon Hyun  wrote:

> Hi, All.
>
> As a continuation of our previous discussion, the official Apache Spark
> 4.0 Plan JIRA is created today in order to collect the community dev items.
> Feel free to add your work items, ideas, suggestions, aspirations and
> interests. We will moderate together.
>
> https://issues.apache.org/jira/browse/SPARK-44111
> Prepare Apache Spark 4.0.0
>
> In addition, we are going to include all left-over items that cannot make
> it into Apache Spark 3.5 by July 16th (Feature Freeze,
> https://spark.apache.org/versioning-policy.html).
>
>
>
> === PREVIOUS THREADS ===
>
> 2023-05-28 Apache Spark 3.5.0 Expectations (?)
> https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>
> 2023-05-30 Apache Spark 4.0 Timeframe?
> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>
> 2023-06-05 ASF policy violation and Scala version issues
> https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb
>
> 2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
> https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb
>
> 2023-06-16 [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)
> https://lists.apache.org/thread/5vfof0nm82gt5b2k2o0ws944hofz232g
>


Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Dongjoon Hyun
Ya, it sounds like that. Could you link those items to the following JIRA?

https://issues.apache.org/jira/browse/SPARK-44111 Prepare Apache Spark 4.0.0

Dongjoon.



On Tue, Jun 20, 2023 at 12:45 PM Holden Karau  wrote:

> That seems like a really good reason for a major version change given the
> % of PySpark users and the fact we are (effectively) tied to pandas APIs.
>
> On Tue, Jun 20, 2023 at 12:24 PM Bjørn Jørgensen 
> wrote:
>
>> One big thing for 4.0 will be that pandas API on spark will support
>> pandas version 2.0
>>
>> With the major release of pandas 2.0.0 on April 3, 2023, numerous
>> breaking changes have been introduced. So, we have made the decision to
>> postpone addressing these breaking changes until the next major release of
>> Spark, version 4.0.0 to minimize disruptions for our users and provide a
>> more seamless upgrade experience.
>>
>> The pandas 2.0.0 release includes a significant number of updates, such
>> as API removals, changes in API behavior, parameter removals, parameter
>> behavior changes, and bug fixes. We have planned the following approach for
>> each item:
>>
>> - *API Removals*: Removed APIs will remain deprecated in Spark 3.5.0,
>> provide appropriate warnings, and will be removed in Spark 4.0.0.
>>
>> - *API Behavior Changes*: APIs with changed behavior will retain the
>> behavior in Spark 3.5.0, provide appropriate warnings, and will align the
>> behavior with pandas in Spark 4.0.0.
>>
>> - *Parameter Removals*: Removed parameters will remain deprecated in
>> Spark 3.5.0, provide appropriate warnings, and will be removed in Spark
>> 4.0.0.
>>
>> - *Parameter Behavior Changes*: Parameters with changed behavior will
>> retain the behavior in Spark 3.5.0, provide appropriate warnings, and will
>> align the behavior with pandas in Spark 4.0.0.
>>
>> - *Bug Fixes*: Bug fixes mainly related to correctness issues will be
>> fixed in Spark 3.5.0.
>>
>> *To recap, all breaking changes related to pandas 2.0.0 will be supported
>> in Spark 4.0.0,* *and will remain deprecated with appropriate errors in
>> Spark 3.5.0.*
>>
>>
>>
>> https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
>>
>> On Tue, Jun 20, 2023 at 06:18, Dongjoon Hyun wrote:
>>
>>> Hi, Herman.
>>>
>>> This is a series of discussions as I re-summarized here.
>>>
>>> You can find some context in the previous timeline thread.
>>>
>>> 2023-05-30 Apache Spark 4.0 Timeframe?
>>> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>>
>>> Could you reply there to collect your timeline suggestions? We can
>>> discuss more there.
>>>
>>> Dongjoon.
>>>
>>>
>>>
>>> On Mon, Jun 19, 2023 at 1:58 PM Herman van Hovell 
>>> wrote:
>>>
 Dongjoon, I am not sure if I follow the line of
 thought here.

 Multiple people have asked for clarification on what Spark 4.0 would
 mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me
 to this list. However, you choose to single out Xiao because he asks this
 question and wants to do a preview release as well? So again, what does
 Spark 4 mean, and why does it need to take almost a year? Historically
 major Spark releases tend to break APIs, but if it only entails changing to
 Scala 2.13 and dropping support for JDK 8, then we could also just release
 a month after 3.5.

 How about we do this? We get 3.5 released, and afterwards we do a
 couple of meetings where we build this roadmap. Using that, we can -
 hopefully - have a grounded discussion.

 Cheers,
 Herman

 On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun 
 wrote:

> Thank you. I reviewed the threads, vote and result once more.
>
> I found that I missed the binding vote mark on Holden in the vote
> result email. The following should be "-0: Holden Karau *". Sorry for this
> mistake, Holden and all.
>
> > -0: Holden Karau
>
> To Hyukjin, I disagree with you on the following point because the
> thread started clearly with your and Sean's Apache Spark 4.0 requirement
> in order to move away from Scala 2.12. In addition, we also discussed
> another item (dropping Java 8) in another current dev thread. The vote
> scope and goal are clear and specific.
>
> > we're unclear on the picture of Spark 4.0.0.
>
> Instead of the vote scope and result, what is really unclear is what
> you propose here. If Xiao wants a preview, Xiao can propose the preview
> plan in more detail. It's welcome. If you have many 4.0 dev ideas which are
> not exposed to the community yet, please share them with the community.
> It's welcome, too. Apache Spark is an open source community. If you don't
> share it, there is no way for us to know what you want.
>
> Dongjoon
>
> On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
> > The major concerns raised in the thread were 

Apache Spark 4.0.0 Dev Item Planning (SPARK-44111)

2023-06-20 Thread Dongjoon Hyun
Hi, All.

As a continuation of our previous discussion, the official Apache Spark 4.0
Plan JIRA is created today in order to collect the community dev items.
Feel free to add your work items, ideas, suggestions, aspirations and
interests. We will moderate together.

https://issues.apache.org/jira/browse/SPARK-44111
Prepare Apache Spark 4.0.0

In addition, we are going to include all left-over items that cannot make it
into Apache Spark 3.5 by July 16th (Feature Freeze,
https://spark.apache.org/versioning-policy.html).



=== PREVIOUS THREADS ===

2023-05-28 Apache Spark 3.5.0 Expectations (?)
https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0

2023-05-30 Apache Spark 4.0 Timeframe?
https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6

2023-06-05 ASF policy violation and Scala version issues
https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb

2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb

2023-06-16 [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)
https://lists.apache.org/thread/5vfof0nm82gt5b2k2o0ws944hofz232g


Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Holden Karau
That seems like a really good reason for a major version change given the %
of PySpark users and the fact we are (effectively) tied to pandas APIs.

On Tue, Jun 20, 2023 at 12:24 PM Bjørn Jørgensen 
wrote:

> One big thing for 4.0 will be that pandas API on spark will support pandas
> version 2.0
>
> With the major release of pandas 2.0.0 on April 3, 2023, numerous breaking
> changes have been introduced. So, we have made the decision to postpone
> addressing these breaking changes until the next major release of Spark,
> version 4.0.0 to minimize disruptions for our users and provide a more
> seamless upgrade experience.
>
> The pandas 2.0.0 release includes a significant number of updates, such as
> API removals, changes in API behavior, parameter removals, parameter
> behavior changes, and bug fixes. We have planned the following approach for
> each item:
>
> - *API Removals*: Removed APIs will remain deprecated in Spark 3.5.0,
> provide appropriate warnings, and will be removed in Spark 4.0.0.
>
> - *API Behavior Changes*: APIs with changed behavior will retain the
> behavior in Spark 3.5.0, provide appropriate warnings, and will align the
> behavior with pandas in Spark 4.0.0.
>
> - *Parameter Removals*: Removed parameters will remain deprecated in
> Spark 3.5.0, provide appropriate warnings, and will be removed in Spark
> 4.0.0.
>
> - *Parameter Behavior Changes*: Parameters with changed behavior will
> retain the behavior in Spark 3.5.0, provide appropriate warnings, and will
> align the behavior with pandas in Spark 4.0.0.
>
> - *Bug Fixes*: Bug fixes mainly related to correctness issues will be
> fixed in Spark 3.5.0.
>
> *To recap, all breaking changes related to pandas 2.0.0 will be supported
> in Spark 4.0.0,* *and will remain deprecated with appropriate errors in
> Spark 3.5.0.*
>
>
>
> https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
>
> On Tue, Jun 20, 2023 at 06:18, Dongjoon Hyun wrote:
>
>> Hi, Herman.
>>
>> This is a series of discussions as I re-summarized here.
>>
>> You can find some context in the previous timeline thread.
>>
>> 2023-05-30 Apache Spark 4.0 Timeframe?
>> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>
>> Could you reply there to collect your timeline suggestions? We can
>> discuss more there.
>>
>> Dongjoon.
>>
>>
>>
>> On Mon, Jun 19, 2023 at 1:58 PM Herman van Hovell 
>> wrote:
>>
>>> Dongjoon, I am not sure if I follow the line of thought
>>> here.
>>>
>>> Multiple people have asked for clarification on what Spark 4.0 would
>>> mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me
>>> to this list. However you choose to single out Xiao because asks this
>>> question and wants to do a preview release as well? So again, what does
>>> Spark 4 mean, and why does it need to take almost a year? Historically
>>> major Spark releases tend to break APIs, but if it only entails changing to
>>> Scala 2.13 and dropping support for JDK 8, then we could also just release
>>> a month after 3.5.
>>>
>>> How about we do this? We get 3.5 released, and afterwards we do a couple
>>> of meetings where we build this roadmap. Using that, we can - hopefully -
>>> have a grounded discussion.
>>>
>>> Cheers,
>>> Herman
>>>
>>> On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun 
>>> wrote:
>>>
 Thank you. I reviewed the threads, vote and result once more.

 I found that I missed the binding vote mark on Holden in the vote
 result email. The following should be "-0: Holden Karau *". Sorry for this
 mistake, Holden and all.

 > -0: Holden Karau

 To Hyukjin, I disagree with you on the following point because the
 thread started clearly with your and Sean's Apache Spark 4.0 requirement
 in order to move away from Scala 2.12. In addition, we also discussed
 another item (dropping Java 8) in another current dev thread. The vote
 scope and goal are clear and specific.

 > we're unclear on the picture of Spark 4.0.0.

 Instead of the vote scope and result, what is really unclear is what
 you propose here. If Xiao wants a preview, Xiao can propose the preview
 plan in more detail. It's welcome. If you have many 4.0 dev ideas which are
 not exposed to the community yet, please share them with the community.
 It's welcome, too. Apache Spark is an open source community. If you don't
 share it, there is no way for us to know what you want.

 Dongjoon

 On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
 > The major concerns raised in the thread were that we should initiate the
 > discussion for the below first:
 > - Apache Spark 4.0.0 Preview (and Dates)
 > - Apache Spark 4.0.0 Items
 > - Apache Spark 4.0.0 Plan Adjustment
 >
 > before setting the timeline for Spark 4.0.0 because we're unclear on the
 > picture of Spark 4.0.0. So discussing the timeline 

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Bjørn Jørgensen
One big thing for 4.0 will be that pandas API on spark will support pandas
version 2.0

With the major release of pandas 2.0.0 on April 3, 2023, numerous breaking
changes have been introduced. So, we have made the decision to postpone
addressing these breaking changes until the next major release of Spark,
version 4.0.0 to minimize disruptions for our users and provide a more
seamless upgrade experience.

The pandas 2.0.0 release includes a significant number of updates, such as
API removals, changes in API behavior, parameter removals, parameter
behavior changes, and bug fixes. We have planned the following approach for
each item:

- *API Removals*: Removed APIs will remain deprecated in Spark 3.5.0,
provide appropriate warnings, and will be removed in Spark 4.0.0.

- *API Behavior Changes*: APIs with changed behavior will retain the
behavior in Spark 3.5.0, provide appropriate warnings, and will align the
behavior with pandas in Spark 4.0.0.

- *Parameter Removals*: Removed parameters will remain deprecated in Spark
3.5.0, provide appropriate warnings, and will be removed in Spark 4.0.0.

- *Parameter Behavior Changes*: Parameters with changed behavior will
retain the behavior in Spark 3.5.0, provide appropriate warnings, and will
align the behavior with pandas in Spark 4.0.0.

- *Bug Fixes*: Bug fixes mainly related to correctness issues will be fixed
in Spark 3.5.0.

*To recap, all breaking changes related to pandas 2.0.0 will be supported
in Spark 4.0.0,* *and will remain deprecated with appropriate errors in
Spark 3.5.0.*


https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
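
As one way to picture the deprecate-in-3.5.0 / remove-in-4.0.0 approach
described above, here is a tiny, self-contained sketch of the warning pattern;
the class and method names are illustrative assumptions, not actual
pandas-on-Spark code.

import warnings


class TinyFrame:
    """Stand-in container used only to illustrate the deprecation pattern."""

    def __init__(self, rows):
        self.rows = list(rows)

    def append(self, other):
        # Kept working through the 3.5.0 line, but warns so users can migrate
        # before the API disappears in 4.0.0 (mirroring its removal in pandas 2.0.0).
        warnings.warn(
            "append is deprecated and will be removed in Spark 4.0.0; use concat instead.",
            FutureWarning,
            stacklevel=2,
        )
        return TinyFrame(self.rows + other.rows)


if __name__ == "__main__":
    a, b = TinyFrame([1, 2]), TinyFrame([3])
    print(a.append(b).rows)  # emits a FutureWarning, then prints [1, 2, 3]
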

On Tue, Jun 20, 2023 at 06:18, Dongjoon Hyun wrote:

> Hi, Herman.
>
> This is a series of discussions as I re-summarized here.
>
> You can find some context in the previous timeline thread.
>
> 2023-05-30 Apache Spark 4.0 Timeframe?
> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>
> Could you reply there to collect your timeline suggestions? We can discuss
> more there.
>
> Dongjoon.
>
>
>
> On Mon, Jun 19, 2023 at 1:58 PM Herman van Hovell 
> wrote:
>
>> Dongjoon, I am not sure if I follow the line of thought
>> here.
>>
>> Multiple people have asked for clarification on what Spark 4.0 would mean
>> (Holden, Mridul, Jia & Xiao). You can - for the record - also add me to
>> this list. However, you choose to single out Xiao because he asks this question
>> and wants to do a preview release as well? So again, what does Spark 4
>> mean, and why does it need to take almost a year? Historically major Spark
>> releases tend to break APIs, but if it only entails changing to Scala 2.13
>> and dropping support for JDK 8, then we could also just release a month
>> after 3.5.
>>
>> How about we do this? We get 3.5 released, and afterwards we do a couple
>> of meetings where we build this roadmap. Using that, we can - hopefully -
>> have a grounded discussion.
>>
>> Cheers,
>> Herman
>>
>> On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you. I reviewed the threads, vote and result once more.
>>>
>>> I found that I missed the binding vote mark on Holden in the vote result
>>> email. The following should be "-0: Holden Karau *". Sorry for this
>>> mistake, Holden and all.
>>>
>>> > -0: Holden Karau
>>>
>>> To Hyukjin, I disagree with you on the following point because the
>>> thread started clearly with your and Sean's Apache Spark 4.0 requirement
>>> in order to move away from Scala 2.12. In addition, we also discussed
>>> another item (dropping Java 8) in another current dev thread. The vote
>>> scope and goal are clear and specific.
>>>
>>> > we're unclear on the picture of Spark 4.0.0.
>>>
>>> Instead of the vote scope and result, what is really unclear is what
>>> you propose here. If Xiao wants a preview, Xiao can propose the preview
>>> plan in more detail. It's welcome. If you have many 4.0 dev ideas which are
>>> not exposed to the community yet, please share them with the community.
>>> It's welcome, too. Apache Spark is an open source community. If you don't
>>> share it, there is no way for us to know what you want.
>>>
>>> Dongjoon
>>>
>>> On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
>>> > The major concerns raised in the thread were that we should initiate the
>>> > discussion for the below first:
>>> > - Apache Spark 4.0.0 Preview (and Dates)
>>> > - Apache Spark 4.0.0 Items
>>> > - Apache Spark 4.0.0 Plan Adjustment
>>> >
>>> > before setting the timeline for Spark 4.0.0 because we're unclear on the
>>> > picture of Spark 4.0.0. So discussing the 4.0.0 timeline first is the
>>> > opposite order procedurally.
>>> > The vote passed as a procedural issue, but I would prefer to consider this
>>> > as a tentative date, and we would probably need another vote to adjust the
>>> > date considering the plans, preview dates, and items we aim for 4.0.0.
>>> >
>>> >
>>> > On Sat, 17 Jun 2023 at 04:33, Dongjoon 

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-20 Thread Wenchen Fan
In an ideal world, every data source you want to connect to already has a
Spark data source implementation (either v1 or v2), and this Python API would
be useless. But I feel it's common that people want to do quick data
exploration, and the target data system is not popular enough to have an
existing Spark data source implementation. It will be useful if people can
quickly implement a Spark data source in their favorite language, Python.

I'm +1 to this proposal, assuming that we will keep it simple and won't
copy all the complicated features we built in DS v2 to this new Python API.
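
To make the "quick data exploration" use case concrete, a Python-only data
source might look roughly like the toy sketch below. This is purely
illustrative: the class name, method names, and the idea of a DDL schema
string plus per-partition row generators are assumptions for discussion, not
the API proposed in the SPIP or any existing Spark interface.

from typing import Iterator, Tuple


class InMemoryOrdersSource:
    """A toy 'data source' a Python developer could write without touching Scala:
    it describes its schema and yields rows for each logical partition."""

    def schema(self) -> str:
        # A DDL-formatted schema string is one simple way to describe columns.
        return "order_id INT, amount DOUBLE"

    def partitions(self) -> list:
        # Two logical partitions so a reader could parallelize the scan.
        return [0, 1]

    def read(self, partition: int) -> Iterator[Tuple[int, float]]:
        # Yield plain Python tuples matching the schema above; in a real source
        # this is where an HTTP call, file read, or client library would go.
        base = partition * 100
        for i in range(3):
            yield (base + i, round(i * 9.99, 2))


if __name__ == "__main__":
    source = InMemoryOrdersSource()
    for p in source.partitions():
        for row in source.read(p):
            print(p, row)
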

On Tue, Jun 20, 2023 at 2:11 PM Maciej  wrote:

> Similarly to Jacek, I feel it fails to document an actual community need
> for such a feature.
>
> Currently, any data source implementation has the potential to benefit
> Spark users across all supported and third-party clients.  For generally
> available sources, this is advantageous for the whole Spark community and
> avoids creating 1st and 2nd-tier citizens. This is even more important with
> new officially supported languages being added through connect.
>
> Instead, we might rather document in detail the process of implementing a
> new source using current APIs and work towards easily extensible or
> customizable sources, in case there is such a need.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
>
> On 6/20/23 05:19, Hyukjin Kwon wrote:
>
> Actually I support this idea in a way that Python developers don't have to
> learn Scala to write their own source (and separate packaging).
> This is more crucial especially when you want to write a simple data
> source that interacts with the Python ecosystem.
>
> On Tue, 20 Jun 2023 at 03:08, Denny Lee  wrote:
>
>> Slightly biased, but per my conversations - this would be awesome to
>> have!
>>
>> On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari 
>> wrote:
>>
>>> I would definitely use it - if it's available :)
>>>
>>> On Mon, 19 Jun 2023, 21:56 Jacek Laskowski,  wrote:
>>>
 Hi Allison and devs,

 Although I was against this idea at first sight (probably because I'm a
 Scala dev), I think it could work as long as there are people who'd be
 interested in such an API. Were there any? I'm just curious. I've seen no
 emails requesting it.

 I also doubt that Python devs would like to work on new data sources
 but support their wishes wholeheartedly :)

 Pozdrawiam,
 Jacek Laskowski
 
 "The Internals Of" Online Books 
 Follow me on https://twitter.com/jaceklaskowski

 


 On Fri, Jun 16, 2023 at 6:14 AM Allison Wang wrote:

> Hi everyone,
>
> I would like to start a discussion on “Python Data Source API”.
>
> This proposal aims to introduce a simple API in Python for Data
> Sources. The idea is to enable Python developers to create data sources
> without having to learn Scala or deal with the complexities of the current
> data source APIs. The goal is to make a Python-based API that is simple and
> easy to use, thus making Spark more accessible to the wider Python
> developer community. This proposed approach is based on the recently
> introduced Python user-defined table functions with extensions to support
> data sources.
>
> *SPIP Doc*:
> https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing
>
> *SPIP JIRA*: https://issues.apache.org/jira/browse/SPARK-44076
>
> Looking forward to your feedback.
>
> Thanks,
> Allison
>

>
>


unsubscribe

2023-06-20 Thread Bhargava Sukkala
-- 
Thanks,
Bhargava Sukkala.
Cell no:216-278-1066
MS in Business Analytics,
Arizona State University.


Re: [VOTE] Apache Spark PMC asks Databricks to differentiate its Spark version string

2023-06-20 Thread Maciej

+0

A PMC member raised a justified concern regarding the Apache Spark 
trademark usage. Based on the linked discussion on @legal, that opinion 
seems to be weakly supported by the ASF Legal Affairs Assistant V.P.


As such, it shouldn't just be rejected, especially not because of our 
preference for discussion to be held in private or because it doesn't 
address a general case. In fact, given the special role that the alleged 
violator has in the project community (and a low potential for harm), we 
might prefer it to be public to avoid accusations of bias and conflicts 
of interest, although it is worth pointing out that it goes against an 
explicit ASF policy 
(https://www.apache.org/foundation/marks/reporting.html):


"It is a best practice to use the private@/projectname/.apache.org 
mailing list to discuss any reports of potential infringement first"


However, in the case of any trademark, license, or contract violation, 
it is important not only to establish that there is a legal basis for 
action but also to document the actual harm that it causes.  For 
trademark violations, we're cornered primarily with customer (user) 
harm, and it hasn't been shown here or in the other thread that such a 
risk exists.


That being said, PMC asking a company to clearly indicate a modified 
version of the software is a soft action considering the alternative 
(passing the case to the ASF branding), which we are required to use in 
case a polite request fails.


On 6/19/23 06:59, Hyukjin Kwon wrote:
With the spirit of open source, -1. At least there have been other 
cases mentioned in the discussion thread, and solely doing it for one 
specific vendor would not solve the problem, and I wouldn't also 
expect to cast a vote for each case publicly.
I would prefer to start this in the narrower scope, for example, 
contacting the vendor first and/or starting from a private mailing 
list instead of publicly raising this in the dev mailing list.



On Sat, 17 Jun 2023 at 07:22, Dongjoon Hyun  
wrote:


Here are my replies, Sean.

> Since we're here, fine: I vote -1, simply because this states no
reason for the action at all.

Thank you for your explicit vote because
this vote was explicitly triggered by this controversial comment,
"I do not see some police action from the PMC must follow".


> I would again ask we not simply repeat the same thread again.

We are in the next stage after the previous discussion, which identified
our diverse perspectives. The vote is the only official way to reach
a conclusion, isn't it?


> - Relevant ASF policy seems to say this is fine,
> as argued at
https://lists.apache.org/thread/p15tc772j9qwyvn852sh8ksmzrol9cof

I already disagreed with the above point, "this is fine", at
https://lists.apache.org/thread/crp01jg4wr27w10mc9dsbsogxm1qj6co .


> - There is no argument any of this has caused a problem
> for the community anyway

Shall we focus on the legal scope in this vote, because we are
talking about ASF branding policy? For the record, the above
perspective implies that the
Apache Spark PMC should ignore the ASF branding policy.


> Given that this has stopped being about ASF policy, ...

I want to emphasize that this statement vote is only about
Apache Spark PMC's stance ("Ask or not Ask").
If the vote decides not to ask, that's it.


Dongjoon.


On Fri, Jun 16, 2023 at 2:23 PM Sean Owen  wrote:

On Fri, Jun 16, 2023 at 3:58 PM Dongjoon Hyun
 wrote:

I started the thread about already publicly visible
version issues according to the ASF PMC communication
guideline. It's no confidential, personal, or
security-related stuff. Are you insisting this is
confidential?


Discussion about a particular company should be on private@ -
this is IMHO like "personnel matters", in the doc you link.
The principle is that discussing whether an entity is doing
something right or wrong is better in private, because, hey,
if the conclusion is "nothing's wrong here" then you avoid
disseminating any implication to the contrary.

I agreed with you, there's some value in discussing the
general issue on dev@. (I even said who the company was,
though, it was I think clear before)

But, your thread title here is: "Apache Spark PMC asks
Databricks to differentiate its Spark version string"
(You separately claim this vote is about whether the PMC has a
role here, but, that's plainly not how this thread begins.)

Given that this has stopped being about ASF policy, and seems
to be about taking some action related to a company, I find it
inappropriate again for dev@, for exactly the reason I gave
above. We have a PMC member repeating this claim over and
over, without support. This is why we don't do this in 

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-20 Thread Maciej
Similarly to Jacek, I feel it fails to document an actual community need 
for such a feature.


Currently, any data source implementation has the potential to benefit 
Spark users across all supported and third-party clients. For generally 
available sources, this is advantageous for the whole Spark community 
and avoids creating 1st and 2nd-tier citizens. This is even more 
important with new officially supported languages being added through 
connect.


Instead, we might rather document in detail the process of implementing 
a new source using current APIs and work towards easily extensible or 
customizable sources, in case there is such a need.


--
Best regards,
Maciej Szymkiewicz

Web:https://zero323.net
PGP: A30CEF0C31A501EC


On 6/20/23 05:19, Hyukjin Kwon wrote:
Actually I support this idea in a way that Python developers don't 
have to learn Scala to write their own source (and separate packaging).
This is more crucial especially when you want to write a simple data 
source that interacts with the Python ecosystem.


On Tue, 20 Jun 2023 at 03:08, Denny Lee  wrote:

Slightly biased, but per my conversations - this would be awesome
to have!

On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari
 wrote:

I would definitely use it - if it's available :)

On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, 
wrote:

Hi Allison and devs,

Although I was against this idea at first sight (probably
because I'm a Scala dev), I think it could work as long as
there are people who'd be interested in such an API. Were
there any? I'm just curious. I've seen no emails
requesting it.

I also doubt that Python devs would like to work on new
data sources but support their wishes wholeheartedly :)

Pozdrawiam,
Jacek Laskowski

"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Fri, Jun 16, 2023 at 6:14 AM Allison Wang
 wrote:

Hi everyone,

I would like to start a discussion on “Python Data
Source API”.

This proposal aims to introduce a simple API in Python
for Data Sources. The idea is to enable Python
developers to create data sources without having to
learn Scala or deal with the complexities of the
current data source APIs. The goal is to make a
Python-based API that is simple and easy to use, thus
making Spark more accessible to the wider Python
developer community. This proposed approach is based
on the recently introduced Python user-defined table
functions with extensions to support data sources.

*SPIP Doc*:

https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing


*SPIP JIRA*:
https://issues.apache.org/jira/browse/SPARK-44076

Looking forward to your feedback.

Thanks,
Allison





