Re: [VOTE] Release Spark 3.5.2 (RC4)

2024-07-26 Thread Dongjoon Hyun
+1

Thank you, Kent.

Dongjoon.

On Fri, Jul 26, 2024 at 6:37 AM Kent Yao  wrote:

> Hi dev,
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.2.
>
> The vote is open until Jul 29, 14:00:00 UTC, and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.2-rc4 (commit
> 1edbddfadeb46581134fa477d35399ddc63b7163):
> https://github.com/apache/spark/tree/v3.5.2-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1460/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-docs/
>
> The list of bug fixes going into 3.5.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353980
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install
> the current RC via "pip install
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-bin/pyspark-3.5.2.tar.gz
> "
> and see if anything important breaks.
> In Java/Scala, you can add the staging repository to your project's
> resolvers and test with the RC (make sure to clean up the artifact
> cache before/after so you don't end up building with an out-of-date RC
> going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.5.2?
> ===
>
> The current list of open tickets targeted at 3.5.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for
> "Target Version/s" = 3.5.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Kent Yao
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
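
As a reference for testers, here is a minimal sketch of the PySpark check
described in the FAQ above. It assumes a local Python 3 and Java
installation and uses the RC4 tarball URL quoted in the email; the final
query is only an illustrative smoke test, not part of the official
instructions.

# create an isolated environment and install the RC4 PySpark tarball
python3 -m venv spark-3.5.2-rc4-test
source spark-3.5.2-rc4-test/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-bin/pyspark-3.5.2.tar.gz"

# quick smoke test: start a local session and run a trivial query
python3 -c "from pyspark.sql import SparkSession; \
spark = SparkSession.builder.master('local[2]').getOrCreate(); \
print(spark.version); \
print(spark.range(10).selectExpr('sum(id) AS total').collect()); \
spark.stop()"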


Re: [VOTE] Release Spark 3.5.2 (RC3)

2024-07-26 Thread Kent Yao
Hi all,

3.5.2-RC3 failed. 

Thank you, Dongjoon, for the feedback and quick fix.

I re-triggered the branch-3.5 daily CI job[1] and it passed.

I will start RC4 soon.

Bests,
Kent

[1] https://github.com/apache/spark/actions/runs/10092957068

On 2024/07/25 17:54:57 Dongjoon Hyun wrote:
> Hi, Kent.
> 
> Sorry, but I have to cast -1 on RC3.
> 
> Unlike in RC0 and RC1 testing, I found that the RC3 distribution fails to
> build the PySpark Docker image.
>
> This is due to an OS change in the external Java 17 Docker image (which
> happened two days ago), not to the Spark binaries.
> 
> You can see the recent failures in branch-3.5 and branch-3.4 daily CIs, too.
> 
> - https://github.com/apache/spark/actions/workflows/build_branch35.yml
> - https://github.com/apache/spark/actions/workflows/build_branch34.yml
> 
> The patch has now landed in all live release branches (branch-3.5 and
> branch-3.4).
> 
> https://issues.apache.org/jira/browse/SPARK-49005
> [SPARK-49005][K8S][3.5] Use 17-jammy tag instead of 17 to prevent Python
> 3.12
> [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent
> Python 3.12
> 
> FYI, Python 3.12 support was added only in Apache Spark 4.0.0, so the
> `master` branch is not affected.
> 
> Dongjoon.
> 
> 
> On Thu, Jul 25, 2024 at 6:06 AM Kent Yao  wrote:
> 
> > Hi dev,
> >
> > Please vote on releasing the following candidate as Apache Spark version
> > 3.5.2.
> >
> > The vote is open until Jul 28, 13:00:00 UTC, and passes if a majority +1
> > PMC votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.5.2
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/
> >
> > The tag to be voted on is v3.5.2-rc3 (commit
> > ebda6a6a97bf0b3932b970801f4c2f5dc6ae81d4):
> > https://github.com/apache/spark/tree/v3.5.2-rc3
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc3-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1459/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc3-docs/
> >
> > The list of bug fixes going into 3.5.2 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12353980
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC via "pip install
> >
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc3-bin/pyspark-3.5.2.tar.gz
> > "
> > and see if anything important breaks.
> > In the Java/Scala, you can add the staging repository to your projects
> > resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.5.2?
> > ===
> >
> > The current list of open tickets targeted at 3.5.2 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for
> > "Target Version/s" = 3.5.2
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> > Thanks,
> > Kent Yao
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.5.2 (RC3)

2024-07-25 Thread Dongjoon Hyun
Hi, Kent.

Sorry, but I have to cast -1 on RC3.

Unlike in RC0 and RC1 testing, I found that the RC3 distribution fails to
build the PySpark Docker image.

This is due to an OS change in the external Java 17 Docker image (which
happened two days ago), not to the Spark binaries.

You can see the recent failures in branch-3.5 and branch-3.4 daily CIs, too.

- https://github.com/apache/spark/actions/workflows/build_branch35.yml
- https://github.com/apache/spark/actions/workflows/build_branch34.yml

The patch has now landed in all live release branches (branch-3.5 and
branch-3.4).

https://issues.apache.org/jira/browse/SPARK-49005
[SPARK-49005][K8S][3.5] Use 17-jammy tag instead of 17 to prevent Python
3.12
[SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent
Python 3.12

FYI, Python 3.12 support was added only in Apache Spark 4.0.0, so the
`master` branch is not affected.

Dongjoon.
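
For reference, the failing PySpark image build can be reproduced roughly as
follows from the root of an extracted RC binary distribution. This is a
sketch based on the standard docker-image-tool.sh usage, not the exact
command run by the daily CI; the repository name and tag are placeholders.

# from the root of an extracted spark-3.5.2-bin-hadoop3 RC distribution
./bin/docker-image-tool.sh \
  -r local/spark-test -t 3.5.2-rc3 \
  -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
  build

The SPARK-49005 patches referenced above pin the Java 17 base image tag
(`17-jammy`) so that the image's bundled Python does not move to 3.12.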


On Thu, Jul 25, 2024 at 6:06 AM Kent Yao  wrote:

> Hi dev,
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.2.
>
> The vote is open until Jul 28, 13:00:00 UTC, and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.2-rc3 (commit
> ebda6a6a97bf0b3932b970801f4c2f5dc6ae81d4):
> https://github.com/apache/spark/tree/v3.5.2-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1459/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc3-docs/
>
> The list of bug fixes going into 3.5.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353980
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc3-bin/pyspark-3.5.2.tar.gz
> "
> and see if anything important breaks.
> In the Java/Scala, you can add the staging repository to your projects
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.5.2?
> ===
>
> The current list of open tickets targeted at 3.5.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for
> "Target Version/s" = 3.5.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Kent Yao
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-25 Thread Kent Yao
Thank you, everyone, for the votes.

RC2 failed. Preparing v3.5.2-rc3

Kent

Wenchen Fan  wrote on Thu, Jul 25, 2024 at 17:04:
>
> I'm changing my vote to -1 as we found a regression that breaks Delta Lake's 
> generated column feature. The fix was merged just now: 
> https://github.com/apache/spark/pull/47483
>
> Can we cut a new RC?
>
> On Thu, Jul 25, 2024 at 3:13 PM Mridul Muralidharan  wrote:
>>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Phive -Pyarn -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>>
>> On Tue, Jul 23, 2024 at 9:51 PM Kent Yao  wrote:
>>>
>>> +1(non-binding), I have checked:
>>>
>>> - Download links are OK
>>> - Signatures, Checksums, and the KEYS file are OK
>>> - LICENSE and NOTICE are present
>>> - No unexpected binary files in source releases
>>> - Successfully built from source
>>>
>>> Thanks,
>>> Kent Yao
>>>
>>> On 2024/07/23 06:55:28 yangjie01 wrote:
>>> > +1, Thanks Kent Yao ~
>>> >
>>> > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
>>> >
>>> >
>>> > Hi dev,
>>> >
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark version 
>>> > 3.5.2.
>>> >
>>> >
>>> > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority 
>>> > +1
>>> > PMC votes are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> >
>>> > [ ] +1 Release this package as Apache Spark 3.5.2
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> >
>>> > To learn more about Apache Spark, please see https://spark.apache.org/ 
>>> > 
>>> >
>>> >
>>> > The tag to be voted on is v3.5.2-rc2 (commit
>>> > 6d8f511430881fa7a3203405260da174df424103):
>>> > https://github.com/apache/spark/tree/v3.5.2-rc2 
>>> > 
>>> >
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ 
>>> > 
>>> >
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS 
>>> > 
>>> >
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1458/ 
>>> > 
>>> >
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ 
>>> > 
>>> >
>>> >
>>> > The list of bug fixes going into 3.5.2 can be found at the following URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12353980 
>>> > 
>>> >
>>> >
>>> > FAQ
>>> >
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC via "pip install
>>> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz;
>>> >  
>>> > 
>>> > and see if anything important breaks.
>>> > In the Java/Scala, you can add the staging repository to your projects
>>> > resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out of date RC going forward).
>>> >
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 3.5.2?
>>> > ===
>>> >
>>> >
>>> > The current list of open tickets targeted at 3.5.2 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK 
>>> >  and search for
>>> > "Target Version/s" = 3.5.2
>>> >
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>> >
>>> >
>>> > Thanks,
>>> > Kent Yao
>>> >
>>> >

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-25 Thread Wenchen Fan
I'm changing my vote to -1 as we found a regression that breaks Delta
Lake's generated column feature. The fix was merged just now:
https://github.com/apache/spark/pull/47483

Can we cut a new RC?

On Thu, Jul 25, 2024 at 3:13 PM Mridul Muralidharan 
wrote:

>
> +1
>
> Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Phive -Pyarn -Pkubernetes
>
> Regards,
> Mridul
>
>
> On Tue, Jul 23, 2024 at 9:51 PM Kent Yao  wrote:
>
>> +1(non-binding), I have checked:
>>
>> - Download links are OK
>> - Signatures, Checksums, and the KEYS file are OK
>> - LICENSE and NOTICE are present
>> - No unexpected binary files in source releases
>> - Successfully built from source
>>
>> Thanks,
>> Kent Yao
>>
>> On 2024/07/23 06:55:28 yangjie01 wrote:
>> > +1, Thanks Kent Yao ~
>> >
>> > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
>> >
>> >
>> > Hi dev,
>> >
>> >
>> > Please vote on releasing the following candidate as Apache Spark
>> version 3.5.2.
>> >
>> >
>> > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a
>> majority +1
>> > PMC votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> >
>> > [ ] +1 Release this package as Apache Spark 3.5.2
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > To learn more about Apache Spark, please see https://spark.apache.org/
>> 
>> >
>> >
>> > The tag to be voted on is v3.5.2-rc2 (commit
>> > 6d8f511430881fa7a3203405260da174df424103):
>> > https://github.com/apache/spark/tree/v3.5.2-rc2 <
>> https://github.com/apache/spark/tree/v3.5.2-rc2>
>> >
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
>> >
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS <
>> https://dist.apache.org/repos/dist/dev/spark/KEYS>
>> >
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1458/
>> 
>> >
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
>> >
>> >
>> > The list of bug fixes going into 3.5.2 can be found at the following
>> URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
>> https://issues.apache.org/jira/projects/SPARK/versions/12353980>
>> >
>> >
>> > FAQ
>> >
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC via "pip install
>> >
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz;
>> <
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
>> ;>
>> > and see if anything important breaks.
>> > In the Java/Scala, you can add the staging repository to your projects
>> > resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with an out of date RC going forward).
>> >
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 3.5.2?
>> > ===
>> >
>> >
>> > The current list of open tickets targeted at 3.5.2 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK <
>> https://issues.apache.org/jira/projects/SPARK> and search for
>> > "Target Version/s" = 3.5.2
>> >
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>> >
>> >
>> > Thanks,
>> > Kent Yao
>> >
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>> >
>> >
>> >
>> >
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-25 Thread Mridul Muralidharan
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Phive -Pyarn -Pkubernetes

Regards,
Mridul
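
(For reference, a rough sketch of how such a check can be reproduced,
assuming gpg, curl, git, and a JDK are available locally. The file names
below are inferred from the dist URL layout in this thread, and the build
command is only one way to exercise the -Phive -Pyarn -Pkubernetes
profiles; it is not necessarily the exact invocation used above.)

# import the Spark release keys, then verify the signature and checksum
curl -O https://dist.apache.org/repos/dist/dev/spark/KEYS
gpg --import KEYS
BASE=https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin
curl -O $BASE/spark-3.5.2.tgz
curl -O $BASE/spark-3.5.2.tgz.asc
curl -O $BASE/spark-3.5.2.tgz.sha512
gpg --verify spark-3.5.2.tgz.asc spark-3.5.2.tgz
shasum -a 512 spark-3.5.2.tgz        # compare against the .sha512 file
cat spark-3.5.2.tgz.sha512

# check out the tag and build with the same profiles (tests can follow)
git clone --branch v3.5.2-rc2 --depth 1 https://github.com/apache/spark.git
cd spark
./build/mvn -DskipTests -Phive -Pyarn -Pkubernetes clean package

# optionally, resolve an artifact from the staging repository
./build/mvn org.apache.maven.plugins:maven-dependency-plugin:3.6.1:get \
  -Dartifact=org.apache.spark:spark-sql_2.12:3.5.2 \
  -DremoteRepositories=https://repository.apache.org/content/repositories/orgapachespark-1458/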


On Tue, Jul 23, 2024 at 9:51 PM Kent Yao  wrote:

> +1(non-binding), I have checked:
>
> - Download links are OK
> - Signatures, Checksums, and the KEYS file are OK
> - LICENSE and NOTICE are present
> - No unexpected binary files in source releases
> - Successfully built from source
>
> Thanks,
> Kent Yao
>
> On 2024/07/23 06:55:28 yangjie01 wrote:
> > +1, Thanks Kent Yao ~
> >
> > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> >
> >
> > Hi dev,
> >
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.5.2.
> >
> >
> > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority
> +1
> > PMC votes are cast, with
> > a minimum of 3 +1 votes.
> >
> >
> > [ ] +1 Release this package as Apache Spark 3.5.2
> > [ ] -1 Do not release this package because ...
> >
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/ <
> https://spark.apache.org/>
> >
> >
> > The tag to be voted on is v3.5.2-rc2 (commit
> > 6d8f511430881fa7a3203405260da174df424103):
> > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> https://github.com/apache/spark/tree/v3.5.2-rc2>
> >
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> >
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> https://dist.apache.org/repos/dist/dev/spark/KEYS>
> >
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1458/
> 
> >
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> >
> >
> > The list of bug fixes going into 3.5.2 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> >
> >
> > FAQ
> >
> >
> > =
> > How can I help test this release?
> > =
> >
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC via "pip install
> >
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz;
> <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> ;>
> > and see if anything important breaks.
> > In the Java/Scala, you can add the staging repository to your projects
> > resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
> >
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.5.2?
> > ===
> >
> >
> > The current list of open tickets targeted at 3.5.2 can be found at:
> > https://issues.apache.org/jira/projects/SPARK <
> https://issues.apache.org/jira/projects/SPARK> and search for
> > "Target Version/s" = 3.5.2
> >
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> > Thanks,
> > Kent Yao
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> >
> >
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-24 Thread Kent Yao
Hey Venki,

I'll extend the deadline to Jul 27.

Good luck.

Kent

On 2024/07/25 01:08:02 Venki Korukanti wrote:
> Hi Kent,
> 
> Is it possible to extend the voting by a couple more days? In Delta
> Lake, we tested the 3.5.2 RC2 jars and saw that they have regressed
> around 37 test cases. All of them involve a generated column that is
> also a partition column. The partition value is read correctly from the
> Delta metadata and given to `PartitionedFile`, and the Parquet reader's
> output column vector for the partition column has the correct values,
> but somewhere the partition value becomes null. Still debugging the RCA.
> 
> Thanks
> Venki
> 
> On Tue, Jul 23, 2024 at 7:51 PM Kent Yao  wrote:
> 
> > +1(non-binding), I have checked:
> >
> > - Download links are OK
> > - Signatures, Checksums, and the KEYS file are OK
> > - LICENSE and NOTICE are present
> > - No unexpected binary files in source releases
> > - Successfully built from source
> >
> > Thanks,
> > Kent Yao
> >
> > On 2024/07/23 06:55:28 yangjie01 wrote:
> > > +1, Thanks Kent Yao ~
> > >
> > > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> > >
> > >
> > > Hi dev,
> > >
> > >
> > > Please vote on releasing the following candidate as Apache Spark version
> > 3.5.2.
> > >
> > >
> > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority
> > +1
> > > PMC votes are cast, with
> > > a minimum of 3 +1 votes.
> > >
> > >
> > > [ ] +1 Release this package as Apache Spark 3.5.2
> > > [ ] -1 Do not release this package because ...
> > >
> > >
> > > To learn more about Apache Spark, please see https://spark.apache.org/ <
> > https://spark.apache.org/>
> > >
> > >
> > > The tag to be voted on is v3.5.2-rc2 (commit
> > > 6d8f511430881fa7a3203405260da174df424103):
> > > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> > https://github.com/apache/spark/tree/v3.5.2-rc2>
> > >
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> > >
> > >
> > > Signatures used for Spark RCs can be found in this file:
> > > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> > https://dist.apache.org/repos/dist/dev/spark/KEYS>
> > >
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapachespark-1458/
> > 
> > >
> > >
> > > The documentation corresponding to this release can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> > >
> > >
> > > The list of bug fixes going into 3.5.2 can be found at the following URL:
> > > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> > https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> > >
> > >
> > > FAQ
> > >
> > >
> > > =
> > > How can I help test this release?
> > > =
> > >
> > >
> > > If you are a Spark user, you can help us test this release by taking
> > > an existing Spark workload and running on this release candidate, then
> > > reporting any regressions.
> > >
> > >
> > > If you're working in PySpark you can set up a virtual env and install
> > > the current RC via "pip install
> > >
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz;
> > <
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> > ;>
> > > and see if anything important breaks.
> > > In the Java/Scala, you can add the staging repository to your projects
> > > resolvers and test
> > > with the RC (make sure to clean up the artifact cache before/after so
> > > you don't end up building with an out of date RC going forward).
> > >
> > >
> > > ===
> > > What should happen to JIRA tickets still targeting 3.5.2?
> > > ===
> > >
> > >
> > > The current list of open tickets targeted at 3.5.2 can be found at:
> > > https://issues.apache.org/jira/projects/SPARK <
> > https://issues.apache.org/jira/projects/SPARK> and search for
> > > "Target Version/s" = 3.5.2
> > >
> > >
> > > Committers should look at those and triage. Extremely important bug
> > > fixes, documentation, and API tweaks that impact compatibility should
> > > be worked on immediately. Everything else please retarget to an
> > > appropriate release.
> > >
> > >
> > > ==
> > > But my bug isn't fixed?
> > > ==
> > >
> > >
> > > In order to make timely releases, we will typically not hold the
> > > release unless the bug in question is a regression from the previous
> > > 

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-24 Thread Hyukjin Kwon
Just FWIW, Spark remains as Spark. We just refer to "Spark without Spark
Connect" in the documentation as "Spark Classic" for clarification. I think
it won't be excessively used.

On Thu, 25 Jul 2024 at 03:59, Holden Karau  wrote:

> I'm concerned about the term "Classic" bringing a negative connotation to
> it.
>
> On Mon, Jul 22, 2024 at 5:11 PM Hyukjin Kwon  wrote:
>
>> Yeah that's what I intended. Thanks for clarification.
>>
>> Let me start the vote
>>
>>
>> On Tue, 23 Jul 2024 at 08:14, Sadha Chilukoori 
>> wrote:
>>
>>> Hi Dongjoon,
>>>
>>> *To be clear, is the proposal aiming to make us say A instead of B in
>>> our documentation?*
>>>
>>> *A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>>> Classic` mode instead.*
>>> *B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>>> without Spark Connect` mode instead*.
>>>
>>>
>>> Correct, the thread is recommending to use option A, consistently in all
>>> the documentation.
>>>
>>> -Sadha
>>>
>>> On Mon, Jul 22, 2024, 10:25 AM Dongjoon Hyun 
>>> wrote:
>>>
 Thank you for opening this thread, Hyukjin.

 In this discussion thread, we have three terminologies, (1) ~ (3).

 > Spark Classic (vs. Spark Connect)

 1. Spark
 2. Spark Classic (= A proposal for Spark without Spark Connect)
 3. Spark Connect

 As Holden and Jungtaek mentioned,

 - (1) is definitely the existing code base, which includes everything
 (RDD API, Spark Thrift Server, Spark Connect, and so on).

 - (3) is a very specific use case for a user, when a Spark binary
 distribution is used with the `--remote` option (or with the related
 features enabled). Like the Spark Thrift Server, after the query planning
 steps there is no fundamental difference on the execution side in Spark
 clusters or Spark jobs.

 - (2) By the proposed definition, (2) `Spark Classic` is not (1)
 `Spark`. Like `--remote`, it's one of the runnable modes.

 To be clear, is the proposal aiming to make us say A instead of B in our
 documentation?

 A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
 Classic` mode instead.
 B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
 without Spark Connect` mode instead.

 Dongjoon.



 On 2024/07/22 12:59:54 Sadha Chilukoori wrote:
 > +1  (non-binding) for classic.
 >
 > On Mon, Jul 22, 2024 at 3:59 AM Martin Grund
 
 > wrote:
 >
 > > +1 for classic. It's simple, easy to understand, and it doesn't have
 > > the negative meaning of something like "legacy".
 > >
 > > On Sun, Jul 21, 2024 at 23:48 Wenchen Fan 
 wrote:
 > >
 > >> Classic SGTM.
 > >>
 > >> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
 > >> kabhwan.opensou...@gmail.com> wrote:
 > >>
 > >>> I'd propose not to change the name of "Spark Connect" - the name
 > >>> represents the characteristic of the mode (separation of the client
 > >>> and server layers). Trying to remove the "Connect" part would just
 > >>> cause confusion.
 > >>>
 > >>> +1 for Classic to existing mode, till someone comes up with better
 > >>> alternatives.
 > >>>
 > >>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon <
 gurwls...@apache.org>
 > >>> wrote:
 > >>>
 >  I was thinking about a similar option too but I ended up giving this
 >  up .. It's quite unlikely at this moment but suppose that we have
 >  another Spark Connect-ish component in the far future and it would be
 >  challenging to come up with another name ... Another case is that we
 >  might have to cope with the cases like Spark Connect, vs Spark (with
 >  Spark Connect) and Spark (without Spark Connect) ..
 > 
 >  On Sun, 21 Jul 2024 at 09:59, Holden Karau <
 holden.ka...@gmail.com>
 >  wrote:
 > 
 > > I think perhaps Spark Connect could be phrased as “Basic* Spark” &
 > > existing Spark could be “Full Spark” given the API limitations of
 > > Spark connect.
 > >
 > > *I was also thinking Core here but we’ve used core to refer to the
 > > RDD APIs for too long to reuse it here.
 > >
 > > Twitter: https://twitter.com/holdenkarau
 > > Books (Learning Spark, High Performance Spark, etc.):
 > > https://amzn.to/2MaRAG9  
 > > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
 > >
 > >
 > > On Sat, Jul 20, 2024 at 8:02 PM Xiao Li 
 wrote:
 > >
 > >> Classic is much better than Legacy. : )
 > >>
 > >> Hyukjin Kwon  wrote on Thu, Jul 18, 2024 at 16:58:
 > >>
 > >>> Hi all,
 > >>>
 > >>> I noticed that we need to standardize 

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-24 Thread Venki Korukanti
Hi Kent,

Is it possible to extend the voting by a couple more days? In Delta
Lake, we tested the 3.5.2 RC2 jars and saw that they have regressed
around 37 test cases. All of them involve a generated column that is
also a partition column. The partition value is read correctly from the
Delta metadata and given to `PartitionedFile`, and the Parquet reader's
output column vector for the partition column has the correct values,
but somewhere the partition value becomes null. Still debugging the RCA.

Thanks
Venki

On Tue, Jul 23, 2024 at 7:51 PM Kent Yao  wrote:

> +1(non-binding), I have checked:
>
> - Download links are OK
> - Signatures, Checksums, and the KEYS file are OK
> - LICENSE and NOTICE are present
> - No unexpected binary files in source releases
> - Successfully built from source
>
> Thanks,
> Kent Yao
>
> On 2024/07/23 06:55:28 yangjie01 wrote:
> > +1, Thanks Kent Yao ~
> >
> > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> >
> >
> > Hi dev,
> >
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.5.2.
> >
> >
> > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority
> +1
> > PMC votes are cast, with
> > a minimum of 3 +1 votes.
> >
> >
> > [ ] +1 Release this package as Apache Spark 3.5.2
> > [ ] -1 Do not release this package because ...
> >
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/ <
> https://spark.apache.org/>
> >
> >
> > The tag to be voted on is v3.5.2-rc2 (commit
> > 6d8f511430881fa7a3203405260da174df424103):
> > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> https://github.com/apache/spark/tree/v3.5.2-rc2>
> >
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> >
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> https://dist.apache.org/repos/dist/dev/spark/KEYS>
> >
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1458/
> 
> >
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> >
> >
> > The list of bug fixes going into 3.5.2 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> >
> >
> > FAQ
> >
> >
> > =
> > How can I help test this release?
> > =
> >
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC via "pip install
> >
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz;
> <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> ;>
> > and see if anything important breaks.
> > In the Java/Scala, you can add the staging repository to your projects
> > resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
> >
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.5.2?
> > ===
> >
> >
> > The current list of open tickets targeted at 3.5.2 can be found at:
> > https://issues.apache.org/jira/projects/SPARK <
> https://issues.apache.org/jira/projects/SPARK> and search for
> > "Target Version/s" = 3.5.2
> >
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> > Thanks,
> > Kent Yao
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> >

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-24 Thread Holden Karau
I'm concerned about the term "Classic" bringing a negative connotation to
it.

On Mon, Jul 22, 2024 at 5:11 PM Hyukjin Kwon  wrote:

> Yeah that's what I intended. Thanks for clarification.
>
> Let me start the vote
>
>
> On Tue, 23 Jul 2024 at 08:14, Sadha Chilukoori 
> wrote:
>
>> Hi Dongjoon,
>>
>> *To be clear, is the proposal aiming to make us say A instead of B in
>> our documentation?*
>>
>> *A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>> Classic` mode instead.*
>> *B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>> without Spark Connect` mode instead*.
>>
>>
>> Correct, the thread is recommending to use option A, consistently in all
>> the documentation.
>>
>> -Sadha
>>
>> On Mon, Jul 22, 2024, 10:25 AM Dongjoon Hyun  wrote:
>>
>>> Thank you for opening this thread, Hyukjin.
>>>
>>> In this discussion thread, we have three terminologies, (1) ~ (3).
>>>
>>> > Spark Classic (vs. Spark Connect)
>>>
>>> 1. Spark
>>> 2. Spark Classic (= A proposal for Spark without Spark Connect)
>>> 3. Spark Connect
>>>
>>> As Holden and Jungtaek mentioned,
>>>
>>> - (1) is definitely the existing code base, which includes everything
>>> (RDD API, Spark Thrift Server, Spark Connect, and so on).
>>>
>>> - (3) is a very specific use case for a user, when a Spark binary
>>> distribution is used with the `--remote` option (or with the related
>>> features enabled). Like the Spark Thrift Server, after the query planning
>>> steps there is no fundamental difference on the execution side in Spark
>>> clusters or Spark jobs.
>>>
>>> - (2) By the proposed definition, (2) `Spark Classic` is not (1)
>>> `Spark`. Like `--remote`, it's one of the runnable modes.
>>>
>>> To be clear, is the proposal aiming to make us say A instead of B in our
>>> documentation?
>>>
>>> A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>>> Classic` mode instead.
>>> B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>>> without Spark Connect` mode instead.
>>>
>>> Dongjoon.
>>>
>>>
>>>
>>> On 2024/07/22 12:59:54 Sadha Chilukoori wrote:
>>> > +1  (non-binding) for classic.
>>> >
>>> > On Mon, Jul 22, 2024 at 3:59 AM Martin Grund
>>> 
>>> > wrote:
>>> >
>>> > > +1 for classic. It's simple, easy to understand, and it doesn't have
>>> > > the negative meaning of something like "legacy".
>>> > >
>>> > > On Sun, Jul 21, 2024 at 23:48 Wenchen Fan 
>>> wrote:
>>> > >
>>> > >> Classic SGTM.
>>> > >>
>>> > >> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
>>> > >> kabhwan.opensou...@gmail.com> wrote:
>>> > >>
>>> > >>> I'd propose not to change the name of "Spark Connect" - the name
>>> > >>> represents the characteristic of the mode (separation of the client
>>> > >>> and server layers). Trying to remove the "Connect" part would just
>>> > >>> cause confusion.
>>> > >>>
>>> > >>> +1 for Classic to existing mode, till someone comes up with better
>>> > >>> alternatives.
>>> > >>>
>>> > >>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon >> >
>>> > >>> wrote:
>>> > >>>
>>> >  I was thinking about a similar option too but I ended up giving this
>>> >  up .. It's quite unlikely at this moment but suppose that we have
>>> >  another Spark Connect-ish component in the far future and it would be
>>> >  challenging to come up with another name ... Another case is that we
>>> >  might have to cope with the cases like Spark Connect, vs Spark (with
>>> >  Spark Connect) and Spark (without Spark Connect) ..
>>> > 
>>> >  On Sun, 21 Jul 2024 at 09:59, Holden Karau <
>>> holden.ka...@gmail.com>
>>> >  wrote:
>>> > 
>>> > > I think perhaps Spark Connect could be phrased as “Basic* Spark” &
>>> > > existing Spark could be “Full Spark” given the API limitations of
>>> > > Spark connect.
>>> > >
>>> > > *I was also thinking Core here but we’ve used core to refer to the
>>> > > RDD APIs for too long to reuse it here.
>>> > >
>>> > > Twitter: https://twitter.com/holdenkarau
>>> > > Books (Learning Spark, High Performance Spark, etc.):
>>> > > https://amzn.to/2MaRAG9  
>>> > > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> > >
>>> > >
>>> > > On Sat, Jul 20, 2024 at 8:02 PM Xiao Li 
>>> wrote:
>>> > >
>>> > >> Classic is much better than Legacy. : )
>>> > >>
>>> > >> Hyukjin Kwon  wrote on Thu, Jul 18, 2024 at 16:58:
>>> > >>
>>> > >>> Hi all,
>>> > >>>
>>> > >>> I noticed that we need to standardize our terminology before moving
>>> > >>> forward. For instance, when documenting, 'Spark without Spark
>>> > >>> Connect' is too long and verbose. Additionally, I've observed that we
>>> > >>> use various names for Spark without Spark Connect: Spark Classic,
>>> > >>> Classic Spark, Legacy Spark, etc.
>>> > >>>
>>> > >>> I propose that we consistently refer to it as 

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-24 Thread Peter Toth
+1

huaxin gao  wrote on Wed, Jul 24, 2024 at 11:14:

> +1
>
> On Tue, Jul 23, 2024 at 9:18 PM XiDuo You  wrote:
>
>> +1 (non-binding)
>>
>> L. C. Hsieh  wrote on Wed, Jul 24, 2024 at 11:40:
>> >
>> > +1
>> >
>> > Thanks.
>> >
>> > On Tue, Jul 23, 2024 at 8:35 PM Dongjoon Hyun 
>> wrote:
>> > >
>> > > +1
>> > >
>> > > Dongjoon.
>> > >
>> > > On 2024/07/24 03:28:58 Wenchen Fan wrote:
>> > > > +1
>> > > >
>> > > > On Wed, Jul 24, 2024 at 10:51 AM Kent Yao  wrote:
>> > > >
>> > > > > +1(non-binding), I have checked:
>> > > > >
>> > > > > - Download links are OK
>> > > > > - Signatures, Checksums, and the KEYS file are OK
>> > > > > - LICENSE and NOTICE are present
>> > > > > - No unexpected binary files in source releases
>> > > > > - Successfully built from source
>> > > > >
>> > > > > Thanks,
>> > > > > Kent Yao
>> > > > >
>> > > > > On 2024/07/23 06:55:28 yangjie01 wrote:
>> > > > > > +1, Thanks Kent Yao ~
>> > > > > >
>> > > > > > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
>> > > > > >
>> > > > > >
>> > > > > > Hi dev,
>> > > > > >
>> > > > > >
>> > > > > > Please vote on releasing the following candidate as Apache
>> Spark version
>> > > > > 3.5.2.
>> > > > > >
>> > > > > >
>> > > > > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a
>> majority
>> > > > > +1
>> > > > > > PMC votes are cast, with
>> > > > > > a minimum of 3 +1 votes.
>> > > > > >
>> > > > > >
>> > > > > > [ ] +1 Release this package as Apache Spark 3.5.2
>> > > > > > [ ] -1 Do not release this package because ...
>> > > > > >
>> > > > > >
>> > > > > > To learn more about Apache Spark, please see
>> https://spark.apache.org/ <
>> > > > > https://spark.apache.org/>
>> > > > > >
>> > > > > >
>> > > > > > The tag to be voted on is v3.5.2-rc2 (commit
>> > > > > > 6d8f511430881fa7a3203405260da174df424103):
>> > > > > > https://github.com/apache/spark/tree/v3.5.2-rc2 <
>> > > > > https://github.com/apache/spark/tree/v3.5.2-rc2>
>> > > > > >
>> > > > > >
>> > > > > > The release files, including signatures, digests, etc. can be
>> found at:
>> > > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
>> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
>> > > > > >
>> > > > > >
>> > > > > > Signatures used for Spark RCs can be found in this file:
>> > > > > > https://dist.apache.org/repos/dist/dev/spark/KEYS <
>> > > > > https://dist.apache.org/repos/dist/dev/spark/KEYS>
>> > > > > >
>> > > > > >
>> > > > > > The staging repository for this release can be found at:
>> > > > > >
>> https://repository.apache.org/content/repositories/orgapachespark-1458/
>> > > > > <
>> https://repository.apache.org/content/repositories/orgapachespark-1458/>
>> > > > > >
>> > > > > >
>> > > > > > The documentation corresponding to this release can be found at:
>> > > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
>> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
>> > > > > >
>> > > > > >
>> > > > > > The list of bug fixes going into 3.5.2 can be found at the
>> following URL:
>> > > > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980
>> <
>> > > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980>
>> > > > > >
>> > > > > >
>> > > > > > FAQ
>> > > > > >
>> > > > > >
>> > > > > > =
>> > > > > > How can I help test this release?
>> > > > > > =
>> > > > > >
>> > > > > >
>> > > > > > If you are a Spark user, you can help us test this release by
>> taking
>> > > > > > an existing Spark workload and running on this release
>> candidate, then
>> > > > > > reporting any regressions.
>> > > > > >
>> > > > > >
>> > > > > > If you're working in PySpark you can set up a virtual env and
>> install
>> > > > > > the current RC via "pip install
>> > > > > >
>> > > > >
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
>> "
>> > > > > <
>> > > > >
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
>> > > > > ;>
>> > > > > > and see if anything important breaks.
>> > > > > > In the Java/Scala, you can add the staging repository to your
>> projects
>> > > > > > resolvers and test
>> > > > > > with the RC (make sure to clean up the artifact cache
>> before/after so
>> > > > > > you don't end up building with an out of date RC going forward).
>> > > > > >
>> > > > > >
>> > > > > > ===
>> > > > > > What should happen to JIRA tickets still targeting 3.5.2?
>> > > > > > ===
>> > > > > >
>> > > > > >
>> > > > > > The current list of open tickets targeted at 3.5.2 can be found
>> at:
>> > > > > > https://issues.apache.org/jira/projects/SPARK <
>> > > > > https://issues.apache.org/jira/projects/SPARK> and search for
>> > > > > > "Target Version/s" = 3.5.2
>> > > > > >
>> > > > > >
>> > > > > > Committers should look at those and triage. Extremely 

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-24 Thread huaxin gao
+1

On Tue, Jul 23, 2024 at 9:18 PM XiDuo You  wrote:

> +1 (non-binding)
>
> L. C. Hsieh  wrote on Wed, Jul 24, 2024 at 11:40:
> >
> > +1
> >
> > Thanks.
> >
> > On Tue, Jul 23, 2024 at 8:35 PM Dongjoon Hyun 
> wrote:
> > >
> > > +1
> > >
> > > Dongjoon.
> > >
> > > On 2024/07/24 03:28:58 Wenchen Fan wrote:
> > > > +1
> > > >
> > > > On Wed, Jul 24, 2024 at 10:51 AM Kent Yao  wrote:
> > > >
> > > > > +1(non-binding), I have checked:
> > > > >
> > > > > - Download links are OK
> > > > > - Signatures, Checksums, and the KEYS file are OK
> > > > > - LICENSE and NOTICE are present
> > > > > - No unexpected binary files in source releases
> > > > > - Successfully built from source
> > > > >
> > > > > Thanks,
> > > > > Kent Yao
> > > > >
> > > > > On 2024/07/23 06:55:28 yangjie01 wrote:
> > > > > > +1, Thanks Kent Yao ~
> > > > > >
> > > > > > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> > > > > >
> > > > > >
> > > > > > Hi dev,
> > > > > >
> > > > > >
> > > > > > Please vote on releasing the following candidate as Apache Spark
> version
> > > > > 3.5.2.
> > > > > >
> > > > > >
> > > > > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a
> majority
> > > > > +1
> > > > > > PMC votes are cast, with
> > > > > > a minimum of 3 +1 votes.
> > > > > >
> > > > > >
> > > > > > [ ] +1 Release this package as Apache Spark 3.5.2
> > > > > > [ ] -1 Do not release this package because ...
> > > > > >
> > > > > >
> > > > > > To learn more about Apache Spark, please see
> https://spark.apache.org/ <
> > > > > https://spark.apache.org/>
> > > > > >
> > > > > >
> > > > > > The tag to be voted on is v3.5.2-rc2 (commit
> > > > > > 6d8f511430881fa7a3203405260da174df424103):
> > > > > > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> > > > > https://github.com/apache/spark/tree/v3.5.2-rc2>
> > > > > >
> > > > > >
> > > > > > The release files, including signatures, digests, etc. can be
> found at:
> > > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> > > > > >
> > > > > >
> > > > > > Signatures used for Spark RCs can be found in this file:
> > > > > > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> > > > > https://dist.apache.org/repos/dist/dev/spark/KEYS>
> > > > > >
> > > > > >
> > > > > > The staging repository for this release can be found at:
> > > > > >
> https://repository.apache.org/content/repositories/orgapachespark-1458/
> > > > > <
> https://repository.apache.org/content/repositories/orgapachespark-1458/>
> > > > > >
> > > > > >
> > > > > > The documentation corresponding to this release can be found at:
> > > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> > > > > >
> > > > > >
> > > > > > The list of bug fixes going into 3.5.2 can be found at the
> following URL:
> > > > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980
> <
> > > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> > > > > >
> > > > > >
> > > > > > FAQ
> > > > > >
> > > > > >
> > > > > > =
> > > > > > How can I help test this release?
> > > > > > =
> > > > > >
> > > > > >
> > > > > > If you are a Spark user, you can help us test this release by
> taking
> > > > > > an existing Spark workload and running on this release
> candidate, then
> > > > > > reporting any regressions.
> > > > > >
> > > > > >
> > > > > > If you're working in PySpark you can set up a virtual env and
> install
> > > > > > the current RC via "pip install
> > > > > >
> > > > >
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> "
> > > > > <
> > > > >
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> > > > > ;>
> > > > > > and see if anything important breaks.
> > > > > > In the Java/Scala, you can add the staging repository to your
> projects
> > > > > > resolvers and test
> > > > > > with the RC (make sure to clean up the artifact cache
> before/after so
> > > > > > you don't end up building with an out of date RC going forward).
> > > > > >
> > > > > >
> > > > > > ===
> > > > > > What should happen to JIRA tickets still targeting 3.5.2?
> > > > > > ===
> > > > > >
> > > > > >
> > > > > > The current list of open tickets targeted at 3.5.2 can be found
> at:
> > > > > > https://issues.apache.org/jira/projects/SPARK <
> > > > > https://issues.apache.org/jira/projects/SPARK> and search for
> > > > > > "Target Version/s" = 3.5.2
> > > > > >
> > > > > >
> > > > > > Committers should look at those and triage. Extremely important
> bug
> > > > > > fixes, documentation, and API tweaks that impact compatibility
> should
> > > > > > be worked on immediately. Everything else please retarget to an
> > > > > > appropriate release.
> > > > > >
> > 

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread Zhou Jiang
+1 (non-binding)

*Zhou JIANG*



On Tue, Jul 23, 2024 at 20:41 L. C. Hsieh  wrote:

> +1
>
> Thanks.
>
> On Tue, Jul 23, 2024 at 8:35 PM Dongjoon Hyun  wrote:
> >
> > +1
> >
> > Dongjoon.
> >
> > On 2024/07/24 03:28:58 Wenchen Fan wrote:
> > > +1
> > >
> > > On Wed, Jul 24, 2024 at 10:51 AM Kent Yao  wrote:
> > >
> > > > +1(non-binding), I have checked:
> > > >
> > > > - Download links are OK
> > > > - Signatures, Checksums, and the KEYS file are OK
> > > > - LICENSE and NOTICE are present
> > > > - No unexpected binary files in source releases
> > > > - Successfully built from source
> > > >
> > > > Thanks,
> > > > Kent Yao
> > > >
> > > > On 2024/07/23 06:55:28 yangjie01 wrote:
> > > > > +1, Thanks Kent Yao ~
> > > > >
> > > > > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> > > > >
> > > > >
> > > > > Hi dev,
> > > > >
> > > > >
> > > > > Please vote on releasing the following candidate as Apache Spark
> version
> > > > 3.5.2.
> > > > >
> > > > >
> > > > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a
> majority
> > > > +1
> > > > > PMC votes are cast, with
> > > > > a minimum of 3 +1 votes.
> > > > >
> > > > >
> > > > > [ ] +1 Release this package as Apache Spark 3.5.2
> > > > > [ ] -1 Do not release this package because ...
> > > > >
> > > > >
> > > > > To learn more about Apache Spark, please see
> https://spark.apache.org/ <
> > > > https://spark.apache.org/>
> > > > >
> > > > >
> > > > > The tag to be voted on is v3.5.2-rc2 (commit
> > > > > 6d8f511430881fa7a3203405260da174df424103):
> > > > > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> > > > https://github.com/apache/spark/tree/v3.5.2-rc2>
> > > > >
> > > > >
> > > > > The release files, including signatures, digests, etc. can be
> found at:
> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> > > > >
> > > > >
> > > > > Signatures used for Spark RCs can be found in this file:
> > > > > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS>
> > > > >
> > > > >
> > > > > The staging repository for this release can be found at:
> > > > >
> https://repository.apache.org/content/repositories/orgapachespark-1458/
> > > > <
> https://repository.apache.org/content/repositories/orgapachespark-1458/>
> > > > >
> > > > >
> > > > > The documentation corresponding to this release can be found at:
> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> > > > >
> > > > >
> > > > > The list of bug fixes going into 3.5.2 can be found at the
> following URL:
> > > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> > > > >
> > > > >
> > > > > FAQ
> > > > >
> > > > >
> > > > > =
> > > > > How can I help test this release?
> > > > > =
> > > > >
> > > > >
> > > > > If you are a Spark user, you can help us test this release by
> taking
> > > > > an existing Spark workload and running on this release candidate,
> then
> > > > > reporting any regressions.
> > > > >
> > > > >
> > > > > If you're working in PySpark you can set up a virtual env and
> install
> > > > > the current RC via "pip install
> > > > >
> > > >
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> "
> > > > <
> > > >
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> > > > ;>
> > > > > and see if anything important breaks.
> > > > > In the Java/Scala, you can add the staging repository to your
> projects
> > > > > resolvers and test
> > > > > with the RC (make sure to clean up the artifact cache before/after
> so
> > > > > you don't end up building with an out of date RC going forward).
> > > > >
> > > > >
> > > > > ===
> > > > > What should happen to JIRA tickets still targeting 3.5.2?
> > > > > ===
> > > > >
> > > > >
> > > > > The current list of open tickets targeted at 3.5.2 can be found at:
> > > > > https://issues.apache.org/jira/projects/SPARK <
> > > > https://issues.apache.org/jira/projects/SPARK> and search for
> > > > > "Target Version/s" = 3.5.2
> > > > >
> > > > >
> > > > > Committers should look at those and triage. Extremely important bug
> > > > > fixes, documentation, and API tweaks that impact compatibility
> should
> > > > > be worked on immediately. Everything else please retarget to an
> > > > > appropriate release.
> > > > >
> > > > >
> > > > > ==
> > > > > But my bug isn't fixed?
> > > > > ==
> > > > >
> > > > >
> > > > > In order to make timely releases, we will typically not hold the
> > > > > release unless the bug in question is a regression from the
> previous
> > > > > release. That 

Re: [外部邮件] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread XiDuo You
+1 (non-binding)

L. C. Hsieh  于2024年7月24日周三 11:40写道:
>
> +1
>
> Thanks.
>
> On Tue, Jul 23, 2024 at 8:35 PM Dongjoon Hyun  wrote:
> >
> > +1
> >
> > Dongjoon.
> >
> > On 2024/07/24 03:28:58 Wenchen Fan wrote:
> > > +1
> > >
> > > On Wed, Jul 24, 2024 at 10:51 AM Kent Yao  wrote:
> > >
> > > > +1(non-binding), I have checked:
> > > >
> > > > - Download links are OK
> > > > - Signatures, Checksums, and the KEYS file are OK
> > > > - LICENSE and NOTICE are present
> > > > - No unexpected binary files in source releases
> > > > - Successfully built from source
> > > >
> > > > Thanks,
> > > > Kent Yao
> > > >
> > > > On 2024/07/23 06:55:28 yangjie01 wrote:
> > > > > +1, Thanks Kent Yao ~
> > > > >
> > > > > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> > > > >
> > > > >
> > > > > Hi dev,
> > > > >
> > > > >
> > > > > Please vote on releasing the following candidate as Apache Spark 
> > > > > version
> > > > 3.5.2.
> > > > >
> > > > >
> > > > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a 
> > > > > majority
> > > > +1
> > > > > PMC votes are cast, with
> > > > > a minimum of 3 +1 votes.
> > > > >
> > > > >
> > > > > [ ] +1 Release this package as Apache Spark 3.5.2
> > > > > [ ] -1 Do not release this package because ...
> > > > >
> > > > >
> > > > > To learn more about Apache Spark, please see 
> > > > > https://spark.apache.org/ <
> > > > https://spark.apache.org/>
> > > > >
> > > > >
> > > > > The tag to be voted on is v3.5.2-rc2 (commit
> > > > > 6d8f511430881fa7a3203405260da174df424103):
> > > > > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> > > > https://github.com/apache/spark/tree/v3.5.2-rc2>
> > > > >
> > > > >
> > > > > The release files, including signatures, digests, etc. can be found 
> > > > > at:
> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> > > > >
> > > > >
> > > > > Signatures used for Spark RCs can be found in this file:
> > > > > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS>
> > > > >
> > > > >
> > > > > The staging repository for this release can be found at:
> > > > > https://repository.apache.org/content/repositories/orgapachespark-1458/
> > > > 
> > > > >
> > > > >
> > > > > The documentation corresponding to this release can be found at:
> > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> > > > >
> > > > >
> > > > > The list of bug fixes going into 3.5.2 can be found at the following 
> > > > > URL:
> > > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> > > > >
> > > > >
> > > > > FAQ
> > > > >
> > > > >
> > > > > =
> > > > > How can I help test this release?
> > > > > =
> > > > >
> > > > >
> > > > > If you are a Spark user, you can help us test this release by taking
> > > > > an existing Spark workload and running on this release candidate, then
> > > > > reporting any regressions.
> > > > >
> > > > >
> > > > > If you're working in PySpark you can set up a virtual env and install
> > > > > the current RC via "pip install
> > > > >
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz;
> > > > <
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> > > > ;>
> > > > > and see if anything important breaks.
> > > > > In the Java/Scala, you can add the staging repository to your projects
> > > > > resolvers and test
> > > > > with the RC (make sure to clean up the artifact cache before/after so
> > > > > you don't end up building with an out of date RC going forward).
> > > > >
> > > > >
> > > > > ===
> > > > > What should happen to JIRA tickets still targeting 3.5.2?
> > > > > ===
> > > > >
> > > > >
> > > > > The current list of open tickets targeted at 3.5.2 can be found at:
> > > > > https://issues.apache.org/jira/projects/SPARK <
> > > > https://issues.apache.org/jira/projects/SPARK> and search for
> > > > > "Target Version/s" = 3.5.2
> > > > >
> > > > >
> > > > > Committers should look at those and triage. Extremely important bug
> > > > > fixes, documentation, and API tweaks that impact compatibility should
> > > > > be worked on immediately. Everything else please retarget to an
> > > > > appropriate release.
> > > > >
> > > > >
> > > > > ==
> > > > > But my bug isn't fixed?
> > > > > ==
> > > > >
> > > > >
> > > > > In order to make timely releases, we will typically not hold the
> > > > > release unless the bug in question is a regression from the previous
> > > > > 

Re: [外部邮件] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread L. C. Hsieh
+1

Thanks.

On Tue, Jul 23, 2024 at 8:35 PM Dongjoon Hyun  wrote:
>
> +1
>
> Dongjoon.
>
> On 2024/07/24 03:28:58 Wenchen Fan wrote:
> > +1
> >
> > On Wed, Jul 24, 2024 at 10:51 AM Kent Yao  wrote:
> >
> > > +1(non-binding), I have checked:
> > >
> > > - Download links are OK
> > > - Signatures, Checksums, and the KEYS file are OK
> > > - LICENSE and NOTICE are present
> > > - No unexpected binary files in source releases
> > > - Successfully built from source
> > >
> > > Thanks,
> > > Kent Yao
> > >
> > > On 2024/07/23 06:55:28 yangjie01 wrote:
> > > > +1, Thanks Kent Yao ~
> > > >
> > > > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> > > >
> > > >
> > > > Hi dev,
> > > >
> > > >
> > > > Please vote on releasing the following candidate as Apache Spark version
> > > 3.5.2.
> > > >
> > > >
> > > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority
> > > +1
> > > > PMC votes are cast, with
> > > > a minimum of 3 +1 votes.
> > > >
> > > >
> > > > [ ] +1 Release this package as Apache Spark 3.5.2
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > >
> > > > To learn more about Apache Spark, please see https://spark.apache.org/ <
> > > https://spark.apache.org/>
> > > >
> > > >
> > > > The tag to be voted on is v3.5.2-rc2 (commit
> > > > 6d8f511430881fa7a3203405260da174df424103):
> > > > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> > > https://github.com/apache/spark/tree/v3.5.2-rc2>
> > > >
> > > >
> > > > The release files, including signatures, digests, etc. can be found at:
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> > > >
> > > >
> > > > Signatures used for Spark RCs can be found in this file:
> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> > > https://dist.apache.org/repos/dist/dev/spark/KEYS>
> > > >
> > > >
> > > > The staging repository for this release can be found at:
> > > > https://repository.apache.org/content/repositories/orgapachespark-1458/
> > > 
> > > >
> > > >
> > > > The documentation corresponding to this release can be found at:
> > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> > > >
> > > >
> > > > The list of bug fixes going into 3.5.2 can be found at the following 
> > > > URL:
> > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> > > https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> > > >
> > > >
> > > > FAQ
> > > >
> > > >
> > > > =
> > > > How can I help test this release?
> > > > =
> > > >
> > > >
> > > > If you are a Spark user, you can help us test this release by taking
> > > > an existing Spark workload and running on this release candidate, then
> > > > reporting any regressions.
> > > >
> > > >
> > > > If you're working in PySpark you can set up a virtual env and install
> > > > the current RC via "pip install
> > > >
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz;
> > > <
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz
> > > ;>
> > > > and see if anything important breaks.
> > > > In the Java/Scala, you can add the staging repository to your projects
> > > > resolvers and test
> > > > with the RC (make sure to clean up the artifact cache before/after so
> > > > you don't end up building with an out of date RC going forward).
> > > >
> > > >
> > > > ===
> > > > What should happen to JIRA tickets still targeting 3.5.2?
> > > > ===
> > > >
> > > >
> > > > The current list of open tickets targeted at 3.5.2 can be found at:
> > > > https://issues.apache.org/jira/projects/SPARK <
> > > https://issues.apache.org/jira/projects/SPARK> and search for
> > > > "Target Version/s" = 3.5.2
> > > >
> > > >
> > > > Committers should look at those and triage. Extremely important bug
> > > > fixes, documentation, and API tweaks that impact compatibility should
> > > > be worked on immediately. Everything else please retarget to an
> > > > appropriate release.
> > > >
> > > >
> > > > ==
> > > > But my bug isn't fixed?
> > > > ==
> > > >
> > > >
> > > > In order to make timely releases, we will typically not hold the
> > > > release unless the bug in question is a regression from the previous
> > > > release. That being said, if there is something which is a regression
> > > > that has not been correctly targeted please ping me or a committer to
> > > > help target the issue.
> > > >
> > > >
> > > > Thanks,
> > > > Kent Yao
> > > >
> > > >
> > > > -
> > > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [外部邮件] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread Dongjoon Hyun
+1

Dongjoon.

On 2024/07/24 03:28:58 Wenchen Fan wrote:
> +1
> 
> On Wed, Jul 24, 2024 at 10:51 AM Kent Yao  wrote:
> 
> > +1(non-binding), I have checked:
> >
> > - Download links are OK
> > - Signatures, Checksums, and the KEYS file are OK
> > - LICENSE and NOTICE are present
> > - No unexpected binary files in source releases
> > - Successfully built from source
> >
> > Thanks,
> > Kent Yao
> >
> > On 2024/07/23 06:55:28 yangjie01 wrote:
> > > +1, Thanks Kent Yao ~
> > >
> > > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> > >
> > >
> > > Hi dev,
> > >
> > >
> > > Please vote on releasing the following candidate as Apache Spark version
> > 3.5.2.
> > >
> > >
> > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority
> > +1
> > > PMC votes are cast, with
> > > a minimum of 3 +1 votes.
> > >
> > >
> > > [ ] +1 Release this package as Apache Spark 3.5.2
> > > [ ] -1 Do not release this package because ...
> > >
> > >
> > > To learn more about Apache Spark, please see https://spark.apache.org/ <
> > https://spark.apache.org/>
> > >
> > >
> > > The tag to be voted on is v3.5.2-rc2 (commit
> > > 6d8f511430881fa7a3203405260da174df424103):
> > > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> > https://github.com/apache/spark/tree/v3.5.2-rc2>
> > >
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> > >
> > >
> > > Signatures used for Spark RCs can be found in this file:
> > > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> > https://dist.apache.org/repos/dist/dev/spark/KEYS>
> > >
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapachespark-1458/
> > 
> > >
> > >
> > > The documentation corresponding to this release can be found at:
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> > >
> > >
> > > The list of bug fixes going into 3.5.2 can be found at the following URL:
> > > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> > https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> > >
> > >
> > > FAQ
> > >
> > >
> > > =
> > > How can I help test this release?
> > > =
> > >
> > >
> > > If you are a Spark user, you can help us test this release by taking
> > > an existing Spark workload and running on this release candidate, then
> > > reporting any regressions.
> > >
> > >
> > > If you're working in PySpark you can set up a virtual env and install
> > > the current RC via "pip install
> > >
> > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz"
> > > and see if anything important breaks.
> > > In the Java/Scala, you can add the staging repository to your projects
> > > resolvers and test
> > > with the RC (make sure to clean up the artifact cache before/after so
> > > you don't end up building with an out of date RC going forward).
> > >
> > >
> > > ===
> > > What should happen to JIRA tickets still targeting 3.5.2?
> > > ===
> > >
> > >
> > > The current list of open tickets targeted at 3.5.2 can be found at:
> > > https://issues.apache.org/jira/projects/SPARK <
> > https://issues.apache.org/jira/projects/SPARK> and search for
> > > "Target Version/s" = 3.5.2
> > >
> > >
> > > Committers should look at those and triage. Extremely important bug
> > > fixes, documentation, and API tweaks that impact compatibility should
> > > be worked on immediately. Everything else please retarget to an
> > > appropriate release.
> > >
> > >
> > > ==
> > > But my bug isn't fixed?
> > > ==
> > >
> > >
> > > In order to make timely releases, we will typically not hold the
> > > release unless the bug in question is a regression from the previous
> > > release. That being said, if there is something which is a regression
> > > that has not been correctly targeted please ping me or a committer to
> > > help target the issue.
> > >
> > >
> > > Thanks,
> > > Kent Yao
> > >
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> > >
> > >
> > >
> > >
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> > >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [外部邮件] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread Wenchen Fan
+1

On Wed, Jul 24, 2024 at 10:51 AM Kent Yao  wrote:

> +1(non-binding), I have checked:
>
> - Download links are OK
> - Signatures, Checksums, and the KEYS file are OK
> - LICENSE and NOTICE are present
> - No unexpected binary files in source releases
> - Successfully built from source
>
> Thanks,
> Kent Yao
>
> On 2024/07/23 06:55:28 yangjie01 wrote:
> > +1, Thanks Kent Yao ~
> >
> > On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> >
> >
> > Hi dev,
> >
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.5.2.
> >
> >
> > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority
> +1
> > PMC votes are cast, with
> > a minimum of 3 +1 votes.
> >
> >
> > [ ] +1 Release this package as Apache Spark 3.5.2
> > [ ] -1 Do not release this package because ...
> >
> >
> > To learn more about Apache Spark, please see https://spark.apache.org/ <
> https://spark.apache.org/>
> >
> >
> > The tag to be voted on is v3.5.2-rc2 (commit
> > 6d8f511430881fa7a3203405260da174df424103):
> > https://github.com/apache/spark/tree/v3.5.2-rc2 <
> https://github.com/apache/spark/tree/v3.5.2-rc2>
> >
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/>
> >
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS <
> https://dist.apache.org/repos/dist/dev/spark/KEYS>
> >
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1458/
> 
> >
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ <
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/>
> >
> >
> > The list of bug fixes going into 3.5.2 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12353980 <
> https://issues.apache.org/jira/projects/SPARK/versions/12353980>
> >
> >
> > FAQ
> >
> >
> > =
> > How can I help test this release?
> > =
> >
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC via "pip install
> >
> > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz"
> > and see if anything important breaks.
> > In the Java/Scala, you can add the staging repository to your projects
> > resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
> >
> >
> > ===
> > What should happen to JIRA tickets still targeting 3.5.2?
> > ===
> >
> >
> > The current list of open tickets targeted at 3.5.2 can be found at:
> > https://issues.apache.org/jira/projects/SPARK <
> https://issues.apache.org/jira/projects/SPARK> and search for
> > "Target Version/s" = 3.5.2
> >
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> > Thanks,
> > Kent Yao
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> >
> >
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [外部邮件] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread Kent Yao
+1(non-binding), I have checked:

- Download links are OK
- Signatures, Checksums, and the KEYS file are OK
- LICENSE and NOTICE are present
- No unexpected binary files in source releases
- Successfully built from source

Thanks,
Kent Yao

On 2024/07/23 06:55:28 yangjie01 wrote:
> +1, Thanks Kent Yao ~
> 
> On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:
> 
> 
> Hi dev,
> 
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 3.5.2.
> 
> 
> The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority +1
> PMC votes are cast, with
> a minimum of 3 +1 votes.
> 
> 
> [ ] +1 Release this package as Apache Spark 3.5.2
> [ ] -1 Do not release this package because ...
> 
> 
> To learn more about Apache Spark, please see https://spark.apache.org/ 
> 
> 
> 
> The tag to be voted on is v3.5.2-rc2 (commit
> 6d8f511430881fa7a3203405260da174df424103):
> https://github.com/apache/spark/tree/v3.5.2-rc2 
> 
> 
> 
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ 
> 
> 
> 
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS 
> 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1458/ 
> 
> 
> 
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ 
> 
> 
> 
> The list of bug fixes going into 3.5.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353980 
> 
> 
> 
> FAQ
> 
> 
> =
> How can I help test this release?
> =
> 
> 
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
> 
> 
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz"
> and see if anything important breaks.
> In the Java/Scala, you can add the staging repository to your projects
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
> 
> 
> ===
> What should happen to JIRA tickets still targeting 3.5.2?
> ===
> 
> 
> The current list of open tickets targeted at 3.5.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK 
>  and search for
> "Target Version/s" = 3.5.2
> 
> 
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
> 
> 
> ==
> But my bug isn't fixed?
> ==
> 
> 
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
> 
> 
> Thanks,
> Kent Yao
> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> 
> 
> 
> 
> 
> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-23 Thread Hyukjin Kwon
There is always a running session. I replied in the PR.

On Tue, 23 Jul 2024 at 23:32, Dongjoon Hyun  wrote:

> I'm bumping up this thread because the overhead bites us back already.
> Here is a commit merged 3 hours ago.
>
> https://github.com/apache/spark/pull/47453
> [SPARK-48970][PYTHON][ML] Avoid using SparkSession.getActiveSession in
> spark ML reader/writer
>
> In short, unlike the original PRs' claims, this commit starts to create
> `SparkSession` in this layer. Although I understand the reason why Hyukjin
> and Martin claims that `SparkSession` will be there in any way, this is an
> architectural change which we need to decide explicitly, not implicitly.
>
> > On 2024/07/13 05:33:32 Hyukjin Kwon wrote:
> > We actually get the active Spark session so it doesn't cause overhead.
> Also
> > even we create, it will create once which should be pretty trivial
> overhead.
>
> If this architectural change is required inevitably and needs to happen in
> Apache Spark 4.0.0. Can we have a dev-document about this? If there is no
> proper place, we can add it to the ML migration guide simply.
>
> Dongjoon.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-23 Thread Dongjoon Hyun
I'm bumping up this thread because the overhead bites us back already. Here is 
a commit merged 3 hours ago.

https://github.com/apache/spark/pull/47453
[SPARK-48970][PYTHON][ML] Avoid using SparkSession.getActiveSession in spark ML 
reader/writer

In short, unlike the original PRs' claims, this commit starts to create 
`SparkSession` in this layer. Although I understand the reason why Hyukjin and 
Martin claims that `SparkSession` will be there in any way, this is an 
architectural change which we need to decide explicitly, not implicitly.

> On 2024/07/13 05:33:32 Hyukjin Kwon wrote:
> We actually get the active Spark session so it doesn't cause overhead. Also
> even we create, it will create once which should be pretty trivial overhead.

If this architectural change is required inevitably and needs to happen in 
Apache Spark 4.0.0. Can we have a dev-document about this? If there is no 
proper place, we can add it to the ML migration guide simply.

Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
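
For readers following the SPARK-48970 discussion above, the difference between the two
session-lookup patterns can be sketched as follows. This is a hypothetical PySpark
illustration, not the code from the PR; the function names are made up and the real ML
reader/writer code path differs.

from pyspark.sql import SparkSession

def load_metadata_reuse_active(path: str):
    # Pattern 1: only reuse a session that is already active in this thread;
    # never creates one, so it fails fast when no session exists.
    spark = SparkSession.getActiveSession()
    if spark is None:
        raise RuntimeError("no active SparkSession")
    return spark.read.json(path)

def load_metadata_get_or_create(path: str):
    # Pattern 2: getOrCreate() returns the existing session if there is one,
    # otherwise it builds a new one -- the implicit session creation that the
    # thread argues should be an explicit architectural decision.
    spark = SparkSession.builder.getOrCreate()
    return spark.read.json(path)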



Re: [外部邮件] [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-23 Thread Martin Grund
+1

On Tue, Jul 23, 2024 at 07:06 Dongjoon Hyun  wrote:

> +1 for the proposed definition.
>
> Thanks,
> Dongjoon
>
>
> On Tue, Jul 23, 2024 at 6:42 AM Xianjin YE  wrote:
>
>> +1 (non-binding)
>>
>> On Jul 23, 2024, at 16:16, Jungtaek Lim 
>> wrote:
>>
>> +1 (non-binding)
>>
>> On Tue, Jul 23, 2024 at 1:51 PM  wrote:
>>
>>>
>>> +1
>>>
>>> On Jul 22, 2024, at 21:42, John Zhuge  wrote:
>>>
>>> 
>>> +1 (non-binding)
>>>
>>> On Mon, Jul 22, 2024 at 8:16 PM yangjie01 
>>> wrote:
>>>
 +1

 On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:


 +1


 On 2024/07/23 02:04:17 Herman van Hovell wrote:
 > +1
 >
 > On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan >>> > wrote:
 >
 > > +1
 > >
 > > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng >>> > wrote:
 > >
 > >> +1
 > >>
 > >> Thank you @Hyukjin Kwon >>> gurwls...@apache.org>> !
 > >>
 > >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang >>> > wrote:
 > >>
 > >>> +1
 > >>>
 > >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon <
 gurwls...@apache.org >
 > >>> wrote:
 > >>>
 >  Starting with my own +1.
 > 
 >  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon >>> >
 >  wrote:
 > 
 > > Hi all,
 > >
 > > I’d like to start a vote for differentiating "Spark without
 Spark
 > > Connect" as "Spark Classic".
 > >
 > > Please also refer to:
 > >
 > > - Discussion thread:
 > >
 https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym <
 https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym>
 > >
 > > Please vote on the SPIP for the next 72 hours:
 > >
 > > [ ] +1: Accept the proposal
 > > [ ] +0
 > > [ ] -1: I don’t think this is a good idea because …
 > >
 > > Thank you!
 > >
 > 
 >


 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>> dev-unsubscr...@spark.apache.org>






>>>
>>> --
>>> John Zhuge
>>>
>>>
>>


Re: [外部邮件] [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-23 Thread Dongjoon Hyun
+1 for the proposed definition.

Thanks,
Dongjoon


On Tue, Jul 23, 2024 at 6:42 AM Xianjin YE  wrote:

> +1 (non-binding)
>
> On Jul 23, 2024, at 16:16, Jungtaek Lim 
> wrote:
>
> +1 (non-binding)
>
> On Tue, Jul 23, 2024 at 1:51 PM  wrote:
>
>>
>> +1
>>
>> On Jul 22, 2024, at 21:42, John Zhuge  wrote:
>>
>> 
>> +1 (non-binding)
>>
>> On Mon, Jul 22, 2024 at 8:16 PM yangjie01 
>> wrote:
>>
>>> +1
>>>
>>> On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:
>>>
>>>
>>> +1
>>>
>>>
>>> On 2024/07/23 02:04:17 Herman van Hovell wrote:
>>> > +1
>>> >
>>> > On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan >> > wrote:
>>> >
>>> > > +1
>>> > >
>>> > > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng >> > wrote:
>>> > >
>>> > >> +1
>>> > >>
>>> > >> Thank you @Hyukjin Kwon >> gurwls...@apache.org>> !
>>> > >>
>>> > >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang >> > wrote:
>>> > >>
>>> > >>> +1
>>> > >>>
>>> > >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon >> >
>>> > >>> wrote:
>>> > >>>
>>> >  Starting with my own +1.
>>> > 
>>> >  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon >> >
>>> >  wrote:
>>> > 
>>> > > Hi all,
>>> > >
>>> > > I’d like to start a vote for differentiating "Spark without Spark
>>> > > Connect" as "Spark Classic".
>>> > >
>>> > > Please also refer to:
>>> > >
>>> > > - Discussion thread:
>>> > > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>>> 
>>> > >
>>> > > Please vote on the SPIP for the next 72 hours:
>>> > >
>>> > > [ ] +1: Accept the proposal
>>> > > [ ] +0
>>> > > [ ] -1: I don’t think this is a good idea because …
>>> > >
>>> > > Thank you!
>>> > >
>>> > 
>>> >
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> dev-unsubscr...@spark.apache.org>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> John Zhuge
>>
>>
>


Re: [外部邮件] [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-23 Thread Xianjin YE
+1 (non-binding)

> On Jul 23, 2024, at 16:16, Jungtaek Lim  wrote:
> 
> +1 (non-binding)
> 
> On Tue, Jul 23, 2024 at 1:51 PM  > wrote:
>> 
>> +1
>> 
>>> On Jul 22, 2024, at 21:42, John Zhuge >> > wrote:
>>> 
>>> 
>>> +1 (non-binding)
>>> 
>>> On Mon, Jul 22, 2024 at 8:16 PM yangjie01  
>>> wrote:
 +1
 
 On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:
 
 
 +1
 
 
 On 2024/07/23 02:04:17 Herman van Hovell wrote:
 > +1
 > 
 > On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan >>> >   >> wrote:
 > 
 > > +1
 > >
 > > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng >>> > >   > >> wrote:
 > >
 > >> +1
 > >>
 > >> Thank you @Hyukjin Kwon >>> > >>   >> >> !
 > >>
 > >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang >>> > >>   >> >> wrote:
 > >>
 > >>> +1
 > >>>
 > >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon >>> > >>>   >>> >>
 > >>> wrote:
 > >>>
 >  Starting with my own +1.
 > 
 >  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon >>> >     >>
 >  wrote:
 > 
 > > Hi all,
 > >
 > > I’d like to start a vote for differentiating "Spark without Spark
 > > Connect" as "Spark Classic".
 > >
 > > Please also refer to:
 > >
 > > - Discussion thread:
 > > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym 
 > > 
 > >
 > > Please vote on the SPIP for the next 72 hours:
 > >
 > > [ ] +1: Accept the proposal
 > > [ ] +0
 > > [ ] -1: I don’t think this is a good idea because …
 > >
 > > Thank you!
 > >
 > 
 > 
 
 
 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
  
 >
 
 
 
 
 
>>> 
>>> 
>>> --
>>> John Zhuge



Re: [外部邮件] Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-23 Thread Jungtaek Lim
+1 (non-binding)

On Tue, Jul 23, 2024 at 1:51 PM  wrote:

>
> +1
>
> On Jul 22, 2024, at 21:42, John Zhuge  wrote:
>
> 
> +1 (non-binding)
>
> On Mon, Jul 22, 2024 at 8:16 PM yangjie01 
> wrote:
>
>> +1
>>
>> On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:
>>
>>
>> +1
>>
>>
>> On 2024/07/23 02:04:17 Herman van Hovell wrote:
>> > +1
>> >
>> > On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan > > wrote:
>> >
>> > > +1
>> > >
>> > > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng > > wrote:
>> > >
>> > >> +1
>> > >>
>> > >> Thank you @Hyukjin Kwon > gurwls...@apache.org>> !
>> > >>
>> > >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang > > wrote:
>> > >>
>> > >>> +1
>> > >>>
>> > >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon > >
>> > >>> wrote:
>> > >>>
>> >  Starting with my own +1.
>> > 
>> >  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon > >
>> >  wrote:
>> > 
>> > > Hi all,
>> > >
>> > > I’d like to start a vote for differentiating "Spark without Spark
>> > > Connect" as "Spark Classic".
>> > >
>> > > Please also refer to:
>> > >
>> > > - Discussion thread:
>> > > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>> 
>> > >
>> > > Please vote on the SPIP for the next 72 hours:
>> > >
>> > > [ ] +1: Accept the proposal
>> > > [ ] +0
>> > > [ ] -1: I don’t think this is a good idea because …
>> > >
>> > > Thank you!
>> > >
>> > 
>> >
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > dev-unsubscr...@spark.apache.org>
>>
>>
>>
>>
>>
>>
>
> --
> John Zhuge
>
>


Re: [外部邮件] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread yangjie01
+1, Thanks Kent Yao ~

On 2024/7/22 17:01, "Kent Yao" <y...@apache.org> wrote:


Hi dev,


Please vote on releasing the following candidate as Apache Spark version 3.5.2.


The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority +1
PMC votes are cast, with
a minimum of 3 +1 votes.


[ ] +1 Release this package as Apache Spark 3.5.2
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/ 



The tag to be voted on is v3.5.2-rc2 (commit
6d8f511430881fa7a3203405260da174df424103):
https://github.com/apache/spark/tree/v3.5.2-rc2 



The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ 



Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS 



The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1458/ 



The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ 



The list of bug fixes going into 3.5.2 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12353980 



FAQ


=
How can I help test this release?
=


If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.


If you're working in PySpark you can set up a virtual env and install
the current RC via "pip install
https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz"
and see if anything important breaks.
In the Java/Scala, you can add the staging repository to your projects
resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out of date RC going forward).


===
What should happen to JIRA tickets still targeting 3.5.2?
===


The current list of open tickets targeted at 3.5.2 can be found at:
https://issues.apache.org/jira/projects/SPARK 
 and search for
"Target Version/s" = 3.5.2


Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.


==
But my bug isn't fixed?
==


In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Thanks,
Kent Yao


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 







-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
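
As a concrete, purely illustrative example of the pip-based testing route described in
the FAQ above: after installing the RC tarball into a fresh virtual env, a minimal local
smoke test could look like the script below. It is an assumption about what "seeing if
anything important breaks" might start with, not an official test suite.

# Minimal smoke test after: pip install <pyspark-3.5.2 RC tarball URL>
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[2]")          # local mode only; no cluster needed
    .appName("spark-3.5.2-rc-smoke-test")
    .getOrCreate()
)

# Confirm the RC version actually got picked up by the virtual env.
assert spark.version.startswith("3.5.2"), spark.version

# Run a trivial aggregation to exercise the planner and executors.
df = spark.range(1000).selectExpr("id", "id % 7 AS bucket")
result = df.groupBy("bucket").count().orderBy("bucket").collect()
assert len(result) == 7

spark.stop()
print("basic smoke test passed")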



Re: [外部邮件] Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread xyliyuanjian
+1

On Jul 22, 2024, at 21:42, John Zhuge wrote:
+1 (non-binding)

On Mon, Jul 22, 2024 at 8:16 PM yangjie01 wrote:
+1

On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:


+1


On 2024/07/23 02:04:17 Herman van Hovell wrote:
> +1
> 
> On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan > wrote:
> 
> > +1
> >
> > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng > wrote:
> >
> >> +1
> >>
> >> Thank you @Hyukjin Kwon > !
> >>
> >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang > wrote:
> >>
> >>> +1
> >>>
> >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon >
> >>> wrote:
> >>>
>  Starting with my own +1.
> 
>  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon >
>  wrote:
> 
> > Hi all,
> >
> > I’d like to start a vote for differentiating "Spark without Spark
> > Connect" as "Spark Classic".
> >
> > Please also refer to:
> >
> > - Discussion thread:
> > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym 
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you!
> >
> 
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org





--
John Zhuge


Re: [外部邮件] Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Ruifeng Zheng
+1

On Tue, Jul 23, 2024 at 11:15 AM yangjie01 
wrote:

> +1
>
> On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:
>
>
> +1
>
>
> On 2024/07/23 02:04:17 Herman van Hovell wrote:
> > +1
> >
> > On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan  > wrote:
> >
> > > +1
> > >
> > > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  > wrote:
> > >
> > >> +1
> > >>
> > >> Thank you @Hyukjin Kwon  gurwls...@apache.org>> !
> > >>
> > >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  > wrote:
> > >>
> > >>> +1
> > >>>
> > >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon  >
> > >>> wrote:
> > >>>
> >  Starting with my own +1.
> > 
> >  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  >
> >  wrote:
> > 
> > > Hi all,
> > >
> > > I’d like to start a vote for differentiating "Spark without Spark
> > > Connect" as "Spark Classic".
> > >
> > > Please also refer to:
> > >
> > > - Discussion thread:
> > > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym <
> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym>
> > >
> > > Please vote on the SPIP for the next 72 hours:
> > >
> > > [ ] +1: Accept the proposal
> > > [ ] +0
> > > [ ] -1: I don’t think this is a good idea because …
> > >
> > > Thank you!
> > >
> > 
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org  dev-unsubscr...@spark.apache.org>
>
>
>
>
>
>


Re: [外部邮件] Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread John Zhuge
+1 (non-binding)

On Mon, Jul 22, 2024 at 8:16 PM yangjie01 
wrote:

> +1
>
> On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:
>
>
> +1
>
>
> On 2024/07/23 02:04:17 Herman van Hovell wrote:
> > +1
> >
> > On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan  > wrote:
> >
> > > +1
> > >
> > > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  > wrote:
> > >
> > >> +1
> > >>
> > >> Thank you @Hyukjin Kwon  gurwls...@apache.org>> !
> > >>
> > >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  > wrote:
> > >>
> > >>> +1
> > >>>
> > >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon  >
> > >>> wrote:
> > >>>
> >  Starting with my own +1.
> > 
> >  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  >
> >  wrote:
> > 
> > > Hi all,
> > >
> > > I’d like to start a vote for differentiating "Spark without Spark
> > > Connect" as "Spark Classic".
> > >
> > > Please also refer to:
> > >
> > > - Discussion thread:
> > > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym <
> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym>
> > >
> > > Please vote on the SPIP for the next 72 hours:
> > >
> > > [ ] +1: Accept the proposal
> > > [ ] +0
> > > [ ] -1: I don’t think this is a good idea because …
> > >
> > > Thank you!
> > >
> > 
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org  dev-unsubscr...@spark.apache.org>
>
>
>
>
>
>

-- 
John Zhuge


Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Denny Lee
+1 (non-binding)

On Tue, Jul 23, 2024 at 12:19 PM Sadha Chilukoori 
wrote:

> +1 (non-binding)
>
> On Mon, Jul 22, 2024 at 5:56 PM Wenchen Fan  wrote:
>
>> +1
>>
>> On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  wrote:
>>
>>> +1
>>>
>>> Thank you @Hyukjin Kwon  !
>>>
>>> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  wrote:
>>>
 +1

 On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon 
 wrote:

> Starting with my own +1.
>
> On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for differentiating "Spark without Spark
>> Connect" as "Spark Classic".
>>
>> Please also refer to:
>>
>>- Discussion thread:
>> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thank you!
>>
>


Re: [外部邮件] Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread yangjie01
+1

On 2024/7/23 11:11, "Kent Yao" <y...@apache.org> wrote:


+1


On 2024/07/23 02:04:17 Herman van Hovell wrote:
> +1
> 
> On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan  > wrote:
> 
> > +1
> >
> > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  > > wrote:
> >
> >> +1
> >>
> >> Thank you @Hyukjin Kwon  >> > !
> >>
> >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  >> > wrote:
> >>
> >>> +1
> >>>
> >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon  >>> >
> >>> wrote:
> >>>
>  Starting with my own +1.
> 
>  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon   >
>  wrote:
> 
> > Hi all,
> >
> > I’d like to start a vote for differentiating "Spark without Spark
> > Connect" as "Spark Classic".
> >
> > Please also refer to:
> >
> > - Discussion thread:
> > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym 
> > 
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you!
> >
> 
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 








Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Kent Yao
+1

On 2024/07/23 02:04:17 Herman van Hovell wrote:
> +1
> 
> On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan  wrote:
> 
> > +1
> >
> > On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  wrote:
> >
> >> +1
> >>
> >> Thank you @Hyukjin Kwon  !
> >>
> >> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  wrote:
> >>
> >>> +1
> >>>
> >>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon 
> >>> wrote:
> >>>
>  Starting with my own +1.
> 
>  On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon 
>  wrote:
> 
> > Hi all,
> >
> > I’d like to start a vote for differentiating "Spark without Spark
> > Connect" as "Spark Classic".
> >
> > Please also refer to:
> >
> >- Discussion thread:
> > https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
> >
> > Please vote on the SPIP for the next 72 hours:
> >
> > [ ] +1: Accept the proposal
> > [ ] +0
> > [ ] -1: I don’t think this is a good idea because …
> >
> > Thank you!
> >
> 
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Herman van Hovell
+1

On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan  wrote:

> +1
>
> On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  wrote:
>
>> +1
>>
>> Thank you @Hyukjin Kwon  !
>>
>> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  wrote:
>>
>>> +1
>>>
>>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon 
>>> wrote:
>>>
 Starting with my own +1.

 On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon 
 wrote:

> Hi all,
>
> I’d like to start a vote for differentiating "Spark without Spark
> Connect" as "Spark Classic".
>
> Please also refer to:
>
>- Discussion thread:
> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thank you!
>



Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Sadha Chilukoori
+1 (non-binding)

On Mon, Jul 22, 2024 at 5:56 PM Wenchen Fan  wrote:

> +1
>
> On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  wrote:
>
>> +1
>>
>> Thank you @Hyukjin Kwon  !
>>
>> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  wrote:
>>
>>> +1
>>>
>>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon 
>>> wrote:
>>>
 Starting with my own +1.

 On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon 
 wrote:

> Hi all,
>
> I’d like to start a vote for differentiating "Spark without Spark
> Connect" as "Spark Classic".
>
> Please also refer to:
>
>- Discussion thread:
> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thank you!
>



Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Wenchen Fan
+1

On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng  wrote:

> +1
>
> Thank you @Hyukjin Kwon  !
>
> On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  wrote:
>
>> +1
>>
>> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon 
>> wrote:
>>
>>> Starting with my own +1.
>>>
>>> On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  wrote:
>>>
 Hi all,

 I’d like to start a vote for differentiating "Spark without Spark
 Connect" as "Spark Classic".

 Please also refer to:

- Discussion thread:
 https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym

 Please vote on the SPIP for the next 72 hours:

 [ ] +1: Accept the proposal
 [ ] +0
 [ ] -1: I don’t think this is a good idea because …

 Thank you!

>>>


Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Xinrong Meng
+1

Thank you @Hyukjin Kwon  !

On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang  wrote:

> +1
>
> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon  wrote:
>
>> Starting with my own +1.
>>
>> On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I’d like to start a vote for differentiating "Spark without Spark
>>> Connect" as "Spark Classic".
>>>
>>> Please also refer to:
>>>
>>>- Discussion thread:
>>> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thank you!
>>>
>>


Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Takuya UESHIN
+1

On Mon, Jul 22, 2024 at 5:21 PM Gengliang Wang  wrote:

> +1
>
> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon  wrote:
>
>> Starting with my own +1.
>>
>> On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I’d like to start a vote for differentiating "Spark without Spark
>>> Connect" as "Spark Classic".
>>>
>>> Please also refer to:
>>>
>>>- Discussion thread:
>>> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thank you!
>>>
>>

-- 
Takuya UESHIN


Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Gengliang Wang
+1

On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon  wrote:

> Starting with my own +1.
>
> On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for differentiating "Spark without Spark
>> Connect" as "Spark Classic".
>>
>> Please also refer to:
>>
>>- Discussion thread:
>> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thank you!
>>
>


Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Hyukjin Kwon
Starting with my own +1.

On Tue, 23 Jul 2024 at 09:12, Hyukjin Kwon  wrote:

> Hi all,
>
> I’d like to start a vote for differentiating "Spark without Spark Connect"
> as "Spark Classic".
>
> Please also refer to:
>
>- Discussion thread:
> https://lists.apache.org/thread/ys7zsod8cs9c7qllmf0p0msk6z2mz2ym
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thank you!
>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Hyukjin Kwon
Yeah that's what I intended. Thanks for clarification.

Let me start the vote


On Tue, 23 Jul 2024 at 08:14, Sadha Chilukoori 
wrote:

> Hi Dongjoon,
>
> *To be clear, is the proposal aiming to make us to say like A instead of B
> in our documentation?*
>
> *A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
> Classic` mode instead.*
> *B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
> without Spark Connect` mode instead*.
>
>
> Correct, the thread is recommending to use option A, consistently in all
> the documentation.
>
> -Sadha
>
> On Mon, Jul 22, 2024, 10:25 AM Dongjoon Hyun  wrote:
>
>> Thank you for opening this thread, Hyukjin.
>>
>> In this discussion thread, we have three terminologies, (1) ~ (3).
>>
>> > Spark Classic (vs. Spark Connect)
>>
>> 1. Spark
>> 2. Spark Classic (= A proposal for Spark without Spark Connect)
>> 3. Spark Connect
>>
>> As Holden and Jungtaek mentioned,
>>
>> - (1) is definitely the existing code base which includes all (including
>> RDD API, Spark Thrift Server, Spark Connect and so on).
>>
>> - (3) is is a very specific use case to a user when a Spark binary
>> distribution is used with `--remote` option (or enabling the related
>> features). Like Spark Thrift Server, after query planning steps, there is
>> no fundamental difference in the execution code side in Spark clusters or
>> Spark jobs.
>>
>> - (2) By the proposed definition, (2) `Spark Classic` is not (1) `Spark`.
>> Like `--remote`, it's one of runnable modes.
>>
>> To be clear, is the proposal aiming to make us to say like A instead of B
>> in our documentation?
>>
>> A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>> Classic` mode instead.
>> B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>> without Spark Connect` mode instead.
>>
>> Dongjoon.
>>
>>
>>
>> On 2024/07/22 12:59:54 Sadha Chilukoori wrote:
>> > +1  (non-binding) for classic.
>> >
>> > On Mon, Jul 22, 2024 at 3:59 AM Martin Grund
>> 
>> > wrote:
>> >
>> > > +1 for classic. It's simple, easy to understand and it doesn't have
>> the
>> > > negative meanings like legacy for example.
>> > >
>> > > On Sun, Jul 21, 2024 at 23:48 Wenchen Fan 
>> wrote:
>> > >
>> > >> Classic SGTM.
>> > >>
>> > >> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
>> > >> kabhwan.opensou...@gmail.com> wrote:
>> > >>
>> > >>> I'd propose not to change the name of "Spark Connect" - the name
>> > >>> represents the characteristic of the mode (separation of layer for
>> client
>> > >>> and server). Trying to remove the part of "Connect" would just make
>> > >>> confusion.
>> > >>>
>> > >>> +1 for Classic to existing mode, till someone comes up with better
>> > >>> alternatives.
>> > >>>
>> > >>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon 
>> > >>> wrote:
>> > >>>
>> >  I was thinking about a similar option too but I ended up giving
>> this up
>> >  .. It's quite unlikely at this moment but suppose that we have
>> another
>> >  Spark Connect-ish component in the far future and it would be
>> challenging
>> >  to come up with another name ... Another case is that we might
>> have to cope
>> >  with the cases like Spark Connect, vs Spark (with Spark Connect)
>> and Spark
>> >  (without Spark Connect) ..
>> > 
>> >  On Sun, 21 Jul 2024 at 09:59, Holden Karau > >
>> >  wrote:
>> > 
>> > > I think perhaps Spark Connect could be phrased as “Basic* Spark” &
>> > > existing Spark could be “Full Spark” given the API limitations of
>> Spark
>> > > connect.
>> > >
>> > > *I was also thinking Core here but we’ve used core to refer to
>> the RDD
>> > > APIs for too long to reuse it here.
>> > >
>> > > Twitter: https://twitter.com/holdenkarau
>> > > Books (Learning Spark, High Performance Spark, etc.):
>> > > https://amzn.to/2MaRAG9  
>> > > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> > >
>> > >
>> > > On Sat, Jul 20, 2024 at 8:02 PM Xiao Li 
>> wrote:
>> > >
>> > >> Classic is much better than Legacy. : )
>> > >>
>> > >> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>> > >>
>> > >>> Hi all,
>> > >>>
>> > >>> I noticed that we need to standardize our terminology before
>> moving
>> > >>> forward. For instance, when documenting, 'Spark without Spark
>> Connect' is
>> > >>> too long and verbose. Additionally, I've observed that we use
>> various names
>> > >>> for Spark without Spark Connect: Spark Classic, Classic Spark,
>> Legacy
>> > >>> Spark, etc.
>> > >>>
>> > >>> I propose that we consistently refer to it as Spark Classic (vs.
>> > >>> Spark Connect).
>> > >>>
>> > >>> Please share your thoughts on this. Thanks!
>> > >>>
>> > >>
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
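
To make the "Spark Classic" vs "Spark Connect" distinction discussed above concrete,
here is a hypothetical PySpark sketch. The local master and the connect endpoint are
assumptions; the point is only that the DataFrame API is shared between the two modes
while the RDD API is available solely in Classic mode.

from pyspark.sql import SparkSession

# --- Spark Classic: driver runs in-process; full API surface, including RDDs. ---
spark = SparkSession.builder.master("local[2]").getOrCreate()
print(spark.range(5).count())                               # DataFrame API
print(spark.sparkContext.parallelize(range(5)).sum())       # RDD API (Classic only)
spark.stop()

# --- Spark Connect: a thin client talking to a connect server, e.g. one started
#     with sbin/start-connect-server.sh or reached via `--remote sc://host:15002`.
#     DataFrame/SQL calls look the same, but there is no sparkContext / RDD API,
#     which is exactly why the docs need a short name for the non-Connect mode.
# spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
# print(spark.range(5).count())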


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Sadha Chilukoori
Hi Dongjoon,

*To be clear, is the proposal aiming to make us to say like A instead of B
in our documentation?*

*A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
Classic` mode instead.*
*B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
without Spark Connect` mode instead*.


Correct, the thread is recommending to use option A, consistently in all
the documentation.

-Sadha

On Mon, Jul 22, 2024, 10:25 AM Dongjoon Hyun  wrote:

> Thank you for opening this thread, Hyukjin.
>
> In this discussion thread, we have three terminologies, (1) ~ (3).
>
> > Spark Classic (vs. Spark Connect)
>
> 1. Spark
> 2. Spark Classic (= A proposal for Spark without Spark Connect)
> 3. Spark Connect
>
> As Holden and Jungtaek mentioned,
>
> - (1) is definitely the existing code base which includes all (including
> RDD API, Spark Thrift Server, Spark Connect and so on).
>
> - (3) is is a very specific use case to a user when a Spark binary
> distribution is used with `--remote` option (or enabling the related
> features). Like Spark Thrift Server, after query planning steps, there is
> no fundamental difference in the execution code side in Spark clusters or
> Spark jobs.
>
> - (2) By the proposed definition, (2) `Spark Classic` is not (1) `Spark`.
> Like `--remote`, it's one of runnable modes.
>
> To be clear, is the proposal aiming to make us to say like A instead of B
> in our documentation?
>
> A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
> Classic` mode instead.
> B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
> without Spark Connect` mode instead.
>
> Dongjoon.
>
>
>
> On 2024/07/22 12:59:54 Sadha Chilukoori wrote:
> > +1  (non-binding) for classic.
> >
> > On Mon, Jul 22, 2024 at 3:59 AM Martin Grund
> 
> > wrote:
> >
> > > +1 for classic. It's simple, easy to understand and it doesn't have the
> > > negative meanings like legacy for example.
> > >
> > > On Sun, Jul 21, 2024 at 23:48 Wenchen Fan  wrote:
> > >
> > >> Classic SGTM.
> > >>
> > >> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
> > >> kabhwan.opensou...@gmail.com> wrote:
> > >>
> > >>> I'd propose not to change the name of "Spark Connect" - the name
> > >>> represents the characteristic of the mode (separation of layer for
> client
> > >>> and server). Trying to remove the part of "Connect" would just make
> > >>> confusion.
> > >>>
> > >>> +1 for Classic to existing mode, till someone comes up with better
> > >>> alternatives.
> > >>>
> > >>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon 
> > >>> wrote:
> > >>>
> >  I was thinking about a similar option too but I ended up giving
> this up
> >  .. It's quite unlikely at this moment but suppose that we have
> another
> >  Spark Connect-ish component in the far future and it would be
> challenging
> >  to come up with another name ... Another case is that we might have
> to cope
> >  with the cases like Spark Connect, vs Spark (with Spark Connect)
> and Spark
> >  (without Spark Connect) ..
> > 
> >  On Sun, 21 Jul 2024 at 09:59, Holden Karau 
> >  wrote:
> > 
> > > I think perhaps Spark Connect could be phrased as “Basic* Spark” &
> > > existing Spark could be “Full Spark” given the API limitations of
> Spark
> > > connect.
> > >
> > > *I was also thinking Core here but we’ve used core to refer to the
> RDD
> > > APIs for too long to reuse it here.
> > >
> > > Twitter: https://twitter.com/holdenkarau
> > > Books (Learning Spark, High Performance Spark, etc.):
> > > https://amzn.to/2MaRAG9  
> > > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > >
> > >
> > > On Sat, Jul 20, 2024 at 8:02 PM Xiao Li 
> wrote:
> > >
> > >> Classic is much better than Legacy. : )
> > >>
> > >> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I noticed that we need to standardize our terminology before
> moving
> > >>> forward. For instance, when documenting, 'Spark without Spark
> Connect' is
> > >>> too long and verbose. Additionally, I've observed that we use
> various names
> > >>> for Spark without Spark Connect: Spark Classic, Classic Spark,
> Legacy
> > >>> Spark, etc.
> > >>>
> > >>> I propose that we consistently refer to it as Spark Classic (vs.
> > >>> Spark Connect).
> > >>>
> > >>> Please share your thoughts on this. Thanks!
> > >>>
> > >>
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Dongjoon Hyun
Thank you for opening this thread, Hyukjin.

In this discussion thread, we have three terminologies, (1) ~ (3).

> Spark Classic (vs. Spark Connect)

1. Spark
2. Spark Classic (= A proposal for Spark without Spark Connect)
3. Spark Connect

As Holden and Jungtaek mentioned, 

- (1) is definitely the existing code base which includes everything (RDD
API, Spark Thrift Server, Spark Connect and so on).

- (3) is a very specific use case for a user when a Spark binary distribution
is used with the `--remote` option (or the related features are enabled). Like
Spark Thrift Server, after the query planning steps there is no fundamental
difference on the execution side in Spark clusters or Spark jobs.

- (2) By the proposed definition, (2) `Spark Classic` is not (1) `Spark`. Like
`--remote`, it's one of the runnable modes.

To be clear, is the proposal aiming to have us say A instead of B in
our documentation?

A. Since `Spark Connect` mode has no RDD API, we need to use `Spark Classic` 
mode instead.
B. Since `Spark Connect` mode has no RDD API, we need to use `Spark without 
Spark Connect` mode instead.

Dongjoon.



On 2024/07/22 12:59:54 Sadha Chilukoori wrote:
> +1  (non-binding) for classic.
> 
> On Mon, Jul 22, 2024 at 3:59 AM Martin Grund 
> wrote:
> 
> > +1 for classic. It's simple, easy to understand and it doesn't have the
> > negative meanings like legacy for example.
> >
> > On Sun, Jul 21, 2024 at 23:48 Wenchen Fan  wrote:
> >
> >> Classic SGTM.
> >>
> >> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
> >> kabhwan.opensou...@gmail.com> wrote:
> >>
> >>> I'd propose not to change the name of "Spark Connect" - the name
> >>> represents the characteristic of the mode (separation of layer for client
> >>> and server). Trying to remove the part of "Connect" would just make
> >>> confusion.
> >>>
> >>> +1 for Classic to existing mode, till someone comes up with better
> >>> alternatives.
> >>>
> >>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon 
> >>> wrote:
> >>>
>  I was thinking about a similar option too but I ended up giving this up
>  .. It's quite unlikely at this moment but suppose that we have another
>  Spark Connect-ish component in the far future and it would be challenging
>  to come up with another name ... Another case is that we might have to 
>  cope
>  with the cases like Spark Connect, vs Spark (with Spark Connect) and 
>  Spark
>  (without Spark Connect) ..
> 
>  On Sun, 21 Jul 2024 at 09:59, Holden Karau 
>  wrote:
> 
> > I think perhaps Spark Connect could be phrased as “Basic* Spark” &
> > existing Spark could be “Full Spark” given the API limitations of Spark
> > connect.
> >
> > *I was also thinking Core here but we’ve used core to refer to the RDD
> > APIs for too long to reuse it here.
> >
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> > https://amzn.to/2MaRAG9  
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> >
> > On Sat, Jul 20, 2024 at 8:02 PM Xiao Li  wrote:
> >
> >> Classic is much better than Legacy. : )
> >>
> >> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
> >>
> >>> Hi all,
> >>>
> >>> I noticed that we need to standardize our terminology before moving
> >>> forward. For instance, when documenting, 'Spark without Spark 
> >>> Connect' is
> >>> too long and verbose. Additionally, I've observed that we use various 
> >>> names
> >>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
> >>> Spark, etc.
> >>>
> >>> I propose that we consistently refer to it as Spark Classic (vs.
> >>> Spark Connect).
> >>>
> >>> Please share your thoughts on this. Thanks!
> >>>
> >>
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
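
For readers less familiar with the two modes being named in this thread, here is a
minimal Scala sketch contrasting them. It is an illustration only, not code from the
thread: the `.remote(...)` builder method comes from the separate
spark-connect-client-jvm artifact, and the endpoint, app name, and data values are
placeholder assumptions.

    // "Spark Classic": the session runs in-process, so SparkContext and the
    // RDD API are available.
    import org.apache.spark.sql.SparkSession

    val classic = SparkSession.builder()
      .master("local[*]")
      .appName("classic-mode")
      .getOrCreate()

    val rdd = classic.sparkContext.parallelize(Seq(1, 2, 3)) // RDD API exists here
    println(rdd.count())

    // "Spark Connect": what `--remote sc://host:port` enables for spark-shell /
    // pyspark. The client talks to a remote Spark server over gRPC; there is no
    // client-side SparkContext and hence no RDD API, but DataFrame code still works.
    // (The two halves use different artifacts/classpaths; shown together only
    // for contrast.)
    val connect = SparkSession.builder()
      .remote("sc://localhost:15002") // assumed local Spark Connect endpoint
      .getOrCreate()

    connect.range(3).show()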



Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Sadha Chilukoori
+1  (non-binding) for classic.

On Mon, Jul 22, 2024 at 3:59 AM Martin Grund 
wrote:

> +1 for classic. It's simple, easy to understand and it doesn't have the
> negative meanings like legacy for example.
>
> On Sun, Jul 21, 2024 at 23:48 Wenchen Fan  wrote:
>
>> Classic SGTM.
>>
>> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> I'd propose not to change the name of "Spark Connect" - the name
>>> represents the characteristic of the mode (separation of layer for client
>>> and server). Trying to remove the part of "Connect" would just make
>>> confusion.
>>>
>>> +1 for Classic to existing mode, till someone comes up with better
>>> alternatives.
>>>
>>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon 
>>> wrote:
>>>
 I was thinking about a similar option too but I ended up giving this up
 .. It's quite unlikely at this moment but suppose that we have another
 Spark Connect-ish component in the far future and it would be challenging
 to come up with another name ... Another case is that we might have to cope
 with the cases like Spark Connect, vs Spark (with Spark Connect) and Spark
 (without Spark Connect) ..

 On Sun, 21 Jul 2024 at 09:59, Holden Karau 
 wrote:

> I think perhaps Spark Connect could be phrased as “Basic* Spark” &
> existing Spark could be “Full Spark” given the API limitations of Spark
> connect.
>
> *I was also thinking Core here but we’ve used core to refer to the RDD
> APIs for too long to reuse it here.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Sat, Jul 20, 2024 at 8:02 PM Xiao Li  wrote:
>
>> Classic is much better than Legacy. : )
>>
>> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>>
>>> Hi all,
>>>
>>> I noticed that we need to standardize our terminology before moving
>>> forward. For instance, when documenting, 'Spark without Spark Connect' 
>>> is
>>> too long and verbose. Additionally, I've observed that we use various 
>>> names
>>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
>>> Spark, etc.
>>>
>>> I propose that we consistently refer to it as Spark Classic (vs.
>>> Spark Connect).
>>>
>>> Please share your thoughts on this. Thanks!
>>>
>>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Martin Grund
+1 for classic. It's simple, easy to understand and it doesn't have the
negative meanings like legacy for example.

On Sun, Jul 21, 2024 at 23:48 Wenchen Fan  wrote:

> Classic SGTM.
>
> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim 
> wrote:
>
>> I'd propose not to change the name of "Spark Connect" - the name
>> represents the characteristic of the mode (separation of layer for client
>> and server). Trying to remove the part of "Connect" would just make
>> confusion.
>>
>> +1 for Classic to existing mode, till someone comes up with better
>> alternatives.
>>
>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon 
>> wrote:
>>
>>> I was thinking about a similar option too but I ended up giving this up
>>> .. It's quite unlikely at this moment but suppose that we have another
>>> Spark Connect-ish component in the far future and it would be challenging
>>> to come up with another name ... Another case is that we might have to cope
>>> with the cases like Spark Connect, vs Spark (with Spark Connect) and Spark
>>> (without Spark Connect) ..
>>>
>>> On Sun, 21 Jul 2024 at 09:59, Holden Karau 
>>> wrote:
>>>
 I think perhaps Spark Connect could be phrased as “Basic* Spark” &
 existing Spark could be “Full Spark” given the API limitations of Spark
 connect.

 *I was also thinking Core here but we’ve used core to refer to the RDD
 APIs for too long to reuse it here.

 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau


 On Sat, Jul 20, 2024 at 8:02 PM Xiao Li  wrote:

> Classic is much better than Legacy. : )
>
> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>
>> Hi all,
>>
>> I noticed that we need to standardize our terminology before moving
>> forward. For instance, when documenting, 'Spark without Spark Connect' is
>> too long and verbose. Additionally, I've observed that we use various 
>> names
>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
>> Spark, etc.
>>
>> I propose that we consistently refer to it as Spark Classic (vs.
>> Spark Connect).
>>
>> Please share your thoughts on this. Thanks!
>>
>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Wenchen Fan
Classic SGTM.

On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim 
wrote:

> I'd propose not to change the name of "Spark Connect" - the name
> represents the characteristic of the mode (separation of layer for client
> and server). Trying to remove the part of "Connect" would just make
> confusion.
>
> +1 for Classic to existing mode, till someone comes up with better
> alternatives.
>
> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon  wrote:
>
>> I was thinking about a similar option too but I ended up giving this up
>> .. It's quite unlikely at this moment but suppose that we have another
>> Spark Connect-ish component in the far future and it would be challenging
>> to come up with another name ... Another case is that we might have to cope
>> with the cases like Spark Connect, vs Spark (with Spark Connect) and Spark
>> (without Spark Connect) ..
>>
>> On Sun, 21 Jul 2024 at 09:59, Holden Karau 
>> wrote:
>>
>>> I think perhaps Spark Connect could be phrased as “Basic* Spark” &
>>> existing Spark could be “Full Spark” given the API limitations of Spark
>>> connect.
>>>
>>> *I was also thinking Core here but we’ve used core to refer to the RDD
>>> APIs for too long to reuse it here.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Sat, Jul 20, 2024 at 8:02 PM Xiao Li  wrote:
>>>
 Classic is much better than Legacy. : )

 Hyukjin Kwon  于2024年7月18日周四 16:58写道:

> Hi all,
>
> I noticed that we need to standardize our terminology before moving
> forward. For instance, when documenting, 'Spark without Spark Connect' is
> too long and verbose. Additionally, I've observed that we use various 
> names
> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
> Spark, etc.
>
> I propose that we consistently refer to it as Spark Classic (vs. Spark
> Connect).
>
> Please share your thoughts on this. Thanks!
>



Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-21 Thread Jungtaek Lim
I'd propose not to change the name of "Spark Connect" - the name represents
the defining characteristic of the mode (the separation of the client and
server layers). Trying to remove the "Connect" part would just cause confusion.

+1 for Classic to existing mode, till someone comes up with better
alternatives.

On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon  wrote:

> I was thinking about a similar option too but I ended up giving this up ..
> It's quite unlikely at this moment but suppose that we have another Spark
> Connect-ish component in the far future and it would be challenging to come
> up with another name ... Another case is that we might have to cope with
> the cases like Spark Connect, vs Spark (with Spark Connect) and Spark
> (without Spark Connect) ..
>
> On Sun, 21 Jul 2024 at 09:59, Holden Karau  wrote:
>
>> I think perhaps Spark Connect could be phrased as “Basic* Spark” &
>> existing Spark could be “Full Spark” given the API limitations of Spark
>> connect.
>>
>> *I was also thinking Core here but we’ve used core to refer to the RDD
>> APIs for too long to reuse it here.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Sat, Jul 20, 2024 at 8:02 PM Xiao Li  wrote:
>>
>>> Classic is much better than Legacy. : )
>>>
>>> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>>>
 Hi all,

 I noticed that we need to standardize our terminology before moving
 forward. For instance, when documenting, 'Spark without Spark Connect' is
 too long and verbose. Additionally, I've observed that we use various names
 for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
 Spark, etc.

 I propose that we consistently refer to it as Spark Classic (vs. Spark
 Connect).

 Please share your thoughts on this. Thanks!

>>>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-21 Thread Hyukjin Kwon
I was thinking about a similar option too but I ended up giving this up ..
It's quite unlikely at this moment but suppose that we have another Spark
Connect-ish component in the far future and it would be challenging to come
up with another name ... Another case is that we might have to cope with
the cases like Spark Connect, vs Spark (with Spark Connect) and Spark
(without Spark Connect) ..

On Sun, 21 Jul 2024 at 09:59, Holden Karau  wrote:

> I think perhaps Spark Connect could be phrased as “Basic* Spark” &
> existing Spark could be “Full Spark” given the API limitations of Spark
> connect.
>
> *I was also thinking Core here but we’ve used core to refer to the RDD
> APIs for too long to reuse it here.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Sat, Jul 20, 2024 at 8:02 PM Xiao Li  wrote:
>
>> Classic is much better than Legacy. : )
>>
>> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>>
>>> Hi all,
>>>
>>> I noticed that we need to standardize our terminology before moving
>>> forward. For instance, when documenting, 'Spark without Spark Connect' is
>>> too long and verbose. Additionally, I've observed that we use various names
>>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
>>> Spark, etc.
>>>
>>> I propose that we consistently refer to it as Spark Classic (vs. Spark
>>> Connect).
>>>
>>> Please share your thoughts on this. Thanks!
>>>
>>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread John Zhuge
+1 for Classic

On Sat, Jul 20, 2024 at 9:58 PM Denny Lee  wrote:

> +1 for 'Classic' as well :)
>
> On Sun, Jul 21, 2024 at 10:15 AM Ruifeng Zheng 
> wrote:
>
>> +1 for 'Classic'
>>
>> On Sun, Jul 21, 2024 at 8:03 AM Xiao Li  wrote:
>>
>>> Classic is much better than Legacy. : )
>>>
>>> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>>>
 Hi all,

 I noticed that we need to standardize our terminology before moving
 forward. For instance, when documenting, 'Spark without Spark Connect' is
 too long and verbose. Additionally, I've observed that we use various names
 for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
 Spark, etc.

 I propose that we consistently refer to it as Spark Classic (vs. Spark
 Connect).

 Please share your thoughts on this. Thanks!

>>>

-- 
John Zhuge


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Ye Xianjin
+1 for Classic

Sent from my iPhone

On Jul 21, 2024, at 10:15 AM, Ruifeng Zheng  wrote:

> +1 for 'Classic'
>
> On Sun, Jul 21, 2024 at 8:03 AM Xiao Li  wrote:
>
>> Classic is much better than Legacy. : )
>>
>> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>>
>>> Hi all,
>>>
>>> I noticed that we need to standardize our terminology before moving
>>> forward. For instance, when documenting, 'Spark without Spark Connect' is
>>> too long and verbose. Additionally, I've observed that we use various names
>>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
>>> Spark, etc.
>>>
>>> I propose that we consistently refer to it as Spark Classic (vs. Spark
>>> Connect).
>>>
>>> Please share your thoughts on this. Thanks!




Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Denny Lee
+1 for 'Classic' as well :)

On Sun, Jul 21, 2024 at 10:15 AM Ruifeng Zheng  wrote:

> +1 for 'Classic'
>
> On Sun, Jul 21, 2024 at 8:03 AM Xiao Li  wrote:
>
>> Classic is much better than Legacy. : )
>>
>> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>>
>>> Hi all,
>>>
>>> I noticed that we need to standardize our terminology before moving
>>> forward. For instance, when documenting, 'Spark without Spark Connect' is
>>> too long and verbose. Additionally, I've observed that we use various names
>>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
>>> Spark, etc.
>>>
>>> I propose that we consistently refer to it as Spark Classic (vs. Spark
>>> Connect).
>>>
>>> Please share your thoughts on this. Thanks!
>>>
>>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Ruifeng Zheng
+1 for 'Classic'

On Sun, Jul 21, 2024 at 8:03 AM Xiao Li  wrote:

> Classic is much better than Legacy. : )
>
> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>
>> Hi all,
>>
>> I noticed that we need to standardize our terminology before moving
>> forward. For instance, when documenting, 'Spark without Spark Connect' is
>> too long and verbose. Additionally, I've observed that we use various names
>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
>> Spark, etc.
>>
>> I propose that we consistently refer to it as Spark Classic (vs. Spark
>> Connect).
>>
>> Please share your thoughts on this. Thanks!
>>
>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Holden Karau
I think perhaps Spark Connect could be phrased as “Basic* Spark” & existing
Spark could be “Full Spark” given the API limitations of Spark connect.

*I was also thinking Core here but we’ve used core to refer to the RDD APIs
for too long to reuse it here.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Sat, Jul 20, 2024 at 8:02 PM Xiao Li  wrote:

> Classic is much better than Legacy. : )
>
> Hyukjin Kwon  于2024年7月18日周四 16:58写道:
>
>> Hi all,
>>
>> I noticed that we need to standardize our terminology before moving
>> forward. For instance, when documenting, 'Spark without Spark Connect' is
>> too long and verbose. Additionally, I've observed that we use various names
>> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
>> Spark, etc.
>>
>> I propose that we consistently refer to it as Spark Classic (vs. Spark
>> Connect).
>>
>> Please share your thoughts on this. Thanks!
>>
>


Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Xiao Li
Classic is much better than Legacy. : )

Hyukjin Kwon  于2024年7月18日周四 16:58写道:

> Hi all,
>
> I noticed that we need to standardize our terminology before moving
> forward. For instance, when documenting, 'Spark without Spark Connect' is
> too long and verbose. Additionally, I've observed that we use various names
> for Spark without Spark Connect: Spark Classic, Classic Spark, Legacy
> Spark, etc.
>
> I propose that we consistently refer to it as Spark Classic (vs. Spark
> Connect).
>
> Please share your thoughts on this. Thanks!
>


Re: [VOTE] Release Spark 3.5.2 (RC1)

2024-07-19 Thread Kent Yao
Thank you, Huaxin and L. C. Hsieh, for your input.

We shall also include PRs like
https://github.com/apache/spark/pull/47412 for correctness

So, 3.5.2-RC1 failed, I will start RC2 in two or three days.


Kent Yao

L. C. Hsieh  于2024年7月19日周五 13:02写道:
>
> I also support -1 to include the fix.
>
> On Thu, Jul 18, 2024 at 8:46 PM huaxin gao  wrote:
> >
> > -1 because we need to include this fix 
> > https://github.com/apache/spark/pull/47406
> >
> > On Thu, Jul 18, 2024 at 4:01 AM Kent Yao  wrote:
> >>
> >> Thank you Wenchen.
> >>
> >> The vote is open until Jul 21, 11 AM UTC. Considering that the
> >> deadline falls on a weekend, the results might be counted on the
> >> following Monday.
> >>
> >> Bests,
> >>
> >> Kent Yao
> >>
> >> Wenchen Fan  于2024年7月18日周四 18:54写道:
> >> >
> >> > > The vote is open until Jul 18
> >> >
> >> > Is it a typo? It's July 18 today.
> >> >
> >> > On Thu, Jul 18, 2024 at 6:30 PM Kent Yao  wrote:
> >> >>
> >> >> Hi dev,
> >> >>
> >> >> Please vote on releasing the following candidate as Apache Spark 
> >> >> version 3.5.2.
> >> >>
> >> >> The vote is open until Jul 18, 11 AM UTC, and passes if a majority +1
> >> >> PMC votes are cast, with
> >> >> a minimum of 3 +1 votes.
> >> >>
> >> >> [ ] +1 Release this package as Apache Spark 3.5.2
> >> >> [ ] -1 Do not release this package because ...
> >> >>
> >> >> To learn more about Apache Spark, please see https://spark.apache.org/
> >> >>
> >> >> The tag to be voted on is v3.5.2-rc1 (commit
> >> >> b1510127df952af90b7971684b91f6c0e804de24):
> >> >> https://github.com/apache/spark/tree/v3.5.2-rc1
> >> >>
> >> >> The release files, including signatures, digests, etc. can be found at:
> >> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/
> >> >>
> >> >> Signatures used for Spark RCs can be found in this file:
> >> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >>
> >> >> The staging repository for this release can be found at:
> >> >> https://repository.apache.org/content/repositories/orgapachespark-1457/
> >> >>
> >> >> The documentation corresponding to this release can be found at:
> >> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-docs/
> >> >>
> >> >> The list of bug fixes going into 3.5.2 can be found at the following 
> >> >> URL:
> >> >> https://issues.apache.org/jira/projects/SPARK/versions/12353980
> >> >>
> >> >> FAQ
> >> >>
> >> >> =
> >> >> How can I help test this release?
> >> >> =
> >> >>
> >> >> If you are a Spark user, you can help us test this release by taking
> >> >> an existing Spark workload and running on this release candidate, then
> >> >> reporting any regressions.
> >> >>
> >> >> If you're working in PySpark you can set up a virtual env and install
> >> >> the current RC via "pip install
> >> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/pyspark-3.5.2.tar.gz"
> >> >> and see if anything important breaks.
> >> >> In the Java/Scala, you can add the staging repository to your projects
> >> >> resolvers and test
> >> >> with the RC (make sure to clean up the artifact cache before/after so
> >> >> you don't end up building with an out of date RC going forward).
> >> >>
> >> >> ===
> >> >> What should happen to JIRA tickets still targeting 3.5.2?
> >> >> ===
> >> >>
> >> >> The current list of open tickets targeted at 3.5.2 can be found at:
> >> >> https://issues.apache.org/jira/projects/SPARK and search for
> >> >> "Target Version/s" = 3.5.2
> >> >>
> >> >> Committers should look at those and triage. Extremely important bug
> >> >> fixes, documentation, and API tweaks that impact compatibility should
> >> >> be worked on immediately. Everything else please retarget to an
> >> >> appropriate release.
> >> >>
> >> >> ==
> >> >> But my bug isn't fixed?
> >> >> ==
> >> >>
> >> >> In order to make timely releases, we will typically not hold the
> >> >> release unless the bug in question is a regression from the previous
> >> >> release. That being said, if there is something which is a regression
> >> >> that has not been correctly targeted please ping me or a committer to
> >> >> help target the issue.
> >> >>
> >> >> Thanks,
> >> >> Kent Yao
> >> >>
> >> >> -
> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >> >>
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
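
As a concrete illustration of the "add the staging repository to your projects
resolvers" step in the FAQ quoted above, a minimal sbt sketch follows. It is not part
of the original email; the spark-sql module is just an example dependency, the Scala
version is an assumption, and the resolver URL is the RC1 staging repository listed
above.

    // build.sbt - minimal sketch for testing a Spark release candidate from the
    // staging repository (assumptions: sbt project, Scala 2.12, spark-sql module).
    ThisBuild / scalaVersion := "2.12.18"

    resolvers += "Apache Spark 3.5.2 RC staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1457/"

    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.2"

As the FAQ notes, clean up the local artifact cache (for example the
org.apache.spark entries under ~/.ivy2 and ~/.m2) before and after testing so a
stale RC is not picked up later.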



Re: [VOTE] Release Spark 3.5.2 (RC1)

2024-07-18 Thread L. C. Hsieh
I also support -1 to include the fix.

On Thu, Jul 18, 2024 at 8:46 PM huaxin gao  wrote:
>
> -1 because we need to include this fix 
> https://github.com/apache/spark/pull/47406
>
> On Thu, Jul 18, 2024 at 4:01 AM Kent Yao  wrote:
>>
>> Thank you Wenchen.
>>
>> The vote is open until Jul 21, 11 AM UTC. Considering that the
>> deadline falls on a weekend, the results might be counted on the
>> following Monday.
>>
>> Bests,
>>
>> Kent Yao
>>
>> Wenchen Fan  于2024年7月18日周四 18:54写道:
>> >
>> > > The vote is open until Jul 18
>> >
>> > Is it a typo? It's July 18 today.
>> >
>> > On Thu, Jul 18, 2024 at 6:30 PM Kent Yao  wrote:
>> >>
>> >> Hi dev,
>> >>
>> >> Please vote on releasing the following candidate as Apache Spark version 
>> >> 3.5.2.
>> >>
>> >> The vote is open until Jul 18, 11 AM UTC, and passes if a majority +1
>> >> PMC votes are cast, with
>> >> a minimum of 3 +1 votes.
>> >>
>> >> [ ] +1 Release this package as Apache Spark 3.5.2
>> >> [ ] -1 Do not release this package because ...
>> >>
>> >> To learn more about Apache Spark, please see https://spark.apache.org/
>> >>
>> >> The tag to be voted on is v3.5.2-rc1 (commit
>> >> b1510127df952af90b7971684b91f6c0e804de24):
>> >> https://github.com/apache/spark/tree/v3.5.2-rc1
>> >>
>> >> The release files, including signatures, digests, etc. can be found at:
>> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/
>> >>
>> >> Signatures used for Spark RCs can be found in this file:
>> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>
>> >> The staging repository for this release can be found at:
>> >> https://repository.apache.org/content/repositories/orgapachespark-1457/
>> >>
>> >> The documentation corresponding to this release can be found at:
>> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-docs/
>> >>
>> >> The list of bug fixes going into 3.5.2 can be found at the following URL:
>> >> https://issues.apache.org/jira/projects/SPARK/versions/12353980
>> >>
>> >> FAQ
>> >>
>> >> =
>> >> How can I help test this release?
>> >> =
>> >>
>> >> If you are a Spark user, you can help us test this release by taking
>> >> an existing Spark workload and running on this release candidate, then
>> >> reporting any regressions.
>> >>
>> >> If you're working in PySpark you can set up a virtual env and install
>> >> the current RC via "pip install
>> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/pyspark-3.5.2.tar.gz"
>> >> and see if anything important breaks.
>> >> In the Java/Scala, you can add the staging repository to your projects
>> >> resolvers and test
>> >> with the RC (make sure to clean up the artifact cache before/after so
>> >> you don't end up building with an out of date RC going forward).
>> >>
>> >> ===
>> >> What should happen to JIRA tickets still targeting 3.5.2?
>> >> ===
>> >>
>> >> The current list of open tickets targeted at 3.5.2 can be found at:
>> >> https://issues.apache.org/jira/projects/SPARK and search for
>> >> "Target Version/s" = 3.5.2
>> >>
>> >> Committers should look at those and triage. Extremely important bug
>> >> fixes, documentation, and API tweaks that impact compatibility should
>> >> be worked on immediately. Everything else please retarget to an
>> >> appropriate release.
>> >>
>> >> ==
>> >> But my bug isn't fixed?
>> >> ==
>> >>
>> >> In order to make timely releases, we will typically not hold the
>> >> release unless the bug in question is a regression from the previous
>> >> release. That being said, if there is something which is a regression
>> >> that has not been correctly targeted please ping me or a committer to
>> >> help target the issue.
>> >>
>> >> Thanks,
>> >> Kent Yao
>> >>
>> >> -
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.5.2 (RC1)

2024-07-18 Thread huaxin gao
-1 because we need to include this fix
https://github.com/apache/spark/pull/47406

On Thu, Jul 18, 2024 at 4:01 AM Kent Yao  wrote:

> Thank you Wenchen.
>
> The vote is open until Jul 21, 11 AM UTC. Considering that the
> deadline falls on a weekend, the results might be counted on the
> following Monday.
>
> Bests,
>
> Kent Yao
>
> Wenchen Fan  于2024年7月18日周四 18:54写道:
> >
> > > The vote is open until Jul 18
> >
> > Is it a typo? It's July 18 today.
> >
> > On Thu, Jul 18, 2024 at 6:30 PM Kent Yao  wrote:
> >>
> >> Hi dev,
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> version 3.5.2.
> >>
> >> The vote is open until Jul 18, 11 AM UTC, and passes if a majority +1
> >> PMC votes are cast, with
> >> a minimum of 3 +1 votes.
> >>
> >> [ ] +1 Release this package as Apache Spark 3.5.2
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see https://spark.apache.org/
> >>
> >> The tag to be voted on is v3.5.2-rc1 (commit
> >> b1510127df952af90b7971684b91f6c0e804de24):
> >> https://github.com/apache/spark/tree/v3.5.2-rc1
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1457/
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-docs/
> >>
> >> The list of bug fixes going into 3.5.2 can be found at the following
> URL:
> >> https://issues.apache.org/jira/projects/SPARK/versions/12353980
> >>
> >> FAQ
> >>
> >> =
> >> How can I help test this release?
> >> =
> >>
> >> If you are a Spark user, you can help us test this release by taking
> >> an existing Spark workload and running on this release candidate, then
> >> reporting any regressions.
> >>
> >> If you're working in PySpark you can set up a virtual env and install
> >> the current RC via "pip install
> >>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/pyspark-3.5.2.tar.gz
> "
> >> and see if anything important breaks.
> >> In the Java/Scala, you can add the staging repository to your projects
> >> resolvers and test
> >> with the RC (make sure to clean up the artifact cache before/after so
> >> you don't end up building with an out of date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 3.5.2?
> >> ===
> >>
> >> The current list of open tickets targeted at 3.5.2 can be found at:
> >> https://issues.apache.org/jira/projects/SPARK and search for
> >> "Target Version/s" = 3.5.2
> >>
> >> Committers should look at those and triage. Extremely important bug
> >> fixes, documentation, and API tweaks that impact compatibility should
> >> be worked on immediately. Everything else please retarget to an
> >> appropriate release.
> >>
> >> ==
> >> But my bug isn't fixed?
> >> ==
> >>
> >> In order to make timely releases, we will typically not hold the
> >> release unless the bug in question is a regression from the previous
> >> release. That being said, if there is something which is a regression
> >> that has not been correctly targeted please ping me or a committer to
> >> help target the issue.
> >>
> >> Thanks,
> >> Kent Yao
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Spark 3.5.2 (RC1)

2024-07-18 Thread Kent Yao
Thank you Wenchen.

The vote is open until Jul 21, 11 AM UTC. Considering that the
deadline falls on a weekend, the results might be counted on the
following Monday.

Bests,

Kent Yao

Wenchen Fan  于2024年7月18日周四 18:54写道:
>
> > The vote is open until Jul 18
>
> Is it a typo? It's July 18 today.
>
> On Thu, Jul 18, 2024 at 6:30 PM Kent Yao  wrote:
>>
>> Hi dev,
>>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 3.5.2.
>>
>> The vote is open until Jul 18, 11 AM UTC, and passes if a majority +1
>> PMC votes are cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.5.2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v3.5.2-rc1 (commit
>> b1510127df952af90b7971684b91f6c0e804de24):
>> https://github.com/apache/spark/tree/v3.5.2-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1457/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-docs/
>>
>> The list of bug fixes going into 3.5.2 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12353980
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC via "pip install
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/pyspark-3.5.2.tar.gz"
>> and see if anything important breaks.
>> In the Java/Scala, you can add the staging repository to your projects
>> resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.5.2?
>> ===
>>
>> The current list of open tickets targeted at 3.5.2 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for
>> "Target Version/s" = 3.5.2
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> Thanks,
>> Kent Yao
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.5.2 (RC1)

2024-07-18 Thread Wenchen Fan
> The vote is open until Jul 18

Is it a typo? It's July 18 today.

On Thu, Jul 18, 2024 at 6:30 PM Kent Yao  wrote:

> Hi dev,
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.2.
>
> The vote is open until Jul 18, 11 AM UTC, and passes if a majority +1
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.2-rc1 (commit
> b1510127df952af90b7971684b91f6c0e804de24):
> https://github.com/apache/spark/tree/v3.5.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1457/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-docs/
>
> The list of bug fixes going into 3.5.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353980
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc1-bin/pyspark-3.5.2.tar.gz
> "
> and see if anything important breaks.
> In the Java/Scala, you can add the staging repository to your projects
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.5.2?
> ===
>
> The current list of open tickets targeted at 3.5.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for
> "Target Version/s" = 3.5.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Kent Yao
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [Issue] Spark SQL - broadcast failure

2024-07-16 Thread Mich Talebzadeh
It will help if you mention the Spark version and the piece of problematic
code

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
PhD, Imperial College London
London, United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions" (Werner von Braun).


On Tue, 16 Jul 2024 at 08:51, Sudharshan V 
wrote:

>
> On Mon, 8 Jul, 2024, 7:53 pm Sudharshan V, 
> wrote:
>
>> Hi all,
>>
>> Been facing a weird issue lately.
>> In our production code base, we have an explicit broadcast for a small
>> table.
>> It is just a lookup table that is around 1 GB in size in S3, with just a
>> few million records and 5 columns.
>>
>> The ETL was running fine, but with no change to the codebase or the
>> infrastructure, we are now getting broadcast failures. The even weirder
>> fact is that the older run's data size was 1.4 GB while the new run's is
>> just 900 MB.
>>
>> Below is the error message:
>> Cannot broadcast table that is larger than 8 GB : 8GB.
>>
>> I find it extremely weird considering that the data size is very well
>> under the thresholds.
>>
>> Are there any other ways to find what could be the issue and how we can
>> rectify this issue?
>>
>> Could the data characteristics be an issue?
>>
>> Any help would be immensely appreciated.
>>
>> Thanks
>>
>
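
To make the report above more concrete, here is a hedged Scala sketch of the usual
pattern and the knobs worth checking; the table names, paths and formats are made up,
and `spark` is assumed to be the active SparkSession. The quoted error comes from the
hard 8 GB limit on a single broadcast, which is checked against the in-memory size of
the built broadcast side rather than the on-disk size in S3, so a 900 MB file can
still exceed it after decompression and row expansion.

    // Illustrative sketch only (hypothetical names), not the poster's actual code.
    import org.apache.spark.sql.functions.broadcast

    val lookup = spark.read.parquet("s3://bucket/lookup/")  // ~1 GB on disk
    val facts  = spark.read.parquet("s3://bucket/facts/")

    // Explicit broadcast hint: the lookup side is materialized on the driver and
    // shipped to every executor; the 8 GB check applies to that in-memory copy.
    val joined = facts.join(broadcast(lookup), Seq("key"))

    // Spark's own size estimate for the lookup plan (often far from the file size):
    println(lookup.queryExecution.optimizedPlan.stats.sizeInBytes)

    // While debugging, drop the hint to fall back to a shuffle join, and/or turn
    // off automatic broadcasting so the planner does not re-introduce it:
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")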


Re: [Issue] Spark SQL - broadcast failure

2024-07-16 Thread Sudharshan V
On Mon, 8 Jul, 2024, 7:53 pm Sudharshan V, 
wrote:

> Hi all,
>
> Been facing a weird issue lately.
> In our production code base, we have an explicit broadcast for a small
> table.
> It is just a lookup table that is around 1 GB in size in S3, with just a
> few million records and 5 columns.
>
> The ETL was running fine, but with no change to the codebase or the
> infrastructure, we are now getting broadcast failures. The even weirder
> fact is that the older run's data size was 1.4 GB while the new run's is
> just 900 MB.
>
> Below is the error message:
> Cannot broadcast table that is larger than 8 GB : 8GB.
>
> I find it extremely weird considering that the data size is very well
> under the thresholds.
>
> Are there any other ways to find what could be the issue and how we can
> rectify this issue?
>
> Could the data characteristics be an issue?
>
> Any help would be immensely appreciated.
>
> Thanks
>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-14 Thread Mich Talebzadeh
I was looking at this email trail and the original one raised by Martin
Grund. I too agree that mistakes can and do happen.

On my part, kudos to Martin for raising the issue, and to @Hyukjin Kwon
for the quick action that helped avoid potential
delays. Thanks both.

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
PhD, Imperial College London
London, United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions" (Werner von Braun).


On Sat, 13 Jul 2024 at 23:14, Hyukjin Kwon  wrote:

> 
>
> On Sun, Jul 14, 2024 at 1:07 AM Holden Karau 
> wrote:
>
>> Thank you :)
>>
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Sat, Jul 13, 2024 at 1:37 AM Hyukjin Kwon 
>> wrote:
>>
>>> Reverted, and opened a new one
>>> https://github.com/apache/spark/pull/47341.
>>>
>>> On Sat, 13 Jul 2024 at 15:40, Hyukjin Kwon  wrote:
>>>
 Yeah that's fine. I'll revert and open a fresh PR including my own
 followup when I get back home later today.

 On Sat, Jul 13, 2024 at 3:08 PM Holden Karau 
 wrote:

> Even if the change is reasonable (and I can see arguments both ways),
> it's important that we follow the process we agreed on. Merging a PR
> without discussion* in ~ 2 hours from the initial proposal is not enough
> time to reach a lazy consensus. If it was a small bug-fix I could
> understand but this was a non-trivial change.
>
>
> * It was approved by another committer but without any discussion, and
> the approver & code author work for the same employer mentioned as the
> justification for the change.
>
> On Fri, Jul 12, 2024 at 6:42 PM Hyukjin Kwon 
> wrote:
>
>> I think we should have not mentioned a specific vendor there. The
>> change also shouldn't repartition. We should create a partition 1.
>>
>> But in general leveraging Catalyst optimizer and SQL engine there is
>> a good idea as we can leverage all optimization there. For example, it 
>> will
>> use UTF8 encoding instead of a plan string ser/de. We made similar 
>> changes
>> in JSON and CSV schema inference (it was an RDD before)
>>
>> On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
>> wrote:
>>
>>> My bad I meant to say I believe the provided justification is
>>> inappropriate.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
>>> wrote:
>>>
 So looking at the PR it does not appear to be removing any RDD APIs
 but the justification provided for changing the ML backend to use the
 DataFrame APIs is indeed concerning.

 This PR appears to have been merged without proper review (or
 providing an opportunity for review).

 I’d like to remind people of the expectations we decided on
 together —
 https://spark.apache.org/committers.html

 I believe the provided justification for the change and would ask
 that we revert this PR so that a proper discussion can take place.

 “
 In databricks runtime, RDD read / write API has some issue for
 certain storage types that requires the account key, but Dataframe 
 read /
 write API works.
 “

 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>

 On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
  wrote:

> I took a quick look at the PR and would like to understand your
> concern better about:
>
> >  SparkSession is heavier than SparkContext
>
> It looks like the PR is using the active SparkSession, not
> creating a new one etc. I would highly appreciate it if you could 
> help me
> understand this situation better.

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-13 Thread Hyukjin Kwon


On Sun, Jul 14, 2024 at 1:07 AM Holden Karau  wrote:

> Thank you :)
>
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Sat, Jul 13, 2024 at 1:37 AM Hyukjin Kwon  wrote:
>
>> Reverted, and opened a new one https://github.com/apache/spark/pull/47341
>> .
>>
>> On Sat, 13 Jul 2024 at 15:40, Hyukjin Kwon  wrote:
>>
>>> Yeah that's fine. I'll revert and open a fresh PR including my own
>>> followup when I get back home later today.
>>>
>>> On Sat, Jul 13, 2024 at 3:08 PM Holden Karau 
>>> wrote:
>>>
 Even if the change is reasonable (and I can see arguments both ways),
 it's important that we follow the process we agreed on. Merging a PR
 without discussion* in ~ 2 hours from the initial proposal is not enough
 time to reach a lazy consensus. If it was a small bug-fix I could
 understand but this was a non-trivial change.


 * It was approved by another committer but without any discussion, and
 the approver & code author work for the same employer mentioned as the
 justification for the change.

 On Fri, Jul 12, 2024 at 6:42 PM Hyukjin Kwon 
 wrote:

> I think we should have not mentioned a specific vendor there. The
> change also shouldn't repartition. We should create a partition 1.
>
> But in general leveraging Catalyst optimizer and SQL engine there is a
> good idea as we can leverage all optimization there. For example, it will
> use UTF8 encoding instead of a plan string ser/de. We made similar changes
> in JSON and CSV schema inference (it was an RDD before)
>
> On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
> wrote:
>
>> My bad I meant to say I believe the provided justification is
>> inappropriate.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
>> wrote:
>>
>>> So looking at the PR it does not appear to be removing any RDD APIs
>>> but the justification provided for changing the ML backend to use the
>>> DataFrame APIs is indeed concerning.
>>>
>>> This PR appears to have been merged without proper review (or
>>> providing an opportunity for review).
>>>
>>> I’d like to remind people of the expectations we decided on together
>>> —
>>> https://spark.apache.org/committers.html
>>>
>>> I believe the provided justification for the change and would ask
>>> that we revert this PR so that a proper discussion can take place.
>>>
>>> “
>>> In databricks runtime, RDD read / write API has some issue for
>>> certain storage types that requires the account key, but Dataframe read 
>>> /
>>> write API works.
>>> “
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>>>
>>> On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
>>>  wrote:
>>>
 I took a quick look at the PR and would like to understand your
 concern better about:

 >  SparkSession is heavier than SparkContext

 It looks like the PR is using the active SparkSession, not creating
 a new one etc. I would highly appreciate it if you could help me 
 understand
 this situation better.

 Thanks a lot!

 On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun <
 dongjoon.h...@gmail.com> wrote:

> Hi, All.
>
> Apache Spark's RDD API plays an essential and invaluable role from
> the beginning and it will be even if it's not supported by Spark 
> Connect.
>
> I have a concern about a recent activity which replaces RDD with
> SparkSession blindly.
>
> For instance,
>
> https://github.com/apache/spark/pull/47328
> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
> Dataframe read / write API
>
> This PR doesn't look proper to me in two ways.
> - SparkSession is heavier than SparkContext
> - According to the following PR description, the background is
> also hidden in the community.
>
>   > # Why are the changes needed?
>   > In databricks runtime, RDD read / write API has some issue for
> certain storage types
>   > that requires the account key, but Dataframe read / write API
> works.
>
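
For context on what the PR under discussion swaps, the sketch below contrasts the
two text read/write paths being debated. It is only an illustration under stated
assumptions (placeholder path and payload, `spark` as the active SparkSession), not
the code from apache/spark#47328.

    // RDD read / write API (SparkContext-based), the style the old ML/R writers used:
    val path = "/tmp/model-metadata"                      // placeholder path
    val payload = """{"class":"SomeModelPlaceholder"}"""  // placeholder payload

    spark.sparkContext.parallelize(Seq(payload), numSlices = 1).saveAsTextFile(path)
    val viaRdd = spark.sparkContext.textFile(path).first()

    // DataFrame read / write API, which goes through the SQL/Catalyst path and,
    // as noted earlier in the thread, should write a single partition rather
    // than repartition:
    import spark.implicits._
    Seq(payload).toDF("value").coalesce(1).write.mode("overwrite").text(path)
    val viaDf = spark.read.text(path).head().getString(0)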

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-13 Thread Holden Karau
Thank you :)

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Sat, Jul 13, 2024 at 1:37 AM Hyukjin Kwon  wrote:

> Reverted, and opened a new one https://github.com/apache/spark/pull/47341.
>
> On Sat, 13 Jul 2024 at 15:40, Hyukjin Kwon  wrote:
>
>> Yeah that's fine. I'll revert and open a fresh PR including my own
>> followup when I get back home later today.
>>
>> On Sat, Jul 13, 2024 at 3:08 PM Holden Karau 
>> wrote:
>>
>>> Even if the change is reasonable (and I can see arguments both ways),
>>> it's important that we follow the process we agreed on. Merging a PR
>>> without discussion* in ~ 2 hours from the initial proposal is not enough
>>> time to reach a lazy consensus. If it was a small bug-fix I could
>>> understand but this was a non-trivial change.
>>>
>>>
>>> * It was approved by another committer but without any discussion, and
>>> the approver & code author work for the same employer mentioned as the
>>> justification for the change.
>>>
>>> On Fri, Jul 12, 2024 at 6:42 PM Hyukjin Kwon 
>>> wrote:
>>>
 I think we should have not mentioned a specific vendor there. The
 change also shouldn't repartition. We should create a partition 1.

 But in general leveraging Catalyst optimizer and SQL engine there is a
 good idea as we can leverage all optimization there. For example, it will
 use UTF8 encoding instead of a plan string ser/de. We made similar changes
 in JSON and CSV schema inference (it was an RDD before)

 On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
 wrote:

> My bad I meant to say I believe the provided justification is
> inappropriate.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
> wrote:
>
>> So looking at the PR it does not appear to be removing any RDD APIs
>> but the justification provided for changing the ML backend to use the
>> DataFrame APIs is indeed concerning.
>>
>> This PR appears to have been merged without proper review (or
>> providing an opportunity for review).
>>
>> I’d like to remind people of the expectations we decided on together
>> —
>> https://spark.apache.org/committers.html
>>
>> I believe the provided justification for the change and would ask
>> that we revert this PR so that a proper discussion can take place.
>>
>> “
>> In databricks runtime, RDD read / write API has some issue for
>> certain storage types that requires the account key, but Dataframe read /
>> write API works.
>> “
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>
>>
>> On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
>>  wrote:
>>
>>> I took a quick look at the PR and would like to understand your
>>> concern better about:
>>>
>>> >  SparkSession is heavier than SparkContext
>>>
>>> It looks like the PR is using the active SparkSession, not creating
>>> a new one etc. I would highly appreciate it if you could help me 
>>> understand
>>> this situation better.
>>>
>>> Thanks a lot!
>>>
>>> On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>>
 Hi, All.

 Apache Spark's RDD API plays an essential and invaluable role from
 the beginning and it will be even if it's not supported by Spark 
 Connect.

 I have a concern about a recent activity which replaces RDD with
 SparkSession blindly.

 For instance,

 https://github.com/apache/spark/pull/47328
 [SPARK-48883][ML][R] Replace RDD read / write API invocation with
 Dataframe read / write API

 This PR doesn't look proper to me in two ways.
 - SparkSession is heavier than SparkContext
 - According to the following PR description, the background is also
 hidden in the community.

   > # Why are the changes needed?
   > In databricks runtime, RDD read / write API has some issue for
 certain storage types
   > that requires the account key, but Dataframe read / write API
 works.

 In addition, we don't know if this PR fixes the mentioned unknown
 storage's issue or not because it's not testable in the community test
 coverage.

 I'm wondering 

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-13 Thread Hyukjin Kwon
Reverted, and opened a new one https://github.com/apache/spark/pull/47341.

On Sat, 13 Jul 2024 at 15:40, Hyukjin Kwon  wrote:

> Yeah that's fine. I'll revert and open a fresh PR including my own
> followup when I get back home later today.
>
> On Sat, Jul 13, 2024 at 3:08 PM Holden Karau 
> wrote:
>
>> Even if the change is reasonable (and I can see arguments both ways),
>> it's important that we follow the process we agreed on. Merging a PR
>> without discussion* in ~ 2 hours from the initial proposal is not enough
>> time to reach a lazy consensus. If it was a small bug-fix I could
>> understand but this was a non-trivial change.
>>
>>
>> * It was approved by another committer but without any discussion, and
>> the approver & code author work for the same employer mentioned as the
>> justification for the change.
>>
>> On Fri, Jul 12, 2024 at 6:42 PM Hyukjin Kwon 
>> wrote:
>>
>>> I think we should have not mentioned a specific vendor there. The change
>>> also shouldn't repartition. We should create a partition 1.
>>>
>>> But in general leveraging Catalyst optimizer and SQL engine there is a
>>> good idea as we can leverage all optimization there. For example, it will
>>> use UTF8 encoding instead of a plan string ser/de. We made similar changes
>>> in JSON and CSV schema inference (it was an RDD before)
>>>
>>> On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
>>> wrote:
>>>
 My bad I meant to say I believe the provided justification is
 inappropriate.

 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau


 On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
 wrote:

> So looking at the PR it does not appear to be removing any RDD APIs
> but the justification provided for changing the ML backend to use the
> DataFrame APIs is indeed concerning.
>
> This PR appears to have been merged without proper review (or
> providing an opportunity for review).
>
> I’d like to remind people of the expectations we decided on together —
> https://spark.apache.org/committers.html
>
> I believe the provided justification for the change and would ask that
> we revert this PR so that a proper discussion can take place.
>
> “
> In databricks runtime, RDD read / write API has some issue for certain
> storage types that requires the account key, but Dataframe read / write 
> API
> works.
> “
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>

>
> On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
>  wrote:
>
>> I took a quick look at the PR and would like to understand your
>> concern better about:
>>
>> >  SparkSession is heavier than SparkContext
>>
>> It looks like the PR is using the active SparkSession, not creating a
>> new one etc. I would highly appreciate it if you could help me understand
>> this situation better.
>>
>> Thanks a lot!
>>
>> On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>>
>>> Hi, All.
>>>
>>> Apache Spark's RDD API plays an essential and invaluable role from
>>> the beginning and it will be even if it's not supported by Spark 
>>> Connect.
>>>
>>> I have a concern about a recent activity which replaces RDD with
>>> SparkSession blindly.
>>>
>>> For instance,
>>>
>>> https://github.com/apache/spark/pull/47328
>>> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
>>> Dataframe read / write API
>>>
>>> This PR doesn't look proper to me in two ways.
>>> - SparkSession is heavier than SparkContext
>>> - According to the following PR description, the background is also
>>> hidden in the community.
>>>
>>>   > # Why are the changes needed?
>>>   > In databricks runtime, RDD read / write API has some issue for
>>> certain storage types
>>>   > that requires the account key, but Dataframe read / write API
>>> works.
>>>
>>> In addition, we don't know if this PR fixes the mentioned unknown
>>> storage's issue or not because it's not testable in the community test
>>> coverage.
>>>
>>> I'm wondering if the Apache Spark community aims to move away from
>>> the RDD usage in favor of `Spark Connect`. Isn't it too early because
>>> `Spark Connect` is not even GA in the community?
>>>
>>> Dongjoon.
>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live 

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-13 Thread Hyukjin Kwon
Yeah that's fine. I'll revert and open a fresh PR including my own followup
when I get back home later today.

On Sat, Jul 13, 2024 at 3:08 PM Holden Karau  wrote:

> Even if the change is reasonable (and I can see arguments both ways), it's
> important that we follow the process we agreed on. Merging a PR without
> discussion* in ~ 2 hours from the initial proposal is not enough time to
> reach a lazy consensus. If it was a small bug-fix I could understand but
> this was a non-trivial change.
>
>
> * It was approved by another committer but without any discussion, and the
> approver & code author work for the same employer mentioned as the
> justification for the change.
>
> On Fri, Jul 12, 2024 at 6:42 PM Hyukjin Kwon  wrote:
>
>> I think we should have not mentioned a specific vendor there. The change
>> also shouldn't repartition. We should create a partition 1.
>>
>> But in general leveraging Catalyst optimizer and SQL engine there is a
>> good idea as we can leverage all optimization there. For example, it will
>> use UTF8 encoding instead of a plan string ser/de. We made similar changes
>> in JSON and CSV schema inference (it was an RDD before)
>>
>> On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
>> wrote:
>>
>>> My bad I meant to say I believe the provided justification is
>>> inappropriate.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
>>> wrote:
>>>
 So looking at the PR it does not appear to be removing any RDD APIs but
 the justification provided for changing the ML backend to use the DataFrame
 APIs is indeed concerning.

 This PR appears to have been merged without proper review (or providing
 an opportunity for review).

 I’d like to remind people of the expectations we decided on together —
 https://spark.apache.org/committers.html

 I believe the provided justification for the change and would ask that
 we revert this PR so that a proper discussion can take place.

 “
 In databricks runtime, RDD read / write API has some issue for certain
 storage types that requires the account key, but Dataframe read / write API
 works.
 “

 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>

 On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
  wrote:

> I took a quick look at the PR and would like to understand your
> concern better about:
>
> >  SparkSession is heavier than SparkContext
>
> It looks like the PR is using the active SparkSession, not creating a
> new one etc. I would highly appreciate it if you could help me understand
> this situation better.
>
> Thanks a lot!
>
> On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
> wrote:
>
>> Hi, All.
>>
>> Apache Spark's RDD API plays an essential and invaluable role from
>> the beginning and it will be even if it's not supported by Spark Connect.
>>
>> I have a concern about a recent activity which replaces RDD with
>> SparkSession blindly.
>>
>> For instance,
>>
>> https://github.com/apache/spark/pull/47328
>> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
>> Dataframe read / write API
>>
>> This PR doesn't look proper to me in two ways.
>> - SparkSession is heavier than SparkContext
>> - According to the following PR description, the background is also
>> hidden in the community.
>>
>>   > # Why are the changes needed?
>>   > In databricks runtime, RDD read / write API has some issue for
>> certain storage types
>>   > that requires the account key, but Dataframe read / write API
>> works.
>>
>> In addition, we don't know if this PR fixes the mentioned unknown
>> storage's issue or not because it's not testable in the community test
>> coverage.
>>
>> I'm wondering if the Apache Spark community aims to move away from
>> the RDD usage in favor of `Spark Connect`. Isn't it too early because
>> `Spark Connect` is not even GA in the community?
>>
>> Dongjoon.
>>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Hyukjin Kwon
We actually get the active Spark session so it doesn't cause overhead. Also
even we create, it will create once which should be pretty trivial overhead.

I don't think we can deprecate RDD API IMHO in any event.
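
A minimal sketch of the reuse behaviour described above (illustrative only, not code from the PR; it assumes a Spark application is already running):

    import org.apache.spark.sql.SparkSession

    // getOrCreate() hands back the session that is already active for the
    // current thread (or the default session) instead of building a new one,
    // so repeated calls are cheap.
    val spark1 = SparkSession.builder().getOrCreate()
    val spark2 = SparkSession.builder().getOrCreate()
    assert(spark1 eq spark2) // the same instance is reused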

On Sat, Jul 13, 2024 at 1:30 PM Martin Grund 
wrote:

> Mridul, I really just wanted to understand the concern from Dongjoon. What
> you're pointing at is a slightly different concern. So what I see is the
> following:
>
> > [...] they can initialize a SparkContext and work with RDD api:
>
> The current PR uses a potentially optional value without checking that it
> is set. (Which is what would happen if you just have a SparkContext and no
> SparkSession).
>
> I understand that this can happen when someone creates a Spark job and
> uses no other Spark APIs to begin with. But in the context of using the
> current Spark ML implementation, is it actually possible to end up in this
> situation? I'm really just trying to understand the system's invariants.
>
> > [...] SparkSession is heavier than SparkContext
>
> Assuming that, for whatever reason, a SparkSession was created. Is there a
> downside to using it?
>
> Please see my questions as independent of the RDD API discussion itself,
> and I don't think this PR was even meant to be put in the context of any
> Spark Connect work.
>
> On Fri, Jul 12, 2024 at 11:58 PM Mridul Muralidharan 
> wrote:
>
>>
>> It is not necessary for users to create a SparkSession Martin - they can
>> initialize a SparkContext and work with RDD api: which would be what
>> Dongjoon is referring to IMO.
>>
>> Even after Spark Connect GA, I am not in favor of deprecating RDD Api at
>> least until we have parity between both (which we don’t have today), and we
>> have vetted this parity over the course of a few minor releases.
>>
>>
>> Regards,
>> Mridul
>>
>>
>>
>> On Fri, Jul 12, 2024 at 4:19 PM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, All.
>>>
>>> Apache Spark's RDD API plays an essential and invaluable role from the
>>> beginning and it will be even if it's not supported by Spark Connect.
>>>
>>> I have a concern about a recent activity which replaces RDD with
>>> SparkSession blindly.
>>>
>>> For instance,
>>>
>>> https://github.com/apache/spark/pull/47328
>>> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
>>> Dataframe read / write API
>>>
>>> This PR doesn't look proper to me in two ways.
>>> - SparkSession is heavier than SparkContext
>>> - According to the following PR description, the background is also
>>> hidden in the community.
>>>
>>>   > # Why are the changes needed?
>>>   > In databricks runtime, RDD read / write API has some issue for
>>> certain storage types
>>>   > that requires the account key, but Dataframe read / write API works.
>>>
>>> In addition, we don't know if this PR fixes the mentioned unknown
>>> storage's issue or not because it's not testable in the community test
>>> coverage.
>>>
>>> I'm wondering if the Apache Spark community aims to move away from the
>>> RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark
>>> Connect` is not even GA in the community?
>>>
>>>
>>> Dongjoon.
>>>
>>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Martin Grund
Mridul, I really just wanted to understand the concern from Dongjoon. What
you're pointing at is a slightly different concern. So what I see is the
following:

> [...] they can initialize a SparkContext and work with RDD api:

The current PR uses a potentially optional value without checking that it
is set. (Which is what would happen if you just have a SparkContext and no
SparkSession).

I understand that this can happen when someone creates a Spark job and uses
no other Spark APIs to begin with. But in the context of using the
current Spark ML implementation, is it actually possible to end up in this
situation? I'm really just trying to understand the system's invariants.

> [...] SparkSession is heavier than SparkContext

Assuming that, for whatever reason, a SparkSession was created. Is there a
downside to using it?

Please see my questions as independent of the RDD API discussion itself,
and I don't think this PR was even meant to be put in the context of any
Spark Connect work.
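
To make the invariant question concrete, a small sketch of the API semantics involved (an illustration only, not the PR's actual code): SparkSession.getActiveSession returns an Option, so it is empty in a job that only ever created a SparkContext unless the caller falls back to building a session.

    import org.apache.spark.sql.SparkSession

    // None when no SparkSession was ever created, even if a SparkContext exists.
    val active: Option[SparkSession] = SparkSession.getActiveSession

    // Defensive pattern: reuse the active session if present, otherwise build
    // one on top of the existing SparkContext.
    val spark: SparkSession = active.getOrElse(SparkSession.builder().getOrCreate())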

On Fri, Jul 12, 2024 at 11:58 PM Mridul Muralidharan 
wrote:

>
> It is not necessary for users to create a SparkSession Martin - they can
> initialize a SparkContext and work with RDD api: which would be what
> Dongjoon is referring to IMO.
>
> Even after Spark Connect GA, I am not in favor of deprecating RDD Api at
> least until we have parity between both (which we don’t have today), and we
> have vetted this parity over the course of a few minor releases.
>
>
> Regards,
> Mridul
>
>
>
> On Fri, Jul 12, 2024 at 4:19 PM Dongjoon Hyun 
> wrote:
>
>> Hi, All.
>>
>> Apache Spark's RDD API plays an essential and invaluable role from the
>> beginning and it will be even if it's not supported by Spark Connect.
>>
>> I have a concern about a recent activity which replaces RDD with
>> SparkSession blindly.
>>
>> For instance,
>>
>> https://github.com/apache/spark/pull/47328
>> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
>> Dataframe read / write API
>>
>> This PR doesn't look proper to me in two ways.
>> - SparkSession is heavier than SparkContext
>> - According to the following PR description, the background is also
>> hidden in the community.
>>
>>   > # Why are the changes needed?
>>   > In databricks runtime, RDD read / write API has some issue for
>> certain storage types
>>   > that requires the account key, but Dataframe read / write API works.
>>
>> In addition, we don't know if this PR fixes the mentioned unknown
>> storage's issue or not because it's not testable in the community test
>> coverage.
>>
>> I'm wondering if the Apache Spark community aims to move away from the
>> RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark
>> Connect` is not even GA in the community?
>>
>>
>> Dongjoon.
>>
>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Hyukjin Kwon
I made a followup (https://github.com/apache/spark/pull/47341) to address
my own concerns. Please let me know if there are additional concerns. We
could further discuss it there.
I am also fine with reverting it and starting it from scratch if that's
preferred.


On Sat, 13 Jul 2024 at 11:52, Ruifeng Zheng  wrote:

> My bad, as the reviewer, I should review the PR description more closely.
>
> I think it is a good change to replace spark context based implementation
> with spark session, and if I recall correctly there were some similar
> attempts in MLLib in the past.
>
>
> On Sat, Jul 13, 2024 at 9:42 AM Hyukjin Kwon  wrote:
>
>> I think we should have not mentioned a specific vendor there. The change
>> also shouldn't repartition. We should create a partition 1.
>>
>> But in general leveraging Catalyst optimizer and SQL engine there is a
>> good idea as we can leverage all optimization there. For example, it will
>> use UTF8 encoding instead of a plan string ser/de. We made similar changes
>> in JSON and CSV schema inference (it was an RDD before)
>>
>> On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
>> wrote:
>>
>>> My bad I meant to say I believe the provided justification is
>>> inappropriate.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
>>> wrote:
>>>
 So looking at the PR it does not appear to be removing any RDD APIs but
 the justification provided for changing the ML backend to use the DataFrame
 APIs is indeed concerning.

 This PR appears to have been merged without proper review (or providing
 an opportunity for review).

 I’d like to remind people of the expectations we decided on together —
 https://spark.apache.org/committers.html

 I believe the provided justification for the change and would ask that
 we revert this PR so that a proper discussion can take place.

 “
 In databricks runtime, RDD read / write API has some issue for certain
 storage types that requires the account key, but Dataframe read / write API
 works.
 “

 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>

 On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
  wrote:

> I took a quick look at the PR and would like to understand your
> concern better about:
>
> >  SparkSession is heavier than SparkContext
>
> It looks like the PR is using the active SparkSession, not creating a
> new one etc. I would highly appreciate it if you could help me understand
> this situation better.
>
> Thanks a lot!
>
> On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
> wrote:
>
>> Hi, All.
>>
>> Apache Spark's RDD API plays an essential and invaluable role from
>> the beginning and it will be even if it's not supported by Spark Connect.
>>
>> I have a concern about a recent activity which replaces RDD with
>> SparkSession blindly.
>>
>> For instance,
>>
>> https://github.com/apache/spark/pull/47328
>> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
>> Dataframe read / write API
>>
>> This PR doesn't look proper to me in two ways.
>> - SparkSession is heavier than SparkContext
>> - According to the following PR description, the background is also
>> hidden in the community.
>>
>>   > # Why are the changes needed?
>>   > In databricks runtime, RDD read / write API has some issue for
>> certain storage types
>>   > that requires the account key, but Dataframe read / write API
>> works.
>>
>> In addition, we don't know if this PR fixes the mentioned unknown
>> storage's issue or not because it's not testable in the community test
>> coverage.
>>
>> I'm wondering if the Apache Spark community aims to move away from
>> the RDD usage in favor of `Spark Connect`. Isn't it too early because
>> `Spark Connect` is not even GA in the community?
>>
>> Dongjoon.
>>
>
>
> --
> Ruifeng Zheng
> E-mail: zrfli...@gmail.com
>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Ruifeng Zheng
My bad, as the reviewer, I should review the PR description more closely.

I think it is a good change to replace spark context based implementation
with spark session, and if I recall correctly there were some similar
attempts in MLLib in the past.


On Sat, Jul 13, 2024 at 9:42 AM Hyukjin Kwon  wrote:

> I think we should have not mentioned a specific vendor there. The change
> also shouldn't repartition. We should create a partition 1.
>
> But in general leveraging Catalyst optimizer and SQL engine there is a
> good idea as we can leverage all optimization there. For example, it will
> use UTF8 encoding instead of a plan string ser/de. We made similar changes
> in JSON and CSV schema inference (it was an RDD before)
>
> On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
> wrote:
>
>> My bad I meant to say I believe the provided justification is
>> inappropriate.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
>> wrote:
>>
>>> So looking at the PR it does not appear to be removing any RDD APIs but
>>> the justification provided for changing the ML backend to use the DataFrame
>>> APIs is indeed concerning.
>>>
>>> This PR appears to have been merged without proper review (or providing
>>> an opportunity for review).
>>>
>>> I’d like to remind people of the expectations we decided on together —
>>> https://spark.apache.org/committers.html
>>>
>>> I believe the provided justification for the change and would ask that
>>> we revert this PR so that a proper discussion can take place.
>>>
>>> “
>>> In databricks runtime, RDD read / write API has some issue for certain
>>> storage types that requires the account key, but Dataframe read / write API
>>> works.
>>> “
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>>>
>>> On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
>>>  wrote:
>>>
 I took a quick look at the PR and would like to understand your concern
 better about:

 >  SparkSession is heavier than SparkContext

 It looks like the PR is using the active SparkSession, not creating a
 new one etc. I would highly appreciate it if you could help me understand
 this situation better.

 Thanks a lot!

 On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
 wrote:

> Hi, All.
>
> Apache Spark's RDD API plays an essential and invaluable role from the
> beginning and it will be even if it's not supported by Spark Connect.
>
> I have a concern about a recent activity which replaces RDD with
> SparkSession blindly.
>
> For instance,
>
> https://github.com/apache/spark/pull/47328
> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
> Dataframe read / write API
>
> This PR doesn't look proper to me in two ways.
> - SparkSession is heavier than SparkContext
> - According to the following PR description, the background is also
> hidden in the community.
>
>   > # Why are the changes needed?
>   > In databricks runtime, RDD read / write API has some issue for
> certain storage types
>   > that requires the account key, but Dataframe read / write API
> works.
>
> In addition, we don't know if this PR fixes the mentioned unknown
> storage's issue or not because it's not testable in the community test
> coverage.
>
> I'm wondering if the Apache Spark community aims to move away from the
> RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark
> Connect` is not even GA in the community?
>
> Dongjoon.
>


-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
Even if the change is reasonable (and I can see arguments both ways), it's
important that we follow the process we agreed on. Merging a PR without
discussion* in ~ 2 hours from the initial proposal is not enough time to
reach a lazy consensus. If it was a small bug-fix I could understand but
this was a non-trivial change.


* It was approved by another committer but without any discussion, and the
approver & code author work for the same employer mentioned as the
justification for the change.

On Fri, Jul 12, 2024 at 6:42 PM Hyukjin Kwon  wrote:

> I think we should have not mentioned a specific vendor there. The change
> also shouldn't repartition. We should create a partition 1.
>
> But in general leveraging Catalyst optimizer and SQL engine there is a
> good idea as we can leverage all optimization there. For example, it will
> use UTF8 encoding instead of a plan string ser/de. We made similar changes
> in JSON and CSV schema inference (it was an RDD before)
>
> On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
> wrote:
>
>> My bad I meant to say I believe the provided justification is
>> inappropriate.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
>> wrote:
>>
>>> So looking at the PR it does not appear to be removing any RDD APIs but
>>> the justification provided for changing the ML backend to use the DataFrame
>>> APIs is indeed concerning.
>>>
>>> This PR appears to have been merged without proper review (or providing
>>> an opportunity for review).
>>>
>>> I’d like to remind people of the expectations we decided on together —
>>> https://spark.apache.org/committers.html
>>>
>>> I believe the provided justification for the change and would ask that
>>> we revert this PR so that a proper discussion can take place.
>>>
>>> “
>>> In databricks runtime, RDD read / write API has some issue for certain
>>> storage types that requires the account key, but Dataframe read / write API
>>> works.
>>> “
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>>>
>>> On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
>>>  wrote:
>>>
 I took a quick look at the PR and would like to understand your concern
 better about:

 >  SparkSession is heavier than SparkContext

 It looks like the PR is using the active SparkSession, not creating a
 new one etc. I would highly appreciate it if you could help me understand
 this situation better.

 Thanks a lot!

 On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
 wrote:

> Hi, All.
>
> Apache Spark's RDD API plays an essential and invaluable role from the
> beginning and it will be even if it's not supported by Spark Connect.
>
> I have a concern about a recent activity which replaces RDD with
> SparkSession blindly.
>
> For instance,
>
> https://github.com/apache/spark/pull/47328
> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
> Dataframe read / write API
>
> This PR doesn't look proper to me in two ways.
> - SparkSession is heavier than SparkContext
> - According to the following PR description, the background is also
> hidden in the community.
>
>   > # Why are the changes needed?
>   > In databricks runtime, RDD read / write API has some issue for
> certain storage types
>   > that requires the account key, but Dataframe read / write API
> works.
>
> In addition, we don't know if this PR fixes the mentioned unknown
> storage's issue or not because it's not testable in the community test
> coverage.
>
> I'm wondering if the Apache Spark community aims to move away from the
> RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark
> Connect` is not even GA in the community?
>
> Dongjoon.
>


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Hyukjin Kwon
I think we should have not mentioned a specific vendor there. The change
also shouldn't repartition. We should create a partition 1.

But in general leveraging Catalyst optimizer and SQL engine there is a good
idea as we can leverage all optimization there. For example, it will use
UTF8 encoding instead of a plan string ser/de. We made similar changes in
JSON and CSV schema inference (it was an RDD before)
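
A rough sketch of the kind of substitution being discussed (hypothetical values only: sc, spark, metadataJson and path are assumed to exist, and this is not the code from SPARK-48883 or its follow-up):

    // Older pattern: persist a small metadata string via the RDD API,
    // asking for a single partition so that one output file is produced.
    sc.parallelize(Seq(metadataJson), numSlices = 1).saveAsTextFile(path)

    // DataFrame-based pattern: the write goes through the SQL engine instead.
    // (Per the note above, the data would ideally start out as a single
    // partition rather than being repartitioned; repartition(1) is shown
    // here only for brevity.)
    spark.createDataFrame(Seq(Tuple1(metadataJson))).toDF("value")
      .repartition(1)
      .write.text(path)

    // Reading back, for comparison:
    val fromRdd = sc.textFile(path, minPartitions = 1).first()
    val fromDf  = spark.read.text(path).first().getString(0)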

On Sat, Jul 13, 2024 at 10:33 AM Holden Karau 
wrote:

> My bad I meant to say I believe the provided justification is
> inappropriate.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Fri, Jul 12, 2024 at 5:14 PM Holden Karau 
> wrote:
>
>> So looking at the PR it does not appear to be removing any RDD APIs but
>> the justification provided for changing the ML backend to use the DataFrame
>> APIs is indeed concerning.
>>
>> This PR appears to have been merged without proper review (or providing
>> an opportunity for review).
>>
>> I’d like to remind people of the expectations we decided on together —
>> https://spark.apache.org/committers.html
>>
>> I believe the provided justification for the change and would ask that we
>> revert this PR so that a proper discussion can take place.
>>
>> “
>> In databricks runtime, RDD read / write API has some issue for certain
>> storage types that requires the account key, but Dataframe read / write API
>> works.
>> “
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>
>>
>> On Fri, Jul 12, 2024 at 1:02 PM Martin Grund
>>  wrote:
>>
>>> I took a quick look at the PR and would like to understand your concern
>>> better about:
>>>
>>> >  SparkSession is heavier than SparkContext
>>>
>>> It looks like the PR is using the active SparkSession, not creating a
>>> new one etc. I would highly appreciate it if you could help me understand
>>> this situation better.
>>>
>>> Thanks a lot!
>>>
>>> On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
>>> wrote:
>>>
 Hi, All.

 Apache Spark's RDD API plays an essential and invaluable role from the
 beginning and it will be even if it's not supported by Spark Connect.

 I have a concern about a recent activity which replaces RDD with
 SparkSession blindly.

 For instance,

 https://github.com/apache/spark/pull/47328
 [SPARK-48883][ML][R] Replace RDD read / write API invocation with
 Dataframe read / write API

 This PR doesn't look proper to me in two ways.
 - SparkSession is heavier than SparkContext
 - According to the following PR description, the background is also
 hidden in the community.

   > # Why are the changes needed?
   > In databricks runtime, RDD read / write API has some issue for
 certain storage types
   > that requires the account key, but Dataframe read / write API works.

 In addition, we don't know if this PR fixes the mentioned unknown
 storage's issue or not because it's not testable in the community test
 coverage.

 I'm wondering if the Apache Spark community aims to move away from the
 RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark
 Connect` is not even GA in the community?

 Dongjoon.

>>>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
My bad I meant to say I believe the provided justification is inappropriate.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Fri, Jul 12, 2024 at 5:14 PM Holden Karau  wrote:

> So looking at the PR it does not appear to be removing any RDD APIs but
> the justification provided for changing the ML backend to use the DataFrame
> APIs is indeed concerning.
>
> This PR appears to have been merged without proper review (or providing an
> opportunity for review).
>
> I’d like to remind people of the expectations we decided on together —
> https://spark.apache.org/committers.html
>
> I believe the provided justification for the change and would ask that we
> revert this PR so that a proper discussion can take place.
>
> “
> In databricks runtime, RDD read / write API has some issue for certain
> storage types that requires the account key, but Dataframe read / write API
> works.
> “
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Fri, Jul 12, 2024 at 1:02 PM Martin Grund 
> wrote:
>
>> I took a quick look at the PR and would like to understand your concern
>> better about:
>>
>> >  SparkSession is heavier than SparkContext
>>
>> It looks like the PR is using the active SparkSession, not creating a new
>> one etc. I would highly appreciate it if you could help me understand this
>> situation better.
>>
>> Thanks a lot!
>>
>> On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, All.
>>>
>>> Apache Spark's RDD API plays an essential and invaluable role from the
>>> beginning and it will be even if it's not supported by Spark Connect.
>>>
>>> I have a concern about a recent activity which replaces RDD with
>>> SparkSession blindly.
>>>
>>> For instance,
>>>
>>> https://github.com/apache/spark/pull/47328
>>> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
>>> Dataframe read / write API
>>>
>>> This PR doesn't look proper to me in two ways.
>>> - SparkSession is heavier than SparkContext
>>> - According to the following PR description, the background is also
>>> hidden in the community.
>>>
>>>   > # Why are the changes needed?
>>>   > In databricks runtime, RDD read / write API has some issue for
>>> certain storage types
>>>   > that requires the account key, but Dataframe read / write API works.
>>>
>>> In addition, we don't know if this PR fixes the mentioned unknown
>>> storage's issue or not because it's not testable in the community test
>>> coverage.
>>>
>>> I'm wondering if the Apache Spark community aims to move away from the
>>> RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark
>>> Connect` is not even GA in the community?
>>>
>>> Dongjoon.
>>>
>>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
So looking at the PR it does not appear to be removing any RDD APIs but the
justification provided for changing the ML backend to use the DataFrame
APIs is indeed concerning.

This PR appears to have been merged without proper review (or providing an
opportunity for review).

I’d like to remind people of the expectations we decided on together —
https://spark.apache.org/committers.html

I believe the provided justification for the change and would ask that we
revert this PR so that a proper discussion can take place.

“
In databricks runtime, RDD read / write API has some issue for certain
storage types that requires the account key, but Dataframe read / write API
works.
“

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Fri, Jul 12, 2024 at 1:02 PM Martin Grund 
wrote:

> I took a quick look at the PR and would like to understand your concern
> better about:
>
> >  SparkSession is heavier than SparkContext
>
> It looks like the PR is using the active SparkSession, not creating a new
> one etc. I would highly appreciate it if you could help me understand this
> situation better.
>
> Thanks a lot!
>
> On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
> wrote:
>
>> Hi, All.
>>
>> Apache Spark's RDD API plays an essential and invaluable role from the
>> beginning and it will be even if it's not supported by Spark Connect.
>>
>> I have a concern about a recent activity which replaces RDD with
>> SparkSession blindly.
>>
>> For instance,
>>
>> https://github.com/apache/spark/pull/47328
>> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
>> Dataframe read / write API
>>
>> This PR doesn't look proper to me in two ways.
>> - SparkSession is heavier than SparkContext
>> - According to the following PR description, the background is also
>> hidden in the community.
>>
>>   > # Why are the changes needed?
>>   > In databricks runtime, RDD read / write API has some issue for
>> certain storage types
>>   > that requires the account key, but Dataframe read / write API works.
>>
>> In addition, we don't know if this PR fixes the mentioned unknown
>> storage's issue or not because it's not testable in the community test
>> coverage.
>>
>> I'm wondering if the Apache Spark community aims to move away from the
>> RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark
>> Connect` is not even GA in the community?
>>
>> Dongjoon.
>>
>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Mridul Muralidharan
It is not necessary for users to create a SparkSession Martin - they can
initialize a SparkContext and work with RDD api: which would be what
Dongjoon is referring to IMO.

Even after Spark Connect GA, I am not in favor of deprecating RDD Api at
least until we have parity between both (which we don’t have today), and we
have vetted this parity over the course of a few minor releases.


Regards,
Mridul
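
To make that scenario concrete, a minimal RDD-only application that never creates a SparkSession (the app name and paths are made up for illustration; the master URL is expected to come from spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    object RddOnlyWordCount {
      def main(args: Array[String]): Unit = {
        // Only a SparkContext exists here; SparkSession.getActiveSession would be None.
        val sc = new SparkContext(new SparkConf().setAppName("rdd-only-word-count"))
        val counts = sc.textFile("hdfs:///tmp/input")       // hypothetical input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1L))
          .reduceByKey(_ + _)
        counts.saveAsTextFile("hdfs:///tmp/word-counts")    // hypothetical output path
        sc.stop()
      }
    }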



On Fri, Jul 12, 2024 at 4:19 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Apache Spark's RDD API plays an essential and invaluable role from the
> beginning and it will be even if it's not supported by Spark Connect.
>
> I have a concern about a recent activity which replaces RDD with
> SparkSession blindly.
>
> For instance,
>
> https://github.com/apache/spark/pull/47328
> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
> Dataframe read / write API
>
> This PR doesn't look proper to me in two ways.
> - SparkSession is heavier than SparkContext
> - According to the following PR description, the background is also hidden
> in the community.
>
>   > # Why are the changes needed?
>   > In databricks runtime, RDD read / write API has some issue for certain
> storage types
>   > that requires the account key, but Dataframe read / write API works.
>
> In addition, we don't know if this PR fixes the mentioned unknown
> storage's issue or not because it's not testable in the community test
> coverage.
>
> I'm wondering if the Apache Spark community aims to move away from the RDD
> usage in favor of `Spark Connect`. Isn't it too early because `Spark
> Connect` is not even GA in the community?
>
>
> Dongjoon.
>


Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Martin Grund
I took a quick look at the PR and would like to understand your concern
better about:

>  SparkSession is heavier than SparkContext

It looks like the PR is using the active SparkSession, not creating a new
one etc. I would highly appreciate it if you could help me understand this
situation better.

Thanks a lot!

On Fri, Jul 12, 2024 at 8:52 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Apache Spark's RDD API plays an essential and invaluable role from the
> beginning and it will be even if it's not supported by Spark Connect.
>
> I have a concern about a recent activity which replaces RDD with
> SparkSession blindly.
>
> For instance,
>
> https://github.com/apache/spark/pull/47328
> [SPARK-48883][ML][R] Replace RDD read / write API invocation with
> Dataframe read / write API
>
> This PR doesn't look proper to me in two ways.
> - SparkSession is heavier than SparkContext
> - According to the following PR description, the background is also hidden
> in the community.
>
>   > # Why are the changes needed?
>   > In databricks runtime, RDD read / write API has some issue for certain
> storage types
>   > that requires the account key, but Dataframe read / write API works.
>
> In addition, we don't know if this PR fixes the mentioned unknown
> storage's issue or not because it's not testable in the community test
> coverage.
>
> I'm wondering if the Apache Spark community aims to move away from the RDD
> usage in favor of `Spark Connect`. Isn't it too early because `Spark
> Connect` is not even GA in the community?
>
> Dongjoon.
>


Re: [DISCUSS] Auto scaling support for structured streaming

2024-07-12 Thread Nimrod Ofek
Hi,

Anyone?
Scaling for different loads in a structured streaming app should be a
trivial requirement for users...

Thanks!
Nimrod

On Tue, Jul 9, 2024 at 10:20, Nimrod Ofek wrote:

> PMC members, can someone please push this thing forward?
>
> Thanks!
> Nimrod
>
> On Tue, Jul 9, 2024 at 01:57, Pavan Kotikalapudi <pkotikalap...@twilio.com> wrote:
>
>> Definitely!. We internally use it extensively in all our apps and would
>> love to get community feedback.
>>
>> I think we have enough work done to move this feature forward.
>> We had discussion and vote threads already published in the past, but we
>> need enough backing/votes of the PMC members to take it to completion.
>>
>>
>> cc: @Jungtaek Lim , @Mich Talebzadeh
>>  mentors of this effort.
>>
>> Cheers,
>>
>> Pavan
>>
>>
>>
>>
>> On Mon, Jul 8, 2024 at 10:33 AM Nimrod Ofek 
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks Pavan.
>>>
>>> I think that the change is very important due to the amount of Spark
>>> structured streaming apps running today out there...
>>> IMHO this should be introduced in the upcoming Spark 4.0.0 version as an
>>> experimental feature for evaluation by the community...
>>>
>>> What should be the next steps to make sure the community gets this
>>> important feature, at least to evaluate?
>>> How can the community experiment with it to decide if it's good enough
>>> for production use?
>>>
>>> Thanks!
>>> Nimrod
>>>
>>> On Mon, Jul 8, 2024 at 4:10 PM Pavan Kotikalapudi <
>>> pkotikalap...@twilio.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have taken up the responsibility for the development of that feature
>>>> right now.
>>>>
>>>> Here is the current work https://github.com/apache/spark/pull/42352
>>>>
>>>> last active email thread (maybe you want to reply to this): Re: Vote
>>>> on Dynamic resource allocation for structured streaming [SPARK-24815]
>>>> <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>
>>>>
>>>> doc: Dynamic resource allocation for structured streaming
>>>> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit>.
>>>>
>>>> This still needs to be reviewed/approved by PMC members, so not sure
>>>> about the timeline at this point.
>>>>
>>>> Thanks,
>>>>
>>>> Pavan
>>>>
>>>>
>>>>
>>>> On Thu, Jul 4, 2024 at 10:46 AM Nimrod Ofek 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I remember there was a discussion about better supporting auto scaling
>>>>> for structured streaming.
>>>>> Is there anything happening with that for the upcoming Spark 4.0
>>>>> release?
>>>>> Will there be support for auto scaling (at least on K8s) spark
>>>>> structured streaming apps?
>>>>>
>>>>> Thanks,
>>>>> Nimrod
>>>>>
>>>>


Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-12 Thread Kent Yao
Thank you everyone for the positive feedback.

 A special thanks to Dongjoon for offering to help.

xianjin  wrote on Fri, Jul 12, 2024 at 15:12:
>
> +1.
> Sent from my iPhone
>
> > On Jul 12, 2024, at 3:06 PM, L. C. Hsieh  wrote:
> >
> > +1
> >
> >> On Thu, Jul 11, 2024 at 3:22 PM Zhou Jiang  wrote:
> >>
> >> +1 for releasing 3.5.2, which would also benefit the Spark Operator 
> >> multi-version support.
> >>
> >>> On Thu, Jul 11, 2024 at 7:56 AM Dongjoon Hyun  
> >>> wrote:
> >>>
> >>> Thank you for the head-up and volunteering, Kent.
> >>>
> >>> +1 for 3.5.2 release.
> >>>
> >>> I can help you with the release steps which require Spark PMC permissions.
> >>>
> >>> Please let me know if you have any questions or hit any issues.
> >>>
> >>> Thanks,
> >>> Dongjoon.
> >>>
> >>>
> >>> On Thu, Jul 11, 2024 at 2:04 AM Kent Yao  wrote:
> 
>  Hi dev,
> 
>  It's been approximately 5 months since Feb 23, 2024, when
>  we released version 3.5.1 for branch-3.5. The patchset differing
>  from 3.5.1 has grown significantly, now consisting of over 160
>  commits.
> 
>  The JIRA[2] also indicates that more than 120 resolved tickets are aimed
>  at version 3.5.2, including some blockers and critical issues.
> 
>  What do you think about releasing 3.5.2? I am volunteering to take on
>  the role of
>  release manager for 3.5.2.
> 
> 
>  Bests,
>  Kent Yao
> 
>  [1] https://spark.apache.org/news/spark-3-5-1-released.html
>  [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
> 
>  -
>  To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
> >>
> >>
> >> --
> >> Zhou JIANG
> >>
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-12 Thread xianjin
+1. 
Sent from my iPhone

> On Jul 12, 2024, at 3:06 PM, L. C. Hsieh  wrote:
> 
> +1
> 
>> On Thu, Jul 11, 2024 at 3:22 PM Zhou Jiang  wrote:
>> 
>> +1 for releasing 3.5.2, which would also benefit the Spark Operator 
>> multi-version support.
>> 
>>> On Thu, Jul 11, 2024 at 7:56 AM Dongjoon Hyun  
>>> wrote:
>>> 
>>> Thank you for the head-up and volunteering, Kent.
>>> 
>>> +1 for 3.5.2 release.
>>> 
>>> I can help you with the release steps which require Spark PMC permissions.
>>> 
>>> Please let me know if you have any questions or hit any issues.
>>> 
>>> Thanks,
>>> Dongjoon.
>>> 
>>> 
>>> On Thu, Jul 11, 2024 at 2:04 AM Kent Yao  wrote:
 
 Hi dev,
 
 It's been approximately 5 months since Feb 23, 2024, when
 we released version 3.5.1 for branch-3.5. The patchset differing
 from 3.5.1 has grown significantly, now consisting of over 160
 commits.
 
 The JIRA[2] also indicates that more than 120 resolved tickets are aimed
 at version 3.5.2, including some blockers and critical issues.
 
 What do you think about releasing 3.5.2? I am volunteering to take on
 the role of
 release manager for 3.5.2.
 
 
 Bests,
 Kent Yao
 
 [1] https://spark.apache.org/news/spark-3-5-1-released.html
 [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
 
 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 
>> 
>> 
>> --
>> Zhou JIANG
>> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [External Mail] Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-11 Thread yangjie01
+1

From: Hyukjin Kwon 
Date: Friday, July 12, 2024, 13:09
To: "L. C. Hsieh" 
Cc: Dongjoon Hyun , Kent Yao , Zhou Jiang , dev 
Subject: [External Mail] Re: [DISCUSS] Release Apache Spark 3.5.2

+1

On Fri, Jul 12, 2024 at 11:13 AM L. C. Hsieh <vii...@gmail.com> wrote:
+1

On Thu, Jul 11, 2024 at 3:22 PM Zhou Jiang <zhou.c.ji...@gmail.com> wrote:
>
> +1 for releasing 3.5.2, which would also benefit the Spark Operator 
> multi-version support.
>
> On Thu, Jul 11, 2024 at 7:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Thank you for the head-up and volunteering, Kent.
>>
>> +1 for 3.5.2 release.
>>
>> I can help you with the release steps which require Spark PMC permissions.
>>
>> Please let me know if you have any questions or hit any issues.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Thu, Jul 11, 2024 at 2:04 AM Kent Yao <y...@apache.org> wrote:
>>>
>>> Hi dev,
>>>
>>> It's been approximately 5 months since Feb 23, 2024, when
>>> we released version 3.5.1 for branch-3.5. The patchset differing
>>> from 3.5.1 has grown significantly, now consisting of over 160
>>> commits.
>>>
>>> The JIRA[2] also indicates that more than 120 resolved tickets are aimed
>>> at version 3.5.2, including some blockers and critical issues.
>>>
>>> What do you think about releasing 3.5.2? I am volunteering to take on
>>> the role of
>>> release manager for 3.5.2.
>>>
>>>
>>> Bests,
>>> Kent Yao
>>>
>>> [1] https://spark.apache.org/news/spark-3-5-1-released.html
>>> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>
>
> --
> Zhou JIANG
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-11 Thread Hyukjin Kwon
+1

On Fri, Jul 12, 2024 at 11:13 AM L. C. Hsieh  wrote:

> +1
>
> On Thu, Jul 11, 2024 at 3:22 PM Zhou Jiang  wrote:
> >
> > +1 for releasing 3.5.2, which would also benefit the Spark Operator
> multi-version support.
> >
> > On Thu, Jul 11, 2024 at 7:56 AM Dongjoon Hyun 
> wrote:
> >>
> >> Thank you for the head-up and volunteering, Kent.
> >>
> >> +1 for 3.5.2 release.
> >>
> >> I can help you with the release steps which require Spark PMC
> permissions.
> >>
> >> Please let me know if you have any questions or hit any issues.
> >>
> >> Thanks,
> >> Dongjoon.
> >>
> >>
> >> On Thu, Jul 11, 2024 at 2:04 AM Kent Yao  wrote:
> >>>
> >>> Hi dev,
> >>>
> >>> It's been approximately 5 months since Feb 23, 2024, when
> >>> we released version 3.5.1 for branch-3.5. The patchset differing
> >>> from 3.5.1 has grown significantly, now consisting of over 160
> >>> commits.
> >>>
> >>> The JIRA[2] also indicates that more than 120 resolved tickets are
> aimed
> >>> at version 3.5.2, including some blockers and critical issues.
> >>>
> >>> What do you think about releasing 3.5.2? I am volunteering to take on
> >>> the role of
> >>> release manager for 3.5.2.
> >>>
> >>>
> >>> Bests,
> >>> Kent Yao
> >>>
> >>> [1] https://spark.apache.org/news/spark-3-5-1-released.html
> >>> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
> >>>
> >>> -
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>
> >
> >
> > --
> > Zhou JIANG
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-11 Thread L. C. Hsieh
+1

On Thu, Jul 11, 2024 at 3:22 PM Zhou Jiang  wrote:
>
> +1 for releasing 3.5.2, which would also benefit the Spark Operator 
> multi-version support.
>
> On Thu, Jul 11, 2024 at 7:56 AM Dongjoon Hyun  wrote:
>>
>> Thank you for the head-up and volunteering, Kent.
>>
>> +1 for 3.5.2 release.
>>
>> I can help you with the release steps which require Spark PMC permissions.
>>
>> Please let me know if you have any questions or hit any issues.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Thu, Jul 11, 2024 at 2:04 AM Kent Yao  wrote:
>>>
>>> Hi dev,
>>>
>>> It's been approximately 5 months since Feb 23, 2024, when
>>> we released version 3.5.1 for branch-3.5. The patchset differing
>>> from 3.5.1 has grown significantly, now consisting of over 160
>>> commits.
>>>
>>> The JIRA[2] also indicates that more than 120 resolved tickets are aimed
>>> at version 3.5.2, including some blockers and critical issues.
>>>
>>> What do you think about releasing 3.5.2? I am volunteering to take on
>>> the role of
>>> release manager for 3.5.2.
>>>
>>>
>>> Bests,
>>> Kent Yao
>>>
>>> [1] https://spark.apache.org/news/spark-3-5-1-released.html
>>> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>
>
> --
> Zhou JIANG
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-11 Thread John Zhuge
+1 Thanks!

John Zhuge


On Thu, Jul 11, 2024 at 12:03 PM Mridul Muralidharan 
wrote:

> +1
> Thanks for volunteering !
>
> Regards,
> Mridul
>
> On Thu, Jul 11, 2024 at 4:03 AM Kent Yao  wrote:
>
>> Hi dev,
>>
>> It's been approximately 5 months since Feb 23, 2024, when
>> we released version 3.5.1 for branch-3.5. The patchset differing
>> from 3.5.1 has grown significantly, now consisting of over 160
>> commits.
>>
>> The JIRA[2] also indicates that more than 120 resolved tickets are aimed
>> at version 3.5.2, including some blockers and critical issues.
>>
>> What do you think about releasing 3.5.2? I am volunteering to take on
>> the role of
>> release manager for 3.5.2.
>>
>>
>> Bests,
>> Kent Yao
>>
>> [1] https://spark.apache.org/news/spark-3-5-1-released.html
>> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-11 Thread Zhou Jiang
+1 for releasing 3.5.2, which would also benefit the Spark Operator
multi-version support.

On Thu, Jul 11, 2024 at 7:56 AM Dongjoon Hyun 
wrote:

> Thank you for the head-up and volunteering, Kent.
>
> +1 for 3.5.2 release.
>
> I can help you with the release steps which require Spark PMC permissions.
>
> Please let me know if you have any questions or hit any issues.
>
> Thanks,
> Dongjoon.
>
>
> On Thu, Jul 11, 2024 at 2:04 AM Kent Yao  wrote:
>
>> Hi dev,
>>
>> It's been approximately 5 months since Feb 23, 2024, when
>> we released version 3.5.1 for branch-3.5. The patchset differing
>> from 3.5.1 has grown significantly, now consisting of over 160
>> commits.
>>
>> The JIRA[2] also indicates that more than 120 resolved tickets are aimed
>> at version 3.5.2, including some blockers and critical issues.
>>
>> What do you think about releasing 3.5.2? I am volunteering to take on
>> the role of
>> release manager for 3.5.2.
>>
>>
>> Bests,
>> Kent Yao
>>
>> [1] https://spark.apache.org/news/spark-3-5-1-released.html
>> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>

-- 
*Zhou JIANG*


Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-11 Thread Mridul Muralidharan
+1
Thanks for volunteering !

Regards,
Mridul

On Thu, Jul 11, 2024 at 4:03 AM Kent Yao  wrote:

> Hi dev,
>
> It's been approximately 5 months since Feb 23, 2024, when
> we released version 3.5.1 for branch-3.5. The patchset differing
> from 3.5.1 has grown significantly, now consisting of over 160
> commits.
>
> The JIRA[2] also indicates that more than 120 resolved tickets are aimed
> at version 3.5.2, including some blockers and critical issues.
>
> What do you think about releasing 3.5.2? I am volunteering to take on
> the role of
> release manager for 3.5.2.
>
>
> Bests,
> Kent Yao
>
> [1] https://spark.apache.org/news/spark-3-5-1-released.html
> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Release Apache Spark 3.5.2

2024-07-11 Thread Dongjoon Hyun
Thank you for the head-up and volunteering, Kent.

+1 for 3.5.2 release.

I can help you with the release steps which require Spark PMC permissions.

Please let me know if you have any questions or hit any issues.

Thanks,
Dongjoon.


On Thu, Jul 11, 2024 at 2:04 AM Kent Yao  wrote:

> Hi dev,
>
> It's been approximately 5 months since Feb 23, 2024, when
> we released version 3.5.1 for branch-3.5. The patchset differing
> from 3.5.1 has grown significantly, now consisting of over 160
> commits.
>
> The JIRA[2] also indicates that more than 120 resolved tickets are aimed
> at version 3.5.2, including some blockers and critical issues.
>
> What do you think about releasing 3.5.2? I am volunteering to take on
> the role of
> release manager for 3.5.2.
>
>
> Bests,
> Kent Yao
>
> [1] https://spark.apache.org/news/spark-3-5-1-released.html
> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-09 Thread bo yang
+1

On Tue, Jul 9, 2024 at 12:29 PM Mridul Muralidharan 
wrote:

>
> +1
>
> Regards,
> Mridul
>
>
> On Tue, Jul 9, 2024 at 10:19 AM Xianjin YE  wrote:
>
>> +1
>>
>> > On Jul 9, 2024, at 22:41, L. C. Hsieh  wrote:
>> >
>> > +1
>> >
>> > On Tue, Jul 9, 2024 at 1:13 AM Wenchen Fan  wrote:
>> >>
>> >> +1
>> >>
>> >> On Tue, Jul 9, 2024 at 10:47 AM Reynold Xin
>>  wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Mon, Jul 8, 2024 at 7:44 PM haydn  wrote:
>> 
>>  +1
>> 
>>  On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:
>> >
>> > +1
>> >
>> > On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li 
>> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> Hyukjin Kwon  wrote on Thu, Jul 4, 2024 at 16:54:
>> 
>>  (I will leave this vote open till 10th July, considering that
>> its holiday season in US)
>> 
>>  On Fri, 5 Jul 2024 at 06:12, Martin Grund 
>> wrote:
>> >
>> > +1 (non-binding)
>> >
>> > On Thu, Jul 4, 2024 at 7:15 PM Holden Karau <
>> holden.ka...@gmail.com> wrote:
>> >>
>> >> +1
>> >>
>> >> Although given its a US holiday maybe keep the vote open for
>> an extra day?
>> >>
>> >> Twitter: https://twitter.com/holdenkarau
>> >> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> >>
>> >>
>> >> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee <
>> denny.g@gmail.com> wrote:
>> >>>
>> >>> +1 (non-binding)
>> >>>
>> >>> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon <
>> gurwls...@apache.org> wrote:
>> 
>>  Hi all,
>> 
>>  I’d like to start a vote for allowing GitHub Actions runs
>> for contributors' PRs without approvals in apache/spark-connect-go.
>> 
>>  Please also refer to:
>> 
>>    - Discussion thread:
>> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>>    - JIRA ticket:
>> https://issues.apache.org/jira/browse/INFRA-25936
>> 
>>  Please vote on the SPIP for the next 72 hours:
>> 
>>  [ ] +1: Accept the proposal
>>  [ ] +0
>>  [ ] -1: I don’t think this is a good idea because …
>> 
>>  Thank you!
>> 
>> >>
>> >>
>> >> --
>> >> Takuya UESHIN
>> >>
>> >
>> > -
>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-09 Thread Mridul Muralidharan
+1

Regards,
Mridul


On Tue, Jul 9, 2024 at 10:19 AM Xianjin YE  wrote:

> +1
>
> > On Jul 9, 2024, at 22:41, L. C. Hsieh  wrote:
> >
> > +1
> >
> > On Tue, Jul 9, 2024 at 1:13 AM Wenchen Fan  wrote:
> >>
> >> +1
> >>
> >> On Tue, Jul 9, 2024 at 10:47 AM Reynold Xin 
> wrote:
> >>>
> >>> +1
> >>>
> >>> On Mon, Jul 8, 2024 at 7:44 PM haydn  wrote:
> 
>  +1
> 
>  On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:
> >
> > +1
> >
> > On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN 
> wrote:
> >>
> >> +1
> >>
> >> On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li 
> wrote:
> >>>
> >>> +1
> >>>
> >>> On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:
> 
>  (I will leave this vote open till 10th July, considering that it's
> holiday season in the US)
> 
>  On Fri, 5 Jul 2024 at 06:12, Martin Grund 
> wrote:
> >
> > +1 (non-binding)
> >
> > On Thu, Jul 4, 2024 at 7:15 PM Holden Karau <
> holden.ka...@gmail.com> wrote:
> >>
> >> +1
> >>
> >> Although given it's a US holiday, maybe keep the vote open for an
> extra day?
> >>
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >>
> >>
> >> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee 
> wrote:
> >>>
> >>> +1 (non-binding)
> >>>
> >>> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon <
> gurwls...@apache.org> wrote:
> 
>  Hi all,
> 
>  I’d like to start a vote for allowing GitHub Actions runs for
> contributors' PRs without approvals in apache/spark-connect-go.
> 
>  Please also refer to:
> 
>    - Discussion thread:
> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>    - JIRA ticket:
> https://issues.apache.org/jira/browse/INFRA-25936
> 
>  Please vote on the SPIP for the next 72 hours:
> 
>  [ ] +1: Accept the proposal
>  [ ] +0
>  [ ] -1: I don’t think this is a good idea because …
> 
>  Thank you!
> 
> >>
> >>
> >> --
> >> Takuya UESHIN
> >>
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-09 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Mon, Jul 8, 2024 at 8:04 PM Ruifeng Zheng  wrote:

> +1
>
> On Sat, Jul 6, 2024 at 4:45 AM bo yang  wrote:
>
>> +1 This is a great suggestion, thanks Hyukjin!
>>
>>
>> On Thu, Jul 4, 2024 at 4:11 AM Hyukjin Kwon  wrote:
>>
>>> Alright! Let me start the vote!
>>>
>>> On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh 
>>> wrote:
>>>
 A good point agreed.

 Mich Talebzadeh,
 Technologist | Architect | Data Engineer | Generative AI | FinCrime
 PhD, Imperial College London
 London, United Kingdom

 View my LinkedIn profile

 https://en.everybodywiki.com/Mich_Talebzadeh

 *Disclaimer:* The information provided is correct to the best of my
 knowledge but of course cannot be guaranteed. It is essential to note
 that, as with any advice, quote "one test result is worth one-thousand
 expert opinions" (Werner Von Braun).


 On Thu, 4 Jul 2024 at 06:14, Martin Grund 
 wrote:

> Absolutely, we should do that. I thought the default rule was already
> inclusive, so that once folks have their first contribution it would
> automatically allow kicking off the workflows.
>
> On Thu, Jul 4, 2024 at 04:20 Matthew Powers <
> matthewkevinpow...@gmail.com> wrote:
>
>> Yea, this would be great.
>>
>> spark-connect-go is still experimental, and anything we can do to get
>> it production-grade would be a great step IMO. The Go community is
>> excited to write Spark... with Go!
>>
>> On Wed, Jul 3, 2024 at 8:49 PM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> The Spark Connect Go client repository (
>>> https://github.com/apache/spark-connect-go) requires approval of GitHub
>>> Actions runs for individual commits within contributors' PRs.
>>>
>>> This policy was intentionally applied (
>>> https://issues.apache.org/jira/browse/INFRA-24387), but we can
>>> change this default once we reach a consensus on it.
>>>
>>> I would like to allow GitHub Actions runs for contributors by
>>> default to make development faster. For now, I have been approving
>>> individual commits in their PRs, and this has become an overhead.
>>>
>>> If you have any feedback on this, please let me know.
>>>
>>


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-09 Thread huaxin gao
+1

On Tue, Jul 9, 2024 at 8:20 AM Xianjin YE  wrote:

> +1
>
> > On Jul 9, 2024, at 22:41, L. C. Hsieh  wrote:
> >
> > +1
> >
> > On Tue, Jul 9, 2024 at 1:13 AM Wenchen Fan  wrote:
> >>
> >> +1
> >>
> >> On Tue, Jul 9, 2024 at 10:47 AM Reynold Xin 
> wrote:
> >>>
> >>> +1
> >>>
> >>> On Mon, Jul 8, 2024 at 7:44 PM haydn  wrote:
> 
>  +1
> 
>  On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:
> >
> > +1
> >
> > On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN 
> wrote:
> >>
> >> +1
> >>
> >> On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li 
> wrote:
> >>>
> >>> +1
> >>>
> >>> On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:
> 
>  (I will leave this vote open till 10th July, considering that it's
> holiday season in the US)
> 
>  On Fri, 5 Jul 2024 at 06:12, Martin Grund 
> wrote:
> >
> > +1 (non-binding)
> >
> > On Thu, Jul 4, 2024 at 7:15 PM Holden Karau <
> holden.ka...@gmail.com> wrote:
> >>
> >> +1
> >>
> >> Although given it's a US holiday, maybe keep the vote open for an
> extra day?
> >>
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >>
> >>
> >> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee 
> wrote:
> >>>
> >>> +1 (non-binding)
> >>>
> >>> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon <
> gurwls...@apache.org> wrote:
> 
>  Hi all,
> 
>  I’d like to start a vote for allowing GitHub Actions runs for
> contributors' PRs without approvals in apache/spark-connect-go.
> 
>  Please also refer to:
> 
>    - Discussion thread:
> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>    - JIRA ticket:
> https://issues.apache.org/jira/browse/INFRA-25936
> 
>  Please vote on the SPIP for the next 72 hours:
> 
>  [ ] +1: Accept the proposal
>  [ ] +0
>  [ ] -1: I don’t think this is a good idea because …
> 
>  Thank you!
> 
> >>
> >>
> >> --
> >> Takuya UESHIN
> >>
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-09 Thread Xianjin YE
+1

> On Jul 9, 2024, at 22:41, L. C. Hsieh  wrote:
> 
> +1
> 
> On Tue, Jul 9, 2024 at 1:13 AM Wenchen Fan  wrote:
>> 
>> +1
>> 
>> On Tue, Jul 9, 2024 at 10:47 AM Reynold Xin  
>> wrote:
>>> 
>>> +1
>>> 
>>> On Mon, Jul 8, 2024 at 7:44 PM haydn  wrote:
 
 +1
 
 On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:
> 
> +1
> 
> On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN  wrote:
>> 
>> +1
>> 
>> On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li  
>> wrote:
>>> 
>>> +1
>>> 
>>> On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:
 
 (I will leave this vote open till 10th July, considering that it's
 holiday season in the US)
 
 On Fri, 5 Jul 2024 at 06:12, Martin Grund  
 wrote:
> 
> +1 (non-binding)
> 
> On Thu, Jul 4, 2024 at 7:15 PM Holden Karau  
> wrote:
>> 
>> +1
>> 
>> Although given it's a US holiday, maybe keep the vote open for an
>> extra day?
>> 
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): 
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> 
>> 
>> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee  
>> wrote:
>>> 
>>> +1 (non-binding)
>>> 
>>> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon  
>>> wrote:
 
 Hi all,
 
 I’d like to start a vote for allowing GitHub Actions runs for 
 contributors' PRs without approvals in apache/spark-connect-go.
 
 Please also refer to:
 
   - Discussion thread: 
 https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
   - JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936
 
 Please vote on the SPIP for the next 72 hours:
 
 [ ] +1: Accept the proposal
 [ ] +0
 [ ] -1: I don’t think this is a good idea because …
 
 Thank you!
 
>> 
>> 
>> --
>> Takuya UESHIN
>> 
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-09 Thread L. C. Hsieh
+1

On Tue, Jul 9, 2024 at 1:13 AM Wenchen Fan  wrote:
>
> +1
>
> On Tue, Jul 9, 2024 at 10:47 AM Reynold Xin  
> wrote:
>>
>> +1
>>
>> On Mon, Jul 8, 2024 at 7:44 PM haydn  wrote:
>>>
>>> +1
>>>
>>> On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:

 +1

 On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN  wrote:
>
> +1
>
> On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li  wrote:
>>
>> +1
>>
>> On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:
>>>
>>> (I will leave this vote open till 10th July, considering that it's
>>> holiday season in the US)
>>>
>>> On Fri, 5 Jul 2024 at 06:12, Martin Grund  wrote:

 +1 (non-binding)

 On Thu, Jul 4, 2024 at 7:15 PM Holden Karau  
 wrote:
>
> +1
>
> Although given it's a US holiday, maybe keep the vote open for an extra
> day?
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): 
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee  
> wrote:
>>
>> +1 (non-binding)
>>
>> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon  
>> wrote:
>>>
>>> Hi all,
>>>
>>> I’d like to start a vote for allowing GitHub Actions runs for 
>>> contributors' PRs without approvals in apache/spark-connect-go.
>>>
>>> Please also refer to:
>>>
>>>- Discussion thread: 
>>> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>>>- JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thank you!
>>>
>
>
> --
> Takuya UESHIN
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Auto scaling support for structured streaming

2024-07-09 Thread Nimrod Ofek
PMC members, can someone please push this thing forward?

Thanks!
Nimrod

On Tue, Jul 9, 2024 at 01:57, Pavan Kotikalapudi <
pkotikalap...@twilio.com> wrote:

> Definitely! We use it extensively in all our internal apps and would
> love to get community feedback.
>
> I think we have enough work done to move this feature forward.
> We already published discussion and vote threads in the past, but we
> need enough backing/votes from PMC members to take it to completion.
>
>
> cc: @Jungtaek Lim , @Mich Talebzadeh
>  mentors of this effort.
>
> Cheers,
>
> Pavan
>
>
>
>
> On Mon, Jul 8, 2024 at 10:33 AM Nimrod Ofek  wrote:
>
>> Hi,
>>
>> Thanks Pavan.
>>
>> I think this change is very important given the number of Spark
>> structured streaming apps running out there today...
>> IMHO this should be introduced in the upcoming Spark 4.0.0 version as an
>> experimental feature for evaluation by the community...
>>
>> What should be the next steps to make sure the community gets this
>> important feature, at least to evaluate?
>> How can the community experiment with it to decide if it's good enough
>> for production use?
>>
>> Thanks!
>> Nimrod
>>
>> On Mon, Jul 8, 2024 at 4:10 PM Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
>>> Hi,
>>>
>>> I have taken up the responsibility for the development of that feature
>>> right now.
>>>
>>> Here is the current work https://github.com/apache/spark/pull/42352
>>>
>>>
>>> last active email thread (maybe you want to reply to this): Re: Vote on
>>> Dynamic resource allocation for structured streaming [SPARK-24815]
>>> <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>
>>>
>>> doc: Dynamic resource allocation for structured streaming
>>> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit>
>>> .
>>>
>>> This still needs to be reviewed/approved by PMC members, so not sure
>>> about the timeline at this point.
>>>
>>> Thanks,
>>>
>>> Pavan
>>>
>>>
>>>
>>> On Thu, Jul 4, 2024 at 10:46 AM Nimrod Ofek 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I remember there was a discussion about better supporting auto scaling
>>>> for structured streaming.
>>>> Is there anything happening with that for the upcoming Spark 4.0
>>>> release?
>>>> Will there be support for auto scaling (at least on K8s) for Spark
>>>> structured streaming apps?
>>>>
>>>> Thanks,
>>>> Nimrod
>>>>
>>>
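
For anyone who wants to experiment ahead of the proposal landing, here is a
minimal sketch. It assumes only configuration keys that already exist in
released Spark (the spark.dynamicAllocation.* settings) plus the built-in
"rate" source and "console" sink; the streaming-aware scaling heuristics
themselves live only in the PR and design doc linked above, and the
checkpoint path is a hypothetical placeholder.

    # Minimal sketch: today's (batch-oriented) dynamic allocation applied to a
    # structured streaming job, e.g. on K8s. All config keys below exist in
    # released Spark; nothing here is the SPARK-24815 implementation itself.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("dra-structured-streaming-sketch")
        .config("spark.dynamicAllocation.enabled", "true")
        # K8s has no external shuffle service, so track shuffle files on executors.
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.dynamicAllocation.maxExecutors", "10")
        .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
        .getOrCreate()
    )

    # Toy streaming query to observe executor scaling behaviour under load.
    query = (
        spark.readStream.format("rate")
        .option("rowsPerSecond", 100)
        .load()
        .writeStream.format("console")
        .option("checkpointLocation", "/tmp/dra-sketch-checkpoint")  # hypothetical path
        .start()
    )
    query.awaitTermination()

With stock Spark this scales executors on generic task backlog and idleness
rather than on streaming signals such as batch duration versus trigger
interval, which is the gap the proposal aims to close for micro-batch
workloads.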


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Zhou Jiang
+1 (non-binding)

On Thu, Jul 4, 2024 at 4:13 AM Hyukjin Kwon  wrote:

> Hi all,
>
> I’d like to start a vote for allowing GitHub Actions runs for
> contributors' PRs without approvals in apache/spark-connect-go.
>
> Please also refer to:
>
>- Discussion thread:
> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>- JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thank you!
>
>

-- 
*Zhou JIANG*


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Wenchen Fan
+1

On Tue, Jul 9, 2024 at 10:47 AM Reynold Xin 
wrote:

> +1
>
> On Mon, Jul 8, 2024 at 7:44 PM haydn  wrote:
>
>> +1
>>
>> On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:
>>
>>> +1
>>>
>>> On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN 
>>> wrote:
>>>
 +1

 On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li 
 wrote:

> +1
>
> On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:
>
>> (I will leave this vote open till 10th July, considering that it's
>> holiday season in the US)
>>
>> On Fri, 5 Jul 2024 at 06:12, Martin Grund 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Thu, Jul 4, 2024 at 7:15 PM Holden Karau 
>>> wrote:
>>>
 +1

 Although given it's a US holiday, maybe keep the vote open for an
 extra day?

 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau


 On Thu, Jul 4, 2024 at 7:33 AM Denny Lee 
 wrote:

> +1 (non-binding)
>
> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for allowing GitHub Actions runs for
>> contributors' PRs without approvals in apache/spark-connect-go.
>>
>> Please also refer to:
>>
>>- Discussion thread:
>> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>>- JIRA ticket:
>> https://issues.apache.org/jira/browse/INFRA-25936
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thank you!
>>
>>

 --
 Takuya UESHIN




Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Reynold Xin
+1

On Mon, Jul 8, 2024 at 7:44 PM haydn  wrote:

> +1
>
> On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:
>
>> +1
>>
>> On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN 
>> wrote:
>>
>>> +1
>>>
>>> On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li 
>>> wrote:
>>>
 +1

 On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:

> (I will leave this vote open till 10th July, considering that it's
> holiday season in the US)
>
> On Fri, 5 Jul 2024 at 06:12, Martin Grund 
> wrote:
>
>> +1 (non-binding)
>>
>> On Thu, Jul 4, 2024 at 7:15 PM Holden Karau 
>> wrote:
>>
>>> +1
>>>
>>> Although given it's a US holiday, maybe keep the vote open for an
>>> extra day?
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee 
>>> wrote:
>>>
 +1 (non-binding)

 On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon 
 wrote:

> Hi all,
>
> I’d like to start a vote for allowing GitHub Actions runs for
> contributors' PRs without approvals in apache/spark-connect-go.
>
> Please also refer to:
>
>- Discussion thread:
> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>- JIRA ticket:
> https://issues.apache.org/jira/browse/INFRA-25936
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thank you!
>
>
>>>
>>> --
>>> Takuya UESHIN
>>>
>>>


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread haydn
+1

On Mon, Jul 8, 2024 at 7:41 PM haydn  wrote:

> +1
>
> On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN  wrote:
>
>> +1
>>
>> On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li 
>> wrote:
>>
>>> +1
>>>
>>> On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:
>>>
 (I will leave this vote open till 10th July, considering that it's
 holiday season in the US)

 On Fri, 5 Jul 2024 at 06:12, Martin Grund 
 wrote:

> +1 (non-binding)
>
> On Thu, Jul 4, 2024 at 7:15 PM Holden Karau 
> wrote:
>
>> +1
>>
>> Although given it's a US holiday, maybe keep the vote open for an extra
>> day?
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon 
>>> wrote:
>>>
 Hi all,

 I’d like to start a vote for allowing GitHub Actions runs for
 contributors' PRs without approvals in apache/spark-connect-go.

 Please also refer to:

- Discussion thread:
 https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
- JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936

 Please vote on the SPIP for the next 72 hours:

 [ ] +1: Accept the proposal
 [ ] +0
 [ ] -1: I don’t think this is a good idea because …

 Thank you!


>>
>> --
>> Takuya UESHIN
>>
>>


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Takuya UESHIN
+1

On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li  wrote:

> +1
>
> On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:
>
>> (I will leave this vote open till 10th July, considering that it's holiday
>> season in the US)
>>
>> On Fri, 5 Jul 2024 at 06:12, Martin Grund  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Thu, Jul 4, 2024 at 7:15 PM Holden Karau 
>>> wrote:
>>>
 +1

 Although given it's a US holiday, maybe keep the vote open for an extra
 day?

 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau


 On Thu, Jul 4, 2024 at 7:33 AM Denny Lee  wrote:

> +1 (non-binding)
>
> On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for allowing GitHub Actions runs for
>> contributors' PRs without approvals in apache/spark-connect-go.
>>
>> Please also refer to:
>>
>>- Discussion thread:
>> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>>- JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thank you!
>>
>>

-- 
Takuya UESHIN


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Yuanjian Li
+1

On Thu, Jul 4, 2024 at 4:54 PM Hyukjin Kwon  wrote:

> (I will leave this vote open till 10th July, considering that it's holiday
> season in the US)
>
> On Fri, 5 Jul 2024 at 06:12, Martin Grund  wrote:
>
>> +1 (non-binding)
>>
>> On Thu, Jul 4, 2024 at 7:15 PM Holden Karau 
>> wrote:
>>
>>> +1
>>>
>>> Although given it's a US holiday, maybe keep the vote open for an extra
>>> day?
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Thu, Jul 4, 2024 at 7:33 AM Denny Lee  wrote:
>>>
 +1 (non-binding)

 On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon  wrote:

> Hi all,
>
> I’d like to start a vote for allowing GitHub Actions runs for
> contributors' PRs without approvals in apache/spark-connect-go.
>
> Please also refer to:
>
>- Discussion thread:
> https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
>- JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thank you!
>
>


Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Ruifeng Zheng
+1

On Sat, Jul 6, 2024 at 4:45 AM bo yang  wrote:

> +1 This is a great suggestion, thanks Hyukjin!
>
>
> On Thu, Jul 4, 2024 at 4:11 AM Hyukjin Kwon  wrote:
>
>> Alright! Let me start the vote!
>>
>> On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh 
>> wrote:
>>
>>> A good point agreed.
>>>
>>> Mich Talebzadeh,
>>> Technologist | Architect | Data Engineer | Generative AI | FinCrime
>>> PhD, Imperial College London
>>> London, United Kingdom
>>>
>>> View my LinkedIn profile
>>>
>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge but of course cannot be guaranteed. It is essential to note
>>> that, as with any advice, quote "one test result is worth one-thousand
>>> expert opinions" (Werner Von Braun).
>>>
>>>
>>> On Thu, 4 Jul 2024 at 06:14, Martin Grund 
>>> wrote:
>>>
 Absolutely, we should do that. I thought the default rule was already
 inclusive, so that once folks have their first contribution it would
 automatically allow kicking off the workflows.

 On Thu, Jul 4, 2024 at 04:20 Matthew Powers <
 matthewkevinpow...@gmail.com> wrote:

> Yea, this would be great.
>
> spark-connect-go is still experimental, and anything we can do to get
> it production-grade would be a great step IMO. The Go community is
> excited to write Spark... with Go!
>
> On Wed, Jul 3, 2024 at 8:49 PM Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> The Spark Connect Go client repository (
>> https://github.com/apache/spark-connect-go) requires approval of GitHub
>> Actions runs for individual commits within contributors' PRs.
>>
>> This policy was intentionally applied (
>> https://issues.apache.org/jira/browse/INFRA-24387), but we can
>> change this default once we reach a consensus on it.
>>
>> I would like to allow GitHub Actions runs for contributors by default
>> to make development faster. For now, I have been approving individual
>> commits in their PRs, and this has become an overhead.
>>
>> If you have any feedback on this, please let me know.
>>
>


Re: [DISCUSS] Auto scaling support for structured streaming

2024-07-08 Thread Pavan Kotikalapudi
Definitely! We use it extensively in all our internal apps and would
love to get community feedback.

I think we have enough work done to move this feature forward.
We already published discussion and vote threads in the past, but we
need enough backing/votes from PMC members to take it to completion.


cc: @Jungtaek Lim , @Mich Talebzadeh
 mentors of this effort.

Cheers,

Pavan




On Mon, Jul 8, 2024 at 10:33 AM Nimrod Ofek  wrote:

> Hi,
>
> Thanks Pavan.
>
> I think this change is very important given the number of Spark
> structured streaming apps running out there today...
> IMHO this should be introduced in the upcoming Spark 4.0.0 version as an
> experimental feature for evaluation by the community...
>
> What should be the next steps to make sure the community gets this
> important feature, at least to evaluate?
> How can the community experiment with it to decide if it's good enough for
> production use?
>
> Thanks!
> Nimrod
>
> On Mon, Jul 8, 2024 at 4:10 PM Pavan Kotikalapudi <
> pkotikalap...@twilio.com> wrote:
>
>> Hi,
>>
>> I have taken up the responsibility for the development of that feature
>> right now.
>>
>> Here is the current work https://github.com/apache/spark/pull/42352
>>
>>
>> last active email thread (maybe you want to reply to this): Re: Vote on
>> Dynamic resource allocation for structured streaming [SPARK-24815]
>> <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>
>>
>> doc: Dynamic resource allocation for structured streaming
>> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit>
>> .
>>
>> This still needs to be reviewed/approved by PMC members, so not sure
>> about the timeline at this point.
>>
>> Thanks,
>>
>> Pavan
>>
>>
>>
>> On Thu, Jul 4, 2024 at 10:46 AM Nimrod Ofek 
>> wrote:
>>
>>> Hi,
>>>
>>> I remember there was a discussion about better supporting auto scaling
>>> for structured streaming.
>>> Is there anything happening with that for the upcoming Spark 4.0
>>> release?
>>> Will there be support for auto scaling (at least on K8s) for Spark
>>> structured streaming apps?
>>>
>>> Thanks,
>>> Nimrod
>>>
>>


Re: [DISCUSS] Auto scaling support for structured streaming

2024-07-08 Thread Nimrod Ofek
Hi,

Thanks Pavan.

I think this change is very important given the number of Spark
structured streaming apps running out there today...
IMHO this should be introduced in the upcoming Spark 4.0.0 version as an
experimental feature for evaluation by the community...

What should be the next steps to make sure the community gets this
important feature, at least to evaluate?
How can the community experiment with it to decide if it's good enough for
production use?

Thanks!
Nimrod

On Mon, Jul 8, 2024 at 4:10 PM Pavan Kotikalapudi 
wrote:

> Hi,
>
> I have taken up the responsibility for the development of that feature
> right now.
>
> Here is the current work https://github.com/apache/spark/pull/42352
>
> last active email thread (maybe you want to reply to this): Re: Vote on
> Dynamic resource allocation for structured streaming [SPARK-24815]
> <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>
>
> doc: Dynamic resource allocation for structured streaming
> <https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit>
> .
>
> This still needs to be reviewed/approved by PMC members, so not sure about
> the timeline at this point.
>
> Thanks,
>
> Pavan
>
>
>
> On Thu, Jul 4, 2024 at 10:46 AM Nimrod Ofek  wrote:
>
>> Hi,
>>
>> I remember there was a discussion about better supporting auto scaling
>> for structured streaming.
>> Is there anything happening with that for the upcoming Spark 4.0 release?
>> Will there be support for auto scaling (at least on K8s) for Spark structured
>> streaming apps?
>>
>> Thanks,
>> Nimrod
>>
>

