Re: [VOTE] Release Spark 3.4.4 (RC1)
+1 *Zhou JIANG* On Mon, Oct 21, 2024 at 11:04 Dongjoon Hyun wrote: > +1 > > Dongjoon Hyun. > > On 2024/10/21 06:58:17 Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > > 3.4.4. > > > > The vote is open until October 25th 1AM (PDT) and passes if a majority +1 > > PMC votes are cast, with a minimum of 3 +1 votes. > > > > [ ] +1 Release this package as Apache Spark 3.4.4 > > [ ] -1 Do not release this package because ... > > > > To learn more about Apache Spark, please see https://spark.apache.org/ > > > > The tag to be voted on is v3.4.4-rc1 (commit > > 6729992c76fc59ab07f63f97a9858691274447d0) > > https://github.com/apache/spark/tree/v3.4.4-rc1 > > > > The release files, including signatures, digests, etc. can be found at: > > https://dist.apache.org/repos/dist/dev/spark/v3.4.4-rc1-bin/ > > > > Signatures used for Spark RCs can be found in this file: > > https://dist.apache.org/repos/dist/dev/spark/KEYS > > > > The staging repository for this release can be found at: > > https://repository.apache.org/content/repositories/orgapachespark-1470/ > > > > The documentation corresponding to this release can be found at: > > https://dist.apache.org/repos/dist/dev/spark/v3.4.4-rc1-docs/ > > > > The list of bug fixes going into 3.4.4 can be found at the following URL: > > https://issues.apache.org/jira/projects/SPARK/versions/12354565 > > > > This release is using the release script of the tag v3.4.4-rc1. > > > > FAQ > > > > How can I help test this release? > > > > If you are a Spark user, you can help us test this release by taking > > an existing Spark workload and running it on this release candidate, then > > reporting any regressions.
> > > > If you're working in PySpark you can set up a virtual env and install > > the current RC and see if anything important breaks. In Java/Scala, > > you can add the staging repository to your project's resolvers and test > > with the RC (make sure to clean up the artifact cache before/after so > > you don't end up building with an out-of-date RC going forward). > > > > What should happen to JIRA tickets still targeting 3.4.4? > > > > The current list of open tickets targeted at 3.4.4 can be found at: > > https://issues.apache.org/jira/projects/SPARK and search for "Target > > Version/s" = 3.4.4 > > > > Committers should look at those and triage. Extremely important bug > > fixes, documentation, and API tweaks that impact compatibility should > > be worked on immediately. Everything else please retarget to an > > appropriate release. > > > > But my bug isn't fixed? > > > > In order to make timely releases, we will typically not hold the > > release unless the bug in question is a regression from the previous > > release. That being said, if there is something which is a regression > > that has not been correctly targeted please ping me or a committer to > > help target the issue. > > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
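The FAQ asks Java/Scala testers to clean the artifact cache before and after testing. One likely reason: staged RC artifacts appear to be published under the final version number (3.4.4 here, not 3.4.4-rc1), so a jar cached from the RC could later be picked up instead of the official release. The sketch below is a minimal helper, assuming the standard Maven local repository layout (`~/.m2/repository/org/apache/spark/<artifactId>/<version>/`); the function name `purge_spark_version` is hypothetical, and sbt/Coursier or Ivy caches are not covered.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def purge_spark_version(
    version: str, repo_root: Optional[Path] = None, dry_run: bool = True
) -> List[Path]:
    """Find, and unless dry_run is set delete, cached org.apache.spark artifacts for `version`.

    Assumes the standard Maven local repository layout:
        <repo_root>/org/apache/spark/<artifactId>/<version>/
    """
    if repo_root is None:
        repo_root = Path.home() / ".m2" / "repository"
    group_dir = Path(repo_root) / "org" / "apache" / "spark"
    if not group_dir.is_dir():
        return []
    # One sub-directory per artifact (e.g. spark-core_2.12); versions sit one level below.
    matches = sorted(p for p in group_dir.glob(f"*/{version}") if p.is_dir())
    if not dry_run:
        for path in matches:
            shutil.rmtree(path)
    return matches
```

Running it first with the default `dry_run=True` lists what would be removed without touching anything; other cached Spark versions are left alone.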
Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)
+1 Sent from my iPhone > On Sep 16, 2024, at 01:04, Dongjoon Hyun wrote: > > > Please vote on releasing the following candidate as Apache Spark version > 4.0.0-preview2. > > The vote is open until September 20th 1AM (PDT) and passes if a majority +1 > PMC votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this package as Apache Spark 4.0.0-preview2 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see https://spark.apache.org/ > > The tag to be voted on is v4.0.0-preview2-rc1 (commit > f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a) > https://github.com/apache/spark/tree/v4.0.0-preview2-rc1 > > The release files, including signatures, digests, etc. can be found at: > https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/ > > Signatures used for Spark RCs can be found in this file: > https://dist.apache.org/repos/dist/dev/spark/KEYS > > The staging repository for this release can be found at: > https://repository.apache.org/content/repositories/orgapachespark-1468/ > > The documentation corresponding to this release can be found at: > https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/ > > The list of bug fixes going into 4.0.0-preview2 can be found at the following URL: > https://issues.apache.org/jira/projects/SPARK/versions/12353359 > > This release is using the release script of the tag v4.0.0-preview2-rc1. > > FAQ > > How can I help test this release? > > If you are a Spark user, you can help us test this release by taking > an existing Spark workload and running it on this release candidate, then > reporting any regressions.
> > If you're working in PySpark you can set up a virtual env and install > the current RC and see if anything important breaks. In Java/Scala, > you can add the staging repository to your project's resolvers and test > with the RC (make sure to clean up the artifact cache before/after so > you don't end up building with an out-of-date RC going forward).
Re: [VOTE] Release Apache Spark 3.5.3 (RC3)
+1 *Zhou JIANG* On Wed, Sep 11, 2024 at 17:29 Ruifeng Zheng wrote: > +1 > > On Thu, Sep 12, 2024 at 2:36 AM L. C. Hsieh wrote: > >> +1 >> >> Thanks. >> >> On Wed, Sep 11, 2024 at 10:41 AM Dongjoon Hyun >> wrote: >> > >> > +1 >> > >> > Dongjoon >> > >> > On 2024/09/11 13:51:23 Herman van Hovell wrote: >> > > +1 >> > > >> > > On Wed, Sep 11, 2024 at 3:30 AM Kent Yao wrote: >> > > >> > > > +1, thank you, Haejoon >> > > > Kent >> > > > >> > > > On 2024/09/11 06:12:19 Gengliang Wang wrote: >> > > > > +1 >> > > > > >> > > > > On Mon, Sep 9, 2024 at 6:01 PM Wenchen Fan >> wrote: >> > > > > >> > > > > > +1 >> > > > > > >> > > > > > On Tue, Sep 10, 2024 at 7:42 AM Rui Wang < >> rui.w...@databricks.com >> > > > .invalid> >> > > > > > wrote: >> > > > > > >> > > > > >> +1 (non-binding) >> > > > > >> >> > > > > >> >> > > > > >> -Rui >> > > > > >> >> > > > > >> On Mon, Sep 9, 2024 at 4:22 PM Hyukjin Kwon < >> gurwls...@apache.org> >> > > > wrote: >> > > > > >> >> > > > > >>> +1 >> > > > > >>> >> > > > > >>> On Tue, Sep 10, 2024 at 5:39 AM Haejoon Lee >> > > > > >>> wrote: >> > > > > >>> >> > > > > >>>> Hi, dev! >> > > > > >>>> >> > > > > >>>> Please vote on releasing the following candidate as Apache >> Spark >> > > > > >>>> version 3.5.3 (RC3). >> > > > > >>>> >> > > > > >>>> The vote is open for next 72 hours, and passes if a majority >> +1 PMC >> > > > > >>>> votes are cast, with a minimum of 3 +1 votes. >> > > > > >>>> >> > > > > >>>> [ ] +1 Release this package as Apache Spark 3.5.3 >> > > > > >>>> [ ] -1 Do not release this package because ... >> > > > > >>>> >> > > > > >>>> To learn more about Apache Spark, please see >> > > > https://spark.apache.org/ >> > > > > >>>> >> > > > > >>>> The tag to be voted on is v3.5.3-rc3 (commit >> > > > > >>>> 32232e9ed33bb16b93ad58cfde8b82e0f07c0970): >> > > > > >>>> https://github.com/apache/spark/tree/v3.5.3-rc3 >> > > > > >>>> >> > > > > >>>> The release files, including signatures, digests, etc. 
can be found at: >> > > > > >>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.3-rc3-bin/ >> > > > > >>>> >> > > > > >>>> Signatures used for Spark RCs can be found in this file: >> > > > > >>>> https://dist.apache.org/repos/dist/dev/spark/KEYS >> > > > > >>>> >> > > > > >>>> The staging repository for this release can be found at: >> > > > > >>>> https://repository.apache.org/content/repositories/orgapachespark-1467/ >> > > > > >>>> >> > > > > >>>> The documentation corresponding to this release can be found at: >> > > > > >>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.3-rc3-docs/ >> > > > > >>>> >> > > > > >>>> The list of bug fixes going into 3.5.3 can be found at the following URL: >> > > > > >>>> https://issues.apache.org/jira/projects/SPARK/versions/12354954 >> > > > > >>>> >> > > > > >>>> FAQ >> > > > > >>>> >> > > > > >>>> How can I help test this release? >> >
Re: [VOTE] Deprecate SparkR
+1 (non-binding) On Wed, Aug 21, 2024 at 7:23 PM Xiao Li wrote: > +1 > > On Wed, Aug 21, 2024 at 16:46, Hyukjin Kwon wrote: > >> +1 >> >> On Thu, 22 Aug 2024 at 05:37, Dongjoon Hyun wrote: >> >>> +1 >>> >>> Dongjoon >>> >>> On 2024/08/21 19:00:46 Holden Karau wrote: >>> > +1 >>> > >>> > Twitter: https://twitter.com/holdenkarau >>> > Books (Learning Spark, High Performance Spark, etc.): >>> > https://amzn.to/2MaRAG9 >>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>> > Pronouns: she/her >>> > >>> > >>> > On Wed, Aug 21, 2024 at 8:59 PM Herman van Hovell >>> > wrote: >>> > >>> > > +1 >>> > > >>> > > On Wed, Aug 21, 2024 at 2:55 PM Martin Grund >>> >>> > > wrote: >>> > > >>> > >> +1 >>> > >> >>> > >> On Wed, Aug 21, 2024 at 20:26 Xiangrui Meng >>> wrote: >>> > >> >>> > >>> +1 >>> > >>> >>> > >>> On Wed, Aug 21, 2024, 10:24 AM Mridul Muralidharan < >>> mri...@gmail.com> >>> > >>> wrote: >>> > >>> >>> > >>>> +1 >>> > >>>> >>> > >>>> >>> > >>>> Regards, >>> > >>>> Mridul >>> > >>>> >>> > >>>> >>> > >>>> On Wed, Aug 21, 2024 at 11:46 AM Reynold Xin >>> > >>>> wrote: >>> > >>>> >>> > >>>>> +1 >>> > >>>>> >>> > >>>>> On Wed, Aug 21, 2024 at 6:42 PM Shivaram Venkataraman < >>> > >>>>> shivaram.venkatara...@gmail.com> wrote: >>> > >>>>> >>> > >>>>>> Hi all >>> > >>>>>> >>> > >>>>>> Based on the previous discussion thread [1], I hereby call a >>> vote to >>> > >>>>>> deprecate the SparkR module in Apache Spark with the upcoming >>> Spark 4 >>> > >>>>>> release and remove it in the next major release Spark 5. >>> > >>>>>> >>> > >>>>>> [ ] +1: Accept the proposal >>> > >>>>>> [ ] +0 >>> > >>>>>> [ ] -1: I don’t think this is a good idea because ..
>>> > >>>>>> >>> > >>>>>> This vote will be open for the next 72 hours >>> > >>>>>> >>> > >>>>>> Thanks >>> > >>>>>> Shivaram >>> > >>>>>> >>> > >>>>>> [1] >>> https://lists.apache.org/thread/qjgsgxklvpvyvbzsx1qr8o533j4zjlm5 >>> > >>>>>> >>> > >>>>> >>> > >>> >>> -- *Zhou JIANG*
Re: [VOTE] Release Spark 3.5.2 (RC5)
+1 (non-binding) - thanks for the new RC! Sent from my iPhone On Aug 9, 2024, at 10:06, Gengliang Wang wrote: +1. Thanks for creating this new RC. I confirmed that SPARK-49054 is fixed. On Fri, Aug 9, 2024 at 6:54 AM Wenchen Fan wrote: +1 On Fri, Aug 9, 2024 at 6:04 PM Peter Toth wrote: +1 On Thu, Aug 8, 2024 at 21:19, huaxin gao wrote: +1 On Thu, Aug 8, 2024 at 11:41 AM L. C. Hsieh wrote: Then, +1 again On Thu, Aug 8, 2024 at 11:38 AM Dongjoon Hyun wrote: > > +1 > > I'm resending my vote. > > Dongjoon. > > On 2024/08/06 16:06:00 Kent Yao wrote: > > Hi dev, > > > > Please vote on releasing the following candidate as Apache Spark version 3.5.2. > > > > The vote is open until Aug 9, 17:00:00 UTC, and passes if a majority +1 > > PMC votes are cast, with a minimum of 3 +1 votes. > > > > [ ] +1 Release this package as Apache Spark 3.5.2 > > [ ] -1 Do not release this package because ... > > > > To learn more about Apache Spark, please see https://spark.apache.org/ > > > > The tag to be voted on is v3.5.2-rc5 (commit > > bb7846dd487f259994fdc69e18e03382e3f64f42): > > https://github.com/apache/spark/tree/v3.5.2-rc5 > > > > The release files, including signatures, digests, etc. can be found at: > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-bin/ > > > > Signatures used for Spark RCs can be found in this file: > > https://dist.apache.org/repos/dist/dev/spark/KEYS > > > > The staging repository for this release can be found at: > > https://repository.apache.org/content/repositories/orgapachespark-1462/ > > > > The documentation corresponding to this release can be found at: > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-docs/ > > > > The list of bug fixes going into 3.5.2 can be found at the following URL: > > https://issues.apache.org/jira/projects/SPARK/versions/12353980 > > > > FAQ > > > > How can I help test this release?
> > > > If you are a Spark user, you can help us test this release by taking > > an existing Spark workload and running it on this release candidate, then > > reporting any regressions. > > > > If you're working in PySpark you can set up a virtual env and install > > the current RC via "pip install > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-bin/pyspark-3.5.2.tar.gz" > > and see if anything important breaks. > > In Java/Scala, you can add the staging repository to your project's > > resolvers and test > > with the RC (make sure to clean up the artifact cache before/after so > > you don't end up building with an out-of-date RC going forward). > > > > What should happen to JIRA tickets still targeting 3.5.2? > > > > The current list of open tickets targeted at 3.5.2 can be found at: > > https://issues.apache.org/jira/projects/SPARK and search for > > "Target Version/s" = 3.5.2 > > > > Committers should look at those and triage. Extremely important bug > > fixes, documentation, and API tweaks that impact compatibility should > > be worked on immediately. Everything else please retarget to an > > appropriate release. > > > > But my bug isn't fixed? > > > > In order to make timely releases, we will typically not hold the > > release unless the bug in question is a regression from the previous > > release. That being said, if there is something which is a regression > > that has not been correctly targeted please ping me or a committer to > > help target the issue. > > > > Thanks, > > Kent Yao
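The `pip install` instruction in this vote email can be wrapped in a small helper that builds a throwaway virtual env for each RC. This is a minimal sketch, not part of the Spark release tooling: `pyspark_rc_url` and `install_rc` are hypothetical names, the URL pattern is taken from the 3.5.2 vote emails (other releases, such as previews, may name the tarball differently), and RC directories on dist.apache.org may be removed after a vote, so old URLs can stop resolving.

```python
import subprocess
import venv
from pathlib import Path

DIST_DEV = "https://dist.apache.org/repos/dist/dev/spark"


def pyspark_rc_url(version: str, rc: int) -> str:
    """PySpark sdist URL for a release candidate.

    Assumes the layout used in the 3.5.2 vote emails:
        <DIST_DEV>/v<version>-rc<N>-bin/pyspark-<version>.tar.gz
    """
    return f"{DIST_DEV}/v{version}-rc{rc}-bin/pyspark-{version}.tar.gz"


def install_rc(env_dir: str, version: str, rc: int) -> Path:
    """Create a fresh virtual env, pip-install the RC into it, and smoke-test the import.

    Needs network access; assumes a POSIX venv layout (bin/python).
    """
    env_path = Path(env_dir)
    # clear=True wipes any previous env, so an older RC cannot linger in it.
    venv.EnvBuilder(with_pip=True, clear=True).create(env_path)
    python = env_path / "bin" / "python"
    subprocess.run(
        [str(python), "-m", "pip", "install", pyspark_rc_url(version, rc)], check=True
    )
    subprocess.run(
        [str(python), "-c", "import pyspark; print(pyspark.__version__)"], check=True
    )
    return python
```

For example, `install_rc("/tmp/spark-3.5.2-rc5", "3.5.2", 5)` returns the env's interpreter, which can then run an existing PySpark workload against the RC. PySpark also needs a Java runtime on the machine.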
Re: [VOTE] Release Spark 3.5.2 (RC4)
+1 (non-binding) *Zhou JIANG* On Mon, Jul 29, 2024 at 11:06 L. C. Hsieh wrote: > +1 > > On Mon, Jul 29, 2024 at 7:33 AM Wenchen Fan wrote: > > > > +1 > > > > On Sat, Jul 27, 2024 at 10:03 AM Dongjoon Hyun wrote: > >> > >> +1 > >> > >> Thank you, Kent. > >> > >> Dongjoon. > >> > >> On Fri, Jul 26, 2024 at 6:37 AM Kent Yao wrote: > >>> > >>> Hi dev, > >>> > >>> Please vote on releasing the following candidate as Apache Spark version 3.5.2. > >>> > >>> The vote is open until Jul 29, 14:00:00 UTC, and passes if a majority +1 > >>> PMC votes are cast, with a minimum of 3 +1 votes. > >>> > >>> [ ] +1 Release this package as Apache Spark 3.5.2 > >>> [ ] -1 Do not release this package because ... > >>> > >>> To learn more about Apache Spark, please see https://spark.apache.org/ > >>> > >>> The tag to be voted on is v3.5.2-rc4 (commit > >>> 1edbddfadeb46581134fa477d35399ddc63b7163): > >>> https://github.com/apache/spark/tree/v3.5.2-rc4 > >>> > >>> The release files, including signatures, digests, etc. can be found at: > >>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-bin/ > >>> > >>> Signatures used for Spark RCs can be found in this file: > >>> https://dist.apache.org/repos/dist/dev/spark/KEYS > >>> > >>> The staging repository for this release can be found at: > >>> https://repository.apache.org/content/repositories/orgapachespark-1460/ > >>> > >>> The documentation corresponding to this release can be found at: > >>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-docs/ > >>> > >>> The list of bug fixes going into 3.5.2 can be found at the following URL: > >>> https://issues.apache.org/jira/projects/SPARK/versions/12353980 > >>> > >>> FAQ > >>> > >>> How can I help test this release? > >>> > >>> If you are a Spark user, you can help us test this release by taking > >>> an existing Spark workload and running it on this release candidate, then > >>> reporting any regressions.
> >>> > >>> If you're working in PySpark you can set up a virtual env and install > >>> the current RC via "pip install > >>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc4-bin/pyspark-3.5.2.tar.gz" > >>> and see if anything important breaks. > >>> In Java/Scala, you can add the staging repository to your project's > >>> resolvers and test > >>> with the RC (make sure to clean up the artifact cache before/after so > >>> you don't end up building with an out-of-date RC going forward). > >>> > >>> What should happen to JIRA tickets still targeting 3.5.2? > >>> > >>> The current list of open tickets targeted at 3.5.2 can be found at: > >>> https://issues.apache.org/jira/projects/SPARK and search for > >>> "Target Version/s" = 3.5.2 > >>> > >>> Committers should look at those and triage. Extremely important bug > >>> fixes, documentation, and API tweaks that impact compatibility should > >>> be worked on immediately. Everything else please retarget to an > >>> appropriate release. > >>> > >>> But my bug isn't fixed? > >>> > >>> In order to make timely releases, we will typically not hold the > >>> release unless the bug in question is a regression from the previous > >>> release. That being said, if there is something which is a regression > >>> that has not been correctly targeted please ping me or a committer to > >>> help target the issue. > >>> > >>> Thanks, > >>> Kent Yao
Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)
+1 (non-binding) *Zhou JIANG* On Tue, Jul 23, 2024 at 20:41 L. C. Hsieh wrote: > +1 > > Thanks. > > On Tue, Jul 23, 2024 at 8:35 PM Dongjoon Hyun wrote: > > > > +1 > > > > Dongjoon. > > > > On 2024/07/24 03:28:58 Wenchen Fan wrote: > > > +1 > > > > > > On Wed, Jul 24, 2024 at 10:51 AM Kent Yao wrote: > > > > > > > +1 (non-binding), I have checked: > > > > > > > > - Download links are OK > > > > - Signatures, Checksums, and the KEYS file are OK > > > > - LICENSE and NOTICE are present > > > > - No unexpected binary files in source releases > > > > - Successfully built from source > > > > > > > > Thanks, > > > > Kent Yao > > > > > > > > On 2024/07/23 06:55:28 yangjie01 wrote: > > > > > +1, Thanks Kent Yao ~ > > > > > > > > > > On 2024/7/22 17:01, “Kent Yao” <y...@apache.org> wrote: > > > > > > > > > > Hi dev, > > > > > > > > > > Please vote on releasing the following candidate as Apache Spark version 3.5.2. > > > > > > > > > > The vote is open until Jul 25, 09:00:00 AM UTC, and passes if a majority +1 PMC votes are cast, with > > > > > a minimum of 3 +1 votes. > > > > > > > > > > [ ] +1 Release this package as Apache Spark 3.5.2 > > > > > [ ] -1 Do not release this package because ... > > > > > > > > > > To learn more about Apache Spark, please see https://spark.apache.org/ > > > > > > > > > > The tag to be voted on is v3.5.2-rc2 (commit > > > > > 6d8f511430881fa7a3203405260da174df424103): > > > > > https://github.com/apache/spark/tree/v3.5.2-rc2 > > > > > > > > > > The release files, including signatures, digests, etc.
can be found at: > > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/ > > > > > > > > > > Signatures used for Spark RCs can be found in this file: > > > > > https://dist.apache.org/repos/dist/dev/spark/KEYS > > > > > > > > > > The staging repository for this release can be found at: > > > > > https://repository.apache.org/content/repositories/orgapachespark-1458/ > > > > > > > > > > The documentation corresponding to this release can be found at: > > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-docs/ > > > > > > > > > > The list of bug fixes going into 3.5.2 can be found at the following URL: > > > > > https://issues.apache.org/jira/projects/SPARK/versions/12353980 > > > > > > > > > > FAQ > > > > > > > > > > How can I help test this release? > > > > > > > > > > If you are a Spark user, you can help us test this release by taking > > > > > an existing Spark workload and running it on this release candidate, then > > > > > reporting any regressions. > > > > > > > > > > If you're working in PySpark you can set up a virtual env and install > > > > > the current RC via "pip install > > > > > https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc2-bin/pyspark-3.5.2.tar.gz"
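Kent Yao's checklist above includes verifying signatures and checksums. Signatures are normally checked with GnuPG after importing the KEYS file; the sketch below only covers comparing a downloaded artifact against its published SHA-512 digest. It assumes the `.sha512` file uses either the `shasum` layout (`<hex>  <file>`) or the grouped, upper-case layout printed by `gpg --print-md SHA512` (`<file>: XXXXXXXX XXXXXXXX ...`); the function names are hypothetical.

```python
import hashlib
import string

_HEX = set(string.hexdigits)


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large release tarballs are not loaded into memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha512(sha_file_text: str) -> str:
    """Extract the hex digest from a .sha512 file in either assumed layout."""
    text = sha_file_text.strip()
    head, sep, tail = text.partition(":")
    # gpg --print-md style starts with "<filename>:"; release filenames contain no spaces.
    if sep and not any(c.isspace() for c in head):
        text = tail
    # Keep only pure-hex tokens; this drops the trailing filename in shasum style.
    tokens = [tok for tok in text.split() if set(tok) <= _HEX]
    digest = "".join(tokens).lower()
    if len(digest) != 128:
        raise ValueError(f"expected 128 hex characters, got {len(digest)}")
    return digest


def verify_sha512(path: str, sha_file_text: str) -> bool:
    """True if the file's SHA-512 matches the digest in the published .sha512 text."""
    return sha512_of(path) == expected_sha512(sha_file_text)
```

A matching digest only shows the download is intact; the GPG signature check against KEYS is what ties the artifact to a release manager.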
Re: [DISCUSS] Release Apache Spark 3.5.2
+1 for releasing 3.5.2, which would also benefit the Spark Operator's multi-version support. On Thu, Jul 11, 2024 at 7:56 AM Dongjoon Hyun wrote: > Thank you for the heads-up and volunteering, Kent. > > +1 for 3.5.2 release. > > I can help you with the release steps which require Spark PMC permissions. > > Please let me know if you have any questions or hit any issues. > > Thanks, > Dongjoon. > > > On Thu, Jul 11, 2024 at 2:04 AM Kent Yao wrote: > >> Hi dev, >> >> It's been approximately 5 months since Feb 23, 2024, when >> we released version 3.5.1 for branch-3.5. The patchset differing >> from 3.5.1 has grown significantly, now consisting of over 160 >> commits. >> >> The JIRA[2] also indicates that more than 120 resolved tickets are aimed >> at version 3.5.2, including some blockers and critical issues. >> >> What do you think about releasing 3.5.2? I am volunteering to take on >> the role of >> release manager for 3.5.2. >> >> >> Bests, >> Kent Yao >> >> [1] https://spark.apache.org/news/spark-3-5-1-released.html >> [2] https://issues.apache.org/jira/projects/SPARK/versions/12353980 >> >> -- *Zhou JIANG*
Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go
+1 (non-binding) On Thu, Jul 4, 2024 at 4:13 AM Hyukjin Kwon wrote: > Hi all, > > I’d like to start a vote for allowing GitHub Actions runs for > contributors' PRs without approvals in apache/spark-connect-go. > > Please also refer to: > >- Discussion thread: > https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420 >- JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936 > > Please vote on the SPIP for the next 72 hours: > > [ ] +1: Accept the proposal > [ ] +0 > [ ] -1: I don’t think this is a good idea because … > > Thank you! > > -- *Zhou JIANG*
Re: [VOTE] SPIP: Stored Procedures API for Catalogs
+1 (non-binding) On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh wrote: > Hi all, > > I’d like to start a vote for SPIP: Stored Procedures API for Catalogs. > > Please also refer to: > >- Discussion thread: > https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo >- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167 >- SPIP doc: > https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/ > > > Please vote on the SPIP for the next 72 hours: > > [ ] +1: Accept the proposal as an official SPIP > [ ] +0 > [ ] -1: I don’t think this is a good idea because … > > > Thank you! > > Liang-Chi Hsieh > > -- *Zhou JIANG*
Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false
+1 (non-binding) On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun wrote: > I'll start with my +1. > > Dongjoon. > > On 2024/04/26 16:45:51 Dongjoon Hyun wrote: > > Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault > > to `false` by default. The technical scope is defined in the following PR. > > > > - DISCUSSION: > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122 > > - PR: https://github.com/apache/spark/pull/46207 > > > > The vote is open until April 30th 1AM (PST) and passes > > if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. > > > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ... > > > > Thank you in advance. > > > > Dongjoon > > -- *Zhou JIANG*
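For context on what this vote changes, here is a deliberately simplified model of how a `CREATE TABLE` statement picks its table provider. It assumes the flag only matters when the statement has neither a `USING` clause nor a Hive clause such as `STORED AS`, and it uses `parquet` as a stand-in for the default of `spark.sql.sources.default`. It is an illustration, not Spark's parser code, and `resolve_provider` is a hypothetical name.

```python
from typing import Optional


def resolve_provider(
    using: Optional[str],
    stored_as: Optional[str],
    create_hive_table_by_default: bool,
    default_source: str = "parquet",
) -> str:
    """Simplified model of which provider a CREATE TABLE ends up with.

    `using` / `stored_as` stand for the optional USING and STORED AS clauses,
    `create_hive_table_by_default` for spark.sql.legacy.createHiveTableByDefault,
    and `default_source` for spark.sql.sources.default.
    """
    if using is not None:
        return using.lower()  # explicit data source, e.g. USING orc
    if stored_as is not None:
        return "hive"  # Hive-specific clauses always mean a Hive table
    # Neither clause given: the only case this vote changes.
    return "hive" if create_hive_table_by_default else default_source
```

Under this model, a job that relies on the old default can keep creating Hive tables either by adding an explicit `STORED AS` clause or by setting `spark.sql.legacy.createHiveTableByDefault` back to `true`.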
Ready for Review: spark-kubernetes-operator Alpha Release
Hi dev members,

I am writing to let you know that the first pull request has been opened against the newly established spark-kubernetes-operator repository, as previously discussed within the group. This PR contains the alpha version of the project: https://github.com/apache/spark-kubernetes-operator/pull/2

Here are some key highlights of the PR:

* Introduction of the alpha version of spark-kubernetes-operator
* Start and stop Spark apps with a simple YAML schema
* Deploy and monitor SparkApplications throughout their lifecycle
* Version-agnostic support for Spark 3.2 and above
* Full logging and metrics integration
* Flexible deployments and native integration with Kubernetes tooling

To facilitate the review process, we have provided detailed documentation and comments within the PR. This PR also includes contributions from Qi Tan, Shruti Gumma, Nishchal Venkataramana and Swami Jayaraman, whose efforts have been instrumental in reaching this stage of the project.

We are actively developing and refining the project. This includes extensive testing across diverse workloads and the integration of additional test frameworks to ensure the robustness and reliability of Spark applications.

We are calling for reviews and input on this PR. Please feel free to share any suggestions, concerns, or feedback that could help improve the quality and functionality of the project. We look forward to your feedback.

-- *Zhou JIANG*
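To make "start and stop Spark apps with a simple YAML schema" concrete for reviewers, the sketch below builds a SparkApplication resource as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The `apiVersion`, the `spec` field names, and the jar path are hypothetical placeholders, not the schema defined in PR #2; the only design point it illustrates is keeping Spark settings as ordinary conf properties.

```python
import json
from typing import Dict


def spark_app_manifest(name: str, main_class: str, jar: str, spark_conf: Dict[str, str]) -> dict:
    """Build a hypothetical SparkApplication custom resource as a plain dict.

    Every field name here is an illustrative assumption; the operator's CRD
    defines the real schema.
    """
    return {
        "apiVersion": "spark.apache.org/v1alpha1",  # hypothetical group/version
        "kind": "SparkApplication",
        "metadata": {"name": name},
        "spec": {
            "mainClass": main_class,
            "jars": jar,
            # Spark settings stay ordinary conf properties (string values),
            # so the same keys could also be passed to spark-submit --conf.
            "sparkConf": dict(spark_conf),
        },
    }


manifest = spark_app_manifest(
    "spark-pi",
    "org.apache.spark.examples.SparkPi",
    "local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder path
    {"spark.executor.instances": "2"},
)
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it would create the resource, provided the operator's CRD is installed and the field names match it; deleting the same resource would presumably stop the app.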
Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark
Hi Shiqi, Thanks for cross-posting here, and sorry for the delayed response over the holiday break :) We prefer Java for the operator project because it runs on the JVM and is widely familiar within the Spark community. This choice aims to encourage adoption and ease onboarding for future maintainers. In addition, the Java API client for Kubernetes is a mature and widely used option, relied on by Spark itself and by other operator implementations such as Flink's. For easier onboarding and potential migration, we'll consider compatibility with existing CRD designs: the goal is to stay as compatible as possible while minimizing duplicated effort. I'm enthusiastic about the idea of a lean, version-agnostic submission worker. It aligns with one of the primary goals of the operator design. Let's continue exploring this idea in the design doc. Thanks, Zhou On Wed, Nov 22, 2023 at 3:35 PM Shiqi Sun wrote: > Hi all, > > Sorry for being late to the party. I went through the SPIP doc and I think > this is a great proposal! I left a comment in the SPIP doc a couple days > ago, but I don't see much activity there and no one replied, so I wanted to > cross-post it here to get some feedback. > > I'm Shiqi Sun, and I work for Big Data Platform in Salesforce. My team has > been running the Spark on k8s operator > <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator> (OSS from > Google) in my company to serve Spark users on production for 4+ years, and > we've been actively contributing to the Spark on k8s operator OSS and also, > occasionally, the Spark OSS. According to our experience, Google's Spark > Operator has its own problems, like its close coupling with the spark > version, as well as the JVM overhead during job submission.
However on the > other side, it's been a great component in our team's service in the > company, especially being written in golang, it's really easy to have it > interact with k8s, and also its CRD covers a lot of different use cases, as > it has been built up through time thanks to many users' contribution during > these years. There were also a handful of sessions of Google's Spark > Operator Spark Summit that made it widely adopted. > > For this SPIP, I really love the idea of this proposal for the official > k8s operator of Spark project, as well as the separate layer of the > submission worker and being spark version agnostic. I think we can get the > best of the two: > 1. I would advocate the new project to still use golang for the > implementation, as golang is the go-to cloud native language that works the > best with k8s. > 2. We make sure the functionality of the current Google's spark operator > CRD is preserved in the new official Spark Operator; if we can make it > compatible or even merge the two projects to make it the new official > operator in spark project, it would be the best. > 3. The new Spark Operator should continue being spark agnostic and > continue having this lightweight/separate layer of submission worker. We've > seen scalability issues caused by the heavy JVM during spark-submit in > Google's Spark Operator and we implemented an internal version of fix for > it within our company. > > We can continue the discussion in more detail, but generally I love this > move of the official spark operator, and I really appreciate the effort! In > the SPIP doc. I see my comment has gained several upvotes from someone I > don't know, so I believe there are other spark/spark operator users who > agree with some of my points. Let me know what you all think and let's > continue the discussion, so that we can make this operator a great new > component of the Open Source Spark Project! > > Thanks! > > Shiqi > > On Mon, Nov 13, 2023 at 11:50 PM L. C. 
Hsieh wrote: > >> Thanks for all the support from the community for the SPIP proposal. >> >> Since all questions/discussion are settled down (if I didn't miss any >> major ones), if no more questions or concerns, I'll be the shepherd >> for this SPIP proposal and call for a vote tomorrow. >> >> Thank you all! >> >> On Mon, Nov 13, 2023 at 6:43 PM Zhou Jiang >> wrote: >> > >> > Hi Holden, >> > >> > Thanks a lot for your feedback! >> > Yes, this proposal attempts to integrate existing solutions, especially >> from CRD perspective. The proposed schema retains similarity with current >> designs, while reducing duplicates and maintaining a single source of truth >> from conf properties. It also tends to be close to native integration with >> k8s to minimize schema changes for new features. >> > For dependencies, packing
Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark
Hi Holden, Thanks a lot for your feedback! Yes, this proposal attempts to integrate existing solutions, especially from the CRD perspective. The proposed schema stays similar to current designs while reducing duplication and keeping conf properties as the single source of truth. It also stays close to native Kubernetes integration to minimize schema changes for new features. For dependencies, packaging everything into the container image is the easiest way to get started. It would be straightforward to add --packages and --repositories support for Maven dependencies. It's technically possible to pull dependencies from cloud storage using init containers (if defined by the user), but it could be tricky to design a general solution that supports different cloud providers at the operator layer. One enhancement I can think of is support for profile scripts that enable additional user-defined actions in application containers. The operator does not have to build everything itself for Kubernetes version compatibility. Like Spark, the operator can be built on the Fabric8 client (https://github.com/fabric8io/kubernetes-client) for support across versions, given that it makes resource-management API calls similar to Spark's. For tests, in addition to the Fabric8 mock server, we may also borrow the idea from the Flink operator of starting a minikube cluster for integration tests. This operator is not starting from scratch: it is derived from an internal project that has been running at production scale for a few years. It aims to add a few new features and enhancements, plus some re-architecting, mostly to incorporate lessons learned about CRD and API design. Benchmarking operator performance alone can be nuanced, as results are often tied to the underlying cluster.
There's a testing strategy that Aaruna & I discussed at a previous Data + AI Summit, which involves scheduling wide (a massive number of lightweight applications) and deep (a single application requesting a lot of executors with heavy IO) cases, revealing typical bottlenecks in k8s API server and scheduler performance. Similar tests can be performed for this operator as well. On Sun, Nov 12, 2023 at 4:32 PM Holden Karau wrote: > To be clear: I am generally supportive of the idea (+1) but have some > follow-up questions: > > Have we taken the time to learn from the other operators? Do we have a > compatible CRD/API or not (and if so why?) > The API seems to assume that everything is packaged in the container in > advance, but I imagine that might not be the case for many folks who have > Java or Python packages published to cloud storage and they want to use? > What's our plan for the testing on the potential version explosion (not > tying ourselves to operator version -> spark version makes a lot of sense, > but how do we reasonably assure ourselves that the cross product of > Operator Version, Kube Version, and Spark Version all function)? Do we have > CI resources for this? > Is there a current (non-open source operator) that folks from Apple are > using and planning to open source, or is this a fresh "from the ground up" > operator proposal? > One of the key reasons for this is listed as "An out-of-the-box automation > solution that scales effectively" but I don't see any discussion of the > target scale or plans to achieve it? > > > > On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang wrote: > >> Hi Spark community, >> >> I'm reaching out to initiate a conversation about the possibility of >> developing a Java-based Kubernetes operator for Apache Spark. Following the >> operator pattern ( >> https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark >> users may manage applications and related components seamlessly using >> native tools like kubectl.
The primary goal is to simplify the Spark user >> experience on Kubernetes, minimizing the learning curve and operational >> complexities and therefore enable users to focus on the Spark application >> development. >> >> Although there are several open-source Spark on Kubernetes operators >> available, none of them are officially integrated into the Apache Spark >> project. As a result, these operators may lack active support and >> development for new features. Within this proposal, our aim is to introduce >> a Java-based Spark operator as an integral component of the Apache Spark >> project. This solution has been employed internally at Apple for multiple >> years, operating millions of executors in real production environments. The >> use of Java in this solution is intended to accommodate a wider user and >> contributor audience, especially those who are familiar with Scala. >> >> Ideally, this operator should have its dedicated repository, similar to >> Spark Connect Golang or Spark Docker, al
Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark
Resending with dev cc'd for the record; sorry, forgot to reply all earlier :) For 1: I lean more towards 'official', as this aims to give Spark users a community-recommended way to automate and manage Spark deployments on k8s. From my point of view, it does not make the current or other options non-standard. For 2/3: as the operator starts driver pods in the same way as spark-submit, I would not expect start-up time to be significantly reduced by using the operator. However, there are indeed some optimizations we can make in practice. For example, the operator can let users separate application packaging from Spark: use an init container to load the Spark binary, and apply the application jar / packages on top of that in a different container. The benefit is that the application image or package would be relatively lean and would therefore take less time to upload to the registry or download onto nodes. Spark images could be relatively static (e.g. the official Docker images <https://github.com/apache/spark-docker>) and hence can be cached on nodes. More technical details can be discussed in the upcoming design doc if we agree to proceed with the operator proposal. On Fri, Nov 10, 2023 at 8:11 AM Mich Talebzadeh wrote: > Hi, > > Looks like a good idea but before committing myself, I have a number of > design questions having looked at SPIP itself: > > 1. Will the name "Standard add-on Kubernetes operator to Spark" describe it better? > 2. We are still struggling with improving Spark driver start-up time. What would be the footprint of this add-on on the driver start-up time? > 3. In a commercial world, will there be (?) a static image for this besides the base image that is maintained in the so-called container registry (ECR, GCR etc.)? It takes time to upload these images. Will this be a static image (Dockerfile)? Another alternative would be that this Dockerfile is created by the user through a set of scripts?
> > > These are the things that come into my mind. > > HTH > > > Mich Talebzadeh, > Distinguished Technologist, Solutions Architect & Engineer > London > United Kingdom > > >view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > https://en.everybodywiki.com/Mich_Talebzadeh > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Fri, 10 Nov 2023 at 14:19, Bjørn Jørgensen > wrote: > >> +1 >> >> On Fri, 10 Nov 2023 at 08:39, Nan Zhu wrote: >> >>> just curious what happened on google’s spark operator? >>> >>> On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko wrote: >>> >>>> +1 >>>> >>>> On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue wrote: >>>> >>>>> +1 >>>>> >>>>> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala wrote: >>>>> >>>>>> +1 for creating an official Kubernetes operator for Apache Spark >>>>>> >>>>>> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao >>>>>> wrote: >>>>>> >>>>>>> +1 >>>>>>> >>>>>> >>>>>>> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai wrote: >>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> To be completely transparent, I am employed in the same department >>>>>>>> as Zhou at Apple. >>>>>>>> >>>>>>>> I support this proposal, provided that we witness community >>>>>>>> adoption following the release of the Flink Kubernetes operator, >>>>>>>> streamlining Flink deployment on Kubernetes. >>>>>>>> >>>>>>>> A well-maintained official Spark Kubernetes operator is essential >>>>>>>> for our Spark community as well.
>>>>>>>> >>>>>>>> DB Tsai | https://www.dbtsai.com/
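The init-container split described at the start of this reply (a static, cacheable Spark image plus a lean application image) can be modelled as a minimal Java sketch. It only builds a simplified in-memory description of the pod, not real Kubernetes API objects: the `Container` / `PodLayout` records, the `spark-home` volume name, the `/shared/spark` mount path, and the image names are all hypothetical, and the `/opt/spark` source path assumes the Spark image keeps its distribution there, as the official apache/spark images do with `SPARK_HOME`.

```java
import java.util.List;
import java.util.Map;

public class SplitImageLayout {

    /** Hypothetical, simplified container model (not the Kubernetes API types). */
    record Container(String name, String image, List<String> command,
                     Map<String, String> env, String volume, String mountPath) {}

    /** Hypothetical pod description: init containers run to completion before the main ones. */
    record PodLayout(List<Container> initContainers, List<Container> containers, String sharedVolume) {}

    static final String SHARED_VOLUME = "spark-home";   // hypothetical shared (e.g. emptyDir) volume
    static final String SHARED_PATH = "/shared/spark";  // hypothetical mount path in both containers

    /**
     * sparkImage is the static, node-cacheable Spark distribution; only appImage changes
     * per application release. The init container copies Spark into the shared volume,
     * and the application container points SPARK_HOME at that copy.
     */
    static PodLayout build(String sparkImage, String appImage) {
        Container loader = new Container(
                "load-spark", sparkImage,
                // Assumes the Spark image keeps its distribution under /opt/spark.
                List.of("sh", "-c", "cp -r /opt/spark/. " + SHARED_PATH + "/"),
                Map.of(), SHARED_VOLUME, SHARED_PATH);
        Container app = new Container(
                "app", appImage,
                List.of(),                          // keep the application image's own entrypoint
                Map.of("SPARK_HOME", SHARED_PATH),  // use the Spark copied by the init container
                SHARED_VOLUME, SHARED_PATH);
        return new PodLayout(List.of(loader), List.of(app), SHARED_VOLUME);
    }

    public static void main(String[] args) {
        PodLayout pod = build("apache/spark:3.5.0", "registry.example.com/team/app:1.2.3");
        pod.initContainers().forEach(c -> System.out.println("init: " + c.image() + " " + c.command()));
        pod.containers().forEach(c -> System.out.println("main: " + c.image() + " env=" + c.env()));
    }
}
```

With this split, shipping a new application version only pushes the small application image, while the Spark layer stays cached on nodes; it does not by itself shorten driver start-up, consistent with the answer to questions 2/3.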
Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark
I'd say it's actually the other way round. A user may either 1. use spark-submit, which works with or without the operator, or 2. deploy the operator and create Spark applications with kubectl / clients, so that the operator does the spark-submit for you. We may also continue this discussion in the proposal doc. On Fri, Nov 10, 2023 at 8:57 PM Cheng Pan wrote: > > Not really - this is not designed to be a replacement for the current > approach. > > That's what I assumed too. But my question is, as a user, how to write a > spark-submit command to submit a Spark app to leverage this operator? > > Thanks, > Cheng Pan > > > > On Nov 11, 2023, at 03:21, Zhou Jiang wrote: > > > > Not really - this is not designed to be a replacement for the current > approach. Kubernetes operator fits in the scenario for automation and > application lifecycle management at scale. Users can choose between > spark-submit and operator approach based on their specific needs and > requirements. > > > > On Thu, Nov 9, 2023 at 9:16 PM Cheng Pan wrote: > > Thanks for this impressive proposal, I have a basic question, how does > spark-submit work with this operator? Or it enforces that we must use > `kubectl apply -f spark-job.yaml`(or K8s client in programming way) to > submit Spark app? > > > > Thanks, > > Cheng Pan > > > > > > > On Nov 10, 2023, at 04:05, Zhou Jiang wrote: > > > > > > Hi Spark community, > > > I'm reaching out to initiate a conversation about the possibility of > developing a Java-based Kubernetes operator for Apache Spark. Following the > operator pattern ( > https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark > users may manage applications and related components seamlessly using > native tools like kubectl. The primary goal is to simplify the Spark user > experience on Kubernetes, minimizing the learning curve and operational > complexities and therefore enable users to focus on the Spark application > development.
> > > Although there are several open-source Spark on Kubernetes operators > available, none of them are officially integrated into the Apache Spark > project. As a result, these operators may lack active support and > development for new features. Within this proposal, our aim is to introduce > a Java-based Spark operator as an integral component of the Apache Spark > project. This solution has been employed internally at Apple for multiple > years, operating millions of executors in real production environments. The > use of Java in this solution is intended to accommodate a wider user and > contributor audience, especially those who are familiar with Scala. > > > Ideally, this operator should have its dedicated repository, similar > to Spark Connect Golang or Spark Docker, allowing it to maintain a loose > connection with the Spark release cycle. This model is also followed by the > Apache Flink Kubernetes operator. > > > We believe that this project holds the potential to evolve into a > thriving community project over the long run. A comparison can be drawn > with the Flink Kubernetes Operator: Apple has open-sourced internal Flink > Kubernetes operator, making it a part of the Apache Flink project ( > https://github.com/apache/flink-kubernetes-operator). This move has > gained wide industry adoption and contributions from the community. In a > mere year, the Flink operator has garnered more than 600 stars and has > attracted contributions from over 80 contributors. This showcases the level > of community interest and collaborative momentum that can be achieved in > similar scenarios. > > > More details can be found at SPIP doc : Spark Kubernetes Operator > https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE > > > Thanks,-- > > > Zhou JIANG > > > > > > > > > > > -- > > Zhou JIANG > > > > -- *Zhou JIANG*
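Option 2 in the reply above, where the operator runs spark-submit on the user's behalf, follows the usual operator reconcile pattern: compare the desired applications (custom resources created with kubectl) against what is observed in the cluster, and act only on the difference. The minimal Java sketch below models one reconcile pass with plain collections; the `AppResource` / `Action` types and the `SUBMIT` / `CLEAN_UP` actions are hypothetical, and a real operator would watch the Kubernetes API rather than receive lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class Reconciler {

    /** Hypothetical custom resource: the user's desired application. */
    record AppResource(String name, boolean deleted) {}

    /** Hypothetical actions the operator may take for one application. */
    enum Kind { SUBMIT, CLEAN_UP }

    record Action(Kind kind, String app) {}

    /**
     * One reconcile pass. desired: resources created via kubectl;
     * runningDrivers: apps whose driver pod already exists in the cluster.
     * Acting only on the difference keeps repeated passes idempotent.
     */
    static List<Action> reconcile(List<AppResource> desired, Set<String> runningDrivers) {
        List<Action> actions = new ArrayList<>();
        for (AppResource app : desired) {
            boolean hasDriver = runningDrivers.contains(app.name());
            if (app.deleted() && hasDriver) {
                actions.add(new Action(Kind.CLEAN_UP, app.name()));
            } else if (!app.deleted() && !hasDriver) {
                actions.add(new Action(Kind.SUBMIT, app.name()));  // i.e. the operator runs spark-submit
            }
            // Otherwise the observed state already matches the desired state: do nothing.
        }
        return actions;
    }

    public static void main(String[] args) {
        List<AppResource> desired = List.of(
                new AppResource("etl-daily", false),
                new AppResource("adhoc-report", false),
                new AppResource("old-job", true));
        Set<String> running = Set.of("etl-daily", "old-job");
        // prints [Action[kind=SUBMIT, app=adhoc-report], Action[kind=CLEAN_UP, app=old-job]]
        System.out.println(reconcile(desired, running));
    }
}
```

This sketch ignores application status: a real operator would also record completed runs in the resource status so that a finished application is not resubmitted when its driver pod is gone.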
[DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark
Hi Spark community, I'm reaching out to start a conversation about the possibility of developing a Java-based Kubernetes operator for Apache Spark. Following the operator pattern (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark users may manage applications and related components seamlessly using native tools like kubectl. The primary goal is to simplify the Spark user experience on Kubernetes, minimizing the learning curve and operational complexity and thereby enabling users to focus on Spark application development. Although there are several open-source Spark on Kubernetes operators available, none of them is officially integrated into the Apache Spark project. As a result, these operators may lack active support and development of new features. With this proposal, our aim is to introduce a Java-based Spark operator as an integral component of the Apache Spark project. This solution has been used internally at Apple for multiple years, operating millions of executors in real production environments. The use of Java in this solution is intended to accommodate a wider user and contributor audience, especially those who are familiar with Scala. Ideally, this operator should have its own dedicated repository, similar to Spark Connect Golang or Spark Docker, allowing it to stay loosely coupled to the Spark release cycle. This model is also followed by the Apache Flink Kubernetes operator. We believe that this project has the potential to evolve into a thriving community project over the long run. A comparison can be drawn with the Flink Kubernetes Operator: Apple open-sourced its internal Flink Kubernetes operator, making it a part of the Apache Flink project (https://github.com/apache/flink-kubernetes-operator). This move has gained wide industry adoption and contributions from the community.
In a mere year, the Flink operator has garnered more than 600 stars and attracted over 80 contributors. This showcases the level of community interest and collaborative momentum that can be achieved in similar scenarios. More details can be found in the SPIP doc: Spark Kubernetes Operator https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE Thanks, -- *Zhou JIANG*