Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
> On Thu, May 9, 2024 at 11:06 PM Dongjoon Hyun > wrote: > >> In addition, FYI, I was the latest release manager with Apache Spark >> 3.4.3 (2024-04-15 Vote) >> >> According to my work log, I uploaded the following binaries to SVN from >> EC2 (us-west-2)

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
, 2024 at 7:57 AM Dongjoon Hyun wrote: > Could you file an INFRA JIRA issue with the error message and context > first, Wenchen? > > As you know, if we see something, we had better file a JIRA issue because > it could be not only an Apache Spark project issue but also all ASF p

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>> >>>>>>> >>>>>>> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek >>>>>>> wrote: >>>>>>> >>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Dongjoon Hyun
olden Karau >> *Cc**: *Chao Sun , Xiao Li , >> Tathagata Das , Wenchen Fan < >> cloud0...@gmail.com>, Cheng Pan , Nicholas Chammas < >> nicholas.cham...@gmail.com>, Dongjoon Hyun , >> Cheng Pan , Spark dev list , >> Anish Shrigondekar >> *Subject**: *Re

Re: ASF board report draft for May

2024-05-05 Thread Dongjoon Hyun
+1 for Holden's comment. Yes, it would be great to mention `it` as "soon". (If Wenchen releases it on Monday, we can simply mention the release) In addition, Apache Spark PMC received an official notice from ASF Infra team. https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg >

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Dongjoon Hyun
es targeting the Delta 4.0 release are still incomplete. >>> >>> Thanks! >>> >>> >>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan wrote: >>> >>>> Thank you all for the replies! >>>> >>>> To @Nicholas Chammas : Th

[VOTE][RESULT] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread Dongjoon Hyun
The vote passes with 11 +1s (6 binding +1s) and one -1. Thanks to all who helped with the vote! (* = binding) +1: - Dongjoon Hyun * - Gengliang Wang * - Liang-Chi Hsieh * - Holden Karau * - Zhou Jiang - Cheng Pan - Hyukjin Kwon * - DB Tsai * - Ye Xianjin - XiDuo You - Nimrod Ofek +0: None -1

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Dongjoon Hyun
sand > expert opinions (Werner <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von > Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > > On Mon, 29 Apr 2024 at 17:32, Dongjoon Hyun wrote: > > > It's a surprise to me to see t

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Dongjoon Hyun
> 1) Hive provides a more mature and widely adopted catalog >>>>>>>>>>> solution that integrates well with other components in the Hadoop >>>>>>>>>>> ecosystem, such as HDFS, HBase, and YARN. If you are Hadoop >>

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
I'll start with my +1. Dongjoon. On 2024/04/26 16:45:51 Dongjoon Hyun wrote: > Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault > to `false` by default. The technical scope is defined in the following PR. > > - DISCUSSION: > https://lists.ap

[VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault to `false` by default. The technical scope is defined in the following PR. - DISCUSSION: https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd - JIRA: https://issues.apache.org/jira/browse/SPARK-46122 - PR:
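For readers unfamiliar with the flag under vote, here is a toy model of what it decides. It is a sketch based on Spark's own description of `spark.sql.legacy.createHiveTableByDefault` (a `CREATE TABLE` without `USING` or `STORED AS` becomes a Hive table when the flag is `true`, and otherwise falls back to `spark.sql.sources.default`), not on anything stated in the thread. The function name and parameters are hypothetical and this is not Spark code.

```python
from typing import Optional


def resolve_table_provider(
    create_hive_table_by_default: bool,
    explicit_provider: Optional[str] = None,
    sources_default: str = "parquet",
) -> str:
    """Toy model: which provider a CREATE TABLE statement ends up with.

    create_hive_table_by_default models spark.sql.legacy.createHiveTableByDefault.
    explicit_provider models a USING / STORED AS clause, if the statement has one.
    sources_default models spark.sql.sources.default.
    """
    # An explicit clause always wins; the flag only matters without one.
    if explicit_provider is not None:
        return explicit_provider
    return "hive" if create_hive_table_by_default else sources_default


# Legacy behaviour (flag = true) versus the proposed default (flag = false).
print(resolve_table_provider(True))
print(resolve_table_provider(False))
```

Statements that already name a provider are unaffected in this model, which is why the proposal only changes the outcome of a bare `CREATE TABLE`.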

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
>>>> complex >>>>>>>>>> SQL queries or existing SQL-based workflows, using Hive may be >>>>>>>>>> advantageous. >>>>>>>>>> 3) If you are looking for performance, spark's native catalog >>>>

[FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Dongjoon Hyun
FYI, there is a proposal to drop Python 3.8 because its EOL is October 2024. https://github.com/apache/spark/pull/46228 [SPARK-47993][PYTHON] Drop Python 3.8 Since it's still alive and there will be an overlap between the lifecycle of Python 3.8 and Apache Spark 4.0.0, please give us your

[DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-24 Thread Dongjoon Hyun
Hi, All. It's great to see community activities to polish 4.0.0 more and more. Thank you all. I'd like to bring SPARK-46122 (another SQL topic) to you from the subtasks of SPARK-44111 (Prepare Apache Spark 4.0.0), - https://issues.apache.org/jira/browse/SPARK-46122 Set

[FYI] SPARK-47046: Apache Spark 4.0.0 Dependency Audit and Cleanup

2024-04-21 Thread Dongjoon Hyun
on the above reports or have new ones for Apache Spark 4.0.0. Dongjoon Hyun

Re: [DISCUSS] Un-deprecate Trigger.Once

2024-04-19 Thread Dongjoon Hyun
For that case, I believe it's enough for us to revise the deprecation message only by making sure that Apache Spark will keep it without removal for backward-compatibility purposes only. That's what the users asked, isn't it? > deprecation of Trigger.Once confuses users that the trigger won't

[ANNOUNCE] Apache Spark 3.4.3 released

2024-04-18 Thread Dongjoon Hyun
not have been possible without you. Dongjoon Hyun

[VOTE][RESULT] Release Spark 3.4.3 (RC2)

2024-04-18 Thread Dongjoon Hyun
The vote passes with 10 +1s (8 binding +1s). Thanks to all who helped with the release! (* = binding) +1: - Dongjoon Hyun * - Mridul Muralidharan * - Wenchen Fan * - Liang-Chi Hsieh * - Gengliang Wang * - Hyukjin Kwon * - Bo Yang - DB Tsai * - Kent Yao - Huaxin Gao * +0: None -1: None

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-18 Thread Dongjoon Hyun
> > > > > > +1 > > > > > > On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon > > wrote: > > >> > > >> +1 > > >> > > >> On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: > > >>> > > >>>

[VOTE][RESULT] SPARK-44444: Use ANSI SQL mode by default

2024-04-17 Thread Dongjoon Hyun
The vote passes with 24 +1s (13 binding +1s). Thanks to all who helped with the vote! (* = binding) +1: - Dongjoon Hyun * - Gengliang Wang * - Chao Sun * - Hyukjin Kwon * - Liang-Chi Hsieh * - Holden Karau * - Huaxin Gao * - Denny Lee - Xiao Li * - Mich Talebzadeh - Christiano Anderson - Yang Jie

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-17 Thread Dongjoon Hyun
> Thanks Dongjoon to drive this! > >> > >> > >> -Rui > >> > >> On Mon, Apr 15, 2024 at 10:10 AM Xinrong Meng wrote: > >> > >>> +1 > >>> > >>> Thank you @Dongjoon Hyun ! > >>> > >>> On

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-14 Thread Dongjoon Hyun
I'll start with my +1. - Checked checksum and signature - Checked Scala/Java/R/Python/SQL Document's Spark version - Checked published Maven artifacts - All CIs passed. Thanks, Dongjoon. On 2024/04/15 04:22:26 Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apa

[VOTE] Release Spark 3.4.3 (RC2)

2024-04-14 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.4.3. The vote is open until April 18th 1AM (PDT) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.4.3 [ ] -1 Do not release this package because

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Dongjoon Hyun
I'll start from my +1. Dongjoon. On 2024/04/13 22:22:05 Dongjoon Hyun wrote: > Please vote on SPARK-44444 to use ANSI SQL mode by default. > The technical scope is defined in the following PR which is > one line of code change and one line of migration guide. > > - DISC

[VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Dongjoon Hyun
Please vote on SPARK-44444 to use ANSI SQL mode by default. The technical scope is defined in the following PR which is one line of code change and one line of migration guide. - DISCUSSION: https://lists.apache.org/thread/ztlwoz1v1sn81ssks12tb19x37zozxlz - JIRA:

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Dongjoon Hyun
ror Attribution Framework > > <https://issues.apache.org/jira/browse/SPARK-38615> will also be beneficial > > in migrating to ANSI SQL mode. > > > > > > Gengliang > > > > > > On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun > <mailto:dongjoon.h

Re: [DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Dongjoon Hyun
Thank you for volunteering, Wenchen. Dongjoon. On 2024/04/12 15:11:04 Wenchen Fan wrote: > Hi all, > > It's close to the previously proposed 4.0.0 release date (June 2024), and I > think it's time to prepare for it and discuss the ongoing projects: > >- ANSI by default >- Spark Connect

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread Dongjoon Hyun
+1 Thank you! I hope we can customize `dev/merge_spark_pr.py` script per repository after this PR. Dongjoon. On 2024/04/12 03:28:36 "L. C. Hsieh" wrote: > Hi all, > > Thanks for all discussions in the thread of "Versioning of Spark > Operator":

[DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-11 Thread Dongjoon Hyun
Hi, All. Thanks to you, we've been achieving many things and have on-going SPIPs. I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly by asking your opinions about Apache Spark's ANSI SQL mode. https://issues.apache.org/jira/browse/SPARK-44111 Prepare Apache Spark

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Dongjoon Hyun
ten v1.1.1 support Spark3.2 and > 3.3. > > We are planning to support Spark3.4 and 3.5 in Gluten v1.2.0. > > Spark4.0 support for Gluten is depending on the release schedule in > Spark community. > > > > On 2024/04/09 07:14:13 Dongjoon Hyun wrote: > > > Thank you for s

Re: Versioning of Spark Operator

2024-04-10 Thread Dongjoon Hyun
Ya, that would work. Inevitably, I looked at Apache Flink K8s Operator's JIRA and GitHub repo. It looks reasonable to me. Although they share the same JIRA, they choose different patterns per place. 1. In POM file and Maven Artifact, independent version number. 1.8.0 2. Tag is also based on

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
ctor Go Client? For example, > Spark Operator 3.5.x supports Spark 3.5 and above. > > Best, > Bo > > > On Tue, Apr 9, 2024 at 10:14 AM Dongjoon Hyun wrote: > > > Ya, that's simple and possible. > > > > However, it may cause many confusions because it implie

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
elopers and > > intuitive for users. > > > > Regards, > > Mridul > > > > > > On Tue, Apr 9, 2024 at 10:09 AM Dongjoon Hyun > <mailto:dongj...@apache.org>> wrote: > >> Hi, Liang-Chi. > >> > >> Thank you for leading Ap

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Hi, Liang-Chi. Thank you for leading Apache Spark K8s operator as a shepherd. I took a look at `Apache Spark Connect Go` repo mentioned in the thread. Sadly, there is no release at all and no activity for the last 6 months. It seems to be the first time for Apache Spark community to consider

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Jia. I have the same questions as in Weiting's previous thread. Do you think you can share the future milestone of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Weiting. Do you think you can share the future milestone of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under active development now, and doesn't have a stable release. >

Re: Apache Spark 3.4.3 (?)

2024-04-08 Thread Dongjoon Hyun
Thank you, Holden, Mridul, Kent, Liang-Chi, Mich, Jungtaek. I added `Target Version: 3.4.3` to SPARK-47318 and am going to continue to prepare for RC1 (April 15th). Dongjoon. - To unsubscribe e-mail:

Apache Spark 3.4.3 (?)

2024-04-06 Thread Dongjoon Hyun
Hi, All. Apache Spark 3.4.2 tag was created on Nov 24th and `branch-3.4` has 85 commits including important security and correctness patches like SPARK-45580, SPARK-46092, SPARK-46466, SPARK-46794, and SPARK-46862. https://github.com/apache/spark/releases/tag/v3.4.2 $ git log --oneline

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Dongjoon Hyun
+1 Thank you, Hyukjin. Dongjoon On Sun, Mar 31, 2024 at 19:07 Haejoon Lee wrote: > +1 > > On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark >> Connect) >> >> JIRA

Re: The dedicated repository for Kubernetes Operator for Apache Spark

2024-03-28 Thread Dongjoon Hyun
Thank you, Liang-Chi! Dongjoon. On Wed, Mar 27, 2024 at 10:56 PM L. C. Hsieh wrote: > Hi all, > > For the passed SPIP: An Official Kubernetes Operator for Apache Spark, > the developers have been working on code cleaning and refactoring for > open source in the last few months. They are ready

Re: [DISCUSS] MySQL version support policy

2024-03-25 Thread Dongjoon Hyun
Hi, Cheng. Thank you for the suggestion. Your suggestion seems to have at least two themes. A. Adding a new Apache Spark community policy (contract) to guarantee MySQL LTS Versions Support. B. Dropping the support of non-LTS version support (MySQL 8.3/8.2/8.1) And, it brings me three questions.

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Dongjoon Hyun
Ya, I also have a similar opinion with Mridul. +1 Thank you, Gengliang. Dongjoon. On Mon, Mar 11, 2024 at 1:34 PM Mridul Muralidharan wrote: > > I am supportive of the proposal - this is a step in the right direction ! > Additional metadata (explicit and inferred) for log records, and

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Dongjoon Hyun
master [image: Screenshot 2024-02-29 at 21.12.24.png] Could you do the follow-up, please? Thank you in advance. Dongjoon. On Thu, Feb 29, 2024 at 2:48 PM John Zhuge wrote: > Excellent work, congratulations! > > On Wed, Feb 28, 2024 at 10:12 PM Dongjoon Hyun > wrote: > >> C

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Dongjoon Hyun
> Could you please share a list of fixes as the link provided by you is > not working. > > On Thu, Feb 29, 2024 at 11:27 AM Dongjoon Hyun > wrote: > >> Hi, >> >> If you are observing correctness issues, you may hit some old (and fixed) >> correctness is

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Dongjoon Hyun
Hi, If you are observing correctness issues, you may hit some old (and fixed) correctness issues. For example, from Apache Spark 3.2.1 to 3.2.4, we fixed 31 correctness issues.

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-28 Thread Dongjoon Hyun
Congratulations! Bests, Dongjoon. On Wed, Feb 28, 2024 at 11:43 AM beliefer wrote: > Congratulations! > > > > At 2024-02-28 17:43:25, "Jungtaek Lim" > wrote: > > Hi everyone, > > We are happy to announce the availability of Spark 3.5.1! > > Spark 3.5.1 is a maintenance release containing

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Dongjoon Hyun
Hi, All. Unfortunately, the Apache Spark `3.5.1 RC2` document artifact seems to be generated from unknown source code instead of the correct source code of the tag, `3.5.1`. https://spark.apache.org/docs/3.5.1/ [image: Screenshot 2024-02-23 at 14.13.07.png] Dongjoon. On Wed, Feb 21, 2024 at

Re: ASF board report draft for February

2024-02-18 Thread Dongjoon Hyun
+1, it looks good to me. Thank you, Matei. Dongjoon On Sat, Feb 17, 2024 at 11:21 AM Matei Zaharia wrote: > Hi all, > > I missed some reminder emails about our board report this month, but here > is my draft. I’ll submit it tomorrow if that’s ok. > > == > > Issues for the board: >

Re: Heads-up: Update on Spark 3.5.1 RC

2024-02-13 Thread Dongjoon Hyun
Thank you for the update, Jungtaek. Dongjoon. On Tue, Feb 13, 2024 at 7:29 AM Jungtaek Lim wrote: > Hi, > > Just a head-up since I didn't give an update for a week after the last > update from the discussion thread. > > I've been following the automated release process and encountered several

Re: [DISCUSS] Release Spark 3.5.1?

2024-02-03 Thread Dongjoon Hyun
+1 On Sat, Feb 3, 2024 at 9:18 PM yangjie01 wrote: > +1 > > On 2024/2/4 13:13, "Kent Yao" mailto:y...@apache.org>> wrote: > > > +1 > > > On Sat, Feb 3, 2024 at 21:14, Jungtaek Lim kabhwan.opensou...@gmail.com>> wrote: > > > > Hi dev, > > > > looks like there are a huge number of commits being pushed to branch-3.5

[ANNOUNCE] Apache Spark 3.3.4 released

2023-12-16 Thread Dongjoon Hyun
would not have been possible without you. Dongjoon Hyun

[VOTE][RESULT] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
The vote passes with 6 +1s (3 binding +1s). Thanks to all who helped with the release! (* = binding) +1: - Dongjoon Hyun * - Yuming Wang * - Kent Yao - Liang-Chi Hsieh * - Yang Jie - Malcolm Decuire +0: None -1: None

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
> > > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote: > > > > > > > > +1(non-binding > > > > > > > > Kent Yao > > > > > > > > On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > > > > > > > > >

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Dongjoon Hyun
the above exception, another exception occurred: > > Traceback (most recent call last): > File "", line 1, in > File > "/home/mridul/work/apache/vote/spark/python/pyspark/serializers.py", line > 468, in dumps > raise pickle.PicklingError(msg) >

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread Dongjoon Hyun
+1 Dongjoon On 2023/12/08 21:41:00 Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.3.4. > > The vote is open until December 15th 1AM (PST) and passes if a majority +1 > PMC votes are cast, with a minimum of 3 +1 votes. >

Re: Spark on Yarn with Java 17

2023-12-09 Thread Dongjoon Hyun
d > Java 8 runtime? > > On Fri, Dec 8, 2023 at 4:33 PM Dongjoon Hyun wrote: > >> Hi, Jason. >> >> Apache Spark 4.0.0 depends on only Apache Hadoop client library. >> >> You can track all `Apache Spark 4` activities including Hadoop dependency >

Re: Spark on Yarn with Java 17

2023-12-08 Thread Dongjoon Hyun
Hi, Jason. Apache Spark 4.0.0 depends on only Apache Hadoop client library. You can track all `Apache Spark 4` activities including Hadoop dependency here. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) According to the release history, the original suggested

[VOTE] Release Spark 3.3.4 (RC1)

2023-12-08 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.3.4. The vote is open until December 15th 1AM (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.3.4 [ ] -1 Do not release this package

Re: Apache Spark 3.3.4 EOL Release?

2023-12-08 Thread Dongjoon Hyun
> > > >> > > Thanks Dongjoon! >> > > >> > > On Mon, Dec 4, 2023 at 9:26 AM Yang Jie wrote: >> > > > >> > > > +1 for a 3.3.4 EOL Release. Thanks Dongjoon. >> > > > >> > > > Jie

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Dongjoon Hyun
04 15:08:25 Tom Graves wrote: > > > +1 for a 3.3.4 EOL Release. Thanks Dongjoon. > > > Tom > > > On Friday, December 1, 2023 at 02:48:22 PM CST, Dongjoon Hyun < > dongjoon.h...@gmail.com> wrote: > > > > > > Hi, All. > > > > >

`orc-format` 1.0 (ORC-1531) for Apache ORC 2.0

2023-12-03 Thread Dongjoon Hyun
Hi, All. As one of the key parts of Apache ORC 2.0, we've been discussing a new repository and module, `orc-format`, in the following. https://github.com/apache/orc/issues/1543 Now, we are ready to create a new repo. Please take a look at the POC repo and code and let us know your thoughts.

Apache Spark 3.3.4 EOL Release?

2023-12-01 Thread Dongjoon Hyun
Hi, All. Since the Apache Spark 3.3.0 RC6 vote passed on Jun 14, 2022, branch-3.3 has been maintained and served well until now. - https://github.com/apache/spark/releases/tag/v3.3.0 (tagged on Jun 9th, 2022) - https://lists.apache.org/thread/zg6k1spw6k1c7brgo6t7qldvsqbmfytm (vote result on June

[FYI] SPARK-45981: Improve Python language test coverage

2023-12-01 Thread Dongjoon Hyun
Hi, All. As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community starts to have test coverage for all supported Python versions from Today. - https://github.com/apache/spark/actions/runs/7061665420 Here is a summary. 1. Main CI: All PRs and commits on `master` branch are

[ANNOUNCE] Apache Spark 3.4.2 released

2023-11-30 Thread Dongjoon Hyun
not have been possible without you. Dongjoon Hyun

[VOTE][RESULT] Release Spark 3.4.2 (RC1)

2023-11-30 Thread Dongjoon Hyun
The vote passes with 6 +1s (3 binding +1s) and one non-binding -1. Thanks to all who helped with the release! (* = binding) +1: - Dongjoon Hyun * - Kent Yao - Yang Jie - Mridul Muralidharan * - Liang-Chi Hsieh * - Jia Fan +0: None -1: - Marc Le Bihan
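The tally above can be reproduced with a short sketch. It assumes one reading of the rule quoted in the vote announcements ("passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes"): at least three binding +1 votes, and more binding +1 than binding -1. The `Vote` type and `tally` function are hypothetical names for illustration. The voter data is copied from the 3.4.2 RC1 result above.

```python
from typing import Dict, List, NamedTuple


class Vote(NamedTuple):
    voter: str
    value: int     # +1, 0, or -1
    binding: bool  # True when cast by a PMC member (marked * in the result)


def tally(votes: List[Vote]) -> Dict[str, object]:
    """Count votes and apply the assumed pass rule (see the lead-in)."""
    plus = [v for v in votes if v.value > 0]
    minus = [v for v in votes if v.value < 0]
    binding_plus = sum(1 for v in plus if v.binding)
    binding_minus = sum(1 for v in minus if v.binding)
    return {
        "plus": len(plus),
        "binding_plus": binding_plus,
        "minus": len(minus),
        "binding_minus": binding_minus,
        # Assumption: only binding votes decide the outcome.
        "passed": binding_plus >= 3 and binding_plus > binding_minus,
    }


SPARK_3_4_2_RC1 = [
    Vote("Dongjoon Hyun", +1, True),
    Vote("Kent Yao", +1, False),
    Vote("Yang Jie", +1, False),
    Vote("Mridul Muralidharan", +1, True),
    Vote("Liang-Chi Hsieh", +1, True),
    Vote("Jia Fan", +1, False),
    Vote("Marc Le Bihan", -1, False),
]

result = tally(SPARK_3_4_2_RC1)
print(result)
```

Under this reading the non-binding -1 is counted and reported but does not block the release, which matches the announced outcome of 6 +1s (3 binding) and one non-binding -1.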

Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-30 Thread Dongjoon Hyun
On Wed, Nov 29, 2023 at 5:08 AM Yang Jie wrote: > > >> > > >> +1(non-binding) > > >> > > >> Jie Yang > > >> > > >> On 2023/11/29 02:08:04 Kent Yao wrote: > > >> > +1(non-binding) > > >> >

Re: Remove HiveContext from Apache Spark 4.0

2023-11-29 Thread Dongjoon Hyun
Thank you for the heads-up. I agree with your intention and the fact that it's not useful in Apache Spark 4.0.0. However, as you know, historically, it was removed once and explicitly added back to the Apache Spark 3.0 via the vote. SPARK-31088 Add back HiveContext and createExternalTable (As a

Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-26 Thread Dongjoon Hyun
coder.scala:62) > at org.apache.spark.sql.Encoders$.bean(Encoders.scala:179) > at org.apache.spark.sql.Encoders.bean(Encoders.scala) > > > https://issues.apache.org/jira/browse/SPARK-45311 > > Thanks ! > > Marc Le Bihan > > > On 25/11/2023 11:48, Dongjoon Hyun wrote: >

Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-25 Thread Dongjoon Hyun
+1 Dongjoon. On 2023/11/25 10:48:41 Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.4.2. > > The vote is open until November 30th 1AM (PST) and passes if a majority +1 > PMC votes are cast, with a minimum of 3 +1 votes. >

[VOTE] Release Spark 3.4.2 (RC1)

2023-11-25 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.4.2. The vote is open until November 30th 1AM (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.4.2 [ ] -1 Do not release this package

Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files

2023-11-24 Thread Dongjoon Hyun
+1 Thanks, Dongjoon. On Fri, Nov 24, 2023 at 7:14 PM Ye Zhou wrote: > +1(non-binding) > > On Fri, Nov 24, 2023 at 11:16 Mridul Muralidharan > wrote: > >> >> +1 >> >> Regards, >> Mridul >> >> On Fri, Nov 24, 2023 at 8:21 AM Kent Yao wrote: >> >>> Hi Spark Dev, >>> >>> Following the discussion

Re: [DISCUSS] SPIP: Testing Framework for Spark UI Javascript files

2023-11-21 Thread Dongjoon Hyun
Thank you for proposing a new UI test framework for Apache Spark 4.0. It looks very useful. Thanks, Dongjoon. On Tue, Nov 21, 2023 at 1:51 AM Kent Yao wrote: > Hi Spark Dev, > > This is a call to discuss a new SPIP: Testing Framework for > Spark UI Javascript files [1]. The SPIP aims to

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-15 Thread Dongjoon Hyun
+1

Re: Apache Spark 3.4.2 (?)

2023-11-12 Thread Dongjoon Hyun
> On Wed, Nov 8, 2023 at 5:29 AM kazuyuki tanimura > > wrote: > >> > >> +1 > >> > >> Kazu > >> > >> On Nov 7, 2023, at 5:23 PM, L. C. Hsieh wrote: > >> > >> +1 > >> > >> On Tue, Nov 7, 2023 at 4:56 PM D

Re: Apache Spark 3.4.2 (?)

2023-11-07 Thread Dongjoon Hyun
Thank you all! Dongjoon On Mon, Nov 6, 2023 at 6:03 PM Holden Karau wrote: > +1 > > On Mon, Nov 6, 2023 at 4:30 PM yangjie01 > wrote: > >> +1 >> >> >> >> *From**: *Yuming Wang >> *Date**: *Tuesday, November 7, 2023 07:00 >> *To**: *Santosh Pingale

Re: ASF board report draft for Nov 2023

2023-11-06 Thread Dongjoon Hyun
Thank you, Matei. It would be great if we can include upcoming plans briefly. - Apache Spark 3.4.2 (https://lists.apache.org/thread/35o2169l5r05k2mknqjy9mztq3ty1btr) - Apache Spark 3.3.4 EOL (December 16th) Dongjoon. On 2023/11/06 05:32:11 Matei Zaharia wrote: > It’s time to send our

Apache Spark 3.4.2 (?)

2023-11-06 Thread Dongjoon Hyun
Hi, All. Apache Spark 3.4.1 tag was created on Jun 19th and `branch-3.4` has 103 commits including important security and correctness patches like SPARK-44251, SPARK-44805, and SPARK-44940. https://github.com/apache/spark/releases/tag/v3.4.1 $ git log --oneline v3.4.1..HEAD | wc -l

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Dongjoon Hyun
Congratulations! Dongjoon. On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon wrote: > Woohoo! > > On Tue, 3 Oct 2023 at 22:47, Hussein Awala wrote: > >> Congrats to all of you! >> >> On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: >> >>> Congratulations! Well deserved! >>> >>> -Rui >>> >>> >>> On Mon,

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Dongjoon Hyun
titioning. And the reason v1 data sources support those is because > the v1 paths are kind of interleaved with Spark’s Hive integration. I > understand separating that Hive integration or making it more data > source-ish would put us closer to supporting bucketing and partitioning in > v2 and the

Re: Plans for built-in v2 data sources in Spark 4

2023-09-14 Thread Dongjoon Hyun
Hi, Will. According to the following JIRA, as of now, there is no plan or on-going discussion to switch it. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) Thanks, Dongjoon. On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski wrote: > Hey everyone, > > > > I was

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-12 Thread Dongjoon Hyun
+1 Dongjoon. On 2023/09/12 03:38:37 Kent Yao wrote: > +1 (non-binding), great work! > > Kent Yao > > On Tue, Sep 12, 2023 at 11:32, Yuming Wang wrote: > > > > +1. > > > > On Tue, Sep 12, 2023 at 10:57 AM yangjie01 > > wrote: > >> > >> +1 > >> > >> > >> > >> From: Jia Fan > >> Date: Tuesday, September 12, 2023 10:08 > >>

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Dongjoon Hyun
Sorry but I'm -1 because there exists a late-arrival correctness patch although it's not a regression. - https://issues.apache.org/jira/browse/SPARK-44805 "Data lost after union using spark.sql.parquet.enableNestedColumnVectorizedReader=true" - https://github.com/apache/spark/pull/42850 -

Re: Clarification on ExecutorRoll Plugin & Ignore Decommission Fetch Failure

2023-08-25 Thread Dongjoon Hyun
Hi, Arun. Here are some answers to your questions. First, the fetch failure is irrelevant to the Executor Rolling feature because the plugin itself only asked the Spark scheduler to decommission it, not terminate it. More specifically, it's independent from the underlying Decommissioning

Apache Spark 4.0.0-SNAPSHOT is ready for Java 21

2023-08-25 Thread Dongjoon Hyun
Hi, All. Java 21 will be released in a month and Apache Spark master branch (4.0.0-SNAPSHOT) achieved the first milestone (SPARK-43831: Build and Run Spark on Java 21) Today. 1. JDK 21: https://openjdk.org/projects/jdk/21/ - 2023/08/24 Final Release Candidate - 2023/09/19 General

Re: Volcano in spark distro

2023-08-22 Thread Dongjoon Hyun
Of course, we can make Apache Spark distribution bigger and bigger, but I'm a little neutral about Volcano. In any way, I'd like to say that the root cause of the difference is those scheduler designs instead of Apache Spark itself. For example, Apache YuniKorn doesn't force us to add a new

Re: KubernetesLocalDiskShuffleDataIO mount path dependency doubt.

2023-08-11 Thread Dongjoon Hyun
Hi, Arun. SPARK-35593 (Support shuffle data recovery on the reused PVCs) was an Apache Spark 3.2.0 feature whose plugin follows only the legacy Spark shuffle directory structure to be safe. You can see the AS-IS test coverage in the corresponding `KubernetesLocalDiskShuffleDataIOSuite`.

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-10 Thread Dongjoon Hyun
+1 Dongjoon On 2023/08/10 07:14:07 yangjie01 wrote: > +1 > Thanks, Jie Yang > > > From: Yuming Wang > Date: Thursday, August 10, 2023 13:33 > To: Dongjoon Hyun > Cc: dev > Subject: Re: [VOTE] Release Apache Spark 3.3.3 (RC1) > > +1 myself. > > On Tue

Re: ASF board report draft for August 2023

2023-08-08 Thread Dongjoon Hyun
Thank you, Matei. It looks good to me. Dongjoon On Mon, Aug 7, 2023 at 22:54 Matei Zaharia wrote: > It’s time to send our quarterly report to the ASF board on August 9th. > Here’s what I wrote as a draft — feel free to suggest changes. > > = > > Issues for the

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-07 Thread Dongjoon Hyun
st and the master use the same yml file. > > > > Jie Yang > > > > *From**: *Dongjoon Hyun > *Date**: *Tuesday, August 8, 2023 00:18 > *To**: *Yuming Wang > *Cc**: *dev > *Subject**: *Re: [VOTE] Release Apache Spark 3.3.3 (RC1) > > > > Hi, Yuming. > > > >

Re: Welcome two new Apache Spark committers

2023-08-07 Thread Dongjoon Hyun
Congratulations, Peter and Xiduo. :) Dongjoon. On Sun, Aug 6, 2023 at 10:08 PM XiDuo You wrote: > Thank you all ! > > On Mon, Aug 7, 2023 at 11:31, Jia Fan wrote: > > > > Congratulations! > > > > > > Jia Fan > > > > > > On Aug 7, 2023, at 11:28, Ye Xianjin wrote: > > > > Congratulations! > > >

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-07 Thread Dongjoon Hyun
Hi, Yuming. One of the community GitHub Action test pipelines is unhealthy consistently due to Python mypy linter. https://github.com/apache/spark/actions/workflows/build_branch33.yml It seems due to the pipeline difference between the same Python mypy linter already pass in commit build,

Re: [Reminder] Spark 3.5 RC Cut

2023-08-04 Thread Dongjoon Hyun
Thank you again, Emil and Bjorn. FYI, SPARK-44678 landed at branch-3.5 like the following. https://github.com/apache/spark/pull/42345 [SPARK-44678][BUILD][3.5] Downgrade Hadoop to 3.3.4 Dongjoon. On 2023/08/02 18:58:51 Bjørn Jørgensen wrote: > @Dongjoon Hyun FYI > [image: image.png] >

Re: [Reminder] Spark 3.5 RC Cut

2023-08-02 Thread Dongjoon Hyun
fact that 3.3.6 is also affected. >> >> > HADOOP-18757 seems to be merged just two weeks ago and there is no >> > Apache Hadoop release with it, isn't it? >> >> That is correct, there is no hadoop release containing the fix. So >> therefore 3.3.6 would als

Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Dongjoon Hyun
more, please? Dongjoon. On Tue, Aug 1, 2023 at 9:46 PM Emil Ejbyfeldt wrote: > Hi, > > Yes, sorry about that seem to have messed up the link. Should have been > https://issues.apache.org/jira/browse/HADOOP-18757 > > Best, > Emil > > On 01/08/2023 19:08, Dongjo

Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Dongjoon Hyun
Hi, Emil. HADOOP-18568 is still open and it seems to be never a part of the Hadoop trunk branch. Do you mean another JIRA? Dongjoon. On Tue, Aug 1, 2023 at 2:59 AM Emil Ejbyfeldt wrote: > Hi, > > We previously ran some experiments on builds from the 3.5 branch and > noticed that Hadoop had

Re: Time for Spark 3.3.3 release?

2023-07-29 Thread Dongjoon Hyun
+1 Thank you for volunteering, Yuming. Dongjoon On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang wrote: > Hi Spark devs, > > Since Apache Spark 3.3.2 tag creation (Feb 11), 60 patches > have > arrived at branch-3.3. > > Shall we make

Re: Spark 3.0.0 EOL

2023-07-24 Thread Dongjoon Hyun
As Hyukjin replied, Apache Spark 3.0.0 is already in EOL status. To Pralabh, FYI, in the community, - Apache Spark 3.2 also reached the EOL already. https://lists.apache.org/thread/n4mdfwr5ksgpmrz0jpqp335qpvormos1 If you are considering Apache Spark 4, here is the other 3.x timeline, -

Re: Spark Docker Official Image is now available

2023-07-20 Thread Dongjoon Hyun
Thank you! Dongjoon On Thu, Jul 20, 2023 at 8:40 AM Xiao Li wrote: > Thank you, Yikun! This is great! > > On Wed, Jul 19, 2023 at 7:55 PM Ruifeng Zheng wrote: > >> Awesome, thank you YiKun for driving this! >> >> On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon >> wrote: >> >>> This is amazing,

Re: Spark 3.5 Branch Cut

2023-07-17 Thread Dongjoon Hyun
Thank you so much, Yuanjian! Dongjoon. On Mon, Jul 17, 2023 at 1:05 PM Yuanjian Li wrote: > Hi, all > > FYI, I cut branch-3.5 as https://github.com/apache/spark/tree/branch-3.5 > > Here is the complete list of exception merge requests received before the > cut: > >- > >SPARK-44421:

Re: Time for Spark v3.5.0 release

2023-07-03 Thread Dongjoon Hyun
+1 Thank you, Yuanjian Dongjoon On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon wrote: > Yeah one day postponed shouldn't be a big deal. > > On Tue, Jul 4, 2023 at 7:10 AM Yuanjian Li wrote: > >> Hi All, >> >> According to the Spark versioning policy at >>
