Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-03-11 Thread Ashish Singh
Hi Kalyan, Is this something you are still interested in pursuing? There are some open discussion threads on the doc you shared. @Mridul Muralidharan In what state are your efforts along this? Is it something that your team is actively pursuing/ building or are mostly planning right now? Asking

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Xinrong Meng
+1 Thanks @Gengliang Wang ! On Mon, Mar 11, 2024 at 1:09 PM Gengliang Wang wrote: > Hi Steve, > > thanks for the suggestion in this email thread and the SPIP doc! I will > read the Audit Log and seek your feedback through PR reviews during the > implementation process. > > > So worrying about

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Dongjoon Hyun
Ya, I also have a similar opinion with Mridul. +1 Thank you, Gengliang. Dongjoon. On Mon, Mar 11, 2024 at 1:34 PM Mridul Muralidharan wrote: > > I am supportive of the proposal - this is a step in the right direction ! > Additional metadata (explicit and inferred) for log records, and

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Gengliang Wang
Hi Steve, thanks for the suggestion in this email thread and the SPIP doc! I will read the Audit Log and seek your feedback through PR reviews during the implementation process. > So worrying about how pass and manage that at the thread level matters. We can have a specific logger for

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread huaxin gao
+1 On Mon, Mar 11, 2024 at 7:02 AM Wenchen Fan wrote: > +1 > > On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon wrote: > >> +1 >> >> On Mon, 11 Mar 2024 at 18:11, yangjie01 >> wrote: >> >>> +1 >>> >>> >>> >>> Jie Yang >>> >>> >>> >>> *From**: *Haejoon Lee >>> *Date**: *Monday, March 11, 2024 17:09 >>>

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Mridul Muralidharan
I am supportive of the proposal - this is a step in the right direction ! Additional metadata (explicit and inferred) for log records, and exposing them for indexing is extremely useful. The specifics of the API still need some work IMO and does not need to be this disruptive, but I consider

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Denny Lee
+1 (non-binding) On Sun, Mar 10, 2024 at 23:36 Gengliang Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Logging Framework for > Apache Spark > > References: > >- JIRA ticket >- SPIP doc > >

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Steve Loughran
I consider the context info as more important than just logging; at hadoop level we do it to attach things like task/jobIds, kerberos principals etc to all store requests. https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/auditing.html So worrying about how pass and manage that at

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Kent Yao
+1 (non-binding) Kent Yao Hyukjin Kwon wrote on Mon, Mar 11, 2024 at 17:26: > > +1 > > On Mon, 11 Mar 2024 at 18:11, yangjie01 wrote: >> >> +1 >> >> >> >> Jie Yang >> >> >> >> From: Haejoon Lee >> Date: Monday, March 11, 2024 17:09 >> To: Gengliang Wang >> Cc: dev >> Subject: Re: [VOTE] SPIP: Structured Logging

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Mich Talebzadeh
+1 Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to the best

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Wenchen Fan
+1 On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon wrote: > +1 > > On Mon, 11 Mar 2024 at 18:11, yangjie01 > wrote: > >> +1 >> >> >> >> Jie Yang >> >> >> >> *From**: *Haejoon Lee >> *Date**: *Monday, March 11, 2024 17:09 >> *To**: *Gengliang Wang >> *Cc**: *dev >> *Subject**: *Re: [VOTE] SPIP: Structured

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Hyukjin Kwon
+1 On Mon, 11 Mar 2024 at 18:11, yangjie01 wrote: > +1 > > > > Jie Yang > > > > *From**: *Haejoon Lee > *Date**: *Monday, March 11, 2024 17:09 > *To**: *Gengliang Wang > *Cc**: *dev > *Subject**: *Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark > > > > +1 > > > > On Mon, Mar 11, 2024 at

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread yangjie01
+1 Jie Yang From: Haejoon Lee Date: Monday, March 11, 2024 17:09 To: Gengliang Wang Cc: dev Subject: Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark +1 On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang wrote: Hi all, I'd like to start the vote for SPIP: Structured

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Haejoon Lee
+1 On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Logging Framework for > Apache Spark > > References: > >- JIRA ticket >- SPIP doc > >

[VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-10 Thread Gengliang Wang
Hi all, I'd like to start the vote for SPIP: Structured Logging Framework for Apache Spark References: - JIRA ticket - SPIP doc -

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-10 Thread Gengliang Wang
Thanks everyone for the valuable feedback! Given the generally positive feedback received, I plan to move forward by initiating the voting thread. I encourage you to participate in the upcoming thread. Warm regards, Gengliang On Sat, Mar 9, 2024 at 12:55 PM Mich Talebzadeh wrote: > Splendid.

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Mich Talebzadeh
Splendid. Thanks Gengliang Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Gengliang Wang
Hi Mich, Thanks for your suggestions. I agree that we should avoid confusion with Spark Structured Streaming. So, I'll go with "Structured Logging Framework for Apache Spark". This keeps the standard term "Structured Logging" and distinguishes it from "Structured Streaming" clearly. Thanks for

SPARK-44951, Improve Spark Dynamic Allocation

2024-03-08 Thread Mich Talebzadeh
Hi all, On this ticket, improve Spark Dynamic Allocation I see no movement since it was opened back in August 2023 I may be wrong of course Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
Okay, let me double-check it carefully. Thank you very much for your help! From: Jungtaek Lim Sent: March 5, 2024 21:56:41 To: Pan,Bingkun Cc: Dongjoon Hyun; dev; user Subject: Re: [ANNOUNCE] Apache Spark 3.5.1 released Yeah the approach seems OK to me - please double

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-05 Thread Mich Talebzadeh
Hi Jason, I read your notes and the code simulating the problem as link https://issues.apache.org/jira/browse/SPARK-38388 and the specific repartition issue (SPARK-38388) that this code aims to demonstrate The code below from the above link Jira import scala.sys.process._ import

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
Yeah the approach seems OK to me - please double check that the doc generation in Spark repo won't fail after the move of the js file. Other than that, it would be probably just a matter of updating the release process. On Tue, Mar 5, 2024 at 7:24 PM Pan,Bingkun wrote: > Okay, I see. > >

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
Okay, I see. Perhaps we can solve this confusion by sharing the same file `version.json` across `all versions` in the `Spark website repo`? Make each version of the document display the `same` data in the dropdown menu. From: Jungtaek Lim Sent: March 5, 2024

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
Let me be more specific. We have two active release version lines, 3.4.x and 3.5.x. We just released Spark 3.5.1, having a dropdown as 3.5.1 and 3.4.2 given the fact the last version of 3.4.x is 3.4.2. After a month we released Spark 3.4.3. In the dropdown of Spark 3.4.3, there will be 3.5.1 and

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
Based on my understanding, we should not update versions that have already been released, such as the situation you mentioned: `But what about dropout of version D? Should we add E in the dropdown?` We only need to record the latest `version.json` file that has already been published at the

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
But this does not answer my question about updating the dropdown for the doc of "already released versions", right? Let's say we just released version D, and the dropdown has version A, B, C. We have another release tomorrow as version E, and it's probably easy to add A, B, C, D in the dropdown

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
According to my understanding, the original intention of this feature is that when a user has entered the pyspark document, if he finds that the version he is currently in is not the version he wants, he can easily jump to the version he wants by clicking on the drop-down box. Additionally, in

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Yang Jie
hmm... I guess this is meant to cc @Bingkun Pan ? On 2024/03/05 02:16:12 Hyukjin Kwon wrote: > Is this related to https://github.com/apache/spark/pull/42428? > > cc @Yang,Jie(INF) > > On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim > wrote: > > > Shall we revisit this functionality? The API doc

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread yangjie01
That sounds like a great suggestion. From: Jungtaek Lim Date: Tuesday, March 5, 2024 10:46 To: Hyukjin Kwon Cc: yangjie01, Dongjoon Hyun, dev, user Subject: Re: [ANNOUNCE] Apache Spark 3.5.1 released Yes, it's relevant to that PR. I wonder, if we want to expose version switcher, it should be in

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Jungtaek Lim
Yes, it's relevant to that PR. I wonder, if we want to expose version switcher, it should be in versionless doc (spark-website) rather than the doc being pinned to a specific version. On Tue, Mar 5, 2024 at 11:18 AM Hyukjin Kwon wrote: > Is this related to

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Hyukjin Kwon
Is this related to https://github.com/apache/spark/pull/42428? cc @Yang,Jie(INF) On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim wrote: > Shall we revisit this functionality? The API doc is built with individual > versions, and for each individual version we depend on other released > versions.

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo
Thanks Jason for the detailed information and the bug associated with it. Hopefully someone can provide more information about this pressing issue. On Mon, Mar 4, 2024 at 1:26 PM Jason Xu wrote: > Hi Prem, > > From the symptom of shuffle fetch failure and few duplicate data and few > missing data, I think

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Jason Xu
Hi Prem, From the symptom of shuffle fetch failure and few duplicate data and few missing data, I think you might run into this correctness bug: https://issues.apache.org/jira/browse/SPARK-38388. Node/shuffle failure is hard to avoid, I wonder if you have non-deterministic logic and calling

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo
super :( On Mon, Mar 4, 2024 at 6:19 AM Mich Talebzadeh wrote: > "... in a nutshell if fetchFailedException occurs due to data node reboot > then it can create duplicate / missing data . so this is more of > hardware(env issue ) rather than spark issue ." > > As an overall conclusion your

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Mich Talebzadeh
"... in a nutshell if fetchFailedException occurs due to data node reboot then it can create duplicate / missing data . so this is more of hardware(env issue ) rather than spark issue ." As an overall conclusion your point is correct but again the answer is not binary. Spark core relies on

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-03 Thread Prem Sahoo
thanks Mich, in a nutshell if fetchFailedException occurs due to data node reboot then it can create duplicate / missing data . so this is more of hardware(env issue ) rather than spark issue . On Sat, Mar 2, 2024 at 7:45 AM Mich Talebzadeh wrote: > Hi, > > It seems to me that there are

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-03 Thread Jungtaek Lim
Shall we revisit this functionality? The API doc is built with individual versions, and for each individual version we depend on other released versions. This does not seem to be right to me. Also, the functionality is only in PySpark API doc which does not seem to be consistent as well. I don't

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mich Talebzadeh
Hi Gengliang, Thanks for taking the initiative to improve the Spark logging system. Transitioning to structured logs seems like a worthy way to enhance the ability to analyze and troubleshoot Spark jobs and hopefully the future integration with cloud logging systems. While "Structured Spark

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mridul Muralidharan
Hi Gengling, Thanks for sharing this ! I added a few queries to the proposal doc, and we can continue discussing there, but overall I am in favor of this. Regards, Mridul On Fri, Mar 1, 2024 at 1:35 AM Gengliang Wang wrote: > Hi All, > > I propose to enhance our logging system by

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-02 Thread Mich Talebzadeh
Hi, It seems to me that there are issues related to below * I think when a task failed in between and retry task started and completed it may create duplicate as failed task has some data + retry task has full data. but my question is why spark keeps delta data or according to you if

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo
Hello Mich, thanks for your reply. As an engineer I can chip in. You may have partial execution and retries meaning when spark encounters a *FetchFailedException*, it may retry fetching the data from the unavailable (the one being rebooted) node a few times before marking it permanently

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Mich Talebzadeh
Hi, Your point -> "When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why. We have scenario when spark job complains *FetchFailedException as one of the data node got ** rebooted middle of job running ."* As an engineer I

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Mich Talebzadeh
Hi Bhuwan et al, Thank you for passing on the DataBricks Structured Streaming team's review of the SPIP document. FYI, I work closely with Pawan and other members to help deliver this piece of work. We appreciate your insights, especially regarding the cost savings potential from the PoC. Pavan

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Nivedita VY
+1 Nivi

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Pavan Kotikalapudi
Thanks Bhuwan and rest of the databricks team for the reviews, I appreciate your reviews, was very helpful in evaluating a few options that were overlooked earlier (especially about mixed spark apps running on notebooks). Regarding the use-cases, It could handle multiple streaming queries

RE: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Nivedita VY
+1 Nivi

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Bhuwan Sahni
Hi Pavan, I am from the DataBricks Structured Streaming team, and we did a review of the SPIP internally. Wanted to pass on the points discussed in the meeting. Thanks for putting together the SPIP document. It's useful to have dynamic resource allocation for Streaming queries, and it's

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo
Hello All, in the list of JIRAs i didn't find anything related to fetchFailedException. as mentioned above "When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why. We have a scenario when spark job complains

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Peter Toth
Congratulations and thanks Jungtaek for driving this! Xinrong Meng wrote (on Fri, Mar 1, 2024, 5:24): > Congratulations! > > Thanks, > Xinrong > > On Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun > wrote: > >> Congratulations! >> >> Bests, >> Dongjoon. >> >> On Wed, Feb 28, 2024 at

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Jungtaek Lim
Thanks for reporting - this is odd - the dropdown did not exist in other recent releases. https://spark.apache.org/docs/3.5.0/api/python/index.html https://spark.apache.org/docs/3.4.2/api/python/index.html https://spark.apache.org/docs/3.3.4/api/python/index.html Looks like the dropdown feature

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Dongjoon Hyun
BTW, Jungtaek. PySpark document seems to show a wrong branch. At this time, `master`. https://spark.apache.org/docs/3.5.1/api/python/index.html PySpark Overview Date: Feb 24, 2024 Version: master

[DISCUSS] SPIP: Structured Spark Logging

2024-02-29 Thread Gengliang Wang
Hi All, I propose to enhance our logging system by transitioning to structured logs. This initiative is designed to tackle the challenges of analyzing distributed logs from drivers, workers, and executors by allowing them to be queried using a fixed schema. The goal is to improve the

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Dongjoon Hyun
Please use the URL as the full string, including the '()' part. Or you can search directly in ASF Jira with the 'Spark' project and three labels, 'Correctness', 'correctness' and 'data-loss'. Dongjoon On Thu, Feb 29, 2024 at 11:54 Prem Sahoo wrote: > Hello Dongjoon, > Thanks for emailing me. > Could

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread John Zhuge
Excellent work, congratulations! On Wed, Feb 28, 2024 at 10:12 PM Dongjoon Hyun wrote: > Congratulations! > > Bests, > Dongjoon. > > On Wed, Feb 28, 2024 at 11:43 AM beliefer wrote: > >> Congratulations! >> >> >> >> At 2024-02-28 17:43:25, "Jungtaek Lim" >> wrote: >> >> Hi everyone, >> >> We

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Prem Sahoo
Congratulations Sent from my iPhoneOn Feb 29, 2024, at 4:54 PM, Xinrong Meng wrote:Congratulations!Thanks,XinrongOn Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun wrote:Congratulations!Bests,Dongjoon.On Wed, Feb 28, 2024 at 11:43 AM beliefer

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
Hello Dongjoon, Thanks for emailing me. Could you please share a list of fixes as the link provided by you is not working. On Thu, Feb 29, 2024 at 11:27 AM Dongjoon Hyun wrote: > Hi, > > If you are observing correctness issues, you may hit some old (and fixed) > correctness issues. > > For

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Xinrong Meng
Congratulations! Thanks, Xinrong On Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun wrote: > Congratulations! > > Bests, > Dongjoon. > > On Wed, Feb 28, 2024 at 11:43 AM beliefer wrote: > >> Congratulations! >> >> >> >> At 2024-02-28 17:43:25, "Jungtaek Lim" >> wrote: >> >> Hi everyone, >> >> We

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Dongjoon Hyun
Hi, If you are observing correctness issues, you may hit some old (and fixed) correctness issues. For example, from Apache Spark 3.2.1 to 3.2.4, we fixed 31 correctness issues.

When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why. We have scenario when spark job complains FetchFailedException as one of the data node got rebooted middle of job running . Now due to this we have few duplicate data and

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-28 Thread Dongjoon Hyun
Congratulations! Bests, Dongjoon. On Wed, Feb 28, 2024 at 11:43 AM beliefer wrote: > Congratulations! > > > > At 2024-02-28 17:43:25, "Jungtaek Lim" > wrote: > > Hi everyone, > > We are happy to announce the availability of Spark 3.5.1! > > Spark 3.5.1 is a maintenance release containing

unsubscribe

2024-02-28 Thread Sssxxx
unsubscribe Sssxxx sixliu_s...@foxmail.com

Re:[ANNOUNCE] Apache Spark 3.5.1 released

2024-02-28 Thread beliefer
Congratulations! At 2024-02-28 17:43:25, "Jungtaek Lim" wrote: Hi everyone, We are happy to announce the availability of Spark 3.5.1! Spark 3.5.1 is a maintenance release containing stability fixes. This release is based on the branch-3.5 maintenance branch of Spark. We strongly

[ANNOUNCE] Apache Spark 3.5.1 released

2024-02-28 Thread Jungtaek Lim
Hi everyone, We are happy to announce the availability of Spark 3.5.1! Spark 3.5.1 is a maintenance release containing stability fixes. This release is based on the branch-3.5 maintenance branch of Spark. We strongly recommend all 3.5 users to upgrade to this stable release. To download Spark

Re: Please unlock Jira ticket for SPARK-24815, Dynamic resource allocation for structured streaming

2024-02-26 Thread Pavan Kotikalapudi
Thanks Yuming. On Mon, Feb 26, 2024 at 9:55 PM Yuming Wang wrote: > Unlocked. > > On Tue, Feb 27, 2024 at 11:47 AM Mich Talebzadeh < > mich.talebza...@gmail.com> wrote: > >> >> Hi, >> >> Can a committer please unlock this SPIP? It is for Dynamic resource >> allocation for structured streaming

Re: Please unlock Jira ticket for SPARK-24815, Dynamic resource allocation for structured streaming

2024-02-26 Thread Yuming Wang
Unlocked. On Tue, Feb 27, 2024 at 11:47 AM Mich Talebzadeh wrote: > > Hi, > > Can a committer please unlock this SPIP? It is for Dynamic resource > allocation for structured streaming that has got 6 votes. it was locked > because of inactivity by GitHub actions > > [SPARK-24815] Structured

Please unlock Jira ticket for SPARK-24815, Dynamic resource allocation for structured streaming

2024-02-26 Thread Mich Talebzadeh
Hi, Can a committer please unlock this SPIP? It is for Dynamic resource allocation for structured streaming that has got 6 votes. it was locked because of inactivity by GitHub actions [SPARK-24815] Structured Streaming should support dynamic allocation - ASF JIRA (apache.org)

unsubscribe

2024-02-24 Thread Ameet Kini
unsubscribe

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Jungtaek Lim
Thanks for figuring this out. That is my bad. My understanding is that 3.5.1 RC2 doc should be correctly generated in VOTE but it happened during the finalization step. I lost the build artifact for docs (I followed steps and removed docs from dev dist before realizing I shouldn't remove them)

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Dongjoon Hyun
Hi, All. Unfortunately, the Apache Spark `3.5.1 RC2` document artifact seems to be generated from unknown source code instead of the correct source code of the tag, `3.5.1`. https://spark.apache.org/docs/3.5.1/ [image: Screenshot 2024-02-23 at 14.13.07.png] Dongjoon. On Wed, Feb 21, 2024 at

Proposal about moving on from the Shepherd terminology in SPIPs

2024-02-23 Thread Mich Talebzadeh
We had a discussion about getting a Shepherd to assist with Structured streaming SPIP a few hours ago. As an active member I am proposing a move to replace the current terminology "SPIP Shepherd" with the more respectful and inclusive term "SPIP Mentor." We have over the past few years have tried

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
Hi Pavan and those who kindly voted for this SPIP Great to have 6+ votes and no -1 and 0. The so-called mass volume is there. The rest is admin matter and how to drive the project forward and yes there is more than one way of skinning the cat. I think we need some flexibility in the rules given

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Pavan Kotikalapudi
Thanks for the pointers Mich, will wait for Jungtaek Lee or any other PMC members to respond. aggregating upvotes to this email thread +6 Mich Talebzadeh Adam Hobbs Pavan Kotikalapudi Krystal Mitchell Sona Torosyan Aaron Kern Thank you, Pavan On Thu, Feb 22, 2024 at 3:07 PM Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
+1 for me Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to the

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Aaron Kern
+1

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
Hi, please check this doc Spark Project Improvement Proposals (SPIP) | Apache Spark and specifically the below extract Discussing an SPIP All discussion of an SPIP should take place in a public forum, preferably the discussion attached to

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Pavan Kotikalapudi
Hi Mich, We have five +1s till now. Mich Talebzadeh Adam Hobbs Pavan Kotikalapudi Krystal Mitchell Sona Torosyan (few more in github pr) +0: None -1: None Does it pass the required condition as approved? Not sure of that though, nothing about minimum required is mentioned in the past

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Sona Torosyan
+1

Re: Generating config docs automatically

2024-02-22 Thread Nicholas Chammas
Thank you, Holden! Yes, having everything live in the ConfigEntry is attractive. The main reason I proposed an alternative where the groups are defined in YAML is that if the config groups are defined in ConfigEntry, then altering the groupings – which is relevant only to the display of config

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
Hi Pavan, Do you have a list of votes for this feature by any chance? Does it pass the required condition as approved? HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Pavan Kotikalapudi
Yes. The PR was closed due to inactivity by github actions.. The msg also says > If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! On Thu, Feb 22, 2024 at 1:09 AM Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
I can see it was closed. Was it because of inactivity? Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-21 Thread Pavan Kotikalapudi
Hi Spark PMC members, I think we have few upvotes for this effort here and more people are showing interest (see PR comments .) Is anyone interested in mentoring and reviewing this effort? Also can the repository admin/owner

Re: Generating config docs automatically

2024-02-21 Thread Holden Karau
I think this is a good idea. I like having everything in one source of truth rather than two (so option 1 sounds like a good idea); but that’s just my opinion. I'd be happy to help with reviews though. On Wed, Feb 21, 2024 at 6:37 AM Nicholas Chammas wrote: > I know config documentation is not

Re: Generating config docs automatically

2024-02-21 Thread Nicholas Chammas
I know config documentation is not the most exciting thing. If there is anything I can do to make this as easy as possible for a committer to shepherd, I’m all ears! > On Feb 14, 2024, at 8:53 PM, Nicholas Chammas > wrote: > > I’m interested in automating our config documentation and need

[VOTE][RESULT] Release Apache Spark 3.5.1 (RC2)

2024-02-21 Thread Jungtaek Lim
The vote passes with 6 +1s (4 binding +1s). Thanks to all who helped with the release! (* = binding) +1: Jungtaek Lim Wenchen Fan (*) Cheng Pan Xiao Li (*) Hyukjin Kwon (*) Maxim Gekk (*) +0: None -1: None

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-21 Thread Jungtaek Lim
Thanks everyone for participating in the vote! The vote passed. I'll send out the vote result and proceed to the next steps. On Wed, Feb 21, 2024 at 4:36 PM Maxim Gekk wrote: > +1 > > On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon wrote: > >> +1 >> >> On Tue, 20 Feb 2024 at 22:00, Cheng Pan wrote:

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Maxim Gekk
+1 On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon wrote: > +1 > > On Tue, 20 Feb 2024 at 22:00, Cheng Pan wrote: > >> +1 (non-binding) >> >> - Build successfully from source code. >> - Pass integration tests with Spark ClickHouse Connector[1] >> >> [1]

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Hyukjin Kwon
+1 On Tue, 20 Feb 2024 at 22:00, Cheng Pan wrote: > +1 (non-binding) > > - Build successfully from source code. > - Pass integration tests with Spark ClickHouse Connector[1] > > [1] https://github.com/housepower/spark-clickhouse-connector/pull/299 > > Thanks, > Cheng Pan > > > > On Feb 20,

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Xiao Li
+1 Xiao Cheng Pan wrote on Tue, Feb 20, 2024 at 04:59: > +1 (non-binding) > > - Build successfully from source code. > - Pass integration tests with Spark ClickHouse Connector[1] > > [1] https://github.com/housepower/spark-clickhouse-connector/pull/299 > > Thanks, > Cheng Pan > > > > On Feb 20, 2024, at

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-20 Thread Krystal Mitchell
+1 On 2024/01/17 17:49:32 Pavan Kotikalapudi wrote: > Thanks for proposing and voting for the feature Mich. > > adding some references to the thread. > >- Jira ticket - SPARK-24815 > >- Design Doc > >

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Cheng Pan
+1 (non-binding) - Build successfully from source code. - Pass integration tests with Spark ClickHouse Connector[1] [1] https://github.com/housepower/spark-clickhouse-connector/pull/299 Thanks, Cheng Pan > On Feb 20, 2024, at 10:56, Jungtaek Lim wrote: > > Thanks Sean, let's continue the

Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers! The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for Community over Code Asia 2024 are now open! We will be supporting Community over Code Asia, Hangzhou, China July 26th - 28th, 2024. TAC exists

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-19 Thread Wenchen Fan
+1, thanks for making the release! On Sat, Feb 17, 2024 at 3:54 AM Sean Owen wrote: > Yeah let's get that fix in, but it seems to be a minor test only issue so > should not block release. > > On Fri, Feb 16, 2024, 9:30 AM yangjie01 wrote: > >> Very sorry. When I was fixing `SPARK-45242 ( >>

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-19 Thread Jungtaek Lim
Thanks Sean, let's continue the process for this RC. +1 (non-binding) - downloaded all files from URL - checked signature - extracted all archives - ran all tests from source files in source archive file, via running "sbt clean test package" - Ubuntu 20.04.4 LTS, OpenJDK 17.0.9. Also bump to

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Mich Talebzadeh
Ok thanks for your clarifications Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Chao Sun
Hi Mich, > Also have you got some benchmark results from your tests that you can possibly share? We only have some partial benchmark results internally so far. Once shuffle and better memory management have been introduced, we plan to publish the benchmark results (at least TPC-H) in the repo.

Re: ASF board report draft for February

2024-02-18 Thread Mich Talebzadeh
Np, thanks for addressing the point promptly Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The

Re: ASF board report draft for February

2024-02-18 Thread Matei Zaharia
Thanks for the clarification. I updated it to say Comet is in the process of being open sourced. > On Feb 18, 2024, at 1:55 AM, Mich Talebzadeh > wrote: > > Hi Matei, > > With regard to your last point > > "- Project Comet, a plugin designed to accelerate Spark query execution by >

Re: ASF board report draft for February

2024-02-18 Thread Mich Talebzadeh
Hi Matei, With regard to your last point "- Project Comet, a plugin designed to accelerate Spark query execution by leveraging DataFusion and Arrow, has been open-sourced under the Apache Arrow project. For more information, visit https://github.com/apache/arrow-datafusion-comet." If my

Re: ASF board report draft for February

2024-02-18 Thread Dongjoon Hyun
+1, it looks good to me. Thank you, Matei. Dongjoon On Sat, Feb 17, 2024 at 11:21 AM Matei Zaharia wrote: > Hi all, > > I missed some reminder emails about our board report this month, but here > is my draft. I’ll submit it tomorrow if that’s ok. > > == > > Issues for the board: >
