Re: Spark Materialized Views: Improve Query Performance and Data Management

2024-05-03 Thread Jungtaek Lim
most likely there is at least a high level of design and in many cases there is a separate doc for detailed design (This is optional but people tend to provide the doc for the project with non-trivial design). Hope this clarifies the meaning of SPIP. Thanks, Jungtaek Lim (HeartSaVioR) On Sat, May 4, 2024

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Jungtaek Lim
the community to produce a Spark 4.0 Preview soon even if >>>>>> certain features targeting the Delta 4.0 release are still incomplete. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>>>> On Wed, Apr 17

Re: [DISCUSS] Un-deprecate Trigger.Once

2024-04-21 Thread Jungtaek Lim
e Spark 3.4.0 and `Undeprecation(?)` > may cause another confusion in the community, not only for Trigger.Once but > also for all historic `Deprecated` items. > > Dongjoon. > > > On Fri, Apr 19, 2024 at 7:44 PM Jungtaek Lim > wrote: > >> Hi dev, >> >> I'd

[DISCUSS] Un-deprecate Trigger.Once

2024-04-19 Thread Jungtaek Lim
that the trigger won't be available sooner (though we rarely remove public API). So maybe warning log on usage sounds to me as a reasonable alternative. Thoughts? Thanks, Jungtaek Lim (HeartSaVioR)

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-14 Thread Jungtaek Lim
+1 (non-binding), thanks Dongjoon. On Sun, Apr 14, 2024 at 7:22 AM Dongjoon Hyun wrote: > Please vote on SPARK-44444 to use ANSI SQL mode by default. > The technical scope is defined in the following PR which is > one line of code change and one line of migration guide. > > - DISCUSSION: >

Re: [DISCUSS] Spark 4.0.0 release

2024-04-14 Thread Jungtaek Lim
W.r.t. state data source - reader (SPARK-45511 ), there are several follow-up tickets, but we don't plan to address them soon. The current implementation is the final shape for Spark 4.0.0, unless there are demands on the follow-up tickets. We

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-04-11 Thread Jungtaek Lim
se it, limitations etc. As you might have already >> noticed in the PR, this change is turned off by default and will only work if >> `spark.dynamicAllocation.streaming.enabled` is true. >> >> Regarding the concerns about expertise in DRA, I will find some core >

Re: Apache Spark 3.4.3 (?)

2024-04-07 Thread Jungtaek Lim
Sounds like a plan. +1 (non-binding) Thanks for volunteering! On Sun, Apr 7, 2024 at 5:45 AM Dongjoon Hyun wrote: > Hi, All. > > Apache Spark 3.4.2 tag was created on Nov 24th and `branch-3.4` has 85 > commits including important security and correctness patches like > SPARK-45580, SPARK-46092,

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
in the codebase that this must not be enabled by default.) On Tue, Mar 26, 2024 at 7:02 PM Pavan Kotikalapudi wrote: > Hi Bhuwan, > > Glad to hear back from you! Very much appreciate your help on reviewing > the design doc/PR and endorsing this proposal. > > Thank you so much @Jun

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
e know if this is resolved. This seems to me as a blocker to move on. Please also let me know if the contribution is withdrawn from the employer. Thanks, Jungtaek Lim (HeartSaVioR) On Mon, Mar 25, 2024 at 11:47 PM Bhuwan Sahni wrote: > Hi Pavan, > > I looked at the PR, and the changes look simpl

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Jungtaek Lim
+1 (non-binding), thanks Gengliang! On Mon, Mar 11, 2024 at 5:46 PM Gengliang Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Logging Framework for > Apache Spark > > References: > >- JIRA ticket >- SPIP doc >

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
see. > > Perhaps we can solve this confusion by sharing the same file `version.json` > across `all versions` in the `Spark website repo`? Make each version of > the document display the `same` data in the dropdown menu. > -- > *From:* Jungtaek Lim > *Sent:* 2

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
possible. > > Only by sharing the same version.json file in each version. > ------ > *From:* Jungtaek Lim > *Sent:* March 5, 2024 16:47:30 > *To:* Pan,Bingkun > *Cc:* Dongjoon Hyun; dev; user > *Subject:* Re: [ANNOUNCE] Apache Spark 3.5.1 released

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
https://github.com/apache/spark/pull/42881 > > So, we need to manually update this file. I can manually submit an update > first to get this feature working. > ------ > *From:* Jungtaek Lim > *Sent:* March 4, 2024 6:34:42 > *To:* Dongjoon Hyun > *Cc:* dev;

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Jungtaek Lim
ull/42428? > > cc @Yang,Jie(INF) > > On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim > wrote: > >> Shall we revisit this functionality? The API doc is built with individual >> versions, and for each individual version we depend on other released >> versions. Thi

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-03 Thread Jungtaek Lim
? What's the criteria of pruning the version? Unless we have a good answer to these questions, I think it's better to revert the functionality - it missed various considerations. On Fri, Mar 1, 2024 at 2:44 PM Jungtaek Lim wrote: > Thanks for reporting - this is odd - the dropdown did not ex

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Jungtaek Lim
to update the version. (For automatic bumping I don't have a good idea.) I'll look into it. Please expect some delay during the holiday weekend in S. Korea. Thanks again. Jungtaek Lim (HeartSaVioR) On Fri, Mar 1, 2024 at 2:14 PM Dongjoon Hyun wrote: > BTW, Jungtaek. > > PySpark document se

[ANNOUNCE] Apache Spark 3.5.1 released

2024-02-28 Thread Jungtaek Lim
you. Jungtaek Lim ps. Yikun is helping us through releasing the official docker image for Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally available.

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-23 Thread Jungtaek Lim
> generated from unknown source code instead of the correct source code of > the tag, `3.5.1`. > > https://spark.apache.org/docs/3.5.1/ > > [image: Screenshot 2024-02-23 at 14.13.07.png] > > Dongjoon. > > > > On Wed, Feb 21, 2024 at 7:15 AM Jungtaek Lim > wrote:

[VOTE][RESULT] Release Apache Spark 3.5.1 (RC2)

2024-02-21 Thread Jungtaek Lim
The vote passes with 6 +1s (4 binding +1s). Thanks to all who helped with the release! (* = binding) +1: Jungtaek Lim Wenchen Fan (*) Cheng Pan Xiao Li (*) Hyukjin Kwon (*) Maxim Gekk (*) +0: None -1: None

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-21 Thread Jungtaek Lim
024 at 22:00, Cheng Pan wrote: >> >>> +1 (non-binding) >>> >>> - Build successfully from source code. >>> - Pass integration tests with Spark ClickHouse Connector[1] >>> >>> [1] https://github.com/housepower/spark-clickhouse-connector/pu

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-19 Thread Jungtaek Lim
he `connect` module. >> >> >> >> I have submitted a backport PR >> <https://github.com/apache/spark/pull/45141> to branch-3.5, and if >> necessary, we can merge it to fix this test issue. >> >> >> >> Jie Yang >> >>

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-16 Thread Jungtaek Lim
], 0 CollectMetrics my_metric, > [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], > 44 >+- LocalRelation , [id#0, name#0] > +- LocalRelation , [id#0, name#0] > (PlanTest.scala:179) > > On Thu, Feb 15, 2024 at 1:34

Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-15 Thread Jungtaek Lim
UPDATE: The vote thread is up now. https://lists.apache.org/thread/f28h0brncmkoyv5mtsqtxx38hx309c2j On Tue, Feb 6, 2024 at 11:30 AM Jungtaek Lim wrote: > Thanks all for the positive feedback! Will figure out time to go through > the RC process. Stay tuned! > > On Mon, Feb 5, 202

Re: Heads-up: Update on Spark 3.5.1 RC

2024-02-15 Thread Jungtaek Lim
UPDATE: Now the vote thread is up for RC2. https://lists.apache.org/thread/f28h0brncmkoyv5mtsqtxx38hx309c2j On Wed, Feb 14, 2024 at 2:59 AM Dongjoon Hyun wrote: > Thank you for the update, Jungtaek. > > Dongjoon. > > On Tue, Feb 13, 2024 at 7:29 AM Jungtaek Lim > wrote: &

[VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-15 Thread Jungtaek Lim
DISCLAIMER: RC for Apache Spark 3.5.1 starts with RC2 as I lately figured out doc generation issue after tagging RC1. Please vote on releasing the following candidate as Apache Spark version 3.5.1. The vote is open until February 18th 9AM (PST) and passes if a majority +1 PMC votes are cast,

Heads-up: Update on Spark 3.5.1 RC

2024-02-13 Thread Jungtaek Lim
issues. Maybe I'll need to start with RC2 after things are sorted out and necessary fixes are landed to branch-3.5. Thanks, Jungtaek Lim (HeartSaVioR)

Re: Enhanced Console Sink for Structured Streaming

2024-02-05 Thread Jungtaek Lim
Maybe we could keep the default as it is, and explicitly turn on verboseMode to enable auxiliary information. I'm not a believer that anyone will parse the output of console sink (which means this could be a breaking change), but changing the default behavior should be taken conservatively. We can

Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-05 Thread Jungtaek Lim
>>> wrote: >>>> >>>>> +1 >>>>> >>>>> On Sun, Feb 4, 2024 at 6:07 AM beliefer wrote: >>>>> >>>>>> +1 >>>>>> >>>>>> >>>>>> >>>>>>

[DISCUSS] Release Spark 3.5.1?

2024-02-03 Thread Jungtaek Lim
correctness issues) or critical, which justifies the release. https://issues.apache.org/jira/projects/SPARK/versions/12353495 What do you think about releasing 3.5.1 with the current head of branch-3.5? I'm happy to volunteer as the release manager. Thanks, Jungtaek Lim (HeartSaVioR)

Re: Spark 3.5.1

2024-01-31 Thread Jungtaek Lim
not familiar with Spark project's release process), and seek another volunteer if I can't make any progress. Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Jan 30, 2024 at 7:15 PM Santosh Pingale wrote: > Hey there > > Spark 3.5 branch has accumulated 199 commits with quite a few bug > f

[VOTE][RESULT] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-11 Thread Jungtaek Lim
The vote passes with 12 +1s (3 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Jungtaek Lim - Anish Shrigondekar - Mich Talebzadeh - Raghu Angadi - 刘唯 - Shixiong Zhu (*) - Bartosz Konieczny - Praveen Gattu - Burak Yavuz - Bhuwan Sahni - L. C. Hsieh

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-11 Thread Jungtaek Lim
Talebzadeh, >>>>>>>>>> Dad | Technologist | Solutions Architect | Engineer >>>>>>>>>> London >>>>>>>>>> United Kingdom >>>>>>>>>> >>>>>>>>>> >>>>

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Jungtaek Lim
Friendly reminder, the VOTE thread is now live! https://lists.apache.org/thread/16ryx828bwoth31hobknxnjfxjxj07mf Votes cast in this thread are not counted, so please ensure you vote in the VOTE thread. Thanks! On Tue, Jan 9, 2024 at 9:33 AM Jungtaek Lim wrote: > Thanks everyone for the feedb

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Starting with my +1 (non-binding). Thanks! On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Streaming - Arbitrary > State API v2. > > References: > >- JIRA ticket <https://issues.apache.org/jira/brow

[VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
p=sharing> - Discussion thread <https://lists.apache.org/thread/3jyjdgk1m5zyqfmrocnt6t415703nc8l> Please vote on the SPIP for the next 72 hours: [ ] +1: Accept the proposal as an official SPIP [ ] +0 [ ] -1: I don’t think this is a good idea because … Thanks! Jungtaek Lim (HeartSaVioR)

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
>> Hi dev, >> >> >> >> Addressed the comments that Jungtaek had on the doc. Bumping the >> thread once again to see if other folks have any feedback on the proposal. >> >> >> >> Thanks, >> >> Anish >> >> >> >>

Re: Apache Spark 3.3.4 EOL Release?

2023-12-11 Thread Jungtaek Lim
volunteering, but I'll figure out I can make it, hopefully before the end of this year. Thanks, Jungtaek Lim (HeartSaVioR) On Sat, Dec 9, 2023 at 2:22 AM Dongjoon Hyun wrote: > Thank you, Mridul, and Kent, too. > > Additionally, thank you for volunteering as a release manager,

Re: Apache Spark 3.3.4 EOL Release?

2023-12-07 Thread Jungtaek Lim
+1 to release 3.3.4 and consider 3.3 as EOL. Btw, it'd be probably ideal if we could encourage taking an opportunity of experiencing the release process to people who hadn't had a time to go through (when there are people who are happy to take it). If you don't mind and we are not very strict on

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2023-11-27 Thread Jungtaek Lim
Kindly bump for better reach after the long holiday. Please kindly review the proposal which opens the chance to address complex use cases of streaming. Thanks! On Thu, Nov 23, 2023 at 8:19 AM Jungtaek Lim wrote: > Thanks Anish for proposing SPIP and initiating this thread! I beli

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2023-11-22 Thread Jungtaek Lim
; JIRA: https://issues.apache.org/jira/browse/SPARK-45939 > SPIP: > https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig/edit?usp=sharing > Design Doc: > https://docs.google.com/document/d/1QjZmNZ-fHBeeCYKninySDIoOEWfX6EmqXs2lK097u9o/edit?usp=sharing > > cc

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-15 Thread Jungtaek Lim
+1 (non-binding) On Thu, Nov 16, 2023 at 4:23 AM Ilan Filonenko wrote: > +1 (non-binding) > > On Wed, Nov 15, 2023 at 12:57 PM Xiao Li wrote: > >> +1 >> >> bo yang wrote on Wed, Nov 15, 2023 at 05:55: >> >>> +1 >>> >>> On Tue, Nov 14, 2023 at 7:18 PM huaxin gao >>> wrote: >>> +1 On Tue,

[VOTE][RESULT] SPIP: State Data Source - Reader

2023-10-25 Thread Jungtaek Lim
The vote passes with 9 +1s (4 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Jungtaek Lim - Wenchen Fan (*) - Anish Shrigondekar - L. C. Hsieh (*) - Jia Fan - Bartosz Konieczny - Yuanjian Li (*) - Shixiong Zhu (*) - Yuepeng Pan +0: None -1: None

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-25 Thread Jungtaek Lim
Thanks all for participating! The vote passed. I'll send out the result to a separate thread. On Thu, Oct 26, 2023 at 10:52 AM Yuepeng Pan wrote: > +1 (non-binding) > > Regards, > Roc > > At 2023-10-23 12:23:52, "Jungtaek Lim" > wrote: > > Hi all, > >

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-24 Thread Jungtaek Lim
>>> +1 >>> >>> On Mon, Oct 23, 2023 at 6:31 PM Anish Shrigondekar >>> wrote: >>> > >>> > +1 (non-binding) >>> > >>> > Thanks, >>> > Anish >>> > >>> > On Mon, Oct 23, 2

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-23 Thread Jungtaek Lim
M Jungtaek Lim wrote: > I don't see major comments as of now. Given that the thread was initiated > more than 10 days ago and I see multiple supporters, I'm going to initiate > a VOTE thread. > > Please participate in the VOTE thread as well. Thanks! > > On Thu, Oct 19

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-22 Thread Jungtaek Lim
Starting with my +1 (non-binding). Thanks! On Mon, Oct 23, 2023 at 1:23 PM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: State Data Source - Reader. > > The high level summary of the SPIP is that we propose a new data source > which enables a read

[VOTE] SPIP: State Data Source - Reader

2023-10-22 Thread Jungtaek Lim
- Discussion thread <https://lists.apache.org/thread/l16cjqrpfbrlb8svhdw3qlfkh9pnlkcc> Please vote on the SPIP for the next 72 hours: [ ] +1: Accept the proposal as an official SPIP [ ] +0 [ ] -1: I don’t think this is a good idea because … Thanks! Jungtaek Lim (HeartSaVioR)

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-22 Thread Jungtaek Lim
I don't see major comments as of now. Given that the thread was initiated more than 10 days ago and I see multiple supporters, I'm going to initiate a VOTE thread. Please participate in the VOTE thread as well. Thanks! On Thu, Oct 19, 2023 at 11:39 AM Jungtaek Lim wrote: > Also, I w

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Jungtaek Lim
dynamic state rebalancing that could probably be >>> implemented with a lower latency directly in the stateful API. Instead I'm >>> thinking more of an offline job to rebalance the state and later restart >>> the stateful pipeline with the changed number of shuffle part

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Jungtaek Lim
I'm >> thinking more of an offline job to rebalance the state and later restart >> the stateful pipeline with the changed number of shuffle partitions. >> >> Best, >> Bartosz. >> >> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim < >> kabhwan.opensou..

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Jungtaek Lim
left some comments concerning ongoing maintenance and > compatibility-related matters, which we can continue to discuss. > > Jungtaek Lim wrote on Tue, Oct 17, 2023 at 05:23: > >> Thanks Bartosz and Anish for your support! >> >> I'll wait for a couple more days to see whether we ca

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
umber of shuffle partitions. >> >> Best, >> Bartosz. >> >> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> wrote: >> >>> bump for better reach >>> >>> On Thu, Oct

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
bump for better reach On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim wrote: > Sorry, please use this link instead for SPIP doc: > https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing > > > On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim &

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-12 Thread Jungtaek Lim
Sorry, please use this link instead for SPIP doc: https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim wrote: > Hi dev, > > I'd like to start a discussion on "State Data Source - Reader"

[DISCUSS] SPIP: State Data Source - Reader

2023-10-12 Thread Jungtaek Lim
sues.apache.org/jira/browse/SPARK-45511 Looking forward to your feedback! Thanks, Jungtaek Lim (HeartSaVioR) ps. The scope of the project is narrowed to the reader in this SPIP, since the writer requires us to consider more cases. We are planning on it.

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
look back which criteria we use for evicting states, which could become outputs of the operator. On Tue, Oct 10, 2023 at 8:10 PM Jungtaek Lim wrote: > We wouldn't like to expose the internal mechanism to the public. > > As you are a very detail oriented engineer tracking major changes,

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
ime_data") > }) > > A little bit as you can do with Apache Flink in fact: > > https://github.com/immerok/recipes/blob/main/late-data-to-sink/src/main/java/com/immerok/cookbook/LateDataToSeparateSink.java#L81 > > WDYT? > > Best, > Bartosz. > > PS. Will be

Re: Watermark on late data only

2023-10-09 Thread Jungtaek Lim
Technically speaking, "late data" represents the data which cannot be processed due to the fact the engine threw out the state associated with the data already. That said, the only reason watermark does exist for streaming is to handle stateful operators. From the engine's point of view, there is
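The definition of "late data" above can be illustrated with a small sketch. This is a toy model in plain Python, not Spark's implementation: the class name `WatermarkTracker`, the `delay_seconds` parameter, and the exact boundary comparison are assumptions made for illustration. It assumes the watermark is the largest event time seen so far minus an allowed delay, that it only advances after a batch has been processed, and that a record is late once its event time falls below the watermark, because the state it would need may already be gone.

```python
from dataclasses import dataclass


@dataclass
class WatermarkTracker:
    """Toy model of an event-time watermark (hypothetical, not Spark's code)."""

    delay_seconds: int
    max_event_time: int = 0  # largest event time observed so far
    watermark: int = 0       # state older than this is assumed to be evicted

    def process_batch(self, event_times):
        """Split a batch into (accepted, late), then advance the watermark."""
        # Late records are judged against the watermark from earlier batches.
        accepted = [t for t in event_times if t >= self.watermark]
        late = [t for t in event_times if t < self.watermark]
        if event_times:
            self.max_event_time = max(self.max_event_time, max(event_times))
        # The watermark never moves backwards.
        self.watermark = max(
            self.watermark, self.max_event_time - self.delay_seconds
        )
        return accepted, late


tracker = WatermarkTracker(delay_seconds=10)
first_accepted, first_late = tracker.process_batch([100, 105])
second_accepted, second_late = tracker.process_batch([96, 90, 120])
print(second_accepted, second_late, tracker.watermark)
```

In this model the record with event time 90 arrives after the watermark has moved to 95, so it is dropped as late, while 96 is still accepted. A stateless query would have no such cutoff, which matches the point that the watermark exists to serve stateful operators.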

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread Jungtaek Lim
Congrats! On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote: > Congratulations! > > > > Jie Yang > > > > *From:* Dongjoon Hyun > *Date:* Wednesday, October 4, 2023 13:04 > *To:* Hyukjin Kwon > *Cc:* Hussein Awala , Rui Wang , > Gengliang Wang , Xiao Li , " > dev@spark.apache.org" > *Subject:* Re: Welcome to

[DISCUSS] Porting back SPARK-45178 to 3.5/3.4 version lines

2023-09-20 Thread Jungtaek Lim
leave these version lines as they are. Looking for voices on this. Thanks in advance! Jungtaek Lim (HeartSaVioR)

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving this release and the patience on multiple RCs! On Tue, Sep 12, 2023 at 10:00 AM Yuanjian Li wrote: > +1 (non-binding) > > Yuanjian Li wrote on Mon, Sep 11, 2023 at 09:36: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving this release! On Fri, Sep 8, 2023 at 11:29 AM Holden Karau wrote: > +1 pip installing seems to function :) > > On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > >> +1. >> >> On Thu, Sep 7, 2023 at 10:33 PM yangjie01 >> wrote: >> >>> +1 >>> >>> >>> >>>

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-01 Thread Jungtaek Lim
le, I'd like to see this be fixed in 3.5.0. There is no fix yet but I'm working on it. I'll give an update here. Maybe we could lower down priority and let the release go with describing this as a "known issue", if I couldn't make progress in a couple of days. I'm sorry about that. Thanks

Re: Welcome two new Apache Spark committers

2023-08-06 Thread Jungtaek Lim
Congrats Peter and Xiduo! On Mon, Aug 7, 2023 at 11:33 AM yangjie01 wrote: > Congratulations, Peter and Xiduo ~ > > > > *From:* Hyukjin Kwon > *Date:* Monday, August 7, 2023 10:30 > *To:* Ruifeng Zheng > *Cc:* Xiao Li , Debasish Das < > debasish.da...@gmail.com>, Wenchen Fan , Spark dev > list

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Jungtaek Lim
streaming. On Tue, Jul 11, 2023 at 8:35 AM Matei Zaharia wrote: > +1 > > On Jul 10, 2023, at 10:19 AM, Takuya UESHIN > wrote: > > +1 > > On Sun, Jul 9, 2023 at 10:05 PM Ruifeng Zheng wrote: > >> +1 >> >> On Mon, Jul 10, 2023 at 8:20 AM Jungtaek

Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Jungtaek Lim
+1 On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin wrote: > +1! > > > On Fri, Jul 7 2023 at 11:58 AM, Holden Karau > wrote: > >> +1 >> >> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: >> >>> +1 >>> >>> On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh < >>> mich.talebza...@gmail.com> wrote: >>>

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Jungtaek Lim
+1 On Wed, Jul 5, 2023 at 2:23 AM L. C. Hsieh wrote: > +1 > > Thanks Yuanjian. > > On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: > > > > +1 > > > > > > > > From: Maxim Gekk > > Date: Tuesday, July 4, 2023 17:24 > > To: Kent Yao > > Cc: "dev@spark.apache.org" > > Subject: Re: Time for Spark v3.5.0

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Jungtaek Lim
I concur with Holden and Mridul. Let's build a plan before we call the tentative deadline. I understand setting the tentative deadline would definitely help in pushing back features which "never ever ends", but at least we may want to list up features and discuss for priority. It is still possible

Re: ASF policy violation and Scala version issues

2023-06-11 Thread Jungtaek Lim
Are we concerned that a library does not release a new version which bumps the Scala version, which the Scala version is announced in less than a week? Shall we respect the efforts of all maintainers of open source projects we use as dependencies, regardless whether they are ASF projects or

Re: JDK version support policy?

2023-06-07 Thread Jungtaek Lim
+1 to drop Java 8 but +1 to set the lowest support version to Java 11. Considering the phase for only security updates, 11 LTS would not be EOLed for a very long time. Unless that’s coupled with other deps which require bumping JDK version (hope someone can bring up lists), it doesn’t seem to buy

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Jungtaek Lim
Shall we initiate a new discussion thread for Scala 2.13 by default? While I'm not an expert on this area, it sounds like the change is major and (probably) breaking. It seems to be worth having a separate discussion thread rather than just treat it like one of 25 items. On Tue, May 30, 2023 at

Re: Parametrisable output metadata path

2023-04-17 Thread Jungtaek Lim
small correction: "I intentionally didn't enumerate." The meaning could be quite different so making a small correction. On Tue, Apr 18, 2023 at 5:38 AM Jungtaek Lim wrote: > There seems to be miscommunication - I didn't mean "Delta Lake". I meant > "any" Da

Re: Parametrisable output metadata path

2023-04-17 Thread Jungtaek Lim
> Hi Jungtaek, > integration with Delta Lake is not an option to me, I raised a PR for > improvement of FileStreamSink with the new parameter: > https://github.com/apache/spark/pull/40821. Can you please take a look? > > -- > Kind regards/ Pozdrawiam, > Wojciech Indyk >

Re: Parametrisable output metadata path

2023-04-15 Thread Jungtaek Lim
irectory, so someone might find it useful. For end-to-end exactly once, people can either use a limited current FileStream sink or use Data Lake products. I don't see the value in making improvements to the current FileStream sink. Thanks, Jungtaek Lim (HeartSaVioR) On Sun, Apr 16, 2023 at 2:

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-11 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving the release! On Wed, Apr 12, 2023 at 3:41 AM Xinrong Meng wrote: > +1 non-binding > > Thank you Dongjoon! > > Wenchen Fan wrote on Mon, Apr 10, 2023 at 11:32 PM: > >> +1 >> >> On Tue, Apr 11, 2023 at 10:09 AM Hyukjin Kwon >> wrote: >> >>> +1 >>> >>> On Tue, 11 Apr 2023 at

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-11 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving the release! On Wed, Apr 12, 2023 at 10:42 AM Ye Zhou wrote: > +1 non-binding > > On Tue, Apr 11, 2023 at 18:40 Ye Zhou wrote: > >> Yes, it is not a regression issue. We can fix it after the release. >> >> Thanks >> Ye >> >> On Tue, Apr 11, 2023 at 17:42

Re: Slack for PySpark users

2023-04-03 Thread Jungtaek Lim
be there. On Tue, Apr 4, 2023 at 7:04 AM Jungtaek Lim wrote: > The number of subscribers doesn't give any meaningful value. Please look > into the number of mails being sent to the list. > > https://lists.apache.org/list.html?u...@spark.apache.org > The latest month there were more than 2

Re: Slack for PySpark users

2023-04-03 Thread Jungtaek Lim
subscribers. > > May I ask if the users prefer to use the ASF Official Slack channel > than the user mailing list? > > Dongjoon. > > > > On Thu, Mar 30, 2023 at 9:10 PM Jungtaek Lim > wrote: > >> I'm reading through the page "Briefing: The Apache Way"

Re: Slack for PySpark users

2023-03-30 Thread Jungtaek Lim
I'm reading through the page "Briefing: The Apache Way", and in the section of "Open Communications", restriction of communication inside ASF INFRA (mailing list) is more about code and decision-making. https://www.apache.org/theapacheway/#what-makes-the-apache-way-so-hard-to-define It's

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-20 Thread Jungtaek Lim
Jungtaek Lim wrote: > Given that I got more than 3 PMC members' positive votes as well as > several active contributors' positive votes, I will proceed with > the actual work. > (It may take a couple more days as folks in the US will help me and there's > a holiday in the US.) >

Re: Time for Spark 3.4.0 release?

2023-01-17 Thread Jungtaek Lim
+1 on delaying. I see there’s a JIRA ticket about DStream deprecation, we are working on this - thanks for taking this into account! On Wed, Jan 18, 2023 at 12:43 PM, Hyukjin Kwon wrote: > +1. Thanks for driving this, Xinrong. > > On Wed, 18 Jan 2023 at 12:31, Xinrong Meng > wrote: > >> Hi All, >> >>

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-15 Thread Jungtaek Lim
2023 at 11:16 PM L. C. Hsieh wrote: >> >>> +1 >>> >>> On Thu, Jan 12, 2023 at 10:39 PM Jungtaek Lim >>> wrote: >>> > >>> > Yes, exactly. I'm sorry to bring confusion - should have clarified >>> action items on the pro

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-15 Thread Jungtaek Lim
ted > > Jungtaek, can you please provide / elaborate on the concrete actions you > intend on taking for the deprecation process? > > Best, > > Jerry > > On Thu, Jan 12, 2023 at 11:16 PM L. C. Hsieh wrote: > >> +1 >> >> On Thu, Jan 12, 2023 at 10:39 PM

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
he API? > > > On Thu, Jan 12, 2023 at 10:05 PM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wrote: > >> Maybe I need to clarify - my proposal is "explicitly" deprecating it, >> which incurs code change for sure. Guidance on the Spark website is done >> a

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
mote again that they are encouraged to move to SS.) This is not an action item from the proposal: - Add (tentative) target version to remove the API on the deprecation message. Hope this makes the proposal crystal clear. On Fri, Jan 13, 2023 at 3:05 PM Jungtaek Lim wrote: > Maybe I need to

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
at 5:08 PM Tathagata Das < >> tathagata.das1...@gmail.com> wrote: >> >>> +1 >>> >>> On Thu, Jan 12, 2023 at 7:46 PM Hyukjin Kwon >>> wrote: >>> >>>> +1 >>>> >>>> On Fri, 13 Jan 2023 at 08:51, J

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Jungtaek Lim
bump for more visibility. On Wed, Jan 11, 2023 at 12:20 PM Jungtaek Lim wrote: > Hi dev, > > I'd like to propose the deprecation of DStream in Spark 3.4, in favor of > promoting Structured Streaming. > (Sorry for the late proposal, if we don't make the change in 3.4, we will

[DISCUSS] Deprecate DStream in 3.4

2023-01-10 Thread Jungtaek Lim
to propose the target version for removal. The goal is to guide users to refrain from constructing a new workload with DStream. We might want to go with this in future, but it would require a new discussion thread at that time. What do you think? Thanks, Jungtaek Lim (HeartSaVioR)

[VOTE][RESULT][SPIP] Asynchronous Offset Management in Structured Streaming

2022-12-04 Thread Jungtaek Lim
The vote passes with 7 +1s (5 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Jungtaek Lim - Xingbo Jiang - Mridul Muralidharan (*) - Hyukjin Kwon (*) - Shixiong Zhu (*) - Wenchen Fan (*) - Dongjoon Hyun (*) +0: None -1: None Thanks, Jungtaek Lim

Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
Starting with +1 from me. On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: Asynchronous Offset Management in > Structured Streaming. > > The high level summary of the SPIP is that we propose a couple of > improvements

[VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
ache.org/thread/yv8ffr56prjr16qh12lwjyjl1q8dl7lp> Please vote on the SPIP for the next 72 hours: [ ] +1: Accept the proposal as an official SPIP [ ] +0 [ ] -1: I don’t think this is a good idea because … Thanks! Jungtaek Lim (HeartSaVioR)

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
>>>>> may serve as the "future" engine powering Spark Streaming. Improving the >>>>> "current" engine does not mean we cannot work on a "future" engine. These >>>>> two are not mutually exclusive. I would like to focus the discussion o

Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Jungtaek Lim
Thanks Chao for driving the release! On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan wrote: > Thanks, Chao! > > On Wed, Nov 30, 2022 at 1:33 AM Chao Sun wrote: > >> We are happy to announce the availability of Apache Spark 3.2.3! >> >> Spark 3.2.3 is a maintenance release containing stability

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-23 Thread Jungtaek Lim
>> Jungtaek, >> >> Thanks for taking up the role to shepherd this SPIP! Thank you for also >> chiming in on your thoughts concerning the continuous mode! >> >> Best, >> >> Jerry >> >> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim < >> kabhw

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-22 Thread Jungtaek Lim
Just FYI, I'm shepherding this SPIP project. I think the major meta question would be, "why don't we spend effort on continuous mode rather than initiating another feature aiming for the same workload?". Jerry already updated the doc to answer the question, but I can also share my thoughts about

Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Jungtaek Lim
+1 Nice to see the chance for driver to reduce resource usage and increase stability, especially the fact that the driver is SPOF. It's even promising to have a future plan to pre-bake the kvstore for SHS from the driver. Thanks for driving the effort, Gengliang! On Thu, Nov 17, 2022 at 5:32 AM

Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-18 Thread Jungtaek Lim
No further voice so far. I'm going to submit a PR. Thanks again for the feedback! On Mon, Oct 17, 2022 at 9:30 AM Jungtaek Lim wrote: > Thanks Gabor and Dongjoon for supporting this! > > Bump to reach more eyes. If there is no further voice on this in a couple > of days, I

Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-16 Thread Jungtaek Lim
>> BR, >> G >> >> >> On Thu, Oct 13, 2022 at 4:12 AM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> wrote: >> >>> Hi all, >>> >>> I would like to propose flipping the default value of Kafka offset >>> fetching config

[DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-12 Thread Jungtaek Lim
ld be introduced inevitably (they can set topic based ACL rule), but most people will benefit. IMHO this is something we can deal with in a release/migration note. Would like to hear the voices on this. Thanks, Jungtaek Lim (HeartSaVioR)
