Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Wenchen Fan
Thanks, Chao!

On Wed, Nov 30, 2022 at 1:33 AM Chao Sun  wrote:

> We are happy to announce the availability of Apache Spark 3.2.3!
>
> Spark 3.2.3 is a maintenance release containing stability fixes. This
> release is based on the branch-3.2 maintenance branch of Spark. We strongly
> recommend that all 3.2 users upgrade to this stable release.
>
> To download Spark 3.2.3, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-3.html
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
> Chao
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Jungtaek Lim
Thanks Chao for driving the release!

On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan  wrote:

> Thanks, Chao!
>
> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun  wrote:
>
>> We are happy to announce the availability of Apache Spark 3.2.3!
>>
>> Spark 3.2.3 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.2 maintenance branch of Spark. We
>> strongly
>> recommend all 3.2 users to upgrade to this stable release.
>>
>> To download Spark 3.2.3, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-2-3.html
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Chao
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Contributions needed: 4 higher order functions

2022-11-30 Thread Hyukjin Kwon
Hi all,

There are four higher order functions in our backlog:

- https://issues.apache.org/jira/browse/SPARK-41235
- https://issues.apache.org/jira/browse/SPARK-41234
- https://issues.apache.org/jira/browse/SPARK-41233
- https://issues.apache.org/jira/browse/SPARK-41232

These tickets would be a great chance for new contributors to understand and
get into the Catalyst optimizer and Spark SQL.

Any help on these tickets would be much appreciated.
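
For context, SQL higher-order functions such as the existing transform, filter,
and aggregate take a lambda argument and apply it per array element, which maps
almost directly onto ordinary Scala collection operations. A toy model of their
semantics (illustrative only; this is not Spark's Catalyst implementation, and
not necessarily the exact functions in the tickets above):

```scala
// Toy model of SQL higher-order functions: each one is an ordinary Scala
// collection operation lifted to an array value. Names mirror existing
// Spark SQL built-ins; the implementations here are only illustrative.
object HigherOrderFns {
  // SELECT transform(arr, x -> x + 1)
  def transform[A, B](arr: Seq[A])(f: A => B): Seq[B] = arr.map(f)

  // SELECT filter(arr, x -> x % 2 = 0)
  def filter[A](arr: Seq[A])(p: A => Boolean): Seq[A] = arr.filter(p)

  // SELECT aggregate(arr, 0, (acc, x) -> acc + x)
  def aggregate[A, B](arr: Seq[A], zero: B)(merge: (B, A) => B): B =
    arr.foldLeft(zero)(merge)
}
```

Implementing one for real roughly means adding a Catalyst expression that binds
the lambda's variables and evaluates it per element, plus type coercion rules
and tests.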


Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Maxim Gekk
Thank you, Chao!

On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim 
wrote:

> Thanks Chao for driving the release!
>
> On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan  wrote:
>
>> Thanks, Chao!
>>
>> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun  wrote:
>>
>>> We are happy to announce the availability of Apache Spark 3.2.3!
>>>
>>> Spark 3.2.3 is a maintenance release containing stability fixes. This
>>> release is based on the branch-3.2 maintenance branch of Spark. We
>>> strongly
>>> recommend all 3.2 users to upgrade to this stable release.
>>>
>>> To download Spark 3.2.3, head over to the download page:
>>> https://spark.apache.org/downloads.html
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-3-2-3.html
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this
>>> release. This release would not have been possible without you.
>>>
>>> Chao
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Custom resolution rules that grow query plans

2022-11-30 Thread Ted Chester Jenks
Hello,

I wish to write a custom logical plan rule that modifies the output schema and
grows the logical plan. The purpose of the rule is roughly to apply a
projection on top of a DataSourceV2Relation when some condition holds:


import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

// Wraps matching V2 relations in a projection, changing the output schema.
case class MyRule() extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case relation: DataSourceV2Relation if someCondition(relation) =>
      Project(getExpressions(relation), relation)
  }
}


We add this rule to extendedResolutionRules, understanding that it can't be a
postHocResolutionRule or an optimizer rule because it modifies the output
schema and the Project needs to be resolved.

As an extendedResolutionRule it runs in the fixed-point “Resolution” batch,
meaning the batch keeps running indefinitely because the rule grows the query
plan on every iteration.

It is possible to stop the batch from running indefinitely by marking processed
nodes via relation options or node tags, but this feels a little hacky. Is
there functionality in Spark I am missing that can achieve the desired behavior
without resorting to this?

I imagine there may be a rule in Spark that deals with this, but I could not
find it.

If this is not covered, I can draft a contribution to cover this case.

Cheers,
Ted
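
A tag-free alternative to the options/tags workaround is to make the rule
structurally idempotent: have it recognize the shape it produces (a Project
directly over the relation) and leave it alone, so the fixed-point batch
converges. A toy model of that dynamic, with illustrative stand-ins rather
than the real Catalyst classes:

```scala
// Toy model of the fixed-point problem above: a rule that wraps a relation
// in a projection must recognize its own output, or the "Resolution" batch
// re-fires it on every iteration. Instead of node tags, the rule checks the
// plan shape and skips relations already under a Project.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(child: Plan) extends Plan

object GrowOnceRule {
  def apply(plan: Plan): Plan = plan match {
    case Project(r: Relation) => Project(r) // already wrapped: no-op
    case r: Relation          => Project(r) // wrap exactly once
    case Project(child)       => Project(apply(child))
  }

  // Fixed-point driver: stop when the rule makes no change.
  def fixedPoint(plan: Plan, maxIters: Int = 100): Plan = {
    var cur = plan
    var i = 0
    while (i < maxIters) {
      val next = apply(cur)
      if (next == cur) return cur
      cur = next
      i += 1
    }
    cur
  }
}
```

In Spark itself the analogous guard would roughly be a pattern like
`case Project(_, _: DataSourceV2Relation)` that the rule leaves untouched;
whether that fits depends on what getExpressions produces.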



Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Mridul Muralidharan
Thanks for all the clarifications and details Jerry, Jungtaek :-)
This looks like an exciting improvement to Structured Streaming - looking
forward to it becoming part of Apache Spark !

Regards,
Mridul


On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng 
wrote:

> Hi all,
>
> I will add my two cents. Improving the micro-batch execution engine does
> not prevent us from working on or improving the continuous execution engine
> in the future. These are orthogonal issues. The new mode I am proposing for
> the micro-batch execution engine intends to lower the latency of the
> execution engine that most people use today. We can view it as an
> incremental improvement on the existing engine. I see the continuous
> execution engine as a partially completed rewrite of Spark Streaming that
> may serve as the "future" engine powering Spark Streaming. Improving the
> "current" engine does not mean we cannot work on a "future" engine; the two
> are not mutually exclusive. I would like to focus the discussion on the
> merits of this feature with regard to the current micro-batch execution
> engine, not on the future of the continuous execution engine.
>
> Best,
>
> Jerry
>
>
> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim 
> wrote:
>
>> Hi Mridul,
>>
>> I'd like to be clear, to avoid any misunderstanding: the decision was not
>> led by me. (I'm just one of the engineers on the team, not even the TL.)
>> As you can see from the direction, there was an internal consensus not to
>> revisit continuous mode, for various reasons which I think we already know.
>> You seem to remember that I have raised concerns about continuous mode, but
>> note that was over two years ago, and I still see no traction around the
>> project. The main reason I abandoned that discussion was a promising effort
>> to integrate push-based shuffle into continuous mode to support shuffle,
>> but no progress has been made so far.
>>
>> The goal of this SPIP is to have an alternative approach to the same
>> workload, given that we no longer have confidence in the success of
>> continuous mode. But I also want to make clear that deprecating and
>> eventually retiring continuous mode is not a goal of this project. If that
>> happens eventually, it would be a side effect. Someone may have concerns
>> that we have two different projects aiming for a similar thing, but I'd
>> rather see both projects compete. Anyone willing to improve continuous
>> mode can start making the effort right now. This SPIP does not block it.
>>
>>
>> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> Hi Jungtaek,
>>>
>>>   Given that the goal of the SPIP is reducing latency for stateless apps,
>>> which should reasonably fit continuous mode's design goals, it feels odd
>>> to not support it in the proposal.
>>>
>>> I know you have raised concerns about continuous mode in the past as well
>>> on the dev@ list, and we are further ignoring it in this proposal (and
>>> possibly in other enhancements in the past few releases).
>>>
>>> Do you want to revisit the discussion to support it and propose a vote
>>> on that ? And move it to deprecated ?
>>>
>>> I am much more comfortable not supporting this SPIP for CM if it was
>>> deprecated.
>>>
>>> Thoughts ?
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>>
>>>
>>> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng 
>>> wrote:
>>>
 Jungtaek,

 Thanks for taking up the role to shepherd this SPIP!  Thank you for also
 chiming in on your thoughts concerning the continuous mode!

 Best,

 Jerry

 On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> Just FYI, I'm shepherding this SPIP project.
>
> I think the major meta question would be, "why don't we spend effort on
> continuous mode rather than initiating another feature aiming for the same
> workload?" Jerry already updated the doc to answer the question, but I can
> also share my thoughts about it.
>
> I feel like the current "continuous mode" is a niche solution. (That's not
> to blame anyone: if you have to deal with such a workload but can't rewrite
> the underlying engine from scratch, there really are few options.) Since
> the implementation went with workarounds for things the architecture does
> not support natively, e.g. distributed snapshots, it gets quite tricky to
> maintain and extend the project. It also requires 3rd parties to implement
> a separate source and sink implementation, and I'm not sure how many 3rd
> parties have actually followed so far.
>
> Eventually, "continuous mode" has become an area where no one in the active
> community knows the details or has the willingness to maintain it. I
> wouldn't say we are confident enough to remove the "experimental" tag,
> although the feature has been shipped for years. It was introduced in Spark
> 2.3, surprisingly enough.
>
> 
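
To make the mechanism under discussion concrete: the core of asynchronous
offset management is that the micro-batch loop stops blocking on the durable
offset-log write before starting the next batch, and outstanding writes are
drained later. A minimal sketch with plain Scala futures (illustrative names
only, not the SPIP's actual API):

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Sketch of asynchronous offset management: the micro-batch loop hands the
// offset-log write to a background thread instead of blocking on it, so
// batch N+1 can start while batch N's offsets are still being persisted.
// On restart, replaying from the last *persisted* offset keeps
// at-least-once semantics. All names here are illustrative.
object AsyncOffsetSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  def run(offsets: Seq[Long]): (Seq[Long], Seq[Long]) = {
    val processed = ArrayBuffer.empty[Long]
    var pending   = List.empty[Future[Long]]

    for (offset <- offsets) {
      processed += offset        // process the micro-batch (the hot path)
      pending ::= Future(offset) // persist the offset off the hot path
    }
    // Drain outstanding commits before shutdown so nothing is lost.
    val committed = pending.reverse.map(Await.result(_, 5.seconds))
    (processed.toSeq, committed)
  }
}
```

The latency win comes from taking the offset-log write off the per-batch
critical path; correctness falls back to at-least-once, since a crash may
replay batches whose offsets were processed but not yet persisted.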

Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Yang,Jie(INF)
Thanks, Chao!

From: Maxim Gekk 
Date: Wednesday, November 30, 2022, 19:40
To: Jungtaek Lim 
Cc: Wenchen Fan , Chao Sun , dev 
, user 
Subject: Re: [ANNOUNCE] Apache Spark 3.2.3 released

Thank you, Chao!

On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim 
mailto:kabhwan.opensou...@gmail.com>> wrote:
Thanks Chao for driving the release!

On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
Thanks, Chao!

On Wed, Nov 30, 2022 at 1:33 AM Chao Sun 
mailto:sunc...@apache.org>> wrote:
We are happy to announce the availability of Apache Spark 3.2.3!

Spark 3.2.3 is a maintenance release containing stability fixes. This
release is based on the branch-3.2 maintenance branch of Spark. We strongly
recommend that all 3.2 users upgrade to this stable release.

To download Spark 3.2.3, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-3.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.

Chao

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org


Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Dongjoon Hyun
Thank you, Chao!

On Wed, Nov 30, 2022 at 8:16 AM Yang,Jie(INF)  wrote:

> Thanks, Chao!
>
>
>
> *From:* Maxim Gekk 
> *Date:* Wednesday, November 30, 2022, 19:40
> *To:* Jungtaek Lim 
> *Cc:* Wenchen Fan , Chao Sun ,
> dev , user 
> *Subject:* Re: [ANNOUNCE] Apache Spark 3.2.3 released
>
>
>
> Thank you, Chao!
>
>
>
> On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
> Thanks Chao for driving the release!
>
>
>
> On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan  wrote:
>
> Thanks, Chao!
>
>
>
> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun  wrote:
>
> We are happy to announce the availability of Apache Spark 3.2.3!
>
> Spark 3.2.3 is a maintenance release containing stability fixes. This
> release is based on the branch-3.2 maintenance branch of Spark. We strongly
> recommend all 3.2 users to upgrade to this stable release.
>
> To download Spark 3.2.3, head over to the download page:
> https://spark.apache.org/downloads.html
> 
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-2-3.html
> 
>
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
>
> Chao
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread huaxin gao
Thanks Chao for driving the release!

On Wed, Nov 30, 2022 at 9:24 AM Dongjoon Hyun 
wrote:

> Thank you, Chao!
>
> On Wed, Nov 30, 2022 at 8:16 AM Yang,Jie(INF)  wrote:
>
>> Thanks, Chao!
>>
>>
>>
>> *From:* Maxim Gekk 
>> *Date:* Wednesday, November 30, 2022, 19:40
>> *To:* Jungtaek Lim 
>> *Cc:* Wenchen Fan , Chao Sun ,
>> dev , user 
>> *Subject:* Re: [ANNOUNCE] Apache Spark 3.2.3 released
>>
>>
>>
>> Thank you, Chao!
>>
>>
>>
>> On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>> Thanks Chao for driving the release!
>>
>>
>>
>> On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan  wrote:
>>
>> Thanks, Chao!
>>
>>
>>
>> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun  wrote:
>>
>> We are happy to announce the availability of Apache Spark 3.2.3!
>>
>> Spark 3.2.3 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.2 maintenance branch of Spark. We
>> strongly
>> recommend all 3.2 users to upgrade to this stable release.
>>
>> To download Spark 3.2.3, head over to the download page:
>> https://spark.apache.org/downloads.html
>> 
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-2-3.html
>> 
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Chao
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread L. C. Hsieh
Thanks, Chao!

On Wed, Nov 30, 2022 at 9:58 AM huaxin gao  wrote:
>
> Thanks Chao for driving the release!
>
> On Wed, Nov 30, 2022 at 9:24 AM Dongjoon Hyun  wrote:
>>
>> Thank you, Chao!
>>
>> On Wed, Nov 30, 2022 at 8:16 AM Yang,Jie(INF)  wrote:
>>>
>>> Thanks, Chao!
>>>
>>>
>>>
>>> From: Maxim Gekk 
>>> Date: Wednesday, November 30, 2022, 19:40
>>> To: Jungtaek Lim 
>>> Cc: Wenchen Fan , Chao Sun , dev 
>>> , user 
>>> Subject: Re: [ANNOUNCE] Apache Spark 3.2.3 released
>>>
>>>
>>>
>>> Thank you, Chao!
>>>
>>>
>>>
>>> On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim 
>>>  wrote:
>>>
>>> Thanks Chao for driving the release!
>>>
>>>
>>>
>>> On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan  wrote:
>>>
>>> Thanks, Chao!
>>>
>>>
>>>
>>> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun  wrote:
>>>
>>> We are happy to announce the availability of Apache Spark 3.2.3!
>>>
>>> Spark 3.2.3 is a maintenance release containing stability fixes. This
>>> release is based on the branch-3.2 maintenance branch of Spark. We strongly
>>> recommend all 3.2 users to upgrade to this stable release.
>>>
>>> To download Spark 3.2.3, head over to the download page:
>>> https://spark.apache.org/downloads.html
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-3-2-3.html
>>>
>>> We would like to acknowledge all community members for contributing to this
>>> release. This release would not have been possible without you.
>>>
>>> Chao
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Holden Karau
Do we want to start syndicating Apache Spark Twitter to a Mastodon
instance? It seems like a lot of software dev folks are moving over there,
and it would be good to reach our users where they are.

Any objections / concerns? Any thoughts on which server we should pick if
we do this?
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Shixiong Zhu
+1

This is exciting. I agree with Jerry that this SPIP and continuous
processing are orthogonal. This SPIP itself would be a great improvement
and impact most Structured Streaming users.

Best Regards,
Shixiong


On Wed, Nov 30, 2022 at 6:57 AM Mridul Muralidharan 
wrote:

>
> Thanks for all the clarifications and details Jerry, Jungtaek :-)
> This looks like an exciting improvement to Structured Streaming - looking
> forward to it becoming part of Apache Spark !
>
> Regards,
> Mridul
>
>
> On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng 
> wrote:
>
>> Hi all,
>>
>> I will add my two cents.  Improving the Microbatch execution engine does
>> not prevent us from working/improving on the continuous execution engine in
>> the future.  These are orthogonal issues.  This new mode I am proposing in
>> the microbatch execution engine intends to lower latency of this execution
>> engine that most people use today.  We can view it as an incremental
>> improvement on the existing engine. I see the continuous execution engine
>> as a partially completed re-write of spark streaming and may serve as the
>> "future" engine powering Spark Streaming.   Improving the "current" engine
>> does not mean we cannot work on a "future" engine.  These two are not
>> mutually exclusive. I would like to focus the discussion on the merits of
>> this feature in regards to the current micro-batch execution engine and not
>> a discussion on the future of continuous execution engine.
>>
>> Best,
>>
>> Jerry
>>
>>
>> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi Mridul,
>>>
>>> I'd like to make clear to avoid any misunderstanding - the decision was
>>> not led by me. (I'm just a one of engineers in the team. Not even TL.) As
>>> you see the direction, there was an internal consensus to not revisit the
>>> continuous mode. There are various reasons, which I think we know already.
>>> You seem to remember I have raised concerns about continuous mode, but have
>>> you indicated that it was even over 2 years ago? I still see no traction
>>> around the project. The main reason I abandoned the discussion was due to
>>> promising effort on integrating push based shuffle into continuous mode to
>>> achieve shuffle, but no effort has been made so far.
>>>
>>> The goal of this SPIP is to have an alternative approach dealing with
>>> same workload, given that we no longer have confidence of success of
>>> continuous mode. But I also want to make clear that deprecating and
>>> eventually retiring continuous mode is not a goal of this project. If that
>>> happens eventually, that would be a side-effect. Someone may have concerns
>>> that we have two different projects aiming for similar thing, but I'd
>>> rather see both projects having competition. If anyone willing to improve
>>> continuous mode can start making the effort right now. This SPIP does not
>>> block it.
>>>
>>>
>>> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan 
>>> wrote:
>>>

 Hi Jungtaek,

   Given the goal of the SPIP is reducing latency for stateless apps,
 and should reasonably fit continuous mode design goals, it feels odd to not
 support it in the proposal.

 I know you have raised concerns about continuous mode in past as well
 in dev@ list, and we are further ignoring it in this proposal (and
 possibly other enhancements in past few releases).

 Do you want to revisit the discussion to support it and propose a vote
 on that ? And move it to deprecated ?

 I am much more comfortable not supporting this SPIP for CM if it was
 deprecated.

 Thoughts ?

 Regards,
 Mridul




 On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng 
 wrote:

> Jungtaek,
>
> Thanks for taking up the role to shepherd this SPIP!  Thank you for
> also chiming in on your thoughts concerning the continuous mode!
>
> Best,
>
> Jerry
>
> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> Just FYI, I'm shepherding this SPIP project.
>>
>> I think the major meta question would be, "why don't we spend
>> effort on continuous mode rather than initiating another feature aiming 
>> for
>> the same workload?". Jerry already updated the doc to answer the 
>> question,
>> but I can also share my thoughts about it.
>>
>> I feel like the current "continuous mode" is a niche solution. (It's
>> not to blame. If you have to deal with such workload but can't rewrite 
>> the
>> underlying engine from scratch, then there are really few options.)
>> Since the implementation went with a workaround to implement which
>> the architecture does not support natively e.g. distributed snapshot, it
>> gets quite tricky on maintaining and expanding the project. It also
>> requires 3rd parties to implement a separate source and sink
>

Re: Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Dmitry
Hello,
Do any long-term statistics exist on the number of developers who have moved
to Mastodon and how active they are?

I believe most devs are still using Twitter.


Thu, Dec 1, 2022, 01:35 Holden Karau :

> Do we want to start syndicating Apache Spark Twitter to a Mastodon
> instance. It seems like a lot of software dev folks are moving over there
> and it would be good to reach our users where they are.
>
> Any objections / concerns? Any thoughts on which server we should pick if
> we do this?
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Holden Karau
I agree that the majority is probably still on Twitter, but this would be
syndication (i.e., we'd keep both).

As to the # of devs it's hard to say since:
1) It's a federated service
2) Figuring out if an account is a dev or not is hard

But, for example,

There seem to be roughly 6 million users in aggregate (
https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time ),
which is only about ~1% of Twitter's size.

Nova's (large K8s focused I believe) has ~29k, tech.lgbt has ~6k, The BSD
mastodon has ~1k ( https://bsd.network/about )

It's hard to say, but I've noticed a large number of my tech-affiliated
friends moving to Mastodon (personally I now do both).

On Wed, Nov 30, 2022 at 3:17 PM Dmitry  wrote:

> Hello,
> Do any long-term statistics exist on the number of developers who have
> moved to Mastodon and how active they are?
>
> I believe most devs are still using Twitter.
>
>
> Thu, Dec 1, 2022, 01:35 Holden Karau :
>
>> Do we want to start syndicating Apache Spark Twitter to a Mastodon
>> instance. It seems like a lot of software dev folks are moving over there
>> and it would be good to reach our users where they are.
>>
>> Any objections / concerns? Any thoughts on which server we should pick if
>> we do this?
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Dmitry
My personal opinion: one of the key features of Twitter is that it is not
federated, and it is a good platform for announcements and so on. So "it
would be good to reach our users where they are" means staying on Twitter
(most companies who use Spark/Databricks are on Twitter).
For federated features, I think Slack would be a better platform; a lot of
Apache big data projects have Slack for that.

Thu, Dec 1, 2022, 02:33 Holden Karau :

> I agree that there is probably a majority still on twitter, but it would
> be a syndication (e.g. we'd keep both).
>
> As to the # of devs it's hard to say since:
> 1) It's a federated service
> 2) Figuring out if an account is a dev or not is hard
>
> But, for example,
>
> There seems to be roughly an aggregate 6 million users (
> https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time ),
> which seems to be about only ~1% of Twitters size.
>
> Nova's (large K8s focused I believe) has ~29k, tech.lgbt has ~6k, The BSD
> mastodon has ~1k ( https://bsd.network/about )
>
> It's hard to say, but I've noticed a larger number of my tech affiliated
> friends moving to Mastodon (personally I now do both).
>
> On Wed, Nov 30, 2022 at 3:17 PM Dmitry  wrote:
>
>> Hello,
>> Do any long-term statistics exist on the number of developers who have
>> moved to Mastodon and how active they are?
>>
>> I believe most devs are still using Twitter.
>>
>>
>> Thu, Dec 1, 2022, 01:35 Holden Karau :
>>
>>> Do we want to start syndicating Apache Spark Twitter to a Mastodon
>>> instance. It seems like a lot of software dev folks are moving over there
>>> and it would be good to reach our users where they are.
>>>
>>> Any objections / concerns? Any thoughts on which server we should pick
>>> if we do this?
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Hyukjin Kwon
+1

On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu  wrote:

> +1
>
> This is exciting. I agree with Jerry that this SPIP and continuous
> processing are orthogonal. This SPIP itself would be a great improvement
> and impact most Structured Streaming users.
>
> Best Regards,
> Shixiong
>
>
> On Wed, Nov 30, 2022 at 6:57 AM Mridul Muralidharan 
> wrote:
>
>>
>> Thanks for all the clarifications and details Jerry, Jungtaek :-)
>> This looks like an exciting improvement to Structured Streaming - looking
>> forward to it becoming part of Apache Spark !
>>
>> Regards,
>> Mridul
>>
>>
>> On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng 
>> wrote:
>>
>>> Hi all,
>>>
>>> I will add my two cents.  Improving the Microbatch execution engine does
>>> not prevent us from working/improving on the continuous execution engine in
>>> the future.  These are orthogonal issues.  This new mode I am proposing in
>>> the microbatch execution engine intends to lower latency of this execution
>>> engine that most people use today.  We can view it as an incremental
>>> improvement on the existing engine. I see the continuous execution engine
>>> as a partially completed re-write of spark streaming and may serve as the
>>> "future" engine powering Spark Streaming.   Improving the "current" engine
>>> does not mean we cannot work on a "future" engine.  These two are not
>>> mutually exclusive. I would like to focus the discussion on the merits of
>>> this feature in regards to the current micro-batch execution engine and not
>>> a discussion on the future of continuous execution engine.
>>>
>>> Best,
>>>
>>> Jerry
>>>
>>>
>>> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Hi Mridul,

 I'd like to make clear to avoid any misunderstanding - the decision was
 not led by me. (I'm just a one of engineers in the team. Not even TL.) As
 you see the direction, there was an internal consensus to not revisit the
 continuous mode. There are various reasons, which I think we know already.
 You seem to remember I have raised concerns about continuous mode, but have
 you indicated that it was even over 2 years ago? I still see no traction
 around the project. The main reason I abandoned the discussion was due to
 promising effort on integrating push based shuffle into continuous mode to
 achieve shuffle, but no effort has been made so far.

 The goal of this SPIP is to have an alternative approach dealing with
 same workload, given that we no longer have confidence of success of
 continuous mode. But I also want to make clear that deprecating and
 eventually retiring continuous mode is not a goal of this project. If that
 happens eventually, that would be a side-effect. Someone may have concerns
 that we have two different projects aiming for similar thing, but I'd
 rather see both projects having competition. If anyone willing to improve
 continuous mode can start making the effort right now. This SPIP does not
 block it.


 On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan 
 wrote:

>
> Hi Jungtaek,
>
>   Given the goal of the SPIP is reducing latency for stateless apps,
> and should reasonably fit continuous mode design goals, it feels odd to 
> not
> support it fin the proposal.
>
> I know you have raised concerns about continuous mode in past as well
> in dev@ list, and we are further ignoring it in this proposal (and
> possibly other enhancements in past few releases).
>
> Do you want to revisit the discussion to support it and propose a vote
> on that ? And move it to deprecated ?
>
> I am much more comfortable not supporting this SPIP for CM if it was
> deprecated.
>
> Thoughts ?
>
> Regards,
> Mridul
>
>
>
>
> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <
> jerry.boyang.p...@gmail.com> wrote:
>
>> Jungtaek,
>>
>> Thanks for taking up the role to shepherd this SPIP!  Thank you for
>> also chiming in on your thoughts concerning the continuous mode!
>>
>> Best,
>>
>> Jerry
>>
>> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Just FYI, I'm shepherding this SPIP project.
>>>
>>> I think the major meta question would be, "why don't we spend
>>> effort on continuous mode rather than initiating another feature aiming 
>>> for
>>> the same workload?". Jerry already updated the doc to answer the 
>>> question,
>>> but I can also share my thoughts about it.
>>>
>>> I feel like the current "continuous mode" is a niche solution. (It's
>>> not to blame. If you have to deal with such workload but can't rewrite 
>>> the
>>> underlying engine from scratch, then there are really few options.)
>>> Since the implementation went with a workaround to implement wh

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Wenchen Fan
+1 to improve the widely used micro-batch mode first.

On Thu, Dec 1, 2022 at 8:49 AM Hyukjin Kwon  wrote:

> +1
>
> On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu  wrote:
>
>> +1
>>
>> This is exciting. I agree with Jerry that this SPIP and continuous
>> processing are orthogonal. This SPIP itself would be a great improvement
>> and impact most Structured Streaming users.
>>
>> Best Regards,
>> Shixiong
>>
>>
>> On Wed, Nov 30, 2022 at 6:57 AM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> Thanks for all the clarifications and details Jerry, Jungtaek :-)
>>> This looks like an exciting improvement to Structured Streaming -
>>> looking forward to it becoming part of Apache Spark !
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng 
>>> wrote:
>>>
 Hi all,

 I will add my two cents.  Improving the micro-batch execution engine
 does not prevent us from working on or improving the continuous execution
 engine in the future.  These are orthogonal issues.  The new mode I am
 proposing for the micro-batch execution engine intends to lower the latency
 of the execution engine that most people use today.  We can view it as an
 incremental improvement on the existing engine. I see the continuous
 execution engine as a partially completed rewrite of Spark Streaming that
 may serve as the "future" engine powering Spark Streaming.  Improving the
 "current" engine does not mean we cannot work on a "future" engine; the
 two are not mutually exclusive. I would like to focus the discussion on the
 merits of this feature with regard to the current micro-batch execution
 engine, not on the future of the continuous execution engine.

 Best,

 Jerry


 On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> Hi Mridul,
>
> I'd like to make it clear, to avoid any misunderstanding, that the decision
> was not led by me. (I'm just one of the engineers on the team - not even the
> TL.) As you can see from the direction, there was an internal consensus not
> to revisit the continuous mode. There are various reasons, which I think we
> know already. You may remember that I raised concerns about continuous mode,
> but that was over 2 years ago, and I still see no traction around the
> project. The main reason I abandoned the discussion was the promising effort
> to integrate push-based shuffle into continuous mode to support shuffle, but
> no progress has been made so far.
>
> The goal of this SPIP is to have an alternative approach to dealing with the
> same workload, given that we no longer have confidence in the success of
> continuous mode. But I also want to make clear that deprecating and
> eventually retiring continuous mode is not a goal of this project. If that
> happens eventually, it would be a side effect. Some may have concerns that
> we have two different projects aiming for a similar thing, but I'd rather
> see the two projects compete. Anyone willing to improve continuous mode can
> start making the effort right now; this SPIP does not block it.
>
>
> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan 
> wrote:
>
>>
>> Hi Jungtaek,
>>
>>   Given that the goal of the SPIP is reducing latency for stateless apps,
>> which should reasonably fit continuous mode's design goals, it feels odd to
>> not support it in the proposal.
>>
>> I know you have raised concerns about continuous mode in the past as well
>> on the dev@ list, and we are further ignoring it in this proposal (and
>> possibly in other enhancements in the past few releases).
>>
>> Do you want to revisit the discussion to support it and propose a vote on
>> that? And move it to deprecated?
>>
>> I am much more comfortable not supporting this SPIP for CM if it were
>> deprecated.
>>
>> Thoughts ?
>>
>> Regards,
>> Mridul
>>
>>
>>
>>
>> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <
>> jerry.boyang.p...@gmail.com> wrote:
>>
>>> Jungtaek,
>>>
>>> Thanks for taking up the role to shepherd this SPIP!  Thank you for
>>> also chiming in with your thoughts concerning the continuous mode!
>>>
>>> Best,
>>>
>>> Jerry
>>>
>>> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Just FYI, I'm shepherding this SPIP project.

 I think the major meta question would be, "why don't we spend
 effort on continuous mode rather than initiating another feature 
 aiming for
 the same workload?". Jerry already updated the doc to answer the 
 question,
 but I can also share my thoughts about it.

 I feel like the current "continuous mode" is a niche so

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
Thanks all for the support! It's great to see us driving the discussion for
Structured Streaming and gathering sufficient support.

We would like to move forward with the vote thread. Please also participate
in the vote. Thanks again!

On Thu, Dec 1, 2022 at 10:04 AM Wenchen Fan  wrote:

> +1 to improve the widely used micro-batch mode first.
>
> On Thu, Dec 1, 2022 at 8:49 AM Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu  wrote:
>>
>>> +1
>>>
>>> This is exciting. I agree with Jerry that this SPIP and continuous
>>> processing are orthogonal. This SPIP itself would be a great improvement
>>> and impact most Structured Streaming users.
>>>
>>> Best Regards,
>>> Shixiong
>>>
>>>
>>> On Wed, Nov 30, 2022 at 6:57 AM Mridul Muralidharan 
>>> wrote:
>>>

 Thanks for all the clarifications and details Jerry, Jungtaek :-)
 This looks like an exciting improvement to Structured Streaming -
 looking forward to it becoming part of Apache Spark !

 Regards,
 Mridul


 On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng 
 wrote:

> Hi all,
>
> I will add my two cents.  Improving the micro-batch execution engine
> does not prevent us from working on or improving the continuous execution
> engine in the future.  These are orthogonal issues.  The new mode I am
> proposing for the micro-batch execution engine intends to lower the latency
> of the execution engine that most people use today.  We can view it as an
> incremental improvement on the existing engine. I see the continuous
> execution engine as a partially completed rewrite of Spark Streaming that
> may serve as the "future" engine powering Spark Streaming.  Improving the
> "current" engine does not mean we cannot work on a "future" engine; the
> two are not mutually exclusive. I would like to focus the discussion on the
> merits of this feature with regard to the current micro-batch execution
> engine, not on the future of the continuous execution engine.
>
> Best,
>
> Jerry
>
>
> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> Hi Mridul,
>>
>> I'd like to make it clear, to avoid any misunderstanding, that the decision
>> was not led by me. (I'm just one of the engineers on the team - not even
>> the TL.) As you can see from the direction, there was an internal consensus
>> not to revisit the continuous mode. There are various reasons, which I
>> think we know already. You may remember that I raised concerns about
>> continuous mode, but that was over 2 years ago, and I still see no traction
>> around the project. The main reason I abandoned the discussion was the
>> promising effort to integrate push-based shuffle into continuous mode to
>> support shuffle, but no progress has been made so far.
>>
>> The goal of this SPIP is to have an alternative approach to dealing with
>> the same workload, given that we no longer have confidence in the success
>> of continuous mode. But I also want to make clear that deprecating and
>> eventually retiring continuous mode is not a goal of this project. If that
>> happens eventually, it would be a side effect. Some may have concerns that
>> we have two different projects aiming for a similar thing, but I'd rather
>> see the two projects compete. Anyone willing to improve continuous mode can
>> start making the effort right now; this SPIP does not block it.
>>
>>
>> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> Hi Jungtaek,
>>>
>>>   Given that the goal of the SPIP is reducing latency for stateless apps,
>>> which should reasonably fit continuous mode's design goals, it feels odd
>>> to not support it in the proposal.
>>>
>>> I know you have raised concerns about continuous mode in the past as
>>> well on the dev@ list, and we are further ignoring it in this proposal
>>> (and possibly in other enhancements in the past few releases).
>>>
>>> Do you want to revisit the discussion to support it and propose a vote
>>> on that? And move it to deprecated?
>>>
>>> I am much more comfortable not supporting this SPIP for CM if it were
>>> deprecated.
>>>
>>> Thoughts ?
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>>
>>>
>>> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <
>>> jerry.boyang.p...@gmail.com> wrote:
>>>
 Jungtaek,

 Thanks for taking up the role to shepherd this SPIP!  Thank you for
 also chiming in with your thoughts concerning the continuous mode!

 Best,

 Jerry

 On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> Jus

[VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
Hi all,

I'd like to start the vote for SPIP: Asynchronous Offset Management in
Structured Streaming.

The high-level summary of the SPIP is that we propose a couple of
improvements to offset management in micro-batch execution to lower
processing latency, which would help with certain types of workloads.

References:

   - JIRA ticket 
   - SPIP doc
   

   - Discussion thread
   

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thanks!
Jungtaek Lim (HeartSaVioR)
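For readers skimming the archive: the core idea in the SPIP, taking the
durable offset write off the micro-batch's critical path, can be sketched in
plain Python. This is an illustrative toy under stated assumptions, not
Spark's actual implementation; every name below (AsyncOffsetLog,
commit_async, flush) is hypothetical.

```python
import queue
import threading
import time


class AsyncOffsetLog:
    """Toy offset log: offsets are persisted by a background thread so the
    processing loop never blocks on the (slow) durable write."""

    def __init__(self, write_latency_s=0.02):
        self._write_latency_s = write_latency_s
        self._pending = queue.Queue()
        self._committed = []  # stands in for durable storage (e.g. a DFS)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def commit_async(self, batch_id, offset):
        # Enqueue and return immediately; the write happens off the hot path.
        self._pending.put((batch_id, offset))

    def _drain(self):
        while True:
            item = self._pending.get()
            if item is None:  # shutdown sentinel
                break
            time.sleep(self._write_latency_s)  # simulate a durable write
            self._committed.append(item)
            self._pending.task_done()

    def flush(self):
        # Block until every enqueued offset has been durably written.
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._worker.join()

    @property
    def committed(self):
        return list(self._committed)


log = AsyncOffsetLog()
start = time.monotonic()
for batch_id in range(10):
    # The "hot path" of each micro-batch: commit without waiting on the write.
    log.commit_async(batch_id, offset=batch_id * 100)
enqueue_time = time.monotonic() - start  # far below 10 * write_latency_s

log.flush()  # all 10 offsets are now "durable"
log.close()
print(len(log.committed))  # -> 10
```

With a synchronous log the loop would pay the write latency on every batch
(here, 10 x 20 ms); with the asynchronous log the hot path only enqueues and
durability catches up in the background, which is the kind of latency win
the SPIP targets for suitable workloads.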


Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Jungtaek Lim
Starting with +1 from me.

On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim 
wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Asynchronous Offset Management in
> Structured Streaming.
>
> The high-level summary of the SPIP is that we propose a couple of
> improvements to offset management in micro-batch execution to lower
> processing latency, which would help with certain types of workloads.
>
> References:
>
>- JIRA ticket 
>- SPIP doc
>
> 
>- Discussion thread
>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks!
> Jungtaek Lim (HeartSaVioR)
>


Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Xingbo Jiang
+1

On Wed, Nov 30, 2022 at 5:59 PM Jungtaek Lim 
wrote:

> Starting with +1 from me.
>
> On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim 
> wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Asynchronous Offset Management in
>> Structured Streaming.
>>
>> The high-level summary of the SPIP is that we propose a couple of
>> improvements to offset management in micro-batch execution to lower
>> processing latency, which would help with certain types of workloads.
>>
>> References:
>>
>>- JIRA ticket 
>>- SPIP doc
>>
>> 
>>- Discussion thread
>>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks!
>> Jungtaek Lim (HeartSaVioR)
>>
>


Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Wed, Nov 30, 2022 at 8:55 PM Xingbo Jiang  wrote:

> +1
>
> On Wed, Nov 30, 2022 at 5:59 PM Jungtaek Lim 
> wrote:
>
>> Starting with +1 from me.
>>
>> On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Asynchronous Offset Management in
>>> Structured Streaming.
>>>
>>> The high-level summary of the SPIP is that we propose a couple of
>>> improvements to offset management in micro-batch execution to lower
>>> processing latency, which would help with certain types of workloads.
>>>
>>> References:
>>>
>>>- JIRA ticket 
>>>- SPIP doc
>>>
>>> 
>>>- Discussion thread
>>>
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks!
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>


Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Hyukjin Kwon
+1

On Thu, 1 Dec 2022 at 12:39, Mridul Muralidharan  wrote:

>
> +1
>
> Regards,
> Mridul
>
> On Wed, Nov 30, 2022 at 8:55 PM Xingbo Jiang 
> wrote:
>
>> +1
>>
>> On Wed, Nov 30, 2022 at 5:59 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Starting with +1 from me.
>>>
>>> On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Hi all,

 I'd like to start the vote for SPIP: Asynchronous Offset Management in
 Structured Streaming.

 The high-level summary of the SPIP is that we propose a couple of
 improvements to offset management in micro-batch execution to lower
 processing latency, which would help with certain types of workloads.

 References:

- JIRA ticket 
- SPIP doc

 
- Discussion thread


 Please vote on the SPIP for the next 72 hours:

 [ ] +1: Accept the proposal as an official SPIP
 [ ] +0
 [ ] -1: I don’t think this is a good idea because …

 Thanks!
 Jungtaek Lim (HeartSaVioR)

>>>


Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Shixiong Zhu
+1


On Wed, Nov 30, 2022 at 8:04 PM Hyukjin Kwon  wrote:

> +1
>
> On Thu, 1 Dec 2022 at 12:39, Mridul Muralidharan  wrote:
>
>>
>> +1
>>
>> Regards,
>> Mridul
>>
>> On Wed, Nov 30, 2022 at 8:55 PM Xingbo Jiang 
>> wrote:
>>
>>> +1
>>>
>>> On Wed, Nov 30, 2022 at 5:59 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Starting with +1 from me.

 On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Asynchronous Offset Management in
> Structured Streaming.
>
> The high-level summary of the SPIP is that we propose a couple of
> improvements to offset management in micro-batch execution to lower
> processing latency, which would help with certain types of workloads.
>
> References:
>
>- JIRA ticket 
>- SPIP doc
>
> 
>- Discussion thread
>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks!
> Jungtaek Lim (HeartSaVioR)
>