Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Russell Jurney
>>>>> don’t have a clear view on whether it works with Spark 4 or if it needs
>>>>> updates? I don’t have Spark commits but I’m a committer on Apache DataFu
>>>>> and mentored the Spark feature for it.
>>>>>
>>>>> Can someone tell me what is involved? Point me at a ticket?
>>>>>
>>>>> Russell
>>>>>
>>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund 
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> We rely on GraphX for an important component of our product. And we
>>>>>> really want it to stay a typed interface. Please keep GraphX.
>>>>>>
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Holden Karau 
>>>>>> *Date: *Sunday, October 6, 2024 at 06:22
>>>>>> *To: *Ángel 
>>>>>> *Cc: *Russell Jurney , Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com>, Spark dev list ,
>>>>>> user @spark 
>>>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
>>>>>> interested in GraphX OR leave it as is?
>>>>>>
>>>>>> So are there companies using it? And are they willing to contribute
>>>>>> to maintaining it?
>>>>>>
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>
>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,OT9ylxCx5xRNCToPSzu0VEvefs4uts16fTBydH2NiLHMGEwLjrEXgkhU8W-Ai6xD8VDMyWea44GBMOEecMNdapaZKZbBTrZpquOBKi6YRlqu-FVAzji6-w,,&typo=1>
>>>>>>
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,h0ccgHctUPRY4zAN_qZ-qdBgLDpQLtm7KaOL4u12U4PR7PeJ4MUBOS8bbD7CNssUIMqRMvY_pOqbh7PfLY0lRpQh9mfqBC0KnSHBZzxxSJJr-55r5kv6YjYwrA,,&typo=1>
>>>>>>
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>
>>>>>> Pronouns: she/her
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel 
>>>>>> wrote:
>>>>>>
>>>>>> That would definitely affect companies using GraphX, but at least
>>>>>> they’d have the choice to migrate their code.
>>>>>>
>>>>>> I think that’s probably the way to go.
>>>>>>
>>>>>>
>>>>>>
>>>>>> El dom, 6 oct 2024 a las 6:09, Holden Karau ()
>>>>>> escribió:
>>>>>>
>>>>>> So removing GraphX from Spark would not prevent GraphFrames from
>>>>>> continuing, they could pick up the GraphX source and incorporate it into
>>>>>> their project.
>>>>>>
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>
>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,9xMMQlY7gtmkqxT0NTmS8KMg4wOUjw0PWKM-oepAYAkE-SiM5pyXCb80AuRZYJ4zMIedVlwVMAKi_eh52Hof0LsteXx2eIslnsDBdmVeuocpILpneg,,&typo=1>
>>>>>>
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,kbGbMBRMidAYi0aqUmj949vRahpEjVzSgJv_YYtO5EteSXZy4RrMYXJU48mN2CyS5sdovsgiFAAiBLnyQ29gCCn8xbTrEJmfIhjtH7tD4N31VUoLtQ,,&typo=1>
>>>>>>
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>
>>>>>> Pronouns: she/her
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney <
>>>>>> russell.jur...@gmail.com> wrote:
>>>>>>
>>>>>> A lot of people like me use GraphFrames for its connected components
>>>>>> implementat

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Holden Karau
That’s awesome!

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Mon, Oct 7, 2024 at 5:42 PM Russell Jurney 
wrote:

> I’ll organize a hackathon. A friend wants to finish the implementation of
> Louvain modularity for GraphFrames. I’ll fix some GraphX bugs while I’m at it.
>
> I did just blog all about the motif matching in GraphFrames:
>
> https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5
>
> Russ
>
> On Mon, Oct 7, 2024 at 5:38 PM Holden Karau 
> wrote:
>
>> So this discuss thread and the vote thread to deprecate it (leaving the
>> option of removing it during 4.X) are probably the highest profile it’s been
>> in years.
>>
>> In the past for parts of Spark I’ve cared about I’ve organized virtual
>> meetings to co-ordinate work — if you’re connected with some of the
>> Spark+Graph community, reaching out to find others and organizing a meeting
>> could be a way to raise the profile a bit? Maybe organize a virtual
>> hackathon (I’m meaning to try this for some other things so happy to share
>> what I learn from doing that)?
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> <https://www.fighthealthinsurance.com/?q=hk_email>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney 
>> wrote:
>>
>>> I’ll look for a bug to fix. If GraphX is outside of Spark, Spark would
>>> tend to break GraphFrames and it will be burdensome on an external project
>>> to keep up. Graph computing on Spark is important to a lot of people; is
>>> there a way to raise visibility here?
>>>
>>> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau 
>>> wrote:
>>>
>>>> There are no specific tickets associated with the lack of maintenance or
>>>> this, as the component has not been maintained for a sufficiently long time.
>>>> If you’re interested in taking it on, that’s wonderful; fixing some bugs
>>>> would probably be a great place to start and a way to figure out if it’s
>>>> something you want to do long term.
>>>>
>>>> I would recommend making a first bug fix in an actively maintained area
>>>> of Spark to get to know some reviewers, since there is not anyone tracking
>>>> the GraphX PRs.
>>>>
>>>> As a note, I don’t think GraphX is required for GraphFrames long term,
>>>> so another option would be to talk to the GraphFrames folks and move the
>>>> GraphX code over to it.
>>>>
>>>> Ideally we’d have someone willing to act as a mentor or guide, but so
>>>> far we have no volunteers (especially no one familiar with the GraphX
>>>> code).
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
>>>>
>>>>
>>>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney 
>>>> wrote:
>>>>
>>>>> I volunteer to maintain GraphX to keep GraphFrames a viable project. I
>>>>> don’t have a clear view on whether it works with Spark 4 or if it needs
>>>>> updates? I don’t have Spark commits but I’m a committer on Apache DataFu
>>>>> and mentored the Spark feature for it.
>>>>>
>>>>> Can someone tell me what is involved? Point me at a ticket?
>>>>>
>>>>> Russell
>>>>>
>>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund 
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> We rely on GraphX for an important component of our product. And we
>>>>>> really want it to stay a typed interface. Please keep GraphX.
>>>>>>
>&

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Russell Jurney
I’ll organize a hackathon. A friend wants to finish the implementation of
Louvain modularity for GraphFrames. I’ll fix some GraphX bugs while I’m at it.

I did just blog all about the motif matching in GraphFrames:
https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5

Russ

On Mon, Oct 7, 2024 at 5:38 PM Holden Karau  wrote:

> So this discuss thread and the vote thread to deprecate it (leaving the
> option of removing it during 4.X) are probably the highest profile it’s been
> in years.
>
> In the past for parts of Spark I’ve cared about I’ve organized virtual
> meetings to co-ordinate work — if you’re connected with some of the
> Spark+Graph community, reaching out to find others and organizing a meeting
> could be a way to raise the profile a bit? Maybe organize a virtual
> hackathon (I’m meaning to try this for some other things so happy to share
> what I learn from doing that)?
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> <https://www.fighthealthinsurance.com/?q=hk_email>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney 
> wrote:
>
>> I’ll look for a bug to fix. If GraphX is outside of Spark, Spark would
>> tend to break GraphFrames and it will be burdensome on an external project
>> to keep up. Graph computing on Spark is important to a lot of people; is
>> there a way to raise visibility here?
>>
>> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau 
>> wrote:
>>
>>> There are no specific tickets associated with the lack of maintenance or
>>> this, as the component has not been maintained for a sufficiently long time.
>>> If you’re interested in taking it on, that’s wonderful; fixing some bugs
>>> would probably be a great place to start and a way to figure out if it’s
>>> something you want to do long term.
>>>
>>> I would recommend making a first bug fix in an actively maintained area
>>> of Spark to get to know some reviewers, since there is not anyone tracking
>>> the GraphX PRs.
>>>
>>> As a note, I don’t think GraphX is required for GraphFrames long term,
>>> so another option would be to talk to the GraphFrames folks and move the
>>> GraphX code over to it.
>>>
>>> Ideally we’d have someone willing to act as a mentor or guide, but so far
>>> we have no volunteers (especially no one familiar with the GraphX code).
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> Pronouns: she/her
>>>
>>>
>>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney 
>>> wrote:
>>>
>>>> I volunteer to maintain GraphX to keep GraphFrames a viable project. I
>>>> don’t have a clear view on whether it works with Spark 4 or if it needs
>>>> updates? I don’t have Spark commits but I’m a committer on Apache DataFu
>>>> and mentored the Spark feature for it.
>>>>
>>>> Can someone tell me what is involved? Point me at a ticket?
>>>>
>>>> Russell
>>>>
>>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund 
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> We rely on GraphX for an important component of our product. And we
>>>>> really want it to stay a typed interface. Please keep GraphX.
>>>>>
>>>>>
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>> *From: *Holden Karau 
>>>>> *Date: *Sunday, October 6, 2024 at 06:22
>>>>> *To: *Ángel 
>>>>> *Cc: *Russell Jurney , Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com>, Spark dev list ,
>>>>> user @spark 
>>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
>>>>> interested in GraphX OR leave it as is?
>>>>>
>>>>> So are there companies using it? And are they willing to contribute to
>>>>> maintaining it?
>>>>>
>>>>> Twitter: https://twitter.com/holdenkarau
>>>&g

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Holden Karau
So this discuss thread and the vote thread to deprecate it (leaving the option
of removing it during 4.X) are probably the highest profile it’s been in
years.

In the past for parts of Spark I’ve cared about I’ve organized virtual
meetings to co-ordinate work — if you’re connected with some of the
Spark+Graph community, reaching out to find others and organizing a meeting
could be a way to raise the profile a bit? Maybe organize a virtual
hackathon (I’m meaning to try this for some other things so happy to share
what I learn from doing that)?

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Mon, Oct 7, 2024 at 5:02 PM Russell Jurney 
wrote:

> I’ll look for a bug to fix. If GraphX is outside of Spark, Spark would
> tend to break GraphFrames and it will be burdensome on an external project
> to keep up. Graph computing on Spark is important to a lot of people; is
> there a way to raise visibility here?
>
> On Mon, Oct 7, 2024 at 4:24 PM Holden Karau 
> wrote:
>
>> There are no specific tickets associated with the lack of maintenance or
>> this, as the component has not been maintained for a sufficiently long time.
>> If you’re interested in taking it on, that’s wonderful; fixing some bugs
>> would probably be a great place to start and a way to figure out if it’s
>> something you want to do long term.
>>
>> I would recommend making a first bug fix in an actively maintained area of
>> Spark to get to know some reviewers, since there is not anyone tracking the
>> GraphX PRs.
>>
>> As a note, I don’t think GraphX is required for GraphFrames long term, so
>> another option would be to talk to the GraphFrames folks and move the
>> GraphX code over to it.
>>
>> Ideally we’d have someone willing to act as a mentor or guide, but so far
>> we have no volunteers (especially no one familiar with the GraphX code).
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> <https://www.fighthealthinsurance.com/?q=hk_email>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney 
>> wrote:
>>
>>> I volunteer to maintain GraphX to keep GraphFrames a viable project. I
>>> don’t have a clear view on whether it works with Spark 4 or if it needs
>>> updates? I don’t have Spark commits but I’m a committer on Apache DataFu
>>> and mentored the Spark feature for it.
>>>
>>> Can someone tell me what is involved? Point me at a ticket?
>>>
>>> Russell
>>>
>>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund 
>>> wrote:
>>>
>>>> Hello,
>>>> We rely on GraphX for an important component of our product. And we
>>>> really want it to stay a typed interface. Please keep GraphX.
>>>>
>>>>
>>>> Erik
>>>>
>>>>
>>>>
>>>> *From: *Holden Karau 
>>>> *Date: *Sunday, October 6, 2024 at 06:22
>>>> *To: *Ángel 
>>>> *Cc: *Russell Jurney , Mich Talebzadeh <
>>>> mich.talebza...@gmail.com>, Spark dev list ,
>>>> user @spark 
>>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
>>>> interested in GraphX OR leave it as is?
>>>>
>>>> So are there companies using it? And are they willing to contribute to
>>>> maintaining it?
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>>
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,OT9ylxCx5xRNCToPSzu0VEvefs4uts16fTBydH2NiLHMGEwLjrEXgkhU8W-Ai6xD8VDMyWea44GBMOEecMNdapaZKZbBTrZpquOBKi6YRlqu-FVAzji6-w,,&typo=1>
>>>>
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,h0ccgHctUPRY4zAN_qZ-qdBgLDpQLtm7KaOL4u12U4PR7PeJ4MUBOS8bbD7CNssUIMqRMvY_pOqbh7PfLY0lRpQh9mfqBC0KnSHBZzxxSJJr-55r5kv6YjYwrA,,&typo=1>
>>>>
>

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Russell Jurney
I’ll look for a bug to fix. If GraphX is outside of Spark, Spark would tend
to break GraphFrames and it will be burdensome on an external project to
keep up. Graph computing on Spark is important to a lot of people; is there
a way to raise visibility here?

On Mon, Oct 7, 2024 at 4:24 PM Holden Karau  wrote:

> There are no specific tickets associated with the lack of maintenance or
> this, as the component has not been maintained for a sufficiently long time.
> If you’re interested in taking it on, that’s wonderful; fixing some bugs
> would probably be a great place to start and a way to figure out if it’s
> something you want to do long term.
>
> I would recommend making a first bug fix in an actively maintained area of
> Spark to get to know some reviewers, since there is not anyone tracking the
> GraphX PRs.
>
> As a note, I don’t think GraphX is required for GraphFrames long term, so
> another option would be to talk to the GraphFrames folks and move the
> GraphX code over to it.
>
> Ideally we’d have someone willing to act as a mentor or guide, but so far
> we have no volunteers (especially no one familiar with the GraphX code).
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> <https://www.fighthealthinsurance.com/?q=hk_email>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney 
> wrote:
>
>> I volunteer to maintain GraphX to keep GraphFrames a viable project. I
>> don’t have a clear view on whether it works with Spark 4 or if it needs
>> updates? I don’t have Spark commits but I’m a committer on Apache DataFu
>> and mentored the Spark feature for it.
>>
>> Can someone tell me what is involved? Point me at a ticket?
>>
>> Russell
>>
>> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund 
>> wrote:
>>
>>> Hello,
>>> We rely on GraphX for an important component of our product. And we
>>> really want it to stay a typed interface. Please keep GraphX.
>>>
>>>
>>> Erik
>>>
>>>
>>>
>>> *From: *Holden Karau 
>>> *Date: *Sunday, October 6, 2024 at 06:22
>>> *To: *Ángel 
>>> *Cc: *Russell Jurney , Mich Talebzadeh <
>>> mich.talebza...@gmail.com>, Spark dev list , user
>>> @spark 
>>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
>>> interested in GraphX OR leave it as is?
>>>
>>> So are there companies using it? And are they willing to contribute to
>>> maintaining it?
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,OT9ylxCx5xRNCToPSzu0VEvefs4uts16fTBydH2NiLHMGEwLjrEXgkhU8W-Ai6xD8VDMyWea44GBMOEecMNdapaZKZbBTrZpquOBKi6YRlqu-FVAzji6-w,,&typo=1>
>>>
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,h0ccgHctUPRY4zAN_qZ-qdBgLDpQLtm7KaOL4u12U4PR7PeJ4MUBOS8bbD7CNssUIMqRMvY_pOqbh7PfLY0lRpQh9mfqBC0KnSHBZzxxSJJr-55r5kv6YjYwrA,,&typo=1>
>>>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>> Pronouns: she/her
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Oct 5, 2024 at 9:17 PM Ángel 
>>> wrote:
>>>
>>> That would definitely affect companies using GraphX, but at least they’d
>>> have the choice to migrate their code.
>>>
>>> I think that’s probably the way to go.
>>>
>>>
>>>
>>> El dom, 6 oct 2024 a las 6:09, Holden Karau ()
>>> escribió:
>>>
>>> So removing GraphX from Spark would not prevent GraphFrames from
>>> continuing, they could pick up the GraphX source and incorporate it into
>>> their project.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,9xMMQlY7gtmkqxT0NTmS8KMg4wOUjw0PWKM-oepAYAkE-SiM5pyXCb80AuRZYJ4zMIedVlwVMAKi_eh52Hof0LsteXx2eIslnsDBdmVeuocpILpneg,,&typo=1>
>>>
>>> Books (Learning Spark, High Performance Spa

Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-10-07 Thread Holden Karau
The VOTE is now closed and I believe it passes*, albeit with notable
dissent.

The votes are:

+1:
Mich Talebzadeh
Dongjoon Hyun
LC Hsieh
Sean Owen
Jungtaek Lim
Mridul Muralidharan
Yang Jie
beliefer
Wenchen Fan
Denny Lee
Hyukjin Kwon
Herman van Hovell


0s:

-1:
Mark Hamstra
Ángel


While not on the VOTE thread itself, there are ~two users on the user list
who asked that it not be deprecated and one on the DISCUSS thread who
offered to take on maintaining it.

Given the dissent I think that we should update the docs to include a
message:

GraphX is not currently maintained and, if maintainers are not found, may be
removed in future minor versions (deprecated). If you are interested in
helping maintain GraphX, please reach out on the developer mailing list. We
recommend exploring GraphFrames or openCypher on Spark as potential
migration targets.


I hope this course of action seems like a reasonable balancing of folks’
views.

While it’s possible that we un-deprecate GraphX, I think having the reality
of the current state of GraphX communicated as we go into 4.0 gives us
much-needed freedom for any future decisions we make here.

If any committer is willing to step forward to be a mentor, I think the
chance of growing new maintainers here is much higher.

* I _believe_ that since it is not strictly a code change the -1 is not a
veto. If there is disagreement here happy to discuss that on private@.

On a personal note, I would like us to see this not as a failing on the part
of the project but rather as a healthy activity allowing us to set user
expectations more clearly.

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Fri, Oct 4, 2024 at 2:57 PM Mark Hamstra  wrote:

> -1(*) reasoning posted in the DISCUSS thread
>
> On Mon, Sep 30, 2024 at 12:40 PM Holden Karau 
> wrote:
> >
> > I think it has been de facto deprecated; we haven’t updated it
> meaningfully in several years. I think removing the API would be excessive
> but deprecating it would give us the flexibility to remove it in the not
> too distant future.
> >
> > That being said, this is not a vote to remove GraphX; I think that
> whenever that time comes (if it does) we should have a separate vote
> >
> > This VOTE will be open for a little more than one week, ending on
> October 8th*. To vote reply with:
> > +1 Deprecate GraphX
> > 0 I’m indifferent
> > -1 Don’t deprecate GraphX because ABC
> >
> > If you have a binding vote, to simplify the tallying at the end please
> mark your vote with a *.
> >
> > (*mostly because I’m going camping for my birthday)
> >
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Holden Karau
There are no specific tickets associated with the lack of maintenance or this,
as the component has not been maintained for a sufficiently long time. If
you’re interested in taking it on, that’s wonderful; fixing some bugs would
probably be a great place to start and a way to figure out if it’s something
you want to do long term.

I would recommend making a first bug fix in an actively maintained area of
Spark to get to know some reviewers, since there is not anyone tracking the
GraphX PRs.

As a note, I don’t think GraphX is required for GraphFrames long term, so
another option would be to talk to the GraphFrames folks and move the GraphX
code over to it.

Ideally we’d have someone willing to act as a mentor or guide, but so far we
have no volunteers (especially no one familiar with the GraphX code).

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Mon, Oct 7, 2024 at 3:25 PM Russell Jurney 
wrote:

> I volunteer to maintain GraphX to keep GraphFrames a viable project. I
> don’t have a clear view on whether it works with Spark 4 or if it needs
> updates? I don’t have Spark commits but I’m a committer on Apache DataFu
> and mentored the Spark feature for it.
>
> Can someone tell me what is involved? Point me at a ticket?
>
> Russell
>
> On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund 
> wrote:
>
>> Hello,
>> We rely on GraphX for an important component of our product. And we
>> really want it to stay a typed interface. Please keep GraphX.
>>
>>
>> Erik
>>
>>
>>
>> *From: *Holden Karau 
>> *Date: *Sunday, October 6, 2024 at 06:22
>> *To: *Ángel 
>> *Cc: *Russell Jurney , Mich Talebzadeh <
>> mich.talebza...@gmail.com>, Spark dev list , user
>> @spark 
>> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
>> interested in GraphX OR leave it as is?
>>
>> So are there companies using it? And are they willing to contribute to
>> maintaining it?
>>
>> Twitter: https://twitter.com/holdenkarau
>>
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,OT9ylxCx5xRNCToPSzu0VEvefs4uts16fTBydH2NiLHMGEwLjrEXgkhU8W-Ai6xD8VDMyWea44GBMOEecMNdapaZKZbBTrZpquOBKi6YRlqu-FVAzji6-w,,&typo=1>
>>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,h0ccgHctUPRY4zAN_qZ-qdBgLDpQLtm7KaOL4u12U4PR7PeJ4MUBOS8bbD7CNssUIMqRMvY_pOqbh7PfLY0lRpQh9mfqBC0KnSHBZzxxSJJr-55r5kv6YjYwrA,,&typo=1>
>>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> Pronouns: she/her
>>
>>
>>
>>
>>
>> On Sat, Oct 5, 2024 at 9:17 PM Ángel 
>> wrote:
>>
>> That would definitely affect companies using GraphX, but at least they’d
>> have the choice to migrate their code.
>>
>> I think that’s probably the way to go.
>>
>>
>>
>> El dom, 6 oct 2024 a las 6:09, Holden Karau ()
>> escribió:
>>
>> So removing GraphX from Spark would not prevent GraphFrames from
>> continuing, they could pick up the GraphX source and incorporate it into
>> their project.
>>
>> Twitter: https://twitter.com/holdenkarau
>>
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,9xMMQlY7gtmkqxT0NTmS8KMg4wOUjw0PWKM-oepAYAkE-SiM5pyXCb80AuRZYJ4zMIedVlwVMAKi_eh52Hof0LsteXx2eIslnsDBdmVeuocpILpneg,,&typo=1>
>>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,kbGbMBRMidAYi0aqUmj949vRahpEjVzSgJv_YYtO5EteSXZy4RrMYXJU48mN2CyS5sdovsgiFAAiBLnyQ29gCCn8xbTrEJmfIhjtH7tD4N31VUoLtQ,,&typo=1>
>>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> Pronouns: she/her
>>
>>
>>
>>
>>
>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney 
>> wrote:
>>
>> A lot of people like me use GraphFrames for its connected components
>> implementation and its motif matching feature. I am willing to work on it
>> to keep it alive. They did a 0.8.3 release not too

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Russell Jurney
I volunteer to maintain GraphX to keep GraphFrames a viable project. I
don’t have a clear view on whether it works with Spark 4 or if it needs
updates? I don’t have Spark commits but I’m a committer on Apache DataFu
and mentored the Spark feature for it.

Can someone tell me what is involved? Point me at a ticket?

Russell

On Mon, Oct 7, 2024 at 12:11 AM Erik Eklund 
wrote:

> Hello,
> We rely on GraphX for an important component of our product. And we really
> want it to stay a typed interface. Please keep GraphX.
>
>
> Erik
>
>
>
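For context on the typed interface mentioned above, here is a minimal sketch of
GraphX's Graph[VD, ED] API, where the vertex and edge attribute types are part
of the graph's type and checked at compile time. The attribute types used here
(String names, Int weights) are illustrative assumptions, not anything from
this thread.

    // Minimal GraphX sketch; the vertex attribute type (String) and edge
    // attribute type (Int) are illustrative choices for this example.
    import org.apache.spark.graphx.{Edge, Graph, VertexId}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SparkSession

    object GraphXTypedSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("graphx-typed-sketch").getOrCreate()
        val sc = spark.sparkContext

        val vertices: RDD[(VertexId, String)] =
          sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
        val edges: RDD[Edge[Int]] =
          sc.parallelize(Seq(Edge(1L, 2L, 7), Edge(2L, 3L, 3)))

        // Graph[String, Int]: attribute types are carried in the graph's type,
        // so mismatches surface at compile time rather than at runtime.
        val graph: Graph[String, Int] = Graph(vertices, edges)

        // A typed transformation: uppercase every vertex attribute.
        val shouting: Graph[String, Int] = graph.mapVertices((_, name) => name.toUpperCase)
        println(shouting.triplets.collect().mkString("\n"))

        spark.stop()
      }
    }

GraphFrames, by contrast, works on DataFrames of rows, which is part of the
typed-vs-untyped trade-off being discussed in this thread.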
> *From: *Holden Karau 
> *Date: *Sunday, October 6, 2024 at 06:22
> *To: *Ángel 
> *Cc: *Russell Jurney , Mich Talebzadeh <
> mich.talebza...@gmail.com>, Spark dev list , user
> @spark 
> *Subject: *Re: [DISCUSS] Deprecate GraphX OR Find new maintainers
> interested in GraphX OR leave it as is?
>
> So are there companies using it? And are they willing to contribute to
> maintaining it?
>
> Twitter: https://twitter.com/holdenkarau
>
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,OT9ylxCx5xRNCToPSzu0VEvefs4uts16fTBydH2NiLHMGEwLjrEXgkhU8W-Ai6xD8VDMyWea44GBMOEecMNdapaZKZbBTrZpquOBKi6YRlqu-FVAzji6-w,,&typo=1>
>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,h0ccgHctUPRY4zAN_qZ-qdBgLDpQLtm7KaOL4u12U4PR7PeJ4MUBOS8bbD7CNssUIMqRMvY_pOqbh7PfLY0lRpQh9mfqBC0KnSHBZzxxSJJr-55r5kv6YjYwrA,,&typo=1>
>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> Pronouns: she/her
>
>
>
>
>
> On Sat, Oct 5, 2024 at 9:17 PM Ángel 
> wrote:
>
> That would definitely affect companies using GraphX, but at least they’d
> have the choice to migrate their code.
>
> I think that’s probably the way to go.
>
>
>
> El dom, 6 oct 2024 a las 6:09, Holden Karau ()
> escribió:
>
> So removing GraphX from Spark would not prevent GraphFrames from
> continuing, they could pick up the GraphX source and incorporate it into
> their project.
>
> Twitter: https://twitter.com/holdenkarau
>
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.fighthealthinsurance.com%2f%3fq%3dhk_email&c=E,1,9xMMQlY7gtmkqxT0NTmS8KMg4wOUjw0PWKM-oepAYAkE-SiM5pyXCb80AuRZYJ4zMIedVlwVMAKi_eh52Hof0LsteXx2eIslnsDBdmVeuocpILpneg,,&typo=1>
>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> <https://linkprotect.cudasvc.com/url?a=https%3a%2f%2famzn.to%2f2MaRAG9&c=E,1,kbGbMBRMidAYi0aqUmj949vRahpEjVzSgJv_YYtO5EteSXZy4RrMYXJU48mN2CyS5sdovsgiFAAiBLnyQ29gCCn8xbTrEJmfIhjtH7tD4N31VUoLtQ,,&typo=1>
>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> Pronouns: she/her
>
>
>
>
>
> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney 
> wrote:
>
> A lot of people like me use GraphFrames for its connected components
> implementation and its motif matching feature. I am willing to work on it
> to keep it alive. They did a 0.8.3 release not too long ago. Please keep
> GraphX alive.
>
>
>
> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh 
> wrote:
>
> I added the user list as they may have a vested interest here and
> hopefully can contribute
>
> Few suggestions:
>
>1. Data-Driven Decision Making: Return to the core metrics—analyze
>usage trends, performance benchmarks, and the actual impact on businesses
>that rely on GraphX. Objectivity can be restored by letting data speak
>louder than opinions so to speak.
>2. Broaden the Discussion: Engage more stakeholders from diverse
>backgrounds (especially spark  users) to bring in new perspectives and
>counterbalance the more vocal but potentially narrow interests of core
>maintainers or open-source contributors.
>3. Define Clear Criteria for Decision Making: Agree on a set of
>objective criteria by which the project’s future will be judged. These
>could include market demand, contribution levels, maintenance costs,
>alternative solutions, and alignment with the overall Spark ecosystem
>goals. Some have already been covered.
>4. Timely Conclusion of Discussions: Set a timeline for making a
>decision. Long, open-ended discussions tend to lose focus. Putting
>deadlines forces participants to focus on key issues and prevents endless
>debates.
>5. Borrowing from commercial settings, it is often necessary for a
>strong leadership team to step in and make the final decision after
>considering 

Re: How to run spark connect in kubernetes?

2024-10-07 Thread Steve Loughran
https://isitdns.com/

On Wed, 2 Oct 2024 at 22:45, kant kodali  wrote:

> Please ignore this. It was a DNS issue.
>
> On Wed, Oct 2, 2024 at 11:16 AM kant kodali  wrote:
>
>> Here are more details about my question that I posted on SO.
>>
>> On Tue, Oct 1, 2024 at 11:32 PM kant kodali  wrote:
>>
>>> Hi All,
>>>
>>> Is it possible to run a Spark Connect server in Kubernetes while
>>> configuring it to communicate with Kubernetes as the cluster manager? If
>>> so, is there any example?
>>>
>>> Thanks
>>>
>>
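For anyone who lands on this thread with the same question, a minimal,
unverified sketch of one way the pieces could fit together, assuming the Spark
Connect server is launched as a driver that uses Kubernetes as its cluster
manager. The service name, port, image, and paths below are placeholders, not
a tested deployment.

    // Server side (shell, shown here as comments): start the Connect server
    // with a Kubernetes master so executors are scheduled as pods. Flags and
    // image name are illustrative assumptions.
    //   ./sbin/start-connect-server.sh \
    //     --master k8s://https://<k8s-api-server>:6443 \
    //     --conf spark.kubernetes.container.image=<your-spark-image>
    //
    // Client side: talk to the server over the Spark Connect protocol
    // (default port 15002) using the Spark Connect Scala client.
    import org.apache.spark.sql.SparkSession

    object ConnectClientSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession
          .builder()
          .remote("sc://spark-connect.default.svc.cluster.local:15002") // placeholder service name
          .getOrCreate()

        // Trivial sanity check that work actually reaches the remote server.
        spark.range(100).selectExpr("sum(id)").show()

        spark.stop()
      }
    }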


Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-10-07 Thread Holden Karau
+1

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Mon, Sep 30, 2024 at 11:01 AM Holden Karau 
wrote:

> I think it has been de facto deprecated; we haven’t updated it
> meaningfully in several years. I think removing the API would be excessive
> but deprecating it would give us the flexibility to remove it in the not
> too distant future.
>
> That being said, this is not a vote to remove GraphX; I think that whenever
> that time comes (if it does) we should have a separate vote
>
> This VOTE will be open for a little more than one week, ending on October
> 8th*. To vote reply with:
> +1 Deprecate GraphX
> 0 I’m indifferent
> -1 Don’t deprecate GraphX because ABC
>
> If you have a binding vote, to simplify the tallying at the end please mark
> your vote with a *.
>
> (*mostly because I’m going camping for my birthday)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Holden Karau
So are there companies using it? And are they willing to contribute to
maintaining it?

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Sat, Oct 5, 2024 at 9:17 PM Ángel  wrote:

> That would definitely affect companies using GraphX, but at least they’d
> have the choice to migrate their code.
>
> I think that’s probably the way to go.
>
> El dom, 6 oct 2024 a las 6:09, Holden Karau ()
> escribió:
>
>> So removing GraphX from Spark would not prevent GraphFrames from
>> continuing, they could pick up the GraphX source and incorporate it into
>> their project.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney 
>> wrote:
>>
>>> A lot of people like me use GraphFrames for its connected components
>>> implementation and its motif matching feature. I am willing to work on it
>>> to keep it alive. They did a 0.8.3 release not too long ago. Please keep
>>> GraphX alive.
>>>
>>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 I added the user list as they may have a vested interest here and
 hopefully can contribute

 Few suggestions:


1. Data-Driven Decision Making: Return to the core metrics—analyze
usage trends, performance benchmarks, and the actual impact on 
 businesses
that rely on GraphX. Objectivity can be restored by letting data speak
louder than opinions so to speak.
2. Broaden the Discussion: Engage more stakeholders from diverse
backgrounds (especially spark  users) to bring in new perspectives and
counterbalance the more vocal but potentially narrow interests of core
maintainers or open-source contributors.
3. Define Clear Criteria for Decision Making: Agree on a set of
objective criteria by which the project’s future will be judged. These
could include market demand, contribution levels, maintenance costs,
alternative solutions, and alignment with the overall Spark ecosystem
goals. Some have already been covered.
4. Timely Conclusion of Discussions: Set a timeline for making a
decision. Long, open-ended discussions tend to lose focus. Putting
deadlines forces participants to focus on key issues and prevents 
 endless
debates.
5. Borrowing from commercial settings, it is often necessary for a
strong leadership team to step in and make the final decision after
considering the input. When the objectivity of discussions starts to 
 wane,
leadership needs to cut through the round discussions and steer towards
action based on business and technical realities.


 HTH

 Mich Talebzadeh,

 Architect | Data Engineer | Data Science | Financial Crime
 PhD  Imperial
 College London 
 London, United Kingdom


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* The information provided is correct to the best of my
 knowledge but of course cannot be guaranteed . It is essential to note
 that, as with any advice, quote "one test result is worth one-thousand
 expert opinions (Werner
 Von Braun
 )".


 On Sat, 5 Oct 2024 at 06:26, Ángel 
 wrote:

> I completely agree with everyone here. I don’t think the issue is
> deprecating it; to me, the problem lies in not providing a new and better
> solution for handling graphs in Spark. In the past, I used GraphX via
> GraphFrames for record linkage, and I found it both useful and effective.
> Is there any discussion about a potential replacement?
>
> I’d be willing to help maintain GraphX, though I don’t have previous
> experience with maintaining open-source projects. All I can promise is 
> good
> intentions, willingness to learn and lots of energy and passion. Is that
> enough?
>
> Btw, what's your take on this?
>
>
>-
>
>GraphX will be deprecated in favor of a n

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Ángel
That would definitely affect companies using GraphX, but at least they’d
have the choice to migrate their code.

I think that’s probably the way to go.

El dom, 6 oct 2024 a las 6:09, Holden Karau ()
escribió:

> So removing GraphX from Spark would not prevent GraphFrames from
> continuing, they could pick up the GraphX source and incorporate it into
> their project.
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney 
> wrote:
>
>> A lot of people like me use GraphFrames for its connected components
>> implementation and its motif matching feature. I am willing to work on it
>> to keep it alive. They did a 0.8.3 release not too long ago. Please keep
>> GraphX alive.
>>
>> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh 
>> wrote:
>>
>>> I added the user list as they may have a vested interest here and
>>> hopefully can contribute
>>>
>>> Few suggestions:
>>>
>>>
>>>1. Data-Driven Decision Making: Return to the core metrics—analyze
>>>usage trends, performance benchmarks, and the actual impact on businesses
>>>that rely on GraphX. Objectivity can be restored by letting data speak
>>>louder than opinions so to speak.
>>>2. Broaden the Discussion: Engage more stakeholders from diverse
>>>backgrounds (especially spark  users) to bring in new perspectives and
>>>counterbalance the more vocal but potentially narrow interests of core
>>>maintainers or open-source contributors.
>>>3. Define Clear Criteria for Decision Making: Agree on a set of
>>>objective criteria by which the project’s future will be judged. These
>>>could include market demand, contribution levels, maintenance costs,
>>>alternative solutions, and alignment with the overall Spark ecosystem
>>>goals. Some have already been covered.
>>>4. Timely Conclusion of Discussions: Set a timeline for making a
>>>decision. Long, open-ended discussions tend to lose focus. Putting
>>>deadlines forces participants to focus on key issues and prevents endless
>>>debates.
>>>5. Borrowing from commercial settings, it is often necessary for a
>>>strong leadership team to step in and make the final decision after
>>>considering the input. When the objectivity of discussions starts to 
>>> wane,
>>>leadership needs to cut through the round discussions and steer towards
>>>action based on business and technical realities.
>>>
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>>
>>> Architect | Data Engineer | Data Science | Financial Crime
>>> PhD  Imperial
>>> College London 
>>> London, United Kingdom
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge but of course cannot be guaranteed . It is essential to note
>>> that, as with any advice, quote "one test result is worth one-thousand
>>> expert opinions (Werner
>>> Von Braun
>>> )".
>>>
>>>
>>> On Sat, 5 Oct 2024 at 06:26, Ángel 
>>> wrote:
>>>
 I completely agree with everyone here. I don’t think the issue is
 deprecating it; to me, the problem lies in not providing a new and better
 solution for handling graphs in Spark. In the past, I used GraphX via
 GraphFrames for record linkage, and I found it both useful and effective.
 Is there any discussion about a potential replacement?

 I’d be willing to help maintain GraphX, though I don’t have previous
 experience with maintaining open-source projects. All I can promise is good
 intentions, willingness to learn and lots of energy and passion. Is that
 enough?

 Btw, what's your take on this?


-

GraphX will be deprecated in favor of a new graphing component,
SparkGraph, based on Cypher
, a much richer
graph language than previously offered by GraphX.



 https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0

 El sáb, 5 oct 2024 a las 2:17, Mark Hamstra ()
 escribió:

> As I wrote to Holden privately, I might well change my vote to be in
> favor of a deprecation label combined with some effective means of
> communicating that this doesn't mean the end for GraphX if interested
> 

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Holden Karau
So removing GraphX from Spark would not prevent GraphFrames from
continuing, they could pick up the GraphX source and incorporate it into
their project.

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Sat, Oct 5, 2024 at 5:22 PM Russell Jurney 
wrote:

> A lot of people like me use GraphFrames for its connected components
> implementation and its motif matching feature. I am willing to work on it
> to keep it alive. They did a 0.8.3 release not too long ago. Please keep
> GraphX alive.
>
> On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh 
> wrote:
>
>> I added the user list as they may have a vested interest here and
>> hopefully can contribute
>>
>> Few suggestions:
>>
>>
>>1. Data-Driven Decision Making: Return to the core metrics—analyze
>>usage trends, performance benchmarks, and the actual impact on businesses
>>that rely on GraphX. Objectivity can be restored by letting data speak
>>louder than opinions so to speak.
>>2. Broaden the Discussion: Engage more stakeholders from diverse
>>backgrounds (especially spark  users) to bring in new perspectives and
>>counterbalance the more vocal but potentially narrow interests of core
>>maintainers or open-source contributors.
>>3. Define Clear Criteria for Decision Making: Agree on a set of
>>objective criteria by which the project’s future will be judged. These
>>could include market demand, contribution levels, maintenance costs,
>>alternative solutions, and alignment with the overall Spark ecosystem
>>goals. Some have already been covered.
>>4. Timely Conclusion of Discussions: Set a timeline for making a
>>decision. Long, open-ended discussions tend to lose focus. Putting
>>deadlines forces participants to focus on key issues and prevents endless
>>debates.
>>5. Borrowing from commercial settings, it is often necessary for a
>>strong leadership team to step in and make the final decision after
>>considering the input. When the objectivity of discussions starts to wane,
>>leadership needs to cut through the round discussions and steer towards
>>action based on business and technical realities.
>>
>>
>> HTH
>>
>> Mich Talebzadeh,
>>
>> Architect | Data Engineer | Data Science | Financial Crime
>> PhD  Imperial
>> College London 
>> London, United Kingdom
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed . It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Werner
>> Von Braun
>> )".
>>
>>
>> On Sat, 5 Oct 2024 at 06:26, Ángel 
>> wrote:
>>
>>> I completely agree with everyone here. I don’t think the issue is
>>> deprecating it; to me, the problem lies in not providing a new and better
>>> solution for handling graphs in Spark. In the past, I used GraphX via
>>> GraphFrames for record linkage, and I found it both useful and effective.
>>> Is there any discussion about a potential replacement?
>>>
>>> I’d be willing to help maintain GraphX, though I don’t have previous
>>> experience with maintaining open-source projects. All I can promise is good
>>> intentions, willingness to learn and lots of energy and passion. Is that
>>> enough?
>>>
>>> Btw, what's your take on this?
>>>
>>>
>>>-
>>>
>>>GraphX will be deprecated in favor of a new graphing component,
>>>SparkGraph, based on Cypher
>>>, a much richer
>>>graph language than previously offered by GraphX.
>>>
>>>
>>>
>>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>>>
>>> El sáb, 5 oct 2024 a las 2:17, Mark Hamstra ()
>>> escribió:
>>>
 As I wrote to Holden privately, I might well change my vote to be in
 favor of a deprecation label combined with some effective means of
 communicating that this doesn't mean the end for GraphX if interested
 contributors come forward to rescue it. I don't like either the idea
 of keeping unmaintained code and public APIs around (especially if
 there are problems with them) or the idea of removing Spark
 functionality just because no one has contributed to it for a while. A
 naked deprecation label feels somewhat drastic and pre-e

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Russell Jurney
A lot of people like me use GraphFrames for its connected components
implementation and its motif matching feature. I am willing to work on it
to keep it alive. They did a 0.8.3 release not too long ago. Please keep
GraphX alive.
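For readers who haven't used it, here is a minimal sketch of the two
GraphFrames features mentioned above, connected components and motif finding.
It assumes the graphframes package is on the classpath (for example via
--packages), uses the usual GraphFrames id/src/dst column convention, and the
checkpoint path is a placeholder.

    // Minimal GraphFrames sketch; assumes the graphframes package is available.
    import org.apache.spark.sql.SparkSession
    import org.graphframes.GraphFrame

    object GraphFramesSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("graphframes-sketch").getOrCreate()
        import spark.implicits._

        // Vertices need an "id" column; edges need "src" and "dst".
        val vertices = Seq(("a", "Alice"), ("b", "Bob"), ("c", "Carol")).toDF("id", "name")
        val edges = Seq(("a", "b", "follows"), ("b", "c", "follows")).toDF("src", "dst", "relationship")
        val g = GraphFrame(vertices, edges)

        // Connected components (the default algorithm needs a checkpoint dir).
        spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints") // placeholder path
        g.connectedComponents.run().show()

        // Motif finding: directed paths of length two, a -> b -> c.
        g.find("(a)-[e1]->(b); (b)-[e2]->(c)").show()

        spark.stop()
      }
    }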

On Sat, Oct 5, 2024 at 3:44 PM Mich Talebzadeh 
wrote:

> I added the user list as they may have a vested interest here and
> hopefully can contribute
>
> Few suggestions:
>
>
>1. Data-Driven Decision Making: Return to the core metrics—analyze
>usage trends, performance benchmarks, and the actual impact on businesses
>that rely on GraphX. Objectivity can be restored by letting data speak
>louder than opinions so to speak.
>2. Broaden the Discussion: Engage more stakeholders from diverse
>backgrounds (especially spark  users) to bring in new perspectives and
>counterbalance the more vocal but potentially narrow interests of core
>maintainers or open-source contributors.
>3. Define Clear Criteria for Decision Making: Agree on a set of
>objective criteria by which the project’s future will be judged. These
>could include market demand, contribution levels, maintenance costs,
>alternative solutions, and alignment with the overall Spark ecosystem
>goals. Some have already been covered.
>4. Timely Conclusion of Discussions: Set a timeline for making a
>decision. Long, open-ended discussions tend to lose focus. Putting
>deadlines forces participants to focus on key issues and prevents endless
>debates.
>5. Borrowing from commercial settings, it is often necessary for a
>strong leadership team to step in and make the final decision after
>considering the input. When the objectivity of discussions starts to wane,
>leadership needs to cut through the round discussions and steer towards
>action based on business and technical realities.
>
>
> HTH
>
> Mich Talebzadeh,
>
> Architect | Data Engineer | Data Science | Financial Crime
> PhD  Imperial College
> London 
> London, United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner  Von
> Braun )".
>
>
> On Sat, 5 Oct 2024 at 06:26, Ángel  wrote:
>
>> I completely agree with everyone here. I don’t think the issue is
>> deprecating it; to me, the problem lies in not providing a new and better
>> solution for handling graphs in Spark. In the past, I used GraphX via
>> GraphFrames for record linkage, and I found it both useful and effective.
>> Is there any discussion about a potential replacement?
>>
>> I’d be willing to help maintain GraphX, though I don’t have previous
>> experience with maintaining open-source projects. All I can promise is good
>> intentions, willingness to learn and lots of energy and passion. Is that
>> enough?
>>
>> Btw, what's your take on this?
>>
>>
>>-
>>
>>GraphX will be deprecated in favor of a new graphing component,
>>SparkGraph, based on Cypher
>>, a much richer
>>graph language than previously offered by GraphX.
>>
>>
>>
>> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>>
>> El sáb, 5 oct 2024 a las 2:17, Mark Hamstra ()
>> escribió:
>>
>>> As I wrote to Holden privately, I might well change my vote to be in
>>> favor of a deprecation label combined with some effective means of
>>> communicating that this doesn't mean the end for GraphX if interested
>>> contributors come forward to rescue it. I don't like either the idea
>>> of keeping unmaintained code and public APIs around (especially if
>>> there are problems with them) or the idea of removing Spark
>>> functionality just because no one has contributed to it for a while. A
>>> naked deprecation label feels somewhat drastic and pre-emptive to me.
>>> I don't expect that GraphX will be the last part of Spark to run the
>>> risk of death through neglect, and I think we need an effective means
>>> of encouraging resuscitation that a deprecation label on its own does
>>> not provide. On the other hand, if no one really is willing to come to
>>> the aid of GraphX or other neglected functionality given adequate
>>> warning of possible removal, I'm not then opposed to the usual
>>> deprecation and removal process.
>>>
>>>
>>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen  wrote:
>>> >
>>> > This is a reasonable discussion, but maybe the more practical point
>>> is: are you sure you want to block

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Mich Talebzadeh
I added the user list as they may have a vested interest here and
hopefully can contribute

Few suggestions:


   1. Data-Driven Decision Making: Return to the core metrics—analyze usage
   trends, performance benchmarks, and the actual impact on businesses that
   rely on GraphX. Objectivity can be restored by letting data speak louder
   than opinions so to speak.
   2. Broaden the Discussion: Engage more stakeholders from diverse
   backgrounds (especially spark  users) to bring in new perspectives and
   counterbalance the more vocal but potentially narrow interests of core
   maintainers or open-source contributors.
   3. Define Clear Criteria for Decision Making: Agree on a set of
   objective criteria by which the project’s future will be judged. These
   could include market demand, contribution levels, maintenance costs,
   alternative solutions, and alignment with the overall Spark ecosystem
   goals. Some have already been covered.
   4. Timely Conclusion of Discussions: Set a timeline for making a
   decision. Long, open-ended discussions tend to lose focus. Putting
   deadlines forces participants to focus on key issues and prevents endless
   debates.
   5. Borrowing from commercial settings, it is often necessary for a
   strong leadership team to step in and make the final decision after
   considering the input. When the objectivity of discussions starts to wane,
   leadership needs to cut through the round discussions and steer towards
   action based on business and technical realities.


HTH

Mich Talebzadeh,

Architect | Data Engineer | Data Science | Financial Crime
PhD  Imperial College
London 
London, United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  Von
Braun )".


On Sat, 5 Oct 2024 at 06:26, Ángel  wrote:

> I completely agree with everyone here. I don’t think the issue is
> deprecating it; to me, the problem lies in not providing a new and better
> solution for handling graphs in Spark. In the past, I used GraphX via
> GraphFrames for record linkage, and I found it both useful and effective.
> Is there any discussion about a potential replacement?
>
> I’d be willing to help maintain GraphX, though I don’t have previous
> experience with maintaining open-source projects. All I can promise is good
> intentions, willingness to learn and lots of energy and passion. Is that
> enough?
>
> Btw, what's your take on this?
>
>
>-
>
>GraphX will be deprecated in favor of a new graphing component,
>SparkGraph, based on Cypher
>, a much richer
>graph language than previously offered by GraphX.
>
>
>
> https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0
>
> El sáb, 5 oct 2024 a las 2:17, Mark Hamstra ()
> escribió:
>
>> As I wrote to Holden privately, I might well change my vote to be in
>> favor of a deprecation label combined with some effective means of
>> communicating that this doesn't mean the end for GraphX if interested
>> contributors come forward to rescue it. I don't like either the idea
>> of keeping unmaintained code and public APIs around (especially if
>> there are problems with them) or the idea of removing Spark
>> functionality just because no one has contributed to it for a while. A
>> naked deprecation label feels somewhat drastic and pre-emptive to me.
>> I don't expect that GraphX will be the last part of Spark to run the
>> risk of death through neglect, and I think we need an effective means
>> of encouraging resuscitation that a deprecation label on its own does
>> not provide. On the other hand, if no one really is willing to come to
>> the aid of GraphX or other neglected functionality given adequate
>> warning of possible removal, I'm not then opposed to the usual
>> deprecation and removal process.
>>
>>
>> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen  wrote:
>> >
>> > This is a reasonable discussion, but maybe the more practical point is:
>> are you sure you want to block this unilaterally? This effectively makes a
>> decision that GraphX cannot be removed for a long while. I'd understand it
>> more if we had an active maintainer and/or active user proposing to veto,
>> but my understanding is this is just a proposal to block this on behalf of
>> some users, someone else who might do some work and hasn't to date for some
>> reason. Add to that the fact that the 'pro' arguments all seem to be
>> arguments for working on GraphFrames, and I find this somewhat drastic.

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Mark Hamstra
I'm not saying that deprecation necessarily precludes further
contributions to the deprecated code. Without explicit and visible
encouragement of further contributions, though, a deprecation label
does actively discourage further contributions.

That, then, raises the question of whether we do want to actively
discourage further development of GraphX. I don't have a strong
opinion on that, and I could be persuaded that we do want to actively
discourage (perhaps in favor of exclusive use of GraphFrames.) I'm not
convinced, though, that a lack of recent contributions alone is
sufficient reason to discourage further contributions to a neglected
area of Spark in the way that a deprecation label devoid of caveats
would.


On Fri, Oct 4, 2024 at 5:27 PM Sean Owen  wrote:
>
> Deprecation doesn't stop any of that though, if you want to encourage people 
> to do something with GraphX. We can un-deprecate things. We don't have to 
> remove deprecated things.
>
> But, why would we not encourage people to work on GraphFrames if interested 
> in this domain?
>
> Nobody has been willing to come to the aid of GraphX in years and there is 
> still no particular answer to why that would be different, as I take it you 
> are not volunteering to work on it and don't have a use case here either.
>
> For those reasons, I don't believe this is motivated enough to sustain a veto.
>
> On Fri, Oct 4, 2024 at 7:16 PM Mark Hamstra  wrote:
>>
>> As I wrote to Holden privately, I might well change my vote to be in
>> favor of a deprecation label combined with some effective means of
>> communicating that this doesn't mean the end for GraphX if interested
>> contributors come forward to rescue it. I don't like either the idea
>> of keeping unmaintained code and public APIs around (especially if
>> there are problems with them) or the idea of removing Spark
>> functionality just because no one has contributed to it for a while. A
>> naked deprecation label feels somewhat drastic and pre-emptive to me.
>> I don't expect that GraphX will be the last part of Spark to run the
>> risk of death through neglect, and I think we need an effective means
>> of encouraging resuscitation that a deprecation label on its own does
>> not provide. On the other hand, if no one really is willing to come to
>> the aid of GraphX or other neglected functionality given adequate
>> warning of possible removal, I'm not then opposed to the usual
>> deprecation and removal process.
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Ángel
I completely agree with everyone here. I don’t think the issue is
deprecating it; to me, the problem lies in not providing a new and better
solution for handling graphs in Spark. In the past, I used GraphX via
GraphFrames for record linkage, and I found it both useful and effective.
Is there any discussion about a potential replacement?
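
(A concrete illustration of the record-linkage use case mentioned above: a
minimal sketch using the GraphFrames connectedComponents API, where records
that end up in the same component are treated as one linked entity. GraphFrames
depends on GraphX, as noted elsewhere in this thread. The column values and
checkpoint path below are made up for the sketch.)

import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object RecordLinkageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("record-linkage-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical input: one vertex per source record, one edge per candidate
    // match produced by an upstream blocking/similarity step.
    val vertices = Seq(("r1", "Alice Smith"), ("r2", "A. Smith"), ("r3", "Bob Jones"))
      .toDF("id", "name")
    val edges = Seq(("r1", "r2", 0.92)).toDF("src", "dst", "similarity")

    val g = GraphFrame(vertices, edges)

    // The default connected-components implementation checkpoints intermediate state.
    spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")

    // Records sharing a "component" value are considered the same real-world entity.
    val linked = g.connectedComponents.run()
    linked.show()

    spark.stop()
  }
}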

I’d be willing to help maintain GraphX, though I don’t have previous
experience with maintaining open-source projects. All I can promise is good
intentions, willingness to learn and lots of energy and passion. Is that
enough?

Btw, what's your take on this?


   -

   GraphX will be deprecated in favor of a new graphing component,
   SparkGraph, based on Cypher
   , a much richer
   graph language than previously offered by GraphX.


https://cloud.google.com/blog/products/data-analytics/introducing-spark-3-and-hadoop-3-on-dataproc-image-version-2-0

On Sat, Oct 5, 2024 at 2:17, Mark Hamstra () wrote:

> As I wrote to Holden privately, I might well change my vote to be in
> favor of a deprecation label combined with some effective means of
> communicating that this doesn't mean the end for GraphX if interested
> contributors come forward to rescue it. I don't like either the idea
> of keeping unmaintained code and public APIs around (especially if
> there are problems with them) or the idea of removing Spark
> functionality just because no one has contributed to it for a while. A
> naked deprecation label feels somewhat drastic and pre-emptive to me.
> I don't expect that GraphX will be the last part of Spark to run the
> risk of death through neglect, and I think we need an effective means
> of encouraging resuscitation that a deprecation label on its own does
> not provide. On the other hand, if no one really is willing to come to
> the aid of GraphX or other neglected functionality given adequate
> warning of possible removal, I'm not then opposed to the usual
> deprecation and removal process.
>
>
> On Fri, Oct 4, 2024 at 4:10 PM Sean Owen  wrote:
> >
> > This is a reasonable discussion, but maybe the more practical point is:
> are you sure you want to block this unilaterally? This effectively makes a
> decision that GraphX cannot be removed for a long while. I'd understand it
> more if we had an active maintainer and/or active user proposing to veto,
> but my understanding is this is just a proposal to block this on behalf of
> some users, someone else who might do some work and hasn't to date for some
> reason. Add to that the fact that the 'pro' arguments all seem to be
> arguments for working on GraphFrames, and I find this somewhat drastic.
> >
> > On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra 
> wrote:
> >>
> >> "You can't say nothing is removable until there are no users."
> >>
> >> That is not what I am saying. Rather, I am countering what others seem
> >> to be suggesting: There are no users and no interest, therefore we can
> >> and should deprecate.
> >>
> >> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen  wrote:
> >> >
> >> > I could flip this argument around. More strongly, not being
> deprecated means "won't be removed" and likewise implies support and
> development. I don't think either of the latter have been true for years.
> What suggests this will change? A todo list is not going to do anything,
> IMHO.
> >> >
> >> > I'm also concerned about the cost of that, which I have observed.
> GraphX PRs are almost certainly not going to be reviewed because of its
> state. Deprecation both communicates that reality, and leaves an option
> open, whereas not deprecating forecloses that option for a while.
> >> >
> >> > I don't think the question is, does anyone use it? because anyone can
> continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
> >> > You can't say nothing is removable until there are no users.
> >> >
> >> > Also, why would GraphFrames not be the logical home of this going
> forward anyway? which I think is the subtext.
> >> >
> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra 
> wrote:
> >> >>
> >> >> I'm -1(*) because, while it technically means "might be removed in
> the
> >> >> future", I think developers and users are more prone to interpret
> >> >> something being marked as deprecated as "very likely will be removed
> >> >> in the future, so don't depend on this or waste your time
> contributing
> >> >> to its further development." I don't think the latter is what we want
> >> >> just because something hasn't been updated meaningfully in a while.
> >> >> There have been How To articles for GraphX and Graph Frames posted in
> >> >> the not too distant past, and the Google Search trend shows a pretty
> >> >> steady level of interest, not a decline to zero, so I don't think
> that
> >> >> it is accurate to declare that there is no use or interest in GraphX.
> >> >>
> >> >> Unless retaining GraphX is imposing significant costs on continuing
> >> >> Spark development, I can't suppo

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Sean Owen
Deprecation doesn't stop any of that though, if you want to encourage
people to do something with GraphX. We can un-deprecate things. We don't
have to remove deprecated things.

But, why would we not encourage people to work on GraphFrames if interested
in this domain?

Nobody has been willing to come to the aid of GraphX in years and there is
still no particular answer to why that would be different, as I take it you
are not volunteering to work on it and don't have a use case here either.

For those reasons, I don't believe this is motivated enough to sustain a
veto.

On Fri, Oct 4, 2024 at 7:16 PM Mark Hamstra  wrote:

> As I wrote to Holden privately, I might well change my vote to be in
> favor of a deprecation label combined with some effective means of
> communicating that this doesn't mean the end for GraphX if interested
> contributors come forward to rescue it. I don't like either the idea
> of keeping unmaintained code and public APIs around (especially if
> there are problems with them) or the idea of removing Spark
> functionality just because no one has contributed to it for a while. A
> naked deprecation label feels somewhat drastic and pre-emptive to me.
> I don't expect that GraphX will be the last part of Spark to run the
> risk of death through neglect, and I think we need an effective means
> of encouraging resuscitation that a deprecation label on its own does
> not provide. On the other hand, if no one really is willing to come to
> the aid of GraphX or other neglected functionality given adequate
> warning of possible removal, I'm not then opposed to the usual
> deprecation and removal process.
>
>


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Mark Hamstra
As I wrote to Holden privately, I might well change my vote to be in
favor of a deprecation label combined with some effective means of
communicating that this doesn't mean the end for GraphX if interested
contributors come forward to rescue it. I don't like either the idea
of keeping unmaintained code and public APIs around (especially if
there are problems with them) or the idea of removing Spark
functionality just because no one has contributed to it for a while. A
naked deprecation label feels somewhat drastic and pre-emptive to me.
I don't expect that GraphX will be the last part of Spark to run the
risk of death through neglect, and I think we need an effective means
of encouraging resuscitation that a deprecation label on its own does
not provide. On the other hand, if no one really is willing to come to
the aid of GraphX or other neglected functionality given adequate
warning of possible removal, I'm not then opposed to the usual
deprecation and removal process.


On Fri, Oct 4, 2024 at 4:10 PM Sean Owen  wrote:
>
> This is a reasonable discussion, but maybe the more practical point is: are 
> you sure you want to block this unilaterally? This effectively makes a 
> decision that GraphX cannot be removed for a long while. I'd understand it 
> more if we had an active maintainer and/or active user proposing to veto, but 
> my understanding is this is just a proposal to block this on behalf of some 
> users, someone else who might do some work and hasn't to date for some 
> reason. Add to that the fact that the 'pro' arguments all seem to be 
> arguments for working on GraphFrames, and I find this somewhat drastic.
>
> On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra  wrote:
>>
>> "You can't say nothing is removable until there are no users."
>>
>> That is not what I am saying. Rather, I am countering what others seem
>> to be suggesting: There are no users and no interest, therefore we can
>> and should deprecate.
>>
>> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen  wrote:
>> >
>> > I could flip this argument around. More strongly, not being deprecated 
>> > means "won't be removed" and likewise implies support and development. I 
>> > don't think either of the latter have been true for years. What suggests 
>> > this will change? A todo list is not going to do anything, IMHO.
>> >
>> > I'm also concerned about the cost of that, which I have observed. GraphX 
>> > PRs are almost certainly not going to be reviewed because of its state. 
>> > Deprecation both communicates that reality, and leaves an option open, 
>> > whereas not deprecating forecloses that option for a while.
>> >
>> > I don't think the question is, does anyone use it? because anyone can 
>> > continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
>> > You can't say nothing is removable until there are no users.
>> >
>> > Also, why would GraphFrames not be the logical home of this going forward 
>> > anyway? which I think is the subtext.
>> >
>> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra  wrote:
>> >>
>> >> I'm -1(*) because, while it technically means "might be removed in the
>> >> future", I think developers and users are more prone to interpret
>> >> something being marked as deprecated as "very likely will be removed
>> >> in the future, so don't depend on this or waste your time contributing
>> >> to its further development." I don't think the latter is what we want
>> >> just because something hasn't been updated meaningfully in a while.
>> >> There have been How To articles for GraphX and Graph Frames posted in
>> >> the not too distant past, and the Google Search trend shows a pretty
>> >> steady level of interest, not a decline to zero, so I don't think that
>> >> it is accurate to declare that there is no use or interest in GraphX.
>> >>
>> >> Unless retaining GraphX is imposing significant costs on continuing
>> >> Spark development, I can't support deprecating GraphX. I can support
>> >> encouraging GraphX and Graph Frames development through something like
>> >> a To Do list or document of "What we'd like to see in the way of
>> >> further development of Spark's graph processing capabilities" -- i.e.,
>> >> things that encourage and support new contributions to address any
>> >> shortcomings in Spark's graph processing, not things that discourage
>> >> contributions and use in the way that I believe simply declaring
>> >> GraphX to be deprecated would.
>> >>
>> >>
>> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau  
>> >> wrote:
>> >> >
>> >> > Since we're getting close to cutting a 4.0 branch I'd like to float the 
>> >> > idea of officially deprecating Graph X. What that would mean (to me) is 
>> >> > we would update the docs to indicate that Graph X is deprecated and 
>> >> > its APIs may be removed at any time in the future.
>> >> >
>> >> > Alternatively, we could mark it as "unmaintained and in search of 
>> >> > maintainers" with a note that if no maintainers are found, we may 
>> >> > remove it in a future minor version.

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Sean Owen
This is a reasonable discussion, but maybe the more practical point is: are
you sure you want to block this unilaterally? This effectively makes a
decision that GraphX cannot be removed for a long while. I'd understand it
more if we had an active maintainer and/or active user proposing to veto,
but my understanding is this is just a proposal to block this on behalf of
some users, someone else who might do some work and hasn't to date for some
reason. Add to that the fact that the 'pro' arguments all seem to be
arguments for working on GraphFrames, and I find this somewhat drastic.

On Fri, Oct 4, 2024 at 5:23 PM Mark Hamstra  wrote:

> "You can't say nothing is removable until there are no users."
>
> That is not what I am saying. Rather, I am countering what others seem
> to be suggesting: There are no users and no interest, therefore we can
> and should deprecate.
>
> On Fri, Oct 4, 2024 at 3:10 PM Sean Owen  wrote:
> >
> > I could flip this argument around. More strongly, not being deprecated
> means "won't be removed" and likewise implies support and development. I
> don't think either of the latter have been true for years. What suggests
> this will change? A todo list is not going to do anything, IMHO.
> >
> > I'm also concerned about the cost of that, which I have observed. GraphX
> PRs are almost certainly not going to be reviewed because of its state.
> Deprecation both communicates that reality, and leaves an option open,
> whereas not deprecating forecloses that option for a while.
> >
> > I don't think the question is, does anyone use it? because anyone can
> continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
> > You can't say nothing is removable until there are no users.
> >
> > Also, why would GraphFrames not be the logical home of this going
> forward anyway? which I think is the subtext.
> >
> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra 
> wrote:
> >>
> >> I'm -1(*) because, while it technically means "might be removed in the
> >> future", I think developers and users are more prone to interpret
> >> something being marked as deprecated as "very likely will be removed
> >> in the future, so don't depend on this or waste your time contributing
> >> to its further development." I don't think the latter is what we want
> >> just because something hasn't been updated meaningfully in a while.
> >> There have been How To articles for GraphX and Graph Frames posted in
> >> the not too distant past, and the Google Search trend shows a pretty
> >> steady level of interest, not a decline to zero, so I don't think that
> >> it is accurate to declare that there is no use or interest in GraphX.
> >>
> >> Unless retaining GraphX is imposing significant costs on continuing
> >> Spark development, I can't support deprecating GraphX. I can support
> >> encouraging GraphX and Graph Frames development through something like
> >> a To Do list or document of "What we'd like to see in the way of
> >> further development of Spark's graph processing capabilities" -- i.e.,
> >> things that encourage and support new contributions to address any
> >> shortcomings in Spark's graph processing, not things that discourage
> >> contributions and use in the way that I believe simply declaring
> >> GraphX to be deprecated would.
> >>
> >>
> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau 
> wrote:
> >> >
> >> > Since we're getting close to cutting a 4.0 branch I'd like to float
> the idea of officially deprecating Graph X. What that would mean (to me) is
> we would update the docs to indicate that Graph X is deprecated and its
> APIs may be removed at any time in the future.
> >> >
> >> > Alternatively, we could mark it as "unmaintained and in search of
> maintainers" with a note that if no maintainers are found, we may remove it
> in a future minor version.
> >> >
> >> > Looking at the source graph X, I don't see any meaningful active
> development going back over three years*. There is even a thread on user@
> from 2017 asking if graph X is maintained anymore, with no response from
> the developers.
> >> >
> >> > Now I'm open to the idea that GraphX is stable and "works as is" and
> simply doesn't require modifications but given the user thread I'm a little
> concerned here about bringing this API with us into Spark 4 if we don't
> have anyone signed up to maintain it.
> >> >
> >> > * Excluding globally applied changes
> >> > --
> >> > Twitter: https://twitter.com/holdenkarau
> >> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> >> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >> > Pronouns: she/her
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Mich Talebzadeh
Personally, I would promote best practices, given the range of opinions
either way.

   - Share best practices for using GraphX effectively, including tips for
   optimizing performance and avoiding common pitfalls.
   - Encourage the use of alternative libraries when appropriate,
   highlighting their advantages and how they can complement GraphX.

I concur that it is important to avoid labelling GraphX as deprecated
without providing clear guidance and alternatives. Instead, perhaps we
should focus on documenting its current status, encouraging contributions,
and promoting best practices for graph processing in Spark. I saw that someone
has already created some documents along these lines.
HTH

Mich Talebzadeh,

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Werner von Braun).


On Fri, 4 Oct 2024 at 23:09, Mark Hamstra  wrote:

> No, I wouldn't encourage anyone to base a new production deployment on
> GraphX, but neither would I encourage new production deployments based
> on the RDD API without deep study and understanding of the
> implications and limitations. What I would be most comfortable with is
> documenting the current status and shortcomings of GraphX, along with
> encouraging contributions to remedy that situation. I'm not sure what
> the best (or even an effective) way of accomplishing that is, but I'm
> pretty sure it's not just labeling GraphX as deprecated.
>
> On Fri, Oct 4, 2024 at 3:00 PM Holden Karau 
> wrote:
> >
> > Personally I think people should not depend on it — there’s literally no
> one working on it, and not being up front about that I think draws
> everything else into question.
> >
> > Would anyone here feel comfortable using GraphX for a new production
> deployment today?
> >
> >
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
> >
> >
> > On Fri, Oct 4, 2024 at 2:56 PM Mark Hamstra 
> wrote:
> >>
> >> I'm -1(*) because, while it technically means "might be removed in the
> >> future", I think developers and users are more prone to interpret
> >> something being marked as deprecated as "very likely will be removed
> >> in the future, so don't depend on this or waste your time contributing
> >> to its further development." I don't think the latter is what we want
> >> just because something hasn't been updated meaningfully in a while.
> >> There have been How To articles for GraphX and Graph Frames posted in
> >> the not too distant past, and the Google Search trend shows a pretty
> >> steady level of interest, not a decline to zero, so I don't think that
> >> it is accurate to declare that there is no use or interest in GraphX.
> >>
> >> Unless retaining GraphX is imposing significant costs on continuing
> >> Spark development, I can't support deprecating GraphX. I can support
> >> encouraging GraphX and Graph Frames development through something like
> >> a To Do list or document of "What we'd like to see in the way of
> >> further development of Spark's graph processing capabilities" -- i.e.,
> >> things that encourage and support new contributions to address any
> >> shortcomings in Spark's graph processing, not things that discourage
> >> contributions and use in the way that I believe simply declaring
> >> GraphX to be deprecated would.
> >>
> >>
> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau 
> wrote:
> >> >
> >> > Since we're getting close to cutting a 4.0 branch I'd like to float
> the idea of officially deprecating Graph X. What that would mean (to me) is
> we would update the docs to indicate that Graph X is deprecated and its
> APIs may be removed at any time in the future.
> >> >
> >> > Alternatively, we could mark it as "unmaintained and in search of
> maintainers" with a note that if no maintainers are found, we may remove it
> in a future minor version.
> >> >
> >> > Looking at the source graph X, I don't see any meaningful active
> development going back over three years*. There is even a thread on user@
> from 2017 asking if graph X is maintained anymore, with no response from
> the developers.
> >> >
> >> > Now I'm open to the idea that GraphX is stable and "works as is" and
> simply doesn't require modifications but given the user thread I'm a little
> concerned here about bringing this API with us into Spark 4 if we don't
> have anyone signed up to maintain it.
> >> >
> >> > * Excluding globally applied changes
> >> > --
> >> > Twitter: https://twitter.com/holdenkarau
> >> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> >> > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Mark Hamstra
"You can't say nothing is removable until there are no users."

That is not what I am saying. Rather, I am countering what others seem
to be suggesting: There are no users and no interest, therefore we can
and should deprecate.

On Fri, Oct 4, 2024 at 3:10 PM Sean Owen  wrote:
>
> I could flip this argument around. More strongly, not being deprecated means 
> "won't be removed" and likewise implies support and development. I don't 
> think either of the latter have been true for years. What suggests this will 
> change? A todo list is not going to do anything, IMHO.
>
> I'm also concerned about the cost of that, which I have observed. GraphX PRs 
> are almost certainly not going to be reviewed because of its state. 
> Deprecation both communicates that reality, and leaves an option open, 
> whereas not deprecating forecloses that option for a while.
>
> I don't think the question is, does anyone use it? because anyone can 
> continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
> You can't say nothing is removable until there are no users.
>
> Also, why would GraphFrames not be the logical home of this going forward 
> anyway? which I think is the subtext.
>
> On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra  wrote:
>>
>> I'm -1(*) because, while it technically means "might be removed in the
>> future", I think developers and users are more prone to interpret
>> something being marked as deprecated as "very likely will be removed
>> in the future, so don't depend on this or waste your time contributing
>> to its further development." I don't think the latter is what we want
>> just because something hasn't been updated meaningfully in a while.
>> There have been How To articles for GraphX and Graph Frames posted in
>> the not too distant past, and the Google Search trend shows a pretty
>> steady level of interest, not a decline to zero, so I don't think that
>> it is accurate to declare that there is no use or interest in GraphX.
>>
>> Unless retaining GraphX is imposing significant costs on continuing
>> Spark development, I can't support deprecating GraphX. I can support
>> encouraging GraphX and Graph Frames development through something like
>> a To Do list or document of "What we'd like to see in the way of
>> further development of Spark's graph processing capabilities" -- i.e.,
>> things that encourage and support new contributions to address any
>> shortcomings in Spark's graph processing, not things that discourage
>> contributions and use in the way that I believe simply declaring
>> GraphX to be deprecated would.
>>
>>
>> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau  wrote:
>> >
>> > Since we're getting close to cutting a 4.0 branch I'd like to float the 
>> > idea of officially deprecating Graph X. What that would mean (to me) is we 
>> > would update the docs to indicate that Graph X is deprecated and its APIs
>> > may be removed at any time in the future.
>> >
>> > Alternatively, we could mark it as "unmaintained and in search of 
>> > maintainers" with a note that if no maintainers are found, we may remove 
>> > it in a future minor version.
>> >
>> > Looking at the source graph X, I don't see any meaningful active 
>> > development going back over three years*. There is even a thread on user@ 
>> > from 2017 asking if graph X is maintained anymore, with no response from 
>> > the developers.
>> >
>> > Now I'm open to the idea that GraphX is stable and "works as is" and 
>> > simply doesn't require modifications but given the user thread I'm a 
>> > little concerned here about bringing this API with us into Spark 4 if we 
>> > don't have anyone signed up to maintain it.
>> >
>> > * Excluding globally applied changes
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Fight Health Insurance: https://www.fighthealthinsurance.com/
>> > Books (Learning Spark, High Performance Spark, etc.): 
>> > https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> > Pronouns: she/her
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Holden Karau
Interesting, personally there are many use cases where I would recommend
RDDs — definitely to more advanced users — and I think that RDDs and GraphX
are in pretty different boats (RDDs are very actively used).

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Fri, Oct 4, 2024 at 3:08 PM Mark Hamstra  wrote:

> No, I wouldn't encourage anyone to base a new production deployment on
> GraphX, but neither would I encourage new production deployments based
> on the RDD API without deep study and understanding of the
> implications and limitations. What I would be most comfortable with is
> documenting the current status and shortcomings of GraphX, along with
> encouraging contributions to remedy that situation. I'm not sure what
> the best (or even an effective) way of accomplishing that is, but I'm
> pretty sure it's not just labeling GraphX as deprecated.
>
> On Fri, Oct 4, 2024 at 3:00 PM Holden Karau 
> wrote:
> >
> > Personally I think people should not depend on it — there’s literally no
> one working on it, and not being up front about that I think draws
> everything else into question.
> >
> > Would anyone here feel comfortable using GraphX for a new production
> deployment today?
> >
> >
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
> >
> >
> > On Fri, Oct 4, 2024 at 2:56 PM Mark Hamstra 
> wrote:
> >>
> >> I'm -1(*) because, while it technically means "might be removed in the
> >> future", I think developers and users are more prone to interpret
> >> something being marked as deprecated as "very likely will be removed
> >> in the future, so don't depend on this or waste your time contributing
> >> to its further development." I don't think the latter is what we want
> >> just because something hasn't been updated meaningfully in a while.
> >> There have been How To articles for GraphX and Graph Frames posted in
> >> the not too distant past, and the Google Search trend shows a pretty
> >> steady level of interest, not a decline to zero, so I don't think that
> >> it is accurate to declare that there is no use or interest in GraphX.
> >>
> >> Unless retaining GraphX is imposing significant costs on continuing
> >> Spark development, I can't support deprecating GraphX. I can support
> >> encouraging GraphX and Graph Frames development through something like
> >> a To Do list or document of "What we'd like to see in the way of
> >> further development of Spark's graph processing capabilities" -- i.e.,
> >> things that encourage and support new contributions to address any
> >> shortcomings in Spark's graph processing, not things that discourage
> >> contributions and use in the way that I believe simply declaring
> >> GraphX to be deprecated would.
> >>
> >>
> >> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau 
> wrote:
> >> >
> >> > Since we're getting close to cutting a 4.0 branch I'd like to float
> the idea of officially deprecating Graph X. What that would mean (to me) is
> we would update the docs to indicate that Graph X is deprecated and its
> APIs may be removed at any time in the future.
> >> >
> >> > Alternatively, we could mark it as "unmaintained and in search of
> maintainers" with a note that if no maintainers are found, we may remove it
> in a future minor version.
> >> >
> >> > Looking at the source graph X, I don't see any meaningful active
> development going back over three years*. There is even a thread on user@
> from 2017 asking if graph X is maintained anymore, with no response from
> the developers.
> >> >
> >> > Now I'm open to the idea that GraphX is stable and "works as is" and
> simply doesn't require modifications but given the user thread I'm a little
> concerned here about bringing this API with us into Spark 4 if we don't
> have anyone signed up to maintain it.
> >> >
> >> > * Excluding globally applied changes
> >> > --
> >> > Twitter: https://twitter.com/holdenkarau
> >> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> >> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >> > Pronouns: she/her
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Sean Owen
I could flip this argument around. More strongly, *not* being deprecated
means "won't be removed" and likewise implies support and development. I
don't think either of the latter have been true for years. What suggests
this will change? A todo list is not going to do anything, IMHO.

I'm also concerned about the cost of that, which I have observed. GraphX
PRs are almost certainly not going to be reviewed because of its state.
Deprecation both communicates that reality, and leaves an option open,
whereas not deprecating forecloses that option for a while.

I don't think the question is, does anyone use it? because anyone can
continue to use it -- in Spark 3.x for sure, and in 4.x if not removed.
You can't say nothing is removable until there are no users.

Also, why would GraphFrames not be the logical home of this going forward
anyway? which I think is the subtext.

On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra  wrote:

> I'm -1(*) because, while it technically means "might be removed in the
> future", I think developers and users are more prone to interpret
> something being marked as deprecated as "very likely will be removed
> in the future, so don't depend on this or waste your time contributing
> to its further development." I don't think the latter is what we want
> just because something hasn't been updated meaningfully in a while.
> There have been How To articles for GraphX and Graph Frames posted in
> the not too distant past, and the Google Search trend shows a pretty
> steady level of interest, not a decline to zero, so I don't think that
> it is accurate to declare that there is no use or interest in GraphX.
>
> Unless retaining GraphX is imposing significant costs on continuing
> Spark development, I can't support deprecating GraphX. I can support
> encouraging GraphX and Graph Frames development through something like
> a To Do list or document of "What we'd like to see in the way of
> further development of Spark's graph processing capabilities" -- i.e.,
> things that encourage and support new contributions to address any
> shortcomings in Spark's graph processing, not things that discourage
> contributions and use in the way that I believe simply declaring
> GraphX to be deprecated would.
>
>
> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau 
> wrote:
> >
> > Since we're getting close to cutting a 4.0 branch I'd like to float the
> idea of officially deprecating Graph X. What that would mean (to me) is we
> would update the docs to indicate that Graph X is deprecated and its APIs
> may be removed at any time in the future.
> >
> > Alternatively, we could mark it as "unmaintained and in search of
> maintainers" with a note that if no maintainers are found, we may remove it
> in a future minor version.
> >
> > Looking at the source graph X, I don't see any meaningful active
> development going back over three years*. There is even a thread on user@
> from 2017 asking if graph X is maintained anymore, with no response from
> the developers.
> >
> > Now I'm open to the idea that GraphX is stable and "works as is" and
> simply doesn't require modifications but given the user thread I'm a little
> concerned here about bringing this API with us into Spark 4 if we don't
> have anyone signed up to maintain it.
> >
> > * Excluding globally applied changes
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Mark Hamstra
No, I wouldn't encourage anyone to base a new production deployment on
GraphX, but neither would I encourage new production deployments based
on the RDD API without deep study and understanding of the
implications and limitations. What I would be most comfortable with is
documenting the current status and shortcomings of GraphX, along with
encouraging contributions to remedy that situation. I'm not sure what
the best (or even an effective) way of accomplishing that is, but I'm
pretty sure it's not just labeling GraphX as deprecated.

On Fri, Oct 4, 2024 at 3:00 PM Holden Karau  wrote:
>
> Personally I think people should not depend on it — there’s literally no one 
> working on it, and not being up front about that I think draws everything 
> else into question.
>
> Would anyone here feel comfortable using GraphX for a new production 
> deployment today?
>
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Fri, Oct 4, 2024 at 2:56 PM Mark Hamstra  wrote:
>>
>> I'm -1(*) because, while it technically means "might be removed in the
>> future", I think developers and users are more prone to interpret
>> something being marked as deprecated as "very likely will be removed
>> in the future, so don't depend on this or waste your time contributing
>> to its further development." I don't think the latter is what we want
>> just because something hasn't been updated meaningfully in a while.
>> There have been How To articles for GraphX and Graph Frames posted in
>> the not too distant past, and the Google Search trend shows a pretty
>> steady level of interest, not a decline to zero, so I don't think that
>> it is accurate to declare that there is no use or interest in GraphX.
>>
>> Unless retaining GraphX is imposing significant costs on continuing
>> Spark development, I can't support deprecating GraphX. I can support
>> encouraging GraphX and Graph Frames development through something like
>> a To Do list or document of "What we'd like to see in the way of
>> further development of Spark's graph processing capabilities" -- i.e.,
>> things that encourage and support new contributions to address any
>> shortcomings in Spark's graph processing, not things that discourage
>> contributions and use in the way that I believe simply declaring
>> GraphX to be deprecated would.
>>
>>
>> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau  wrote:
>> >
>> > Since we're getting close to cutting a 4.0 branch I'd like to float the 
>> > idea of officially deprecating Graph X. What that would mean (to me) is we 
>> > would update the docs to indicate that Graph X is deprecated and its APIs
>> > may be removed at any time in the future.
>> >
>> > Alternatively, we could mark it as "unmaintained and in search of 
>> > maintainers" with a note that if no maintainers are found, we may remove 
>> > it in a future minor version.
>> >
>> > Looking at the source graph X, I don't see any meaningful active 
>> > development going back over three years*. There is even a thread on user@ 
>> > from 2017 asking if graph X is maintained anymore, with no response from 
>> > the developers.
>> >
>> > Now I'm open to the idea that GraphX is stable and "works as is" and 
>> > simply doesn't require modifications but given the user thread I'm a 
>> > little concerned here about bringing this API with us into Spark 4 if we 
>> > don't have anyone signed up to maintain it.
>> >
>> > * Excluding globally applied changes
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Fight Health Insurance: https://www.fighthealthinsurance.com/
>> > Books (Learning Spark, High Performance Spark, etc.): 
>> > https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> > Pronouns: she/her
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Holden Karau
Personally I think people should not depend on it — there’s literally no
one working on it, and not being up front about that I think draws
everything else into question.

Would anyone here feel comfortable using GraphX for a new production
deployment today?


Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/

Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Fri, Oct 4, 2024 at 2:56 PM Mark Hamstra  wrote:

> I'm -1(*) because, while it technically means "might be removed in the
> future", I think developers and users are more prone to interpret
> something being marked as deprecated as "very likely will be removed
> in the future, so don't depend on this or waste your time contributing
> to its further development." I don't think the latter is what we want
> just because something hasn't been updated meaningfully in a while.
> There have been How To articles for GraphX and Graph Frames posted in
> the not too distant past, and the Google Search trend shows a pretty
> steady level of interest, not a decline to zero, so I don't think that
> it is accurate to declare that there is no use or interest in GraphX.
>
> Unless retaining GraphX is imposing significant costs on continuing
> Spark development, I can't support deprecating GraphX. I can support
> encouraging GraphX and Graph Frames development through something like
> a To Do list or document of "What we'd like to see in the way of
> further development of Spark's graph processing capabilities" -- i.e.,
> things that encourage and support new contributions to address any
> shortcomings in Spark's graph processing, not things that discourage
> contributions and use in the way that I believe simply declaring
> GraphX to be deprecated would.
>
>
> On Sun, Sep 29, 2024 at 11:04 AM Holden Karau 
> wrote:
> >
> > Since we're getting close to cutting a 4.0 branch I'd like to float the
> idea of officially deprecating Graph X. What that would mean (to me) is we
> would update the docs to indicate that Graph X is deprecated and its APIs
> may be removed at any time in the future.
> >
> > Alternatively, we could mark it as "unmaintained and in search of
> maintainers" with a note that if no maintainers are found, we may remove it
> in a future minor version.
> >
> > Looking at the source graph X, I don't see any meaningful active
> development going back over three years*. There is even a thread on user@
> from 2017 asking if graph X is maintained anymore, with no response from
> the developers.
> >
> > Now I'm open to the idea that GraphX is stable and "works as is" and
> simply doesn't require modifications but given the user thread I'm a little
> concerned here about bringing this API with us into Spark 4 if we don't
> have anyone signed up to maintain it.
> >
> > * Excluding globally applied changes
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-04 Thread Mark Hamstra
-1(*) reasoning posted in the DISCUSS thread

On Mon, Sep 30, 2024 at 12:40 PM Holden Karau  wrote:
>
> I think it has been de-facto deprecated, we haven’t updated it meaningfully 
> in several years. I think removing the API would be excessive but deprecating 
> it would give us the flexibility to remove it in the not too distant future.
>
> That being said this is not a vote to remove GraphX, I think that whenever 
> that time comes (if it does) we should have a separate vote
>
> This VOTE will be open for a little more than one week, ending on October 
> 8th*. To vote reply with:
> +1 Deprecate GraphX
> 0 I’m indifferent
> -1 Don’t deprecate GraphX because ABC
>
> If you have a binding vote, to simplify your tallying at the end please mark
> your vote with a *.
>
> (*mostly because I’m going camping for my birthday)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Mark Hamstra
I'm -1(*) because, while it technically means "might be removed in the
future", I think developers and users are more prone to interpret
something being marked as deprecated as "very likely will be removed
in the future, so don't depend on this or waste your time contributing
to its further development." I don't think the latter is what we want
just because something hasn't been updated meaningfully in a while.
There have been How To articles for GraphX and Graph Frames posted in
the not too distant past, and the Google Search trend shows a pretty
steady level of interest, not a decline to zero, so I don't think that
it is accurate to declare that there is no use or interest in GraphX.

Unless retaining GraphX is imposing significant costs on continuing
Spark development, I can't support deprecating GraphX. I can support
encouraging GraphX and Graph Frames development through something like
a To Do list or document of "What we'd like to see in the way of
further development of Spark's graph processing capabilities" -- i.e.,
things that encourage and support new contributions to address any
shortcomings in Spark's graph processing, not things that discourage
contributions and use in the way that I believe simply declaring
GraphX to be deprecated would.
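
(For concreteness on what is being debated: in Scala, a code-level deprecation
is an annotation that turns call sites into compiler warnings rather than
errors, which is why it leaves the option of later removal open without
breaking anyone immediately. The sketch below is purely illustrative, with
invented package, object, and method names; nothing in this thread proposes
this exact change, and the current proposal concerns a documentation note
rather than code annotations.)

package org.example.sketch

object LegacyGraphOps {
  // A deprecation message plus the version it applies from; callers still compile.
  @deprecated("Unmaintained; may be removed in a future release", "4.0.0")
  def legacyScore(iterations: Int): Double = {
    // Dummy body for the sketch; an existing implementation would be left untouched.
    0.15 * iterations
  }
}

object Caller {
  def main(args: Array[String]): Unit = {
    // Compiling this call site emits a deprecation warning, not an error.
    println(LegacyGraphOps.legacyScore(10))
  }
}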


On Sun, Sep 29, 2024 at 11:04 AM Holden Karau  wrote:
>
> Since we're getting close to cutting a 4.0 branch I'd like to float the idea 
> of officially deprecating Graph X. What that would mean (to me) is we would 
> update the docs to indicate that Graph X is deprecated and its APIs may be
> removed at any time in the future.
>
> Alternatively, we could mark it as "unmaintained and in search of 
> maintainers" with a note that if no maintainers are found, we may remove it 
> in a future minor version.
>
> Looking at the source graph X, I don't see any meaningful active development 
> going back over three years*. There is even a thread on user@ from 2017 
> asking if graph X is maintained anymore, with no response from the developers.
>
> Now I'm open to the idea that GraphX is stable and "works as is" and simply 
> doesn't require modifications but given the user thread I'm a little 
> concerned here about bringing this API with us into Spark 4 if we don't have 
> anyone signed up to maintain it.
>
> * Excluding globally applied changes
> --
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-04 Thread Ángel
The graphframes library depends on GraphX and has changed recently (3
months ago).

https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/GraphFrame.scala




On Fri, Oct 4, 2024, 11:35, Nimrod Ofek  wrote:

> Hi,
>
> Did anyone do any search about the GraphX API in Gitlab/Github and
> different search engines to see if they are searched and actually used - or
> we are considering it not used because the API wasn't changed?
>
> Thanks!
> Nimrod
>
> On Mon, Sep 30, 2024 at 9:02 PM Holden Karau 
> wrote:
>
>> I think it has been de-facto deprecated, we haven’t updated it
>> meaningfully in several years. I think removing the API would be excessive
>> but deprecating it would give us the flexibility to remove it in the not
>> too distant future.
>>
>> That being said this is not a vote to remove GraphX, I think that
>> whenever that time comes (if it does) we should have a separate vote
>>
>> This VOTE will be open for a little more than one week, ending on October
>> 8th*. To vote reply with:
>> +1 Deprecate GraphX
>> 0 I’m indifferent
>> -1 Don’t deprecate GraphX because ABC
>>
>> If you have a binding vote, to simplify your tallying at the end please
>> mark your vote with a *.
>>
>> (*mostly because I’m going camping for my birthday)
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>


Re: [VOTE][RESULT] Single-pass Analyzer for Catalyst

2024-10-04 Thread Vladimir Golubev
Looks like I missed a vote from Mich Talebzadeh.

Updated list, 15 +1s (8 bindings, * = binding):

+1:
Reynold Xin (*)
Herman van Hovell (*)
Dongjoon Hyun (*)
Xiao Li (*)
Jungtaek Lim
Gengliang Wang (*)
John Zhuge
Yang Jie
Mridul Muralidharan (*)
Peter Toth
Wenchen Fan (*)
L. C. Hsieh (*)
Angel Alvarez
Mich Talebzadeh
Vladimir Golubev

+0: None

-1: None

Vladimir.

On Fri, Oct 4, 2024 at 11:34 AM Mich Talebzadeh 
wrote:

> yes it was +1
>
> +1 on the assumption that we should phase this release on an incremental
> basis. Probably will take us to the end of release 5.
>
> cheers
>
> Mich Talebzadeh,
>
> Architect | Data Engineer | Data Science | Financial Crime
> PhD, Imperial College London
> London, United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand
> expert opinions" (Werner von Braun).
>
>
> On Fri, 4 Oct 2024 at 10:10, Vladimir Golubev  wrote:
>
>> Oh I see, you mean "+1 IF we do it incrementally"
>>
>> On Fri, Oct 4, 2024 at 11:02 AM Vladimir Golubev 
>> wrote:
>>
>>> Hey Mich. Maybe I did... Was it a +1? I just see "+1 on the assumption
>>> that we should phase this release on an incremental basis."
>>>
>>> On Fri, Oct 4, 2024 at 10:50 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 I believe you missed my vote

 cheers


 Mich Talebzadeh,

 Architect | Data Engineer | Data Science | Financial Crime
 PhD, Imperial College London
 London, United Kingdom


view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* The information provided is correct to the best of my
 knowledge but of course cannot be guaranteed. It is essential to note
 that, as with any advice, "one test result is worth one-thousand
 expert opinions" (Werner von Braun).


 On Fri, 4 Oct 2024 at 09:09, Vladimir Golubev 
 wrote:

> Hi folks!
>
> The vote for 'SPIP: Single-pass Analyzer for Catalyst' passed with 14
> +1s (8 bindings, * = binding):
>
> +1:
> Reynold Xin (*)
> Herman van Hovell (*)
> Dongjoon Hyun (*)
> Xiao Li (*)
> Jungtaek Lim
> Gengliang Wang (*)
> John Zhuge
> Yang Jie
> Mridul Muralidharan (*)
> Peter Toth
> Wenchen Fan (*)
> L. C. Hsieh (*)
> Angel Alvarez
> Vladimir Golubev
>
> +0: None
>
> -1: None
>
> Thanks to all participants!
>
> Vladimir.
>



Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-04 Thread Hyukjin Kwon
FWIW, this is the Google Search trend over the last 5 years:

[image: Screenshot 2024-10-04 at 6.54.42 PM.png]

I think it's fine to deprecate it

On Fri, 4 Oct 2024 at 18:40, Nimrod Ofek  wrote:

> Hi,
>
> Did anyone do any search about the GraphX API in Gitlab/Github and
> different search engines to see if they are searched and actually used - or
> we are considering it not used because the API wasn't changed?
>
> Thanks!
> Nimrod
>
> On Mon, Sep 30, 2024 at 9:02 PM Holden Karau 
> wrote:
>
>> I think it has been de-facto deprecated, we haven’t updated it
>> meaningfully in several years. I think removing the API would be excessive
>> but deprecating it would give us the flexibility to remove it in the not
>> too distant future.
>>
>> That being said this is not a vote to remove GraphX, I think that
>> whenever that time comes (if it does) we should have a separate vote
>>
>> This VOTE will be open for a little more than one week, ending on October
>> 8th*. To vote reply with:
>> +1 Deprecate GraphX
>> 0 I’m indifferent
>> -1 Don’t deprecate GraphX because ABC
>>
>> If you have a binding vote, to simplify your tallying at the end please
>> mark your vote with a *.
>>
>> (*mostly because I’m going camping for my birthday)
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>


Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-04 Thread Nimrod Ofek
Hi,

Did anyone do any search about the GraphX API in Gitlab/Github and
different search engines to see if they are searched and actually used - or
we are considering it not used because the API wasn't changed?

Thanks!
Nimrod

On Mon, Sep 30, 2024 at 9:02 PM Holden Karau  wrote:

> I think it has been de-facto deprecated, we haven’t updated it
> meaningfully in several years. I think removing the API would be excessive
> but deprecating it would give us the flexibility to remove it in the not
> too distant future.
>
> That being said this is not a vote to remove GraphX, I think that whenever
> that time comes (if it does) we should have a separate vote
>
> This VOTE will be open for a little more than one week, ending on October
> 8th*. To vote reply with:
> +1 Deprecate GraphX
> 0 I’m indifferent
> -1 Don’t deprecate GraphX because ABC
>
> If you have a binding vote, to simplify the tallying at the end please mark
> your vote with a *.
>
> (*mostly because I’m going camping for my birthday)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
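
One rough way to answer the usage question for code you can access locally is to scan for GraphX imports; public code search on GitHub/GitLab would give a broader signal. A minimal sketch follows - the package string is the real GraphX namespace, while the paths, file extensions, and the idea of scanning only .scala/.java files are assumptions.

  // Minimal local scan: count source files that reference org.apache.spark.graphx.
  // Paths and extensions are assumptions; this says nothing about public usage.
  import java.nio.file.{Files, Paths}
  import scala.jdk.CollectionConverters._
  import scala.util.Using

  object GraphXUsageScan {
    def main(args: Array[String]): Unit = {
      val root = Paths.get(args.headOption.getOrElse("."))
      val hits = Using.resource(Files.walk(root)) { stream =>
        stream.iterator().asScala
          .filter(p => Files.isRegularFile(p))
          .filter(p => p.toString.endsWith(".scala") || p.toString.endsWith(".java"))
          // readString assumes UTF-8 text files
          .filter(p => Files.readString(p).contains("org.apache.spark.graphx"))
          .toList
      }
      println(s"${hits.size} files reference org.apache.spark.graphx")
      hits.foreach(println)
    }
  }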


Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-03 Thread Hyukjin Kwon
+1

On Fri, 4 Oct 2024 at 07:14, Ángel  wrote:

> -1 Don’t deprecate GraphX because it may be useful for some people and ...
> would there be any replacement for that API? Anyway, I don't think
> deprecating an API only because it hasn't been updated in ages is a good
> practice (but I could be perfectly wrong).
>
On Thu, Oct 3, 2024 at 16:31, Wenchen Fan wrote:
>
>> +1
>>
>> On Tue, Oct 1, 2024 at 4:20 PM beliefer  wrote:
>>
>>> +1.
>>>
>>> I didn't hear users need it.
>>>
>>>
>>> At 2024-10-01 02:01:17, "Holden Karau"  wrote:
>>>
>>> I think it has been de-facto deprecated, we haven’t updated it
>>> meaningfully in several years. I think removing the API would be excessive
>>> but deprecating it would give us the flexibility to remove it in the not
>>> too distant future.
>>>
>>> That being said this is not a vote to remove GraphX, I think that
>>> whenever that time comes (if it does) we should have a separate vote
>>>
>>> This VOTE will be open for a little more than one week, ending on
>>> October 8th*. To vote reply with:
>>> +1 Deprecate GraphX
>>> 0 I’m indifferent
>>> -1 Don’t deprecate GraphX because ABC
>>>
>>> If you have a binding vote, to simplify the tallying at the end please
>>> mark your vote with a *.
>>>
>>> (*mostly because I’m going camping for my birthday)
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> 
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> Pronouns: she/her
>>>
>>>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Mich Talebzadeh
+1 on the assumption that we should phase this release on an incremental
basis. It will probably take us to the end of release 5.

HTH

Mich Talebzadeh,

Architect | Data Engineer | Data Science | Financial Crime
PhD, Imperial College London
London, United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner Von Braun)".


On Thu, 3 Oct 2024 at 21:50, Ángel  wrote:

> +1
>
> On Thu, Oct 3, 2024 at 20:06, Wenchen Fan wrote:
>
>> +1
>>
>> On Wed, Oct 2, 2024 at 7:50 AM Peter Toth  wrote:
>>
>>> +1
>>>
>>> On Tue, Oct 1, 2024, 08:33 Yang Jie  wrote:
>>>
 +1, Thanks

 Jie Yang

 On 2024/10/01 03:26:40 John Zhuge wrote:
 > +1 (non-binding)
 >
 > On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
 >  wrote:
 >
 > > +1
 > >
 > > On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com>
 > > wrote:
 > >
 > >> +1 (non-binding), promising proposal!
 > >>
 > >> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
 > >>
 > >>> Thank you for the swift clarification, Reynold and Xiao.
 > >>>
 > >>> It seems that the Target Version was set mistakenly initially.
 > >>>
 > >>> I removed the `Target Version` from the SPIP JIRA.
 > >>>
 > >>> https://issues.apache.org/jira/browse/SPARK-49834
 > >>>
 > >>> I'm switching my cast to +1 for this SPIP vote.
 > >>>
 > >>> Thanks,
 > >>> Dongjoon.
 > >>>
 > >>> On 2024/09/30 22:55:41 Xiao Li wrote:
 > >>> > +1 in support of the direction of the Single-pass Analyzer for
 > >>> Catalyst.
 > >>> >
 > >>> > I think we should not have a target version for the new Catalyst
 > >>> SPARK-49834
 > >>> > . It should
 not be
 > >>> a
 > >>> > blocker for Spark 4.0. When implementing the new analyzer, the
 code
 > >>> changes
 > >>> > must not affect users of the existing analyzer to avoid any
 user-facing
 > >>> > impacts.
 > >>> >
 > >>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
 > >>> >
 > >>> > > I don't actually "lead" this. But I don't think this needs to
 target
 > >>> a
 > >>> > > specific Spark version given it should not have any user
 facing
 > >>> > > consequences?
 > >>> > >
 > >>> > >
 > >>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun <
 dongj...@apache.org>
 > >>> wrote:
 > >>> > >
 > >>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
 > >>> > >>
 > >>> > >> I'm wondering if this is really achievable goal for Apache
 Spark
 > >>> 4.0.0.
 > >>> > >>
 > >>> > >> If it's expected that we are unable to deliver it, shall we
 > >>> postpone this
 > >>> > >> vote until 4.1.0 planning?
 > >>> > >>
 > >>> > >> Anyway, since SPARK-49834 has a target version 4.0.0
 explicitly,
 > >>> > >>
 > >>> > >> -1 from my side.
 > >>> > >>
 > >>> > >> Thanks,
 > >>> > >> Dongjoon.
 > >>> > >>
 > >>> > >>
 > >>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
 > >>> > >> > +1
 > >>> > >> >
 > >>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
 > >>> > >> > wrote:
 > >>> > >> >
 > >>> > >> > > +1
 > >>> > >> > >
 > >>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
 > >>> vvdr@gmail.com>
 > >>> > >> > > wrote:
 > >>> > >> > >
 > >>> > >> > >> Hi all,
 > >>> > >> > >>
 > >>> > >> > >> I’d like to start a vote for a single-pass Analyzer for
 the
 > >>> Catalyst
 > >>> > >> > >> project. This project will introduce a new analysis
 framework
 > >>> to the
 > >>> > >> > >> Catalyst, which will eventually replace the fixed-point
 one.
 > >>> > >> > >>
 > >>> > >> > >> Please refer to the SPIP jira:
 > >>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
 > >>> > >> > >>
 > >>> > >> > >> [ ] +1: Accept the proposal
 > >>> > >> > >> [ ] +0
 > >>> > >> > >> [ ] -1: I don’t think this is a good idea because …
 > >>> > >> > >>
 > >>> > >> > >> Thanks!
 > >>> > >> > >>
 > >>> > >> > >> Vladimir
 > >>> > >> > >>
 > >>> > >> > >
 > >>> > >> >
 > >>> > >>
 > >>> > >>
 > >>>
 -
 > >>> > >> To unsubsc

Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Ángel
+1

On Thu, Oct 3, 2024 at 20:06, Wenchen Fan wrote:

> +1
>
> On Wed, Oct 2, 2024 at 7:50 AM Peter Toth  wrote:
>
>> +1
>>
>> On Tue, Oct 1, 2024, 08:33 Yang Jie  wrote:
>>
>>> +1, Thanks
>>>
>>> Jie Yang
>>>
>>> On 2024/10/01 03:26:40 John Zhuge wrote:
>>> > +1 (non-binding)
>>> >
>>> > On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
>>> >  wrote:
>>> >
>>> > > +1
>>> > >
>>> > > On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> +1 (non-binding), promising proposal!
>>> > >>
>>> > >> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
>>> > >>
>>> > >>> Thank you for the swift clarification, Reynold and Xiao.
>>> > >>>
>>> > >>> It seems that the Target Version was set mistakenly initially.
>>> > >>>
>>> > >>> I removed the `Target Version` from the SPIP JIRA.
>>> > >>>
>>> > >>> https://issues.apache.org/jira/browse/SPARK-49834
>>> > >>>
>>> > >>> I'm switching my cast to +1 for this SPIP vote.
>>> > >>>
>>> > >>> Thanks,
>>> > >>> Dongjoon.
>>> > >>>
>>> > >>> On 2024/09/30 22:55:41 Xiao Li wrote:
>>> > >>> > +1 in support of the direction of the Single-pass Analyzer for
>>> > >>> Catalyst.
>>> > >>> >
>>> > >>> > I think we should not have a target version for the new Catalyst
>>> > >>> SPARK-49834
>>> > >>> > . It should
>>> not be
>>> > >>> a
>>> > >>> > blocker for Spark 4.0. When implementing the new analyzer, the
>>> code
>>> > >>> changes
>>> > >>> > must not affect users of the existing analyzer to avoid any
>>> user-facing
>>> > >>> > impacts.
>>> > >>> >
>>> > >>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
>>> > >>> >
>>> > >>> > > I don't actually "lead" this. But I don't think this needs to
>>> target
>>> > >>> a
>>> > >>> > > specific Spark version given it should not have any user facing
>>> > >>> > > consequences?
>>> > >>> > >
>>> > >>> > >
>>> > >>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun <
>>> dongj...@apache.org>
>>> > >>> wrote:
>>> > >>> > >
>>> > >>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
>>> > >>> > >>
>>> > >>> > >> I'm wondering if this is really achievable goal for Apache
>>> Spark
>>> > >>> 4.0.0.
>>> > >>> > >>
>>> > >>> > >> If it's expected that we are unable to deliver it, shall we
>>> > >>> postpone this
>>> > >>> > >> vote until 4.1.0 planning?
>>> > >>> > >>
>>> > >>> > >> Anyway, since SPARK-49834 has a target version 4.0.0
>>> explicitly,
>>> > >>> > >>
>>> > >>> > >> -1 from my side.
>>> > >>> > >>
>>> > >>> > >> Thanks,
>>> > >>> > >> Dongjoon.
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>>> > >>> > >> > +1
>>> > >>> > >> >
>>> > >>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>>> > >>> > >> > wrote:
>>> > >>> > >> >
>>> > >>> > >> > > +1
>>> > >>> > >> > >
>>> > >>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
>>> > >>> vvdr@gmail.com>
>>> > >>> > >> > > wrote:
>>> > >>> > >> > >
>>> > >>> > >> > >> Hi all,
>>> > >>> > >> > >>
>>> > >>> > >> > >> I’d like to start a vote for a single-pass Analyzer for
>>> the
>>> > >>> Catalyst
>>> > >>> > >> > >> project. This project will introduce a new analysis
>>> framework
>>> > >>> to the
>>> > >>> > >> > >> Catalyst, which will eventually replace the fixed-point
>>> one.
>>> > >>> > >> > >>
>>> > >>> > >> > >> Please refer to the SPIP jira:
>>> > >>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
>>> > >>> > >> > >>
>>> > >>> > >> > >> [ ] +1: Accept the proposal
>>> > >>> > >> > >> [ ] +0
>>> > >>> > >> > >> [ ] -1: I don’t think this is a good idea because …
>>> > >>> > >> > >>
>>> > >>> > >> > >> Thanks!
>>> > >>> > >> > >>
>>> > >>> > >> > >> Vladimir
>>> > >>> > >> > >>
>>> > >>> > >> > >
>>> > >>> > >> >
>>> > >>> > >>
>>> > >>> > >>
>>> > >>>
>>> -
>>> > >>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> >
>>> > >>>
>>> > >>>
>>> -
>>> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >>>
>>> > >>>
>>> >
>>> > --
>>> > John Zhuge
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Vladimir Golubev
Hi Mich! Thank you for this input.

Yes, this is exactly the approach I would propose too. Putting the new
analyzer under a flag and making the tests pass for both implementations is
crucial. We need to compare the logical (analyzed) plans and ensure that
they are identical.
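
A minimal sketch of that comparison from a test's point of view. The flag name is the one from the proposal quoted below and is not an existing Spark configuration; queryExecution.analyzed and canonicalized are existing Spark APIs, used here only to illustrate the check.

  // Sketch only: check that analysis output is identical with the (hypothetical)
  // single-pass flag off and on. The flag name comes from this thread's proposal,
  // not from Spark itself.
  import org.apache.spark.sql.SparkSession

  object AnalyzerParitySketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().master("local[*]").appName("analyzer-parity").getOrCreate()
      spark.range(10).createOrReplaceTempView("t")
      val query = "SELECT id, id + 1 AS next FROM t WHERE id > 3"

      def analyzedPlan(singlePass: Boolean): String = {
        spark.conf.set("spark.sql.analyzer.singlePass.enabled", singlePass.toString)
        // canonicalized normalizes cosmetic differences such as expression ids
        spark.sql(query).queryExecution.analyzed.canonicalized.toString
      }

      val fixedPointPlan = analyzedPlan(singlePass = false)
      val singlePassPlan = analyzedPlan(singlePass = true)
      assert(fixedPointPlan == singlePassPlan, "analyzed plans diverged between the two analyzers")
      spark.stop()
    }
  }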

On Thu, Oct 3, 2024 at 7:41 PM Mich Talebzadeh 
wrote:

>
> With a project manager hat on and having read the SPIP
> 
>
> This proposed single-pass Analyzer framework does potentially offer
> significant long-term benefits in terms of efficiency, maintenance, and
> stability, especially for large or complex queries. However, the rewrite
> involves substantial challenges, including the complexity of the
> transition, the resource cost, and the risk of breaking edge cases or
> existing workflows during the migration period. The key trade-off is
> between the upfront complexity and development time versus the potential
> long-term gains in performance, predictability, and ease of maintenance.
> Phasing the implementation could be an effective method to balance the
> risks and rewards. It allows the community to gradually transition to the
> new framework while mitigating potential disruptions. Here is a
> proposal we can consider:
>
> Phase 1: Experimental Opt-In
>- Introduce the single-pass Analyzer framework as an experimental
> feature.
>   - Allow users to opt in through a configuration setting, say
> `spark.sql.analyzer.singlePass.enabled=true`, so that developers can
> start testing their workflows with it.
>- Maintain the existing fixed-point Analyzer as the default to ensure
> stability for current users.
>- Gradually build out coverage for common SQL and DataFrame operations.
>
> Phase 2: Expanded Operator Coverage
>- Incrementally support more SQL operators, expressions, and DataFrame
> functionality as feedback and testing reveal areas of improvement.
>- Ensure unit and integration tests run against both frameworks to
> maintain backward compatibility.
>- Provide detailed documentation and migration guides so users are
> aware of the differences and can adjust their code if needed.
>
> Phase 3: Deprecation of the Fixed-Point Analyzer
>- Once the single-pass Analyzer has full coverage and has been tested
> extensively in production environments, deprecate the old fixed-point
> framework.
>- Offer a transition period where both frameworks are supported to give
> users time to adjust.
>
> Phase 4: Full Transition and Removal of Fixed-Point Framework
>- After sufficient testing and user adoption, make the single-pass
> Analyzer the default and eventually remove the old framework in a future
> major release (say Spark 5.0).
>
> Benefits of Phasing:
> - Risk Mitigation: The phased approach allows gradual adoption, reducing
> the risk of breaking existing workloads or workflows. It ensures there is
> plenty of time for testing and feedback before a full transition.
> - Early User Feedback: Users can test the new framework in their
> environments early and provide feedback, allowing developers to address
> edge cases before it becomes the default.
> - Controlled Rollout: Phasing ensures that any unforeseen issues can be
> addressed incrementally without large disruptions to Spark deployments.
>
> This approach will hopefully ensure a smooth transition to the new
> framework while balancing the trade-offs of complexity, resource
> availability and long-term gains.
>
> HTH
>
> Mich Talebzadeh,
>
> Architect | Data Engineer | Data Science | Financial Crime
> PhD, Imperial College London
> London, United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun)".
>
>
> On Mon, 30 Sept 2024 at 23:38, Reynold Xin 
> wrote:
>
>> I don't actually "lead" this. But I don't think this needs to target a
>> specific Spark version given it should not have any user facing
>> consequences?
>>
>>
>> On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you for leading this, Vladimir, Reynold, Herman.
>>>
>>> I'm wondering if this is really achievable goal for Apache Spark 4.0.0.
>>>
>>> If it's expected that we are unable to deliver it, shall we postpone
>>> this vote until 4.1.0 planning?
>>>
>>> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>>>
>>> -1 from my side.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>>> > +1
>>

Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Mich Talebzadeh
With a project manager hat on and having read the SPIP


This proposed single-pass Analyzer framework does potentially offer
significant long-term benefits in terms of efficiency, maintenance, and
stability, especially for large or complex queries. However, the rewrite
involves substantial challenges, including the complexity of the
transition, the resource cost, and the risk of breaking edge cases or
existing workflows during the migration period. The key trade-off is
between the upfront complexity and development time versus the potential
long-term gains in performance, predictability, and ease of maintenance.
Phasing the implementation could be an effective method to balance the
risks and rewards. It allows the community to gradually transition to the
new framework while mitigating potential disruptions. Here is a
proposal we can consider:

Phase 1: Experimental Opt-In
   - Introduce the single-pass Analyzer framework as an experimental
feature.
   - Allow users to opt in through a configuration setting, say
`spark.sql.analyzer.singlePass.enabled=true`, so that developers can
start testing their workflows with it.
   - Maintain the existing fixed-point Analyzer as the default to ensure
stability for current users.
   - Gradually build out coverage for common SQL and DataFrame operations.

Phase 2: Expanded Operator Coverage
   - Incrementally support more SQL operators, expressions, and DataFrame
functionality as feedback and testing reveal areas of improvement.
   - Ensure unit and integration tests run against both frameworks to
maintain backward compatibility.
   - Provide detailed documentation and migration guides so users are aware
of the differences and can adjust their code if needed.

Phase 3: Deprecation of the Fixed-Point Analyzer
   - Once the single-pass Analyzer has full coverage and has been tested
extensively in production environments, deprecate the old fixed-point
framework.
   - Offer a transition period where both frameworks are supported to give
users time to adjust.

Phase 4: Full Transition and Removal of Fixed-Point Framework
   - After sufficient testing and user adoption, make the single-pass
Analyzer the default and eventually remove the old framework in a future
major release (say Spark 5.0).

Benefits of Phasing:
- Risk Mitigation: The phased approach allows gradual adoption, reducing
the risk of breaking existing workloads or workflows. It ensures there is
plenty of time for testing and feedback before a full transition.
- Early User Feedback: Users can test the new framework in their
environments early and provide feedback, allowing developers to address
edge cases before it becomes the default.
- Controlled Rollout: Phasing ensures that any unforeseen issues can be
addressed incrementally without large disruptions to Spark deployments.

This approach will hopefully ensure a smooth transition to the new
framework while balancing the trade-offs of complexity, resource
availability and long-term gains.

HTH

Mich Talebzadeh,

Architect | Data Engineer | Data Science | Financial Crime
PhD, Imperial College London
London, United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner Von Braun)".


On Mon, 30 Sept 2024 at 23:38, Reynold Xin 
wrote:

> I don't actually "lead" this. But I don't think this needs to target a
> specific Spark version given it should not have any user facing
> consequences?
>
>
> On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun  wrote:
>
>> Thank you for leading this, Vladimir, Reynold, Herman.
>>
>> I'm wondering if this is really achievable goal for Apache Spark 4.0.0.
>>
>> If it's expected that we are unable to deliver it, shall we postpone this
>> vote until 4.1.0 planning?
>>
>> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>>
>> -1 from my side.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>> > +1
>> >
>> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>> > wrote:
>> >
>> > > +1
>> > >
>> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev 
>> > > wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> I’d like to start a vote for a single-pass Analyzer for the Catalyst
>> > >> project. This project will introduce a new analysis framework to the
>> > >> Catalyst, which will eventually replace the fixed-point one.
>> > >>
>> > >> Please refer to the SPIP jira:
>> > >> https://issues.apache.org/jira/browse/SPARK-4983
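
A toy sketch of the flag-gated dispatch Phase 1 above describes: the default path stays untouched and the opt-in config selects the new implementation. The trait and object names are illustrative, not Catalyst internals; only the flag name comes from the proposal above.

  // Toy model of an opt-in analyzer switch; not Spark/Catalyst source.
  object AnalyzerSwitchSketch {
    trait Analyzer { def analyze(plan: String): String }

    object FixedPointAnalyzer extends Analyzer {
      def analyze(plan: String): String = s"fixed-point($plan)" // existing default behaviour
    }
    object SinglePassAnalyzer extends Analyzer {
      def analyze(plan: String): String = s"single-pass($plan)" // new, opt-in behaviour
    }

    // The flag name is the one proposed above; it is not an existing Spark conf.
    def chooseAnalyzer(conf: Map[String, String]): Analyzer =
      if (conf.getOrElse("spark.sql.analyzer.singlePass.enabled", "false").toBoolean) SinglePassAnalyzer
      else FixedPointAnalyzer

    def main(args: Array[String]): Unit = {
      println(chooseAnalyzer(Map.empty).analyze("Project(a)"))
      println(chooseAnalyzer(Map("spark.sql.analyzer.singlePass.enabled" -> "true")).analyze("Project(a)"))
    }
  }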

Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-03 Thread Ángel
-1 Don’t deprecate GraphX because it may be useful for some people and ...
would there be any replacement for that API? Anyway, I don't think
deprecating an API only because it hasn't been updated in ages is a good
practice (but I could be perfectly wrong).

On Thu, Oct 3, 2024 at 16:31, Wenchen Fan wrote:

> +1
>
> On Tue, Oct 1, 2024 at 4:20 PM beliefer  wrote:
>
>> +1.
>>
>> I didn't hear users need it.
>>
>>
>> At 2024-10-01 02:01:17, "Holden Karau"  wrote:
>>
>> I think it has been de-facto deprecated, we haven’t updated it
>> meaningfully in several years. I think removing the API would be excessive
>> but deprecating it would give us the flexibility to remove it in the not
>> too distant future.
>>
>> That being said this is not a vote to remove GraphX, I think that
>> whenever that time comes (if it does) we should have a separate vote
>>
>> This VOTE will be open for a little more than one week, ending on October
>> 8th*. To vote reply with:
>> +1 Deprecate GraphX
>> 0 I’m indifferent
>> -1 Don’t deprecate GraphX because ABC
>>
>> If you have a binding vote, to simplify the tallying at the end please
>> mark your vote with a *.
>>
>> (*mostly because I’m going camping for my birthday)
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread L. C. Hsieh
+1

On Thu, Oct 3, 2024 at 7:31 AM Wenchen Fan  wrote:
>
> +1
>
> On Wed, Oct 2, 2024 at 7:50 AM Peter Toth  wrote:
>>
>> +1
>>
>>
>> On Tue, Oct 1, 2024, 08:33 Yang Jie  wrote:
>>>
>>> +1, Thanks
>>>
>>> Jie Yang
>>>
>>> On 2024/10/01 03:26:40 John Zhuge wrote:
>>> > +1 (non-binding)
>>> >
>>> > On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
>>> >  wrote:
>>> >
>>> > > +1
>>> > >
>>> > > On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim 
>>> > > 
>>> > > wrote:
>>> > >
>>> > >> +1 (non-binding), promising proposal!
>>> > >>
>>> > >> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
>>> > >>
>>> > >>> Thank you for the swift clarification, Reynold and Xiao.
>>> > >>>
>>> > >>> It seems that the Target Version was set mistakenly initially.
>>> > >>>
>>> > >>> I removed the `Target Version` from the SPIP JIRA.
>>> > >>>
>>> > >>> https://issues.apache.org/jira/browse/SPARK-49834
>>> > >>>
>>> > >>> I'm switching my cast to +1 for this SPIP vote.
>>> > >>>
>>> > >>> Thanks,
>>> > >>> Dongjoon.
>>> > >>>
>>> > >>> On 2024/09/30 22:55:41 Xiao Li wrote:
>>> > >>> > +1 in support of the direction of the Single-pass Analyzer for
>>> > >>> Catalyst.
>>> > >>> >
>>> > >>> > I think we should not have a target version for the new Catalyst
>>> > >>> SPARK-49834
>>> > >>> > . It should not 
>>> > >>> > be
>>> > >>> a
>>> > >>> > blocker for Spark 4.0. When implementing the new analyzer, the code
>>> > >>> changes
>>> > >>> > must not affect users of the existing analyzer to avoid any 
>>> > >>> > user-facing
>>> > >>> > impacts.
>>> > >>> >
>>> > >>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
>>> > >>> >
>>> > >>> > > I don't actually "lead" this. But I don't think this needs to 
>>> > >>> > > target
>>> > >>> a
>>> > >>> > > specific Spark version given it should not have any user facing
>>> > >>> > > consequences?
>>> > >>> > >
>>> > >>> > >
>>> > >>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
>>> > >>> > > 
>>> > >>> wrote:
>>> > >>> > >
>>> > >>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
>>> > >>> > >>
>>> > >>> > >> I'm wondering if this is really achievable goal for Apache Spark
>>> > >>> 4.0.0.
>>> > >>> > >>
>>> > >>> > >> If it's expected that we are unable to deliver it, shall we
>>> > >>> postpone this
>>> > >>> > >> vote until 4.1.0 planning?
>>> > >>> > >>
>>> > >>> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>>> > >>> > >>
>>> > >>> > >> -1 from my side.
>>> > >>> > >>
>>> > >>> > >> Thanks,
>>> > >>> > >> Dongjoon.
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>>> > >>> > >> > +1
>>> > >>> > >> >
>>> > >>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>>> > >>> > >> > wrote:
>>> > >>> > >> >
>>> > >>> > >> > > +1
>>> > >>> > >> > >
>>> > >>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
>>> > >>> vvdr@gmail.com>
>>> > >>> > >> > > wrote:
>>> > >>> > >> > >
>>> > >>> > >> > >> Hi all,
>>> > >>> > >> > >>
>>> > >>> > >> > >> I’d like to start a vote for a single-pass Analyzer for the
>>> > >>> Catalyst
>>> > >>> > >> > >> project. This project will introduce a new analysis 
>>> > >>> > >> > >> framework
>>> > >>> to the
>>> > >>> > >> > >> Catalyst, which will eventually replace the fixed-point one.
>>> > >>> > >> > >>
>>> > >>> > >> > >> Please refer to the SPIP jira:
>>> > >>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
>>> > >>> > >> > >>
>>> > >>> > >> > >> [ ] +1: Accept the proposal
>>> > >>> > >> > >> [ ] +0
>>> > >>> > >> > >> [ ] -1: I don’t think this is a good idea because …
>>> > >>> > >> > >>
>>> > >>> > >> > >> Thanks!
>>> > >>> > >> > >>
>>> > >>> > >> > >> Vladimir
>>> > >>> > >> > >>
>>> > >>> > >> > >
>>> > >>> > >> >
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> -
>>> > >>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> >
>>> > >>>
>>> > >>> -
>>> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >>>
>>> > >>>
>>> >
>>> > --
>>> > John Zhuge
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-03 Thread Denny Lee
+1 (non-binding)

On Thu, Oct 3, 2024 at 7:40 AM Wenchen Fan  wrote:

> +1
>
> On Tue, Oct 1, 2024 at 4:20 PM beliefer  wrote:
>
>> +1.
>>
>> I didn't hear users need it.
>>
>>
>> At 2024-10-01 02:01:17, "Holden Karau"  wrote:
>>
>> I think it has been de-facto deprecated, we haven’t updated it
>> meaningfully in several years. I think removing the API would be excessive
>> but deprecating it would give us the flexibility to remove it in the not
>> too distant future.
>>
>> That being said this is not a vote to remove GraphX, I think that
>> whenever that time comes (if it does) we should have a separate vote
>>
>> This VOTE will be open for a little more than one week, ending on October
>> 8th*. To vote reply with:
>> +1 Deprecate GraphX
>> 0 I’m indifferent
>> -1 Don’t deprecate GraphX because ABC
>>
>> If you have a binding vote, to simplify the tallying at the end please
>> mark your vote with a *.
>>
>> (*mostly because I’m going camping for my birthday)
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>


Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-03 Thread Wenchen Fan
+1

On Tue, Oct 1, 2024 at 4:20 PM beliefer  wrote:

> +1.
>
> I didn't hear users need it.
>
>
> At 2024-10-01 02:01:17, "Holden Karau"  wrote:
>
> I think it has been de-facto deprecated, we haven’t updated it
> meaningfully in several years. I think removing the API would be excessive
> but deprecating it would give us the flexibility to remove it in the not
> too distant future.
>
> That being said this is not a vote to remove GraphX, I think that whenever
> that time comes (if it does) we should have a separate vote
>
> This VOTE will be open for a little more than one week, ending on October
> 8th*. To vote reply with:
> +1 Deprecate GraphX
> 0 I’m indifferent
> -1 Don’t deprecate GraphX because ABC
>
> If you have a binding vote, to simplify the tallying at the end please mark
> your vote with a *.
>
> (*mostly because I’m going camping for my birthday)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Wenchen Fan
+1

On Wed, Oct 2, 2024 at 7:50 AM Peter Toth  wrote:

> +1
>
> On Tue, Oct 1, 2024, 08:33 Yang Jie  wrote:
>
>> +1, Thanks
>>
>> Jie Yang
>>
>> On 2024/10/01 03:26:40 John Zhuge wrote:
>> > +1 (non-binding)
>> >
>> > On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
>> >  wrote:
>> >
>> > > +1
>> > >
>> > > On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com>
>> > > wrote:
>> > >
>> > >> +1 (non-binding), promising proposal!
>> > >>
>> > >> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
>> > >>
>> > >>> Thank you for the swift clarification, Reynold and Xiao.
>> > >>>
>> > >>> It seems that the Target Version was set mistakenly initially.
>> > >>>
>> > >>> I removed the `Target Version` from the SPIP JIRA.
>> > >>>
>> > >>> https://issues.apache.org/jira/browse/SPARK-49834
>> > >>>
>> > >>> I'm switching my cast to +1 for this SPIP vote.
>> > >>>
>> > >>> Thanks,
>> > >>> Dongjoon.
>> > >>>
>> > >>> On 2024/09/30 22:55:41 Xiao Li wrote:
>> > >>> > +1 in support of the direction of the Single-pass Analyzer for
>> > >>> Catalyst.
>> > >>> >
>> > >>> > I think we should not have a target version for the new Catalyst
>> > >>> SPARK-49834
>> > >>> > . It should
>> not be
>> > >>> a
>> > >>> > blocker for Spark 4.0. When implementing the new analyzer, the
>> code
>> > >>> changes
>> > >>> > must not affect users of the existing analyzer to avoid any
>> user-facing
>> > >>> > impacts.
>> > >>> >
>> > >>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
>> > >>> >
>> > >>> > > I don't actually "lead" this. But I don't think this needs to
>> target
>> > >>> a
>> > >>> > > specific Spark version given it should not have any user facing
>> > >>> > > consequences?
>> > >>> > >
>> > >>> > >
>> > >>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun <
>> dongj...@apache.org>
>> > >>> wrote:
>> > >>> > >
>> > >>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
>> > >>> > >>
>> > >>> > >> I'm wondering if this is really achievable goal for Apache
>> Spark
>> > >>> 4.0.0.
>> > >>> > >>
>> > >>> > >> If it's expected that we are unable to deliver it, shall we
>> > >>> postpone this
>> > >>> > >> vote until 4.1.0 planning?
>> > >>> > >>
>> > >>> > >> Anyway, since SPARK-49834 has a target version 4.0.0
>> explicitly,
>> > >>> > >>
>> > >>> > >> -1 from my side.
>> > >>> > >>
>> > >>> > >> Thanks,
>> > >>> > >> Dongjoon.
>> > >>> > >>
>> > >>> > >>
>> > >>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>> > >>> > >> > +1
>> > >>> > >> >
>> > >>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>> > >>> > >> > wrote:
>> > >>> > >> >
>> > >>> > >> > > +1
>> > >>> > >> > >
>> > >>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
>> > >>> vvdr@gmail.com>
>> > >>> > >> > > wrote:
>> > >>> > >> > >
>> > >>> > >> > >> Hi all,
>> > >>> > >> > >>
>> > >>> > >> > >> I’d like to start a vote for a single-pass Analyzer for
>> the
>> > >>> Catalyst
>> > >>> > >> > >> project. This project will introduce a new analysis
>> framework
>> > >>> to the
>> > >>> > >> > >> Catalyst, which will eventually replace the fixed-point
>> one.
>> > >>> > >> > >>
>> > >>> > >> > >> Please refer to the SPIP jira:
>> > >>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
>> > >>> > >> > >>
>> > >>> > >> > >> [ ] +1: Accept the proposal
>> > >>> > >> > >> [ ] +0
>> > >>> > >> > >> [ ] -1: I don’t think this is a good idea because …
>> > >>> > >> > >>
>> > >>> > >> > >> Thanks!
>> > >>> > >> > >>
>> > >>> > >> > >> Vladimir
>> > >>> > >> > >>
>> > >>> > >> > >
>> > >>> > >> >
>> > >>> > >>
>> > >>> > >>
>> > >>>
>> -
>> > >>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> > >>> > >>
>> > >>> > >>
>> > >>> >
>> > >>>
>> > >>>
>> -
>> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> > >>>
>> > >>>
>> >
>> > --
>> > John Zhuge
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: How to run spark connect in kubernetes?

2024-10-02 Thread kant kodali
Please ignore this. It was a DNS issue.

On Wed, Oct 2, 2024 at 11:16 AM kant kodali  wrote:

> Here are more details about my question that I posted on SO.
>
> On Tue, Oct 1, 2024 at 11:32 PM kant kodali  wrote:
>
>> Hi All,
>>
>> Is it possible to run a Spark Connect server in Kubernetes while
>> configuring it to communicate with Kubernetes as the cluster manager? If
>> so, is there any example?
>>
>> Thanks
>>
>


Re: How to run spark connect in kubernetes?

2024-10-02 Thread kant kodali
Here are more details about my question that I posted on SO.

On Tue, Oct 1, 2024 at 11:32 PM kant kodali  wrote:

> Hi All,
>
> Is it possible to run a Spark Connect server in Kubernetes while
> configuring it to communicate with Kubernetes as the cluster manager? If
> so, is there any example?
>
> Thanks
>
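
Since the root cause above turned out to be DNS, here is a minimal client-side sketch of reaching a Spark Connect server exposed as a Kubernetes Service. The Service name and namespace are placeholders, 15002 is Spark Connect's default port, the spark-connect-client-jvm artifact is assumed on the classpath, and how the server itself is deployed and started inside the cluster is deliberately left out.

  // Sketch: reach a Spark Connect server running behind a Kubernetes Service.
  // Assumes a Service named "spark-connect" in namespace "spark" (hypothetical)
  // and the Spark Connect Scala client (spark-connect-client-jvm) on the classpath.
  import org.apache.spark.sql.SparkSession

  object ConnectClientSketch {
    def main(args: Array[String]): Unit = {
      // In-cluster DNS name of the Service; from outside the cluster you would
      // port-forward or point at an Ingress/LoadBalancer address instead.
      val remote = "sc://spark-connect.spark.svc.cluster.local:15002"

      // remote()/getOrCreate() come from the Connect client's SparkSession builder.
      val spark = SparkSession.builder().remote(remote).getOrCreate()
      spark.range(5).show()
      spark.stop()
    }
  }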


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-01 Thread Peter Toth
+1

On Tue, Oct 1, 2024, 08:33 Yang Jie  wrote:

> +1, Thanks
>
> Jie Yang
>
> On 2024/10/01 03:26:40 John Zhuge wrote:
> > +1 (non-binding)
> >
> > On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
> >  wrote:
> >
> > > +1
> > >
> > > On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com>
> > > wrote:
> > >
> > >> +1 (non-binding), promising proposal!
> > >>
> > >> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
> > >>
> > >>> Thank you for the swift clarification, Reynold and Xiao.
> > >>>
> > >>> It seems that the Target Version was set mistakenly initially.
> > >>>
> > >>> I removed the `Target Version` from the SPIP JIRA.
> > >>>
> > >>> https://issues.apache.org/jira/browse/SPARK-49834
> > >>>
> > >>> I'm switching my cast to +1 for this SPIP vote.
> > >>>
> > >>> Thanks,
> > >>> Dongjoon.
> > >>>
> > >>> On 2024/09/30 22:55:41 Xiao Li wrote:
> > >>> > +1 in support of the direction of the Single-pass Analyzer for
> > >>> Catalyst.
> > >>> >
> > >>> > I think we should not have a target version for the new Catalyst
> > >>> SPARK-49834
> > >>> > . It should
> not be
> > >>> a
> > >>> > blocker for Spark 4.0. When implementing the new analyzer, the code
> > >>> changes
> > >>> > must not affect users of the existing analyzer to avoid any
> user-facing
> > >>> > impacts.
> > >>> >
> > >>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
> > >>> >
> > >>> > > I don't actually "lead" this. But I don't think this needs to
> target
> > >>> a
> > >>> > > specific Spark version given it should not have any user facing
> > >>> > > consequences?
> > >>> > >
> > >>> > >
> > >>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun <
> dongj...@apache.org>
> > >>> wrote:
> > >>> > >
> > >>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
> > >>> > >>
> > >>> > >> I'm wondering if this is really achievable goal for Apache Spark
> > >>> 4.0.0.
> > >>> > >>
> > >>> > >> If it's expected that we are unable to deliver it, shall we
> > >>> postpone this
> > >>> > >> vote until 4.1.0 planning?
> > >>> > >>
> > >>> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
> > >>> > >>
> > >>> > >> -1 from my side.
> > >>> > >>
> > >>> > >> Thanks,
> > >>> > >> Dongjoon.
> > >>> > >>
> > >>> > >>
> > >>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
> > >>> > >> > +1
> > >>> > >> >
> > >>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
> > >>> > >> > wrote:
> > >>> > >> >
> > >>> > >> > > +1
> > >>> > >> > >
> > >>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
> > >>> vvdr@gmail.com>
> > >>> > >> > > wrote:
> > >>> > >> > >
> > >>> > >> > >> Hi all,
> > >>> > >> > >>
> > >>> > >> > >> I’d like to start a vote for a single-pass Analyzer for the
> > >>> Catalyst
> > >>> > >> > >> project. This project will introduce a new analysis
> framework
> > >>> to the
> > >>> > >> > >> Catalyst, which will eventually replace the fixed-point
> one.
> > >>> > >> > >>
> > >>> > >> > >> Please refer to the SPIP jira:
> > >>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
> > >>> > >> > >>
> > >>> > >> > >> [ ] +1: Accept the proposal
> > >>> > >> > >> [ ] +0
> > >>> > >> > >> [ ] -1: I don’t think this is a good idea because …
> > >>> > >> > >>
> > >>> > >> > >> Thanks!
> > >>> > >> > >>
> > >>> > >> > >> Vladimir
> > >>> > >> > >>
> > >>> > >> > >
> > >>> > >> >
> > >>> > >>
> > >>> > >>
> > >>> -
> > >>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>> > >>
> > >>> > >>
> > >>> >
> > >>>
> > >>> -
> > >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>>
> > >>>
> >
> > --
> > John Zhuge
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re:[VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-01 Thread beliefer
+1.

I didn't hear users need it.




At 2024-10-01 02:01:17, "Holden Karau"  wrote:

I think it has been de-facto deprecated, we haven’t updated it meaningfully in 
several years. I think removing the API would be excessive but deprecating it 
would give us the flexibility to remove it in the not too distant future.


That being said this is not a vote to remove GraphX, I think that whenever that 
time comes (if it does) we should have a separate vote


This VOTE will be open for a little more than one week, ending on October 8th*. 
To vote reply with:
+1 Deprecate GraphX
0 I’m indifferent 
-1 Don’t deprecate GraphX because ABC


If you have a binding vote, to simplify the tallying at the end please mark your
vote with a *.


(*mostly because I’m going camping for my birthday)


Twitter: https://twitter.com/holdenkarau

Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Yang Jie
+1, Thanks

Jie Yang

On 2024/10/01 03:26:40 John Zhuge wrote:
> +1 (non-binding)
> 
> On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
>  wrote:
> 
> > +1
> >
> > On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim 
> > wrote:
> >
> >> +1 (non-binding), promising proposal!
> >>
> >> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
> >>
> >>> Thank you for the swift clarification, Reynold and Xiao.
> >>>
> >>> It seems that the Target Version was set mistakenly initially.
> >>>
> >>> I removed the `Target Version` from the SPIP JIRA.
> >>>
> >>> https://issues.apache.org/jira/browse/SPARK-49834
> >>>
> >>> I'm switching my cast to +1 for this SPIP vote.
> >>>
> >>> Thanks,
> >>> Dongjoon.
> >>>
> >>> On 2024/09/30 22:55:41 Xiao Li wrote:
> >>> > +1 in support of the direction of the Single-pass Analyzer for
> >>> Catalyst.
> >>> >
> >>> > I think we should not have a target version for the new Catalyst
> >>> SPARK-49834
> >>> > . It should not be
> >>> a
> >>> > blocker for Spark 4.0. When implementing the new analyzer, the code
> >>> changes
> >>> > must not affect users of the existing analyzer to avoid any user-facing
> >>> > impacts.
> >>> >
> >>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
> >>> >
> >>> > > I don't actually "lead" this. But I don't think this needs to target
> >>> a
> >>> > > specific Spark version given it should not have any user facing
> >>> > > consequences?
> >>> > >
> >>> > >
> >>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
> >>> wrote:
> >>> > >
> >>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
> >>> > >>
> >>> > >> I'm wondering if this is really achievable goal for Apache Spark
> >>> 4.0.0.
> >>> > >>
> >>> > >> If it's expected that we are unable to deliver it, shall we
> >>> postpone this
> >>> > >> vote until 4.1.0 planning?
> >>> > >>
> >>> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
> >>> > >>
> >>> > >> -1 from my side.
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Dongjoon.
> >>> > >>
> >>> > >>
> >>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
> >>> > >> > +1
> >>> > >> >
> >>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
> >>> > >> > wrote:
> >>> > >> >
> >>> > >> > > +1
> >>> > >> > >
> >>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
> >>> vvdr@gmail.com>
> >>> > >> > > wrote:
> >>> > >> > >
> >>> > >> > >> Hi all,
> >>> > >> > >>
> >>> > >> > >> I’d like to start a vote for a single-pass Analyzer for the
> >>> Catalyst
> >>> > >> > >> project. This project will introduce a new analysis framework
> >>> to the
> >>> > >> > >> Catalyst, which will eventually replace the fixed-point one.
> >>> > >> > >>
> >>> > >> > >> Please refer to the SPIP jira:
> >>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
> >>> > >> > >>
> >>> > >> > >> [ ] +1: Accept the proposal
> >>> > >> > >> [ ] +0
> >>> > >> > >> [ ] -1: I don’t think this is a good idea because …
> >>> > >> > >>
> >>> > >> > >> Thanks!
> >>> > >> > >>
> >>> > >> > >> Vladimir
> >>> > >> > >>
> >>> > >> > >
> >>> > >> >
> >>> > >>
> >>> > >>
> >>> -
> >>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>> > >>
> >>> > >>
> >>> >
> >>>
> >>> -
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>
> >>>
> 
> -- 
> John Zhuge
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Yang Jie
+1, Thanks

Jie Yang

On 2024/10/01 03:26:40 John Zhuge wrote:
> +1 (non-binding)
> 
> On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
>  wrote:
> 
> > +1
> >
> > On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim 
> > wrote:
> >
> >> +1 (non-binding), promising proposal!
> >>
> >> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
> >>
> >>> Thank you for the swift clarification, Reynold and Xiao.
> >>>
> >>> It seems that the Target Version was set mistakenly initially.
> >>>
> >>> I removed the `Target Version` from the SPIP JIRA.
> >>>
> >>> https://issues.apache.org/jira/browse/SPARK-49834
> >>>
> >>> I'm switching my cast to +1 for this SPIP vote.
> >>>
> >>> Thanks,
> >>> Dongjoon.
> >>>
> >>> On 2024/09/30 22:55:41 Xiao Li wrote:
> >>> > +1 in support of the direction of the Single-pass Analyzer for
> >>> Catalyst.
> >>> >
> >>> > I think we should not have a target version for the new Catalyst
> >>> SPARK-49834
> >>> > . It should not be
> >>> a
> >>> > blocker for Spark 4.0. When implementing the new analyzer, the code
> >>> changes
> >>> > must not affect users of the existing analyzer to avoid any user-facing
> >>> > impacts.
> >>> >
> >>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
> >>> >
> >>> > > I don't actually "lead" this. But I don't think this needs to target
> >>> a
> >>> > > specific Spark version given it should not have any user facing
> >>> > > consequences?
> >>> > >
> >>> > >
> >>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
> >>> wrote:
> >>> > >
> >>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
> >>> > >>
> >>> > >> I'm wondering if this is really achievable goal for Apache Spark
> >>> 4.0.0.
> >>> > >>
> >>> > >> If it's expected that we are unable to deliver it, shall we
> >>> postpone this
> >>> > >> vote until 4.1.0 planning?
> >>> > >>
> >>> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
> >>> > >>
> >>> > >> -1 from my side.
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Dongjoon.
> >>> > >>
> >>> > >>
> >>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
> >>> > >> > +1
> >>> > >> >
> >>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
> >>> > >> > wrote:
> >>> > >> >
> >>> > >> > > +1
> >>> > >> > >
> >>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
> >>> vvdr@gmail.com>
> >>> > >> > > wrote:
> >>> > >> > >
> >>> > >> > >> Hi all,
> >>> > >> > >>
> >>> > >> > >> I’d like to start a vote for a single-pass Analyzer for the
> >>> Catalyst
> >>> > >> > >> project. This project will introduce a new analysis framework
> >>> to the
> >>> > >> > >> Catalyst, which will eventually replace the fixed-point one.
> >>> > >> > >>
> >>> > >> > >> Please refer to the SPIP jira:
> >>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
> >>> > >> > >>
> >>> > >> > >> [ ] +1: Accept the proposal
> >>> > >> > >> [ ] +0
> >>> > >> > >> [ ] -1: I don’t think this is a good idea because …
> >>> > >> > >>
> >>> > >> > >> Thanks!
> >>> > >> > >>
> >>> > >> > >> Vladimir
> >>> > >> > >>
> >>> > >> > >
> >>> > >> >
> >>> > >>
> >>> > >>
> >>> -
> >>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>> > >>
> >>> > >>
> >>> >
> >>>
> >>> -
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>
> >>>
> 
> -- 
> John Zhuge
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-09-30 Thread Yang Jie
+1, Thanks

Jie Yang

On 2024/10/01 02:53:11 Mridul Muralidharan wrote:
> +1.
> 
> Regards,
> Mridul
> 
> PS: In the past, I did have fun with graphx ... unfortunate that it has
> come to this :-(
> 
> 
> On Mon, Sep 30, 2024 at 6:23 PM Sean Owen  wrote:
> 
> > For reasons in the previous thread, yes +1 to deprecation
> >
> > On Mon, Sep 30, 2024 at 1:02 PM Holden Karau 
> > wrote:
> >
> >> I think it has been de-facto deprecated, we haven’t updated it
> >> meaningfully in several years. I think removing the API would be excessive
> >> but deprecating it would give us the flexibility to remove it in the not
> >> too distant future.
> >>
> >> That being said this is not a vote to remove GraphX, I think that
> >> whenever that time comes (if it does) we should have a separate vote
> >>
> >> This VOTE will be open for a little more than one week, ending on October
> >> 8th*. To vote reply with:
> >> +1 Deprecate GraphX
> >> 0 I’m indifferent
> >> -1 Don’t deprecate GraphX because ABC
> >>
> >> If you have a binding vote, to simplify the tallying at the end please
> >> mark your vote with a *.
> >>
> >> (*mostly because I’m going camping for my birthday)
> >>
> >> Twitter: https://twitter.com/holdenkarau
> >> Fight Health Insurance: https://www.fighthealthinsurance.com/
> >> 
> >> Books (Learning Spark, High Performance Spark, etc.):
> >> https://amzn.to/2MaRAG9  
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >> Pronouns: she/her
> >>
> >
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Vladimir Golubev
Folks, thanks for voting!

Yes, the target version was set by my mistake, Dongjoon. Sorry for that.

Vladimir.

On Tue, Oct 1, 2024, 05:28 John Zhuge  wrote:

> +1 (non-binding)
>
> On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
>  wrote:
>
>> +1
>>
>> On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> +1 (non-binding), promising proposal!
>>>
>>> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
>>>
 Thank you for the swift clarification, Reynold and Xiao.

 It seems that the Target Version was set mistakenly initially.

 I removed the `Target Version` from the SPIP JIRA.

 https://issues.apache.org/jira/browse/SPARK-49834

 I'm switching my cast to +1 for this SPIP vote.

 Thanks,
 Dongjoon.

 On 2024/09/30 22:55:41 Xiao Li wrote:
 > +1 in support of the direction of the Single-pass Analyzer for
 Catalyst.
 >
 > I think we should not have a target version for the new Catalyst
 SPARK-49834
 > . It should not
 be a
 > blocker for Spark 4.0. When implementing the new analyzer, the code
 changes
 > must not affect users of the existing analyzer to avoid any
 user-facing
 > impacts.
 >
 > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
 >
 > > I don't actually "lead" this. But I don't think this needs to
 target a
 > > specific Spark version given it should not have any user facing
 > > consequences?
 > >
 > >
 > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
 wrote:
 > >
 > >> Thank you for leading this, Vladimir, Reynold, Herman.
 > >>
 > >> I'm wondering if this is really achievable goal for Apache Spark
 4.0.0.
 > >>
 > >> If it's expected that we are unable to deliver it, shall we
 postpone this
 > >> vote until 4.1.0 planning?
 > >>
 > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
 > >>
 > >> -1 from my side.
 > >>
 > >> Thanks,
 > >> Dongjoon.
 > >>
 > >>
 > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
 > >> > +1
 > >> >
 > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
 > >> > wrote:
 > >> >
 > >> > > +1
 > >> > >
 > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
 vvdr@gmail.com>
 > >> > > wrote:
 > >> > >
 > >> > >> Hi all,
 > >> > >>
 > >> > >> I’d like to start a vote for a single-pass Analyzer for the
 Catalyst
 > >> > >> project. This project will introduce a new analysis framework
 to the
 > >> > >> Catalyst, which will eventually replace the fixed-point one.
 > >> > >>
 > >> > >> Please refer to the SPIP jira:
 > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
 > >> > >>
 > >> > >> [ ] +1: Accept the proposal
 > >> > >> [ ] +0
 > >> > >> [ ] -1: I don’t think this is a good idea because …
 > >> > >>
 > >> > >> Thanks!
 > >> > >>
 > >> > >> Vladimir
 > >> > >>
 > >> > >
 > >> >
 > >>
 > >>
 -
 > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
 > >>
 > >>
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>
> --
> John Zhuge
>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread John Zhuge
+1 (non-binding)

On Mon, Sep 30, 2024 at 7:42 PM Gengliang Wang
 wrote:

> +1
>
> On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim 
> wrote:
>
>> +1 (non-binding), promising proposal!
>>
>> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
>>
>>> Thank you for the swift clarification, Reynold and Xiao.
>>>
>>> It seems that the Target Version was set mistakenly initially.
>>>
>>> I removed the `Target Version` from the SPIP JIRA.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-49834
>>>
>>> I'm switching my cast to +1 for this SPIP vote.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>> On 2024/09/30 22:55:41 Xiao Li wrote:
>>> > +1 in support of the direction of the Single-pass Analyzer for
>>> Catalyst.
>>> >
>>> > I think we should not have a target version for the new Catalyst
>>> SPARK-49834
>>> > . It should not be
>>> a
>>> > blocker for Spark 4.0. When implementing the new analyzer, the code
>>> changes
>>> > must not affect users of the existing analyzer to avoid any user-facing
>>> > impacts.
>>> >
>>> > Reynold Xin wrote on Mon, Sep 30, 2024 at 15:39:
>>> >
>>> > > I don't actually "lead" this. But I don't think this needs to target
>>> a
>>> > > specific Spark version given it should not have any user facing
>>> > > consequences?
>>> > >
>>> > >
>>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
>>> wrote:
>>> > >
>>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
>>> > >>
>>> > >> I'm wondering if this is really achievable goal for Apache Spark
>>> 4.0.0.
>>> > >>
>>> > >> If it's expected that we are unable to deliver it, shall we
>>> postpone this
>>> > >> vote until 4.1.0 planning?
>>> > >>
>>> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>>> > >>
>>> > >> -1 from my side.
>>> > >>
>>> > >> Thanks,
>>> > >> Dongjoon.
>>> > >>
>>> > >>
>>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>>> > >> > +1
>>> > >> >
>>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>>> > >> > wrote:
>>> > >> >
>>> > >> > > +1
>>> > >> > >
>>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
>>> vvdr@gmail.com>
>>> > >> > > wrote:
>>> > >> > >
>>> > >> > >> Hi all,
>>> > >> > >>
>>> > >> > >> I’d like to start a vote for a single-pass Analyzer for the
>>> Catalyst
>>> > >> > >> project. This project will introduce a new analysis framework
>>> to the
>>> > >> > >> Catalyst, which will eventually replace the fixed-point one.
>>> > >> > >>
>>> > >> > >> Please refer to the SPIP jira:
>>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
>>> > >> > >>
>>> > >> > >> [ ] +1: Accept the proposal
>>> > >> > >> [ ] +0
>>> > >> > >> [ ] -1: I don’t think this is a good idea because …
>>> > >> > >>
>>> > >> > >> Thanks!
>>> > >> > >>
>>> > >> > >> Vladimir
>>> > >> > >>
>>> > >> > >
>>> > >> >
>>> > >>
>>> > >>
>>> -
>>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> > >>
>>> > >>
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>

-- 
John Zhuge


Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-09-30 Thread Mridul Muralidharan
+1.

Regards,
Mridul

PS: In the past, I did have fun with graphx ... unfortunate that it has
come to this :-(


On Mon, Sep 30, 2024 at 6:23 PM Sean Owen  wrote:

> For reasons in the previous thread, yes +1 to deprecation
>
> On Mon, Sep 30, 2024 at 1:02 PM Holden Karau 
> wrote:
>
>> I think it has been de-facto deprecated, we haven’t updated it
>> meaningfully in several years. I think removing the API would be excessive
>> but deprecating it would give us the flexibility to remove it in the not
>> too distant future.
>>
>> That being said this is not a vote to remove GraphX, I think that
>> whenever that time comes (if it does) we should have a separate vote
>>
>> This VOTE will be open for a little more than one week, ending on October
>> 8th*. To vote reply with:
>> +1 Deprecate GraphX
>> 0 I’m indifferent
>> -1 Don’t deprecate GraphX because ABC
>>
>> If you have a binding vote, to simplify the tallying at the end please
>> mark your vote with a *.
>>
>> (*mostly because I’m going camping for my birthday)
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Mon, Sep 30, 2024 at 5:39 PM Reynold Xin 
wrote:

> I don't actually "lead" this. But I don't think this needs to target a
> specific Spark version given it should not have any user facing
> consequences?
>
>
> On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun  wrote:
>
>> Thank you for leading this, Vladimir, Reynold, Herman.
>>
>> I'm wondering if this is really achievable goal for Apache Spark 4.0.0.
>>
>> If it's expected that we are unable to deliver it, shall we postpone this
>> vote until 4.1.0 planning?
>>
>> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>>
>> -1 from my side.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>> > +1
>> >
>> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>> > wrote:
>> >
>> > > +1
>> > >
>> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev 
>> > > wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> I’d like to start a vote for a single-pass Analyzer for the Catalyst
>> > >> project. This project will introduce a new analysis framework to the
>> > >> Catalyst, which will eventually replace the fixed-point one.
>> > >>
>> > >> Please refer to the SPIP jira:
>> > >> https://issues.apache.org/jira/browse/SPARK-49834
>> > >>
>> > >> [ ] +1: Accept the proposal
>> > >> [ ] +0
>> > >> [ ] -1: I don’t think this is a good idea because …
>> > >>
>> > >> Thanks!
>> > >>
>> > >> Vladimir
>> > >>
>> > >
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Gengliang Wang
+1

On Mon, Sep 30, 2024 at 6:22 PM Jungtaek Lim 
wrote:

> +1 (non-binding), promising proposal!
>
> On Tue, Oct 1, 2024 at 8:04 AM, Dongjoon Hyun wrote:
>
>> Thank you for the swift clarification, Reynold and Xiao.
>>
>> It seems that the Target Version was set mistakenly initially.
>>
>> I removed the `Target Version` from the SPIP JIRA.
>>
>> https://issues.apache.org/jira/browse/SPARK-49834
>>
>> I'm switching my cast to +1 for this SPIP vote.
>>
>> Thanks,
>> Dongjoon.
>>
>> On 2024/09/30 22:55:41 Xiao Li wrote:
>> > +1 in support of the direction of the Single-pass Analyzer for Catalyst.
>> >
>> > I think we should not have a target version for the new Catalyst
>> SPARK-49834
>> > . It should not be a
>> > blocker for Spark 4.0. When implementing the new analyzer, the code
>> changes
>> > must not affect users of the existing analyzer to avoid any user-facing
>> > impacts.
>> >
>> > Reynold Xin  于2024年9月30日周一 15:39写道:
>> >
>> > > I don't actually "lead" this. But I don't think this needs to target a
>> > > specific Spark version given it should not have any user facing
>> > > consequences?
>> > >
>> > >
>> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
>> wrote:
>> > >
>> > >> Thank you for leading this, Vladimir, Reynold, Herman.
>> > >>
>> > >> I'm wondering if this is really achievable goal for Apache Spark
>> 4.0.0.
>> > >>
>> > >> If it's expected that we are unable to deliver it, shall we postpone
>> this
>> > >> vote until 4.1.0 planning?
>> > >>
>> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>> > >>
>> > >> -1 from my side.
>> > >>
>> > >> Thanks,
>> > >> Dongjoon.
>> > >>
>> > >>
>> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>> > >> > +1
>> > >> >
>> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>> > > >> >
>> > >> > wrote:
>> > >> >
>> > >> > > +1
>> > >> > >
>> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
>> vvdr@gmail.com>
>> > >> > > wrote:
>> > >> > >
>> > >> > >> Hi all,
>> > >> > >>
>> > >> > >> I’d like to start a vote for a single-pass Analyzer for the
>> Catalyst
>> > >> > >> project. This project will introduce a new analysis framework
>> to the
>> > >> > >> Catalyst, which will eventually replace the fixed-point one.
>> > >> > >>
>> > >> > >> Please refer to the SPIP jira:
>> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
>> > >> > >>
>> > >> > >> [ ] +1: Accept the proposal
>> > >> > >> [ ] +0
>> > >> > >> [ ] -1: I don’t think this is a good idea because …
>> > >> > >>
>> > >> > >> Thanks!
>> > >> > >>
>> > >> > >> Vladimir
>> > >> > >>
>> > >> > >
>> > >> >
>> > >>
>> > >> -
>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> > >>
>> > >>
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread Jungtaek Lim
+1 (non-binding)

2024년 10월 1일 (화) 오전 8:19, Sean Owen 님이 작성:

> For reasons in the previous thread, yes +1 to deprecation
>
> On Mon, Sep 30, 2024 at 1:02 PM Holden Karau 
> wrote:
>
>> I think it has been de-facto deprecated, we haven’t updated it
>> meaningfully in several years. I think removing the API would be excessive
>> but deprecating it would give us the flexibility to remove it in the not
>> too distant future.
>>
>> That being said this is not a vote to remove GraphX, I think that
>> whenever that time comes (if it does) we should have a separate vote
>>
>> This VOTE will be open for a little more than one week, ending on October
>> 8th*. To vote reply with:
>> +1 Deprecate GraphX
>> 0 I’m indifferent
>> -1 Don’t deprecate GraphX because ABC
>>
>> If you have a binding vote, to simplify the tallying at the end, please
>> mark your vote with a *.
>>
>> (*mostly because I’m going camping for my birthday)
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Jungtaek Lim
+1 (non-binding), promising proposal!

2024년 10월 1일 (화) 오전 8:04, Dongjoon Hyun 님이 작성:

> Thank you for the swift clarification, Reynold and Xiao.
>
> It seems that the Target Version was set mistakenly initially.
>
> I removed the `Target Version` from the SPIP JIRA.
>
> https://issues.apache.org/jira/browse/SPARK-49834
>
> I'm switching my cast to +1 for this SPIP vote.
>
> Thanks,
> Dongjoon.
>
> On 2024/09/30 22:55:41 Xiao Li wrote:
> > +1 in support of the direction of the Single-pass Analyzer for Catalyst.
> >
> > I think we should not have a target version for the new Catalyst
> SPARK-49834
> > . It should not be a
> > blocker for Spark 4.0. When implementing the new analyzer, the code
> changes
> > must not affect users of the existing analyzer to avoid any user-facing
> > impacts.
> >
> > Reynold Xin  于2024年9月30日周一 15:39写道:
> >
> > > I don't actually "lead" this. But I don't think this needs to target a
> > > specific Spark version given it should not have any user facing
> > > consequences?
> > >
> > >
> > > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun 
> wrote:
> > >
> > >> Thank you for leading this, Vladimir, Reynold, Herman.
> > >>
> > >> I'm wondering if this is really achievable goal for Apache Spark
> 4.0.0.
> > >>
> > >> If it's expected that we are unable to deliver it, shall we postpone
> this
> > >> vote until 4.1.0 planning?
> > >>
> > >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
> > >>
> > >> -1 from my side.
> > >>
> > >> Thanks,
> > >> Dongjoon.
> > >>
> > >>
> > >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
> > >> > +1
> > >> >
> > >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin
>  > >> >
> > >> > wrote:
> > >> >
> > >> > > +1
> > >> > >
> > >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev <
> vvdr@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > >> Hi all,
> > >> > >>
> > >> > >> I’d like to start a vote for a single-pass Analyzer for the
> Catalyst
> > >> > >> project. This project will introduce a new analysis framework to
> the
> > >> > >> Catalyst, which will eventually replace the fixed-point one.
> > >> > >>
> > >> > >> Please refer to the SPIP jira:
> > >> > >> https://issues.apache.org/jira/browse/SPARK-49834
> > >> > >>
> > >> > >> [ ] +1: Accept the proposal
> > >> > >> [ ] +0
> > >> > >> [ ] -1: I don’t think this is a good idea because …
> > >> > >>
> > >> > >> Thanks!
> > >> > >>
> > >> > >> Vladimir
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >> -
> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>
> > >>
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread Sean Owen
For reasons in the previous thread, yes +1 to deprecation

On Mon, Sep 30, 2024 at 1:02 PM Holden Karau  wrote:

> I think it has been de-facto deprecated, we haven’t updated it
> meaningfully in several years. I think removing the API would be excessive
> but deprecating it would give us the flexibility to remove it in the not
> too distant future.
>
> That being said this is not a vote to remove GraphX, I think that whenever
> that time comes (if it does) we should have a separate vote
>
> This VOTE will be open for a little more than one week, ending on October
> 8th*. To vote reply with:
> +1 Deprecate GraphX
> 0 I’m indifferent
> -1 Don’t deprecate GraphX because ABC
>
> If you have a binding vote, to simplify the tallying at the end, please mark
> your vote with a *.
>
> (*mostly because I’m going camping for my birthday)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Dongjoon Hyun
Thank you for the swift clarification, Reynold and Xiao.

It seems that the Target Version was set mistakenly initially.

I removed the `Target Version` from the SPIP JIRA.

https://issues.apache.org/jira/browse/SPARK-49834

I'm switching my vote to +1 for this SPIP vote.

Thanks,
Dongjoon.

On 2024/09/30 22:55:41 Xiao Li wrote:
> +1 in support of the direction of the Single-pass Analyzer for Catalyst.
> 
> I think we should not have a target version for the new Catalyst SPARK-49834
> . It should not be a
> blocker for Spark 4.0. When implementing the new analyzer, the code changes
> must not affect users of the existing analyzer to avoid any user-facing
> impacts.
> 
> Reynold Xin  于2024年9月30日周一 15:39写道:
> 
> > I don't actually "lead" this. But I don't think this needs to target a
> > specific Spark version given it should not have any user facing
> > consequences?
> >
> >
> > On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun  wrote:
> >
> >> Thank you for leading this, Vladimir, Reynold, Herman.
> >>
> >> I'm wondering if this is really achievable goal for Apache Spark 4.0.0.
> >>
> >> If it's expected that we are unable to deliver it, shall we postpone this
> >> vote until 4.1.0 planning?
> >>
> >> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
> >>
> >> -1 from my side.
> >>
> >> Thanks,
> >> Dongjoon.
> >>
> >>
> >> On 2024/09/30 17:51:24 Herman van Hovell wrote:
> >> > +1
> >> >
> >> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin  >> >
> >> > wrote:
> >> >
> >> > > +1
> >> > >
> >> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev 
> >> > > wrote:
> >> > >
> >> > >> Hi all,
> >> > >>
> >> > >> I’d like to start a vote for a single-pass Analyzer for the Catalyst
> >> > >> project. This project will introduce a new analysis framework to the
> >> > >> Catalyst, which will eventually replace the fixed-point one.
> >> > >>
> >> > >> Please refer to the SPIP jira:
> >> > >> https://issues.apache.org/jira/browse/SPARK-49834
> >> > >>
> >> > >> [ ] +1: Accept the proposal
> >> > >> [ ] +0
> >> > >> [ ] -1: I don’t think this is a good idea because …
> >> > >>
> >> > >> Thanks!
> >> > >>
> >> > >> Vladimir
> >> > >>
> >> > >
> >> >
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
> >>
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Xiao Li
+1 in support of the direction of the Single-pass Analyzer for Catalyst.

I think we should not have a target version for the new Catalyst SPARK-49834.
It should not be a
blocker for Spark 4.0. When implementing the new analyzer, the code changes
must not affect users of the existing analyzer to avoid any user-facing
impacts.

Reynold Xin  于2024年9月30日周一 15:39写道:

> I don't actually "lead" this. But I don't think this needs to target a
> specific Spark version given it should not have any user facing
> consequences?
>
>
> On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun  wrote:
>
>> Thank you for leading this, Vladimir, Reynold, Herman.
>>
>> I'm wondering if this is really achievable goal for Apache Spark 4.0.0.
>>
>> If it's expected that we are unable to deliver it, shall we postpone this
>> vote until 4.1.0 planning?
>>
>> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>>
>> -1 from my side.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On 2024/09/30 17:51:24 Herman van Hovell wrote:
>> > +1
>> >
>> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin > >
>> > wrote:
>> >
>> > > +1
>> > >
>> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev 
>> > > wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> I’d like to start a vote for a single-pass Analyzer for the Catalyst
>> > >> project. This project will introduce a new analysis framework to the
>> > >> Catalyst, which will eventually replace the fixed-point one.
>> > >>
>> > >> Please refer to the SPIP jira:
>> > >> https://issues.apache.org/jira/browse/SPARK-49834
>> > >>
>> > >> [ ] +1: Accept the proposal
>> > >> [ ] +0
>> > >> [ ] -1: I don’t think this is a good idea because …
>> > >>
>> > >> Thanks!
>> > >>
>> > >> Vladimir
>> > >>
>> > >
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Reynold Xin
I don't actually "lead" this. But I don't think this needs to target a
specific Spark version given it should not have any user facing
consequences?


On Mon, Sep 30, 2024 at 3:36 PM Dongjoon Hyun  wrote:

> Thank you for leading this, Vladimir, Reynold, Herman.
>
> I'm wondering if this is really achievable goal for Apache Spark 4.0.0.
>
> If it's expected that we are unable to deliver it, shall we postpone this
> vote until 4.1.0 planning?
>
> Anyway, since SPARK-49834 has a target version 4.0.0 explicitly,
>
> -1 from my side.
>
> Thanks,
> Dongjoon.
>
>
> On 2024/09/30 17:51:24 Herman van Hovell wrote:
> > +1
> >
> > On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin  >
> > wrote:
> >
> > > +1
> > >
> > > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev 
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> I’d like to start a vote for a single-pass Analyzer for the Catalyst
> > >> project. This project will introduce a new analysis framework to the
> > >> Catalyst, which will eventually replace the fixed-point one.
> > >>
> > >> Please refer to the SPIP jira:
> > >> https://issues.apache.org/jira/browse/SPARK-49834
> > >>
> > >> [ ] +1: Accept the proposal
> > >> [ ] +0
> > >> [ ] -1: I don’t think this is a good idea because …
> > >>
> > >> Thanks!
> > >>
> > >> Vladimir
> > >>
> > >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Dongjoon Hyun
Thank you for leading this, Vladimir, Reynold, Herman.

I'm wondering if this is really an achievable goal for Apache Spark 4.0.0.

If it's expected that we are unable to deliver it, shall we postpone this vote 
until 4.1.0 planning?

Anyway, since SPARK-49834 has a target version 4.0.0 explicitly, 

-1 from my side.

Thanks,
Dongjoon.


On 2024/09/30 17:51:24 Herman van Hovell wrote:
> +1
> 
> On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin 
> wrote:
> 
> > +1
> >
> > On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev 
> > wrote:
> >
> >> Hi all,
> >>
> >> I’d like to start a vote for a single-pass Analyzer for the Catalyst
> >> project. This project will introduce a new analysis framework to the
> >> Catalyst, which will eventually replace the fixed-point one.
> >>
> >> Please refer to the SPIP jira:
> >> https://issues.apache.org/jira/browse/SPARK-49834
> >>
> >> [ ] +1: Accept the proposal
> >> [ ] +0
> >> [ ] -1: I don’t think this is a good idea because …
> >>
> >> Thanks!
> >>
> >> Vladimir
> >>
> >
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread L. C. Hsieh
+1

On Mon, Sep 30, 2024 at 1:25 PM Herman van Hovell
 wrote:
>
> +1
>
> On Mon, Sep 30, 2024 at 12:21 PM Dongjoon Hyun  wrote:
>>
>> +1
>>
>> Thank you, Holden.
>>
>> Dongjoon.
>>
>> On 2024/09/30 18:01:17 Holden Karau wrote:
>> > I think it has been de-facto deprecated, we haven’t updated it meaningfully
>> > in several years. I think removing the API would be excessive but
>> > deprecating it would give us the flexibility to remove it in the not too
>> > distant future.
>> >
>> > That being said this is not a vote to remove GraphX, I think that whenever
>> > that time comes (if it does) we should have a separate vote
>> >
>> > This VOTE will be open for a little more than one week, ending on October
>> > 8th*. To vote reply with:
>> > +1 Deprecate GraphX
>> > 0 I’m indifferent
>> > -1 Don’t deprecate GraphX because ABC
>> >
>> > If you have a binding vote, to simplify the tallying at the end, please mark
>> > your vote with a *.
>> >
>> > (*mostly because I’m going camping for my birthday)
>> >
>> > Twitter: https://twitter.com/holdenkarau
>> > Fight Health Insurance: https://www.fighthealthinsurance.com/
>> > 
>> > Books (Learning Spark, High Performance Spark, etc.):
>> > https://amzn.to/2MaRAG9  
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> > Pronouns: she/her
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread Herman van Hovell
+1

On Mon, Sep 30, 2024 at 12:21 PM Dongjoon Hyun  wrote:

> +1
>
> Thank you, Holden.
>
> Dongjoon.
>
> On 2024/09/30 18:01:17 Holden Karau wrote:
> > I think it has been de-facto deprecated, we haven’t updated it
> meaningfully
> > in several years. I think removing the API would be excessive but
> > deprecating it would give us the flexibility to remove it in the not too
> > distant future.
> >
> > That being said this is not a vote to remove GraphX, I think that
> whenever
> > that time comes (if it does) we should have a separate vote
> >
> > This VOTE will be open for a little more than one week, ending on October
> > 8th*. To vote reply with:
> > +1 Deprecate GraphX
> > 0 I’m indifferent
> > -1 Don’t deprecate GraphX because ABC
> >
> > If you have a binding vote, to simplify the tallying at the end, please mark
> > your vote with a *.
> >
> > (*mostly because I’m going camping for my birthday)
> >
> > Twitter: https://twitter.com/holdenkarau
> > Fight Health Insurance: https://www.fighthealthinsurance.com/
> > 
> > Books (Learning Spark, High Performance Spark, etc.):
> > https://amzn.to/2MaRAG9  
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread Mich Talebzadeh
+1

Mich Talebzadeh,

Architect | Data Engineer | Data Science | Financial Crime
PhD  Imperial College
London 
London, United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner Von Braun)".


On Mon, 30 Sept 2024 at 19:39, Holden Karau  wrote:

> I think it has been de-facto deprecated, we haven’t updated it
> meaningfully in several years. I think removing the API would be excessive
> but deprecating it would give us the flexibility to remove it in the not
> too distant future.
>
> That being said this is not a vote to remove GraphX, I think that whenever
> that time comes (if it does) we should have a separate vote
>
> This VOTE will be open for a little more than one week, ending on October
> 8th*. To vote reply with:
> +1 Deprecate GraphX
> 0 I’m indifferent
> -1 Don’t deprecate GraphX because ABC
>
> If you have a binding vote, to simplify the tallying at the end, please mark
> your vote with a *.
>
> (*mostly because I’m going camping for my birthday)
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>


Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread Dongjoon Hyun
+1

Thank you, Holden.

Dongjoon.

On 2024/09/30 18:01:17 Holden Karau wrote:
> I think it has been de-facto deprecated, we haven’t updated it meaningfully
> in several years. I think removing the API would be excessive but
> deprecating it would give us the flexibility to remove it in the not too
> distant future.
> 
> That being said this is not a vote to remove GraphX, I think that whenever
> that time comes (if it does) we should have a separate vote
> 
> This VOTE will be open for a little more than one week, ending on October
> 8th*. To vote reply with:
> +1 Deprecate GraphX
> 0 I’m indifferent
> -1 Don’t deprecate GraphX because ABC
> 
> > If you have a binding vote, to simplify the tallying at the end, please mark
> > your vote with a *.
> 
> (*mostly because I’m going camping for my birthday)
> 
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Herman van Hovell
+1

On Mon, Sep 30, 2024 at 8:29 AM Reynold Xin 
wrote:

> +1
>
> On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev 
> wrote:
>
>> Hi all,
>>
>> I’d like to start a vote for a single-pass Analyzer for the Catalyst
>> project. This project will introduce a new analysis framework to the
>> Catalyst, which will eventually replace the fixed-point one.
>>
>> Please refer to the SPIP jira:
>> https://issues.apache.org/jira/browse/SPARK-49834
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks!
>>
>> Vladimir
>>
>


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-09-30 Thread Sean Owen
I support deprecating GraphX because:

   - GraphFrames supersedes it, really
   - No maintainers and no reason to believe there will be - we can take
   the last 5+ years as thorough evidence
   - Low (but not trivial) docs hits compared to other modules:
   
https://analytics.apache.org/index.php?module=CoreHome&action=index&date=yesterday&period=day&idSite=40#?period=year&date=2024-09-29&idSite=40&category=General_Actions&subcategory=General_Pages
   - If it *exists* in 4.x then it has to live as long as 4.x does, and
   that's already a super long time (4+ years?); deprecating is just a step to
   removing it in 5.x. (Well, we *can* take a decision to remove it in some
   4.x version if it's really a problem, but deprecating well in advance is a
   prerequisite.)

There is one problem: deprecated in favor of what? GraphFrames. But
GraphFrames uses GraphX :) and it is likewise in a similar bucket:
*maintained* but with no active development; not sure about usage. So I think
this is kind of "deprecated without replacement".

But we're only talking about deprecating here, which I think more
accurately communicates its state to users than not doing so.
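
In code terms, the deprecation itself would mostly amount to annotating the
public GraphX entry points (plus a note in the docs). A minimal sketch of that
idea, using a made-up object name and a placeholder message/version rather
than the actual change:

  // Hypothetical sketch only: the object name, message text, and "since"
  // version below are placeholders, not a real Spark commit.
  @deprecated(
    "GraphX is no longer actively maintained and may be removed in a future " +
      "release; consider GraphFrames or another graph library.",
    since = "4.0.0")
  object ExampleGraphXEntryPoint {
    def loadEdges(path: String): Seq[(Long, Long)] = Seq.empty // placeholder body
  }

  // Existing call sites keep compiling, but now emit a deprecation warning:
  //   ExampleGraphXEntryPoint.loadEdges("edges.txt")

Users keep working code for all of 4.x; the warning is just the signal that it
may go away later.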


On Mon, Sep 30, 2024 at 12:20 PM Mich Talebzadeh 
wrote:

> Hi,
>
> These are my Views:
>
> 1. Deprecation Consideration: I lean towards the idea of officially
> deprecating GraphX, given the lack of active development and community
> engagement over the past few years as you alluded. This would set clear
> expectations for users about its future and encourage them to explore
> alternatives that are actively maintained.
>
> 2. User Input: It would be prudent to gather feedback from those currently
> utilizing GraphX. Their insights could help us understand whether they find
> the functionality sufficient as-is or if they have specific needs that
> remain unaddressed.
>
> 3. Search for Maintainers: While I believe deprecation is a prudent step,
> I also think we should issue a call for new maintainers before making any
> final decisions. If there are individuals or teams willing to invest in
> GraphX, it may still have a place in our ecosystem.
>
> Ultimately, I feel that we should prioritize the health of the Spark
> ecosystem and ensure that we are investing resources into actively
> maintained components.
>
> HTH
>
> Mich Talebzadeh
>
> Architect | Data Engineer | Data Science | Financial Crime
> PhD  Imperial College
> London 
>
> London, United Kingdom
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner  Von
> Braun )".
>
>
> On Sun, 29 Sept 2024 at 21:39, Holden Karau 
> wrote:
>
>> Since we're getting close to cutting a 4.0 branch I'd like to float the
>> idea of officially deprecating Graph X. What that would mean (to me) is we
>> would update the docs to indicate that Graph X is deprecated and its APIs
>> may be removed at any time in the future.
>>
>> Alternatively, we could mark it as "unmaintained and in search of
>> maintainers" with a note that if no maintainers are found, we may remove it
>> in a future minor version.
>>
>> Looking at the source graph X, I don't see any meaningful active
>> development going back over three years*. There is even a thread on user@
>> from 2017 asking if graph X is maintained anymore, with no response from
>> the developers.
>>
>> Now I'm open to the idea that GraphX is stable and "works as is" and
>> simply doesn't require modifications but given the user thread I'm a little
>> concerned here about bringing this API with us into Spark 4 if we don't
>> have anyone signed up to maintain it.
>>
>> * Excluding globally applied changes
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> 
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>


Re: [VOTE] Single-pass Analyzer for Catalyst

2024-09-30 Thread Reynold Xin
+1

On Mon, Sep 30, 2024 at 6:47 AM Vladimir Golubev  wrote:

> Hi all,
>
> I’d like to start a vote for a single-pass Analyzer for the Catalyst
> project. This project will introduce a new analysis framework to the
> Catalyst, which will eventually replace the fixed-point one.
>
> Please refer to the SPIP jira:
> https://issues.apache.org/jira/browse/SPARK-49834
>
> [ ] +1: Accept the proposal
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks!
>
> Vladimir
>
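
For readers less familiar with the terminology, a toy contrast of the two
strategies: fixed-point analysis re-applies rules over the plan until nothing
changes, while single-pass analysis resolves the tree in one bottom-up
traversal. The types and rules below are made up for illustration and are not
Catalyst code:

  object AnalyzerSketch {
    sealed trait Plan
    case class Unresolved(name: String) extends Plan
    case class Resolved(name: String) extends Plan
    case class Project(child: Plan) extends Plan

    // One application of the rule set over the tree.
    def applyRules(p: Plan): Plan = p match {
      case Unresolved(n)  => Resolved(n)
      case Project(child) => Project(applyRules(child))
      case other          => other
    }

    // Fixed-point: keep re-applying rules until the plan stops changing.
    def fixedPoint(p: Plan): Plan = {
      val next = applyRules(p)
      if (next == p) p else fixedPoint(next)
    }

    // Single-pass: resolve each node exactly once, bottom-up.
    def singlePass(p: Plan): Plan = p match {
      case Project(child) => Project(singlePass(child))
      case Unresolved(n)  => Resolved(n)
      case other          => other
    }

    def main(args: Array[String]): Unit = {
      val plan: Plan = Project(Unresolved("a"))
      // Both yield Project(Resolved("a")); the difference is that singlePass
      // visits each node once, while fixedPoint re-scans the tree until it
      // reaches a fixed point.
      assert(fixedPoint(plan) == singlePass(plan))
    }
  }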


Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-09-30 Thread Mich Talebzadeh
Hi,

These are my Views:

1. Deprecation Consideration: I lean towards the idea of officially
deprecating GraphX, given the lack of active development and community
engagement over the past few years as you alluded. This would set clear
expectations for users about its future and encourage them to explore
alternatives that are actively maintained.

2. User Input: It would be prudent to gather feedback from those currently
utilizing GraphX. Their insights could help us understand whether they find
the functionality sufficient as-is or if they have specific needs that
remain unaddressed.

3. Search for Maintainers: While I believe deprecation is a prudent step, I
also think we should issue a call for new maintainers before making any
final decisions. If there are individuals or teams willing to invest in
GraphX, it may still have a place in our ecosystem.

Ultimately, I feel that we should prioritize the health of the Spark
ecosystem and ensure that we are investing resources into actively
maintained components.

HTH

Mich Talebzadeh

Architect | Data Engineer | Data Science | Financial Crime
PhD  Imperial College
London 

London, United Kingdom



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner Von Braun)".


On Sun, 29 Sept 2024 at 21:39, Holden Karau  wrote:

> Since we're getting close to cutting a 4.0 branch I'd like to float the
> idea of officially deprecating Graph X. What that would mean (to me) is we
> would update the docs to indicate that Graph X is deprecated and its APIs
> may be removed at any time in the future.
>
> Alternatively, we could mark it as "unmaintained and in search of
> maintainers" with a note that if no maintainers are found, we may remove it
> in a future minor version.
>
> Looking at the source graph X, I don't see any meaningful active
> development going back over three years*. There is even a thread on user@
> from 2017 asking if graph X is maintained anymore, with no response from
> the developers.
>
> Now I'm open to the idea that GraphX is stable and "works as is" and
> simply doesn't require modifications but given the user thread I'm a little
> concerned here about bringing this API with us into Spark 4 if we don't
> have anyone signed up to maintain it.
>
> * Excluding globally applied changes
> --
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> 
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>


Re: [DISCUSS] Creating `branch-4.0` and Feature Freeze for Apache Spark 4.0

2024-09-27 Thread Dongjoon Hyun
Thank you for the detailed proposal with dates, Hyukjin. I guess it's aligned 
with Herman already, too. I'm fine if we have a pre-defined and predictably 
achievable schedule. :-)

We know the well-known goal of `Spark Connect`, and I'd love to have it. I've been
monitoring the progress, but I'm not sure whether the delivery schedule is
feasible given the current status. At this point, I believe your and Herman's
words because you are the leaders and contributors for that area.

For the proposed schedule, the worst case for the community would be a future
delay due to the risk of unknown personal schedules, because Apache Spark is a
community-driven project that depends on the contributors' voluntary passion and
time. And we know that this varies a lot during the following holiday seasons. I
also believe that you and Herman have already considered them in your personal
schedules.

- 2024.12.20 ~ 2025.01.05 (Christmas and Happy New Year Holiday)
- 2025.01.29 ~ 2025.02.03 (Chinese New Year)

> 2024-01-15 Creating `branch-4.0` (allowing backporting new features)
> 2024-02-01 Feature Freeze (allowing backporting bug fixes only)
> 2024-02-15 Starting Apache Spark 4.0.0 RC1

Thank you again for the replies. Let's wait and collect more opinions, and then
update the Apache Spark website with the tentative schedule.

Dongjoon.

On 2024/09/27 03:00:49 Hyukjin Kwon wrote:
> I meant 2025 :-).
> 
> On Fri, Sep 27, 2024 at 11:15 AM Hyukjin Kwon  wrote:
> 
> > We're basically working on making Scala Spark Connect ready.
> > For example, I am working on having a parent class for both Spark Classic
> > and Spark Connect so users would face less breaking changes, and they can
> > run their application without changing anything.
> > In addition, I am also working on sharing the same test base between Spark
> > Classic and Spark Connect.
> > For those, I think it might take a couple of months to stabilize.
> >
> > What about the below schedule?
> >
> > - 2024-01-15 Creating `branch-4.0` (allowing backporting new features)
> > - 2024-02-01 Feature Freeze (allowing backporting bug fixes only)
> > - 2024-02-15 Starting Apache Spark 4.0.0 RC1
> >
> >
> > On Fri, 27 Sept 2024 at 07:35, Dongjoon Hyun 
> > wrote:
> >
> >> Thank you for the reply, Herman.
> >>
> >> Given that December and January are on your schedule,
> >> I'm not sure what date your proposal is. Could you elaborate more?
> >> As we know, the community is less active during that Winter period.
> >>
> >> In addition, although I know that you are leading that area actively with
> >> big refactorings,
> >> it would be greatly appreciated if you could share a more concrete
> >> progress status and
> >> delivery plan (or milestone) for `Connect and Classic Scala interface` to
> >> the community.
> >> Specifically, we are curious about how much we achieved between
> >> `preview1` and `preview2`,
> >> and if it's going to be stabilized enough at that time frame. Let's see
> >> what you have more.
> >>
> >> > We are working on unifying the Connect and Classic Scala interface
> >>
> >> Thank you again!
> >>
> >> Dongjoon.
> >>
> >> On Thu, Sep 26, 2024 at 12:23 PM Herman van Hovell 
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Can we push back the dates by at least 2 months?
> >>>
> >>> We are working on unifying the Connect and Classic Scala interface, and
> >>> I would like to avoid rushing things.
> >>>
> >>> Kind regards,
> >>> Herman
> >>>
> >>> On Thu, Sep 26, 2024 at 3:19 PM Dongjoon Hyun 
> >>> wrote:
> >>>
>  Hi, All.
> 
>  We've delivered two preview releases for Apache Spark 4.0 successfully.
>  I believe it's time to discuss cutting branch-4.0 to stabilize more
>  based on them
>  and schedule feature freeze.
> 
>  - https://spark.apache.org/news/spark-4.0.0-preview1.html (June)
>  - https://spark.apache.org/news/spark-4.0.0-preview2.html (September)
> 
>  I'd like to propose as a candidate.
> 
>  - 2024-10-01 Creating `branch-4.0` (allowing backporting new features)
>  - 2024-10-15 Feature Freeze (allowing backporting bug fixes only)
>  - 2024-11-01 Starting Apache Spark 4.0.0 RC1
> 
>  WDYT? Please let me know if you have release blockers for Spark 4 or
>  other schedule candidates in your mind.
> 
>  Thanks,
>  Dongjoon.
> 
> >>>
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Creating `branch-4.0` and Feature Freeze for Apache Spark 4.0

2024-09-26 Thread Hyukjin Kwon
I meant 2025 :-).

On Fri, Sep 27, 2024 at 11:15 AM Hyukjin Kwon  wrote:

> We're basically working on making Scala Spark Connect ready.
> For example, I am working on having a parent class for both Spark Classic
> and Spark Connect so users would face less breaking changes, and they can
> run their application without changing anything.
> In addition, I am also working on sharing the same test base between Spark
> Classic and Spark Connect.
> For those, I think it might take a couple of months to stabilize.
>
> What about the below schedule?
>
> - 2024-01-15 Creating `branch-4.0` (allowing backporting new features)
> - 2024-02-01 Feature Freeze (allowing backporting bug fixes only)
> - 2024-02-15 Starting Apache Spark 4.0.0 RC1
>
>
> On Fri, 27 Sept 2024 at 07:35, Dongjoon Hyun 
> wrote:
>
>> Thank you for the reply, Herman.
>>
>> Given that December and January are on your schedule,
>> I'm not sure what date your proposal is. Could you elaborate more?
>> As we know, the community is less active during that Winter period.
>>
>> In addition, although I know that you are leading that area actively with
>> big refactorings,
>> it would be greatly appreciated if you could share a more concrete
>> progress status and
>> delivery plan (or milestone) for `Connect and Classic Scala interface` to
>> the community.
>> Specifically, we are curious about how much we achieved between
>> `preview1` and `preview2`,
>> and if it's going to be stabilized enough at that time frame. Let's see
>> what you have more.
>>
>> > We are working on unifying the Connect and Classic Scala interface
>>
>> Thank you again!
>>
>> Dongjoon.
>>
>> On Thu, Sep 26, 2024 at 12:23 PM Herman van Hovell 
>> wrote:
>>
>>> Hi,
>>>
>>> Can we push back the dates by at least 2 months?
>>>
>>> We are working on unifying the Connect and Classic Scala interface, and
>>> I would like to avoid rushing things.
>>>
>>> Kind regards,
>>> Herman
>>>
>>> On Thu, Sep 26, 2024 at 3:19 PM Dongjoon Hyun 
>>> wrote:
>>>
 Hi, All.

 We've delivered two preview releases for Apache Spark 4.0 successfully.
 I believe it's time to discuss cutting branch-4.0 to stabilize more
 based on them
 and schedule feature freeze.

 - https://spark.apache.org/news/spark-4.0.0-preview1.html (June)
 - https://spark.apache.org/news/spark-4.0.0-preview2.html (September)

 I'd like to propose as a candidate.

 - 2024-10-01 Creating `branch-4.0` (allowing backporting new features)
 - 2024-10-15 Feature Freeze (allowing backporting bug fixes only)
 - 2024-11-01 Starting Apache Spark 4.0.0 RC1

 WDYT? Please let me know if you have release blockers for Spark 4 or
 other schedule candidates in your mind.

 Thanks,
 Dongjoon.

>>>


Re: [DISCUSS] Creating `branch-4.0` and Feature Freeze for Apache Spark 4.0

2024-09-26 Thread Hyukjin Kwon
We're basically working on making Scala Spark Connect ready.
For example, I am working on having a parent class for both Spark Classic
and Spark Connect so users would face fewer breaking changes, and they can
run their applications without changing anything.
In addition, I am also working on sharing the same test base between Spark
Classic and Spark Connect.
For those, I think it might take a couple of months to stabilize.
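
Roughly, the shape is a shared parent type that both the Classic and Connect
implementations extend, so user code compiles against the parent and runs on
either backend. A much-simplified sketch of that idea; every name below is
hypothetical and not the actual API:

  object ConnectClassicSketch {
    trait SparkSessionLike {
      def sql(query: String): Seq[String] // stand-in for a DataFrame result
    }

    class ClassicSessionSketch extends SparkSessionLike {
      override def sql(query: String): Seq[String] = Seq(s"classic: $query")
    }

    class ConnectSessionSketch extends SparkSessionLike {
      override def sql(query: String): Seq[String] = Seq(s"connect: $query")
    }

    // User code depends only on the parent type, so switching backends needs
    // no source changes.
    def runReport(session: SparkSessionLike): Seq[String] =
      session.sql("SELECT 1")
  }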

What about the below schedule?

- 2024-01-15 Creating `branch-4.0` (allowing backporting new features)
- 2024-02-01 Feature Freeze (allowing backporting bug fixes only)
- 2024-02-15 Starting Apache Spark 4.0.0 RC1


On Fri, 27 Sept 2024 at 07:35, Dongjoon Hyun 
wrote:

> Thank you for the reply, Herman.
>
> Given that December and January are on your schedule,
> I'm not sure what date your proposal is. Could you elaborate more?
> As we know, the community is less active during that Winter period.
>
> In addition, although I know that you are leading that area actively with
> big refactorings,
> it would be greatly appreciated if you could share a more concrete
> progress status and
> delivery plan (or milestone) for `Connect and Classic Scala interface` to
> the community.
> Specifically, we are curious about how much we achieved between `preview1`
> and `preview2`,
> and if it's going to be stabilized enough at that time frame. Let's see
> what you have more.
>
> > We are working on unifying the Connect and Classic Scala interface
>
> Thank you again!
>
> Dongjoon.
>
> On Thu, Sep 26, 2024 at 12:23 PM Herman van Hovell 
> wrote:
>
>> Hi,
>>
>> Can we push back the dates by at least 2 months?
>>
>> We are working on unifying the Connect and Classic Scala interface, and I
>> would like to avoid rushing things.
>>
>> Kind regards,
>> Herman
>>
>> On Thu, Sep 26, 2024 at 3:19 PM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, All.
>>>
>>> We've delivered two preview releases for Apache Spark 4.0 successfully.
>>> I believe it's time to discuss cutting branch-4.0 to stabilize more
>>> based on them
>>> and schedule feature freeze.
>>>
>>> - https://spark.apache.org/news/spark-4.0.0-preview1.html (June)
>>> - https://spark.apache.org/news/spark-4.0.0-preview2.html (September)
>>>
>>> I'd like to propose as a candidate.
>>>
>>> - 2024-10-01 Creating `branch-4.0` (allowing backporting new features)
>>> - 2024-10-15 Feature Freeze (allowing backporting bug fixes only)
>>> - 2024-11-01 Starting Apache Spark 4.0.0 RC1
>>>
>>> WDYT? Please let me know if you have release blockers for Spark 4 or
>>> other schedule candidates in your mind.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>


Re: [DISCUSS] Creating `branch-4.0` and Feature Freeze for Apache Spark 4.0

2024-09-26 Thread Herman van Hovell
Hi,

Can we push back the dates by at least 2 months?

We are working on unifying the Connect and Classic Scala interface, and I
would like to avoid rushing things.

Kind regards,
Herman

On Thu, Sep 26, 2024 at 3:19 PM Dongjoon Hyun 
wrote:

> Hi, All.
>
> We've delivered two preview releases for Apache Spark 4.0 successfully.
> I believe it's time to discuss cutting branch-4.0 to stabilize more based
> on them
> and schedule feature freeze.
>
> - https://spark.apache.org/news/spark-4.0.0-preview1.html (June)
> - https://spark.apache.org/news/spark-4.0.0-preview2.html (September)
>
> I'd like to propose as a candidate.
>
> - 2024-10-01 Creating `branch-4.0` (allowing backporting new features)
> - 2024-10-15 Feature Freeze (allowing backporting bug fixes only)
> - 2024-11-01 Starting Apache Spark 4.0.0 RC1
>
> WDYT? Please let me know if you have release blockers for Spark 4 or other
> schedule candidates in your mind.
>
> Thanks,
> Dongjoon.
>


Re: [DISCUSS] Creating `branch-4.0` and Feature Freeze for Apache Spark 4.0

2024-09-26 Thread Dongjoon Hyun
Thank you for the reply, Herman.

Given that December and January are on your schedule,
I'm not sure what date your proposal implies. Could you elaborate more?
As we know, the community is less active during that Winter period.

In addition, although I know that you are leading that area actively with
big refactorings,
it would be greatly appreciated if you could share a more concrete progress
status and
delivery plan (or milestone) for `Connect and Classic Scala interface` to
the community.
Specifically, we are curious about how much we achieved between `preview1`
and `preview2`,
and whether it's going to be stabilized enough in that time frame. Let's see
what more you have.

> We are working on unifying the Connect and Classic Scala interface

Thank you again!

Dongjoon.

On Thu, Sep 26, 2024 at 12:23 PM Herman van Hovell 
wrote:

> Hi,
>
> Can we push back the dates by at least 2 months?
>
> We are working on unifying the Connect and Classic Scala interface, and I
> would like to avoid rushing things.
>
> Kind regards,
> Herman
>
> On Thu, Sep 26, 2024 at 3:19 PM Dongjoon Hyun 
> wrote:
>
>> Hi, All.
>>
>> We've delivered two preview releases for Apache Spark 4.0 successfully.
>> I believe it's time to discuss cutting branch-4.0 to stabilize more based
>> on them
>> and schedule feature freeze.
>>
>> - https://spark.apache.org/news/spark-4.0.0-preview1.html (June)
>> - https://spark.apache.org/news/spark-4.0.0-preview2.html (September)
>>
>> I'd like to propose as a candidate.
>>
>> - 2024-10-01 Creating `branch-4.0` (allowing backporting new features)
>> - 2024-10-15 Feature Freeze (allowing backporting bug fixes only)
>> - 2024-11-01 Starting Apache Spark 4.0.0 RC1
>>
>> WDYT? Please let me know if you have release blockers for Spark 4 or
>> other schedule candidates in your mind.
>>
>> Thanks,
>> Dongjoon.
>>
>


Re: [DISCUSS] Spark 3.5.3 breaks Iceberg SparkSessionCatalog

2024-09-25 Thread Wenchen Fan
Hi Russell,

Thanks for testing it out! It's a bit unfortunate that we found this issue
after the RC stage. I've made a fix for it:
https://github.com/apache/spark/pull/48257 . I think it should work but
let's confirm it. After it gets merged, we can probably wait for a while to
accumulate more fixes to land in branch-3.5 and make another release.

On Thu, Sep 26, 2024 at 5:59 AM Russell Spitzer 
wrote:

> Checked and extending Delegating Catalog Extension will be quite difficult
> or at least cause several breaks in current Iceberg SparkSessionCatalog
> implementations. Note this has nothing to do with third party catalogs but
> more directly with how Iceberg works with Spark regardless of Catalog
> implementation.
>
> Main issues on the Iceberg side:
>
> 1. Initialize is final and empty in DelegatingCatalogExtension. This means
> we have no way of taking custom catalog configuration and applying to the
> Iceberg plugin. Currently this is used for a few things; choosing the
> underlying Iceberg catalog implementation, catalog cache settings, Iceberg
> environment context.
>
> 2. No access to delegate catalog object. The delegate is private so we are
> unable to touch it in our extended class which is currently used for
> Iceberg's "staged create" and "staged replace" functions. Here we could
> just work around this by disabling staged create and replace if the
> delegate is being used but that would be a break iceberg behavior.
>
> Outside of these aspects I was able to get everything else working as
> expected but I think both of these are probably blockers.
>
>
> On Wed, Sep 25, 2024 at 3:51 PM Russell Spitzer 
> wrote:
>
>> I think it should be minimally difficult to switch this around on the
>> Iceberg side, we only have to move the initialize code out and duplicate
>> it. Not a huge cost
>>
>> On Sun, Sep 22, 2024 at 11:39 PM Wenchen Fan  wrote:
>>
>>> It's a buggy behavior that a custom v2 catalog (without extending
>>> DelegatingCatalogExtension) expects Spark to still use the v1 DDL commands
>>> to operate on the tables inside it. This is also why the third-party
>>> catalogs (e.g. Unity Catalog and Apache Polaris) can not be used to
>>> overwrite `spark_catalog` if people still want to use the Spark built-in
>>> file sources.
>>>
>>> Technically, I think it's wrong for a third-party catalog to rely on
>>> Spark's session catalog without extending `DelegatingCatalogExtension`, as
>>> it confuses Spark. If it has its own metastore, then it shouldn't delegate
>>> requests to the Spark session catalog and use v1 DDL commands which only
>>> work with the Spark session catalog. Otherwise, it should extend
>>> `DelegatingCatalogExtension` to indicate it.
>>>
>>> On Mon, Sep 23, 2024 at 11:19 AM Manu Zhang 
>>> wrote:
>>>
 Hi Iceberg and Spark community,

 I'd like to bring your attention to a recent change[1] in Spark 3.5.3
 that effectively breaks Iceberg's SparkSessionCatalog[2] and blocks Iceberg
 upgrading to Spark 3.5.3[3].

 SparkSessionCatalog, as a customized Spark V2 session catalog,
 supports creating a V1 table with V1 command. That's no longer allowed
 after the change unless it extends DelegatingCatalogExtension. It is not
 minor work since SparkSessionCatalog already extends a base class[4].

 To resolve this issue, we have to make changes to public interfaces at
 either Spark or Iceberg side. IMHO, it doesn't make sense for a downstream
 project to refactor its interfaces when bumping up a maintenance version of
 Spark. WDYT?


 1. https://github.com/apache/spark/pull/47724
 2.
 https://iceberg.apache.org/docs/nightly/spark-configuration/#replacing-the-session-catalog
 3. https://github.com/apache/iceberg/pull/11160
 
 4.
 https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java

 Thanks,
 Manu




Re: [DISCUSS] Spark 3.5.3 breaks Iceberg SparkSessionCatalog

2024-09-25 Thread Russell Spitzer
I checked, and extending DelegatingCatalogExtension would be quite difficult,
or at least would cause several breaks in current Iceberg SparkSessionCatalog
implementations. Note this has nothing to do with third-party catalogs but
more directly with how Iceberg works with Spark regardless of the catalog
implementation.

Main issues on the Iceberg side:

1. initialize is final and empty in DelegatingCatalogExtension. This means
we have no way of taking custom catalog configuration and applying it to the
Iceberg plugin. Currently this is used for a few things: choosing the
underlying Iceberg catalog implementation, catalog cache settings, and the
Iceberg environment context.

2. No access to the delegate catalog object. The delegate is private, so we
are unable to touch it in our extended class; that access is currently used
for Iceberg's "staged create" and "staged replace" functions. We could work
around this by disabling staged create and replace when the delegate is being
used, but that would break Iceberg behavior.

Outside of these aspects I was able to get everything else working as
expected but I think both of these are probably blockers.
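
To make the two blockers concrete, here is a stripped-down sketch of the shape
being described. The signatures are simplified and the names are illustrative;
this is not the actual Spark or Iceberg source:

  import java.util.{Map => JMap}

  // Why a final, empty initialize() and a private delegate are limiting for a
  // session-catalog wrapper (simplified illustration, not the real classes).
  abstract class DelegatingCatalogExtensionSketch {
    private var delegate: AnyRef = _ // not visible to subclasses

    // Final and empty: a subclass never sees `options`, so user-supplied
    // settings (catalog-impl, caching, environment context) cannot reach it.
    final def initialize(name: String, options: JMap[String, String]): Unit = ()

    // The delegate stays private, so a subclass cannot reach delegate-specific
    // capabilities such as staged create/replace.
    final def setDelegateCatalog(d: AnyRef): Unit = { this.delegate = d }
  }

  class SessionCatalogWrapperSketch extends DelegatingCatalogExtensionSketch {
    // override def initialize(...)  // would not compile: initialize is final
    // delegate                      // would not compile: delegate is private
  }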


On Wed, Sep 25, 2024 at 3:51 PM Russell Spitzer 
wrote:

> I think it should be minimally difficult to switch this around on the
> Iceberg side, we only have to move the initialize code out and duplicate
> it. Not a huge cost
>
> On Sun, Sep 22, 2024 at 11:39 PM Wenchen Fan  wrote:
>
>> It's a buggy behavior that a custom v2 catalog (without extending
>> DelegatingCatalogExtension) expects Spark to still use the v1 DDL commands
>> to operate on the tables inside it. This is also why the third-party
>> catalogs (e.g. Unity Catalog and Apache Polaris) can not be used to
>> overwrite `spark_catalog` if people still want to use the Spark built-in
>> file sources.
>>
>> Technically, I think it's wrong for a third-party catalog to rely on
>> Spark's session catalog without extending `DelegatingCatalogExtension`, as
>> it confuses Spark. If it has its own metastore, then it shouldn't delegate
>> requests to the Spark session catalog and use v1 DDL commands which only
>> work with the Spark session catalog. Otherwise, it should extend
>> `DelegatingCatalogExtension` to indicate it.
>>
>> On Mon, Sep 23, 2024 at 11:19 AM Manu Zhang 
>> wrote:
>>
>>> Hi Iceberg and Spark community,
>>>
>>> I'd like to bring your attention to a recent change[1] in Spark 3.5.3
>>> that effectively breaks Iceberg's SparkSessionCatalog[2] and blocks Iceberg
>>> upgrading to Spark 3.5.3[3].
>>>
>>> SparkSessionCatalog, as a customized Spark V2 session catalog,
>>> supports creating a V1 table with V1 command. That's no longer allowed
>>> after the change unless it extends DelegatingCatalogExtension. It is not
>>> minor work since SparkSessionCatalog already extends a base class[4].
>>>
>>> To resolve this issue, we have to make changes to public interfaces at
>>> either Spark or Iceberg side. IMHO, it doesn't make sense for a downstream
>>> project to refactor its interfaces when bumping up a maintenance version of
>>> Spark. WDYT?
>>>
>>>
>>> 1. https://github.com/apache/spark/pull/47724
>>> 2.
>>> https://iceberg.apache.org/docs/nightly/spark-configuration/#replacing-the-session-catalog
>>> 3. https://github.com/apache/iceberg/pull/11160
>>> 
>>> 4.
>>> https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
>>>
>>> Thanks,
>>> Manu
>>>
>>>


Re: [DISCUSS] Spark 3.5.3 breaks Iceberg SparkSessionCatalog

2024-09-25 Thread Russell Spitzer
I think it should be minimally difficult to switch this around on the
Iceberg side; we only have to move the initialize code out and duplicate
it. Not a huge cost.

On Sun, Sep 22, 2024 at 11:39 PM Wenchen Fan  wrote:

> It's a buggy behavior that a custom v2 catalog (without extending
> DelegatingCatalogExtension) expects Spark to still use the v1 DDL commands
> to operate on the tables inside it. This is also why the third-party
> catalogs (e.g. Unity Catalog and Apache Polaris) can not be used to
> overwrite `spark_catalog` if people still want to use the Spark built-in
> file sources.
>
> Technically, I think it's wrong for a third-party catalog to rely on
> Spark's session catalog without extending `DelegatingCatalogExtension`, as
> it confuses Spark. If it has its own metastore, then it shouldn't delegate
> requests to the Spark session catalog and use v1 DDL commands which only
> work with the Spark session catalog. Otherwise, it should extend
> `DelegatingCatalogExtension` to indicate it.
>
> On Mon, Sep 23, 2024 at 11:19 AM Manu Zhang 
> wrote:
>
>> Hi Iceberg and Spark community,
>>
>> I'd like to bring your attention to a recent change[1] in Spark 3.5.3
>> that effectively breaks Iceberg's SparkSessionCatalog[2] and blocks Iceberg
>> upgrading to Spark 3.5.3[3].
>>
>> SparkSessionCatalog, as a customized Spark V2 session catalog,
>> supports creating a V1 table with V1 command. That's no longer allowed
>> after the change unless it extends DelegatingCatalogExtension. It is not
>> minor work since SparkSessionCatalog already extends a base class[4].
>>
>> To resolve this issue, we have to make changes to public interfaces at
>> either Spark or Iceberg side. IMHO, it doesn't make sense for a downstream
>> project to refactor its interfaces when bumping up a maintenance version of
>> Spark. WDYT?
>>
>>
>> 1. https://github.com/apache/spark/pull/47724
>> 2.
>> https://iceberg.apache.org/docs/nightly/spark-configuration/#replacing-the-session-catalog
>> 3. https://github.com/apache/iceberg/pull/11160
>> 
>> 4.
>> https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
>>
>> Thanks,
>> Manu
>>
>>


Re: [ANNOUNCE] Apache Spark 3.5.3 released

2024-09-25 Thread Haejoon Lee
Updated release notes, and submitted a PR to publish docker images:
- https://github.com/apache/spark-docker/pull/72

Thanks!

On Wed, Sep 25, 2024 at 2:48 PM Dongjoon Hyun 
wrote:

> Thank you for the release, Heajoon.
>
> Could you publish docker images too like the following?
>
> - https://github.com/apache/spark-docker/pull/64
>   (Publish 3.5.2 to docker registry)
>
> Dongjoon.
>
>
> On Tue, Sep 24, 2024 at 10:29 PM Haejoon Lee 
> wrote:
>
>> Hi, Yang!
>>
>> And thanks Dongjoon for answering the question!
>>
>> For the second question, I got the commit list from the JIRA release note
>> ,
>> but just realized that some commits are not resolved yet & missing from
>> 3.5.3 as you mentioned.
>> > 3.5.3 release notes contains some commits that does not exists in
>> tag/v3.5.3, e.g `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3,
>> would you help explain more about that?
>>
>> Let me update the release notes soon to properly include only the list
>> that was actually completed in 3.5.3.
>>
>> Thanks for the report!
>>
>> Haejoon
>>
>> On Wed, Sep 25, 2024 at 2:16 PM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, Yang.
>>>
>>> For this question, please try with `Private` mode or `Incognito` mode in
>>> your browser.
>>> > I find download page does not contain 3.5.3 link, but release notes
>>> link exists.
>>>
>>> Dongjoon.
>>>
>>> On Tue, Sep 24, 2024 at 8:39 PM Yang Zhang  wrote:
>>>
 Hi,

 I find download page does not contain 3.5.3 link, but release notes
 link exists.

 3.5.3 release notes contains some commits that does not exists in
 tag/v3.5.3, e.g `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3,
 would you help explain more about that?

 Thank you

 On 2024/09/25 01:05:47 Haejoon Lee wrote:
 > We are happy to announce the availability of Apache Spark 3.5.3!
 >
 > Spark 3.5.3 is the third maintenance release containing security
 > and correctness fixes. This release is based on the branch-3.5
 > maintenance branch of Spark. We strongly recommend all 3.5 users
 > to upgrade to this stable release.
 >
 > To download Spark 3.5.3, head over to the download page:
 > https://spark.apache.org/downloads.html
 >
 > To view the release notes:
 > https://spark.apache.org/releases/spark-release-3-5-3.html
 >
 > We would like to acknowledge all community members for contributing
 to this
 > release. This release would not have been possible without you.
 >
 > Haejoon Lee
 >

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: [ANNOUNCE] Apache Spark 3.5.3 released

2024-09-24 Thread Dongjoon Hyun
Thank you for the release, Haejoon.

Could you publish docker images too like the following?

- https://github.com/apache/spark-docker/pull/64
  (Publish 3.5.2 to docker registry)

Dongjoon.


On Tue, Sep 24, 2024 at 10:29 PM Haejoon Lee 
wrote:

> Hi, Yang!
>
> And thanks Dongjoon for answering the question!
>
> For the second question, I got the commit list from the JIRA release note
> ,
> but just realized that some commits are not resolved yet & missing from
> 3.5.3 as you mentioned.
> > 3.5.3 release notes contains some commits that does not exists in
> tag/v3.5.3, e.g `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3,
> would you help explain more about that?
>
> Let me update the release notes soon to properly include only the list
> that was actually completed in 3.5.3.
>
> Thanks for the report!
>
> Haejoon
>
> On Wed, Sep 25, 2024 at 2:16 PM Dongjoon Hyun 
> wrote:
>
>> Hi, Yang.
>>
>> For this question, please try with `Private` mode or `Incognito` mode in
>> your browser.
>> > I find download page does not contain 3.5.3 link, but release notes
>> link exists.
>>
>> Dongjoon.
>>
>> On Tue, Sep 24, 2024 at 8:39 PM Yang Zhang  wrote:
>>
>>> Hi,
>>>
>>> I find download page does not contain 3.5.3 link, but release notes link
>>> exists.
>>>
>>> 3.5.3 release notes contains some commits that does not exists in
>>> tag/v3.5.3, e.g `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3,
>>> would you help explain more about that?
>>>
>>> Thank you
>>>
>>> On 2024/09/25 01:05:47 Haejoon Lee wrote:
>>> > We are happy to announce the availability of Apache Spark 3.5.3!
>>> >
>>> > Spark 3.5.3 is the third maintenance release containing security
>>> > and correctness fixes. This release is based on the branch-3.5
>>> > maintenance branch of Spark. We strongly recommend all 3.5 users
>>> > to upgrade to this stable release.
>>> >
>>> > To download Spark 3.5.3, head over to the download page:
>>> > https://spark.apache.org/downloads.html
>>> >
>>> > To view the release notes:
>>> > https://spark.apache.org/releases/spark-release-3-5-3.html
>>> >
>>> > We would like to acknowledge all community members for contributing to
>>> this
>>> > release. This release would not have been possible without you.
>>> >
>>> > Haejoon Lee
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: [ANNOUNCE] Apache Spark 3.5.3 released

2024-09-24 Thread Haejoon Lee
Hi, Yang!

And thanks Dongjoon for answering the question!

For the second question, I got the commit list from the JIRA release note,
but just realized that some commits are not resolved yet & missing from
3.5.3 as you mentioned.
> 3.5.3 release notes contains some commits that does not exists in
tag/v3.5.3, e.g `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3,
would you help explain more about that?

Let me update the release notes soon to properly include only the list that
was actually completed in 3.5.3.

Thanks for the report!

Haejoon

On Wed, Sep 25, 2024 at 2:16 PM Dongjoon Hyun 
wrote:

> Hi, Yang.
>
> For this question, please try with `Private` mode or `Incognito` mode in
> your browser.
> > I find download page does not contain 3.5.3 link, but release notes link
> exists.
>
> Dongjoon.
>
> On Tue, Sep 24, 2024 at 8:39 PM Yang Zhang  wrote:
>
>> Hi,
>>
>> I find download page does not contain 3.5.3 link, but release notes link
>> exists.
>>
>> 3.5.3 release notes contains some commits that does not exists in
>> tag/v3.5.3, e.g `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3,
>> would you help explain more about that?
>>
>> Thank you
>>
>> On 2024/09/25 01:05:47 Haejoon Lee wrote:
>> > We are happy to announce the availability of Apache Spark 3.5.3!
>> >
>> > Spark 3.5.3 is the third maintenance release containing security
>> > and correctness fixes. This release is based on the branch-3.5
>> > maintenance branch of Spark. We strongly recommend all 3.5 users
>> > to upgrade to this stable release.
>> >
>> > To download Spark 3.5.3, head over to the download page:
>> > https://spark.apache.org/downloads.html
>> >
>> > To view the release notes:
>> > https://spark.apache.org/releases/spark-release-3-5-3.html
>> >
>> > We would like to acknowledge all community members for contributing to
>> this
>> > release. This release would not have been possible without you.
>> >
>> > Haejoon Lee
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [ANNOUNCE] Apache Spark 3.5.3 released

2024-09-24 Thread Dongjoon Hyun
Hi, Yang.

For this question, please try with `Private` mode or `Incognito` mode in
your browser.
> I find download page does not contain 3.5.3 link, but release notes link
exists.

Dongjoon.

On Tue, Sep 24, 2024 at 8:39 PM Yang Zhang  wrote:

> Hi,
>
> I find download page does not contain 3.5.3 link, but release notes link
> exists.
>
> 3.5.3 release notes contains some commits that does not exists in
> tag/v3.5.3, e.g `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3,
> would you help explain more about that?
>
> Thank you
>
> On 2024/09/25 01:05:47 Haejoon Lee wrote:
> > We are happy to announce the availability of Apache Spark 3.5.3!
> >
> > Spark 3.5.3 is the third maintenance release containing security
> > and correctness fixes. This release is based on the branch-3.5
> > maintenance branch of Spark. We strongly recommend all 3.5 users
> > to upgrade to this stable release.
> >
> > To download Spark 3.5.3, head over to the download page:
> > https://spark.apache.org/downloads.html
> >
> > To view the release notes:
> > https://spark.apache.org/releases/spark-release-3-5-3.html
> >
> > We would like to acknowledge all community members for contributing to
> this
> > release. This release would not have been possible without you.
> >
> > Haejoon Lee
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [ANNOUNCE] Apache Spark 3.5.3 released

2024-09-24 Thread Yang Zhang
Hi, 

I find that the download page does not contain a 3.5.3 link, but the release notes link exists.

The 3.5.3 release notes contain some commits that do not exist in tag/v3.5.3,
e.g. `[SPARK-49628]` exists in branch-3.5 but not in tag/v3.5.3. Could you help
explain more about that?

Thank you

On 2024/09/25 01:05:47 Haejoon Lee wrote:
> We are happy to announce the availability of Apache Spark 3.5.3!
> 
> Spark 3.5.3 is the third maintenance release containing security
> and correctness fixes. This release is based on the branch-3.5
> maintenance branch of Spark. We strongly recommend all 3.5 users
> to upgrade to this stable release.
> 
> To download Spark 3.5.3, head over to the download page:
> https://spark.apache.org/downloads.html
> 
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-5-3.html
> 
> We would like to acknowledge all community members for contributing to this
> release. This release would not have been possible without you.
> 
> Haejoon Lee
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS][Spark SQL] Update API

2024-09-24 Thread Wenchen Fan
All the existing DML APIs we support today have a source query, so they all
start with the source DataFrame, e.g.:
sourceDf.write.insertInto...
sourceDf.write.saveAsTable...
sourceDf.mergeInto...

However, this is not the case for UPDATE and DELETE, as there is no source
query. We need a different style of API for them, one that starts with the
target table. I'm in favor of option 3 due to its compile-time safety and
clear intent. We can probably support all DML APIs in this style as well,
e.g.:
spark.catalog.getTable(...).update(...)
spark.catalog.getTable(...).delete(...)
spark.catalog.getTable(...).insertFrom(...)
spark.catalog.getTable(...).mergeFrom(...)

Or we can make it more like SQL:
spark.catalog.updateTable(tableName, ...)
spark.catalog.deleteFrom(tableName, ...)
spark.catalog.mergeInto(tableName, sourceDataFrame): MergeBuilder
spark.catalog.writeInto(tableName, sourceDataFrame): DataFrameWriterV2
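
For illustration, a rough sketch contrasting the existing source-first style
with the proposed target-first style is below. Only the first half uses real
APIs; the target-first calls in the comments mirror the proposal above and are
hypothetical, so they do not exist in any released Spark version.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dml-style-sketch").getOrCreate()
import spark.implicits._

val sourceDf = Seq((1, "pending"), (2, "shipped")).toDF("id", "status")

// Existing style: the statement starts from the source DataFrame.
sourceDf.write.insertInto("sales.orders")            // appends into an existing table
sourceDf.write.saveAsTable("sales.orders_snapshot")  // creates a new table

// Proposed target-first style (option 3). Hypothetical, for illustration only;
// the names mirror the proposal above:
//   spark.catalog.getTable("sales.orders").update(Map("status" -> lit("x")), $"id" === 1)
//   spark.catalog.getTable("sales.orders").delete($"status" === "cancelled")
//   spark.catalog.updateTable("sales.orders", ...)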

On Tue, Sep 24, 2024 at 8:54 AM Szehon Ho  wrote:

> Hi all,
>
> In https://github.com/apache/spark/pull/47233, we are looking to add a
> Spark DataFrame API for functional equivalence to Spark SQL's UPDATE
> statement.
>
> There are open discussions on the PR about location/format of the API, and
> we wanted to ask on devlist to get more opinions.
>
> One consideration is that Update SQL is an isolated, terminal operation on
> DSV2 tables only, and it cannot be chained to other operations.
>
> I made a quick write up about the background and discussed options in
> https://docs.google.com/document/d/1AjkxOU06pFEzFmSbepfxdHoUGtvNAk6X1WY3zHGTW_o/edit.
> It is my first one, so please let me know if I missed something.
>
> Look forward to hearing from more Spark devs on thoughts, either in the
> PR, document, or reply to this email.
>
> Thank you,
> Szehon
>
>


Re: [DISCUSS] [Spark SQL] Single-pass Analyzer SPIP

2024-09-24 Thread Wenchen Fan
Let me add a bit more color since I'm the Shepherd.

I've fixed quite a few bugs in the analyzer caused by rule-order issues. The
recent ones are https://github.com/apache/spark/pull/45718 and
https://github.com/apache/spark/pull/45350. Dealing with rule order is very
tricky, and making all the analyzer rules orthogonal is nearly impossible.
Following other mainstream databases and using a single-pass analyzer is
definitely the right direction.

This is a tough project and will likely take years. To reduce risk, it will
not change the codebase invasively. The majority of the new analyzer will
live in new code files, and only minor refactoring is needed to reuse some
existing analyzer rules. The new analyzer will only be enabled in dedicated
tests built specifically for it, so you should never hit issues caused by the
new analyzer in the existing tests.
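
To make the rule-order concern concrete, here is a toy sketch (not Spark's
analyzer, and not part of the SPIP) of why a batch of independent rewrite
rules is order-sensitive and needs a fixed-point loop, while a single
bottom-up pass resolves each node exactly once. The expression tree and the
two rules are invented for illustration.

sealed trait Expr
case class UnresolvedAttr(name: String) extends Expr
case class Attr(name: String, dataType: String) extends Expr
case class Cast(child: Expr, to: String) extends Expr

// Rule A: resolve attribute references against a (fake) schema.
def resolveAttrs(e: Expr): Expr = e match {
  case UnresolvedAttr(n) => Attr(n, "int")
  case Cast(c, t)        => Cast(resolveAttrs(c), t)
  case other             => other
}

// Rule B: drop a redundant cast, which can only be decided once the child
// attribute has been resolved.
def foldRedundantCasts(e: Expr): Expr = e match {
  case Cast(a @ Attr(_, t1), t2) if t1 == t2 => a
  case Cast(c, t)                            => Cast(foldRedundantCasts(c), t)
  case other                                 => other
}

// Single-pass style: resolve the children first, then finish the parent node
// immediately, so there is no rule ordering and no fixed-point loop.
def analyzeOnce(e: Expr): Expr = e match {
  case UnresolvedAttr(n) => Attr(n, "int")
  case Cast(c, t) =>
    analyzeOnce(c) match {
      case a @ Attr(_, t1) if t1 == t => a
      case resolved                   => Cast(resolved, t)
    }
  case other => other
}

val plan = Cast(UnresolvedAttr("id"), "int")
// One sweep of the two separate rules is order-sensitive:
//   foldRedundantCasts(resolveAttrs(plan))  == Attr("id", "int")
//   resolveAttrs(foldRedundantCasts(plan))  == Cast(Attr("id", "int"), "int")
// whereas analyzeOnce(plan) == Attr("id", "int") in a single pass.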

On Thu, Sep 19, 2024 at 5:01 PM Reynold Xin 
wrote:

> Great document! Thanks for writing it up.
>
> On Tue, Sep 10, 2024 at 10:00 AM Vladimir Golubev 
> wrote:
>
>> Hey folks, following up on the recent single-pass Analyzer discussion. I
>> made a high-level proposal document for this idea:
>> https://docs.google.com/document/d/1dWxvrJV-0joGdLtWbvJ0uNyTocDMJ90rPRNWa4T56Og.
>> Feel free to comment!
>>
>


Re: [DISCUSS] Spark 3.5.3 breaks Iceberg SparkSessionCatalog

2024-09-22 Thread Wenchen Fan
It's buggy behavior for a custom v2 catalog (one that does not extend
DelegatingCatalogExtension) to expect Spark to still use the v1 DDL commands
to operate on the tables inside it. This is also why third-party catalogs
(e.g. Unity Catalog and Apache Polaris) cannot be used to replace
`spark_catalog` if people still want to use the Spark built-in file sources.

Technically, I think it's wrong for a third-party catalog to rely on
Spark's session catalog without extending `DelegatingCatalogExtension`, as
it confuses Spark. If the catalog has its own metastore, then it shouldn't
delegate requests to the Spark session catalog or use v1 DDL commands, which
only work with the Spark session catalog. Otherwise, it should extend
`DelegatingCatalogExtension` to indicate that it delegates.
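
For readers who haven't touched this extension point, a minimal sketch of a
session-catalog plugin built on DelegatingCatalogExtension might look roughly
like the following. This is not Iceberg's SparkSessionCatalog; the two private
helpers are placeholders for whatever metastore logic a real plugin would have.

import org.apache.spark.sql.connector.catalog.{DelegatingCatalogExtension, Identifier, Table}

// A catalog plugin that extends DelegatingCatalogExtension so that anything it
// does not manage keeps flowing to Spark's built-in session catalog.
class MySessionCatalog extends DelegatingCatalogExtension {

  // Placeholder: decide whether this identifier belongs to the external catalog.
  private def isManagedExternally(ident: Identifier): Boolean = false

  // Placeholder: load the table from the external metastore.
  private def loadExternally(ident: Identifier): Table =
    throw new UnsupportedOperationException("illustrative only")

  override def loadTable(ident: Identifier): Table =
    if (isManagedExternally(ident)) loadExternally(ident)
    else super.loadTable(ident)  // delegate to the built-in session catalog
}

Such a class would be configured as `spark.sql.catalog.spark_catalog`, and
because it extends DelegatingCatalogExtension, Spark can keep treating the
tables it does not manage (including built-in file-source tables) as ordinary
session-catalog tables.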

On Mon, Sep 23, 2024 at 11:19 AM Manu Zhang  wrote:

> Hi Iceberg and Spark community,
>
> I'd like to bring to your attention a recent change[1] in Spark 3.5.3 that
> effectively breaks Iceberg's SparkSessionCatalog[2] and blocks Iceberg
> upgrading to Spark 3.5.3[3].
>
> SparkSessionCatalog, as a customized Spark V2 session catalog,
> supports creating a V1 table with V1 command. That's no longer allowed
> after the change unless it extends DelegatingCatalogExtension. It is not
> minor work since SparkSessionCatalog already extends a base class[4].
>
> To resolve this issue, we have to make changes to public interfaces at
> either Spark or Iceberg side. IMHO, it doesn't make sense for a downstream
> project to refactor its interfaces when bumping up a maintenance version of
> Spark. WDYT?
>
>
> 1. https://github.com/apache/spark/pull/47724
> 2.
> https://iceberg.apache.org/docs/nightly/spark-configuration/#replacing-the-session-catalog
> 3. https://github.com/apache/iceberg/pull/11160
> 
> 4.
> https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
>
> Thanks,
> Manu
>
>


Re: Using Redirects for recently broken links

2024-09-20 Thread Hyukjin Kwon
Yeah I think we should fix this. Wonder if there's a way to redirect this
properly.

On Sat, Sep 21, 2024 at 8:24 AM Matthew Powers 
wrote:

> Hey devs :)
>
> When I do a "pyspark groupby" Google search, I get to the following link,
> which is broken:
> https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.groupBy.html
>
> I guess this is the new URL?
> https://spark.apache.org/docs/3.5.2/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.groupBy.html
>
> It's a little hard to find, even when you go to
> https://spark.apache.org/docs/latest/ and search for "pyspark groupby".
>
> Can we just add redirects from the old URLs => the current URLs? I think
> that would give users a much better experience.
>
> I think some of these pages were removed based on this work-around for
> size limits:
> https://lists.apache.org/thread/q9p2nj5x0z9p6cy32c9vd9lo43v4qxls
>
> Thanks for the help everyone!!
>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-20 Thread Dongjoon Hyun
Thank you all! I'll conclude this vote.

Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-19 Thread Yang Jie
+1

On 2024/09/19 23:33:39 Shubham Patel wrote:
> +1
> 
> On Thu, 19 Sept 2024, 01:05 Gengliang Wang,  wrote:
> 
> > +1
> >
> > On Wed, Sep 18, 2024 at 10:47 PM Xiao Li  wrote:
> >
> >> +1
> >>
> >> On Wed, Sep 18, 2024 at 18:22 Yuming Wang  wrote:
> >>
> >>> +1
> >>>
> >>> On Wed, Sep 18, 2024 at 6:07 PM Cheng Pan  wrote:
> >>>
>  +1 (non-binding)
> 
>  I checked
>  - Signatures and checksums are good.
>  - Build success from source code.
>  - Pass integration test with Apache Kyuubi [1]
> 
>  [1] https://github.com/apache/kyuubi/pull/6699
> 
>  Thanks,
>  Cheng Pan
> 
> 
> 
>  On Sep 16, 2024, at 15:24, Dongjoon Hyun 
>  wrote:
> 
>  Please vote on releasing the following candidate as Apache Spark
>  version 4.0.0-preview2.
> 
>  The vote is open until September 20th 1AM (PDT) and passes if a
>  majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> 
>  [ ] +1 Release this package as Apache Spark 4.0.0-preview2
>  [ ] -1 Do not release this package because ...
> 
>  To learn more about Apache Spark, please see https://spark.apache.org/
> 
>  The tag to be voted on is v4.0.0-preview2-rc1 (commit
>  f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
>  https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
> 
>  The release files, including signatures, digests, etc. can be found at:
>  https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
> 
>  Signatures used for Spark RCs can be found in this file:
>  https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
>  The staging repository for this release can be found at:
>  https://repository.apache.org/content/repositories/orgapachespark-1468/
> 
>  The documentation corresponding to this release can be found at:
>  https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
> 
>  The list of bug fixes going into 4.0.0-preview2 can be found at the
>  following URL:
>  https://issues.apache.org/jira/projects/SPARK/versions/12353359
> 
>  This release is using the release script of the tag v4.0.0-preview2-rc1.
> 
>  FAQ
> 
>  =
>  How can I help test this release?
>  =
> 
>  If you are a Spark user, you can help us test this release by taking
>  an existing Spark workload and running on this release candidate, then
>  reporting any regressions.
> 
>  If you're working in PySpark you can set up a virtual env and install
>  the current RC and see if anything important breaks, in the Java/Scala
>  you can add the staging repository to your projects resolvers and test
>  with the RC (make sure to clean up the artifact cache before/after so
>  you don't end up building with a out of date RC going forward).
> 
> 
> 
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-19 Thread Shubham Patel
+1

On Thu, 19 Sept 2024, 01:05 Gengliang Wang,  wrote:

> +1
>
> On Wed, Sep 18, 2024 at 10:47 PM Xiao Li  wrote:
>
>> +1
>>
>> On Wed, Sep 18, 2024 at 18:22 Yuming Wang  wrote:
>>
>>> +1
>>>
>>> On Wed, Sep 18, 2024 at 6:07 PM Cheng Pan  wrote:
>>>
 +1 (non-binding)

 I checked
 - Signatures and checksums are good.
 - Build success from source code.
 - Pass integration test with Apache Kyuubi [1]

 [1] https://github.com/apache/kyuubi/pull/6699

 Thanks,
 Cheng Pan



 On Sep 16, 2024, at 15:24, Dongjoon Hyun 
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 4.0.0-preview2.

 The vote is open until September 20th 1AM (PDT) and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 4.0.0-preview2
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see https://spark.apache.org/

 The tag to be voted on is v4.0.0-preview2-rc1 (commit
 f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
 https://github.com/apache/spark/tree/v4.0.0-preview2-rc1

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1468/

 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/

 The list of bug fixes going into 4.0.0-preview2 can be found at the
 following URL:
 https://issues.apache.org/jira/projects/SPARK/versions/12353359

 This release is using the release script of the tag v4.0.0-preview2-rc1.

 FAQ

 =
 How can I help test this release?
 =

 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.

 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks, in the Java/Scala
 you can add the staging repository to your projects resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with a out of date RC going forward).





Re: Variant spec and its security implications

2024-09-19 Thread Gang Wu
+ dev@spark because authors of variant may not subscribe to dev@parquet

On Mon, Sep 16, 2024 at 7:33 PM Antoine Pitrou  wrote:

>
> Hello,
>
> I've been reading the spec in more detail here:
>
> https://github.com/apache/spark/blob/d84f1a3575c4125009374521d2f179089ebd71ad/common/variant/README.md#encoding-types
>
> and I think that it should have a Security section listing potential
> security issues with this format (especially for readers).
>
> Given that Parquet is frequently used to make data publicly available
> online, it is important for implementers to know of potential issues to
> look for, and ideally protect against.
>
>
> One specific concern is the following snippet about the Object encoding:
>
> "The field ids and field offsets must be in lexicographical order of the
> corresponding field names in the metadata dictionary. However, the
> actual value entries do not need to be in any particular order. This
> implies that the field_offset values may not be monotonically
> increasing."
>
> Having field offsets which are not monotonically increasing makes it
> difficult to verify that the encoded values do not overlap. In general,
> it's useful for data formats to enable easy validation and error reporting.
> In this particular case, an attacker could perhaps craft a malicious
> Variant with deeply nested overlapping values to achieve a denial of
> service attack, similar to
> https://en.wikipedia.org/wiki/Billion_laughs_attack
>
> (I'm not saying such a malicious Variant is practically doable given
> specifics of the binary encoding, but it will be difficult to prove
> that it isn't)
>
> Regards
>
> Antoine.
>
>
>
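
To make the validation difficulty above concrete, a reader-side overlap check
could look roughly like the sketch below. It is deliberately generic: it
assumes the reader has already derived explicit (offset, length) pairs for the
field values, which glosses over the actual Variant header and offset-width
details.

// Illustrative only: check that an object's field values do not overlap and
// stay within the value region.
final case class FieldRange(offset: Int, length: Int)

def validateNonOverlapping(fields: Seq[FieldRange], regionSize: Int): Boolean = {
  val inBounds = fields.forall { f =>
    f.offset >= 0 && f.length >= 0 && f.offset.toLong + f.length <= regionSize
  }
  // Because the spec allows field offsets that are not monotonically
  // increasing, the reader has to sort before it can check adjacency, an
  // O(n log n) cost paid purely for validation.
  val sorted = fields.sortBy(_.offset)
  val noOverlap = sorted.zip(sorted.drop(1)).forall {
    case (a, b) => a.offset + a.length <= b.offset
  }
  inBounds && noOverlap
}

// Example: the second value starts inside the first one, so it is rejected.
validateNonOverlapping(Seq(FieldRange(0, 8), FieldRange(4, 4)), regionSize = 16)  // false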


Re: [DISCUSS] [Spark SQL] Single-pass Analyzer SPIP

2024-09-19 Thread Reynold Xin
Great document! Thanks for writing it up.

On Tue, Sep 10, 2024 at 10:00 AM Vladimir Golubev 
wrote:

> Hey folks, following up on the recent single-pass Analyzer discussion. I
> made a high-level proposal document for this idea:
> https://docs.google.com/document/d/1dWxvrJV-0joGdLtWbvJ0uNyTocDMJ90rPRNWa4T56Og.
> Feel free to comment!
>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-18 Thread Gengliang Wang
+1

On Wed, Sep 18, 2024 at 10:47 PM Xiao Li  wrote:

> +1
>
> On Wed, Sep 18, 2024 at 18:22 Yuming Wang  wrote:
>
>> +1
>>
>> On Wed, Sep 18, 2024 at 6:07 PM Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> I checked
>>> - Signatures and checksums are good.
>>> - Build success from source code.
>>> - Pass integration test with Apache Kyuubi [1]
>>>
>>> [1] https://github.com/apache/kyuubi/pull/6699
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>> On Sep 16, 2024, at 15:24, Dongjoon Hyun 
>>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 4.0.0-preview2.
>>>
>>> The vote is open until September 20th 1AM (PDT) and passes if a majority
>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v4.0.0-preview2-rc1 (commit
>>> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
>>> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1468/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
>>>
>>> The list of bug fixes going into 4.0.0-preview2 can be found at the
>>> following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>>
>>> This release is using the release script of the tag v4.0.0-preview2-rc1.
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with a out of date RC going forward).
>>>
>>>
>>>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-18 Thread Xiao Li
+1

On Wed, Sep 18, 2024 at 18:22 Yuming Wang  wrote:

> +1
>
> On Wed, Sep 18, 2024 at 6:07 PM Cheng Pan  wrote:
>
>> +1 (non-binding)
>>
>> I checked
>> - Signatures and checksums are good.
>> - Build success from source code.
>> - Pass integration test with Apache Kyuubi [1]
>>
>> [1] https://github.com/apache/kyuubi/pull/6699
>>
>> Thanks,
>> Cheng Pan
>>
>>
>>
>> On Sep 16, 2024, at 15:24, Dongjoon Hyun  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 4.0.0-preview2.
>>
>> The vote is open until September 20th 1AM (PDT) and passes if a majority
>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v4.0.0-preview2-rc1 (commit
>> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
>> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1468/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
>>
>> The list of bug fixes going into 4.0.0-preview2 can be found at the
>> following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>
>> This release is using the release script of the tag v4.0.0-preview2-rc1.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>>
>>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-18 Thread Yuming Wang
+1

On Wed, Sep 18, 2024 at 6:07 PM Cheng Pan  wrote:

> +1 (non-binding)
>
> I checked
> - Signatures and checksums are good.
> - Build success from source code.
> - Pass integration test with Apache Kyuubi [1]
>
> [1] https://github.com/apache/kyuubi/pull/6699
>
> Thanks,
> Cheng Pan
>
>
>
> On Sep 16, 2024, at 15:24, Dongjoon Hyun  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 4.0.0-preview2.
>
> The vote is open until September 20th 1AM (PDT) and passes if a majority
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v4.0.0-preview2-rc1 (commit
> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1468/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
>
> The list of bug fixes going into 4.0.0-preview2 can be found at the
> following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>
> This release is using the release script of the tag v4.0.0-preview2-rc1.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
>
>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-18 Thread Cheng Pan
+1 (non-binding)

I checked
- Signatures and checksums are good.
- Build success from source code.
- Pass integration test with Apache Kyuubi [1]

[1] https://github.com/apache/kyuubi/pull/6699

Thanks,
Cheng Pan



> On Sep 16, 2024, at 15:24, Dongjoon Hyun  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 4.0.0-preview2.
> 
> The vote is open until September 20th 1AM (PDT) and passes if a majority +1 
> PMC votes are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see https://spark.apache.org/
> 
> The tag to be voted on is v4.0.0-preview2-rc1 (commit 
> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
> 
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
> 
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1468/
> 
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
> 
> The list of bug fixes going into 4.0.0-preview2 can be found at the following 
> URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353359
> 
> This release is using the release script of the tag v4.0.0-preview2-rc1.
> 
> FAQ
> 
> =
> How can I help test this release?
> =
> 
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
> 
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).



Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-18 Thread Wenchen Fan
+1

On Wed, Sep 18, 2024 at 1:21 AM John Zhuge  wrote:

> +1 non-binding
>
> John Zhuge
>
>
> On Mon, Sep 16, 2024 at 11:07 PM Xinrong Meng  wrote:
>
>> +1
>>
>> Thank you @Dongjoon Hyun  !
>>
>> On Tue, Sep 17, 2024 at 11:31 AM huaxin gao 
>> wrote:
>>
>>> +1
>>>
>>> On Mon, Sep 16, 2024 at 6:20 PM L. C. Hsieh  wrote:
>>>
 +1

 On Mon, Sep 16, 2024 at 5:56 PM Dongjoon Hyun 
 wrote:
 >
 > +1
 >
 > Dongjoon
 >
 > On Mon, Sep 16, 2024 at 10:57 AM Holden Karau 
 wrote:
 >>
 >> +1
 >>
 >> Twitter: https://twitter.com/holdenkarau
 >> Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9
 >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
 >> Pronouns: she/her
 >>
 >>
 >> On Mon, Sep 16, 2024 at 10:55 AM Zhou Jiang 
 wrote:
 >>>
 >>> + 1
 >>> Sent from my iPhone
 >>>
 >>> On Sep 16, 2024, at 01:04, Dongjoon Hyun 
 wrote:
 >>>
 >>> 
 >>>
 >>> Please vote on releasing the following candidate as Apache Spark
 version 4.0.0-preview2.
 >>>
 >>> The vote is open until September 20th 1AM (PDT) and passes if a
 majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
 >>>
 >>> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
 >>> [ ] -1 Do not release this package because ...
 >>>
 >>> To learn more about Apache Spark, please see
 https://spark.apache.org/
 >>>
 >>> The tag to be voted on is v4.0.0-preview2-rc1 (commit
 f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
 >>> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
 >>>
 >>> The release files, including signatures, digests, etc. can be found
 at:
 >>>
 https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
 >>>
 >>> Signatures used for Spark RCs can be found in this file:
 >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
 >>>
 >>> The staging repository for this release can be found at:
 >>>
 https://repository.apache.org/content/repositories/orgapachespark-1468/
 >>>
 >>> The documentation corresponding to this release can be found at:
 >>>
 https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
 >>>
 >>> The list of bug fixes going into 4.0.0-preview2 can be found at the
 following URL:
 >>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
 >>>
 >>> This release is using the release script of the tag
 v4.0.0-preview2-rc1.
 >>>
 >>> FAQ
 >>>
 >>> =
 >>> How can I help test this release?
 >>> =
 >>>
 >>> If you are a Spark user, you can help us test this release by taking
 >>> an existing Spark workload and running on this release candidate,
 then
 >>> reporting any regressions.
 >>>
 >>> If you're working in PySpark you can set up a virtual env and
 install
 >>> the current RC and see if anything important breaks, in the
 Java/Scala
 >>> you can add the staging repository to your projects resolvers and
 test
 >>> with the RC (make sure to clean up the artifact cache before/after
 so
 >>> you don't end up building with a out of date RC going forward).

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




Re: [VOTE] Document and Feature Preview via GitHub Pages

2024-09-17 Thread Kent Yao
+1

Thank you all for participating in the vote. I'll conclude this vote.

Bests,

Kent Yao

On 2024/09/12 04:46:26 "Liu(Laswift) Cao" wrote:
> +1 (non-binding)
> 
> Thank you Kent
> 
> On Wed, Sep 11, 2024 at 9:14 PM Holden Karau  wrote:
> 
> > +1
> >
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> > https://amzn.to/2MaRAG9  
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> > Pronouns: she/her
> >
> >
> > On Wed, Sep 11, 2024 at 6:45 PM Xiao Li  wrote:
> >
> >> +1
> >>
> >> Hyukjin Kwon  于2024年9月11日周三 15:49写道:
> >>
> >>> +1
> >>>
> >>> On Thu, Sep 12, 2024 at 5:00 AM Gengliang Wang  wrote:
> >>>
>  +1
> 
>  On Wed, Sep 11, 2024 at 6:30 AM Wenchen Fan 
>  wrote:
> 
> > +1
> >
> > On Wed, Sep 11, 2024 at 5:15 PM Martin Grund
> >  wrote:
> >
> >> +1
> >>
> >> On Wed, Sep 11, 2024 at 9:39 AM Kent Yao  wrote:
> >>
> >>> Hi all,
> >>>
> >>> Following the discussion[1], I'd like to start the vote for
> >>> 'Document and
> >>> Feature Preview via GitHub Pages'
> >>>
> >>>
> >>> Please vote for the next 72 hours:(excluding next weekend)
> >>>
> >>>  [ ] +1: Accept the proposal
> >>>  [ ] +0
> >>>  [ ]- 1: I don’t think this is a good idea because …
> >>>
> >>>
> >>>
> >>> Bests,
> >>> Kent Yao
> >>>
> >>> [1] https://lists.apache.org/thread/xojcdlw77pht9bs4mt4087ynq6k9sbqq
> >>>
> >>> -
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>
> >>>
> 
> -- 
> 
> Liu Cao
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-17 Thread John Zhuge
+1 non-binding

John Zhuge


On Mon, Sep 16, 2024 at 11:07 PM Xinrong Meng  wrote:

> +1
>
> Thank you @Dongjoon Hyun  !
>
> On Tue, Sep 17, 2024 at 11:31 AM huaxin gao 
> wrote:
>
>> +1
>>
>> On Mon, Sep 16, 2024 at 6:20 PM L. C. Hsieh  wrote:
>>
>>> +1
>>>
>>> On Mon, Sep 16, 2024 at 5:56 PM Dongjoon Hyun 
>>> wrote:
>>> >
>>> > +1
>>> >
>>> > Dongjoon
>>> >
>>> > On Mon, Sep 16, 2024 at 10:57 AM Holden Karau 
>>> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> Twitter: https://twitter.com/holdenkarau
>>> >> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> >> Pronouns: she/her
>>> >>
>>> >>
>>> >> On Mon, Sep 16, 2024 at 10:55 AM Zhou Jiang 
>>> wrote:
>>> >>>
>>> >>> + 1
>>> >>> Sent from my iPhone
>>> >>>
>>> >>> On Sep 16, 2024, at 01:04, Dongjoon Hyun 
>>> wrote:
>>> >>>
>>> >>> 
>>> >>>
>>> >>> Please vote on releasing the following candidate as Apache Spark
>>> version 4.0.0-preview2.
>>> >>>
>>> >>> The vote is open until September 20th 1AM (PDT) and passes if a
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >>>
>>> >>> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
>>> >>> [ ] -1 Do not release this package because ...
>>> >>>
>>> >>> To learn more about Apache Spark, please see
>>> https://spark.apache.org/
>>> >>>
>>> >>> The tag to be voted on is v4.0.0-preview2-rc1 (commit
>>> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
>>> >>> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
>>> >>>
>>> >>> The release files, including signatures, digests, etc. can be found
>>> at:
>>> >>>
>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
>>> >>>
>>> >>> Signatures used for Spark RCs can be found in this file:
>>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >>>
>>> >>> The staging repository for this release can be found at:
>>> >>>
>>> https://repository.apache.org/content/repositories/orgapachespark-1468/
>>> >>>
>>> >>> The documentation corresponding to this release can be found at:
>>> >>>
>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
>>> >>>
>>> >>> The list of bug fixes going into 4.0.0-preview2 can be found at the
>>> following URL:
>>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>> >>>
>>> >>> This release is using the release script of the tag
>>> v4.0.0-preview2-rc1.
>>> >>>
>>> >>> FAQ
>>> >>>
>>> >>> =
>>> >>> How can I help test this release?
>>> >>> =
>>> >>>
>>> >>> If you are a Spark user, you can help us test this release by taking
>>> >>> an existing Spark workload and running on this release candidate,
>>> then
>>> >>> reporting any regressions.
>>> >>>
>>> >>> If you're working in PySpark you can set up a virtual env and install
>>> >>> the current RC and see if anything important breaks, in the
>>> Java/Scala
>>> >>> you can add the staging repository to your projects resolvers and
>>> test
>>> >>> with the RC (make sure to clean up the artifact cache before/after so
>>> >>> you don't end up building with a out of date RC going forward).
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-16 Thread Xinrong Meng
+1

Thank you @Dongjoon Hyun  !

On Tue, Sep 17, 2024 at 11:31 AM huaxin gao  wrote:

> +1
>
> On Mon, Sep 16, 2024 at 6:20 PM L. C. Hsieh  wrote:
>
>> +1
>>
>> On Mon, Sep 16, 2024 at 5:56 PM Dongjoon Hyun 
>> wrote:
>> >
>> > +1
>> >
>> > Dongjoon
>> >
>> > On Mon, Sep 16, 2024 at 10:57 AM Holden Karau 
>> wrote:
>> >>
>> >> +1
>> >>
>> >> Twitter: https://twitter.com/holdenkarau
>> >> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> >> Pronouns: she/her
>> >>
>> >>
>> >> On Mon, Sep 16, 2024 at 10:55 AM Zhou Jiang 
>> wrote:
>> >>>
>> >>> + 1
>> >>> Sent from my iPhone
>> >>>
>> >>> On Sep 16, 2024, at 01:04, Dongjoon Hyun 
>> wrote:
>> >>>
>> >>> 
>> >>>
>> >>> Please vote on releasing the following candidate as Apache Spark
>> version 4.0.0-preview2.
>> >>>
>> >>> The vote is open until September 20th 1AM (PDT) and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>> >>>
>> >>> The tag to be voted on is v4.0.0-preview2-rc1 (commit
>> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
>> >>> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found
>> at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
>> >>>
>> >>> Signatures used for Spark RCs can be found in this file:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>>
>> https://repository.apache.org/content/repositories/orgapachespark-1468/
>> >>>
>> >>> The documentation corresponding to this release can be found at:
>> >>>
>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
>> >>>
>> >>> The list of bug fixes going into 4.0.0-preview2 can be found at the
>> following URL:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>> >>>
>> >>> This release is using the release script of the tag
>> v4.0.0-preview2-rc1.
>> >>>
>> >>> FAQ
>> >>>
>> >>> =
>> >>> How can I help test this release?
>> >>> =
>> >>>
>> >>> If you are a Spark user, you can help us test this release by taking
>> >>> an existing Spark workload and running on this release candidate, then
>> >>> reporting any regressions.
>> >>>
>> >>> If you're working in PySpark you can set up a virtual env and install
>> >>> the current RC and see if anything important breaks, in the Java/Scala
>> >>> you can add the staging repository to your projects resolvers and test
>> >>> with the RC (make sure to clean up the artifact cache before/after so
>> >>> you don't end up building with a out of date RC going forward).
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-16 Thread huaxin gao
+1

On Mon, Sep 16, 2024 at 6:20 PM L. C. Hsieh  wrote:

> +1
>
> On Mon, Sep 16, 2024 at 5:56 PM Dongjoon Hyun 
> wrote:
> >
> > +1
> >
> > Dongjoon
> >
> > On Mon, Sep 16, 2024 at 10:57 AM Holden Karau 
> wrote:
> >>
> >> +1
> >>
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >> Pronouns: she/her
> >>
> >>
> >> On Mon, Sep 16, 2024 at 10:55 AM Zhou Jiang 
> wrote:
> >>>
> >>> + 1
> >>> Sent from my iPhone
> >>>
> >>> On Sep 16, 2024, at 01:04, Dongjoon Hyun 
> wrote:
> >>>
> >>> 
> >>>
> >>> Please vote on releasing the following candidate as Apache Spark
> version 4.0.0-preview2.
> >>>
> >>> The vote is open until September 20th 1AM (PDT) and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 4.0.0-preview2
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see https://spark.apache.org/
> >>>
> >>> The tag to be voted on is v4.0.0-preview2-rc1 (commit
> f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a)
> >>> https://github.com/apache/spark/tree/v4.0.0-preview2-rc1
> >>>
> >>> The release files, including signatures, digests, etc. can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> https://repository.apache.org/content/repositories/orgapachespark-1468/
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview2-rc1-docs/
> >>>
> >>> The list of bug fixes going into 4.0.0-preview2 can be found at the
> following URL:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
> >>>
> >>> This release is using the release script of the tag
> v4.0.0-preview2-rc1.
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by taking
> >>> an existing Spark workload and running on this release candidate, then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and install
> >>> the current RC and see if anything important breaks, in the Java/Scala
> >>> you can add the staging repository to your projects resolvers and test
> >>> with the RC (make sure to clean up the artifact cache before/after so
> >>> you don't end up building with a out of date RC going forward).
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

