Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Kent Yao
+1

Thank you for driving this EOL release, Dongjoon!

Kent Yao

On 2023/12/04 19:40:10 Mridul Muralidharan wrote:
> +1
> 
> Regards,
> Mridul
> 
> On Mon, Dec 4, 2023 at 11:40 AM L. C. Hsieh  wrote:
> 
> > +1
> >
> > Thanks Dongjoon!
> >
> > On Mon, Dec 4, 2023 at 9:26 AM Yang Jie  wrote:
> > >
> > > +1 for a 3.3.4 EOL Release. Thanks Dongjoon.
> > >
> > > Jie Yang
> > >
> > > On 2023/12/04 15:08:25 Tom Graves wrote:
> > > >  +1 for a 3.3.4 EOL Release. Thanks Dongjoon.
> > > > Tom
> > > > On Friday, December 1, 2023 at 02:48:22 PM CST, Dongjoon Hyun <
> > dongjoon.h...@gmail.com> wrote:
> > > >
> > > >  Hi, All.
> > > >
> > > > Since the Apache Spark 3.3.0 RC6 vote passed on Jun 14, 2022,
> > branch-3.3 has been maintained and served well until now.
> > > >
> > > > - https://github.com/apache/spark/releases/tag/v3.3.0 (tagged on Jun
> > 9th, 2022)
> > > > - https://lists.apache.org/thread/zg6k1spw6k1c7brgo6t7qldvsqbmfytm
> > (vote result on June 14th, 2022)
> > > >
> > > > As of today, branch-3.3 has 56 additional patches after v3.3.3 (tagged
> > on Aug 3rd, about 4 months ago) and reaches end-of-life this month
> > according to the Apache Spark release cadence,
> > https://spark.apache.org/versioning-policy.html .
> > > >
> > > > $ git log --oneline v3.3.3..HEAD | wc -l
> > > > 56
> > > >
> > > > Along with the recent Apache Spark 3.4.2 release, I hope users can
> > get a chance to pick up these last bits of Apache Spark 3.3.x. I'd like to
> > propose holding the Apache Spark 3.3.4 EOL release vote on December 11th
> > and to volunteer as the release manager.
> > > >
> > > > WDYT?
> > > >
> > > > Please let us know if you need more patches on branch-3.3.
> > > >
> > > > Thanks,
> > > > Dongjoon.
> > > >
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-04 Thread Sean Owen
It already does. I think that's not the same idea?

On Mon, Dec 4, 2023, 8:12 PM Almog Tavor  wrote:

> I think Spark should start shading its problematic deps, similar to how
> it’s done in Flink
>
> On Mon, 4 Dec 2023 at 2:57 Sean Owen  wrote:
>
>> I am not sure we can control that - the Scala _x.y suffix has particular
>> meaning in the Scala ecosystem for artifacts and thus the naming of .jar
>> files. And we need to work with the Scala ecosystem.
>>
>> What can't handle these files, Spring Boot? Does it somehow assume the
>> .jar file name relates to Java modules?
>>
>> By the by, Spark 4 is already moving to the jakarta.* packages for
>> similar reasons.
>>
>> I don't think Spark does or can really leverage Java modules. It started
>> waaay before that, and I expect it has some structural issues that are
>> incompatible with Java modules, like multiple places declaring code in the
>> same Java package.
>>
>> As in all things, if there's a change that doesn't harm anything else and
>> helps support for Java modules, sure, suggest it. If it has the conflicts I
>> think it will, probably not possible and not really a goal I think.
>>
>>
>> On Sun, Dec 3, 2023 at 11:30 AM Marc Le Bihan 
>> wrote:
>>
>>> Hello,
>>>
>>> Last month, I attempted to upgrade my Spring-Boot 2 Java project,
>>> which relies heavily on Spark 3.4.2, to Spring-Boot 3. It hasn't
>>> succeeded yet, but it was informative.
>>>
>>> Spring-Boot 2 → 3 especially means javax.* becoming jakarta.* :
>>> javax.activation, javax.ws.rs, javax.persistence, javax.validation,
>>> javax.servlet... all of these have to change their packages and
>>> dependencies.
>>> Apart from that, there was some trouble with ANTLR 4 versus ANTLR 3,
>>> and a few things with SLF4J and Log4j.
>>>
>>> It was not easy, and I guessed that moving to modules could be
>>> key. But when I get near the Spark submodules of my project, it fails
>>> with messages such as:
>>> package org.apache.spark.sql.types is declared in the unnamed
>>> module, but module fr.ecoemploi.outbound.spark.core does not read it
>>>
>>> But I can't handle the Spark dependencies easily, because they have
>>> an "invalid name" for Java: the "_2.13" suffix of the jars becomes
>>> ".2.13" in the derived module name, and '2' is not a Java identifier.
>>> [WARNING] Can't extract module name from
>>> breeze-macros_2.13-2.1.0.jar: breeze.macros.2.13: Invalid module name: '2'
>>> is not a Java identifier
>>> [WARNING] Can't extract module name from
>>> spark-tags_2.13-3.4.2.jar: spark.tags.2.13: Invalid module name: '2' is not
>>> a Java identifier
>>> [WARNING] Can't extract module name from
>>> spark-unsafe_2.13-3.4.2.jar: spark.unsafe.2.13: Invalid module name: '2' is
>>> not a Java identifier
>>> [WARNING] Can't extract module name from
>>> spark-mllib_2.13-3.4.2.jar: spark.mllib.2.13: Invalid module name: '2' is
>>> not a Java identifier
>>> [... around 30 ...]
>>>
>>> I think that changing the naming pattern of the Spark jars for
>>> 4.x could be a good idea, but beyond that, what about attempting to
>>> integrate Spark into modules, with its submodules defining
>>> module-info.java?
>>>
>>> Is it something that you think that [must | should | might | should
>>> not | must not] be done?
>>>
>>> Regards,
>>>
>>> Marc Le Bihan
>>>
>>
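The "Invalid module name" warnings quoted above come from the JDK's automatic-module derivation: when a jar has no module-info.class and no Automatic-Module-Name manifest attribute, the module name is derived from the file name, so "spark-tags_2.13" becomes "spark.tags.2.13", and the segment '2' is not a Java identifier. The sketch below reproduces the failure with an empty jar and shows the usual fix; the module name org.apache.spark.tags is illustrative only — Spark does not currently ship such manifest entries.

```java
import java.io.OutputStream;
import java.lang.module.FindException;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AutoModuleNameDemo {

    // Write an empty jar whose manifest optionally carries Automatic-Module-Name.
    static Path writeJar(Path dir, String fileName, String autoModuleName) throws Exception {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (autoModuleName != null) {
            mf.getMainAttributes().putValue("Automatic-Module-Name", autoModuleName);
        }
        Path jar = dir.resolve(fileName);
        try (OutputStream os = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(os, mf)) {
            // no class files needed; only the module-name derivation matters here
        }
        return jar;
    }

    public static void main(String[] args) throws Exception {
        // Without the manifest entry, the name is derived from the file name:
        // "spark-tags_2.13" -> "spark.tags.2.13", and '2' is not an identifier.
        Path plain = Files.createTempDirectory("plain");
        writeJar(plain, "spark-tags_2.13-3.4.2.jar", null);
        try {
            ModuleFinder.of(plain).findAll();
        } catch (FindException e) {
            System.out.println(e.getCause().getMessage());
        }

        // With Automatic-Module-Name, the file name no longer matters.
        Path named = Files.createTempDirectory("named");
        writeJar(named, "spark-tags_2.13-3.4.2.jar", "org.apache.spark.tags");
        ModuleFinder.of(named).findAll()
                .forEach(ref -> System.out.println(ref.descriptor().name()));
    }
}
```

Publishing Automatic-Module-Name entries would let the jars keep the Scala _2.13 suffix in their file names while still being usable on the module path, without committing to full module-info.java descriptors.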


Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-04 Thread Almog Tavor
I think Spark should start shading its problematic deps, similar to how
it’s done in Flink

On Mon, 4 Dec 2023 at 2:57 Sean Owen  wrote:

> I am not sure we can control that - the Scala _x.y suffix has particular
> meaning in the Scala ecosystem for artifacts and thus the naming of .jar
> files. And we need to work with the Scala ecosystem.
>
> [rest of quoted thread trimmed; quoted in full in the previous message]


Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Mridul Muralidharan
+1

Regards,
Mridul

On Mon, Dec 4, 2023 at 11:40 AM L. C. Hsieh  wrote:

> +1
>
> Thanks Dongjoon!
>
> [earlier quoted thread trimmed]


Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Dongjoon Hyun
Thank you all.

Dongjoon.

On Mon, Dec 4, 2023 at 9:40 AM L. C. Hsieh  wrote:

> +1
>
> Thanks Dongjoon!
>
> [earlier quoted thread trimmed]


Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread L. C. Hsieh
+1

Thanks Dongjoon!

On Mon, Dec 4, 2023 at 9:26 AM Yang Jie  wrote:
>
> +1 for a 3.3.4 EOL Release. Thanks Dongjoon.
>
> Jie Yang
>
> [earlier quoted thread trimmed]



Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Yang Jie
+1 for a 3.3.4 EOL Release. Thanks Dongjoon.

Jie Yang

On 2023/12/04 15:08:25 Tom Graves wrote:
> +1 for a 3.3.4 EOL Release. Thanks Dongjoon.
> Tom
>
> [earlier quoted thread trimmed]



Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Tom Graves
 +1 for a 3.3.4 EOL Release. Thanks Dongjoon.
Tom
On Friday, December 1, 2023 at 02:48:22 PM CST, Dongjoon Hyun 
 wrote:  
 
 [Dongjoon's original proposal trimmed; quoted in full in the first message
of this thread]

Re: [DISCUSS] SPIP: ShuffleManager short name registration via SparkPlugin

2023-12-04 Thread Alessandro Bellina
Hello devs,

We are going to table the SPIP proposal, given that we haven't seen
responses in the discussion thread. We still believe that making custom
ShuffleManagers easier to configure is worthwhile, given interactions with
our users, but we can revisit this later. If anyone in the list has any
additional comments please feel free to share.

Thank you

Alessandro


On Sun, Nov 5, 2023 at 8:11 AM Alessandro Bellina 
wrote:

> Thanks for the comments Reynold. This is an ease of use change, and it is
> not absolutely required (as other ease of use changes are not required
> either). That said, do we not want to invest in making Spark easier to
> configure for the average user, or even the user that is trying out Spark?
>
> Here are my thoughts:
>
> - Why can we use short names for SortShuffleManager ("sort"), but that
> mechanism can't be extended? If spark.shuffle.manager is meant to be a
> pluggable API, it seems this mapping should be pluggable as well.
>
> - Plugin developers (like my project) would like to produce a simple
> plugin jar that can be used for all versions of Spark we support, but
> ShuffleManager APIs can change in non-binary compatible ways (it's a
> private API). As a result we document setting spark.shuffle.manager to a
> fully qualified class that is built for each version of Spark we bundle,
> guaranteeing a binary-compatible implementation. Having the ability to
> produce a short name for a fully qualified shuffle manager would remove
> having to look up this mapping.
>
> - ShuffleManager is very flexible (for good reasons) and it can be used to
> move shuffle in several ways, such as RDMA, caching, external stores, etc.
> With this flexibility comes working with other open source projects (such
> as UCX) that have their own configuration system. In this specific example,
> environment variables are needed to set up UCX for use from the JVM, with
> defaults that are particular to our shuffle usage. These configurations, as
> of today, need to be looked up by the user and applied to their
> application, and having a way to set up defaults would greatly improve the
> user experience.
>
> Thanks again for your feedback!
>
> Alessandro
>
> On Sat, Nov 4, 2023 at 6:04 PM Reynold Xin  wrote:
>
>> Why do we need this? The reason data source APIs need it is because it
>> will be used by very unsophisticated end users and used all the time (for
>> each connection / query). Shuffle is something you set up once, presumably
>> by fairly sophisticated admins / engineers.
>>
>>
>>
>> On Sat, Nov 04, 2023 at 2:42 PM, Alessandro Bellina 
>> wrote:
>>
>>> Hello devs,
>>>
>>> I would like to start discussion on the SPIP "ShuffleManager short name
>>> registration via SparkPlugin"
>>>
>>> The idea behind this change is to allow a driver plugin (spark.plugins)
>>> to export ShuffleManagers via short names, along with sensible default
>>> configurations. Users can then use this short name to enable this
>>> ShuffleManager + configs using spark.shuffle.manager.
>>>
>>> SPIP:
>>> https://docs.google.com/document/d/1flijDjMMAAGh2C2k-vg1u651RItaRquLGB_sVudxf6I/edit#heading=h.vqpecs4nrsto
>>> JIRA: https://issues.apache.org/jira/browse/SPARK-45792
>>>
>>> I look forward to hearing your feedback.
>>>
>>> Thanks
>>>
>>> Alessandro
>>>
>>
>>
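For context on the "sort" short name discussed above: Spark today resolves spark.shuffle.manager against a fixed two-entry map and otherwise treats the value as a fully qualified class name. The sketch below illustrates that resolution; the SortShuffleManager class name matches Spark's internals, but the resolver code and the com.example name are illustrative, not Spark's actual implementation.

```java
import java.util.Locale;
import java.util.Map;

public class ShuffleManagerShortNames {

    // The two short names Spark recognizes today for spark.shuffle.manager.
    static final Map<String, String> SHORT_NAMES = Map.of(
            "sort", "org.apache.spark.shuffle.sort.SortShuffleManager",
            "tungsten-sort", "org.apache.spark.shuffle.sort.SortShuffleManager");

    // Anything that is not a known short name is treated as a fully
    // qualified class name and passed through unchanged.
    static String resolve(String configured) {
        return SHORT_NAMES.getOrDefault(configured.toLowerCase(Locale.ROOT), configured);
    }

    public static void main(String[] args) {
        System.out.println(resolve("sort"));
        // A plugin-provided manager today must be spelled out in full;
        // the SPIP would let a SparkPlugin register a short name for it.
        System.out.println(resolve("com.example.MyRdmaShuffleManager"));
    }
}
```

The SPIP's proposal amounts to letting a driver plugin contribute additional entries to this mapping, optionally bundled with default configurations, so users could write a short name instead of a version-specific fully qualified class.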


unsubscribe

2023-12-04 Thread Duy Pham