I think Spark should start shading its problematic deps, similar to how
it's done in Flink
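For concreteness, the kind of relocation meant here, as Flink does for its flink-shaded-* artifacts, can be sketched with the maven-shade-plugin; the relocated package names below are hypothetical examples, not anything Spark actually ships:

```xml
<!-- Hypothetical sketch only: relocating a commonly problematic
     dependency (Guava, as an example) under a Spark-owned package,
     the way Flink's flink-shaded-* artifacts do. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.spark.shaded.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Shading like this keeps Spark's copy of the dependency off the application's classpath entirely, so version conflicts with frameworks like Spring-Boot never arise.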

On Mon, 4 Dec 2023 at 2:57 Sean Owen <sro...@gmail.com> wrote:

> I am not sure we can control that - the Scala _x.y suffix has particular
> meaning in the Scala ecosystem for artifacts and thus the naming of .jar
> files. And we need to work with the Scala ecosystem.
>
> What can't handle these files, Spring Boot? Does it somehow assume the
> .jar file name relates to Java modules?
>
> By the by, Spark 4 is already moving to the jakarta.* packages for similar
> reasons.
>
> I don't think Spark does or can really leverage Java modules. It started
> way before that, and I expect it has some structural issues that are
> incompatible with Java modules, like multiple places declaring code in the
> same Java package.
>
> As in all things, if there's a change that doesn't harm anything else and
> helps support for Java modules, sure, suggest it. But if it has the
> conflicts I think it will, it's probably not possible, and not really a
> goal, I think.
>
>
> On Sun, Dec 3, 2023 at 11:30 AM Marc Le Bihan <mlebiha...@gmail.com>
> wrote:
>
>> Hello,
>>
>>     Last month, I attempted to upgrade my Spring-Boot 2 Java project,
>> which relies heavily on Spark 3.4.2, to Spring-Boot 3. It hasn't
>> succeeded yet, but it was informative.
>>
>>     Spring-Boot 2 → 3 especially means javax.* becoming jakarta.*:
>> javax.activation, javax.ws.rs, javax.persistence, javax.validation,
>> javax.servlet... all of these have to change their packages and
>> dependencies.
>>     Apart from that, there was some trouble with ANTLR 4 against ANTLR 3,
>> and a few things with SLF4J and Log4j.
>>
>>     It was not easy, and I guessed that moving to modules could be a
>> key. But when I reach the Spark submodules of my project, it fails with
>> messages such as:
>>         package org.apache.spark.sql.types is declared in the unnamed
>> module, but module fr.ecoemploi.outbound.spark.core does not read it
>>
>>     But I can't handle the Spark dependencies easily, because they have
>> an "invalid name" for Java: it cannot derive a module name from jars
>> carrying the "_2.13" suffix.
>>         [WARNING] Can't extract module name from
>> breeze-macros_2.13-2.1.0.jar: breeze.macros.2.13: Invalid module name: '2'
>> is not a Java identifier
>>         [WARNING] Can't extract module name from
>> spark-tags_2.13-3.4.2.jar: spark.tags.2.13: Invalid module name: '2' is not
>> a Java identifier
>>         [WARNING] Can't extract module name from
>> spark-unsafe_2.13-3.4.2.jar: spark.unsafe.2.13: Invalid module name: '2' is
>> not a Java identifier
>>         [WARNING] Can't extract module name from
>> spark-mllib_2.13-3.4.2.jar: spark.mllib.2.13: Invalid module name: '2' is
>> not a Java identifier
>>         [... around 30 ...]
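The derivation failure behind these warnings can be reproduced directly with the JDK's own ModuleFinder. A small self-contained sketch; the jar created here is an empty stand-in named like a Spark artifact, not the real thing:

```java
import java.io.IOException;
import java.lang.module.FindException;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class ModuleNameCheck {
    public static void main(String[] args) throws IOException {
        // Build a throwaway jar whose file name mimics a Spark artifact.
        Path dir = Files.createTempDirectory("jars");
        Path jar = dir.resolve("spark-tags_2.13-3.4.2.jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new ZipEntry("placeholder.txt"));
            out.closeEntry();
        }
        try {
            // With no module-info.class and no Automatic-Module-Name manifest
            // entry, the JDK derives a module name from the file name; the
            // "_2.13" suffix yields the component "2", which is not a valid
            // Java identifier.
            ModuleFinder.of(dir).findAll();
            System.out.println("module name derived");
        } catch (FindException e) {
            System.out.println("FindException: cannot derive a module name");
        }
    }
}
```

This is the same filename-to-module-name algorithm the build tools quoted above are running into, which is why the complaint is always about '2'.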
>>
>>     I think that changing the naming pattern of the Spark jars for the
>> 4.x line could be a good idea,
>>     but beyond that, what about attempting to integrate Spark into
>> modules, with its submodules defining module-info.java?
>>
>>     Is it something that you think that [must | should | might | should
>> not | must not] be done?
>>
>> Regards,
>>
>> Marc Le Bihan
>>
>
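One alternative to renaming the jars themselves is the standard JPMS escape hatch: an Automatic-Module-Name entry in the jar manifest, which overrides filename-based derivation entirely. A sketch with the maven-jar-plugin; the module name shown is hypothetical and this is not Spark's actual build configuration:

```xml
<!-- Hypothetical sketch: a stable, declared automatic module name
     makes the "_2.13" jar suffix irrelevant to the module system. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifestEntries>
        <Automatic-Module-Name>org.apache.spark.sql</Automatic-Module-Name>
      </manifestEntries>
    </archive>
  </configuration>
</plugin>
```

A declared name would also let downstream module-info.java files write a stable `requires` clause without committing Spark itself to full modularization.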
