`orc-format` 1.0 (ORC-1531) for Apache ORC 2.0

2023-12-03 Thread Dongjoon Hyun
Hi, All.

As one of the key parts of Apache ORC 2.0, we've been discussing a new
repository and module, `orc-format`, in the following issue:

https://github.com/apache/orc/issues/1543

Now, we are ready to create a new repo.

Please take a look at the POC repo and code and let us know your thoughts.

Bests,
Dongjoon


Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-03 Thread Sean Owen
I am not sure we can control that: the Scala _x.y suffix has a particular
meaning in the Scala ecosystem for artifacts, and thus for the naming of .jar
files. And we need to work with the Scala ecosystem.

What can't handle these files, Spring Boot? Does it somehow assume the .jar
file name relates to Java modules?

By the by, Spark 4 is already moving to the jakarta.* packages for similar
reasons.
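
Concretely, the rename looks like this in source. A minimal sketch, assuming
jakarta.servlet-api 5+ on the classpath; the servlet class itself is
hypothetical:

    // Hypothetical servlet showing the Java EE -> Jakarta EE package rename.
    // Under Spring Boot 2 / Java EE the import was javax.servlet.http.HttpServlet;
    // under Spring Boot 3 / Jakarta EE 9+ the same class lives under jakarta.*.
    import jakarta.servlet.http.HttpServlet;

    public class PingServlet extends HttpServlet {
        // No body needed: the point is that the import itself changes, so any
        // dependency still shipping javax.* types breaks at compile or run time.
    }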

I don't think Spark does, or really can, leverage Java modules. It started
way before they existed, and I expect it has some structural issues that are
incompatible with Java modules, like multiple places declaring code in the
same Java package (split packages).
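
To see why split packages matter, here is a minimal sketch; the module and
package names are illustrative, not Spark's actual layout:

    // module-info.java in jar A (illustrative names)
    module demo.core {
        exports com.example.shared;   // package com.example.shared lives here...
    }

    // module-info.java in jar B
    module demo.extra {
        exports com.example.shared;   // ...and also here: one package, two modules
    }

    // Putting both jars on the module path fails resolution with roughly:
    //   java.lang.module.ResolutionException: Modules demo.core and demo.extra
    //   export package com.example.shared to module <consumer>
    // The classpath tolerates this layout; the module system does not.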

As in all things, if there's a change that doesn't harm anything else and
helps support for Java modules, sure, suggest it. But if it runs into the
conflicts I think it will, it's probably not possible, and not really a goal
either, I think.


On Sun, Dec 3, 2023 at 11:30 AM Marc Le Bihan wrote:

> Hello,
>
> Last month, I attempted to upgrade my Spring-Boot 2 Java project, which
> relies heavily on Spark 3.4.2, to Spring-Boot 3. It hasn't succeeded yet,
> but it was informative.
>
> Spring-Boot 2 → 3 especially means javax.* becoming jakarta.*:
> javax.activation, javax.ws.rs, javax.persistence, javax.validation,
> javax.servlet... all of these have to change their packages and
> dependencies.
> Apart from that, there was some trouble with ANTLR 4 versus ANTLR 3,
> and a few things with SLF4J and Log4j.
>
> It was not easy, and I guessed that moving to modules could be key.
> But when I get to the Spark submodules of my project, it fails with
> messages such as:
> package org.apache.spark.sql.types is declared in the unnamed module, but module fr.ecoemploi.outbound.spark.core does not read it
>
> But I can't handle the Spark dependencies easily, because they have an
> "invalid name" for Java. The problem is the "_2.13" suffix of the jars: the
> "_" becomes a "." in the derived module name, leaving a segment that starts
> with a digit.
> [WARNING] Can't extract module name from breeze-macros_2.13-2.1.0.jar: breeze.macros.2.13: Invalid module name: '2' is not a Java identifier
> [WARNING] Can't extract module name from spark-tags_2.13-3.4.2.jar: spark.tags.2.13: Invalid module name: '2' is not a Java identifier
> [WARNING] Can't extract module name from spark-unsafe_2.13-3.4.2.jar: spark.unsafe.2.13: Invalid module name: '2' is not a Java identifier
> [WARNING] Can't extract module name from spark-mllib_2.13-3.4.2.jar: spark.mllib.2.13: Invalid module name: '2' is not a Java identifier
> [... around 30 ...]
>
> I think that changing the naming pattern of the Spark jars for 4.x
> could be a good idea, but beyond that, what about attempting to integrate
> Spark into modules, with its submodules defining module-info.java?
>
> Is it something that you think [must | should | might | should not |
> must not] be done?
>
> Regards,
>
> Marc Le Bihan
>
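
The warnings Marc quotes are the module system's automatic-module-name
derivation at work: Java strips the version from the jar file name and turns
the remaining non-alphanumeric characters into dots, so "spark-tags_2.13"
becomes "spark.tags.2.13", whose segment "2" is not a legal Java identifier.
A minimal sketch that reproduces the failure outside Maven; the directory
argument and class name are illustrative:

    import java.lang.module.FindException;
    import java.lang.module.ModuleFinder;
    import java.nio.file.Path;

    public class AutomaticModuleNameProbe {
        public static void main(String[] args) {
            // Scan a directory of jars (e.g. one holding the Spark _2.13 jars)
            // and print the automatic module name derived from each file name.
            try {
                ModuleFinder.of(Path.of(args[0]))
                            .findAll()
                            .forEach(ref -> System.out.println(ref.descriptor().name()));
            } catch (FindException e) {
                // For spark-tags_2.13-3.4.2.jar the derived name is
                // "spark.tags.2.13"; "2" is not a Java identifier, so the scan
                // fails just like the Maven warnings above.
                System.err.println(e.getMessage());
            }
        }
    }

The usual remedy, short of renaming the jars, is an Automatic-Module-Name
entry in each jar's MANIFEST.MF, which overrides the file-name derivation.
The earlier "does not read it" error is the other half of the problem: an
explicit module cannot declare a dependency on the unnamed module (the
classpath), so classpath-only jars stay unreachable from module-info.java
without a workaround such as
--add-reads fr.ecoemploi.outbound.spark.core=ALL-UNNAMED at compile/run time.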

