An update on this work.

I’ve been able to update the following to JPMS modules now:

  *   arrow-memory-core
  *   arrow-memory-unsafe
  *   arrow-format
  *   arrow-vector
  *   arrow-compression
  *   arrow-tools
  *   arrow-avro
  *   arrow-jdbc
  *   arrow-algorithm

I don’t think the following should be modularized (but would appreciate 
feedback as well):

  *   arrow-performance (test-only module)
  *   flight-integration-tests (test-only module)
  *   flight-jdbc-core
  *   flight-jdbc-driver (Need to think about this. MySQL suggests they want to 
modularize their JDBC driver: https://bugs.mysql.com/bug.php?id=97683 . How 
does this interact with the ServiceLoader changes in JPMS as well).

I’m starting the flight-related modules now. The first issue I’ve noticed is 
that flight-core and flight-grpc reuse packages. I’d like to combine 
flight-core and flight-grpc because of this (currently we only have Flight 
using grpc in Java).

After Flight I’d take a look at the modules that have native code:

  *   arrow-c-data
  *   arrow-orc
  *   arrow-dataset
  *   arrow-gandiva


From: James Duong <james.du...@improving.com.INVALID>
Date: Tuesday, December 5, 2023 at 1:39 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module 
System
I expect that we’d still have to change CLI arguments in JDK17.

The need for changing the CLI arguments is due to memory-core now being a named 
module while requiring access to Unsafe and using reflection for accessing 
internals of java.nio.Buffer.

We’re using JDK8 code for doing both currently. It might be worth looking into 
if there’s a JDK9+ way of doing this without needing reflection and compiling 
memory-core as a multi-release JAR.

I don’t expect more CLI argument changes unless we use reflection on NIO 
classes in other modules.


From: Adam Lippai <a...@rigo.sk>
Date: Tuesday, December 5, 2023 at 9:28 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module 
System
I believe Spark 4.0 was mentioned before. It’ll require Java 17 and will be
released in a few months (June?).

Best regards,
Adam Lippai

On Tue, Dec 5, 2023 at 12:05 David Li <lidav...@apache.org> wrote:

> Thanks James for delving into this mess.
>
> It looks like this change is unavoidable if we want to modularize? I think
> this is OK. Will the CLI argument change as we continue modularizing, or is
> this the only change that will be needed?
>
> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
> > Hello,
> >
> > I did some work to separate the below PR into smaller PRs.
> >
> >
> >   *   Updating the versions of dependencies and maven plugins is done
> > and merged into master.
> >   *   I separated out the work modularizing arrow-vector,
> > arrow-memory-core/unsafe, and arrow-memory-netty.
> >
> > Modularizing arrow-memory-core requires a smaller change to user
> > command-line arguments. Instead of:
> > --add-opens=java.base/java.nio=ALL-UNNAMED
> >
> > The user needs to add:
> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
> >
> > I initially tried to modularize arrow-vector separately from
> > arrow-memory-core but found that any meaningful operation in
> > arrow-vector would trigger an illegal access in memory-core if it
> > wasn’t modularized.
> >
> > I was able to run the tests for arrow-compression and arrow-tools
> > successfully after modularizing memory-core, memory-unsafe-, and
> > arrow-vector. Note that I had more success by making memory-core and
> > memory-unsafe automatic modules.
> >
> > I think we should make a decision here on if we want to bite the bullet
> > and introduce a breaking user-facing change around command-line
> > options. The other option is to wait for JDK 21 to modularize. That’s
> > farther down the line and requires refactoring much of the memory
> > module code and implementing a module using the foreign memory
> > interface.
> >
> > From: James Duong <james.du...@improving.com.INVALID>
> > Date: Tuesday, November 28, 2023 at 6:48 PM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > Hi,
> >
> > I’ve made some major progress on this work in this PR:
> > https://github.com/apache/arrow/pull/38876
> >
> >
> >   *   The maven plugin for compiling module-info.java files using JDK 8
> > is working correctly.
> >   *   arrow-format, arrow-memory-core, arrow-memory-netty,
> > arrow-memory-unsafe, and arrow-vector have been modularized
> > successfully.
> >      *   Tests pass locally for all of these modules.
> >      *   They fail in CI. This is likely from me not updating a profile
> > somewhere.
> >
> > Similar to David’s PR from below, arrow-memory and modules needed to be
> > refactored fairly significantly and split into two modules: a
> > public-facing JPMS module and a separate module which adds to Netty’s
> > packages (memory-netty-buffer-patch). What’s more problematic is that
> > because we are using named modules now, users need to add more
> > arguments to their Java command line to use arrow. If one were to use
> > arrow-memory-netty they would need to add the following:
> >
> > --add-opens java.base/jdk.internal.misc=io.netty.common
> >
> --patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
>
> >
> --add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED
> >
> > Depending on where the memory-netty-buffer-patch JAR is located, and
> > what version, the command the user needs to supply changes, so this
> > seems like it’d be really inconvenient.
> >
> > Do we want to proceed with modularizing existing memory modules? Both
> > netty and unsafe? Or wait until the new memory module from Java 21 is
> > available?
> >
> > The module-info.java files are written fairly naively. I haven’t
> > inspected thoroughly to determine what packages users will need.
> >
> > We can continue modularizing more components in a separate PR. Ideally
> > all the user breakage (class movement, new command-line argument
> > requirements) happens within one major Arrow version.
> >
> > From: James Duong <james.du...@improving.com.INVALID>
> > Date: Tuesday, November 21, 2023 at 1:16 PM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
> > Module System
> > I’m following up on this topic.
> >
> > David has a PR from last year that’s done much of the heavy lifting for
> > refactoring the codebase to be package-friendly.
> > https://github.com/apache/arrow/pull/13072
> >
> > What’s changed since and what’s left:
> >
> >   *   New components have been added (Flight SQL for example) that will
> > need to be updated for modules.
> >   *   There wasn’t a clear solution on how to do this without breaking
> > JDK 8 support. Compiling module-info.java files require using JDK9, but
> > using JDK9 breaks using JDK8 methods of accessing sun.misc.Unsafe.
> >      *   There is a Gradle plugin that can compile module-info.java
> > files purely syntactically that we can adapt to maven. It has
> > limitations (the one I see is that it can’t iterate through
> > classloaders to handle annotations), but using this might be a good
> > stopgap until we JDK 8 support is deprecated.
> >   *   Some plugins need to be updated:
> >      *   maven-dependency-plugin 3.0.1 can’t parse module-info.class
> > files.
> >      *   checkstyle 3.1.0 can’t parse module-info.java files. Our
> > existing checkstyle rules file can’t be loaded with newer versions. We
> > can exclude module-info.java for now and have a separate Issue for
> > updating checkstyle itself and the rules file.
> >   *   grpc-java could not be modularized when the PR above was written.
> >      *   Grpc 1.57 now can be modularized
> > (grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)
> >
> > From: David Dali Susanibar Arce <davi.sar...@gmail.com>
> > Date: Wednesday, May 25, 2022 at 5:02 AM
> > To: dev@arrow.apache.org <dev@arrow.apache.org>
> > Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module
> System
> > Hi All,
> >
> > This email's purpose is a request for comments to migrate Arrow Java to
> JPMS
> > Java Platform Module System <
> https://openjdk.java.net/projects/jigsaw/spec/><https://openjdk.java.net/projects/jigsaw/spec/%3e><https://openjdk.java.net/projects/jigsaw/spec/%3e%3chttps:/openjdk.java.net/projects/jigsaw/spec/%3e%3e>
> > JSE 9+ (1).
> >
> > Current status:
> >
> > - Arrow Java use JSE1.8 specification
> >
> > - Arrow Java works with JSE1.8/9/11/17
> >
> > - This is possible because Java offers “legacy mode”
> >
> > Proposal:
> >
> > Migrate to JPMS Java Platform Module System. This Draft PR
> > <https://github.com/apache/arrow/pull/13072>(2<
> https://github.com/apache/arrow/pull/13072%3e(2<<https://github.com/apache/arrow/pull/13072%3e(2%3c><https://github.com/apache/arrow/pull/13072%3e(2%3c%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3c%3e>
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2
> <
> https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)><https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3e%3e%3e)%3e>
>
> > contains an initial port of
> > the modules: Format / Memory Core / Memory Netty / Memory Unsafe /
> > Vector
> > for evaluation.
> >
> > Main Reason to migrate:
> >
> > - JPMS offer Strong encapsulation, Well-defined interfaces
> > <https://github.com/nipafx/demo-jigsaw-reflection>, Explicit
> dependencies.
> > <https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)
> >
> > - JPMS offer reliable configuration and security to hide platform
> internals.
> >
> > - JPMS offers a partial solution to solve problems about read (80%)
> /write
> > (20%) code.
> >
> > - JPMS offer optimization for readability about read/write ratio (90/10)
> > thru module-info.java.
> >
> > - Consistency logs, JPMS implement consistency logs to really use that to
> > solve the current problem.
> >
> > - Be able to customize JRE needed with only modules needed (not
> > java.desktop for example and others) thru JLink.
> >
> > - Modules have also been implemented by other languages such as
> Javascript
> > (ES2015), C++(C++20), Net (Nuget/NetCore)..
> >
> > - Consider taking a look at this discussion about pros/cons
> > <
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > (5).
> >
> > - Eventual migration to JPMS is a practical necessity as more projects
> > migrate.
> >
> > Effort:
> >
> > - First of all we need to decide to move from JSE1.8 to JSE9+ or be able
> to
> > offer support for both jar components JSE1.8 and JSE9+ included.
> >
> > - Go bottom up for JPMS.
> >
> > - Packages need to be unique (i.e. org.apache.arrow.memory /
> > io.netty.buffer). Review Draft PR with initial proposal.
> >
> > - Dependencies also need to be modularized. If some of our current
> > dependencies are not able to be used as a module this will be a blocker
> for
> > our modules (we could patch that but this is an extra effort).
> >
> > Killers:
> >
> > - FIXME! I need your support to identify killer reasons to be able to
> push
> > this implementation.
> >
> > Please let us know if Arrow Java to JPMS Java Platform Module System is
> > needed and should be implemented.
> >
> > Please use this file for any comments
> >
> https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing
> >
> > Resources used:
> >
> > (1): https://openjdk.java.net/projects/jigsaw/spec/
> >
> > (2): https://github.com/apache/arrow/pull/13072
> >
> > (3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/
> >
> > (4): https://github.com/nipafx/demo-jigsaw-reflection
> >
> > (5):
> >
> https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/
> >
> > Best regards,
> >
> > --
> > David
>

Reply via email to