Hello,

I did some work to separate the below PR into smaller PRs.


  *   Updating the versions of dependencies and maven plugins is done and 
merged into master.
  *   I separated out the work modularizing arrow-vector, 
arrow-memory-core/unsafe, and arrow-memory-netty.

Modularizing arrow-memory-core requires a smaller change to user command-line 
arguments. Instead of:
--add-opens=java.base/java.nio=ALL-UNNAMED

The user needs to add:
--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED

I initially tried to modularize arrow-vector separately from arrow-memory-core 
but found that any meaningful operation in arrow-vector would trigger an 
illegal access in memory-core if it wasn’t modularized.

I was able to run the tests for arrow-compression and arrow-tools successfully 
after modularizing memory-core, memory-unsafe-, and arrow-vector. Note that I 
had more success by making memory-core and memory-unsafe automatic modules.

I think we should make a decision here on if we want to bite the bullet and 
introduce a breaking user-facing change around command-line options. The other 
option is to wait for JDK 21 to modularize. That’s farther down the line and 
requires refactoring much of the memory module code and implementing a module 
using the foreign memory interface.

From: James Duong <james.du...@improving.com.INVALID>
Date: Tuesday, November 28, 2023 at 6:48 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module 
System
Hi,

I’ve made some major progress on this work in this PR:
https://github.com/apache/arrow/pull/38876


  *   The maven plugin for compiling module-info.java files using JDK 8 is 
working correctly.
  *   arrow-format, arrow-memory-core, arrow-memory-netty, arrow-memory-unsafe, 
and arrow-vector have been modularized successfully.
     *   Tests pass locally for all of these modules.
     *   They fail in CI. This is likely from me not updating a profile 
somewhere.

Similar to David’s PR from below, arrow-memory and modules needed to be 
refactored fairly significantly and split into two modules: a public-facing 
JPMS module and a separate module which adds to Netty’s packages 
(memory-netty-buffer-patch). What’s more problematic is that because we are 
using named modules now, users need to add more arguments to their Java command 
line to use arrow. If one were to use arrow-memory-netty they would need to add 
the following:

--add-opens java.base/jdk.internal.misc=io.netty.common 
--patch-module=io.netty.buffer=${project.basedir}/../memory-netty-buffer-patch/target/arrow-memory-netty-buffer-patch-${project.version}.jar
 
--add-opens=java.base/java.nio=org.apache.arrow.memory.core,io.netty.common,ALL-UNNAMED

Depending on where the memory-netty-buffer-patch JAR is located, and what 
version, the command the user needs to supply changes, so this seems like it’d 
be really inconvenient.

Do we want to proceed with modularizing existing memory modules? Both netty and 
unsafe? Or wait until the new memory module from Java 21 is available?

The module-info.java files are written fairly naively. I haven’t inspected 
thoroughly to determine what packages users will need.

We can continue modularizing more components in a separate PR. Ideally all the 
user breakage (class movement, new command-line argument requirements) happens 
within one major Arrow version.

From: James Duong <james.du...@improving.com.INVALID>
Date: Tuesday, November 21, 2023 at 1:16 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module 
System
I’m following up on this topic.

David has a PR from last year that’s done much of the heavy lifting for 
refactoring the codebase to be package-friendly.
https://github.com/apache/arrow/pull/13072

What’s changed since and what’s left:

  *   New components have been added (Flight SQL for example) that will need to 
be updated for modules.
  *   There wasn’t a clear solution on how to do this without breaking JDK 8 
support. Compiling module-info.java files require using JDK9, but using JDK9 
breaks using JDK8 methods of accessing sun.misc.Unsafe.
     *   There is a Gradle plugin that can compile module-info.java files 
purely syntactically that we can adapt to maven. It has limitations (the one I 
see is that it can’t iterate through classloaders to handle annotations), but 
using this might be a good stopgap until we JDK 8 support is deprecated.
  *   Some plugins need to be updated:
     *   maven-dependency-plugin 3.0.1 can’t parse module-info.class files.
     *   checkstyle 3.1.0 can’t parse module-info.java files. Our existing 
checkstyle rules file can’t be loaded with newer versions. We can exclude 
module-info.java for now and have a separate Issue for updating checkstyle 
itself and the rules file.
  *   grpc-java could not be modularized when the PR above was written.
     *   Grpc 1.57 now can be modularized 
(grpc/grpc-java#3522<https://github.com/grpc/grpc-java/issues/3522>)

From: David Dali Susanibar Arce <davi.sar...@gmail.com>
Date: Wednesday, May 25, 2022 at 5:02 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System
Hi All,

This email's purpose is a request for comments to migrate Arrow Java to JPMS
Java Platform Module System <https://openjdk.java.net/projects/jigsaw/spec/>
JSE 9+ (1).

Current status:

- Arrow Java use JSE1.8 specification

- Arrow Java works with JSE1.8/9/11/17

- This is possible because Java offers “legacy mode”

Proposal:

Migrate to JPMS Java Platform Module System. This Draft PR
<https://github.com/apache/arrow/pull/13072>(2<https://github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2<https://github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2%3chttps:/github.com/apache/arrow/pull/13072%3e(2>>>)
 contains an initial port of
the modules: Format / Memory Core / Memory Netty / Memory Unsafe / Vector
for evaluation.

Main Reason to migrate:

- JPMS offer Strong encapsulation, Well-defined interfaces
<https://github.com/nipafx/demo-jigsaw-reflection>, Explicit dependencies.
<https://nipafx.dev/java-modules-reflection-vs-encapsulation/> (3)(4)

- JPMS offer reliable configuration and security to hide platform internals.

- JPMS offers a partial solution to solve problems about read (80%) /write
(20%) code.

- JPMS offer optimization for readability about read/write ratio (90/10)
thru module-info.java.

- Consistency logs, JPMS implement consistency logs to really use that to
solve the current problem.

- Be able to customize JRE needed with only modules needed (not
java.desktop for example and others) thru JLink.

- Modules have also been implemented by other languages such as Javascript
(ES2015), C++(C++20), Net (Nuget/NetCore)..

- Consider taking a look at this discussion about pros/cons
<https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/>
(5).

- Eventual migration to JPMS is a practical necessity as more projects
migrate.

Effort:

- First of all we need to decide to move from JSE1.8 to JSE9+ or be able to
offer support for both jar components JSE1.8 and JSE9+ included.

- Go bottom up for JPMS.

- Packages need to be unique (i.e. org.apache.arrow.memory /
io.netty.buffer). Review Draft PR with initial proposal.

- Dependencies also need to be modularized. If some of our current
dependencies are not able to be used as a module this will be a blocker for
our modules (we could patch that but this is an extra effort).

Killers:

- FIXME! I need your support to identify killer reasons to be able to push
this implementation.

Please let us know if Arrow Java to JPMS Java Platform Module System is
needed and should be implemented.

Please use this file for any comments
https://docs.google.com/document/d/1qcJ8LPm33UICuGjRnsGBcm8dLI08MyiL8BO5JVzTutA/edit?usp=sharing

Resources used:

(1): https://openjdk.java.net/projects/jigsaw/spec/

(2): https://github.com/apache/arrow/pull/13072

(3): https://nipafx.dev/java-modules-reflection-vs-encapsulation/

(4): https://github.com/nipafx/demo-jigsaw-reflection

(5):
https://www.reddit.com/r/java/comments/okt3j3/do_you_use_jigsaw_modules_in_your_java_projects/

Best regards,

--
David

Reply via email to