andygrove opened a new issue, #33:
URL: https://github.com/apache/datafusion-java/issues/33

   ## Background
   
   `datafusion-java` provides a JVM binding to DataFusion via JNI. To 
distribute it through Maven Central, we need a packaging strategy that delivers 
the compiled Rust native library (`.so` / `.dylib` / `.dll`) alongside the Java 
classes so that consumers get a working artifact with a single dependency 
declaration — no separate native install step.
   
   ## Goal
   
   Publish a single artifact to Maven Central that works out of the box on:
   - Linux x86_64
   - Linux aarch64
   - macOS x86_64
   - macOS aarch64
   
   Windows (x86_64) support is desirable but out of scope for the initial 
release. The design should leave room to add it later without restructuring.
   
   ## Proposed approach: single fat JAR
   
   Bundle all platform-specific native libraries in one published JAR, 
organized by OS/arch under a known resource path:
   
   ```
   org/apache/datafusion/linux/amd64/libdatafusion_jni.so
   org/apache/datafusion/linux/aarch64/libdatafusion_jni.so
   org/apache/datafusion/darwin/x86_64/libdatafusion_jni.dylib
   org/apache/datafusion/darwin/aarch64/libdatafusion_jni.dylib
   ```
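
   As a rough illustration of how a loader could map the running JVM onto the layout above, the sketch below derives the resource path from `os.name` and `os.arch` values. The class and method names are hypothetical and not part of `datafusion-java`; the arch mapping assumes the JVM reports `amd64` on Linux x86_64 and `x86_64` on Intel macOS, which is what the directory names above suggest.

```java
import java.util.Locale;

/** Hypothetical helper: maps JVM system property values to the bundled resource path. */
final class NativeResourcePath {
    private NativeResourcePath() {}

    /** Maps an os.name value to the OS directory segment used in the JAR layout. */
    static String osSegment(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) return "linux";
        if (os.contains("mac") || os.contains("darwin")) return "darwin";
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    /** Maps an os.arch value to the arch directory segment for the given OS segment. */
    static String archSegment(String os, String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            // The layout uses "amd64" under linux/ and "x86_64" under darwin/.
            return os.equals("linux") ? "amd64" : "x86_64";
        }
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        throw new UnsupportedOperationException("Unsupported architecture: " + osArch);
    }

    /** Builds the full classpath resource path for the native library. */
    static String resourcePath(String osName, String osArch) {
        String os = osSegment(osName);
        String ext = os.equals("linux") ? "so" : "dylib";
        return "org/apache/datafusion/" + os + "/" + archSegment(os, osArch)
                + "/libdatafusion_jni." + ext;
    }
}
```

   A caller would typically pass `System.getProperty("os.name")` and `System.getProperty("os.arch")`; taking them as parameters keeps the mapping easy to unit test.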
   
   At runtime, a loader class detects the current OS/arch, extracts the 
matching library from the JAR to a temp file, and calls `System.load()` on the 
absolute path. A `System.loadLibrary()` attempt should come first so users can 
override with a system-installed build.
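
   The load order described above could look roughly like the following. This is a minimal sketch, not the project's loader: `NativeLoaderSketch`, its method names, and the temp-file prefix are assumptions made for illustration, and it leaves out locking and cleanup on platforms where a loaded library cannot be deleted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical loader: system-installed library first, bundled copy second. */
final class NativeLoaderSketch {
    private NativeLoaderSketch() {}

    /** Returns "system" if java.library.path satisfied the load, "bundled" otherwise. */
    static String load(String libName, String resourcePath) {
        try {
            // Lets users override the bundled build with one installed on the system.
            System.loadLibrary(libName);
            return "system";
        } catch (UnsatisfiedLinkError notInstalled) {
            // No system copy found; fall back to the library packaged in the JAR.
        }
        Path extracted = extract(resourcePath);
        // System.load() requires an absolute path.
        System.load(extracted.toAbsolutePath().toString());
        return "bundled";
    }

    /** Copies the classpath resource to a fresh temp file and returns its path. */
    static Path extract(String resourcePath) {
        try (InputStream in = NativeLoaderSketch.class.getResourceAsStream("/" + resourcePath)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("No bundled native library at " + resourcePath);
            }
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            Path tmp = Files.createTempFile("datafusion_jni-", "-" + fileName);
            tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

   With this shape, a call such as `NativeLoaderSketch.load("datafusion_jni", path)` would only touch the temp directory when no system-installed build is found.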
   
   This mirrors the approach used by Apache DataFusion Comet (referenced only 
as prior art for fat-JAR packaging — `datafusion-java` is not otherwise related 
to Comet or Spark). The alternative — publishing one JAR per platform with 
Maven classifiers — is also viable but pushes platform selection onto consumers 
and complicates dependency declarations.
   
   ## Work items
   
   - [ ] Add a native loader class that tries `System.loadLibrary()` first (so 
a system-installed build can override the bundled one), then falls back to 
detecting OS/arch, extracting the matching library from the resource path, and 
loading it via `System.load()`. Include temp-file locking to handle concurrent 
JVMs.
   - [ ] Set up cross-compilation for the four target triples (Linux x86_64, 
Linux aarch64, macOS x86_64, macOS aarch64). Options: a CI matrix that produces 
per-arch artifacts, or Docker + OSXCross for cross-platform builds from a 
single host.
   - [ ] Wire the build so compiled libraries land at the correct 
`target/classes/...` path before `mvn package` runs.
   - [ ] Add a GitHub Actions release workflow: matrix builds per platform 
produce native libs as artifacts; a final job assembles them into the resource 
tree and runs `mvn deploy`.
   - [ ] Configure Maven Central / Sonatype publishing: staging repo, GPG 
signing, POM metadata.
   - [ ] Document the release process.
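
   For the temp-file locking mentioned in the first work item, one possible shape is a lock file next to the extraction target, so that only one JVM writes the library and the others reuse it. The sketch below assumes a shared, stable target path rather than a fresh temp file per JVM; `LockedExtraction` and `extractOnce` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.function.Supplier;

/** Hypothetical helper: extracts a file at most once across cooperating JVMs. */
final class LockedExtraction {
    private LockedExtraction() {}

    // synchronized guards threads in this JVM; the FileLock guards other JVMs.
    static synchronized Path extractOnce(Path target, Supplier<InputStream> source) {
        Path dir = target.toAbsolutePath().getParent();
        Path lockFile = target.resolveSibling(target.getFileName() + ".lock");
        try {
            Files.createDirectories(dir);
            try (FileChannel channel = FileChannel.open(
                         lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = channel.lock()) { // blocks until other JVMs release it
                if (Files.notExists(target)) {
                    // Write to a sibling file first so readers never see a partial library.
                    Path partial = Files.createTempFile(dir, "partial-", ".tmp");
                    try (InputStream in = source.get()) {
                        Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
                    }
                    Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
                }
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

   The write-then-rename step is a design choice rather than a requirement: it means a JVM that skips the lock and only checks for the target's existence still cannot observe a half-written file.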
   
   ## Future work
   
   - [ ] Windows x86_64 support. The loader OS enum should already account for 
`.dll` and the `win32` path segment so this becomes a build-matrix change. 
Windows complicates temp-file cleanup (can't delete a loaded DLL) — extract to 
a versioned path and let the OS handle cleanup.
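
   One way to read "a versioned path" is a deterministic location keyed by library version, OS, and arch, so a later run finds the already-extracted file instead of trying to delete and rewrite a DLL that another process still has loaded. The sketch below is only an illustration: the `datafusion-java` directory name, the use of `java.io.tmpdir` as the base, and the class name are all assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: computes a stable, versioned extraction location. */
final class VersionedExtractionPath {
    private VersionedExtractionPath() {}

    /** Base directory for extracted libraries; here simply the JVM temp dir. */
    static Path defaultBaseDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    /** Lays out baseDir/datafusion-java/version/os/arch/fileName. */
    static Path forLibrary(Path baseDir, String version, String os, String arch,
                           String fileName) {
        return baseDir.resolve("datafusion-java")
                .resolve(version)
                .resolve(os)
                .resolve(arch)
                .resolve(fileName);
    }
}
```

   Because the path changes with the version, upgrading the JAR never needs to overwrite a file that an older, still-running JVM has mapped.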

