andygrove opened a new issue, #33: URL: https://github.com/apache/datafusion-java/issues/33
## Background

`datafusion-java` provides a JVM binding to DataFusion via JNI. To distribute it through Maven Central, we need a packaging strategy that ships the compiled Rust native library (`.so` / `.dylib` / `.dll`) alongside the Java classes. Consumers should then get a working artifact from a single dependency declaration, with no separate native install step.

## Goal

Publish a single artifact to Maven Central that works out of the box on:

- Linux x86_64
- Linux aarch64
- macOS x86_64
- macOS aarch64

Windows (x86_64) support is desirable but out of scope for the initial release. The design should leave room to add it later without restructuring.

## Proposed approach: single fat JAR

Bundle all platform-specific native libraries in one published JAR, organized by OS/arch under a known resource path:

```
org/apache/datafusion/linux/amd64/libdatafusion_jni.so
org/apache/datafusion/linux/aarch64/libdatafusion_jni.so
org/apache/datafusion/darwin/x86_64/libdatafusion_jni.dylib
org/apache/datafusion/darwin/aarch64/libdatafusion_jni.dylib
```

At runtime, a loader class first tries `System.loadLibrary()`, so users can override the bundled copy with a system-installed build. If that fails, it detects the current OS/arch, extracts the matching library from the JAR to a temp file, and calls `System.load()` on the absolute path.

This mirrors the approach used by Apache DataFusion Comet, which is referenced only as prior art for fat-JAR packaging (`datafusion-java` is not otherwise related to Comet or Spark).

The alternative is to publish one JAR per platform with Maven classifiers. This is also viable, but it pushes platform selection onto consumers and complicates their dependency declarations.

## Work items

- [ ] Add a native loader class that first attempts `System.loadLibrary()`, so a system-installed build takes precedence. Otherwise, it detects OS/arch, extracts the library from the resource path, and loads it via `System.load()`. Include temp-file locking to handle concurrent JVMs.
- [ ] Set up cross-compilation for the four target triples (Linux x86_64, Linux aarch64, macOS x86_64, macOS aarch64). Options include a CI matrix that produces per-arch artifacts, or Docker + OSXCross for cross-platform builds from a single host.
- [ ] Wire the build so compiled libraries land at the correct `target/classes/...` path before `mvn package` runs.
- [ ] Add a GitHub Actions release workflow. Matrix builds per platform produce native libs as artifacts; a final job assembles them into the resource tree and runs `mvn deploy`.
- [ ] Configure Maven Central / Sonatype publishing: staging repo, GPG signing, POM metadata.
- [ ] Document the release process.

## Future work

- [ ] Windows x86_64 support. The loader's OS enum should already account for `.dll` and the `win32` path segment, so adding Windows becomes a build-matrix change. Windows complicates temp-file cleanup because a loaded DLL cannot be deleted. The plan is to extract to a versioned path and let the OS handle cleanup.
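The OS/arch detection the loader needs could be sketched as follows. This is a minimal sketch, not the project's implementation. The names `NativePlatform`, `Os`, and `resourcePath` are hypothetical, and the base name `datafusion_jni` is taken from the proposed resource layout. The arch segment is used verbatim from the JVM's `os.arch` property. The proposed layout appears to assume this, since HotSpot reports `amd64` on Linux x86_64 but `x86_64` on macOS x86_64. Windows (`win32` segment, `.dll`, no `lib` prefix) is included in the enum, as the future-work item suggests.

```java
import java.util.Locale;

// Hypothetical helper: maps the running JVM's platform to the resource path of the
// bundled native library inside the fat JAR.
final class NativePlatform {

    // Supported operating systems. WINDOWS is listed up front so that adding it later
    // only requires a build-matrix change.
    enum Os {
        LINUX("linux", "lib", ".so"),
        DARWIN("darwin", "lib", ".dylib"),
        WINDOWS("win32", "", ".dll");

        final String segment;   // directory name under the resource root
        final String prefix;    // file-name prefix ("lib" on Unix-like systems)
        final String extension; // shared-library file extension

        Os(String segment, String prefix, String extension) {
            this.segment = segment;
            this.prefix = prefix;
            this.extension = extension;
        }

        // Maps the value of the "os.name" system property to an Os constant.
        static Os fromOsName(String osName) {
            String name = osName.toLowerCase(Locale.ROOT);
            if (name.startsWith("linux")) return LINUX;
            if (name.startsWith("mac")) return DARWIN;        // e.g. "Mac OS X"
            if (name.startsWith("windows")) return WINDOWS;   // e.g. "Windows 11"
            throw new UnsupportedOperationException("Unsupported OS: " + osName);
        }
    }

    static final String RESOURCE_ROOT = "org/apache/datafusion";
    static final String LIBRARY_NAME = "datafusion_jni";

    private NativePlatform() {}

    // Builds the classpath resource path for the given os.name / os.arch values.
    // The arch value is used as-is (e.g. "amd64", "x86_64", "aarch64").
    static String resourcePath(String osName, String osArch) {
        Os os = Os.fromOsName(osName);
        return RESOURCE_ROOT + "/" + os.segment + "/" + osArch + "/"
                + os.prefix + LIBRARY_NAME + os.extension;
    }

    // Resource path for the JVM this code is currently running on.
    static String currentResourcePath() {
        return resourcePath(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

The returned path has no leading slash, so it is meant for `ClassLoader.getResourceAsStream`. With `Class.getResourceAsStream`, prepend `/`. A missing resource for an unexpected arch then surfaces as a `null` stream rather than a wrong library.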
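The load order and temp-file locking from the first work item might look roughly like the sketch below. Every name here is a hypothetical assumption for illustration. This includes `NativeLibraryLoader`, `load`, `extractOnce`, the `datafusion-java-<version>` directory under `java.io.tmpdir`, and the `.lock` sidecar file. Cross-process exclusion uses `FileChannel.lock()`. Within one JVM, `load` is `synchronized`, because `FileChannel.lock()` throws `OverlappingFileLockException` if the same JVM already holds an overlapping lock.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical loader; names and the extraction directory layout are assumptions.
final class NativeLibraryLoader {
    private static boolean loaded = false;

    private NativeLibraryLoader() {}

    // Loads the JNI library at most once per class loader.
    //   libraryName:  base name for System.loadLibrary, e.g. "datafusion_jni"
    //   resourcePath: OS/arch-specific path inside the JAR
    //   version:      artifact version, used to build a versioned extraction directory
    static synchronized void load(String libraryName, String resourcePath, String version) {
        if (loaded) {
            return;
        }
        try {
            // 1. A system-installed build on java.library.path takes precedence.
            System.loadLibrary(libraryName);
        } catch (UnsatisfiedLinkError notInstalled) {
            // 2. Otherwise extract the bundled copy and load it by absolute path.
            Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "datafusion-java-" + version);
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            ClassLoader cl = NativeLibraryLoader.class.getClassLoader();
            try (InputStream in = cl.getResourceAsStream(resourcePath)) {
                if (in == null) {
                    throw new UnsatisfiedLinkError("No bundled native library at " + resourcePath);
                }
                System.load(extractOnce(in, dir, fileName).toAbsolutePath().toString());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        loaded = true;
    }

    // Copies `in` to dir/fileName unless that file already exists. An exclusive lock on a
    // sidecar ".lock" file serializes extraction across JVMs sharing the same directory.
    static Path extractOnce(InputStream in, Path dir, String fileName) {
        try {
            Files.createDirectories(dir);
            Path target = dir.resolve(fileName);
            Path lockFile = dir.resolve(fileName + ".lock");
            try (FileChannel channel = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = channel.lock()) { // blocks while another process holds it
                if (!Files.exists(target)) {
                    // Write to a temp file, then rename, so no JVM ever loads a partial file.
                    Path tmp = Files.createTempFile(dir, fileName, ".tmp");
                    Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                }
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would pass the OS/arch-specific resource path, for example `NativeLibraryLoader.load("datafusion_jni", "org/apache/datafusion/linux/amd64/libdatafusion_jni.so", version)`. Reusing an already-extracted file assumes the version string uniquely identifies the binary. SNAPSHOT builds would probably need a content hash in the directory name instead. The extracted file is never overwritten or deleted, so this also avoids the Windows restriction on deleting a loaded DLL.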
