sezruby commented on PR #12245: URL: https://github.com/apache/gluten/pull/12245#issuecomment-4628389623
Closing this. The unbundling direction turns out to be incompatible with gluten's current Spark 3.3 / 3.4 support, and I don't think the workaround is worth the risk.

cc @zhztheplayer @FelixYBW

**What CI showed.** The Spark 3.5 / 4.0 / 4.1 lanes were on track, but `spark-test-spark33` and `spark-test-spark34` (and several `tpc-test-*` lanes built against them) failed early. I traced the root cause to the bundled Arrow being load-bearing for older Spark versions:

- Spark 3.3.1 ships Arrow 7.0.0
- Spark 3.4.4 ships Arrow 11.0.0
- Spark 3.5.5 ships Arrow 15.0.0
- Spark 4.0.x / 4.1.x ship Arrow 18.x

Gluten's parent `pom.xml` pins `<arrow.version>15.0.0</arrow.version>` and uses it at compile scope. Today that works because gluten bundles its own Arrow 15 into the velox bundle, and at runtime that copy wins classloader resolution over Spark's older Arrow. Once `arrow-memory-*` / `arrow-vector` flip to `scope=provided` (this PR), the bundle stops shipping Arrow. The compile classpath still has 15. At runtime on Spark 3.3 / 3.4, however, only the older Arrow (7 / 11) is on the classpath, so `NoSuchMethodError` / `NoClassDefFoundError` follow.

**Workarounds considered.**

1. Per-Spark-profile `<arrow.version>` overrides (3.3→7.0, 3.4→11.0, 3.5→15.0, 4.x→18.1). This compiles, but it ships gluten built against Arrow 7 on the 3.3 profile. That is exactly the "API stability across versions" concern you raised on [#12226](https://github.com/apache/gluten/pull/12226) (`> Memory and vector APIs should be stable across minor versions / This sounds a real risk`), now applied across an *eight-version* gap rather than a one-or-two-version gap. The surface area is too large to be confident without per-version testing.
2. Conditional `<scope>` (provided on 3.5+, compile on 3.3/3.4). This works mechanically, but it is ugly and leaves the bug ([#12225](https://github.com/apache/gluten/issues/12225)) latent on Spark 3.3 / 3.4.
3. Drop Spark 3.3 / 3.4 support. This is out of scope for this fix.
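For reference, here is roughly what workaround 1 would look like in the parent pom. This is a hypothetical sketch, not gluten's actual build file. The profile IDs (`spark-3.3`, `spark-3.4`, `spark-3.5`, `spark-4.0`) are assumptions, and only `arrow-vector` is shown.

```xml
<!-- Hypothetical sketch of workaround 1 (per-Spark-profile arrow.version).
     Profile IDs are assumptions; other pom elements are omitted. -->
<project>
  <properties>
    <!-- Current pin used by every profile today. -->
    <arrow.version>15.0.0</arrow.version>
  </properties>

  <profiles>
    <profile>
      <id>spark-3.3</id>
      <properties>
        <!-- Spark 3.3.1 ships Arrow 7.0.0, so gluten would compile against it. -->
        <arrow.version>7.0.0</arrow.version>
      </properties>
    </profile>
    <profile>
      <id>spark-3.4</id>
      <properties>
        <!-- Spark 3.4.4 ships Arrow 11.0.0. -->
        <arrow.version>11.0.0</arrow.version>
      </properties>
    </profile>
    <profile>
      <id>spark-3.5</id>
      <properties>
        <arrow.version>15.0.0</arrow.version>
      </properties>
    </profile>
    <profile>
      <id>spark-4.0</id>
      <properties>
        <!-- 4.x override (an analogous profile would cover 4.1). -->
        <arrow.version>18.1.0</arrow.version>
      </properties>
    </profile>
  </profiles>

  <dependencies>
    <dependency>
      <groupId>org.apache.arrow</groupId>
      <artifactId>arrow-vector</artifactId>
      <version>${arrow.version}</version>
      <!-- As in this PR: Arrow is no longer bundled; Spark supplies it at runtime. -->
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
```

The `spark-3.3` profile here compiles gluten against Arrow 7.0.0, which is the eight-version API gap described in workaround 1. Workaround 2 would keep `arrow.version` at 15.0.0 everywhere and move the scope into the profiles instead. One way to do that is a property such as `arrow.scope`, also a hypothetical name, set to `compile` on 3.3/3.4 and `provided` on 3.5+ and referenced as `<scope>${arrow.scope}</scope>`.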
None of these feels worth it as a one-shot, especially since [#12226](https://github.com/apache/gluten/pull/12226) already neutralized the immediate `NoSuchMethodError` from [#12225](https://github.com/apache/gluten/issues/12225) by un-shading the boundary types.

**What I'm keeping.** [#12244](https://github.com/apache/gluten/pull/12244) drops the `15.0.0-gluten` artifact rename and removes the dead `modify_arrow_dataset_scan_option.patch` from the Arrow JVM build. It depends on vanilla Apache Arrow from Maven Central instead. CI is green there. That gives non-ppc64le contributors a faster build-from-source path without changing the runtime/bundling story.

**For follow-up.** If gluten ever drops Spark 3.3 / 3.4, this unbundling work is small (the diff is ~3 poms). Happy to revisit then.
