sezruby opened a new pull request, #12244:
URL: https://github.com/apache/gluten/pull/12244

   ## What changes were proposed in this pull request?
   
   Drop the custom `15.0.0-gluten` artifact version that Gluten currently 
builds via `dev/build-arrow.sh`. Switch every `org.apache.arrow:*` dependency 
to the vanilla `${arrow.version}` coordinate (15.0.0 by default for Spark 3.x; 
18.1.0 is already set in the Spark-4.0/4.1 profiles).
   
   The `arrow-gluten.version` property and the `versions:set 
-DnewVersion=15.0.0-gluten` step in `build-arrow.sh` are removed. The 
`modify_arrow_dataset_scan_option.patch` is no longer applied to the Arrow Java 
build.
   
   ## Why this is safe — the patch audit
   
   `build-arrow.sh` previously applied four patches and renamed the resulting 
jars to `15.0.0-gluten`. Auditing each:
   
   | Patch | Lines | Touches | Status after this PR |
   |---|---|---|---|
   | `modify_arrow.patch` | 135 | C++ only (CMakeLists, ThirdpartyToolchain, helpers.h, Java pom S3/HDFS) | Still applied; no rename needed (CMake patch survives jar coordinate change) |
   | `modify_arrow_dataset_scan_option.patch` | 883 | Adds JVM classes `CsvFragmentScanOptions`, `CsvConvertOptions`, `ConvertUtil`, etc.; C++ `file_csv` / Substrait `expression_internal` / `serde` | **No longer applied to Arrow JVM build.** Every gluten consumer of these JVM classes was deleted by #12130 (Arrow-CSV / Arrow-Dataset JVM code path removal). The C++ portion is still applied via Velox's `CMake/resolve_dependency_modules/arrow/`; `get-velox.sh` continues to copy the patch file into the Velox EP. |
   | `cmake-compatibility.patch` | 34 | C++ only (CMake policy version) | Unchanged |
   | `support_ibm_power.patch` | 28 | Adds `ppc64le→ppcle_64` arch case to `JniLoader.java` in arrow-c and arrow-dataset | **Still applied**; ppc64le CI builds still need it. The patch only adds a switch case; it doesn't change any public Arrow API and doesn't require a custom artifact coordinate. ppc64le devs still get the patched binary by running `build-arrow.sh`, which now installs vanilla `15.0.0` (overriding the Central jars in their local m2). |
   
   A separate sweep (`grep -rn 'org.apache.arrow.dataset' --include='*.java' 
--include='*.scala'`) shows that after #12130 the only main-source consumers of 
`arrow-dataset` are `gluten-arrow/.../ArrowNativeMemoryPool` and 
`ArrowReservationListener`, which use only the upstream 
`org.apache.arrow.dataset.jni.{NativeMemoryPool, ReservationListener}` types — 
no patched classes.
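
   The sweep above can be narrowed so that it prints only the references that are *not* one of the two upstream JNI types. This is a minimal sketch: `list_non_upstream_dataset_refs` is a hypothetical helper name, not something in the Gluten repo, and the filter is line-based, so a grouped Scala import such as `jni.{NativeMemoryPool, ReservationListener}` would still be listed and need a manual look.

```shell
# Hypothetical helper (not part of Gluten): list Java/Scala references to
# org.apache.arrow.dataset under directory $1, excluding lines that name the
# upstream jni.NativeMemoryPool or jni.ReservationListener types directly.
# Exit status is 0 if anything is printed, 1 if the list is empty.
list_non_upstream_dataset_refs() {
  grep -rn 'org\.apache\.arrow\.dataset' \
      --include='*.java' --include='*.scala' "$1" \
    | grep -Ev 'org\.apache\.arrow\.dataset\.jni\.(NativeMemoryPool|ReservationListener)'
}
```

   Running `list_non_upstream_dataset_refs .` from the repository root should print nothing if the audit holds for main sources; test sources may still show up and would need to be judged separately.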
   
   ## Effect on contributors
   
   - **x86_64 / aarch64:** `dev/build-arrow.sh` is no longer required to 
bootstrap the build. All Arrow JVM dependencies resolve from Maven Central. CI 
/ local builds skip the ~hour-long Arrow C++/Java compile.
   - **ppc64le:** `dev/build-arrow.sh` is still required (for the patched 
`arrow-c-data`/`arrow-dataset` JNI binaries with ppc64le arch mapping). The 
script now installs `arrow-vector:15.0.0` etc. into local m2, overriding 
Central — same dev-loop as before, just without the rename indirection.
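
   As a rough illustration of the two bootstrap paths, the decision can be sketched as a small shell check. `needs_arrow_build` is a hypothetical helper, and treating every architecture other than ppc64le as "resolve from Central" is an assumption; the PR text only discusses x86_64, aarch64, and ppc64le.

```shell
# Hypothetical helper (not part of Gluten): returns 0 when the local Arrow
# build is still required for the given machine architecture, 1 otherwise.
needs_arrow_build() {
  case "$1" in
    ppc64le) return 0 ;;  # needs the patched JniLoader arch mapping
    *)       return 1 ;;  # x86_64 / aarch64 (others: assumption) use Central
  esac
}

arch="$(uname -m)"
if needs_arrow_build "$arch"; then
  echo "$arch: run ./dev/build-arrow.sh before building Gluten"
else
  echo "$arch: skip build-arrow.sh; Arrow jars resolve from Maven Central"
fi
```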
   
   ## Effect on shading
   
   This change is independent of bundling. The `gluten-velox-bundle` jar still 
ships unshaded `org.apache.arrow.*` per #12226. This PR doesn't change which 
Arrow artifacts end up in the bundle, only their coordinates. The follow-up to 
actually unbundle Arrow (use Spark's shipped Arrow at runtime) is tracked 
separately in the discussion under #12226.
   
   ## How was this patch tested?
   
   - `mvn dependency:tree -pl gluten-arrow` shows every `org.apache.arrow:*` 
resolved at vanilla `15.0.0` / `18.1.0` from Central.
   - Sweep for stale references: `grep -rn 
'arrow-gluten\.version\|15\.0\.0-gluten'` — no matches outside of expected pom 
diff context.
   - Manual CI run pending.
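
   The first two checks could be scripted roughly as follows. Both function names are hypothetical, the grep pattern is the one quoted above rewritten in extended-regex form, and the version filter assumes the usual `groupId:artifactId:type:version:scope` layout of `mvn dependency:tree` output.

```shell
# Hypothetical helpers (not part of Gluten) sketching the checks above.

# Prints matches and exits 0 if stale coordinates remain under directory $1;
# exits 1 when the tree is clean.
find_stale_arrow_refs() {
  grep -rnE 'arrow-gluten\.version|15\.0\.0-gluten' "$1"
}

# Reads `mvn dependency:tree` output on stdin and prints every
# org.apache.arrow coordinate whose version is neither 15.0.0 nor 18.1.0.
# Exits non-zero if any such coordinate is found.
unexpected_arrow_versions() {
  grep -o 'org\.apache\.arrow:[^ ]*' \
    | awk -F: '{ v = $(NF-1) }
               v != "15.0.0" && v != "18.1.0" { print; bad = 1 }
               END { exit bad + 0 }'
}
```

   For example, `mvn dependency:tree -pl gluten-arrow | unexpected_arrow_versions` should print nothing when every Arrow artifact resolves at a vanilla version, and `find_stale_arrow_refs .` should exit non-zero once the rename is fully removed.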
   
   ## References
   
   - #12130 — removed the Arrow-CSV / Arrow-Dataset JVM consumers that 
justified `modify_arrow_dataset_scan_option.patch`
   - #12226 — fixed Arrow C-Data shading mismatch (independent; addresses the 
bundled-jar shading bug)
   - Discussion on #12226 — `zhztheplayer` and `FelixYBW` proposed this 
unbundling direction


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

