felipepessoto opened a new pull request, #12371:
URL: https://github.com/apache/gluten/pull/12371

   Fix https://github.com/apache/gluten/issues/9296.
   I am opening this PR to start the discussion, so we can get an idea of how
   this would work and whether it is worthwhile.
   
   ## What changes are proposed in this pull request?
   Adds a CI pipeline that runs delta-io/delta's `spark` ScalaTest suite
   against the Gluten Velox bundle, so we can validate Gluten against a real
   Delta release and catch regressions over time.
   
   Running the Delta UTs on Gluten produces **many expected failures** (Gluten
   does not yet offload every Delta code path, and falls back or behaves
   differently in places). A plain "red on any failure" gate would be useless.
   Instead, the pipeline keeps a **committed baseline of known failures** and
   gates each run against it:
   
   - **regression** -- a test fails that is *not* in the baseline -> the shard fails.
   - **expected** -- a failing test that *is* in the baseline -> ignored.
   - **now-passing** -- a baseline test that starts passing -> fails the shard
     (keeps the baseline honest), unless `fail_on_fixed=false`.
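
The three outcomes above can be sketched as plain set operations. This is a minimal illustration, not the code in `compare-test-results.py`: the function name `gate`, the `GateResult` type, and the idea of passing in pre-computed `failed` / `passed` sets are assumptions made for the example.

```python
from typing import NamedTuple


class GateResult(NamedTuple):
    regressions: set   # failed, not in baseline
    expected: set      # failed, in baseline
    now_passing: set   # passed, but still listed in baseline
    ok: bool           # True if the shard should pass the gate


def gate(failed, passed, baseline, fail_on_fixed=True):
    """Classify '<suite>#<test>' keys against the baseline (hypothetical helper)."""
    regressions = failed - baseline
    expected = failed & baseline
    # Only tests that actually ran and passed in this shard count as fixed,
    # so baseline entries owned by other shards are not flagged here.
    now_passing = passed & baseline
    ok = not regressions and not (fail_on_fixed and now_passing)
    return GateResult(regressions, expected, now_passing, ok)
```

With `fail_on_fixed=False`, a now-passing entry is still reported in the result but no longer fails the shard.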
   
   ### How it works
   
   1. Builds the Velox/Gluten native libs and assembles the
      `gluten-velox-bundle` fat jar (Spark 4.1 + Scala 2.13 + JDK 17, Delta
      profile).
   2. Clones delta-io/delta at a release tag (currently `v4.2.0`), drops the
      bundle onto the `spark` project's test classpath, and patches
      `DeltaSQLCommandTest` to register `GlutenPlugin`.
   3. Runs `sbt spark/test` **sharded by suite** across 16 shards, with
      ScalaTest's JUnit XML reporter enabled, then gates each shard with
      `compare-test-results.py` against `known-failures.txt`. A final job
      aggregates all shards into a single ready-to-commit baseline and flags
      stale entries.
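
To make step 3 more concrete, here is a rough sketch of turning a JUnit XML report into `<suite>#<test>` keys using only the standard library. It assumes the usual JUnit layout (`<testcase>` elements with `classname` and `name` attributes and optional `<failure>`, `<error>`, or `<skipped>` children); the function name `parse_junit` and the choice of `classname` as the suite part of the key are assumptions, and the real script may differ.

```python
import xml.etree.ElementTree as ET


def parse_junit(xml_text):
    """Return (failed, passed) sets of '<suite>#<test>' keys (hypothetical helper)."""
    failed, passed = set(), set()
    root = ET.fromstring(xml_text)
    # iter() walks all descendants, so this works whether the root element
    # is <testsuite> or a wrapping <testsuites>.
    for case in root.iter("testcase"):
        key = f"{case.get('classname')}#{case.get('name')}"
        if case.find("failure") is not None or case.find("error") is not None:
            failed.add(key)
        elif case.find("skipped") is None:
            # Skipped tests count as neither failed nor passed.
            passed.add(key)
    return failed, passed
```

The two sets returned here are the inputs a baseline gate would need: failures to compare against `known-failures.txt`, and passes to detect stale baseline entries.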
   
   ### Files
   
   | File | Purpose |
   |---|---|
   | `.github/workflows/delta_spark_ut.yml` | The workflow (build bundle -> shard tests -> gate). |
   | `.github/workflows/util/delta-spark-ut/setup-delta.sh` | Clones Delta, injects the Gluten bundle, patches `DeltaSQLCommandTest`. |
   | `.github/workflows/util/delta-spark-ut/compare-test-results.py` | Parses JUnit XML and enforces / seeds / aggregates against the baseline (stdlib only). |
   | `.github/workflows/util/delta-spark-ut/known-failures.txt` | Committed baseline of currently-expected failures (`<suite>#<test>` per line). |
   | `.github/workflows/util/delta-spark-ut/README.md` | Documents the gate, bootstrapping, and baseline refresh. |
   
   ### Operational hardening
   
   - **JDK 17 + Arrow/Netty**: forked test JVMs get the `--add-opens` set plus
     `-Dio.netty.tryReflectionSetAccessible=true` (otherwise Arrow's allocator
     fails to initialize).
   - **Heap tuning**: forked-test heap and the sbt launcher's idle G1 behavior
     are tuned to keep the ~16 GB runner under the cgroup OOM threshold.
   - **Hang watchdog**: a per-shard watchdog dumps threads and kills a forked
     test JVM that has gone silent too long, so a wedged suite can't stall the
     whole job.
   - **DeletionVectorsSuite 2B-row tests**: two tests build/read/delete a
     2-billion-row table and balloon the fork to ~13 GB of native memory
     (Velox row-index materialization), OOM-killing it and hanging the shard.
     They are force-failed (with a clear message) rather than silently ignored,
     so the gap stays visible until the native memory blow-up is fixed.
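
The hang watchdog idea can be illustrated with a small sketch. This is not the watchdog in the workflow: `SilenceWatchdog`, `dump_and_kill`, and the timeout and grace values are hypothetical. The one external fact relied on is that a HotSpot JVM prints a thread dump when it receives `SIGQUIT`. The sketch is POSIX-only.

```python
import os
import signal
import time


class SilenceWatchdog:
    """Tracks the last time a process produced output (hypothetical helper)."""

    def __init__(self, timeout_s, now=None):
        self.timeout_s = timeout_s
        self.last_output = time.monotonic() if now is None else now

    def touch(self, now=None):
        # Call whenever the watched process writes a line of output.
        self.last_output = time.monotonic() if now is None else now

    def is_stalled(self, now=None):
        current = time.monotonic() if now is None else now
        return current - self.last_output > self.timeout_s


def dump_and_kill(pid, grace_s=5.0):
    """Ask the JVM for a thread dump, wait briefly, then kill it."""
    try:
        os.kill(pid, signal.SIGQUIT)   # HotSpot prints a thread dump
        time.sleep(grace_s)            # give the dump time to be written
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass                           # process already exited
```

A driver loop would call `touch()` for every line read from the forked JVM's output and call `dump_and_kill(pid)` once `is_stalled()` returns true.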
   
   ### Scope / known limitations
   
   - Velox backend, x86 only; Delta `v4.2.0` / Spark 4.1 / Scala 2.13 / JDK 17.
   - The baseline reflects the *current* set of known Delta-on-Gluten failures;
     refresh it via a `workflow_dispatch` run with `update_baseline=true`.
   - **Future work -- Delta 4.3.0**: attempted, but the bundle (compiled against
     Delta 4.1.0) hits a binary-incompatible Delta change (the first parameter of
     `IdentityColumn.logTableWrite` changed from `Snapshot` to
     `SnapshotDescriptor`), which throws `NoSuchMethodError` on every write.
     Supporting 4.3.0 requires building the bundle against 4.3.0; tracked as
     follow-up.
   
   ## How was this patch tested?
   
   This change *is* CI. The workflow runs automatically on PRs that touch its
   files and via manual dispatch. In the latest runs all 16 shards pass against
   the committed baseline (failures limited to known-failures entries; no
   regressions).
   
   ## Delta Spark UT (Gluten) — shard count vs test parallelism
   
    Sharding is by **suite** (`MurmurHash3(suiteName) % NUM_SHARDS`), so total
    test work is fixed (~1250 fork-minutes) regardless of the shard count. The
    runners are 4-core / ~16 GB.
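
A sketch of hash-based suite sharding follows. Python's standard library has no MurmurHash3, so `zlib.crc32` is used as a stand-in; the resulting shard assignments will therefore NOT match the workflow's, only the partitioning principle is the same. The names `shard_of` and `suites_for_shard` are hypothetical.

```python
import zlib


def shard_of(suite_name, num_shards):
    """Map a suite name to a shard index in [0, num_shards)."""
    # Stand-in hash: crc32, not MurmurHash3 (assumption for this example).
    return zlib.crc32(suite_name.encode("utf-8")) % num_shards


def suites_for_shard(suites, shard_index, num_shards):
    """Select the suites a given shard is responsible for."""
    return sorted(s for s in suites if shard_of(s, num_shards) == shard_index)
```

Because the assignment depends only on the suite name, every suite lands in exactly one shard and the mapping is stable across runs, which is why changing the shard count redistributes the work without changing its total.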
   
    | Config | Runner jobs | Forks/shard | Max shard | Wall-clock | Billed job-hrs* | Outcome |
    |---|---|---|---|---|---|---|
    | 16 shards × 1 fork | 16 | 1 | ~110 min | ~130 min | ~29 | ✅ green |
    | **4 shards × 4 forks** | **4** | **4** | **158 min** | **178 min** | **~10.5** | **✅ green** |
    | 4 shards × 1 fork | 4 | 1 | 360 min (hit cap) | — | — | ❌ cancelled |
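
As a rough cross-check (an inference from the table, not a formula stated in this PR), the billed figures for the two green rows are consistent with `runner jobs × max shard minutes / 60`, which treats every shard as running for as long as the slowest one. The function name below is hypothetical.

```python
def billed_job_hours_upper_bound(runner_jobs, max_shard_minutes):
    """Upper bound on billed job-hours: every job charged at the slowest shard's time."""
    return runner_jobs * max_shard_minutes / 60


# 16 shards x 1 fork, max shard ~110 min
print(round(billed_job_hours_upper_bound(16, 110), 1))  # 29.3
# 4 shards x 4 forks, max shard 158 min
print(round(billed_job_hours_upper_bound(4, 158), 1))   # 10.5
```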
   
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: GitHub Copilot CLI
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

