felipepessoto opened a new issue, #12377:
URL: https://github.com/apache/gluten/issues/12377
## Backend
VL (Velox)
## Bug description
**Expected:** A Delta `MERGE INTO` that writes deletion vectors (DVs)
completes successfully, exactly as it does on vanilla Spark + Delta.
**Actual:** Under the Gluten Velox bundle the MERGE intermittently aborts
with a native `VeloxRuntimeError` (`INVALID_STATE`) raised by Gluten's Delta DV
bitmap aggregator:
    Delta RoaringBitmapArray row index 9223372036854775807 exceeds max representable value 9223372030412324864
`9223372036854775807` is exactly `Long.MAX_VALUE` (`2^63 - 1`). The target
table in the failing test is tiny (a handful of rows), so this is **not** a
real row index -- it is a sentinel / placeholder value that is leaking into the
DV-write aggregation.
The aggregation that builds the per-file DV (`PartialAggregation`, function
`addSafe`) packs each matched target row's index into a `RoaringBitmapArray`.
`RoaringBitmapArray::addSafe` enforces `value <= kMaxRepresentableValue` (=
`0x7ffffffe80000000` = `9223372030412324864`, which the code comments say
mirrors Delta JVM's `RoaringBitmapArray.MAX_REPRESENTABLE_VALUE`).
`Long.MAX_VALUE` (`0x7fffffffffffffff`) has high 32 bits `0x7fffffff`, one
above `kMaxHighKey`, and exceeds the ceiling by 6442450943, so the check fails
and the whole stage aborts.
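The arithmetic behind that comparison can be checked at compile time. The sketch below only restates the constants quoted in this issue; the unsigned 64-bit types and the `highKey` / `lowKey` helpers are assumptions made for illustration and are not taken from the Gluten sources.

```cpp
#include <cstdint>

// Constants as quoted in this issue from RoaringBitmapArray.h.
// ASSUMPTION: unsigned 64-bit arithmetic; the real declarations may differ.
constexpr uint64_t kMaxHighKey = 0x7ffffffeULL;
constexpr uint64_t kMaxLowKeyForMaxHighKey = 0x80000000ULL;
constexpr uint64_t kMaxRepresentableValue =
    (kMaxHighKey << 32) | kMaxLowKeyForMaxHighKey;

// The value from the error message: Long.MAX_VALUE (2^63 - 1).
constexpr uint64_t kLongMax = 0x7fffffffffffffffULL;

// Hypothetical helpers (not Gluten APIs): split a row index into its
// high and low 32-bit halves.
constexpr uint32_t highKey(uint64_t v) {
  return static_cast<uint32_t>(v >> 32);
}
constexpr uint32_t lowKey(uint64_t v) {
  return static_cast<uint32_t>(v & 0xffffffffULL);
}

// Each assertion restates a number that appears in the issue text or log.
static_assert(kMaxRepresentableValue == 9223372030412324864ULL,
              "ceiling printed in the error message");
static_assert(kLongMax == 9223372036854775807ULL,
              "row index printed in the error message");
static_assert(highKey(kLongMax) == kMaxHighKey + 1,
              "Long.MAX_VALUE sits one high key above the ceiling");
static_assert(kLongMax - kMaxRepresentableValue == 6442450943ULL,
              "distance between the sentinel and the ceiling");
```

If the file compiles, all four relations hold, which is consistent with the failing value being a placeholder rather than a plausible row index for a table with a handful of rows.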
**This is flaky / non-deterministic.** The exact same, byte-for-byte
identical bundle passed this test in one CI run and failed it in the next (see
Logs). So whether the sentinel reaches the aggregator depends on runtime plan /
scan / scheduling (split boundaries, batch composition, task distribution), not
on a source change. It reproduces in the suite:
    org.apache.spark.sql.delta.generatedsuites.MergeIntoExtendedSyntaxSQLPathBasedDVsPredPushOnSuite
    test: extended syntax - update + conditional insert - isPartitioned: true
(`...DVsPredPushOn...` = deletion vectors on, predicate pushdown on.)
### Root cause analysis
- The aggregator only skips SQL NULLs; it does not special-case the sentinel:
  - `cpp/velox/operators/functions/delta/DeltaBitmapAggregator.cc:63-69`
    (`addInput` returns early only when `!value.has_value()`),
  - `cpp/velox/operators/functions/delta/DeltaBitmapAggregator.cc:43-46`
    (`addRowIndex` checks only `value >= 0`, then calls `bitmap.addSafe`).
- The ceiling and check:
  - `cpp/velox/compute/delta/RoaringBitmapArray.cpp:91-98` (`addSafe`,
    `VELOX_CHECK_LE(value, kMaxRepresentableValue, ...)`),
  - `cpp/velox/compute/delta/RoaringBitmapArray.h:51-56`
    (`kMaxHighKey = 0x7ffffffe`, `kMaxLowKeyForMaxHighKey = 0x80000000`,
    `kMaxRepresentableValue = (kMaxHighKey << 32) | kMaxLowKeyForMaxHighKey`;
    comment: "Matches Delta JVM RoaringBitmapArray.MAX_REPRESENTABLE_VALUE").
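The input path described in the bullets above can be modelled in a few lines. This is a simplified stand-in, not the Gluten code: `ModelBitmap` wraps a `std::set` instead of a real roaring bitmap, plain exceptions replace `VELOX_CHECK_LE`, and throwing on a negative index is an assumption, since the issue only says that `value >= 0` is checked. It requires C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <stdexcept>

// Ceiling quoted in this issue (0x7ffffffe80000000).
constexpr int64_t kMaxRepresentableValue = 0x7ffffffe80000000LL;

// Stand-in for the bitmap: a std::set, NOT a real RoaringBitmapArray.
struct ModelBitmap {
  std::set<int64_t> values;

  // Models the `value <= kMaxRepresentableValue` check in addSafe,
  // using a plain exception instead of VELOX_CHECK_LE.
  void addSafe(int64_t value) {
    if (value > kMaxRepresentableValue) {
      throw std::out_of_range("row index exceeds max representable value");
    }
    values.insert(value);
  }
};

// Models addInput + addRowIndex as described above: NULL is skipped,
// the value must be non-negative, and everything else goes to addSafe.
// ASSUMPTION: a negative index raises an error; the issue does not say.
void addInput(ModelBitmap& bitmap, std::optional<int64_t> value) {
  if (!value.has_value()) {
    return;  // SQL NULL: the only case that is silently skipped
  }
  if (*value < 0) {
    throw std::invalid_argument("negative row index");
  }
  bitmap.addSafe(*value);  // the sentinel is not special-cased here
}
```

In this model, feeding `INT64_MAX` to `addInput` passes the NULL and sign checks and then throws from `addSafe`, which mirrors the `Function: addSafe` context in the failing log.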
Open question for a maintainer with Velox + Delta DV-write context: Delta's
own JVM `RoaringBitmapArray` uses the **same** `MAX_REPRESENTABLE_VALUE`, so
vanilla Delta would reject `Long.MAX_VALUE` too. Since vanilla Delta passes
this MERGE, it must either never produce the sentinel on the DV-write branch or
filter it out before the bitmap is built. That suggests the real defect is
**upstream of the aggregator** -- Gluten's native row-index materialization /
DV-write plan is emitting (and not filtering) a `Long.MAX_VALUE` placeholder
that vanilla Delta would have excluded. The `addSafe` check is just where it
surfaces. Two possible fix directions:
1. Stop the sentinel at the source (mirror Delta's filter so placeholder
rows never reach the DV aggregation), or
2. Make the aggregator skip the sentinel the same way it skips NULLs -- but
only if that matches Delta's documented semantics (silently dropping a
genuinely out-of-range index would corrupt the DV, so option 1 is preferred
unless the sentinel is a contract).
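To make the difference between the two directions concrete, here is a hedged sketch. Both functions are hypothetical and do not exist in Gluten or Delta; the sketch also assumes the placeholder is exactly `Long.MAX_VALUE`, which is what the failing log shows but is not confirmed as a contract. It requires C++17.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Ceiling quoted in this issue (0x7ffffffe80000000).
constexpr int64_t kMaxRepresentableValue = 0x7ffffffe80000000LL;
// ASSUMPTION: the placeholder is exactly Long.MAX_VALUE, as in the log.
constexpr int64_t kRowIndexSentinel = INT64_MAX;

// Direction 1 (hypothetical): drop placeholder rows upstream, so the DV
// aggregation never sees them. NULLs pass through untouched.
std::vector<std::optional<int64_t>> dropSentinelRows(
    const std::vector<std::optional<int64_t>>& rowIndexes) {
  std::vector<std::optional<int64_t>> kept;
  for (const auto& index : rowIndexes) {
    if (index.has_value() && *index == kRowIndexSentinel) {
      continue;  // placeholder row, not a real match
    }
    kept.push_back(index);
  }
  return kept;
}

// Direction 2 (hypothetical): the aggregator skips exactly the sentinel,
// like a NULL, but still fails loudly on any other out-of-range index so
// that a genuinely bad value cannot silently corrupt the DV.
// Returns true when the value should be added to the bitmap.
bool shouldAddToBitmap(std::optional<int64_t> value) {
  if (!value.has_value() || *value == kRowIndexSentinel) {
    return false;
  }
  if (*value < 0 || *value > kMaxRepresentableValue) {
    throw std::out_of_range("row index outside representable range");
  }
  return true;
}
```

Direction 2 is deliberately narrow in this sketch: only the single sentinel value is skipped, and `INT64_MAX - 1` would still raise an error.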
This was written with the assistance of AI tooling.
## Gluten version
main branch
## Spark version
spark-4.0.x (actually Spark 4.1.0 -- Delta 4.2.0's default; the form has no
4.1 option)
## Spark configurations
From the Delta-on-Gluten test harness (patched `DeltaSQLCommandTest`):
    spark.plugins = org.apache.gluten.GlutenPlugin
    spark.shuffle.manager = org.apache.spark.shuffle.sort.ColumnarShuffleManager
    spark.memory.offHeap.enabled = true
    spark.memory.offHeap.size = 2g
Delta 4.2.0, Scala 2.13, JDK 17
(Delta defaults: deletion vectors enabled; predicate pushdown enabled.)
## System information
CI runner: ubuntu-22.04 host, ~16 GB RAM, container
apache/gluten:centos-9-jdk17. Not run via dev/info.sh (observed in CI).
## Relevant logs
Delta Spark UT (Gluten) pipeline, apache/gluten run 28198677737, shard 1
(job 83536282846). The prior, byte-for-byte identical run 28148323203 passed
the same test (shard 1: 230 expected failures, 0 regressions) -- demonstrating
the intermittency.
    extended syntax - update + conditional insert - isPartitioned: true *** FAILED ***
      org.apache.spark.SparkException: Job aborted due to stage failure:
      Task 0 in stage 1028.0 failed 1 times, most recent failure:
      Lost task 0.0 in stage 1028.0 (TID 843):
      org.apache.gluten.exception.GlutenException: ... Exception: VeloxRuntimeError
      Error Source: RUNTIME
      Error Code: INVALID_STATE
      Reason: (9223372036854775807 vs. 9223372030412324864) Delta RoaringBitmapArray row index 9223372036854775807 exceeds max representable value 9223372030412324864
      Retriable: False
      Expression: value <= kMaxRepresentableValue
      Context: Operator: PartialAggregation[9] 9
      Function: addSafe
      File: /work/cpp/velox/compute/delta/RoaringBitmapArray.cpp
      Line: 92
      ...
        at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
        at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:135)
        at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:316)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:111)
## Reproduction
1. Build the Gluten Velox bundle (Spark 4.1 + Scala 2.13 + JDK 17, Delta
   profile).
2. Run delta-io/delta v4.2.0 with the Gluten plugin enabled
   (`spark.plugins=org.apache.gluten.GlutenPlugin`), suite
   `MergeIntoExtendedSyntaxSQLPathBasedDVsPredPushOnSuite`, test
   "extended syntax - update + conditional insert - isPartitioned: true".
   - Because it is intermittent, it may take several runs (or concurrent
     test forks / CPU contention) to surface. Equivalent minimal repro: a
     `MERGE INTO` with an UPDATE action plus a conditional INSERT into a
     partitioned Delta table that has deletion vectors enabled, with
     predicate pushdown on.
CI Link:
https://github.com/apache/gluten/actions/runs/28198677737/job/83536282846?pr=12371#step:9:2828
## Impact / workaround
- Intermittently fails any MERGE-with-DV workload, and makes the
Delta-on-Gluten CI gate flaky (apache/gluten PR #12278): the test is not in the
known-failures baseline (it usually passes), so a run that hits the sentinel is
reported as a regression and turns the gate red.
- No good baseline workaround: because the failure is flaky, adding it to
`known-failures.txt` would instead make the gate red on every run where it
passes (the pipeline runs with `DELTA_FAIL_ON_FIXED=true`). A proper fix (or a
dedicated flaky-quarantine list in the gate) is needed.
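The gate behaviour described above can be written as a small decision function. This is a hypothetical model for illustration only: the `gate` function, the `Verdict` type and the `quarantined` set are invented here and do not reflect the actual pipeline scripts. The only behaviour taken from this issue is that an unexpected failure is a regression and that, with `DELTA_FAIL_ON_FIXED=true`, a baseline test that passes also turns the gate red.

```cpp
#include <set>
#include <string>

enum class Verdict { Green, Red };

// Hypothetical model of the per-test gate decision.
// `knownFailures` stands for the contents of known-failures.txt;
// `quarantined` is the proposed flaky-quarantine list (does not exist today).
Verdict gate(const std::string& test,
             bool passed,
             const std::set<std::string>& knownFailures,
             const std::set<std::string>& quarantined,
             bool failOnFixed) {
  if (quarantined.count(test) > 0) {
    return Verdict::Green;  // flaky test: either outcome is tolerated
  }
  const bool expectedToFail = knownFailures.count(test) > 0;
  if (!passed && !expectedToFail) {
    return Verdict::Red;  // reported as a regression
  }
  if (passed && expectedToFail && failOnFixed) {
    return Verdict::Red;  // "fixed" test still listed in the baseline
  }
  return Verdict::Green;
}
```

Under this model a flaky test is red on failing runs when it is absent from the baseline and red on passing runs when it is present, so only a fix or a quarantine entry keeps the gate green in both cases.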
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.