felipepessoto opened a new issue, #12387:
URL: https://github.com/apache/gluten/issues/12387
### Backend
VL (Velox)
### Bug description
## Bug description
**Expected:** Reading from / deleting from a large Delta table that has
deletion vectors (DVs) completes within a bounded, reasonable memory footprint.
Vanilla Spark runs Delta's own "huge table" DV tests fine with a 1 GB test heap
(`-Xmx1024m`).
**Actual:** Under the Gluten Velox bundle, the same reads grow the JVM's
**native** (off-heap) memory monotonically until the kernel/cgroup OOM-kills
the process. On Delta's synthetic 2-billion-row DV table the forked test JVM
climbs to ~13 GB RSS even though its JVM heap is only `-Xmx2G`, i.e. ~11 GB is
native (Velox), not heap. The growth tracks the duration of a single DV read
over the huge table, which points at unbounded native materialization on the DV
/ metadata-row-index read path rather than normal query working set.
Concretely, two Delta tests reproduce it (suite
`org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite`):
- `huge table: read from tables of 2B rows with existing DV of many zeros`
- `huge table: delete a small number of rows from tables of 2B rows with DVs`
Both operate on the suite's 2B-row `table5`. The read test alone grew the
fork from ~5.9 GB to ~13.3 GB over ~13 minutes before the OOM-kill.
Likely area: native row-index materialization on the DV read path. Delta DV
reads use the metadata row index
(`spark.databricks.delta.deletionVectors.useMetadataRowIndex`, default true),
and Gluten offloads that path to Velox. apache/gluten #12269 makes DML DV scans
fall back only when `useMetadataRowIndex=false`, so the default read path stays
native. A maintainer with Velox memory-tracking context should confirm the
exact allocation site and whether it can be bounded or spilled.
## Gluten version
main branch
## Spark version
spark-4.0.x (actually Spark 4.1.0 -- Delta 4.2.0's default; the form has no
4.1 option)
## Spark configurations
From the Delta-on-Gluten test harness (patched `DeltaSQLCommandTest`):
spark.plugins = org.apache.gluten.GlutenPlugin
spark.shuffle.manager = org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = 2g
spark.gluten.sql.columnar.backend.velox... (default bundle config)
Delta 4.2.0, Scala 2.13, JDK 17
(The forked test JVM heap is -Xmx2G; off-heap is capped at 2g, yet native
RSS still reaches ~13 GB -- the allocation appears untracked / not honoring the
off-heap cap.)
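
For anyone reproducing outside the Scala test harness, the settings listed above could be applied to a PySpark session roughly as follows. This is a minimal sketch, not the harness itself: using PySpark at all, the `build_session` helper, the app name, and the Delta extension/catalog entries are assumptions that do not appear in this report, and the Gluten bundle and Delta jars are assumed to already be on the classpath.

```python
# Settings quoted in this report (from the patched DeltaSQLCommandTest).
GLUTEN_CONF = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "2g",
}

# Assumption: standard Delta Lake session wiring, not listed in the report.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_session(app_name="gluten-dv-repro"):
    """Hypothetical helper: build a session with the settings above."""
    # Imported lazily so the config dicts can be inspected without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in {**GLUTEN_CONF, **DELTA_CONF}.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The driver heap (`-Xmx2G` in the forked test JVM) is a JVM launch option and would have to be set when the process starts, not through this dictionary.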
## System information
CI runner: ubuntu-22.04 host, ~16 GB RAM, container
apache/gluten:centos-9-jdk17. Not run via dev/info.sh (observed in CI).
## Relevant logs
Evidence from the Delta Spark UT (Gluten) pipeline, run 28071158711, shard 2
(job 83108337324). Per-minute memory profiler during the "read from tables of
2B rows with existing DV of many zeros" test (p1289 = forked test JVM with
-Xmx2G; p382 = sbt launcher):
MEM cgroup=12.53G JVMs=[2664M(p382) 5869M(p1289)]
MEM cgroup=13.70G JVMs=[2664M(p382) 7777M(p1289)]
MEM cgroup=13.97G JVMs=[2664M(p382) 8431M(p1289)]
MEM cgroup=14.32G JVMs=[2623M(p382) 11629M(p1289)]
MEM cgroup=14.77G JVMs=[1815M(p382) 13122M(p1289)] <- fork ~13.1G RSS, heap only 2G
MEM cgroup=14.91G JVMs=[1879M(p382) 13303M(p1289)]
Warning: Unable to read from client ... <- fork OOM-killed here
MEM cgroup=1.92G JVMs=[1902M(p382)] <- fork gone; cgroup drops ~13G
After the kernel killed the fork, sbt wedged on the dead fork (no hs_err, no
heap dump -- the signature of a kernel/cgroup OOM-kill rather than a JVM OOM),
and a hang watchdog had to kill the shard after ~16 minutes of silence.
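
The growth in the profiler lines above can be quantified with a small parser. The sketch below is an illustration only: the regular expressions are inferred from the six samples quoted here, the helper names are hypothetical, and it assumes `M` means MiB and that the 2G heap and 2g off-heap caps together form the expected memory budget.

```python
import re

# Profiler lines copied from the log excerpt in this report.
LOG = """\
MEM cgroup=12.53G JVMs=[2664M(p382) 5869M(p1289)]
MEM cgroup=13.70G JVMs=[2664M(p382) 7777M(p1289)]
MEM cgroup=13.97G JVMs=[2664M(p382) 8431M(p1289)]
MEM cgroup=14.32G JVMs=[2623M(p382) 11629M(p1289)]
MEM cgroup=14.77G JVMs=[1815M(p382) 13122M(p1289)]
MEM cgroup=14.91G JVMs=[1879M(p382) 13303M(p1289)]
MEM cgroup=1.92G JVMs=[1902M(p382)]
"""

LINE_RE = re.compile(r"MEM cgroup=([\d.]+)G JVMs=\[(.*?)\]")
PROC_RE = re.compile(r"(\d+)M\((p\d+)\)")


def parse_samples(text):
    """Return a list of (cgroup_gb, {process_label: rss_mb}) tuples."""
    samples = []
    for match in LINE_RE.finditer(text):
        procs = {name: int(mb) for mb, name in PROC_RE.findall(match.group(2))}
        samples.append((float(match.group(1)), procs))
    return samples


def rss_series(samples, proc):
    """RSS values for one process, skipping samples where it is absent."""
    return [procs[proc] for _, procs in samples if proc in procs]


samples = parse_samples(LOG)
fork = rss_series(samples, "p1289")          # forked test JVM
growth_mb = fork[-1] - fork[0]               # growth across the excerpt

HEAP_CAP_MB = 2048      # -Xmx2G
OFFHEAP_CAP_MB = 2048   # spark.memory.offHeap.size = 2g
over_budget_mb = fork[-1] - (HEAP_CAP_MB + OFFHEAP_CAP_MB)

print(f"fork RSS grew {growth_mb} MB; {over_budget_mb} MB above heap + off-heap caps")
```

On these samples the fork grows by about 7.4 GB within the excerpt and ends roughly 9 GB above the combined heap and off-heap caps, which is consistent with the allocation not being tracked against the off-heap limit.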
## Reproduction
1. Build the Gluten Velox bundle (Spark 4.1 + Scala 2.13 + JDK 17, Delta
profile).
2. Run delta-io/delta v4.2.0 `DeletionVectorsSuite` with the Gluten plugin
enabled (`spark.plugins=org.apache.gluten.GlutenPlugin`), e.g. the two "huge
table ... 2B rows ... DV" tests above.
- Equivalent minimal repro: with Gluten Velox enabled, run a count/sum
scan over a Delta table of billions of rows that carries deletion vectors;
watch native RSS grow without bound.
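
To watch RSS during the minimal repro, the process's `VmRSS` can be polled from `/proc` on Linux. This is a rough sketch assuming a Linux host; the function names, the 60-second interval, and the sample count are illustrative choices, and this is not the profiler used in the CI run.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Return VmRSS in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


def watch(pid, interval_s=60, samples=15):
    """Print RSS once per interval until the process exits or samples run out."""
    for _ in range(samples):
        try:
            rss_kb = read_rss_kb(pid)
        except FileNotFoundError:
            print(f"pid {pid} is gone")
            break
        if rss_kb is not None:
            print(f"pid {pid} RSS = {rss_kb / 1024:.0f} MB")
        time.sleep(interval_s)
```

Calling `watch(<driver pid>)` in a separate process while the scan runs should show whether RSS keeps climbing past the configured heap and off-heap sizes.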
## Impact / workaround
- Makes large-table DV reads unusable under Gluten Velox (native memory
blows up and the process is OOM-killed).
- In the Delta CI pipeline (apache/gluten PR #12278) these two tests are
force-failed in `setup-delta.sh` to keep the shard from OOM-hanging. That
workaround should be removed once this is fixed.
This was written with the assistance of AI tooling.
### Relevant logs
```bash
https://github.com/apache/gluten/actions/runs/28071158711/job/83108337324
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]