felipepessoto opened a new issue, #12195:
URL: https://github.com/apache/gluten/issues/12195
### Description
## Description
Gluten currently does not offload reads of Delta tables' **Change Data
Feed** (`spark.read.format("delta").option("readChangeFeed", "true")...` or the
`table_changes()` SQL function). These queries run entirely on vanilla Spark
instead of the Velox backend.
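
For reference, the two read forms mentioned above can be sketched as follows. This is a minimal sketch, not taken from the Gluten or Delta code bases. The table name `events` and the starting version `0` are assumptions for illustration, and it presumes a session with Delta configured and a table that has `delta.enableChangeDataFeed` set to `true`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CdfReadExample {
  // Hypothetical table name, used only for illustration.
  val tableName: String = "events"

  // SQL text for the table_changes() form (start version is an assumption).
  def tableChangesSql(table: String, startVersion: Long): String =
    s"SELECT * FROM table_changes('$table', $startVersion)"

  // DataFrame reader form: readChangeFeed plus a starting version.
  def readViaOption(spark: SparkSession): DataFrame =
    spark.read
      .format("delta")
      .option("readChangeFeed", "true")
      .option("startingVersion", "0")
      .table(tableName)

  // SQL function form of the same read.
  def readViaSql(spark: SparkSession): DataFrame =
    spark.sql(tableChangesSql(tableName, 0L))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cdf-read-example").getOrCreate()
    // Print the physical plans to inspect which operators were offloaded.
    readViaOption(spark).explain()
    readViaSql(spark).explain()
    spark.stop()
  }
}
```

Inspecting the `explain()` output is one way to confirm whether any part of the query was offloaded.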
## Why it falls back today
A normal Delta scan is a `FileSourceScanExec` whose `relation.fileFormat` is
a `DeltaParquetFileFormat`. Gluten's `OffloadDeltaScan` only matches that exact
case and rewrites it into a `DeltaScanTransformer`:
```scala
case scan: FileSourceScanExec
    if scan.relation.fileFormat.getClass == classOf[DeltaParquetFileFormat] =>
  DeltaScanTransformer(scan)
```
CDF reads do **not** produce that plan. Delta builds them through
`CDCReader.DeltaCDFRelation`, a generic `BaseRelation` whose `buildScan`
returns an `RDD[Row]`.
Because the resulting plan is not a `FileSourceScanExec` over
`DeltaParquetFileFormat`, `OffloadDeltaScan` never matches it, so the entire
query (scan + projections building the metadata columns) stays on vanilla Spark.
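
The mismatch can be illustrated with a toy model in plain Scala. Every type here (`FileSourceScan`, `RowRelationScan`, `NativeDeltaScan`, and the format objects) is a hypothetical stand-in, not a real Spark, Delta, or Gluten class. The sketch only mirrors the shape of the rule quoted above, and it assumes the CDF relation surfaces in the plan as some row-based scan node rather than a file scan.

```scala
object OffloadMatchSketch {
  // Stand-ins for file formats; not the real Spark/Delta classes.
  sealed trait FileFormat
  case object DeltaParquetFormat extends FileFormat
  case object PlainParquetFormat extends FileFormat

  // Stand-ins for physical plan nodes.
  sealed trait PlanNode
  final case class FileSourceScan(format: FileFormat) extends PlanNode
  final case class RowRelationScan(relationName: String) extends PlanNode
  final case class NativeDeltaScan(original: FileSourceScan) extends PlanNode

  // Same shape as the quoted rule: rewrite only file scans whose
  // format is the Delta parquet format, leave everything else alone.
  def offload(plan: PlanNode): PlanNode = plan match {
    case scan @ FileSourceScan(DeltaParquetFormat) => NativeDeltaScan(scan)
    case other                                     => other
  }

  def main(args: Array[String]): Unit = {
    val normalRead = FileSourceScan(DeltaParquetFormat)
    val cdfRead    = RowRelationScan("DeltaCDFRelation")
    println(offload(normalRead)) // rewritten into the native scan stand-in
    println(offload(cdfRead))    // returned unchanged, i.e. not offloaded
  }
}
```

Because the rule is keyed on the node type and file format, any fix likely needs either a new match for the CDF relation or a way to expose its underlying file scans.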
## Proposed work
- Recognize the CDF scan path (`DeltaCDFRelation` / the CDC file indexes)
and offload the underlying parquet reads to Velox.
- Materialize the synthesized `_change_type` / `_commit_version` /
`_commit_timestamp` columns (literals + projections) so they can be produced
natively rather than forcing a fallback.
- Add `gluten-ut` coverage for batch CDF reads (`readChangeFeed` and
`table_changes()`), including add/remove/cdc-file combinations and column
mapping.
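
As a rough illustration of the second item, the metadata columns can be thought of as per-file constants appended to the rows read from each parquet file. The sketch below is plain Scala with hypothetical types (`ChangeFile`, `CommitInfo`, `CdfConstants`) and is not the Delta or Gluten implementation. It assumes, as we understand Delta's behavior, that rows from added files are inserts, rows from removed files are deletes, and cdc files already carry their own `_change_type` column.

```scala
object CdfConstantsSketch {
  // Hypothetical model of the three kinds of files a CDF read touches.
  sealed trait ChangeFile { def path: String }
  final case class AddedFile(path: String) extends ChangeFile
  final case class RemovedFile(path: String) extends ChangeFile
  final case class CdcFile(path: String) extends ChangeFile

  // Commit-level values shared by every row of a commit.
  final case class CommitInfo(version: Long, timestampMillis: Long)

  // Literal values to project onto rows from one file.
  // changeType is None when the file stores _change_type itself.
  final case class CdfConstants(
      changeType: Option[String],
      commitVersion: Long,
      commitTimestampMillis: Long)

  def constantsFor(file: ChangeFile, commit: CommitInfo): CdfConstants = {
    val changeType = file match {
      case _: AddedFile   => Some("insert")
      case _: RemovedFile => Some("delete")
      case _: CdcFile     => None // read from the file's own column
    }
    CdfConstants(changeType, commit.version, commit.timestampMillis)
  }

  def main(args: Array[String]): Unit = {
    val commit = CommitInfo(version = 3L, timestampMillis = 1000L)
    val files  = Seq(AddedFile("a.parquet"), RemovedFile("b.parquet"), CdcFile("c.parquet"))
    files.foreach(f => println(s"${f.path} -> ${constantsFor(f, commit)}"))
  }
}
```

If the constants are known per file at planning time, a native scan could in principle attach them as literals instead of relying on a row-based projection.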
### Gluten version
main branch
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]