[PR] [VL] Add lazy per-column deserialization for Columnar Table Cache [gluten]

via GitHub Sun, 31 May 2026 21:58:07 -0700


jackylee-ch opened a new pull request, #12211:
URL: https://github.com/apache/gluten/pull/12211


   ## What changes are proposed in this pull request?
   
   Add lazy per-column deserialization for Gluten's columnar table cache via a 
new V3 wire format.
   
   **Problem**: The current cache always decodes all N columns on read, even 
when a query only references M columns (M << N). For a 16-column table with a 
1-column query, roughly 15/16 of the deserialization work is wasted.
   
   **Solution**: A new V3 format serializes each column independently using 
`PrestoVectorSerde::serializeSingleColumn`. On read, only requested columns are 
deserialized via Velox `LazyVector`, which defers decoding until the column is 
actually accessed by an operator.
   
   **Key design points**:
   - V3 magic `0xFECA5303` vs V2 `0xFECA5302`; V3 code reads V2 data via 
existing path (backward compatible)
   - `CachedColumnLoader` (VectorLoader impl) holds per-column bytes and calls 
`deserializeSingleColumn` on first access, then frees raw bytes to avoid 
double-buffering
   - Read path routes on `isV3Format(bytes)` independently of the `lazyEnabled` 
config, which prevents V3 bytes from being misrouted to the V2 Presto deserializer
   - New config 
`spark.gluten.sql.columnar.tableCache.lazy.deserialization.enabled` (default: 
`false`)
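
The design points above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the code in this PR: the two magic values come from the description, but the frame layout (little-endian magic, column count, length-prefixed column blobs), the int64-only decoder, and the names `frame_v3`, `split_v3`, `LazyColumn`, and `read_batch` are hypothetical. `LazyColumn` stands in for the role the description gives `CachedColumnLoader`.

```python
import struct
from typing import Callable, Dict, List, Optional

V2_MAGIC = 0xFECA5302  # value taken from the PR description
V3_MAGIC = 0xFECA5303  # value taken from the PR description


def frame_v3(columns: List[bytes]) -> bytes:
    """Hypothetical V3 frame: magic, column count, then length-prefixed blobs."""
    parts = [struct.pack("<II", V3_MAGIC, len(columns))]
    for blob in columns:
        parts.append(struct.pack("<I", len(blob)))
        parts.append(blob)
    return b"".join(parts)


def is_v3_format(data: bytes) -> bool:
    """Route on the leading magic only (byte order is an assumption)."""
    return len(data) >= 4 and struct.unpack_from("<I", data, 0)[0] == V3_MAGIC


def split_v3(data: bytes) -> List[bytes]:
    """Slice out each column's raw bytes without decoding any of them."""
    _, count = struct.unpack_from("<II", data, 0)
    offset = 8
    blobs = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", data, offset)
        offset += 4
        blobs.append(data[offset:offset + size])
        offset += size
    return blobs


def decode_int64(raw: bytes) -> List[int]:
    """Toy per-column decoder standing in for the real deserializer."""
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


class LazyColumn:
    """Holds raw bytes, decodes on first access, then drops the raw bytes."""

    def __init__(self, raw: bytes, decode: Callable[[bytes], List[int]]):
        self._raw: Optional[bytes] = raw
        self._decode = decode
        self._value: Optional[List[int]] = None

    @property
    def loaded(self) -> bool:
        return self._raw is None

    def get(self) -> List[int]:
        if self._raw is not None:
            self._value = self._decode(self._raw)
            self._raw = None  # release the raw bytes to avoid double-buffering
        return self._value


def read_batch(data: bytes, wanted: List[int]) -> Dict[int, LazyColumn]:
    """Wrap only the requested columns; routing depends on the bytes alone."""
    if not is_v3_format(data):
        raise ValueError("not V3: a real reader would take the existing V2 path")
    blobs = split_v3(data)
    return {i: LazyColumn(blobs[i], decode_int64) for i in wanted}


if __name__ == "__main__":
    cols = [struct.pack("<3q", i, i + 1, i + 2) for i in range(16)]
    batch = read_batch(frame_v3(cols), wanted=[5])
    print(batch[5].loaded)  # nothing decoded yet
    print(batch[5].get())   # decoding happens here, for one column only
```

In this model the 15 unrequested columns are never decoded, and a requested column is decoded only when `get()` is first called.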
   
   **Performance** (100M rows, 16 columns, repartitionByRange):
   
   | Scenario | Legacy | partitionStats only | Lazy V3 |
   |----------|--------|--------------------|---------| 
   | Read 1/16 cols | ~4,800 ms | ~4,900 ms | ~750 ms (**6.4X**) |
   | Read 4/16 cols | ~5,200 ms | ~5,300 ms | ~1,800 ms (**2.9X**) |
   | Read all 16 cols | ~5,400 ms | ~5,500 ms | ~5,650 ms (4% overhead) |
   | Filter + 2/16 cols | ~4,400 ms | ~1,700 ms | ~720 ms (**6.1X**) |
   | Cache build | ~126 s | ~131 s | ~195 s (~55% slower than legacy) |
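
The ratios in the table can be rechecked with a few lines of Python, which also gives a rough break-even estimate for the slower cache build. This is a back-of-the-envelope sketch: it treats each figure as the cost of a single read of the cached table, which the description does not state explicitly, and the helper names are hypothetical.

```python
import math
from typing import Optional

# (legacy_ms, lazy_v3_ms) pairs copied from the table above
READS = {
    "Read 1/16 cols": (4800, 750),
    "Read 4/16 cols": (5200, 1800),
    "Read all 16 cols": (5400, 5650),
    "Filter + 2/16 cols": (4400, 720),
}

# Extra one-time build cost of lazy V3 compared with legacy, in milliseconds
EXTRA_BUILD_MS = (195 - 126) * 1000


def speedup(legacy_ms: float, lazy_ms: float) -> float:
    """Legacy time divided by lazy V3 time; values below 1 mean a slowdown."""
    return legacy_ms / lazy_ms


def break_even_reads(extra_build_ms: float, legacy_ms: float,
                     lazy_ms: float) -> Optional[int]:
    """Reads needed before per-read savings repay the extra build time."""
    saved = legacy_ms - lazy_ms
    if saved <= 0:
        return None  # the lazy path never pays back for this scenario
    return math.ceil(extra_build_ms / saved)


if __name__ == "__main__":
    for name, (legacy, lazy) in READS.items():
        reads = break_even_reads(EXTRA_BUILD_MS, legacy, lazy)
        print(f"{name}: {speedup(legacy, lazy):.2f}X, break-even reads: {reads}")
```

Under these assumptions, the more columns a query skips, the fewer reads it takes to recover the extra build time, and the all-columns scenario never recovers it.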
   
   Pairing this with `partitionStats.enabled=true` is recommended, so that 
batch skipping and column skipping combine.
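
As a usage sketch, the new flag could be passed to `spark-submit` through its standard `--conf key=value` option. The helper `conf_args` is hypothetical. Only the lazy-deserialization key is spelled out in full in this description, so the `partitionStats` key is deliberately left out here and would need to be added under its full name.

```python
from typing import Dict, List

# Config key introduced by this PR (default: false)
LAZY_DESER_KEY = (
    "spark.gluten.sql.columnar.tableCache.lazy.deserialization.enabled"
)


def conf_args(settings: Dict[str, str]) -> List[str]:
    """Turn a settings dict into repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints a spark-submit prefix with the lazy V3 path switched on
    print(" ".join(["spark-submit"] + conf_args({LAZY_DESER_KEY: "true"})))
```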
   
   ## How was this patch tested?
   
   - Unit tests (pure JVM, no native library): 
`ColumnarCachedBatchFramedBytesSuite` (+5 V3 framing tests), 
`ColumnarCachedBatchKryoSuite`, `ColumnarCachedBatchStatsBlobSuite` — all pass
   - Integration tests: `ColumnarCachedBatchLazySerdeTest` (7 new E2E tests 
covering V3 write/read, column projection, count(*), all-types roundtrip, 
config toggle, cross-config read)
   - Smoke tests: `ColumnarCachedBatchE2ESuite` (+2 V3 smoke tests)
   - Benchmark: `ColumnarTableCacheLazyDeserBenchmark` (5 scenarios comparing 
legacy / partitionStats-only / lazy-V3)
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.7


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


