jackylee-ch opened a new pull request, #12211: URL: https://github.com/apache/gluten/pull/12211
## What changes are proposed in this pull request?

Add lazy per-column deserialization for Gluten's columnar table cache via a new V3 wire format.

**Problem**: The current cache always decodes all N columns on read, even when a query references only M columns (M << N). For a 16-column table queried for a single column, 15/16 of the deserialization work is wasted.

**Solution**: A new V3 format serializes each column independently using `PrestoVectorSerde::serializeSingleColumn`. On read, only the requested columns are deserialized. Each one is wrapped in a Velox `LazyVector`, which defers decoding until an operator actually accesses the column.

**Key design points**:

- V3 magic `0xFECA5303` vs. V2 magic `0xFECA5302`. V3 code reads V2 data through the existing path (backward compatible).
- `CachedColumnLoader` (a `VectorLoader` implementation) holds each column's bytes. It calls `deserializeSingleColumn` on first access, then frees the raw bytes to avoid double-buffering.
- The read path routes on `isV3Format(bytes)` independently of the `lazyEnabled` config, so V3 bytes are never misrouted to the V2 Presto deserializer.
- New config `spark.gluten.sql.columnar.tableCache.lazy.deserialization.enabled` (default: `false`).

**Performance** (100M rows, 16 columns, `repartitionByRange`; speedups are relative to Legacy):

| Scenario | Legacy | partitionStats only | Lazy V3 |
|----------|--------|---------------------|---------|
| Read 1/16 cols | ~4,800 ms | ~4,900 ms | ~750 ms (**6.4X**) |
| Read 4/16 cols | ~5,200 ms | ~5,300 ms | ~1,800 ms (**2.9X**) |
| Read all 16 cols | ~5,400 ms | ~5,500 ms | ~5,650 ms (4% overhead) |
| Filter + 2/16 cols | ~4,400 ms | ~1,700 ms | ~720 ms (**6.1X**) |
| Cache build | ~126 s | ~131 s | ~195 s (write overhead) |

We recommend pairing this with `partitionStats.enabled=true` to combine batch skipping with column skipping.

## How was this patch tested?
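The framing and cross-config tests exercise three read-path properties:

- format detection by magic number
- independent per-column blobs
- decode-once-then-free

A minimal, self-contained C++ sketch that does not depend on Velox can illustrate them. It is an illustration under stated assumptions, not the PR's implementation:

- The frame layout (magic, column count, then a length-prefixed blob per column) is hypothetical.
- The little-endian magic encoding is an assumption.
- The names `route`, `frameColumns`, `openV3`, and `LazyColumn` are hypothetical.
- `LazyColumn` only mimics what `CachedColumnLoader` is described as doing. It does not implement Velox's `VectorLoader` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Magic values quoted in the PR; the little-endian encoding is an assumption.
constexpr std::uint32_t kMagicV2 = 0xFECA5302u;
constexpr std::uint32_t kMagicV3 = 0xFECA5303u;

void putU32(Bytes& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t getU32(const Bytes& in, std::size_t pos) {
  if (pos + 4 > in.size()) throw std::out_of_range("truncated frame");
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
  return v;
}

bool isV3Format(const Bytes& bytes) {
  return bytes.size() >= 4 && getU32(bytes, 0) == kMagicV3;
}

enum class ReadPath { kV2Presto, kV3PerColumn };

// Routing looks only at the bytes, never at the lazy-deserialization flag,
// so V3 data can never be handed to the V2 deserializer.
ReadPath route(const Bytes& bytes) {
  return isV3Format(bytes) ? ReadPath::kV3PerColumn : ReadPath::kV2Presto;
}

// Hypothetical V3-style frame: magic, column count, then (length, bytes) per column.
Bytes frameColumns(const std::vector<Bytes>& columns) {
  Bytes out;
  putU32(out, kMagicV3);
  putU32(out, static_cast<std::uint32_t>(columns.size()));
  for (const Bytes& col : columns) {
    putU32(out, static_cast<std::uint32_t>(col.size()));
    out.insert(out.end(), col.begin(), col.end());
  }
  return out;
}

// Stand-in for a per-column loader: decodes once on first access, then drops
// the serialized bytes so they are not held alongside the decoded values.
class LazyColumn {
 public:
  using Decoder = std::function<std::vector<std::int64_t>(const Bytes&)>;
  LazyColumn(Bytes raw, Decoder decode) : raw_(std::move(raw)), decode_(std::move(decode)) {}

  const std::vector<std::int64_t>& get() {
    if (!decoded_) {
      decoded_ = decode_(raw_);
      Bytes().swap(raw_);  // release the raw buffer (avoid double-buffering)
    }
    return *decoded_;
  }
  bool loaded() const { return decoded_.has_value(); }
  std::size_t rawSize() const { return raw_.size(); }

 private:
  Bytes raw_;
  Decoder decode_;
  std::optional<std::vector<std::int64_t>> decoded_;
};

// Splits a V3-style frame into lazily decoded columns; nothing is decoded here.
std::vector<LazyColumn> openV3(const Bytes& frame, const LazyColumn::Decoder& decode) {
  if (!isV3Format(frame)) throw std::invalid_argument("not a V3 frame");
  std::uint32_t numColumns = getU32(frame, 4);
  std::size_t pos = 8;
  std::vector<LazyColumn> columns;
  for (std::uint32_t i = 0; i < numColumns; ++i) {
    std::uint32_t len = getU32(frame, pos);
    pos += 4;
    if (pos + len > frame.size()) throw std::out_of_range("truncated column");
    auto first = frame.begin() + static_cast<std::ptrdiff_t>(pos);
    columns.emplace_back(Bytes(first, first + static_cast<std::ptrdiff_t>(len)), decode);
    pos += len;
  }
  return columns;
}
```

Opening a three-column frame with `openV3` decodes nothing. Calling `get()` on one column decodes that column exactly once and releases its serialized buffer, and the other columns keep their raw bytes untouched. `route` takes no configuration input, which mirrors the PR's rationale for routing on `isV3Format(bytes)` alone.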
- Unit tests (pure JVM, no native library): `ColumnarCachedBatchFramedBytesSuite` (+5 V3 framing tests), `ColumnarCachedBatchKryoSuite`, and `ColumnarCachedBatchStatsBlobSuite`; all pass.
- Integration tests: `ColumnarCachedBatchLazySerdeTest` (7 new E2E tests covering V3 write/read, column projection, `count(*)`, all-types roundtrip, config toggle, and cross-config read).
- Smoke tests: `ColumnarCachedBatchE2ESuite` (+2 V3 smoke tests).
- Benchmark: `ColumnarTableCacheLazyDeserBenchmark` (5 scenarios comparing legacy, partitionStats-only, and lazy V3).

## Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
