guowangy opened a new pull request, #11894:
URL: https://github.com/apache/gluten/pull/11894
## What changes are proposed in this pull request?
Introduces **TypeAwareCompress (TAC)**, a column-wise compression layer for
shuffle that selects an algorithm based on each buffer's data type. It is
applied per buffer alongside the existing LZ4/ZSTD codec path.
For `INT64`/`UINT64` columns, the values are often clustered in a small range,
which makes Frame-of-Reference + Bit-Packing (FFOR) significantly more
effective than generic byte-level compression. TAC exploits this by encoding
8-byte integer buffers with a 4-lane FFOR codec before the standard codec sees
them.
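
To illustrate why clustered 64-bit values pack well, here is a minimal single-lane sketch of frame-of-reference plus bit-packing. It is not the 4-lane codec in `ffor.hpp`; the `ForBlock` struct and the `forEncode`/`forDecode` helpers are hypothetical names used only for this example, and the layout is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoded block; NOT the wire layout used by ffor.hpp.
struct ForBlock {
  uint64_t base = 0;            // frame of reference: minimum value in the block
  uint8_t bitWidth = 0;         // bits stored per (value - base)
  size_t count = 0;             // number of encoded values
  std::vector<uint64_t> words;  // bit-packed deltas, lowest bits first
};

// Number of bits needed to represent v (0 when v == 0).
inline uint8_t bitsNeeded(uint64_t v) {
  uint8_t bits = 0;
  while (v != 0) {
    ++bits;
    v >>= 1;
  }
  return bits;
}

inline ForBlock forEncode(const std::vector<uint64_t>& values) {
  ForBlock block;
  block.count = values.size();
  if (values.empty()) return block;

  const auto [minIt, maxIt] = std::minmax_element(values.begin(), values.end());
  block.base = *minIt;
  block.bitWidth = bitsNeeded(*maxIt - *minIt);
  if (block.bitWidth == 0) return block;  // all values equal: base alone suffices

  block.words.assign((block.count * block.bitWidth + 63) / 64, 0);
  size_t bitPos = 0;
  for (uint64_t v : values) {
    const uint64_t delta = v - block.base;
    const size_t word = bitPos / 64;
    const unsigned shift = static_cast<unsigned>(bitPos % 64);
    block.words[word] |= delta << shift;
    // Spill the high bits into the next word when a delta straddles a boundary.
    if (shift + block.bitWidth > 64) {
      block.words[word + 1] |= delta >> (64 - shift);
    }
    bitPos += block.bitWidth;
  }
  return block;
}

inline std::vector<uint64_t> forDecode(const ForBlock& block) {
  std::vector<uint64_t> out(block.count, block.base);
  if (block.bitWidth == 0) return out;
  assert(block.words.size() * 64 >= block.count * block.bitWidth);

  const uint64_t mask =
      block.bitWidth == 64 ? ~uint64_t{0} : ((uint64_t{1} << block.bitWidth) - 1);
  size_t bitPos = 0;
  for (uint64_t& v : out) {
    const size_t word = bitPos / 64;
    const unsigned shift = static_cast<unsigned>(bitPos % 64);
    uint64_t delta = block.words[word] >> shift;
    if (shift + block.bitWidth > 64) {
      delta |= block.words[word + 1] << (64 - shift);
    }
    v = block.base + (delta & mask);
    bitPos += block.bitWidth;
  }
  return out;
}
```

In this sketch, four values in the range 1000000 to 1000007 need only 3 bits each after subtracting the base, so 32 bytes of input fit in a single 64-bit word plus the small header fields.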
Performance change on TPCH/TPCDS with TAC enabled (negative values are reductions):
| Benchmark | Total Latency | Shuffle Write Size |
|-----------|---------------|--------------------|
| TPCH-6T   | -15%          | -32%               |
| TPCDS-6T  | -6%           | -14%               |
### New files
| Path | Purpose |
|------|---------|
| `cpp/core/utils/tac/ffor.hpp` | Header-only 4-lane FFOR codec for `uint64_t` |
| `cpp/core/utils/tac/FForCodec.{h,cc}` | Arrow-Result wrapper around `ffor.hpp` |
| `cpp/core/utils/tac/TypeAwareCompressCodec.{h,cc}` | Type dispatch; self-describing wire format (codec ID + element width embedded in header, so decompression needs no type hint) |
| `cpp/velox/shuffle/VeloxTypeAwareCompress.h` | Maps Velox `TypeKind` → `TacDataType` (`BIGINT` → `kUInt64`) |
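
As a rough illustration of what "self-describing" means here, the sketch below writes a codec ID and an element width ahead of the payload so that a reader can validate and dispatch without an external type hint. The two-byte layout, the `TacCodecId` enum and its value, and the `writeTacHeader`/`readTacHeader` helpers are all assumptions made for this example; the actual header in `TypeAwareCompressCodec` may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical codec identifiers; the real IDs are defined by the PR, not here.
enum class TacCodecId : uint8_t { kFFor = 1 };

// Hypothetical header: one byte of codec ID followed by one byte of element width.
struct TacHeader {
  TacCodecId codec;
  uint8_t elementWidth;  // bytes per element, e.g. 8 for uint64_t
};

constexpr size_t kTacHeaderSize = 2;

// Appends the header to the output buffer; the compressed payload would follow.
inline void writeTacHeader(const TacHeader& header, std::vector<uint8_t>& out) {
  out.push_back(static_cast<uint8_t>(header.codec));
  out.push_back(header.elementWidth);
}

// Parses and validates the header. Returns std::nullopt for truncated input,
// an unknown codec ID, or an element width this sketch does not handle.
inline std::optional<TacHeader> readTacHeader(const uint8_t* data, size_t size) {
  if (data == nullptr || size < kTacHeaderSize) return std::nullopt;
  if (data[0] != static_cast<uint8_t>(TacCodecId::kFFor)) return std::nullopt;
  if (data[1] != 8) return std::nullopt;  // only 8-byte elements in this sketch
  return TacHeader{TacCodecId::kFFor, data[1]};
}
```

Because the reader recovers both fields from the bytes themselves, the decompression side only needs the buffer, which matches the "no type hint" property described in the table.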
### Shuffle integration
- `Payload.cc/h`: `BlockPayload::fromBuffers` accepts an optional `bufferTypes` vector. For each buffer, TAC is used if `TypeAwareCompressCodec::support(type)` is true; otherwise the buffer falls back to LZ4/ZSTD. A new wire marker `kTypeAwareBuffer = -3` is added, and decompression in `readCompressedBuffer` is self-describing. If the TAC-compressed size is ≥ the original size, the buffer falls back to `kUncompressedBuffer`.
- `Options.h`: adds `enableTypeAwareCompress` (default `false`) to
`LocalPartitionWriterOptions`.
- `VeloxHashShuffleWriter`: populates `bufferTypes` from the schema when TAC
is enabled.
- `GlutenConfig.scala`: new config
`spark.gluten.sql.columnar.shuffle.typeAwareCompress.enabled` (default `false`).
- `ColumnarShuffleWriter` / `LocalPartitionWriterJniWrapper`: forward the
new option to native.
Disabled by default, so there are no behaviour changes for existing deployments.
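
The per-buffer decision described in the `Payload.cc/h` item can be summarised with a small sketch. The `BufferPath` enum, the `tacSupports` helper, and the `choosePath` function are hypothetical names for this example and only model the decision, not the actual compression calls or wire markers; `kUnsupported` is likewise an assumed placeholder for any type TAC does not handle.

```cpp
#include <cstddef>

// kUInt64 is named in the PR; kUnsupported is a placeholder for this sketch.
enum class TacDataType { kUnsupported, kUInt64 };

// Hypothetical result of the per-buffer decision.
enum class BufferPath {
  kStandardCodec,  // existing LZ4/ZSTD path
  kTypeAware,      // written with the TAC marker
  kUncompressed    // TAC did not shrink the buffer
};

// Stand-in for TypeAwareCompressCodec::support(type).
inline bool tacSupports(TacDataType type) {
  return type == TacDataType::kUInt64;
}

// Decides how one buffer is written, given the size TAC would produce.
inline BufferPath choosePath(
    bool tacEnabled,
    TacDataType type,
    size_t originalSize,
    size_t tacCompressedSize) {
  if (!tacEnabled || !tacSupports(type)) {
    return BufferPath::kStandardCodec;  // unchanged behaviour
  }
  if (tacCompressedSize >= originalSize) {
    return BufferPath::kUncompressed;  // no gain from TAC: store raw bytes
  }
  return BufferPath::kTypeAware;
}
```

With the option left at its default of `false`, every buffer takes the `kStandardCodec` branch in this sketch, which is consistent with the statement that existing deployments are unaffected.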
## How was this patch tested?
`cpp/core/tests/FForCodecTest.cc` covers:
- Round-trip correctness for random, all-zero, monotonic, and near-max value
patterns
- `maxCompressedLength` boundary checks
- Invalid input size rejection
`cpp/velox/tests/VeloxShuffleWriterTest.cc`: extended to exercise the TAC
path end-to-end through `VeloxHashShuffleWriter`.
## Was this patch authored or co-authored using generative AI tooling?
Co-authored-by: Claude Sonnet 4.6
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]