hhr293 opened a new pull request, #12299:
URL: https://github.com/apache/gluten/pull/12299

     ## What changes are proposed in this pull request?
   
     [VL] Extend TypeAwareCompress (TAC) to INT128 (DECIMAL HUGEINT).
   
     The TAC codec from #11894 only covered INT64. DECIMAL(p>18) is stored as
     128-bit HugeInt and previously fell through to LZ4, missing a structural
     compression opportunity: in OLAP workloads (TPC-H prices, taxes, etc.)
     DECIMAL values usually fit in INT64, so the high 64 bits are 0 and the
     low 64 bits span a narrow range, which is exactly the pattern FFOR exploits.
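
To make the "narrow values" argument concrete, here is a minimal sketch of how frame-of-reference parameters could be derived for one block. It assumes the base is the block minimum and the bit width covers `max - min`; the names `FforParams` and `computeFforParams` are hypothetical and are not taken from the TAC code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-block parameters: reference value and packed bit width.
struct FforParams {
  uint64_t base;
  uint8_t bitWidth;
};

// Assumption: base = min(values), bitWidth = bits needed for (max - min).
// A block of identical values therefore needs bitWidth == 0.
inline FforParams computeFforParams(const uint64_t* values, size_t count) {
  if (count == 0) {
    return {0, 0};
  }
  uint64_t minVal = values[0];
  uint64_t maxVal = values[0];
  for (size_t i = 1; i < count; ++i) {
    if (values[i] < minVal) minVal = values[i];
    if (values[i] > maxVal) maxVal = values[i];
  }
  uint64_t range = maxVal - minVal;
  uint8_t bitWidth = 0;
  while (range != 0) {  // count significant bits of the range
    ++bitWidth;
    range >>= 1;
  }
  return {minVal, bitWidth};
}
```

Under these assumptions, an all-zero high half yields a bit width of 0, and a low half whose values sit close together needs only a few bits per value.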
   
     **Implementation:** Split each 16B value into lo/hi uint64 sub-streams via
     stride-2 gather into stack-allocated scratch buffers, then run the existing
     64-bit FFOR encoder on each. The wire format reuses the 64-bit per-block
     (bw, count, base) header twice; the stream is self-describing, so no hi/lo
     length prefix is needed. A hi sub-stream whose values all equal base
     degenerates to just the 16B header (bw=0). Velox HUGEINT is mapped to
     `tac::kUInt128`; the shuffle writer and frame format are unchanged.
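
The lo/hi split described above can be sketched roughly as follows. This is an illustration only: it assumes a little-endian layout in which each 128-bit value occupies two consecutive `uint64_t` words (low word first), and it uses `std::vector` for brevity where the PR uses stack-allocated scratch buffers. `SplitStreams`, `splitInt128`, and `mergeInt128` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two 64-bit sub-streams.
struct SplitStreams {
  std::vector<uint64_t> lo;
  std::vector<uint64_t> hi;
};

// Stride-2 gather: even-indexed words go to lo, odd-indexed words go to hi.
// `words` must hold 2 * count entries (assumed little-endian 128-bit values).
inline SplitStreams splitInt128(const uint64_t* words, size_t count) {
  SplitStreams out;
  out.lo.resize(count);
  out.hi.resize(count);
  for (size_t i = 0; i < count; ++i) {
    out.lo[i] = words[2 * i];
    out.hi[i] = words[2 * i + 1];
  }
  return out;
}

// Inverse scatter used on the decode side: interleave lo/hi back into words.
inline void mergeInt128(const SplitStreams& streams, uint64_t* words) {
  assert(streams.lo.size() == streams.hi.size());
  for (size_t i = 0; i < streams.lo.size(); ++i) {
    words[2 * i] = streams.lo[i];
    words[2 * i + 1] = streams.hi[i];
  }
}
```

Each sub-stream can then be handed to a 64-bit encoder independently, which is why a constant hi stream costs little more than its header.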
   
     **Also includes interface cleanup based on community review feedback on
     #11894:**
     - `encodeBlock` / `decodeBlock`: removed template parameters, implicit
       reference-pointer modification, and scratch-buffer parameters. Both now
       take aligned `uint64_t*` and return bytes written/consumed. Alignment
       staging is handled by callers.
     - Added `#include <algorithm>` for `std::min` (previously relied on
       transitive include).
     - Added input validation: reject `bw > 64` (prevents dispatch table OOB),
       reject `blockVals == 0` (prevents infinite loop on corrupt headers),
       bounds-check tail data before memcpy.
     - Updated doc comments for accuracy (byte-order scope, supported types).
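
The validation rules listed above might look roughly like the following check on the decode path. This is a sketch under assumptions, not the PR's code: the function name `validateBlock` is hypothetical, and the payload size is assumed to be the bit-packed length rounded up to whole bytes, which may differ from the real codec's padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if a decoded block header is safe to act on.
// Assumption: payload occupies ceil(blockVals * bitWidth / 8) bytes.
inline bool validateBlock(uint32_t bitWidth, uint32_t blockVals, size_t remainingBytes) {
  if (bitWidth > 64) {
    return false;  // would index past a 0..64 dispatch table
  }
  if (blockVals == 0) {
    return false;  // a zero-length block would never advance the decode loop
  }
  // At most 2^32 * 64 bits, so the product fits in uint64_t.
  const uint64_t payloadBits = static_cast<uint64_t>(blockVals) * bitWidth;
  const uint64_t payloadBytes = (payloadBits + 7) / 8;
  // Bounds-check before any memcpy from the input buffer.
  return payloadBytes <= remainingBytes;
}
```

Rejecting these cases up front keeps a corrupt header from turning into an out-of-bounds read or a non-terminating loop.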

     **Results** vs LZ4 on the same int128 input:
     - Compression ratio: 0.116 (TAC) vs 0.193 (LZ4), about 40% smaller
   
     ## How was this patch tested?
   
     - New unit tests in `FForCodecTest.cc`: 128-bit compress/decompress round-trips,
       alignment combinations, corrupted-input rejection, and tail handling.
     - Existing 64-bit tests continue to pass.
     - End-to-end validation on TPC-H SF=6000 (no regression on other queries).
   
     ## Performance

     End-to-end on TPC-H SF=6000, the wins concentrate on queries with heavy
     decimal-keyed shuffles:
     - q15: shuffle size −15%, latency −8%
     - q17: shuffle size −8%, latency −3%
     - q18: shuffle size −8%, latency −3%
   
     ## Was this patch authored or co-authored using generative AI tooling?
   
   Reviewed-by: Claude claude-opus-4-7
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

