Dandandan opened a new pull request, #21042: URL: https://github.com/apache/datafusion/pull/21042
## Summary

- Use `with_hashes` for batch hash computation via thread-local buffer, separating hashing from hash table ops for better vectorization/pipelining
- Process 4 rows at a time via `chunks_exact(4)` with local dedup within each chunk to reduce redundant hash table operations
- Split hash table operations into `find` + `insert_unique` phases (lighter than `entry` which prepares an insertion slot even on hit)
- Extract `find_group`, `insert_new_group`, `get_or_create_null_group` helpers to consolidate unsafe hash table logic with SAFETY comments
- Separate null/no-null fast paths to eliminate validity checks when no nulls are present

## Test plan

- [x] `cargo test -p datafusion-physical-plan aggregat` (82 tests pass)
- [x] `cargo clippy -p datafusion-physical-plan --all-features -- -D warnings` (clean)
- [x] `cargo fmt --all` (clean)
- [ ] Benchmark with group-by queries on primitive columns (low and high cardinality)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
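For readers without the diff at hand, the pattern described in the summary (hash the whole batch first, then walk it four rows at a time with local dedup, using a lookup-only `find` followed by `insert_unique` on a miss) could look roughly like the sketch below. This is not the code from the PR. `GroupTable`, `hash_u64`, `find_or_insert`, and `assign_groups` are hypothetical names; the toy linear-probing table stands in for the real hash table, the multiplicative hash stands in for whatever `with_hashes` computes, and nulls, table growth, and the unsafe fast paths are left out entirely.

```rust
/// Hypothetical stand-in hash (multiplicative); not the hash DataFusion uses.
fn hash_u64(v: u64) -> u64 {
    v.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Toy linear-probing table mapping a key to its group index.
/// It never grows, so the caller must size it larger than the key count.
struct GroupTable {
    slots: Vec<Option<(u64, usize)>>, // (key, group index)
    mask: usize,
}

impl GroupTable {
    fn with_capacity_pow2(cap: usize) -> Self {
        assert!(cap.is_power_of_two());
        Self { slots: vec![None; cap], mask: cap - 1 }
    }

    /// Lookup only: returns the group index on a hit, reserves nothing on a miss.
    fn find(&self, hash: u64, key: u64) -> Option<usize> {
        let mut i = (hash >> 32) as usize & self.mask;
        while let Some((k, g)) = self.slots[i] {
            if k == key {
                return Some(g);
            }
            i = (i + 1) & self.mask;
        }
        None
    }

    /// Insert a key the caller has already established is absent.
    fn insert_unique(&mut self, hash: u64, key: u64, group: usize) {
        let mut i = (hash >> 32) as usize & self.mask;
        while self.slots[i].is_some() {
            i = (i + 1) & self.mask;
        }
        self.slots[i] = Some((key, group));
    }
}

/// `find` first; only on a miss allocate a new group and `insert_unique`.
fn find_or_insert(table: &mut GroupTable, values: &mut Vec<u64>, key: u64, hash: u64) -> usize {
    if let Some(g) = table.find(hash, key) {
        return g;
    }
    let g = values.len();
    values.push(key);
    table.insert_unique(hash, key, g);
    g
}

/// Returns (group index per row, distinct key per group in first-seen order).
fn assign_groups(keys: &[u64]) -> (Vec<usize>, Vec<u64>) {
    // Phase 1: hash the whole batch up front, separate from any table access.
    let hashes: Vec<u64> = keys.iter().map(|&k| hash_u64(k)).collect();

    let cap = (keys.len().max(1) * 2).next_power_of_two();
    let mut table = GroupTable::with_capacity_pow2(cap);
    let mut values: Vec<u64> = Vec::new();
    let mut out: Vec<usize> = Vec::with_capacity(keys.len());

    // Phase 2: four rows at a time, reusing the result of an equal earlier row
    // in the same chunk instead of probing the table again.
    let mut key_chunks = keys.chunks_exact(4);
    let mut hash_chunks = hashes.chunks_exact(4);
    for (kc, hc) in key_chunks.by_ref().zip(hash_chunks.by_ref()) {
        let mut groups = [0usize; 4];
        for i in 0..4 {
            let g = match (0..i).find(|&j| kc[j] == kc[i]) {
                Some(j) => groups[j], // local dedup hit: no table access
                None => find_or_insert(&mut table, &mut values, kc[i], hc[i]),
            };
            groups[i] = g;
        }
        out.extend_from_slice(&groups);
    }

    // Tail: fewer than four rows left, handled one by one.
    for (&k, &h) in key_chunks.remainder().iter().zip(hash_chunks.remainder()) {
        out.push(find_or_insert(&mut table, &mut values, k, h));
    }
    (out, values)
}
```

Under these assumptions, `assign_groups(&[5, 5, 7, 5, 9, 7, 3])` touches the table only twice for the first chunk (once for `5`, once for `7`); the two repeated `5`s are resolved by the in-chunk comparison. Whether this wins in practice depends on how often keys repeat within a chunk, which is what the still-open benchmark item in the test plan would show.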
