github-actions[bot] commented on PR #61730:
URL: https://github.com/apache/doris/pull/61730#issuecomment-4152902696
## Code Review Summary
Reviewed the hash precomputation optimization in `AggregateFunctionUniq` and
`AggregateFunctionUniqDistributeKey`. **No issues found.**
### Critical Checkpoint Conclusions
- **Goal / Correctness**: The goal is to eliminate redundant hash
computation (2x -> 1x per key) in the batch insert paths. The implementation is
correct: `hash()` returns the mixed hash value, and both `prefetch_hash()` and
`emplace_with_hash()` expect exactly that value (they apply H1/H2 internally).
The phmap APIs (`hash()`, `prefetch_hash()`, `emplace_with_hash()`) all exist
and are used correctly.
- **Modification scope**: Minimal and focused. Two files with identical
symmetrical changes (34 additions, 10 deletions). No unrelated changes.
- **Concurrency**: Not applicable. These functions operate on per-thread
aggregation state, no shared data.
- **Lifecycle**: Local `std::vector` with automatic cleanup. No concerns.
- **Configuration / Incompatible changes**: None.
- **Parallel code paths**: The `merge()` and `deserialize_and_merge()`
functions also call `set.insert()`, but they iterate over elements from another
set/buffer without the prefetch pattern, so the optimization is not applicable
there. Correct as-is.
- **Test coverage**: No new tests. This is a pure performance optimization
with no behavioral change (hash function is deterministic and stateless,
`emplace_with_hash` produces identical results to `insert`). Existing
`count(distinct ...)` tests cover correctness. Acceptable.
- **Performance analysis**:
- Hash computation reduced from 2x to 1x per key - clear benefit.
- The precomputation loop is a sequential scan over `keys[]` which is
cache-friendly.
- The extra `std::vector<size_t>` allocation per call is consistent with
existing allocation patterns already present in the same functions
(`keys_container`, `array_of_data_set`).
- This pattern is already established in `hash_map_context.h`
(`init_hash_values()` + `prefetch()` + `lazy_emplace()`), so the approach is
consistent with the codebase.
- **Other issues**: None identified.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]