github-actions[bot] commented on PR #61641:
URL: https://github.com/apache/doris/pull/61641#issuecomment-4115045116
## Code Review Summary
### Overview
This PR fixes a critical bug where
`AggregateFunctionsSingleValue::is_trivial()` incorrectly returned `true` for
`min`/`max` on fixed-length types (it returned `Data::IsFixedLength`). The V2
nullable wrapper's `streaming_agg_serialize_to_column` skips `create()` when
`nested_function->is_trivial()` is true, relying on `std::vector<char>`
zero-initialization. For min/max, this meant sentinel values (MAX_VALUE for
min, MIN_VALUE for max) were never set, producing incorrect results (e.g., 0
for numeric min/max, epoch for datetime).
The fix changes `is_trivial()` to `Data::IsFixedLength && Data::IS_ANY`,
restricting the trivial path to `any_value` only. This is correct because
`any_value` uses a `has_value` guard (`change_first_time` checks `!has()`
before any comparison) and zero-initialized `has_value=false` is a correct
initial state.
### Critical Checkpoint Conclusions
1. **Goal and correctness**: The fix correctly addresses the bug. The change
from `Data::IsFixedLength` to `Data::IsFixedLength && Data::IS_ANY` precisely
targets the root cause. Unit tests directly exercise the bug path
(`streaming_agg_serialize_to_column` via V2 nullable wrapper).
2. **Modification scope**: Minimal and focused — 2 logic lines changed,
improved doc comment, and comprehensive tests. No unnecessary changes.
3. **Concurrency**: Not applicable — no concurrency changes introduced.
4. **Lifecycle management**: Not applicable — no new lifecycle concerns.
5. **Configuration items**: None added.
6. **Incompatible changes**: None — `is_trivial()` is an internal
optimization hint, not a serialization format change.
7. **Parallel code paths**: Verified. The `is_trivial()` call sites are:
- V2 nullable `streaming_agg_serialize_to_column` (the bug path, now
fixed)
- V1 nullable
`deserialize_and_merge_vec`/`deserialize_and_merge_vec_selected` (uses
`is_trivial()` for a different optimization — contiguous `BufferReadable` vs
per-row `ColumnString` deserialization; V1's streaming path always calls
`create()` regardless)
- `IAggregateFunctionHelper::destroy_vec` (skips destroy for trivial —
safe since `SingleValueDataFixed` is trivially destructible)
All paths are correctly handled by the fix.
8. **Special conditional checks**: The `IS_ANY` constant is well-defined
(`true` only for `AggregateFunctionAnyData`, `false` for min/max) with clear
comments explaining why.
9. **Test coverage**: Good. Four new test cases covering:
- Nullable min/max streaming with Int32 (non-null values)
- Nullable min/max streaming with Int64 (mixed null/non-null)
- Nullable min/max streaming with DateTimeV2 (original symptom type)
- any_value streaming verification (ensures trivial path still works)
Tests are parameterized over both "min" and "max". All tests exercise
`streaming_agg_serialize_to_column` → `deserialize_and_merge_from_column_range`
→ `insert_result_into`, verifying the full round-trip.
10. **Observability**: Not applicable for this bug fix.
11. **Performance**: The fix makes min/max slightly slower in the streaming
path (now calls `create()`/`destroy()` instead of relying on zero-init), but
this is necessary for correctness. The overhead is minimal — `create()` for
`SingleValueDataFixed` is just a placement new.
12. **Other issues**: None identified. The `any_value` zero-init safety is
verified — `SingleValueDataFixed::has_value` defaults to `false` on zero-init,
and `change_first_time` checks this guard before any value comparison.
**Verdict: LGTM.** The fix is correct, minimal, well-documented, and
thoroughly tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]