github-actions[bot] commented on PR #61641:
URL: https://github.com/apache/doris/pull/61641#issuecomment-4115045116

   ## Code Review Summary
   
   ### Overview
   This PR fixes a critical bug where 
`AggregateFunctionsSingleValue::is_trivial()` incorrectly returned `true` for 
`min`/`max` on fixed-length types (it returned `Data::IsFixedLength`). The V2 
nullable wrapper's `streaming_agg_serialize_to_column` skips `create()` when 
`nested_function->is_trivial()` is true, relying on `std::vector<char>` 
zero-initialization. For min/max, this meant sentinel values (MAX_VALUE for 
min, MIN_VALUE for max) were never set, producing incorrect results (e.g., 0 
for numeric min/max, epoch for datetime).
   
   The fix changes `is_trivial()` to `Data::IsFixedLength && Data::IS_ANY`, 
restricting the trivial path to `any_value` only. This is correct because 
`any_value` uses a `has_value` guard (`change_first_time` checks `!has()` 
before any comparison) and zero-initialized `has_value=false` is a correct 
initial state.
   
   ### Critical Checkpoint Conclusions
   
   1. **Goal and correctness**: The fix correctly addresses the bug. The change 
from `Data::IsFixedLength` to `Data::IsFixedLength && Data::IS_ANY` precisely 
targets the root cause. Unit tests directly exercise the bug path 
(`streaming_agg_serialize_to_column` via V2 nullable wrapper).
   
   2. **Modification scope**: Minimal and focused — 2 logic lines changed, 
improved doc comment, and comprehensive tests. No unnecessary changes.
   
   3. **Concurrency**: Not applicable — no concurrency changes introduced.
   
   4. **Lifecycle management**: Not applicable — no new lifecycle concerns.
   
   5. **Configuration items**: None added.
   
   6. **Incompatible changes**: None — `is_trivial()` is an internal 
optimization hint, not a serialization format change.
   
   7. **Parallel code paths**: Verified. The `is_trivial()` call sites are:
      - V2 nullable `streaming_agg_serialize_to_column` (the bug path, now 
fixed)
      - V1 nullable 
`deserialize_and_merge_vec`/`deserialize_and_merge_vec_selected` (uses 
`is_trivial()` for a different optimization — contiguous `BufferReadable` vs 
per-row `ColumnString` deserialization; V1's streaming path always calls 
`create()` regardless)
      - `IAggregateFunctionHelper::destroy_vec` (skips destroy for trivial — 
safe since `SingleValueDataFixed` is trivially destructible)
      All paths are correctly handled by the fix.
   
   8. **Special conditional checks**: The `IS_ANY` constant is well-defined 
(`true` only for `AggregateFunctionAnyData`, `false` for min/max) with clear 
comments explaining why.
   
   9. **Test coverage**: Good. Four new test cases covering:
      - Nullable min/max streaming with Int32 (non-null values)
      - Nullable min/max streaming with Int64 (mixed null/non-null)
      - Nullable min/max streaming with DateTimeV2 (original symptom type)
      - any_value streaming verification (ensures trivial path still works)
      
      Tests are parameterized over both "min" and "max". All tests exercise 
`streaming_agg_serialize_to_column` → `deserialize_and_merge_from_column_range` 
→ `insert_result_into`, verifying the full round-trip.
   
   10. **Observability**: Not applicable for this bug fix.
   
   11. **Performance**: The fix makes min/max slightly slower in the streaming 
path (now calls `create()`/`destroy()` instead of relying on zero-init), but 
this is necessary for correctness. The overhead is minimal — `create()` for 
`SingleValueDataFixed` is just a placement new.
   
   12. **Other issues**: None identified. The `any_value` zero-init safety is 
verified — `SingleValueDataFixed::has_value` defaults to `false` on zero-init, 
and `change_first_time` checks this guard before any value comparison.
   
   **Verdict: LGTM.** The fix is correct, minimal, well-documented, and 
thoroughly tested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to