github-actions[bot] commented on PR #61352:
URL: https://github.com/apache/doris/pull/61352#issuecomment-4159542252
## Code Review Summary
Reviewed the full PR (34 files, +2166/-591 lines) covering BE template
refactor, 4 new FE aggregate function classes, factory registration, visitor
changes, and all 9 regression test suites.
### Critical Checkpoint Conclusions
1. **Goal & Correctness**: The PR adds `REGR_AVGX`, `REGR_AVGY`,
`REGR_COUNT`, `REGR_R2` and refactors the 5 existing `REGR_*` functions into a
unified template. The new template correctly implements the Youngs-Cramer
online algorithm. `regr_r2` follows the SQL standard (NULL when n<1 or Sxx=0,
1.0 when Syy=0, otherwise Sxy²/(Sxx·Syy)). `regr_count` correctly returns 0 for
empty input (via `NotNullableAggregateFunction` + `resultForEmptyInput`). Tests
prove correctness across edge cases.
2. **Modification scope**: Focused and clean. The refactor replaces
per-function files with a single parameterized template — a clear improvement
in maintainability.
3. **Concurrency**: Not applicable. Aggregate function states are
single-threaded per aggregation context.
4. **Lifecycle management**: No special lifecycle concerns. States are POD
types (`std::tuple<UInt64>` for count, `std::tuple<UInt64, Float64, ...>` for
others), no heap allocations.
5. **Configuration items**: None added.
6. **Incompatible changes / Serialization**: **Verified safe.** All 5
existing functions have identical serialization layouts between old and new
code. `regr_intercept` deliberately keeps `sy_level=2` (storing `syy` even
though unused) to preserve backward compatibility, with an explicit comment
explaining this design choice.
7. **Parallel code paths**: All 9 regr functions now share a single code
path via `RegrFunctionKind` enum + `RegrTraits` specializations. No parallel
paths to worry about.
8. **Special conditional checks**: The `if constexpr` guards on moment
accessors are correct and well-commented. The floating-point exact-zero
comparisons (`sxx() == 0.0`, `syy() == 0.0`) are safe because the Youngs-Cramer
algorithm guarantees exact zero when all input values are identical.
9. **Test coverage**: Comprehensive regression tests for all 4 new functions
covering: empty tables, int/double inputs, nullable/non-nullable columns,
literal inputs, constant-value edge cases, string/boolean/null inputs, and
error cases. Existing function tests updated to match the relaxed type
validation.
10. **Observability**: Not applicable for aggregate functions — no new
logging or metrics needed.
11. **Transaction/persistence**: Not applicable.
12. **Data writes**: Not applicable.
13. **FE-BE variable passing**: All new functions correctly registered in
both FE (`BuiltinAggregateFunctions.java`) and BE
(`aggregate_function_regr.cpp`). Function names are lowercase and consistent.
Return types agree (DoubleType for all except RegrCount which returns
BigIntType). Nullable semantics match (`AlwaysNullable` for avgx/avgy/r2,
`NotNullableAggregateFunction` for count).
14. **Performance**: The unified template avoids code duplication without
introducing overhead. `if constexpr` ensures dead code is eliminated at compile
time. No redundant computations detected.
15. **Other observations**:
- Type validation was broadened: string and boolean types are now
accepted (implicit cast to double) for all 9 regr functions. This is a
behavioral change for the 5 existing functions but is consistent and reasonable.
- Minor floating-point precision differences in `.out` files for
existing functions (e.g., `5.512931034482759` → `5.512931034482757`) are
expected from the refactored computation path and are within acceptable
tolerance.
- Pre-existing issue: some existing regr_slope/regr_intercept tests have
duplicate `qt_` tags (e.g., `qt_sql_int_8` appears twice). Not introduced by
this PR.
### Verdict
**No blocking issues found.** The PR is well-designed,
serialization-compatible, and thoroughly tested. The unified template approach
is a clear improvement over the previous per-function implementation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]