raphaelroshan opened a new pull request, #23310: URL: https://github.com/apache/datafusion/pull/23310
## Which issue does this PR close? - Closes #22581. ## Rationale for this change `log(base, value)` currently dispatches on the type of the *value* (the last argument): `return_type` inspects only the value's type, and `invoke_with_args` computes the result in that type. When the base is a wider float than the value — e.g. `log(Float64, Float32)` — the base is narrowed down to the value's float type before the computation, which loses precision. This is the behavior reported in #22581. The fix picks the widest float among all arguments as the result type, and widens a `Float16`/`Float32` value up to that type before computing, so the base no longer has to be narrowed. > AI-assisted disclosure: I used an AI assistant while working on this. I understand and can justify the change end-to-end. One thing I'd flag for reviewers: I treat any non-float argument (e.g. an integer base, or decimals which are already computed in `f64`) as `Float64` for ranking purposes via a `float_rank` helper — this matches the existing "decimals compute in f64" path, but please sanity-check that assumption against any type-coercion expectations I might be missing. ## What changes are included in this PR? - `return_type` now returns the widest float across all arguments instead of only looking at the value's type (via a small `float_rank` helper). - In `invoke_with_args`, when the base is wider than a `Float16`/`Float32` value, the value is cast up to the result type before computation so the base is not narrowed. Decimal values are left untouched (still computed in `f64`). - Added a unit test (`test_log_f64_base_f32_value`) and sqllogictest cases in `math.slt` covering `log(Float64, Float32)`, `log(Float64, Float16)`, and integer-base cases. ## Are these changes tested? Yes. - `cargo test -p datafusion-functions --lib math::` — green (includes the new `test_log_f64_base_f32_value`). - `math.slt` sqllogictests — green (includes new mixed-width cases and updated expected types for previously-narrowing cases). ## Are there any user-facing changes? **Yes — this is a user-facing behavior change to `log()` return types in mixed-float-width cases, and I'd like maintainers to confirm the desired semantics.** Previously the return type of `log(base, value)` followed the *value's* type. Now it follows the *widest float among all arguments*. Concretely: - `log(Float64, Float32)` now returns `Float64` (was `Float32`) - `log(Float64, Float16)` now returns `Float64` (was `Float16`) - `log(2, arrow_cast(2.0, 'Float32'))` (integer base) now returns `Float64` (was `Float32`) The unary `log(value)` and same-width cases (`log(Float64, Float64)`, `log(Float32, Float32)`, etc.) are unchanged. Because output precision/width changes in the mixed cases, some existing `math.slt` expected values were updated accordingly. Opening this as a **draft proposal** because widening the return type is a semantics decision: it fixes the precision loss in #22581, but it changes observed output types/precision for existing mixed-width queries. Does widening to the widest float match the intended `log` semantics, or would you prefer preserving the value's type (and accepting the precision loss, or handling it differently)? Happy to adjust. If this is considered a breaking change to the public output type, I can add the `api change` label. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
