raphaelroshan opened a new pull request, #23310:
URL: https://github.com/apache/datafusion/pull/23310

   ## Which issue does this PR close?
   
   - Closes #22581.
   
   ## Rationale for this change
   
   `log(base, value)` currently dispatches on the type of the *value* (the last 
argument): `return_type` inspects only the value's type, and `invoke_with_args` 
computes the result in that type. When the base is a wider float than the value 
— e.g. `log(Float64, Float32)` — the base is narrowed down to the value's float 
type before the computation, which loses precision. This is the behavior 
reported in #22581.
   
   The fix picks the widest float among all arguments as the result type, and 
widens a `Float16`/`Float32` value up to that type before computing, so the 
base no longer has to be narrowed.
   
   > AI-assisted disclosure: I used an AI assistant while working on this. I 
understand and can justify the change end-to-end. One thing I'd flag for 
reviewers: I treat any non-float argument (e.g. an integer base, or decimals 
which are already computed in `f64`) as `Float64` for ranking purposes via a 
`float_rank` helper — this matches the existing "decimals compute in f64" path, 
but please sanity-check that assumption against any type-coercion expectations 
I might be missing.
   
   ## What changes are included in this PR?
   
   - `return_type` now returns the widest float across all arguments instead of 
only looking at the value's type (via a small `float_rank` helper).
   - In `invoke_with_args`, when the base is wider than a `Float16`/`Float32` 
value, the value is cast up to the result type before computation so the base 
is not narrowed. Decimal values are left untouched (still computed in `f64`).
   - Added a unit test (`test_log_f64_base_f32_value`) and sqllogictest cases 
in `math.slt` covering `log(Float64, Float32)`, `log(Float64, Float16)`, and 
integer-base cases.
   
   ## Are these changes tested?
   
   Yes.
   - `cargo test -p datafusion-functions --lib math::` — green (includes the 
new `test_log_f64_base_f32_value`).
   - `math.slt` sqllogictests — green (includes new mixed-width cases and 
updated expected types for previously-narrowing cases).
   
   ## Are there any user-facing changes?
   
   **Yes — this is a user-facing behavior change to `log()` return types in 
mixed-float-width cases, and I'd like maintainers to confirm the desired 
semantics.**
   
   Previously the return type of `log(base, value)` followed the *value's* 
type. Now it follows the *widest float among all arguments*. Concretely:
   
   - `log(Float64, Float32)` now returns `Float64` (was `Float32`)
   - `log(Float64, Float16)` now returns `Float64` (was `Float16`)
   - `log(2, arrow_cast(2.0, 'Float32'))` (integer base) now returns `Float64` 
(was `Float32`)
   
   The unary `log(value)` and same-width cases (`log(Float64, Float64)`, 
`log(Float32, Float32)`, etc.) are unchanged. Because output precision/width 
changes in the mixed cases, some existing `math.slt` expected values were 
updated accordingly.
   
   Opening this as a **draft proposal** because widening the return type is a 
semantics decision: it fixes the precision loss in #22581, but it changes 
observed output types/precision for existing mixed-width queries. Does widening 
to the widest float match the intended `log` semantics, or would you prefer 
preserving the value's type (and accepting the precision loss, or handling it 
differently)? Happy to adjust. If this is considered a breaking change to the 
public output type, I can add the `api change` label.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to