comphead opened a new pull request, #22139:
URL: https://github.com/apache/datafusion/pull/22139

   ## Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and 
enhancements and this helps us generate change logs for our releases. You can 
link an issue to this PR using the GitHub syntax. For example `Closes #123` 
indicates that this PR will close issue #123.
   -->
   
   - Closes #22138 .
   
    ## Rationale for this change
   
     `AVG` used as a window aggregate can return `NaN` (and, for `Decimal` / 
`Duration`, panic on integer division by zero) when every value in the window 
frame is NULL.
   
     ```sql
     SELECT i,
            AVG(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED 
FOLLOWING)
     FROM (VALUES (1,1), (2,2), (3,NULL), (4,NULL)) t(i,v);
     ```
   
     | i | current output | expected (DuckDB/PgSQL) |
     |---|----------------|-------------------|
     | 1 | 1.5            | 1.5               |
     | 2 | 2.0            | 2.0               |
     | 3 | **NaN**        | **NULL**          |
     | 4 | **NaN**        | **NULL**          |
   
     Root cause: sliding-window execution calls `Accumulator::retract_batch` as 
rows leave the frame. Once every contributing value has been retracted, 
`self.count` drops back to `0` but `self.sum` stays
     `Some(0.0)` (or a tiny floating-point residual). `evaluate()` then 
computes `sum / 0`, which yields `NaN` on `Float64`, and would panic with 
integer division by zero on `DecimalAvgAccumulator` and
     `DurationAvgAccumulator`.
     
     The non-sliding aggregation path is unaffected because there `sum` becomes 
`Some(_)` only after at least one non-NULL value has been added, so `count == 
0` implies `sum == None`.
   
     ## What changes are included in this PR?
   
     `datafusion/functions-aggregate/src/average.rs` — guard all three affected 
`evaluate()` implementations with an explicit `count == 0 → None` short-circuit:
   
     - `AvgAccumulator::evaluate` (Float64)
     - `DecimalAvgAccumulator::evaluate` (Decimal32/64/128/256)
     - `DurationAvgAccumulator::evaluate` (Duration*)
   
     This matches the idiom already used by sibling retractable accumulators 
(`variance.rs` uses an explicit `match self.count` before division; `sum.rs` 
uses a `(self.count != 0).then_some(..)` guard).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to