crm26 opened a new pull request, #21376:
URL: https://github.com/apache/datafusion/pull/21376

   ## Which issue does this PR close?
   
   Closes gaps in DataFusion's array function coverage compared to DuckDB 
(`list_sum`, `list_aggregate`) and Trino (`reduce`).
   
   ## Rationale for this change
   
   DataFusion has `array_min` and `array_max` but no `array_sum`, 
`array_product`, or `array_avg`. These are common operations on array columns 
that currently require verbose workarounds (`UNNEST` + aggregate + `ARRAY_AGG`).
   
   ## What changes are included in this PR?
   
   **New functions:**
   - `array_sum` / `list_sum`: sum of all elements in an array (the return 
type matches the element type)
   - `array_product` / `list_product`: product of all elements (rejects 
Decimal types, because multiplying the raw integer representations ignores the 
scale and produces incorrect results)
   - `array_avg` / `list_avg`: arithmetic mean, always returns Float64
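
To illustrate the Decimal restriction on `array_product`, here is a minimal sketch in plain Rust. It does not use DataFusion or Arrow APIs: `naive_raw_product` and `decimal_to_f64` are hypothetical helpers, and the scale-2 values are made up for the example. A Decimal with scale `s` stores `x` as the integer `x * 10^s`, so multiplying two raw integers yields a value at scale `2s`, not `s`.

```rust
/// Hypothetical helper: interpret a raw Decimal integer at a given scale.
fn decimal_to_f64(raw: i128, scale: u32) -> f64 {
    raw as f64 / 10f64.powi(scale as i32)
}

/// Hypothetical helper: multiply raw Decimal integers with no scale
/// adjustment, i.e. the "raw integer multiplication" the PR avoids.
fn naive_raw_product(raw: &[i128]) -> i128 {
    raw.iter().product()
}
```

With 1.50 and 2.00 stored at scale 2 as 150 and 200, `naive_raw_product(&[150, 200])` gives 30000. Read at the original scale 2 that is 300.00; only at scale 4 does it give the correct 3.00. The scale needed would presumably grow with the number of non-NULL elements in each row, which is consistent with rejecting Decimal inputs rather than returning a misleading value.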
   
   **Bug fixes:**
   - Added missing `list_min` alias to `ArrayMin` (parity with `list_max` on 
`ArrayMax`)
   - Extended `convert_to_f64_array` in `vector_math.rs` to handle Int8, Int16, 
UInt8, UInt16, UInt32, UInt64
   
   **Implementation:**
   - `array_sum` and `array_product` use the same `downcast_primitive!` + 
offset-window pattern as `array_min`/`array_max`, so each row's elements are 
read in place from the values buffer without copying
   - `array_avg` converts elements to Float64 via the shared 
`convert_to_f64_array` primitive
   - All functions: NULL elements skipped, all-NULL/empty arrays return NULL, 
List and LargeList supported, FixedSizeList coerced automatically
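
The offset-window pattern and the NULL rules above can be sketched in plain Rust as follows. This is an illustration under simplifying assumptions, not the PR's code: it models a list array as a flat `values` slice plus an `offsets` slice, uses `Option<i64>` in place of Arrow's validity bitmap, and does not model NULL list rows, `downcast_primitive!`, or overflow handling. `sum_rows` and `avg_rows` are hypothetical names.

```rust
/// Sum each list row. Row `i` covers `values[offsets[i]..offsets[i + 1]]`.
/// NULL elements (`None`) are skipped; a row with no non-NULL elements
/// (empty or all-NULL) yields `None`.
fn sum_rows(values: &[Option<i64>], offsets: &[usize]) -> Vec<Option<i64>> {
    offsets
        .windows(2)
        .map(|w| {
            // Borrow the row's window of the flat buffer; nothing is copied.
            let row = &values[w[0]..w[1]];
            let mut acc: Option<i64> = None;
            for v in row.iter().flatten() {
                acc = Some(acc.unwrap_or(0) + v);
            }
            acc
        })
        .collect()
}

/// Average each list row as f64, with the same NULL rules as `sum_rows`.
fn avg_rows(values: &[Option<i64>], offsets: &[usize]) -> Vec<Option<f64>> {
    offsets
        .windows(2)
        .map(|w| {
            let (sum, count) = values[w[0]..w[1]]
                .iter()
                .flatten()
                .fold((0.0_f64, 0_usize), |(s, n), v| (s + *v as f64, n + 1));
            if count == 0 {
                None
            } else {
                Some(sum / count as f64)
            }
        })
        .collect()
}
```

For the rows `[1, 2, 3, 4]`, `[NULL, NULL]`, `[]`, and `[5, NULL]`, encoded with offsets `[0, 4, 6, 6, 8]`, `sum_rows` returns `[Some(10), None, None, Some(5)]` and `avg_rows` returns `[Some(2.5), None, None, Some(5.0)]`. Dividing by the count of non-NULL elements, rather than by the row length, is what "NULL elements skipped" implies for the average.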
   
   **Dependencies:** Builds on #21371 (vector distance + array math functions) 
for shared `vector_math.rs` primitives.
   
   ## Are these changes tested?
   
   Yes — 41 new sqllogictest cases covering:
   - Integer, float, unsigned integer inputs
   - NULL element handling (skip NULLs, all-NULL → NULL)
   - Empty array and NULL input
   - LargeList support
   - Multi-row queries
   - Alias tests (`list_sum`, `list_product`, `list_avg`, `list_min`)
   - Type preservation (`arrow_typeof` assertions)
   - Error cases (string input, Decimal rejection, no arguments)
   
   All existing tests pass (79 unit tests + 2 doctests + sqllogictests).
   
   ## Are there any user-facing changes?
   
   Yes. Three new SQL functions (each with a `list_*` alias) and a new 
`list_min` alias are available:
   ```sql
   SELECT array_sum([1, 2, 3, 4]);        -- 10
   SELECT array_product([2, 3, 4]);       -- 24
   SELECT array_avg([1, 2, 3, 4]);        -- 2.5
   SELECT list_min([3, 1, 4, 2]);         -- 1 (new alias)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

