crm26 opened a new pull request, #21376: URL: https://github.com/apache/datafusion/pull/21376
## Which issue does this PR close?

Closes gaps in DataFusion's array function coverage compared to DuckDB (`list_sum`, `list_aggregate`) and Trino (`reduce`).

## Rationale for this change

DataFusion has `array_min` and `array_max` but no `array_sum`, `array_product`, or `array_avg`. These are common operations on array columns that currently require verbose workarounds (`UNNEST` + aggregate + `ARRAY_AGG`).

## What changes are included in this PR?

**New functions:**
- `array_sum` / `list_sum`: sum of all elements in an array (same return type as element)
- `array_product` / `list_product`: product of all elements (rejects Decimal types where raw integer multiplication produces incorrect results due to scale)
- `array_avg` / `list_avg`: arithmetic mean, always returns Float64

**Bug fixes:**
- Added missing `list_min` alias to `ArrayMin` (parity with `list_max` on `ArrayMax`)
- Extended `convert_to_f64_array` in `vector_math.rs` to handle Int8, Int16, UInt8, UInt16, UInt32, UInt64

**Implementation:**
- `array_sum` and `array_product` use the same `downcast_primitive!` + offset-window pattern as `array_min`/`array_max` for zero-copy performance
- `array_avg` converts elements to Float64 via the shared `convert_to_f64_array` primitive
- All functions: NULL elements skipped, all-NULL/empty arrays return NULL, List and LargeList supported, FixedSizeList coerced automatically

**Dependencies:** Builds on #21371 (vector distance + array math functions) for shared `vector_math.rs` primitives.

## Are these changes tested?

Yes. 41 new sqllogictest cases covering:
- Integer, float, unsigned integer inputs
- NULL element handling (skip NULLs, all-NULL → NULL)
- Empty array and NULL input
- LargeList support
- Multi-row queries
- Alias tests (`list_sum`, `list_product`, `list_avg`, `list_min`)
- Type preservation (`arrow_typeof` assertions)
- Error cases (string input, Decimal rejection, no arguments)

All existing tests pass (79 unit tests + 2 doctests + sqllogictests).
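
For reviewers unfamiliar with the offset-window pattern named under **Implementation**, here is a minimal, std-only sketch of the NULL semantics described there. It is not the PR's code and does not use the arrow-rs API: `ListOfI64`, `from_rows`, `reduce_rows`, and `avg_rows` are hypothetical names invented for illustration. Two behaviours are assumptions rather than statements from the PR: integer overflow wraps, and a NULL list yields NULL.

```rust
/// Hypothetical stand-in for a list array of i64 (NOT the arrow-rs type).
struct ListOfI64 {
    /// Row i spans values[offsets[i]..offsets[i + 1]].
    offsets: Vec<usize>,
    /// Flattened child elements; None marks a NULL element.
    values: Vec<Option<i64>>,
    /// false marks a NULL list (the whole row is NULL).
    row_valid: Vec<bool>,
}

impl ListOfI64 {
    /// Build the flattened layout from nested rows.
    fn from_rows(rows: &[Option<Vec<Option<i64>>>]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        let mut row_valid = Vec::new();
        for row in rows {
            if let Some(elems) = row {
                values.extend_from_slice(elems);
            }
            offsets.push(values.len());
            row_valid.push(row.is_some());
        }
        ListOfI64 { offsets, values, row_valid }
    }

    /// Fold the non-NULL elements of each row by slicing the shared
    /// `values` buffer with consecutive offset pairs (no per-row copy).
    /// Empty and all-NULL rows produce None because `reduce` has no input.
    fn reduce_rows(&self, f: impl Fn(i64, i64) -> i64) -> Vec<Option<i64>> {
        self.offsets
            .windows(2)
            .zip(&self.row_valid)
            .map(|(w, &valid)| {
                if !valid {
                    return None; // assumption: NULL list in, NULL out
                }
                self.values[w[0]..w[1]].iter().flatten().copied().reduce(&f)
            })
            .collect()
    }

    /// Mean over non-NULL elements, always as f64.
    fn avg_rows(&self) -> Vec<Option<f64>> {
        self.offsets
            .windows(2)
            .zip(&self.row_valid)
            .map(|(w, &valid)| {
                if !valid {
                    return None;
                }
                let (sum, count) = self.values[w[0]..w[1]]
                    .iter()
                    .flatten()
                    .fold((0.0_f64, 0_usize), |(s, n), &v| (s + v as f64, n + 1));
                if count == 0 { None } else { Some(sum / count as f64) }
            })
            .collect()
    }
}

fn main() {
    let lists = ListOfI64::from_rows(&[
        Some(vec![Some(1), Some(2), Some(3), Some(4)]),
        Some(vec![Some(2), None, Some(5)]), // NULL element is skipped
        Some(vec![None, None]),             // all-NULL row
        Some(vec![]),                       // empty row
        None,                               // NULL list
    ]);

    // Overflow handling is an assumption of this sketch: it wraps.
    let sums = lists.reduce_rows(i64::wrapping_add);
    let products = lists.reduce_rows(i64::wrapping_mul);
    let avgs = lists.avg_rows();

    assert_eq!(sums, vec![Some(10), Some(7), None, None, None]);
    assert_eq!(products, vec![Some(24), Some(10), None, None, None]);
    assert_eq!(avgs, vec![Some(2.5), Some(3.5), None, None, None]);
}
```

The point of the windowed slice is that every row reads directly from one shared buffer. In the real functions, the element type would be dispatched via `downcast_primitive!` rather than fixed to `i64` as it is here.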
## Are there any user-facing changes?

Yes. Three new SQL functions are available, along with the new `list_min` alias:

```sql
SELECT array_sum([1, 2, 3, 4]);     -- 10
SELECT array_product([2, 3, 4]);    -- 24
SELECT array_avg([1, 2, 3, 4]);     -- 2.5
SELECT list_min([3, 1, 4, 2]);      -- 1 (new alias)
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
