crm26 opened a new pull request, #21371: URL: https://github.com/apache/datafusion/pull/21371
## Summary

Adds vector distance and array math functions to `datafusion-functions-nested`, enabling vector search and array algebra in standard SQL.

```sql
-- Vector search: find nearest neighbors by cosine distance
SELECT id, cosine_distance(embedding, ARRAY[0.1, 0.2, ...]) AS dist
FROM documents
ORDER BY dist
LIMIT 10;

-- Array math
SELECT array_normalize(embedding) FROM documents;
SELECT array_add(vec_a, vec_b) FROM t;
SELECT array_scale(embedding, 2.0) FROM documents;
```

## Functions

| Function | Returns | Description |
|----------|---------|-------------|
| `cosine_distance(a, b)` | float64 | 1 - cosine similarity |
| `inner_product(a, b)` | float64 | Dot product |
| `array_normalize(a)` | list(float64) | Unit vector |
| `array_add(a, b)` | list(float64) | Element-wise addition |
| `array_subtract(a, b)` | list(float64) | Element-wise subtraction |
| `array_scale(a, f)` | list(float64) | Scalar multiplication |

All functions have `list_*` aliases; `inner_product` is also aliased as `dot_product`.

## Design

Shared primitives live in `vector_math.rs`:

- `dot_product_f64(a, b)`: used by `inner_product` and `cosine_distance`
- `magnitude_f64(a)`: used by `cosine_distance` and `array_normalize`
- `sum_of_squares_f64(a)`: used by `magnitude_f64`
- `convert_to_f64_array(a)`: shared with the existing `array_distance`

The duplicate `convert_to_f64_array` in the existing `distance.rs` is consolidated into this shared module.

All new functions follow the pattern of the existing `array_distance` function: the same signature style, `coerce_types` implementation, null handling, and type support (Float32, Float64, Int32, Int64, FixedSizeList, LargeList, List).

## Tests

79 tests covering normal inputs, null handling, zero vectors, orthogonal vectors, empty arrays, Float32/Float64 inputs, mismatched lengths, and the vector search ranking pattern. Sqllogictest coverage is in `vector_functions.slt`. Clippy clean.
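A minimal, std-only Rust sketch of how the shared primitives listed under Design compose into the new functions. The primitive names come from this PR, but everything else is an assumption: the signatures take plain `&[f64]` slices instead of Arrow arrays, mismatched lengths return `None`, and a zero-magnitude vector makes `cosine_distance` and `array_normalize` return `None` (the PR's actual behavior for these edge cases is not described above). `elementwise`, `array_add`, `array_subtract`, and `array_scale` here are hypothetical scalar stand-ins for the SQL functions, not the PR's code.

```rust
// Hypothetical, std-only sketch of the shared primitives in `vector_math.rs`.
// Assumption: plain `&[f64]` slices stand in for the Arrow arrays the PR uses.

/// Sum of squared elements; `magnitude_f64` is built on it.
fn sum_of_squares_f64(a: &[f64]) -> f64 {
    a.iter().map(|x| x * x).sum()
}

/// Euclidean (L2) norm.
fn magnitude_f64(a: &[f64]) -> f64 {
    sum_of_squares_f64(a).sqrt()
}

/// Dot product. Assumption: mismatched lengths yield `None` in this sketch.
fn dot_product_f64(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x * y).sum())
}

/// 1 - (a . b) / (|a| * |b|). Assumption: `None` if either magnitude is zero.
fn cosine_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    let dot = dot_product_f64(a, b)?;
    let denom = magnitude_f64(a) * magnitude_f64(b);
    if denom == 0.0 {
        return None;
    }
    Some(1.0 - dot / denom)
}

/// Unit vector in the direction of `a`. Assumption: `None` for a zero vector.
fn array_normalize(a: &[f64]) -> Option<Vec<f64>> {
    let mag = magnitude_f64(a);
    if mag == 0.0 {
        return None;
    }
    Some(a.iter().map(|x| x / mag).collect())
}

/// Hypothetical helper: apply `op` pairwise; `None` on length mismatch.
fn elementwise(a: &[f64], b: &[f64], op: impl Fn(f64, f64) -> f64) -> Option<Vec<f64>> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(&x, &y)| op(x, y)).collect())
}

fn array_add(a: &[f64], b: &[f64]) -> Option<Vec<f64>> {
    elementwise(a, b, |x, y| x + y)
}

fn array_subtract(a: &[f64], b: &[f64]) -> Option<Vec<f64>> {
    elementwise(a, b, |x, y| x - y)
}

/// Multiply every element by the scalar `f`.
fn array_scale(a: &[f64], f: f64) -> Vec<f64> {
    a.iter().map(|x| x * f).collect()
}

fn main() {
    // Mirrors `ORDER BY dist LIMIT 10`: compute distances, sort ascending.
    let q: [f64; 2] = [1.0, 0.0];
    let docs: [(&str, Vec<f64>); 3] = [
        ("a", vec![0.0, 1.0]),
        ("b", vec![2.0, 0.0]),
        ("c", vec![1.0, 1.0]),
    ];
    let mut ranked: Vec<(&str, f64)> = docs
        .iter()
        .filter_map(|(id, v)| cosine_distance(&q, v).map(|d| (*id, d)))
        .collect();
    ranked.sort_by(|x, y| x.1.total_cmp(&y.1));
    for (id, d) in &ranked {
        println!("{id}: {d:.4}"); // b: 0.0000, then c: 0.2929, then a: 1.0000
    }

    println!("{:?}", array_normalize(&[3.0, 4.0])); // Some([0.6, 0.8])
    println!("{:?}", array_add(&[1.0, 2.0], &[3.0, 4.0])); // Some([4.0, 6.0])
    println!("{:?}", array_subtract(&[5.0, 5.0], &[1.0, 2.0])); // Some([4.0, 3.0])
    println!("{:?}", array_scale(&[1.0, -2.0], 2.0)); // [2.0, -4.0]
}
```

Layering `cosine_distance` and `array_normalize` on `magnitude_f64`, which in turn sits on `sum_of_squares_f64`, means the norm is computed in one place, which is the reuse the Design section describes.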
