crm26 commented on PR #21376: URL: https://github.com/apache/datafusion/pull/21376#issuecomment-4210330082
Thanks for taking a look, @kumarUjjawal. Some context on why these are bundled the way they are, then a question for a committer on whether to split: **#21371** (2158 additions, 8 files) introduces 6 functions — `cosine_distance`, `inner_product`, `array_normalize`, `array_add`, `array_subtract`, `array_scale` — plus `vector_math.rs` with shared primitives (`dot_product_f64`, `magnitude_f64`, `convert_to_f64_array`) that three of the six functions use. If this is split, the primitives need to land in one PR and their callers in follow-ups, which creates cross-PR dependencies. **#21376** (this PR) is stacked on #21371 and adds `array_sum`, `array_product`, `array_avg`, plus a one-line `list_min` alias fix. The GitHub diff against `main` shows both PRs' changes together (~3030 additions) — the actual new content in this PR on top of #21371 is ~872 additions across 5 new/modified files. **Question for the committers** (@alamb / @neilconway) — would you prefer these split further? Options I see: - **Option A (current):** #21371 = all vector+array math primitives and 6 functions; #21376 = 3 array aggregates + 1 alias. Two PRs, related groupings, shared primitives co-located with first use. - **Option B:** Split into ~6 PRs: (1) vector_math.rs primitives + cosine_distance + inner_product, (2) array_normalize, (3) array_add/subtract/scale, (4) array_sum + list_min alias, (5) array_product, (6) array_avg. Happy to restructure if Option B is preferred. Option A is what's currently submitted. No strong preference on my end — defer to whatever makes review easier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
