LiaCastaneda opened a new pull request, #22996: URL: https://github.com/apache/datafusion/pull/22996
## Which issue does this PR close?

- Closes https://github.com/apache/datafusion/issues/22993

## Rationale for this change

DataFusion has no way to aggregate key-value pairs into a map. `map_agg` is a common aggregate in other engines (Trino, Presto, Spark, Postgres via `json_object_agg`), and Arrow already has a native `MapArray` to back it. This PR adds it.

## What changes are included in this PR?

- New `map_agg(key, value)` aggregate function that collects key-value pairs into a `Map(K, V)`, one map per group.
- `ORDER BY` support (`map_agg(key, value ORDER BY expr)`).

## Are these changes tested?

Yes

- Unit tests in `map_agg.rs` covering pair collection, NULL-key skipping, NULL-value retention, first-wins dedup, multi-partition merge, and the ORDER BY ASC/DESC paths.
- sqllogictest cases in `map_agg.slt` exercising `GROUP BY` and `ORDER BY` through the full SQL plan.

## Are there any user-facing changes?

Yes -- adds a new `map_agg` aggregate function available from SQL. No breaking changes to existing APIs.
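For readers skimming the semantics listed under the unit tests (NULL-key skipping, NULL-value retention, first-wins dedup), here is a minimal sketch of how one group's rows might fold into map entries. It is an illustration only, not the code in `map_agg.rs`: the function name `map_agg_sketch`, the `&str`/`i64` column types, and the `Vec` of entries standing in for an Arrow `MapArray` row are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not from the PR): folds the (key, value) rows of a
/// single group into an ordered list of map entries. `None` models SQL NULL.
fn map_agg_sketch(rows: &[(Option<&str>, Option<i64>)]) -> Vec<(String, Option<i64>)> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut entries = Vec::new();
    for &(key, value) in rows {
        // Rows with a NULL key are skipped entirely.
        let Some(k) = key else { continue };
        // First-wins dedup: only the first row seen for a key is kept.
        // `insert` returns false when the key was already present.
        if seen.insert(k) {
            // NULL values are retained as-is.
            entries.push((k.to_string(), value));
        }
    }
    entries
}
```

Given the rows `("a", 1)`, `(NULL, 2)`, `("b", NULL)`, `("a", 3)` in that order, this sketch keeps `"a" -> 1` and `"b" -> NULL`. It processes rows in input order; the description does not spell out how `ORDER BY` interacts with dedup, so that is left out here.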
