Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

via GitHub Tue, 01 Jul 2025 00:38:50 -0700


findepi commented on code in PR #16625:
URL: https://github.com/apache/datafusion/pull/16625#discussion_r2176664752



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -232,9 +282,8 @@ physical_plan
 01)AggregateExec: mode=Final, gby=[], aggr=[array_agg(agg_order.c1) ORDER BY [agg_order.c2 DESC NULLS FIRST, agg_order.c3 ASC NULLS LAST]]
 02)--CoalescePartitionsExec
 03)----AggregateExec: mode=Partial, gby=[], aggr=[array_agg(agg_order.c1) ORDER BY [agg_order.c2 DESC NULLS FIRST, agg_order.c3 ASC NULLS LAST]]
-04)------SortExec: expr=[c2@1 DESC, c3@2 ASC NULLS LAST], preserve_partitioning=[true]
-05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
-06)----------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_agg_multi_order.csv]]}, projection=[c1, c2, c3], file_type=csv, has_header=true
+04)------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1

Review Comment:
   For global aggregation -- agreed. But then, a global array_agg aggregation 
cannot feasibly operate on large amounts of data, can it? (Or rather: it can, 
but that's unlikely to be a common scenario.)
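
To make the size concern concrete, here is a minimal Python sketch of a global ordered array_agg split into partial and final phases. It is an illustration only, not DataFusion's implementation: the rows, the two-partition split, and the helper names `partial_array_agg` and `final_array_agg` are all hypothetical. The point it shows is that the single output array holds one element per input row, whichever operator does the sorting.

```python
import heapq

# Hypothetical (c1, c2, c3) rows, already spread over two partitions.
partitions = [
    [(1, 20, 0), (3, 10, 5)],
    [(2, 20, 1), (4, 10, 1)],
]

def sort_key(row):
    # Mirrors ORDER BY c2 DESC, c3 ASC for integer columns.
    _, c2, c3 = row
    return (-c2, c3)

def partial_array_agg(rows):
    # Partial phase (assumed shape): buffer the partition's rows and sort them.
    return sorted(rows, key=sort_key)

def final_array_agg(partials):
    # Final phase (assumed shape): merge the sorted runs and keep only c1.
    return [c1 for c1, _, _ in heapq.merge(*partials, key=sort_key)]

result = final_array_agg([partial_array_agg(p) for p in partitions])
```

With no GROUP BY there is exactly one output row, so `result` grows linearly with the input, which is why very large inputs look like an uncommon case for this plan shape.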



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

