alamb commented on code in PR #16985:
URL: https://github.com/apache/datafusion/pull/16985#discussion_r2442156539


##########
datafusion/sqllogictest/test_files/unnest.slt:
##########
@@ -941,3 +941,242 @@ where min_height * width1 = (
 )
 ----
 4 7 4 28
+
+## Unnest with ordering on unrelated column is preserved
+query TT
+EXPLAIN WITH unnested AS (SELECT
+    ROW_NUMBER() OVER () AS generated_id,
+    unnest(array[value]) as ar
+  FROM range(1,5)) SELECT array_agg(ar) FROM unnested group by generated_id;
+----
+logical_plan
+01)Projection: array_agg(unnested.ar)
+02)--Aggregate: groupBy=[[unnested.generated_id]], 
aggr=[[array_agg(unnested.ar)]]
+03)----SubqueryAlias: unnested
+04)------Projection: generated_id, 
__unnest_placeholder(make_array(range().value),depth=1) AS 
UNNEST(make_array(range().value)) AS ar
+05)--------Unnest: 
lists[__unnest_placeholder(make_array(range().value))|depth=1] structs[]
+06)----------Projection: row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING AS generated_id, make_array(range().value) AS 
__unnest_placeholder(make_array(range().value))
+07)------------WindowAggr: windowExpr=[[row_number() ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING]]
+08)--------------TableScan: range() projection=[value]
+physical_plan
+01)ProjectionExec: expr=[array_agg(unnested.ar)@1 as array_agg(unnested.ar)]
+02)--AggregateExec: mode=FinalPartitioned, gby=[generated_id@0 as 
generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
+03)----SortExec: expr=[generated_id@0 ASC NULLS LAST], 
preserve_partitioning=[true]

Review Comment:
   Actually, I spent some more time looking and I think your code is working as 
expected
   
   Namely, Note the `ordering_mode=Sorted` that is above the `UnnestExec` - 
that means that the `GROUP BY` columns (in this case `generated_id`) are 
sorted, as expected. 
   
   ```
   06)----------AggregateExec: mode=Partial, gby=[generated_id@0 as 
generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
   07)------------ProjectionExec: expr=[generated_id@0 as generated_id, 
__unnest_placeholder(make_array(range().value),depth=1)@1 as ar]
   08)--------------UnnestExec
   09)----------------ProjectionExec: expr=[row_number() ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING@1 as generated_id, make_array(value@0) as 
__unnest_placeholder(make_array(range().value))]
   10)------------------RepartitionExec: partitioning=RoundRobinBatch(4), 
input_partitions=1
   11)--------------------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN 
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { name: "row_number() ROWS 
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", data_type: UInt64, 
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: 
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
   ```
   
   The reason there is a sort  in the new plan is that the optimizer has 
decided to repartition the intermediate aggregate result (unrelated to this PR)
   
   ```
   03)----SortExec: expr=[generated_id@0 ASC NULLS LAST], 
preserve_partitioning=[true] 
   04)------CoalesceBatchesExec: target_batch_size=8192 
   05)--------RepartitionExec: partitioning=Hash([generated_id@0], 4), 
input_partitions=4 
   06)----------AggregateExec: mode=Partial, gby=[generated_id@0 as 
generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to