NGA-TRAN commented on code in PR #18661:
URL: https://github.com/apache/datafusion/pull/18661#discussion_r2525143302
##########
datafusion/physical-optimizer/src/enforce_sorting/mod.rs:
##########
@@ -516,10 +516,7 @@ pub fn ensure_sorting(
);
child = update_sort_ctx_children_data(child, true)?;
}
- } else if physical_ordering.is_none()
- || !plan.maintains_input_order()[idx]
- || is_union(plan)
- {
+ } else if physical_ordering.is_none() || !plan.maintains_input_order()[idx] {
Review Comment:
Wow, this is simple.
I wonder why this `is_union` check was here in the first place 🤔
All the tests pass and @rgehan's reproducer now passes too, which suggests
this is likely the right fix.
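To make the effect of dropping `is_union(plan)` concrete, here is a minimal standalone sketch that models only the boolean condition from the hunk above. `ChildFacts`, `takes_branch_before`, `takes_branch_after`, and `differing_cases` are hypothetical names for illustration and are not DataFusion APIs; the sketch says nothing about what the branch body does.

```rust
/// Hypothetical stand-in for the three facts the condition inspects.
/// This is an illustration only, not a DataFusion type.
#[derive(Clone, Copy)]
struct ChildFacts {
    has_physical_ordering: bool,        // models `physical_ordering.is_some()`
    parent_maintains_input_order: bool, // models `plan.maintains_input_order()[idx]`
    parent_is_union: bool,              // models `is_union(plan)`
}

/// Condition as written before the change.
fn takes_branch_before(c: ChildFacts) -> bool {
    !c.has_physical_ordering || !c.parent_maintains_input_order || c.parent_is_union
}

/// Condition as written after the change (the `is_union` term is removed).
fn takes_branch_after(c: ChildFacts) -> bool {
    !c.has_physical_ordering || !c.parent_maintains_input_order
}

/// Enumerates all eight input combinations and returns those where the two
/// conditions disagree, as (has_ordering, maintains_order, is_union).
fn differing_cases() -> Vec<(bool, bool, bool)> {
    let mut out = Vec::new();
    for has_ordering in [false, true] {
        for maintains in [false, true] {
            for is_union in [false, true] {
                let c = ChildFacts {
                    has_physical_ordering: has_ordering,
                    parent_maintains_input_order: maintains,
                    parent_is_union: is_union,
                };
                if takes_branch_before(c) != takes_branch_after(c) {
                    out.push((has_ordering, maintains, is_union));
                }
            }
        }
    }
    out
}

fn main() {
    // Print every combination where the old and new conditions disagree.
    for case in differing_cases() {
        println!("conditions differ for {:?}", case);
    }
}
```

Under this model the two conditions disagree in exactly one case: the child has a physical ordering, the parent maintains input order for that child, and the parent is a union. That is the only situation whose handling this hunk changes.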
##########
datafusion/core/tests/physical_optimizer/enforce_sorting.rs:
##########
@@ -664,21 +664,13 @@ async fn test_union_inputs_different_sorted7() ->
Result<()> {
// Union has unnecessarily fine ordering below it. We should be able to
replace them with absolutely necessary ordering.
let test =
EnforceSortingTest::new(physical_plan).with_repartition_sorts(true);
assert_snapshot!(test.run(), @r"
- Input Plan:
+ Input / Optimized Plan:
SortPreservingMergeExec: [nullable_col@0 ASC]
UnionExec
SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC],
preserve_partitioning=[false]
DataSourceExec: file_groups={1 group: [[x]]},
projection=[nullable_col, non_nullable_col], file_type=parquet
SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC],
preserve_partitioning=[false]
DataSourceExec: file_groups={1 group: [[x]]},
projection=[nullable_col, non_nullable_col], file_type=parquet
-
- Optimized Plan:
- SortPreservingMergeExec: [nullable_col@0 ASC]
- UnionExec
- SortExec: expr=[nullable_col@0 ASC], preserve_partitioning=[false]
- DataSourceExec: file_groups={1 group: [[x]]},
projection=[nullable_col, non_nullable_col], file_type=parquet
- SortExec: expr=[nullable_col@0 ASC], preserve_partitioning=[false]
- DataSourceExec: file_groups={1 group: [[x]]},
projection=[nullable_col, non_nullable_col], file_type=parquet
Review Comment:
Can you explain why this is no longer needed? Is it because the input and
output plans are now the same?
##########
datafusion/core/tests/dataframe/mod.rs:
##########
@@ -3138,19 +3115,18 @@ async fn
union_with_mix_of_presorted_and_explicitly_resorted_inputs_with_reparti
) -> Result<()> {
assert_snapshot!(
union_with_mix_of_presorted_and_explicitly_resorted_inputs_impl(false).await?,
- @r#"
+ @r"
AggregateExec: mode=Final, gby=[id@0 as id], aggr=[], ordering_mode=Sorted
- SortExec: expr=[id@0 ASC NULLS LAST], preserve_partitioning=[false]
- CoalescePartitionsExec
- AggregateExec: mode=Partial, gby=[id@0 as id], aggr=[]
- UnionExec
- DataSourceExec: file_groups={1 group:
[[{testdata}/alltypes_tiny_pages.parquet]]}, projection=[id],
output_ordering=[id@0 ASC NULLS LAST], file_type=parquet
+ SortPreservingMergeExec: [id@0 ASC NULLS LAST]
+ AggregateExec: mode=Partial, gby=[id@0 as id], aggr=[],
ordering_mode=Sorted
+ UnionExec
+ DataSourceExec: file_groups={1 group:
[[{testdata}/alltypes_tiny_pages.parquet]]}, projection=[id],
output_ordering=[id@0 ASC NULLS LAST], file_type=parquet
+ SortExec: expr=[id@0 ASC NULLS LAST], preserve_partitioning=[false]
Review Comment:
Great push down
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]