alamb commented on code in PR #17563:
URL: https://github.com/apache/datafusion/pull/17563#discussion_r2353207049
##########
datafusion/sqllogictest/test_files/window.slt:
##########
@@ -5310,15 +5310,16 @@ logical_plan
05)--------TableScan: t1 projection=[c1, c2]
physical_plan
01)SortPreservingMergeExec: [c1@0 ASC NULLS LAST, c2@1 ASC NULLS LAST, rank@2
ASC NULLS LAST]
-02)--ProjectionExec: expr=[c1@0 as c1, c2@1 as c2, rank() PARTITION BY [t1.c1]
ORDER BY [t1.c2 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW@2 as rank]
-03)----BoundedWindowAggExec: wdw=[rank() PARTITION BY [t1.c1] ORDER BY [t1.c2
ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field {
name: "rank() PARTITION BY [t1.c1] ORDER BY [t1.c2 ASC NULLS LAST] RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: UInt64, nullable:
false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
-04)------SortExec: expr=[c1@0 ASC NULLS LAST, c2@1 ASC NULLS LAST],
preserve_partitioning=[true]
-05)--------CoalesceBatchesExec: target_batch_size=1
-06)----------RepartitionExec: partitioning=Hash([c1@0], 2), input_partitions=2
-07)------------CoalesceBatchesExec: target_batch_size=1
-08)--------------FilterExec: c1@0 = 2 OR c1@0 = 3
-09)----------------RepartitionExec: partitioning=RoundRobinBatch(2),
input_partitions=1
-10)------------------DataSourceExec: partitions=1, partition_sizes=[1]
+02)--SortExec: expr=[c1@0 ASC NULLS LAST, c2@1 ASC NULLS LAST, rank@2 ASC
NULLS LAST], preserve_partitioning=[true]
Review Comment:
Won't this change cause a regression in performance for this query? The plan
now has an extra sort where the plan before hand had no sort after the
`BoundedWindowAggExec`
##########
datafusion/sqllogictest/test_files/window.slt:
##########
@@ -5310,15 +5310,16 @@ logical_plan
05)--------TableScan: t1 projection=[c1, c2]
physical_plan
01)SortPreservingMergeExec: [c1@0 ASC NULLS LAST, c2@1 ASC NULLS LAST, rank@2
ASC NULLS LAST]
-02)--ProjectionExec: expr=[c1@0 as c1, c2@1 as c2, rank() PARTITION BY [t1.c1]
ORDER BY [t1.c2 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW@2 as rank]
-03)----BoundedWindowAggExec: wdw=[rank() PARTITION BY [t1.c1] ORDER BY [t1.c2
ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field {
name: "rank() PARTITION BY [t1.c1] ORDER BY [t1.c2 ASC NULLS LAST] RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: UInt64, nullable:
false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
-04)------SortExec: expr=[c1@0 ASC NULLS LAST, c2@1 ASC NULLS LAST],
preserve_partitioning=[true]
-05)--------CoalesceBatchesExec: target_batch_size=1
-06)----------RepartitionExec: partitioning=Hash([c1@0], 2), input_partitions=2
-07)------------CoalesceBatchesExec: target_batch_size=1
-08)--------------FilterExec: c1@0 = 2 OR c1@0 = 3
-09)----------------RepartitionExec: partitioning=RoundRobinBatch(2),
input_partitions=1
-10)------------------DataSourceExec: partitions=1, partition_sizes=[1]
+02)--SortExec: expr=[c1@0 ASC NULLS LAST, c2@1 ASC NULLS LAST, rank@2 ASC
NULLS LAST], preserve_partitioning=[true]
Review Comment:
I think the same pattern (an additional SortExec) appears in the other plans
below too
##########
datafusion/physical-plan/src/windows/mod.rs:
##########
@@ -636,7 +636,8 @@ pub fn get_window_mode(
fn sort_options_resolving_constant(expr: Arc<dyn PhysicalExpr>) ->
Vec<PhysicalSortExpr> {
vec![
- PhysicalSortExpr::new(Arc::clone(&expr), SortOptions::new(false,
false)),
+ // TODO restore while avoiding exponential planning time
Review Comment:
If we are going to go with this change, can we please file a ticket for this
and leave a link to the ticket here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]