alamb opened a new pull request, #20174:
URL: https://github.com/apache/datafusion/pull/20174

   ## Which issue does this PR close?
   
   
   - Closes https://github.com/apache/datafusion/issues/20173
   
   ## Rationale for this change
   
   This is a regression we found in our unit tests.
   
   When upgrading to DataFusion 52 we hit a bug in our test cases where pre-sorted data was being re-sorted.
   
   ## What changes are included in this PR?
   
   Fix the bug so that pre-sorted data is no longer re-sorted.
   
   ## Are these changes tested?
   
   Yes
   
   I tried for a while with Codex to write a `.slt` test purely in SQL, but was not successful. To trigger the bug, `output_ordering` must not yet be projected when `eq_properties()` runs (that is the only case the `project_orderings(...)` fix addresses). The SQL planner seems to always build `output_ordering` in the base schema, so the bug doesn't manifest.
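
For readers unfamiliar with the term, here is a toy model of what "projecting" an ordering means. It is only a sketch under my own assumptions, not the DataFusion implementation: `SortKey` and `project_ordering` are hypothetical names, and columns are modeled as plain indices.

```rust
/// One sort key: a column index into some schema plus a direction.
/// (Hypothetical type for illustration; not a DataFusion type.)
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: usize,
    descending: bool,
}

/// Rewrite an ordering expressed against the base schema so that it refers
/// to the projected schema. `projection[i]` is the base-schema index of
/// output column `i`. The result is cut at the first key whose column is not
/// projected, because later keys only describe the data within that prefix.
fn project_ordering(base_ordering: &[SortKey], projection: &[usize]) -> Vec<SortKey> {
    let mut projected = Vec::new();
    for key in base_ordering {
        match projection.iter().position(|&base_idx| base_idx == key.column) {
            Some(out_idx) => projected.push(SortKey {
                column: out_idx,
                descending: key.descending,
            }),
            None => break,
        }
    }
    projected
}
```

In this model, an ordering on base columns `(a, b, c)` projected through `[b, a]` becomes "output column 1, then output column 0". Reading the unprojected indices against the projected schema would name the wrong columns, which is the kind of mismatch that can make a valid ordering look unusable.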
   
   Here is what I tried:
   
   <details><summary>Details</summary>
   <p>
   
   
   ```sql
   
   # Create a table ordered by (a, b, c) using inline data. A filtered ORDER BY on b
   # should not introduce an extra SortExec after projection reorders the columns.
   statement ok
   COPY (
     VALUES
       (1, 1, 1),
       (1, 1, 2),
       (1, 2, 1),
       (1, 2, 3),
       (2, 1, 1),
       (2, 1, 2),
       (2, 2, 1)
   ) TO 'test_files/scratch/order/ordered_abc/part-0.csv'
   STORED AS CSV;
   
   statement ok
   CREATE EXTERNAL TABLE ordered_abc (
     a INT,
     b INT,
     c INT
   )
   STORED AS CSV
   WITH ORDER (a ASC, b ASC, c ASC)
   LOCATION 'test_files/scratch/order/ordered_abc/'
   OPTIONS ('format.has_header' 'false');
   
   query TT
   EXPLAIN SELECT b, a FROM ordered_abc WHERE a = 1 ORDER BY b;
   ----
   logical_plan
   01)Sort: ordered_abc.b ASC NULLS LAST
   02)--Projection: ordered_abc.b, ordered_abc.a
   03)----Filter: ordered_abc.a = Int32(1)
   04)------TableScan: ordered_abc projection=[a, b], partial_filters=[ordered_abc.a = Int32(1)]
   physical_plan
   01)SortPreservingMergeExec: [b@0 ASC NULLS LAST]
   02)--ProjectionExec: expr=[b@1 as b, a@0 as a]
   03)----FilterExec: a@0 = 1
   04)------RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1, maintains_sort_order=true
   05)--------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/order/ordered_abc/part-0.csv]]}, projection=[a, b], output_ordering=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], file_type=csv, has_header=false
   
   statement ok
   drop table ordered_abc;
   ```
   
   </p>
   </details> 
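
The reason no `SortExec` is expected in the attempt above is that, once `a` is pinned to a single value by the filter, data ordered by `(a, b, c)` is also ordered by `b`. A minimal check of that claim on the same inline rows, with hypothetical helper names and plain tuples standing in for record batches:

```rust
/// Returns true if `values` is in non-decreasing order.
fn is_sorted_asc(values: &[i32]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}

/// Keep rows `(a, b, c)` with `a == a_value` and return their `b` column,
/// preserving the input order (no sorting is performed here).
fn filter_a_project_b(rows: &[(i32, i32, i32)], a_value: i32) -> Vec<i32> {
    rows.iter()
        .filter(|row| row.0 == a_value)
        .map(|row| row.1)
        .collect()
}
```

Calling `filter_a_project_b` on the seven rows from the `COPY` statement with `a_value = 1` yields `b` values that `is_sorted_asc` accepts, without any sort step.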
   
   
   ## Are there any user-facing changes?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

