zhuqi-lucas commented on PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#issuecomment-4159463797

   @adriangb Good catch on the benchmarks showing no change! The issue was 
twofold:
   
   1. **In the TPC-H data, alphabetical file order happens to match sort 
key order**, so `validated_output_ordering()` already preserves the ordering 
on main, and `EnforceSorting` eliminates the Sort without our optimization.
   
   2. **Found and fixed a bug**: when files are in the wrong order but 
non-overlapping, the `Unsupported` fallback returned only `Inexact`. It now 
re-checks the ordering after sorting files by statistics and upgrades to 
`Exact` when the resulting ordering is valid. This is the core value of this PR.
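
   To illustrate the re-check in point 2, here is a rough standalone sketch of the idea. It is not the code in this PR: `FileRange`, `file`, and `reorder_by_stats` are hypothetical names, and it assumes a single ascending `i64` sort key with no nulls, which is much simpler than real file statistics.

```rust
/// Hypothetical stand-in for per-file min/max statistics on the sort key.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    name: String,
    min: i64,
    max: i64,
}

/// Small constructor to keep the examples short (illustrative only).
fn file(name: &str, min: i64, max: i64) -> FileRange {
    FileRange { name: name.to_string(), min, max }
}

/// Sort files by their minimum key, then check that neighbours do not overlap.
/// Returns `Some(reordered)` when reading the files back to back would yield
/// sorted output (the "Exact" case), and `None` otherwise (the "Inexact" case).
fn reorder_by_stats(mut files: Vec<FileRange>) -> Option<Vec<FileRange>> {
    files.sort_by_key(|f| (f.min, f.max));
    // Touching boundaries (max == next min) still give a non-decreasing stream.
    let non_overlapping = files.windows(2).all(|w| w[0].max <= w[1].min);
    non_overlapping.then_some(files)
}

fn main() {
    // Alphabetical order (a, b, c) is the reverse of sort key order.
    let listed = vec![
        file("a.parquet", 21, 30),
        file("b.parquet", 11, 20),
        file("c.parquet", 1, 10),
    ];
    match reorder_by_stats(listed) {
        Some(ordered) => {
            let names: Vec<&str> = ordered.iter().map(|f| f.name.as_str()).collect();
            println!("exact ordering after reorder: {:?}", names);
        }
        None => println!("ranges overlap: Sort must stay"),
    }
}
```

   With reversed file names the ranges still do not overlap, so the sketch reorders them and reports an exact ordering. If two ranges overlapped, it would return `None`, which corresponds to keeping the Sort.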
   
   I created https://github.com/apache/datafusion/pull/21266 to generate 
benchmark data with **reversed file names** (alphabetical order ≠ sort key 
order). Once that merges, `run benchmarks sort_pushdown_sorted` will show the 
difference:
   - **main**: files in wrong order → Sort stays
   - **this PR**: files reordered by statistics → Sort eliminated


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

