alamb opened a new issue, #16899: URL: https://github.com/apache/datafusion/issues/16899
### Describe the bug When data is sorted on a prefix, but not all, of the input columns I expect DataFusion to use the faster / more memory efficient operator `PartialSortExec`: https://github.com/apache/datafusion/blob/07516aa4a369aceb2ef2438fa81e5a87691f76a6/datafusion/physical-plan/src/sorts/partial_sort.rs#L80 However, this does not appear to be happening ### To Reproduce ``` > copy (values(1, 'a'), (2,'b'), (3,'d'), (4,'a')) to '/tmp/order.csv'; +-------+ | count | +-------+ | 4 | +-------+ 1 row(s) fetched. Elapsed 0.005 seconds. > create external table order stored as csv location '/tmp/order.csv' with order (column1 asc); 0 row(s) fetched. Elapsed 0.002 seconds. ``` When ordering by just `column1` (which is the declared table order) we can see the plan correctly avoids sorting 🎉 ```sql > explain select * from order ORDER BY column1; +---------------+-------------------------------+ | plan_type | plan | +---------------+-------------------------------+ | physical_plan | ┌───────────────────────────┐ | | | │ DataSourceExec │ | | | │ -------------------- │ | | | │ files: 1 │ | | | │ format: csv │ | | | └───────────────────────────┘ | | | | +---------------+-------------------------------+ 1 row(s) fetched. Elapsed 0.009 seconds. ``` However, when ordering by `column1, column2` (the table is a prefix of the declared table order) we can see the plan uses a SortExec (which resorts the entire input, rather than just sorting within batches): ```sql > explain select * from order ORDER BY column1, column2; +---------------+-------------------------------+ | plan_type | plan | +---------------+-------------------------------+ | physical_plan | ┌───────────────────────────┐ | | | │ SortExec │ | | | │ -------------------- │ | | | │ column1@0 ASC NULLS LAST, │ | | | │ column2@1 ASC NULLS LAST │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ DataSourceExec │ | | | │ -------------------- │ | | | │ files: 1 │ | | | │ format: csv │ | | | └───────────────────────────┘ | | | | +---------------+-------------------------------+ ``` ### Expected behavior I expect the existing sort order to be used Specifically, in the query above I expect that `PartialSortExec` is used instead: ```sql > explain select * from order ORDER BY column1, column2; ``` ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org