alamb commented on issue #7871:
URL: https://github.com/apache/arrow-datafusion/issues/7871#issuecomment-1832767485

   > We could pass sort exprs to scan, then users can construct ExecutionPlan based on sort exprs.
   
   I think the challenge is that DataFusion currently treats the sort order of an `ExecutionPlan` as "if it has a sort order, I will try to use it" rather than "I will try to push the Sort into the scan".
   
   Instead, DataFusion will introduce a `SortExec` to re-sort the data if that is necessary to answer the query.
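   To make the current behavior concrete, here is a minimal Rust sketch over a toy plan type. `Plan`, `output_ordering`, and `enforce_ordering` are hypothetical names made up for illustration and are not DataFusion's API. If the existing ordering already satisfies the requirement, it is reused. Otherwise a sort node is added on top, and the scan itself is never asked to sort.

   ```rust
   /// Hypothetical toy plan, loosely modeled on DataFusion operators (not its real API).
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       /// A scan whose output is already ordered by `ordering` (empty = unordered).
       Scan { ordering: Vec<String> },
       /// A filter removes rows but keeps the relative order of the survivors.
       Filter { input: Box<Plan> },
       /// Stands in for `SortExec`.
       Sort { keys: Vec<String>, input: Box<Plan> },
   }

   impl Plan {
       /// The ordering this node's output is known to have.
       fn output_ordering(&self) -> Vec<String> {
           match self {
               Plan::Scan { ordering } => ordering.clone(),
               Plan::Filter { input } => input.output_ordering(),
               Plan::Sort { keys, .. } => keys.clone(),
           }
       }
   }

   /// "If it has a sort order, use it": keep the plan when its existing ordering
   /// starts with the required keys, otherwise add a Sort on top.
   fn enforce_ordering(plan: Plan, required: &[String]) -> Plan {
       if plan.output_ordering().starts_with(required) {
           plan
       } else {
           Plan::Sort { keys: required.to_vec(), input: Box::new(plan) }
       }
   }
   ```

   With an unordered scan under the filter, `enforce_ordering` yields `Sort -> Filter -> Scan`. The scan is never asked to produce the order itself.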
   
   In order to "push" sorts into `ExecutionPlan`s / scans, we would need some way to help DataFusion decide whether it should push the sort into the scan or use a `SortExec` afterwards.
   
   For example, it is not clear which of the following plans is better, as it depends on how the sort within the `ExecutionPlan` was implemented:
   
   ```
   SortExec
     Filter
      Scan (no sort)
   ```
   
   vs 
   
   ```
     Filter
      Scan (Sort in the Scan)
   ```
   
   Depending on how selective the filter is, it may be better to do the scan / filter first and then sort, since the sort then only has to process the rows that pass the filter.
   
   Of course, in this case the filter is likely pushed down into the scan too, but I think the same issue still applies in general.
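   A back-of-the-envelope cost comparison shows why neither plan always wins. The sketch below uses a made-up cost model. Each row costs 1 to scan and 1 to filter, and sorting `n` rows costs `n * log2(n)`. A hypothetical `scan_sort_speedup` factor stands in for how efficiently the scan implements its own sort. None of these numbers come from DataFusion.

   ```rust
   /// Made-up cost model: sorting `n` rows costs `n * log2(n)` units;
   /// zero or one row is free to sort.
   fn sort_cost(rows: f64) -> f64 {
       if rows <= 1.0 { 0.0 } else { rows * rows.log2() }
   }

   /// Plan 1: unsorted scan, filter, then a SortExec over the surviving rows only.
   fn cost_sort_after_filter(rows: f64, selectivity: f64) -> f64 {
       let scan = rows;
       let filter = rows;
       scan + filter + sort_cost(rows * selectivity)
   }

   /// Plan 2: the scan sorts every row, then the filter runs.
   /// `scan_sort_speedup` (hypothetical) models how much cheaper the scan's own
   /// sort is than a SortExec; 1.0 means no advantage.
   fn cost_sort_in_scan(rows: f64, scan_sort_speedup: f64) -> f64 {
       let scan = rows;
       let filter = rows;
       scan + sort_cost(rows) / scan_sort_speedup + filter
   }

   /// True when pushing the sort into the scan is cheaper under this toy model.
   fn prefer_sort_in_scan(rows: f64, selectivity: f64, scan_sort_speedup: f64) -> bool {
       cost_sort_in_scan(rows, scan_sort_speedup) < cost_sort_after_filter(rows, selectivity)
   }
   ```

   Take 1,000,000 rows, a filter that keeps 1% of them, and no scan advantage: sorting after the filter wins. Now take a filter that keeps every row and a scan that sorts 10x more cheaply: sorting in the scan wins.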
   
   For this use case, I suggest adding a custom optimizer pass that does the sort pushdown you want and can take advantage of the details of the underlying source to make these choices.
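   Here is a minimal sketch of what such a pass could look like. It again uses a hypothetical toy plan type rather than DataFusion's real `ExecutionPlan` / optimizer-rule API. The `SourceInfo` trait is an assumed hook through which the underlying source reports whether sorting during the scan is cheaper. The pass only rewrites the `Sort -> Filter -> Scan` shape.

   ```rust
   /// Hypothetical toy plan (not DataFusion's real `ExecutionPlan` API).
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       /// `sort_keys` is `Some(..)` once the scan itself has been asked to sort.
       Scan { table: String, sort_keys: Option<Vec<String>> },
       Filter { predicate: String, input: Box<Plan> },
       Sort { keys: Vec<String>, input: Box<Plan> },
   }

   /// Assumed hook: the source decides whether sorting during the scan is cheaper.
   trait SourceInfo {
       fn sort_in_scan_is_cheaper(&self, table: &str, keys: &[String]) -> bool;
   }

   /// Rewrites `Sort -> Filter -> Scan` into `Filter -> Scan(sorted)` when the
   /// source says so, and recurses into the rest of the plan otherwise.
   fn push_sort_into_scan(plan: Plan, source: &dyn SourceInfo) -> Plan {
       match plan {
           Plan::Sort { keys, input } => match *input {
               Plan::Filter { predicate, input: below } => match *below {
                   Plan::Scan { table, sort_keys: None }
                       if source.sort_in_scan_is_cheaper(&table, &keys) =>
                   {
                       // A filter preserves input order, so the Sort can be dropped.
                       let scan = Plan::Scan { table, sort_keys: Some(keys) };
                       Plan::Filter { predicate, input: Box::new(scan) }
                   }
                   other => {
                       let below = Box::new(push_sort_into_scan(other, source));
                       let filter = Plan::Filter { predicate, input: below };
                       Plan::Sort { keys, input: Box::new(filter) }
                   }
               },
               other => Plan::Sort { keys, input: Box::new(push_sort_into_scan(other, source)) },
           },
           Plan::Filter { predicate, input } => {
               Plan::Filter { predicate, input: Box::new(push_sort_into_scan(*input, source)) }
           }
           scan @ Plan::Scan { .. } => scan,
       }
   }
   ```

   A filter keeps the order of its input. So after the sort keys move into the scan, dropping the `Sort` leaves the output ordering unchanged. In a real implementation, this decision would live in the custom optimizer pass and query the actual source for its capabilities.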
   