alamb commented on issue #10313:
URL: https://github.com/apache/datafusion/issues/10313#issuecomment-2089129451

   > Would this ticket be an appropriate place to add tickets related to 
pushing down sorts to federated query engines? I know that this was discussed 
previously (i.e. #7871) and it seems that writing a custom optimizer is the 
current way to handle that.
   
   I added #7871 to the list above -- thank you.
   
   Yes, I think this would be a good place to discuss it.
   
   > I will need to do this soon (federated sort pushdown) and it initially 
wasn't clear to me how to make this work in DataFusion. I can volunteer to 
write some docs on how to do this once I have an implementation that works.
   
   That would be great, thanks @phillipleblanc 
   
   Right now, once `TableProvider::scan` gets called, the returned 
`ExecutionPlan` can report how it is already sorted.
   
   What we don't have is any way for the optimizer to tell an `ExecutionPlan` 
that it could reduce the work required in the DataFusion plan if the data were 
already sorted.
   
   I wonder if we could add something to the `ExecutionPlan` trait, similar to 
[`ExecutionPlan::repartitioned`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html#method.repartitioned),
 like
   ```rust
   trait ExecutionPlan {
     // ...

     /// Return other possible orders that this ExecutionPlan could return
     /// (the DataFusion optimizer will use this information to potentially
     /// push Sorts into the node).
     fn pushable_sorts(&self) -> Result<Option<PotentialSortOrders>> {
       Ok(None)
     }

     /// Return a node like this one, except that its output is sorted
     /// according to exprs.
     fn resorted(&self) -> Result<Option<Arc<dyn ExecutionPlan>>> {
       Ok(None)
     }
   }
   ```
   
   And then add a new optimizer pass that tries to push sorts into the plan 
nodes that report they can provide sorted data 🤔 
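
To make the idea concrete, here is a minimal, self-contained sketch of how such a pass might behave. It does not use the real DataFusion types: `PlanNode`, `SortKey`, `RemoteScan`, `LocalScan`, `SortNode`, and `push_down_sort` are all hypothetical stand-ins, and `resorted` is assumed to take the requested sort keys as a parameter, which the proposal above does not spell out.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a physical sort expression.
#[derive(Clone, Debug, PartialEq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
}

fn fmt_keys(keys: &[SortKey]) -> String {
    keys.iter()
        .map(|k| format!("{} {}", k.column, if k.descending { "DESC" } else { "ASC" }))
        .collect::<Vec<_>>()
        .join(", ")
}

/// Toy stand-in for `ExecutionPlan` (NOT the real DataFusion trait).
pub trait PlanNode: std::fmt::Debug {
    /// Ordering this node already produces, if any.
    fn output_ordering(&self) -> Option<Vec<SortKey>> {
        None
    }
    /// Assumed shape of the proposed hook: return a node like this one whose
    /// output is sorted by `keys`, or `None` if the node cannot do that.
    fn resorted(&self, _keys: &[SortKey]) -> Option<Arc<dyn PlanNode>> {
        None
    }
    fn describe(&self) -> String;
}

/// A federated scan that can run ORDER BY on the remote engine.
#[derive(Debug)]
pub struct RemoteScan {
    pub table: String,
    pub order_by: Vec<SortKey>,
}

impl PlanNode for RemoteScan {
    fn output_ordering(&self) -> Option<Vec<SortKey>> {
        if self.order_by.is_empty() { None } else { Some(self.order_by.clone()) }
    }
    fn resorted(&self, keys: &[SortKey]) -> Option<Arc<dyn PlanNode>> {
        Some(Arc::new(RemoteScan { table: self.table.clone(), order_by: keys.to_vec() }))
    }
    fn describe(&self) -> String {
        if self.order_by.is_empty() {
            format!("RemoteScan({})", self.table)
        } else {
            format!("RemoteScan({} ORDER BY {})", self.table, fmt_keys(&self.order_by))
        }
    }
}

/// A scan that cannot sort; it keeps the default `resorted` (returns `None`).
#[derive(Debug)]
pub struct LocalScan {
    pub table: String,
}

impl PlanNode for LocalScan {
    fn describe(&self) -> String {
        format!("LocalScan({})", self.table)
    }
}

/// An explicit sort on top of some input.
#[derive(Debug)]
pub struct SortNode {
    pub keys: Vec<SortKey>,
    pub input: Arc<dyn PlanNode>,
}

impl PlanNode for SortNode {
    fn output_ordering(&self) -> Option<Vec<SortKey>> {
        Some(self.keys.clone())
    }
    fn describe(&self) -> String {
        format!("Sort[{}] <- {}", fmt_keys(&self.keys), self.input.describe())
    }
}

/// Toy optimizer rule: drop the sort if the input is already ordered, push it
/// into the input if the input accepts it, otherwise keep the explicit sort.
pub fn push_down_sort(sort: SortNode) -> Arc<dyn PlanNode> {
    if sort.input.output_ordering().as_deref() == Some(sort.keys.as_slice()) {
        return sort.input;
    }
    let pushed = sort.input.resorted(&sort.keys);
    match pushed {
        Some(node) => node,
        None => Arc::new(sort) as Arc<dyn PlanNode>,
    }
}
```

In this sketch, a `SortNode` over a `RemoteScan` collapses into a single `RemoteScan` carrying the ORDER BY, while a `SortNode` over a `LocalScan` is left unchanged because `LocalScan` does not override `resorted`. A real pass would also need to walk the whole plan tree and deal with partitioning and fetch limits, which this sketch ignores.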



