phillipleblanc commented on issue #10313:
URL: https://github.com/apache/datafusion/issues/10313#issuecomment-2099627792

   After digging into and understanding how the `datafusion-federation` crate 
works, I don't think we need anything additional for sort pushdown. I basically 
came to the same realization that @backkem had in 
https://github.com/apache/datafusion/issues/7871#issuecomment-1833540670.
   
   My realization essentially comes down to (please correct me if this is 
incorrect):
   
   DataFusion is a library that provides both query planning (`LogicalPlan`) 
and query execution (`ExecutionPlan`). When we connect a set of tables 
from a remote query engine to DataFusion, what we really want is the ability 
to get an optimized logical plan and send that plan, in its entirety, to be 
executed by the remote query engine, bypassing DataFusion's query execution 
as much as possible. (In reality we still want the query execution 
DataFusion provides for more complex queries that involve custom UDFs, joins 
between two different remote query engines, etc.)
   
   The `TableProvider` construct is part of the query execution 
(`ExecutionPlan` level) machinery of DataFusion, so trying to teach it to be 
smarter for the query federation case is an anti-pattern in my mind. But we 
still need a `TableProvider` to be registered so we can take advantage of the 
logical planning (via the automatic conversion of a `TableProvider` to a 
`TableSource` during planning). The `datafusion-federation` repo solves this 
by using a thin wrapper around a `TableProvider` called a 
`FederatedTableProviderAdaptor`, whose entire job is to provide a `TableSource` 
during logical planning. A custom `FederationQueryPlanner` then recognizes 
`TableScan`s of a `FederatedTableProviderAdaptor` and delegates execution of 
the largest `LogicalPlan` sub-tree that contains only `TableScan`s from the 
same source to that source (by deparsing that sub-tree back to SQL).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

