backkem commented on issue #7871: URL: https://github.com/apache/arrow-datafusion/issues/7871#issuecomment-1833540670
The core question here is how far `TableProvider` should go in expressing compute capability, as opposed to adopting a more full-blown form of [query federation](https://github.com/apache/arrow-datafusion/issues/970). In other words, where do we draw the line in expanding the `TableProvider` trait? Do we allow pushing down sort, aggregation, UDFs, etc.? It may be a good idea to come up with a clear rule for where to draw this line.

One option would be to lean fully into the query federation idea: provide a good framework for it, ship a basic implementation with filter/limit/sort pushdown out of the box, and foster development of more complex federation cases (such as remote DBMSs) out of tree.

One notable source of complexity in this kind of pushdown is combining joins with limit/sort pushdown. Not every table in the join may have the columns needed to sort or filter on. In that case you'd need something like what the Velox docs call [Dynamic Filter Pushdown](https://facebookincubator.github.io/velox/develop/joins.html) to avoid full table scans.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
