adriangb commented on PR #15057: URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2715592559
The main issue I've found with this approach is marking filters as `Exact` or `Inexact`. In particular, unless you mark them as `Exact`, DataFusion will still need to pull the possibly large unshredded data to re-apply the filters in a `FilterExec`. This doesn't completely kill performance, because if the filter is selective there is less data to re-filter, but the worst case is possibly worse than not having this feature at all.

But I feel like this is a consequence of filter pushdown in general? Ignoring this change, if I have a `TableProvider` that returns a `DataSourceExec` and I have filter pushdown enabled, should I be marking _all_ of my filters as `Exact`? That seems dangerous: it isn't documented anywhere that filter pushdown supports every filter that `FilterExec` does, and there are things like https://github.com/apache/datafusion/blob/9382add72b929c553ca4976d1423d8ebbc80889d/datafusion/datasource-parquet/src/row_filter.rs#L333-L336.
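
To make the `Exact` / `Inexact` trade-off concrete, here is a minimal, dependency-free sketch. It is not DataFusion code. `PushDown` is a local stand-in that only mirrors the variant names of DataFusion's `TableProviderFilterPushDown`, while `Filter`, `Scan`, `classify` and `residual_filters` are hypothetical names invented for illustration. The assumption is that a provider should report `Exact` only for filters it can evaluate with the same semantics as `FilterExec`, and that everything else is left for re-filtering.

```rust
/// Local stand-in mirroring the variants of DataFusion's
/// `TableProviderFilterPushDown`; defined here so the sketch has no dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushDown {
    /// The scan cannot use the filter at all.
    Unsupported,
    /// The scan may drop some non-matching rows; the filter is re-applied afterwards.
    Inexact,
    /// The scan guarantees that only matching rows are returned; no re-filtering.
    Exact,
}

/// Hypothetical, simplified filter (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Filter {
    /// `column = value`
    Eq { column: String, value: i64 },
    /// Any expression the scan does not understand, e.g. a UDF call.
    Opaque(String),
}

/// Hypothetical description of what a scan can do with a filter.
struct Scan {
    /// Columns the scan can filter row by row, matching `FilterExec` semantics.
    exact_columns: Vec<String>,
    /// Columns the scan can only use for coarse pruning (e.g. statistics).
    prunable_columns: Vec<String>,
}

/// Conservative classification: `Exact` only when the scan fully evaluates the filter.
fn classify(scan: &Scan, filter: &Filter) -> PushDown {
    match filter {
        Filter::Eq { column, .. } if scan.exact_columns.contains(column) => PushDown::Exact,
        Filter::Eq { column, .. } if scan.prunable_columns.contains(column) => PushDown::Inexact,
        _ => PushDown::Unsupported,
    }
}

/// Filters that still have to be re-applied above the scan: everything not `Exact`.
/// These are the filters that force the filtered columns to be read back out.
fn residual_filters(scan: &Scan, filters: &[Filter]) -> Vec<Filter> {
    filters
        .iter()
        .filter(|f| classify(scan, f) != PushDown::Exact)
        .cloned()
        .collect()
}
```

In this model, marking every filter as `Exact` would empty the residual list and avoid the re-filtering cost, but it would silently return wrong rows for any filter the scan cannot actually evaluate. That is the risk raised in the question above.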