adriangb commented on PR #15057:
URL: https://github.com/apache/datafusion/pull/15057#issuecomment-2715592559

   The main issue I've found with this approach is deciding whether to mark 
filters as `Exact` or `Inexact`.
   In particular, unless you mark them as `Exact`, DataFusion will still need 
to pull the possibly large unshredded data to re-apply the filters in a 
`FilterExec`. This doesn't completely kill performance, because a selective 
filter leaves less data to re-filter, but the worst case is possibly worse 
than not having this feature at all. But I feel like this is a consequence of 
filter pushdown in general? Ignoring this change, if I have a `TableProvider` 
that returns a `DataSourceExec` and I have filter pushdown enabled, should I be 
marking _all_ of my filters as `Exact`? That seems dangerous, given that it's 
not documented anywhere that filter pushdown supports every filter that 
`FilterExec` does, and given things like 
https://github.com/apache/datafusion/blob/9382add72b929c553ca4976d1423d8ebbc80889d/datafusion/datasource-parquet/src/row_filter.rs#L333-L336.
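
   To make the trade-off concrete, here is a rough, self-contained sketch of 
the per-filter decision. It does not use DataFusion's types: `PushDown` is a 
local stand-in for the three variants of `TableProviderFilterPushDown`, and 
`Filter`, `classify`, `needs_filter_exec`, and the "shredded columns" rule are 
hypothetical names and assumptions made up for illustration only.

```rust
// Local stand-in mirroring the variants of DataFusion's
// `TableProviderFilterPushDown`; this is NOT the real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

// Hypothetical, heavily simplified filter shapes. Real filters are
// expression trees, not a two-variant enum.
#[derive(Debug)]
enum Filter {
    // A comparison between one column and a literal, e.g. `user_id = 5`.
    ColumnCompare { column: String },
    // Anything the source does not know how to evaluate.
    Other,
}

// Assumed policy: claim `Exact` only when the source can fully evaluate the
// filter itself (here: the column is one of the shredded columns).
// Everything else is reported conservatively.
fn classify(filter: &Filter, shredded_columns: &[&str]) -> PushDown {
    match filter {
        Filter::ColumnCompare { column }
            if shredded_columns.contains(&column.as_str()) =>
        {
            PushDown::Exact
        }
        Filter::ColumnCompare { .. } => PushDown::Inexact,
        Filter::Other => PushDown::Unsupported,
    }
}

// Any filter that is not `Exact` has to be re-applied above the scan, which
// is what forces the unshredded data to be read.
fn needs_filter_exec(decisions: &[PushDown]) -> bool {
    decisions.iter().any(|d| *d != PushDown::Exact)
}
```

   Under these assumptions a single `Inexact` answer is enough to bring the 
re-filtering step back, which is why marking everything `Exact` is tempting 
and why it is only safe if the source really does evaluate every such filter 
the same way `FilterExec` would.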
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

