Re: [PR] feat: adaptive filter selectivity tracking for Parquet row filters [datafusion]

via GitHub Wed, 07 Jan 2026 00:44:51 -0800


Dandandan commented on PR #19639:
URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3717859751


   I wonder if a good strategy could be:
   
   * Keep `FilterExec` and `RepartitionExec` and track selectivity there 
instead of always pushing the filters down
   * Only dynamically push down effective filters in Parquet (like dynamic hash 
join)
   * Only push down filters that are cheap to evaluate / that will save IO 
(i.e. the scan reads multiple columns besides the predicate columns)
   * Optimize filter evaluation at the Parquet level (such as integrating 
`BatchCoalescer` to avoid small batches / copies)
   * (Somehow) make filter evaluation in Parquet parallelizable
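
To make the first and third bullets concrete, here is a minimal sketch of what the bookkeeping might look like. It is an illustration only, not code from the PR or from DataFusion: `SelectivityStats`, `PushdownPolicy`, `should_push_down`, and the thresholds are all hypothetical names and values. It assumes selectivity means rows passed divided by rows seen, and that a filter is only worth pushing down when the scan reads columns beyond the predicate's own.

```rust
/// Hypothetical per-filter counters, e.g. updated by the operator that
/// evaluates the predicate. Not a DataFusion type.
#[derive(Debug, Default, Clone, Copy)]
pub struct SelectivityStats {
    rows_in: u64,
    rows_out: u64,
}

impl SelectivityStats {
    /// Record one evaluated batch: rows seen and rows that passed the filter.
    pub fn record(&mut self, rows_in: u64, rows_out: u64) {
        self.rows_in += rows_in;
        self.rows_out += rows_out;
    }

    /// Fraction of rows that passed, or `None` before any row was seen.
    pub fn selectivity(&self) -> Option<f64> {
        if self.rows_in == 0 {
            None
        } else {
            Some(self.rows_out as f64 / self.rows_in as f64)
        }
    }
}

/// Hypothetical policy knobs; the values a real system would use are unknown.
pub struct PushdownPolicy {
    /// Do not decide before this many rows have been observed.
    pub min_rows_observed: u64,
    /// Push down only if at most this fraction of rows passes.
    pub max_selectivity: f64,
}

/// Decide whether pushing the filter into the scan looks worthwhile.
/// `non_predicate_columns` is the number of projected columns that the
/// predicate itself does not read; with zero such columns, skipping rows
/// early cannot save any column decoding.
pub fn should_push_down(
    stats: &SelectivityStats,
    non_predicate_columns: usize,
    policy: &PushdownPolicy,
) -> bool {
    if non_predicate_columns == 0 {
        return false;
    }
    if stats.rows_in < policy.min_rows_observed {
        return false;
    }
    match stats.selectivity() {
        Some(s) => s <= policy.max_selectivity,
        None => false,
    }
}
```

With a policy of `min_rows_observed: 1000` and `max_selectivity: 0.2`, a filter that passed 400 of 8192 rows would be pushed down when other columns are projected, while one that passed 8000 of 8192 rows would stay in `FilterExec`.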


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
