thinkharderdev commented on PR #3380:
URL: https://github.com/apache/arrow-datafusion/pull/3380#issuecomment-1239247569
> > A separate conceptual question is around optimizing the number of distinct filters. In this design we simply assume that we want to break the filter into as many distinct predicates as we can, but I'm not sure that is always the case, given that this forces serial evaluation of the filters. I can imagine many cases where it would be better to group predicates together for evaluation. I didn't want to make the initial implementation too complicated, so I punted on that for now, but eventually we may want to do cost estimation at a higher level to determine the optimal grouping.
>
> @thinkharderdev Agree! I remember that each distinct filter is applied to the projected column with `selection`.
>
> One thing I want to mention: when applying filter pushdown to parquet, some `filters exprs` are `partial_filters`, so they also exist in the `filter operator`. I think that, until now, all filters based on min_max are `partial_filters` (is there any situation where pushdown to parquet uses `full_filters`? 🤔).
>
> After using this row_filter, I think they could be `full_filters` (we need some code changes in the pushdown rule implementation), and then we could eliminate the `filters exprs` in the `filter operator`. 🤔 @alamb I think you are familiar with this (rewriting the pushdown expr).

Yes! I think this is the next phase. Once we can push down exact filters to the scan, we can represent that in the `ListingTable`. The pushdown doesn't actually rewrite the filters. The existing filter `Expr`s just get pushed down, and it's actually `PruningPredicate` which rewrites them as min/max filters on the statistics. But they all (currently) get pushed down as inexact, which means they get executed twice (once in the scan and once again in the filter operator). If the optimizer can push down ALL the filters as exact, then we can eliminate the `Filter` operator entirely (which also unlocks the possibility of pushing the limit down to the scan as well, if there is one).
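To make the exact vs. inexact distinction concrete, here is a minimal, dependency-free sketch. The `PushDown` enum is a local stand-in that only mirrors the variant names of DataFusion's `TableProviderFilterPushDown`. The helpers `residual_filters` and `can_remove_filter` are hypothetical names used for illustration, not part of DataFusion or of this PR.

```rust
/// Local stand-in mirroring the variants of DataFusion's
/// `TableProviderFilterPushDown`; defined here so the sketch has no dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushDown {
    /// The scan cannot use the predicate at all.
    Unsupported,
    /// The scan may use the predicate (e.g. min/max pruning) but can still
    /// return non-matching rows, so the predicate must be evaluated again.
    Inexact,
    /// The scan guarantees that only matching rows are returned.
    Exact,
}

/// Hypothetical helper: the predicates that a `Filter` operator above the
/// scan still has to evaluate, i.e. everything not pushed down as `Exact`.
fn residual_filters<'a>(filters: &[(&'a str, PushDown)]) -> Vec<&'a str> {
    filters
        .iter()
        .filter(|(_, support)| *support != PushDown::Exact)
        .map(|(expr, _)| *expr)
        .collect()
}

/// Hypothetical helper: the `Filter` operator can be dropped only when no
/// predicate is left over for it.
fn can_remove_filter(filters: &[(&str, PushDown)]) -> bool {
    residual_filters(filters).is_empty()
}

fn main() {
    // Assumed current behaviour: everything is pushed down as inexact.
    let inexact = [("a > 5", PushDown::Inexact), ("b = 'x'", PushDown::Inexact)];
    // Assumed behaviour once row-level filtering makes the pushdown exact.
    let exact = [("a > 5", PushDown::Exact), ("b = 'x'", PushDown::Exact)];
    // A mixed case: one predicate the scan cannot handle.
    let mixed = [("a > 5", PushDown::Exact), ("udf(c)", PushDown::Unsupported)];

    for (name, filters) in [("inexact", &inexact), ("exact", &exact), ("mixed", &mixed)] {
        println!(
            "{name}: residual = {:?}, filter removable = {}",
            residual_filters(filters),
            can_remove_filter(filters)
        );
    }
}
```

In this sketch, only the all-`Exact` case leaves nothing for the `Filter` operator, which is the situation where a limit sitting above it could in principle be pushed down to the scan as well.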