adriangb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2977125894
@alamb should we be running these checks _for every batch_? Obviously that makes your concerns about overhead / performance _much worse_, but I think it will have an even greater impact for dynamic filters. Currently, once the file is opened, if midway through the stream the TopK state becomes such that we could exclude the whole file, we still stream every row from the file and exclude it via predicate pushdown, despite the fact that we now know from the stats that we could exit immediately.

I propose the following:

1. Make a helper struct that encapsulates the state needed to prune the file based on the combination of filters + file statistics.
2. Add a method such as `PhysicalExpr::is_dynamic` that exposes the information needed to know whether we should be doing these checks or not.
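To make the two proposed steps concrete, here is a rough, dependency-free sketch of what I have in mind. None of these types are DataFusion APIs: `FileStats`, `DynamicFilter`, `FilePruner`, and the generation counter are all hypothetical stand-ins, and the filter is assumed to be a simple `col < threshold` as a TopK might produce. The generation counter is one possible way to keep the per-batch cost down to an atomic load when the filter has not changed.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the file-level statistics of one column.
#[derive(Debug, Clone, Copy)]
pub struct FileStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical stand-in for a dynamic filter of the form `col < threshold`,
/// where the threshold tightens as the TopK heap fills up.
pub struct DynamicFilter {
    threshold: AtomicI64,
    // Assumption: bumped on every update so readers can detect changes cheaply.
    generation: AtomicU64,
}

impl DynamicFilter {
    pub fn new(threshold: i64) -> Self {
        Self {
            threshold: AtomicI64::new(threshold),
            generation: AtomicU64::new(0),
        }
    }

    /// Store the new threshold first, then advertise the change.
    pub fn update(&self, threshold: i64) {
        self.threshold.store(threshold, Ordering::SeqCst);
        self.generation.fetch_add(1, Ordering::SeqCst);
    }

    pub fn threshold(&self) -> i64 {
        self.threshold.load(Ordering::SeqCst)
    }

    pub fn generation(&self) -> u64 {
        self.generation.load(Ordering::SeqCst)
    }

    /// Step 2: tells the caller whether per-batch re-checks are worthwhile.
    pub fn is_dynamic(&self) -> bool {
        true
    }
}

/// Step 1: helper that bundles the filter and the file statistics.
pub struct FilePruner {
    filter: Arc<DynamicFilter>,
    stats: FileStats,
    last_generation: Option<u64>,
    pruned: bool,
}

impl FilePruner {
    pub fn new(filter: Arc<DynamicFilter>, stats: FileStats) -> Self {
        Self { filter, stats, last_generation: None, pruned: false }
    }

    /// Intended to be called once per batch. Returns true when the statistics
    /// prove that no remaining row in the file can pass the filter.
    pub fn should_prune(&mut self) -> bool {
        let current = self.filter.generation();
        if self.last_generation != Some(current) {
            self.last_generation = Some(current);
            // `col < threshold` cannot match any row if even the minimum fails.
            self.pruned = self.stats.min >= self.filter.threshold();
        }
        self.pruned
    }
}
```

With something like this, the stream would call `should_prune()` before decoding each batch and stop reading the file as soon as it returns `true`. Static filters (where `is_dynamic()` would be `false`) would only need the check once, when the file is opened.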
