adriangb commented on PR #16014:
URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2977125894

   @alamb, should we be running these checks _for every batch_? That obviously
makes your concerns about overhead and performance _much worse_, but I think it
would have an even greater impact for dynamic filters. Currently, once a file
is opened we stream every row from it, even if midway through the stream the
TopK state changes so that the whole file could be excluded. Those rows are then
filtered out one by one via predicate pushdown, even though the file statistics
already tell us we could stop reading immediately.
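A minimal, self-contained sketch of the per-batch check. It assumes a TopK for `ORDER BY x DESC LIMIT k` whose dynamic filter is `x > threshold`, and a file whose only relevant statistic is the max of `x`. `FileStats`, `DynamicThreshold` and `scan_file` are hypothetical names, not DataFusion APIs. The threshold is shared through an `AtomicI64` to stand in for a TopK (possibly fed by other partitions) raising it while this file is still being scanned.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical per-file statistics: the max of the sort column `x`.
struct FileStats {
    max: i64,
}

/// Hypothetical dynamic filter `x > threshold` for `ORDER BY x DESC LIMIT k`
/// (assumed semantics): only rows above the threshold can still enter the top k.
struct DynamicThreshold(Arc<AtomicI64>);

impl DynamicThreshold {
    fn current(&self) -> i64 {
        self.0.load(Ordering::Acquire)
    }

    /// True if no row in a file with these stats can satisfy `x > threshold`.
    fn excludes(&self, stats: &FileStats) -> bool {
        stats.max <= self.current()
    }
}

/// Streams the batches of one file, re-checking the file-level stats against
/// the current threshold before every batch. Returns how many batches were read.
fn scan_file(
    batches: &[Vec<i64>],
    stats: &FileStats,
    filter: &DynamicThreshold,
    mut on_batch: impl FnMut(usize, &[i64]),
) -> usize {
    let mut read = 0;
    for (i, batch) in batches.iter().enumerate() {
        // Early exit: the stats prove the rest of the file cannot match.
        if filter.excludes(stats) {
            break;
        }
        read += 1;
        on_batch(i, batch.as_slice());
    }
    read
}

fn main() {
    let threshold = Arc::new(AtomicI64::new(i64::MIN));
    let filter = DynamicThreshold(Arc::clone(&threshold));
    let stats = FileStats { max: 150 };
    let batches = vec![vec![10, 150], vec![20, 30], vec![40, 50], vec![60, 70]];

    // Simulate the TopK raising its threshold above the file max after batch 1.
    let read = scan_file(&batches, &stats, &filter, |i, _batch| {
        if i == 1 {
            threshold.store(200, Ordering::Release);
        }
    });
    println!("read {read} of {} batches", batches.len());
}
```

Without the per-batch check, all four batches would be read and then discarded row by row; with it, the scan stops as soon as the threshold passes the file's max.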
   
   I propose the following:
   1. Make a helper struct that encapsulates the state needed to prune a file
based on the combination of filters and file statistics.
   2. Add a `PhysicalExpr::is_dynamic` method that exposes enough information
to decide whether we should be doing these checks at all.
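A rough sketch of the two proposals, with every name hypothetical: `Predicate` is a pared-down stand-in for `PhysicalExpr` (not its real interface), `FilePruner` plays the role of the helper struct from point 1, and `is_dynamic` defaults to `false` as a guess at point 2. Static predicates are evaluated once when the file is opened; only dynamic ones pay the per-batch cost.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical min/max statistics for one column of one file.
#[derive(Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// Stand-in for `PhysicalExpr`; only the pieces needed for pruning.
trait Predicate {
    /// True if no value in [stats.min, stats.max] can satisfy the predicate.
    fn prunes(&self, stats: ColumnStats) -> bool;
    /// Proposed hook: true if the predicate may change while a scan is running.
    fn is_dynamic(&self) -> bool {
        false
    }
}

/// Static predicate `x < limit`.
struct LessThan(i64);

impl Predicate for LessThan {
    fn prunes(&self, s: ColumnStats) -> bool {
        s.min >= self.0
    }
}

/// Dynamic predicate `x > threshold`, where something like a TopK raises the threshold.
struct GreaterThanDynamic(Arc<AtomicI64>);

impl Predicate for GreaterThanDynamic {
    fn prunes(&self, s: ColumnStats) -> bool {
        s.max <= self.0.load(Ordering::Acquire)
    }
    fn is_dynamic(&self) -> bool {
        true
    }
}

/// Hypothetical helper bundling everything needed to prune one file.
struct FilePruner {
    predicate: Arc<dyn Predicate + Send + Sync>,
    stats: ColumnStats,
}

impl FilePruner {
    fn new(predicate: Arc<dyn Predicate + Send + Sync>, stats: ColumnStats) -> Self {
        Self { predicate, stats }
    }

    /// Checked once, when the file is opened.
    fn prune_on_open(&self) -> bool {
        self.predicate.prunes(self.stats)
    }

    /// Checked before each batch. A static predicate cannot change its answer
    /// after open, so the stats are only re-evaluated for dynamic predicates.
    fn prune_before_batch(&self) -> bool {
        self.predicate.is_dynamic() && self.predicate.prunes(self.stats)
    }
}

fn main() {
    let stats = ColumnStats { min: 0, max: 150 };
    let threshold = Arc::new(AtomicI64::new(i64::MIN));
    let pruner = FilePruner::new(Arc::new(GreaterThanDynamic(Arc::clone(&threshold))), stats);

    println!("prune on open: {}", pruner.prune_on_open());
    threshold.store(200, Ordering::Release); // TopK state tightens mid-scan
    println!("prune before next batch: {}", pruner.prune_before_batch());
}
```

Gating on `is_dynamic` keeps the per-batch overhead for ordinary static filters down to a single boolean check, which addresses part of the performance concern.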
