[GitHub] [arrow-datafusion] b41sh commented on pull request #719: Optimize min/max queries with table statistics

GitBox Thu, 15 Jul 2021 08:13:33 -0700


b41sh commented on pull request #719:
URL: https://github.com/apache/arrow-datafusion/pull/719#issuecomment-880777702



   > This looks really cool @b41sh -- thank you very much for the contribution. It is not all that often one gets a 600x speedup :)
   > 
   > The one thing I worry about / wonder about is "how do we ensure no one breaks this by accident as we refactor or change the code in the future?"
   > 
   > Perhaps we could follow the model of https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests/parquet_pruning.rs#L44 (or maybe just extend that test) by:
   > 
   > 1. Adding some statistics to the parquet scan about total row groups read or rows read
   > 2. Running a query with min/max and validating that no row groups are actually read.
   > 
   > What do you think?
   
   Hi @alamb,
   Thanks for your review.
   I will add some tests for this case; I'm still working on them and will submit them later.
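The test strategy quoted above can be illustrated with a minimal, self-contained sketch. It assumes hypothetical types (`RowGroupStats`, `ParquetScan`, and a `row_groups_read` counter) standing in for a parquet scan and its metrics; these are not DataFusion's actual API. The point is the property to assert: a min/max answered from per-row-group statistics leaves the "row groups read" counter at zero, while a full scan touches every row group.

```rust
use std::cell::Cell;

/// Hypothetical per-row-group statistics (an illustration, not a DataFusion type).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Hypothetical in-memory stand-in for a parquet scan over one i64 column.
struct ParquetScan {
    data: Vec<Vec<i64>>,          // one Vec per row group
    stats: Vec<RowGroupStats>,    // metadata recorded alongside the data
    row_groups_read: Cell<usize>, // hypothetical metric: row groups actually decoded
}

impl ParquetScan {
    fn new(data: Vec<Vec<i64>>) -> Self {
        // Record min/max for every non-empty row group, as a writer would.
        let stats: Vec<RowGroupStats> = data
            .iter()
            .filter_map(|g| -> Option<RowGroupStats> {
                Some(RowGroupStats {
                    min: *g.iter().min()?,
                    max: *g.iter().max()?,
                })
            })
            .collect();
        ParquetScan { data, stats, row_groups_read: Cell::new(0) }
    }

    /// Slow path: decode every row group, bumping the counter for each one.
    fn scan_min_max(&self) -> Option<(i64, i64)> {
        let mut acc: Option<(i64, i64)> = None;
        for group in &self.data {
            self.row_groups_read.set(self.row_groups_read.get() + 1);
            for &v in group {
                acc = Some(match acc {
                    None => (v, v),
                    Some((lo, hi)) => (lo.min(v), hi.max(v)),
                });
            }
        }
        acc
    }

    /// Fast path: answer MIN/MAX from statistics only; no row group is decoded.
    fn stats_min_max(&self) -> Option<(i64, i64)> {
        self.stats.iter().fold(None::<(i64, i64)>, |acc, s| {
            Some(match acc {
                None => (s.min, s.max),
                Some((lo, hi)) => (lo.min(s.min), hi.max(s.max)),
            })
        })
    }
}

fn main() {
    let scan = ParquetScan::new(vec![vec![3, 9, 1], vec![7, -2, 5]]);

    // The property the proposed test checks: the statistics path reads nothing.
    let from_stats = scan.stats_min_max();
    assert_eq!(scan.row_groups_read.get(), 0);

    // Cross-check against a full scan, which must touch every row group.
    let from_data = scan.scan_min_max();
    assert_eq!(from_stats, from_data);
    assert_eq!(scan.row_groups_read.get(), 2);

    println!("min/max = {:?}", from_stats);
}
```

In a real test, the counter would come from the scan statistics proposed in point 1 above rather than from a hand-rolled `Cell`; the names here are illustrative only.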
   

