b41sh commented on pull request #719: URL: https://github.com/apache/arrow-datafusion/pull/719#issuecomment-880777702
> This looks really cool @b41sh -- thank you very much for the contribution. It is not all that often one gets a 600x speedup :)
>
> The one thing I worry about / wonder about is "how do we ensure no one breaks this by accident as we refactor or change the code in the future"
>
> Perhaps we could follow the model of https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests/parquet_pruning.rs#L44 (or maybe just extend that test) by:
>
> 1. Adding some statistics to the parquet scan about total row groups read or rows read
> 2. Run a query with min/max and validate that no actual row groups are read.
>
> What do you think?

Hi @alamb, thanks for your review. I will add some tests for this case. I'm still working on it and will submit them later.
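
For illustration only, the kind of check suggested in the quoted review (count the row groups actually read, then assert the count stays at zero for a min/max query) could be sketched roughly as below. This is a self-contained toy model, not DataFusion code: `ToyScan`, `RowGroup`, `RowGroupStats`, `row_groups_read`, `min_max_by_scan`, and `min_max_from_stats` are all hypothetical names invented for this sketch, and the real test would presumably go through the parquet scan's own statistics instead.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the per-row-group min/max kept in file metadata.
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Hypothetical row group: metadata statistics plus the actual column values.
struct RowGroup {
    stats: RowGroupStats,
    values: Vec<i64>,
}

/// Toy scan that counts how many row groups had their values read.
struct ToyScan {
    row_groups: Vec<RowGroup>,
    row_groups_read: Cell<usize>,
}

impl ToyScan {
    /// Build row groups from chunks of values, computing stats up front
    /// (empty chunks are skipped so every row group has a min and a max).
    fn new(chunks: Vec<Vec<i64>>) -> Self {
        let row_groups = chunks
            .into_iter()
            .filter(|values| !values.is_empty())
            .map(|values| {
                let min = *values.iter().min().unwrap();
                let max = *values.iter().max().unwrap();
                RowGroup { stats: RowGroupStats { min, max }, values }
            })
            .collect();
        ToyScan { row_groups, row_groups_read: Cell::new(0) }
    }

    /// Slow path: reads every row group and bumps the counter for each one.
    fn min_max_by_scan(&self) -> Option<(i64, i64)> {
        let mut result: Option<(i64, i64)> = None;
        for rg in &self.row_groups {
            self.row_groups_read.set(self.row_groups_read.get() + 1);
            for &v in &rg.values {
                result = Some(match result {
                    None => (v, v),
                    Some((lo, hi)) => (lo.min(v), hi.max(v)),
                });
            }
        }
        result
    }

    /// Fast path: answers from metadata only, so the counter is never touched.
    fn min_max_from_stats(&self) -> Option<(i64, i64)> {
        self.row_groups
            .iter()
            .map(|rg| (rg.stats.min, rg.stats.max))
            .reduce(|(lo, hi), (mn, mx)| (lo.min(mn), hi.max(mx)))
    }
}

#[test]
fn min_max_from_stats_reads_no_row_groups() {
    let scan = ToyScan::new(vec![vec![3, 9, 4], vec![-2, 7], vec![5]]);
    assert_eq!(scan.min_max_from_stats(), Some((-2, 9)));
    // The regression guard: the metadata-only path must not read any row group.
    assert_eq!(scan.row_groups_read.get(), 0);
}
```

The point of the counter is that a later refactor which silently falls back to the scanning path would still return the right min/max but would fail the zero-reads assertion.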
