alamb opened a new pull request #500: URL: https://github.com/apache/arrow-datafusion/pull/500
Closes https://github.com/apache/arrow-datafusion/issues/490

This PR adds support for pruning with boolean predicates such as `flag_col` and `not flag_col`, so that they can be used, alongside other predicates, to prune row groups from parquet files.

It does *not* add code to handle `flag_col = true` and `flag_col != false` (which currently error, and continue to do so), as those are simplified in the ConstantEvaluation pass.

This ended up being a larger change than I wanted because the logic to create `col_min` and `col_max` references was intertwined in `PruningExpressionBuilder`.

# Rationale for this change

See https://github.com/apache/arrow-datafusion/issues/490

# What changes are included in this PR?

Major changes:
1. Encapsulate `stat_column_req` into a new `RequiredStatColumns` struct
2. Move expression reference and rewriting logic to `StatisticsColumns`
3. Add rules for boolean columns

# Are there any user-facing changes?

Additional predicates can be used to prune
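
To illustrate the idea behind the boolean rules, here is a minimal, standalone sketch of how min/max statistics can decide whether a row group needs to be read for `flag_col` and `not flag_col`. It does not use DataFusion's types: `BoolStats`, `BoolPredicate`, and `must_scan` are hypothetical names invented for this example, and the sketch assumes the usual ordering `false < true` and that a missing statistic means the row group must be kept.

```rust
/// Hypothetical per-row-group statistics for one boolean column.
/// `None` means the statistic is not available.
#[derive(Debug, Clone, Copy)]
pub struct BoolStats {
    pub min: Option<bool>,
    pub max: Option<bool>,
}

/// The two bare boolean predicate shapes discussed in the PR description.
#[derive(Debug, Clone, Copy)]
pub enum BoolPredicate {
    /// `flag_col`
    Column,
    /// `not flag_col`
    NotColumn,
}

/// Returns `true` if the row group might contain matching rows and so
/// must be scanned; `false` means it can safely be pruned.
pub fn must_scan(pred: BoolPredicate, stats: BoolStats) -> bool {
    match pred {
        // `flag_col` can only be true for some row if the maximum is true
        // (false < true). Unknown max: be conservative and scan.
        BoolPredicate::Column => stats.max.unwrap_or(true),
        // `not flag_col` can only be true for some row if the minimum is
        // false. Unknown min: treat it as false, which forces a scan.
        BoolPredicate::NotColumn => !stats.min.unwrap_or(false),
    }
}
```

For example, a row group whose column is all `false` (`min = max = false`) can be skipped for `flag_col` but must be read for `not flag_col`. In rewrite terms, this corresponds to evaluating `flag_col` against the column's max statistic and `not flag_col` against the negated min statistic.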
