SubhamSinghal opened a new pull request, #21647:
URL: https://github.com/apache/datafusion/pull/21647
## Which issue does this PR close?
- Closes #.
## Rationale for this change
`PruningPredicate` currently cannot prune Parquet row groups for predicates
with arithmetic expressions like `col + 5 > 10` or `date_col + INTERVAL '30
days' > '2024-01-01'`. The `rewrite_expr_to_prunable` function only handles
plain columns, CAST, TRY_CAST, negation, and NOT — arithmetic `BinaryExpr`
falls through to "can't prune", meaning every row group is scanned.
This matters most for date/timestamp arithmetic in WHERE clauses, for example
`WHERE order_date + INTERVAL '30 days' > CURRENT_DATE`, which is very
common in analytics queries on Parquet tables.
## What changes are included in this PR?
Added support for arithmetic expressions (`+`, `-`) in
`rewrite_expr_to_prunable`. The approach is "evaluate on min/max" — the
arithmetic expression is passed through as the `column_expr`, and the existing
`rewrite_column_expr` machinery substitutes `col` → `col_max`/`col_min` inside
the arithmetic, producing predicates like `(col_max + 5) > 10`.
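To illustrate the "evaluate on min/max" idea outside of DataFusion, here is a minimal standalone sketch for the single case `col + offset > literal` on an `i64` column. It is not the code in this PR: `RowGroupStats`, `may_match_col_plus_gt`, and `surviving_row_groups` are hypothetical names, and keeping the row group on overflow is an assumption of the sketch, not something this description states. It also relies on `col + offset` being monotonically increasing in `col`; other shapes, such as the column appearing on the right of a `-`, are not covered here.

```rust
/// Min/max statistics for one row group of a hypothetical i64 column.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Returns `true` if the row group might contain a row satisfying
/// `col + offset > literal`, and `false` if it can safely be skipped.
fn may_match_col_plus_gt(stats: RowGroupStats, offset: i64, literal: i64) -> bool {
    debug_assert!(stats.min <= stats.max);
    // `col + offset` grows with `col`, so its largest value in this row
    // group is `max + offset`. If even that is not > literal, no row matches.
    match stats.max.checked_add(offset) {
        Some(upper_bound) => upper_bound > literal,
        // Assumption of this sketch: on overflow, stay conservative and scan.
        None => true,
    }
}

/// Indices of the row groups that still have to be scanned.
fn surviving_row_groups(groups: &[RowGroupStats], offset: i64, literal: i64) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match_col_plus_gt(**stats, offset, literal))
        .map(|(index, _)| index)
        .collect()
}
```

For `col + 5 > 10`, a row group with `max = 3` is skipped because `3 + 5 = 8` is not greater than 10, while a row group with `max = 9` is kept because `9 + 5 = 14` is.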
## Are these changes tested?
Yes, with unit tests.
## Are there any user-facing changes?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]