bharos opened a new pull request, #15902:
URL: https://github.com/apache/iceberg/pull/15902
### What
Implements bounds-based evaluation for `startsWith` in
`StrictMetricsEvaluator`, replacing the unconditional
`ROWS_MIGHT_NOT_MATCH` return with actual logic.
Previously, `startsWith` always returned `ROWS_MIGHT_NOT_MATCH`,
which prevented the engine from eliminating the residual predicate even
when file-level column bounds made it provable that every value started
with the given prefix.
### Changes
- **`StrictMetricsEvaluator.startsWith`**: Added checks for nested
columns, null-containing columns, and lower/upper bound comparisons
against the prefix. Returns `ROWS_MUST_MATCH` when both bounds start
with the prefix.
- **`TestStrictMetricsEvaluator`**: Added 9 test methods covering:
both bounds match prefix, single-char prefix match, only lower bound
matches, bounds outside prefix range, wider range, missing stats,
all-nulls, some-nulls, and prefix longer than bounds.
### How it works
For `STARTS WITH <prefix>`:
- If the column can contain nulls → `ROWS_MIGHT_NOT_MATCH` (conservative)
- If the lower bound is shorter than the prefix → `ROWS_MIGHT_NOT_MATCH`
- If the lower bound (truncated to prefix length) equals the prefix **and**
the upper bound (truncated to prefix length) equals the prefix →
`ROWS_MUST_MATCH` (every value lies between the two bounds in sort order, so
every value in the file starts with the prefix)
- Otherwise → `ROWS_MIGHT_NOT_MATCH` (conservative)
This follows the same pattern used by `eq` and the recently added
`notStartsWith` bounds check in this class.
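
For illustration only, the decision steps listed above can be sketched as a standalone method. This is not the code from the PR: the class name `StrictStartsWithSketch`, the method signature, and the use of plain `String` bounds and a boxed `Long` null count are assumptions made to keep the example self-contained. The real evaluator works with Iceberg's own bound and metrics types.

```java
// Hypothetical, standalone sketch of the strict startsWith decision.
// It does not use any Iceberg classes; all names here are illustrative.
final class StrictStartsWithSketch {
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  private StrictStartsWithSketch() {}

  // lower/upper: file-level column bounds, or null when stats are missing.
  // nullCount: number of nulls in the column, or null when unknown.
  static boolean startsWith(String lower, String upper, Long nullCount, String prefix) {
    // Missing stats: nothing can be proven, so stay conservative.
    if (lower == null || upper == null || nullCount == null) {
      return ROWS_MIGHT_NOT_MATCH;
    }

    // A null value never satisfies STARTS WITH, so any null breaks the guarantee.
    if (nullCount > 0) {
      return ROWS_MIGHT_NOT_MATCH;
    }

    // A bound shorter than the prefix cannot start with it.
    if (lower.length() < prefix.length() || upper.length() < prefix.length()) {
      return ROWS_MIGHT_NOT_MATCH;
    }

    // Truncating each bound to the prefix length and comparing for equality
    // is the same as asking whether the bound starts with the prefix.
    boolean lowerMatches = lower.substring(0, prefix.length()).equals(prefix);
    boolean upperMatches = upper.substring(0, prefix.length()).equals(prefix);

    // Every value sorts between the bounds, so if both bounds share the
    // prefix, every value in between must share it as well.
    return (lowerMatches && upperMatches) ? ROWS_MUST_MATCH : ROWS_MIGHT_NOT_MATCH;
  }
}
```

Under these assumptions, bounds `"abc"` and `"abd"` with zero nulls prove `STARTS WITH 'ab'` for the whole file. Bounds `"abc"` and `"b"` do not, because a value such as `"ax"` could fall between them.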
Closes #15901
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]