jnh5y opened a new pull request, #28556:
URL: https://github.com/apache/flink/pull/28556
## What is the purpose of the change
`PushFilterInCalcIntoTableSourceScanRule` — the rule that fires when a
`Filter` sits above a `Calc` above a table source scan — was unaware of
metadata filter push-down, causing two bugs:
1. **Crash on compiled-plan restore.** For sources implementing both
`SupportsFilterPushDown` and `SupportsReadingMetadata`, the rule routed all
predicates (including metadata-column predicates) through the physical path,
storing a `FilterPushDownSpec` with raw `RexInputRef` indices. When
`ProjectPushDownSpec` later narrowed the scan's row type, those indices no
longer corresponded to the right columns, crashing restore.
2. **Missed push-down for metadata-only sources.** The rule's `matches()`
guard only checked `canPushdownFilter()` (physical), so for sources
implementing only `SupportsReadingMetadata` the rule never fired at all.
This PR also fixes a latent classification bug in both rules: the old
heuristic detected metadata columns via `inputRef.getIndex() >=
physicalColumnCount`, which is incorrect after projection narrows the scan
type. Mixed predicates like `OR(id > 0, m0 > 0)` (spanning physical and
metadata columns) must stay as runtime Calc filters but were instead routed
into the physical push-down path.
## Brief change log
- `PushFilterInCalcIntoTableSourceScanRule.matches()` now also checks
`canPushdownMetadataFilter()`, enabling the rule to fire for metadata-only
sources
- Extract duplicated classify-and-push logic into
`classifyAndPushFilters()` in `PushFilterIntoSourceScanRuleBase`, shared by
both filter-push-down rules
- Replace the `physicalColumnCount` heuristic with a `Set<Integer>` of
metadata column indices derived from the scan's current row type via the
column-to-metadata-key map, correctly handling projection-narrowed scans
- Add `referencesAnyMetadataColumns()` to detect mixed predicates and
route them to the runtime Calc instead of the physical push-down path
- Document the identity contract on
`SupportsReadingMetadata.applyMetadataFilters` Javadoc
## Verifying this change
This change added tests and can be verified as follows:
- Added `testMixedOrPredicateStaysAsRuntimeFilter` to
`MetadataFilterResultShapesITCase`: asserts that a predicate touching both
physical and metadata columns (`id < 3 OR m0 > 0`) never appears in
`filter=[...]` or `metadataFilter=[...]` and produces correct runtime results
- Added `testMixedOrWithSeparatePhysicalPredicate` to
`MetadataFilterResultShapesITCase`: asserts that a pure physical predicate is
still pushed while a mixed OR predicate stays as a runtime Calc
- Existing `MetadataFilterResultShapesITCase` tests (accept-all,
accept-none, partial accept, best-effort overlap) continue to pass, covering
the `PushFilterInCalcIntoTableSourceScanRule` path via the shared
`classifyAndPushFilters()` method
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: yes (`SupportsReadingMetadata` — Javadoc only, no
interface change)
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes (Claude Sonnet 4.6)
<!--
Generated-by: Claude Sonnet 4.6
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]