Daniel Rossos created FLINK-39584:
-------------------------------------
Summary: Expand filter-merging for source-reuse
Key: FLINK-39584
URL: https://issues.apache.org/jira/browse/FLINK-39584
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Reporter: Daniel Rossos
h1. Summary
ScanReuser cannot unify scans of the same table that differ in their
pushed-down filters. Because FilterPushDownSpec is part of the scan digest
used by ReusableScanVisitor, sources with different FilterPushDownSpec values
are treated as distinct. This causes duplicate source reads even when a
single read could serve every consumer.
For context, ScanReuser already unifies scans with different projections by
creating a single scan with the superset of columns and adding per-consumer
Calc nodes. The same pattern does not extend to filter differences today.
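To make the proposed extension concrete, here is a minimal, self-contained sketch of what filter merging could look like. It is a toy model, not Flink planner code: the `Scan` and `MergedScan` records, the `merge` method, and the digest string format are all hypothetical names invented for illustration. It assumes the shared scan pushes down the disjunction of the individual filters (or no filter at all if any consumer is unfiltered), and that each consumer re-applies its original filter in its own Calc.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FilterMergeSketch {

    /** Toy stand-in for a table scan; like the real digest, it includes the pushed-down filter. */
    public record Scan(String table, String pushedFilter) {
        public String digest() {
            // Hypothetical digest format, for illustration only.
            return "TableSourceScan(table=[" + table + "], filter=[" + pushedFilter + "])";
        }
    }

    /** One shared scan plus the filter each consumer would re-apply in its own Calc. */
    public record MergedScan(Scan shared, List<String> perConsumerFilters) {}

    /** Merges scans of the same table; expects a non-empty list. */
    public static MergedScan merge(List<Scan> scans) {
        String table = scans.get(0).table();
        Set<String> distinct = new LinkedHashSet<>();
        List<String> perConsumer = new ArrayList<>();
        boolean anyUnfiltered = false;
        for (Scan s : scans) {
            if (!s.table().equals(table)) {
                throw new IllegalArgumentException("scans must read the same table");
            }
            // Each consumer keeps its original predicate (empty means "no filter").
            perConsumer.add(s.pushedFilter());
            if (s.pushedFilter().isEmpty()) {
                anyUnfiltered = true;
            } else {
                distinct.add("(" + s.pushedFilter() + ")");
            }
        }
        // If any consumer needs every row, nothing can be pushed down to the shared scan.
        String combined = anyUnfiltered ? "" : String.join(" OR ", distinct);
        return new MergedScan(new Scan(table, combined), perConsumer);
    }

    public static void main(String[] args) {
        Scan open = new Scan("orders", "status = 'OPEN'");
        Scan large = new Scan("orders", "amount > 100");

        // Different filters give different digests, so the scans are not reused today.
        System.out.println(open.digest().equals(large.digest()));

        MergedScan merged = merge(List.of(open, large));
        System.out.println(merged.shared().digest());
        System.out.println(merged.perConsumerFilters());
    }
}
```

Re-applying each original filter above the shared scan is what keeps the results unchanged for every consumer. Whether the single widened read is actually cheaper than separate filtered reads depends on the source, which is why this would likely need to be optional.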
h1. Impact
For sources where scanning is computationally or monetarily expensive
(for example, BigQuery Storage Read API sessions or JDBC query execution),
a table referenced N times is read N times when one read could suffice. Cost
scales with the number of references rather than with the volume of unique
data. This is especially pronounced in the multi-sink StatementSet fan-out
pattern, where one upstream table feeds multiple downstream sinks with
different filters.