Daniel Rossos created FLINK-39584:
-------------------------------------

             Summary: Expand filter-merging for source-reuse
                 Key: FLINK-39584
                 URL: https://issues.apache.org/jira/browse/FLINK-39584
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Planner
            Reporter: Daniel Rossos


h1. Summary

ScanReuser cannot currently unify scans of the same table that differ only in 
their pushed-down filters. Sources with different FilterPushDownSpec values are 
treated as distinct because FilterPushDownSpec is included in the scan digest 
used by ReusableScanVisitor. As a result, the same table is read once per 
distinct filter, even when a single read could serve every consumer.
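To illustrate why a filter in the digest splits reuse, here is a minimal, self-contained model in plain Java. It is not Flink planner code: the Scan record, the two digest functions, and physicalReads are hypothetical stand-ins that only assume "one distinct digest = one physical read of the source", as described above.

```java
import java.util.List;
import java.util.function.Function;

public class ScanDigestSketch {

    // Hypothetical model of a source scan: the table it reads and the filter
    // pushed into it. This is NOT Flink's planner API.
    record Scan(String table, String pushedFilter) {}

    // Models a digest that includes the pushed-down filter, analogous to
    // FilterPushDownSpec being part of the digest in ReusableScanVisitor.
    static String digestWithFilter(Scan s) {
        return s.table() + "|filter=" + s.pushedFilter();
    }

    // Models a digest that leaves the filter out (hypothetical).
    static String digestWithoutFilter(Scan s) {
        return s.table();
    }

    // Assumption of this model: each distinct digest becomes one physical read.
    static long physicalReads(List<Scan> scans, Function<Scan, String> digest) {
        return scans.stream().map(digest).distinct().count();
    }

    public static void main(String[] args) {
        // One upstream table feeding three consumers with different filters.
        List<Scan> scans = List.of(
            new Scan("orders", "region = 'EU'"),
            new Scan("orders", "region = 'US'"),
            new Scan("orders", "amount > 100"));

        System.out.println(physicalReads(scans, ScanDigestSketch::digestWithFilter));    // 3
        System.out.println(physicalReads(scans, ScanDigestSketch::digestWithoutFilter)); // 1
    }
}
```

Dropping the filter from the digest is only half of a fix: the consumers still need their own filters applied somewhere after the shared read.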

For context, ScanReuser already unifies scans with different projections by 
creating a single scan with the superset of columns and adding per-consumer 
Calc nodes. The same pattern does not extend to filter differences today.
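One possible way to extend the projection pattern to filters is sketched below in plain Java. The merge strategy is an assumption, not something this issue specifies: the shared scan pushes down the OR of all consumer predicates, and each consumer gets its own Calc-like step that re-applies its original predicate. Because each predicate implies the disjunction, every consumer sees the same rows a dedicated filtered scan would return (assuming deterministic predicates). Order, scan, disjunction, and fanOut are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;

public class FilterMergeSketch {

    // Hypothetical row type standing in for the scanned table.
    record Order(String region, int amount) {}

    // Simulated source: returns only rows matching the pushed-down predicate.
    static List<Order> scan(List<Order> table, Predicate<Order> pushed) {
        return table.stream().filter(pushed).toList();
    }

    // Assumed merge strategy: push down the OR of all consumer filters so a
    // single read returns every row that any consumer needs.
    static Predicate<Order> disjunction(List<Predicate<Order>> filters) {
        return filters.stream().reduce(o -> false, Predicate::or);
    }

    // One shared read, then a per-consumer "Calc" re-applying each filter.
    static List<List<Order>> fanOut(List<Order> table, List<Predicate<Order>> filters) {
        List<Order> shared = scan(table, disjunction(filters));
        return filters.stream()
            .map(f -> shared.stream().filter(f).toList())
            .toList();
    }

    public static void main(String[] args) {
        List<Order> table = List.of(
            new Order("EU", 50), new Order("US", 200), new Order("APAC", 10));
        List<Predicate<Order>> filters = List.of(
            o -> o.region().equals("EU"),
            o -> o.amount() > 100);

        List<List<Order>> results = fanOut(table, filters);
        // Each consumer's result matches what a separate filtered scan returns.
        for (int i = 0; i < filters.size(); i++) {
            System.out.println(results.get(i).equals(scan(table, filters.get(i))));
        }
    }
}
```

Pushing a disjunction may prune less at the source than the individual filters would, so a real implementation would likely need to weigh that against the saved reads; falling back to no pushed filter at all is another option.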
h1. Impact

For sources where scanning is computationally or monetarily expensive 
(BigQuery Storage Read API sessions, JDBC query execution), N references to the 
same table with N different filters result in N reads when one could suffice. 
Cost therefore scales with the number of references rather than with the volume 
of unique data. This is especially pronounced in the multi-sink StatementSet 
fan-out pattern, where one upstream table feeds multiple downstream sinks with 
different filters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
