Hi Everyone,

I would like to discuss a new FLIP (FLIP-XXX: Table Planner Source Filter
Reuse).

Brief background: today, scans of the same table with different
FilterPushDownSpec values produce independent source readers because their
digests differ. For sources where scans are expensive (BigQuery Storage
Read API sessions, JDBC query execution, etc.), this results in multiple
source scans when one would suffice.
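
To make the background concrete, here is a toy Java sketch. It is not Flink
planner code: the class, the method names and the digest format are all
hypothetical, and the filter-free digest is an assumption used only to show
that the filter is what keeps the two scans apart. It does not describe the
approach taken in the FLIP.

```java
import java.util.HashSet;
import java.util.List;

// Toy model only, not Flink planner code. All names are hypothetical.
final class ScanDigestSketch {

    // A digest that includes the pushed-down filter, as in the situation described above.
    static String digest(String table, String pushedFilter) {
        return "scan(table=[" + table + "], filter=[" + pushedFilter + "])";
    }

    // A digest that ignores the filter (assumption, for illustration only).
    static String digestIgnoringFilter(String table) {
        return "scan(table=[" + table + "])";
    }

    // Scans are shared only when their digests match,
    // so the number of distinct digests is the number of source readers.
    static int countReaders(List<String> digests) {
        return new HashSet<>(digests).size();
    }

    public static void main(String[] args) {
        // Same table, different pushed-down filters: two digests, two readers.
        List<String> today = List.of(
                digest("orders", "amount > 100"),
                digest("orders", "status = 'OPEN'"));
        System.out.println(countReaders(today));

        // Same table, filter left out of the digest: one digest, one reader.
        List<String> filterFree = List.of(
                digestIgnoringFilter("orders"),
                digestIgnoringFilter("orders"));
        System.out.println(countReaders(filterFree));
    }
}
```

In this sketch a shared reader would still need each consumer's filter
applied downstream of the scan; that part is left out here.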

We have a public draft of the FLIP[1], as well as a working prototype on
our internal fork (to be shared soon and linked in the thread).

The open question from the FLIP on which I'd most value early feedback is
the optimization's configuration scope:
"Should this optimization remain a job-level flag consistent with the
established pattern, or should we pursue finer-grained scope (per-table or
per-scan) for v1?"

Thanks a ton in advance for the feedback,

Daniel

[1]
https://docs.google.com/document/d/1CcdogFWShLdybEBhRNvu4E7zSc0ep3hJQIlsDCnr_nc/edit?usp=sharing
