Hello everyone! I have been working on a push-down feature and would like to give a brief update on what is done and is still under works.
*Things that are done*: General API for SQL IOs to provide information about what filters/projects they support [1]: - *Filter* can be unsupported, supported with field reordering, and supported without field reordering. - *Predicate* is broken down into a conjunctive normal form (CNF) and passed to a validator class to check what parts are supported or unsupported by an IO. A Calcite rule [2] that checks for push-down support, constructs a new IO source Rel [3] with pushed-down projects and filters when applicable, and preserves unsupported filters/projects. BigQuery should perform push-down when running queries in DIRECT_READ method [4]. MongoDB project push-down support is in a PR [5] and predicate support will be added soon. *Things that are in progress:* Documenting how developers can enable push-down for IOs that support it. Documenting certain limitation for BigQuery push-down (ex: comparing values of 2 columns is not supported at the moment, so it is being preserved in a Calc). Updating google-cloud-bigquerystorage to 0.117.0-beta. Earlier versions have a gRPC message limit set to ~11MB, which may cause some pipelies to break when reading from a table with rows larger than the limit. Adding some sort of performance tests to run continuously to measure speed-up and detect regressions. Deciding how cost should be computed for the IO source Rel with push-down [6]. Right now the following formula is used: cost of an IO without push-down minus the normalized (between 0.0 and 1.0) benefit of a performed push-down. The challenge here is to make the change to the cost small enough to not break join reordering, but large enough to make the optimizer favor pushed-down IO. If you have any suggestions/questions/concerns I would love to hear them. [1] https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java#L36 [2] https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java [3] https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java [4] https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L128 [5] https://github.com/apache/beam/pull/10095 [6] https://github.com/apache/beam/pull/10060 -- Kirill
