Hello everyone!

I have been working on a push-down feature and would like to give a brief
update on what is done and is still under works.

*Things that are done*:
General API for SQL IOs to provide information about what filters/projects
they support [1]:
- *Filter* can be unsupported, supported with field reordering, and
supported without field reordering.
- *Predicate* is broken down into a conjunctive normal form (CNF) and
passed to a validator class to check what parts are supported or
unsupported by an IO.

A Calcite rule [2] that checks for push-down support, constructs a new IO
source Rel [3] with pushed-down projects and filters when applicable, and
preserves unsupported filters/projects.

BigQuery should perform push-down when running queries in DIRECT_READ
method [4].

MongoDB project push-down support is in a PR [5] and predicate support will
be added soon.


*Things that are in progress:*
Documenting how developers can enable push-down for IOs that support it.

Documenting certain limitation for BigQuery push-down (ex: comparing values
of 2 columns is not supported at the moment, so it is being preserved in a
Calc).

Updating google-cloud-bigquerystorage to 0.117.0-beta. Earlier versions
have a gRPC message limit set to ~11MB, which may cause some pipelies to
break when reading from a table with rows larger than the limit.

Adding some sort of performance tests to run continuously to
measure speed-up and detect regressions.

Deciding how cost should be computed for the IO source Rel with push-down
[6]. Right now the following formula is used: cost of an IO without
push-down minus the normalized (between 0.0 and 1.0) benefit of a performed
push-down.
The challenge here is to make the change to the cost small enough to not
break join reordering, but large enough to make the optimizer favor
pushed-down IO.


If you have any suggestions/questions/concerns I would love to hear them.

[1]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java#L36
[2]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
[3]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java
[4]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L128
[5] https://github.com/apache/beam/pull/10095
[6] https://github.com/apache/beam/pull/10060

--
Kirill

Reply via email to