[DISCUSS] propagateCostImprovements vs incremental bestCost maintenance vs metadata

Vladimir Sitnikov Wed, 08 Jan 2020 07:22:18 -0800

Hi,

As far as I understand, the incremental best/bestCost maintenance at
RelSubset level does not really work.

The issue is triggered a lot in MaterializationTests due to
https://issues.apache.org/jira/browse/CALCITE-3682
(MaterializationService#defineMaterialization
loses information on unique keys).
In other words, the materialization does not have uniqueness information,
so when the planner realizes that the materialization is connected to the
source table, it suddenly receives extra metadata, which alters the cost
estimates dramatically.

Here's the setup:
1) RelSubset assumes that best and bestCost are always maintained
incrementally.
2) If a relation changes (e.g. it is added to a subset), the cost change is
propagated to parentRels (~all the rels that might have that rel as input).

The propagation happens only if the new rel becomes the new best (see [1]).
So far it looks ok: if we have a new best, then we propagate it to the
parents.
If the new rel is worse than the previous best, why bother with propagation?
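
To make the setup concrete, here is a minimal sketch of the propagation rule
described above. It is not Calcite's RelSubset code: SubsetSketch,
ParentSketch, addRel, recompute and the propagations counter are all
hypothetical names, and costs are reduced to plain doubles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a subset; NOT Calcite's RelSubset.
class SubsetSketch {
  double bestCost = Double.POSITIVE_INFINITY;
  final List<ParentSketch> parents = new ArrayList<>();
  int propagations = 0; // counts how often a change was pushed to parents

  // Adds a rel with the given cost; propagates only if it is the new best.
  void addRel(double cost) {
    if (cost < bestCost) {
      bestCost = cost;
      propagations++;
      for (ParentSketch parent : parents) {
        parent.recompute();
      }
    }
    // Otherwise nothing is propagated: the parents keep their old costs.
  }
}

// Hypothetical parent rel: its cost is its own cost plus the input's best.
class ParentSketch {
  final SubsetSketch input;
  final double selfCost;
  double cumulativeCost;

  ParentSketch(SubsetSketch input, double selfCost) {
    this.input = input;
    this.selfCost = selfCost;
    input.parents.add(this);
    recompute();
  }

  void recompute() {
    cumulativeCost = selfCost + input.bestCost;
  }
}
```

In this model, adding a rel that is more expensive than the current best
leaves every parent's cumulativeCost untouched, which is exactly the
shortcut questioned in the rest of this message.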

== Now comes the issue ==
The newly added rel might easily affect the costs of other rels even if the
rel is not the best in its subset.

Here's how that is possible:
RelMdColumnUniqueness#areColumnsUnique(RelSubset, ...) iterates over all
the rels in the subset,
so even if the newly added rel is not the best, it might happen to
answer the areColumnsUnique request,
and other cost functions that rely on uniqueness (e.g. cardinality
estimations) would then change.
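
A minimal sketch of that effect follows. It assumes, as a simplification,
that a single rel proving uniqueness is enough for the whole subset to
answer "unique"; the real handler has more cases. RelSketch and
UniquenessSubsetSketch are hypothetical names, not Calcite classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rel: a cost plus whether it can prove its rows are unique.
class RelSketch {
  final double cost;
  final boolean provesUnique;

  RelSketch(double cost, boolean provesUnique) {
    this.cost = cost;
    this.provesUnique = provesUnique;
  }
}

// Hypothetical subset; NOT Calcite's RelSubset.
class UniquenessSubsetSketch {
  final List<RelSketch> rels = new ArrayList<>();
  RelSketch best;

  // Adds a rel and returns true only if it became the new best.
  boolean add(RelSketch rel) {
    rels.add(rel);
    if (best == null || rel.cost < best.cost) {
      best = rel;
      return true;
    }
    return false;
  }

  // Assumption: looks at ALL rels in the subset, not only the best one.
  boolean areColumnsUnique() {
    for (RelSketch rel : rels) {
      if (rel.provesUnique) {
        return true;
      }
    }
    return false;
  }
}
```

Adding new RelSketch(50, true) to a subset whose best costs 10 returns
false from add (no new best, so no propagation), yet areColumnsUnique()
flips from false to true.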

In other words: if the planner somehow realizes that a certain subset
returns unique rows, then a join (in a very distant subset) that was
supposed to be M*N
becomes M+N, and its cost drops sharply even though the subset's best has
not changed.
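
As a worked example of the size of that jump, the sketch below uses a
hypothetical estimate that follows the M*N versus M+N wording above. It is
not Calcite's cost model; JoinCostSketch and estimate are made-up names.

```java
// Hypothetical join estimate; NOT Calcite's cost model.
class JoinCostSketch {
  // m and n are the input sizes; inputUnique says whether the planner
  // currently believes one input returns unique rows on the join key.
  static double estimate(double m, double n, boolean inputUnique) {
    return inputUnique ? m + n : m * n;
  }
}
```

With M = N = 1000, estimate(1000, 1000, false) gives 1,000,000 while
estimate(1000, 1000, true) gives 2,000. A bestCost computed from the first
value and never refreshed is off by a factor of 500.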

At this point, I'm inclined to think that incremental bestCost maintenance
is not really possible.

Any thoughts?

[1]:
https://github.com/apache/calcite/blob/571731b80a58eb095ebac7123285c375e7afff90/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L358-L360

Vladimir
