This is similar to CALCITE-2166 where a RelNode’s best cost could increase after its input RelSubset cardinality is changed. Unfortunately there’s no easy way to fix this with current framework design. In theory, the cardinality and uniqueness of a RelSubset should never changed per definition of equivalent set. The cost propagation logic is built upon such assumption, which I think is fine.
Regarding this particular issue, I think the JIRA is pointing to the right direction. We should probably fix defineMaterialization() to provide uniqueness info in the first place. This would be a much easier fix than updating cost propogation. > On Jan 8, 2020, at 7:21 AM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> > wrote: > > Hi, > > As far as I understand, the incremental best/bestCost maintenance at > RelSubset level does not really work. > > That issue is triggered a lot in MaterializationTests due to > https://issues.apache.org/jira/browse/CALCITE-3682 > (MaterializationService#defineMaterialization > loses information on unique keys) > In other words, materialization does not have uniqueness information, so > when the planner realizes that materialization is connected to the source > table, > it suddenly receives extra metadata which alters cost estimates > dramatically. > > Here's the setup: > 1) RelSubset assumes that best and bestCost are always maintained > incrementally. > 2) If a relation changes (e.g. it is added to a subset), the cost change is > propagated to parentRels (~all the rels that might have that rel as input). > > The propagation happens only in case the new rel is a new best (see [1]). > So far it looks ok: if we have the new best, then we propagate to other > parents. > If the new rel is worse than the previous best, why bother with propagation? > > == Now comes the issue == > The newly added rel might easily affect the costs of other rels even if the > rel is not the best in its subset. > > Here's how that is possible: > RelMdColumnUniqueness#areColumnsUnique(RelSubset, ...) iterates over all > the rels in the subset, > so even if the newly added rel is not the best, it might happen to > answer areColumnsUnique request > so other cost functions that rely on uniqueness (e.g. cardinality > estimations) would change. > > In other words: if the planner somehow realizes a certain subset returns > unique rows, then a join (in a very distant subset) that was supposed to be > M*N > becomes M+N, and its cost greatly reduces even though the subset's best is > not changed. > > At this point, I'm inclined that incremental bestCost maintenance is not > really possible. > > Any thoughts? > > [1]: > https://github.com/apache/calcite/blob/571731b80a58eb095ebac7123285c375e7afff90/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L358-L360 > > Vladimir