This is similar to CALCITE-2166 where a RelNode’s best cost could increase 
after its input RelSubset cardinality is changed. Unfortunately there’s no easy 
way to fix this with current framework design. In theory, the cardinality and 
uniqueness of a RelSubset should never changed per definition of equivalent 
set. The cost propagation logic is built upon such assumption, which I think is 
fine.

Regarding this particular issue, I think the JIRA is pointing to the right 
direction. We should probably fix defineMaterialization() to provide uniqueness 
info in the first place. This would be a much easier fix than updating cost 
propogation.

> On Jan 8, 2020, at 7:21 AM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> 
> wrote:
> 
> Hi,
> 
> As far as I understand, the incremental best/bestCost maintenance at
> RelSubset level does not really work.
> 
> That issue is triggered a lot in MaterializationTests due to
> https://issues.apache.org/jira/browse/CALCITE-3682
> (MaterializationService#defineMaterialization
> loses information on unique keys)
> In other words, materialization does not have uniqueness information, so
> when the planner realizes that materialization is connected to the source
> table,
> it suddenly receives extra metadata which alters cost estimates
> dramatically.
> 
> Here's the setup:
> 1) RelSubset assumes that best and bestCost are always maintained
> incrementally.
> 2) If a relation changes (e.g. it is added to a subset), the cost change is
> propagated to parentRels (~all the rels that might have that rel as input).
> 
> The propagation happens only in case the new rel is a new best (see [1]).
> So far it looks ok: if we have the new best, then we propagate to other
> parents.
> If the new rel is worse than the previous best, why bother with propagation?
> 
> == Now comes the issue ==
> The newly added rel might easily affect the costs of other rels even if the
> rel is not the best in its subset.
> 
> Here's how that is possible:
> RelMdColumnUniqueness#areColumnsUnique(RelSubset, ...) iterates over all
> the rels in the subset,
> so even if the newly added rel is not the best, it might happen to
> answer areColumnsUnique request
> so other cost functions that rely on uniqueness (e.g. cardinality
> estimations) would change.
> 
> In other words: if the planner somehow realizes a certain subset returns
> unique rows, then a join (in a very distant subset) that was supposed to be
> M*N
> becomes M+N, and its cost greatly reduces even though the subset's best is
> not changed.
> 
> At this point, I'm inclined that incremental bestCost maintenance is not
> really possible.
> 
> Any thoughts?
> 
> [1]:
> https://github.com/apache/calcite/blob/571731b80a58eb095ebac7123285c375e7afff90/core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java#L358-L360
> 
> Vladimir

Reply via email to