That’s why I say it’s hard to solve under the current framework design. The 
example query you provided can, and should, be optimized during the logical 
transformation phase. At that point there shouldn’t be any cost calculation, 
since all we are doing is exploring equivalences. Once the transformation is 
done, the properties that feed cost calculation, such as row count and 
uniqueness, shouldn’t change anymore.
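
For instance, a single HepPlanner pass could fold the contradictory filters in
Vladimir's example into an empty Values entirely within the logical phase. This
is only a sketch I haven't run; the rule instances are the ones Calcite ships
around 1.21, and the plan variable is hypothetical:

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.PruneEmptyRules;
import org.apache.calcite.rel.rules.ReduceExpressionsRule;

class LogicalFold {
  // Plan shape for the example query:
  //   Project(Filter[empid<0](Project(Filter[empid>0](Scan(emps)))))
  static RelNode foldContradiction(RelNode logicalPlan) {
    HepPlanner hep = new HepPlanner(new HepProgramBuilder()
        // Simplifies the outer filter against predicates pulled up from its
        // input (empid>0), so empid<0 reduces to FALSE and the filter is
        // replaced with an empty Values.
        .addRuleInstance(ReduceExpressionsRule.FILTER_INSTANCE)
        // Propagates emptiness through the Project left on top of it.
        .addRuleInstance(PruneEmptyRules.PROJECT_INSTANCE)
        .build());
    hep.setRoot(logicalPlan);
    return hep.findBestExp(); // from here on, the row count is stably 0
  }
}

Once the plan has collapsed to an empty Values like this, the row count the
cost model later sees is fixed at 0 and has no reason to drift.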

But in the current Calcite design, rules have no concept of stages, which means 
a RelNode can be implemented before it has been fully explored. That’s why we 
run into problems like this.
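
Until rules get real stages, one workaround is to impose the stages from
outside with the Program API: run the transformation rules to a fixed point
first, and only then hand the result to the cost-based planner. Again just a
sketch under my assumptions; logicalRules and physicalRules are placeholders
for whatever rule sets apply:

import java.util.List;
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
import org.apache.calcite.tools.Program;
import org.apache.calcite.tools.Programs;

class StagedOptimizer {
  static Program staged(List<RelOptRule> logicalRules,
      List<RelOptRule> physicalRules) {
    // Phase 1: pure exploration via a HepPlanner, no cost calculation.
    Program explore = Programs.hep(logicalRules, /* noDag= */ true,
        DefaultRelMetadataProvider.INSTANCE);
    // Phase 2: cost-based implementation on the already-explored plan.
    Program implement = Programs.ofRules(physicalRules);
    return Programs.sequence(explore, implement);
  }
}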


> On Jan 8, 2020, at 1:42 PM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> 
> wrote:
> 
>> In theory, the cardinality and uniqueness of a RelSubset should never
>> change, per the definition of an equivalent set
> 
> I agree. It is like in theory there is no difference between theory and
> practice :)
> 
> What if we have select empid from (select empid from emps where empid>0)
> where empid<0?
> The original logical plans would likely have two filters, and metadata
> would estimate rowcount as 15% * 15% * tableRowCount (or something like
> that).
> 
> Later the optimizer might realize the relation is equivalent to an empty
> relation, and it could refine the row count as 0.
> So cardinality estimates of a set can vary over time, and I don't think we
> can prevent that.
> 
> It would be nice if metadata could identify a fixed point somehow, like in
> dataflow algorithms.
> 
>> We should probably fix defineMaterialization() to provide uniqueness info
>> in the first place
> 
> I'm not sure that is the only trigger.
> The thing is, VolcanoPlanner.isValid is not activated by default, so we do
> not see cost-related assertion errors.
> 
> Vladimir
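
To make the drift Vladimir describes concrete, here is a minimal sketch. The
two plan variables are hypothetical stand-ins for plans Volcano would register
in the same equivalence set, and the estimates in the comments are the default
selectivity guesses he alludes to:

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

class RowCountDrift {
  // twoFilterPlan: Filter[empid<0] over Filter[empid>0] over Scan(emps);
  // emptyValues: the empty Values the planner later proves equivalent.
  static void show(RelNode twoFilterPlan, RelNode emptyValues) {
    RelMetadataQuery mq = RelMetadataQuery.instance();
    // Before reduction: a product of per-filter selectivity guesses
    // times the table row count.
    Double before = mq.getRowCount(twoFilterPlan);
    // After the planner proves emptiness: the estimate collapses to 0.
    Double after = mq.getRowCount(emptyValues);
    // Same equivalence set, two different answers over time.
    System.out.println(before + " -> " + after);
  }
}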
