[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121177#comment-17121177
 ] 

Xiening Dai commented on CALCITE-3963:
--------------------------------------

_*What's wrong with my suggestion to treat all RelNode instances in a set as 
equivalent? Use an order-independent (and monotonic) folding operations such as 
'min', 'max', 'union' to combine property values.*_

Right now each RelNode has its own algorithm and comes up with an estimated row 
count independently. From a relational algebra point of view, I just don't see 
why we would average them or choose the min/max value to represent the set. 
Also, in the example I described, let's say we have a materialized view that's 
equivalent to a join, so a TableScan node exists in the same RelSet as the 
join node. The table scan comes from a materialized view, which has accurate 
statistics such as row count, so why would we average it with the join node, 
or use the min/max between them? The other example is MultiJoin. Currently 
MultiJoin doesn't override estimateRowCount(), and always returns 1 as its row 
count. That's understandable, since it's usually harder to estimate row count 
when you have multiple join inputs. After it has been converted into a 
LogicalJoin, we get a better estimate. Using the confidence level, we can then 
update the RelSet row count to use the one from LogicalJoin. Again, it wouldn't 
make sense to average them or use the min/max here.
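
To make the confidence-level idea concrete, here is a minimal sketch. It is 
not Calcite code: the Estimate class, the confidence values and the 
mostConfident() helper are all hypothetical names and numbers chosen for 
illustration. It only shows a set-level row count taken from the most trusted 
member rather than from an average or a min/max.

```java
import java.util.Arrays;
import java.util.List;

class RowCountSketch {
    /** Hypothetical estimate: a row count plus a confidence in [0, 1]. */
    static final class Estimate {
        final String source;
        final double rowCount;
        final double confidence;

        Estimate(String source, double rowCount, double confidence) {
            this.source = source;
            this.rowCount = rowCount;
            this.confidence = confidence;
        }
    }

    /** Returns the estimate with the highest confidence; on a tie the earlier one is kept. */
    static Estimate mostConfident(List<Estimate> estimates) {
        Estimate best = null;
        for (Estimate e : estimates) {
            if (best == null || e.confidence > best.confidence) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed numbers, for illustration only.
        List<Estimate> relSet = Arrays.asList(
            new Estimate("MultiJoin (default)", 1.0, 0.1),
            new Estimate("LogicalJoin (estimated)", 5000.0, 0.6),
            new Estimate("TableScan on materialized view", 1200.0, 0.9));
        Estimate chosen = mostConfident(relSet);
        System.out.println(chosen.source + " -> " + chosen.rowCount);
    }
}
```

With distinct confidence values the result does not depend on the order in 
which the nodes were registered; only exact ties fall back to creation order.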

Regarding the non-determinism comment, I believe that as long as the rule 
firing order doesn't change, the behavior is deterministic. If the rule firing 
order is different, a lot of things could be different. Even for the best plan, 
we don't update a subset's best when the cost is the same, which means we 
always choose the first plan among those that have the same cost. Thus when the 
rule firing order changes, the order of RelNode creation changes, and the best 
plan could be different. We have seen that a lot.
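
The tie-breaking behavior described above can be sketched as follows. This is 
a simplified illustration, not the planner's actual code: Plan and pickBest() 
are hypothetical names. The only point is that a strict "lower cost" 
comparison keeps whichever equal-cost plan was created first.

```java
import java.util.Arrays;
import java.util.List;

class BestPlanSketch {
    /** Hypothetical plan: a name and a scalar cost. */
    static final class Plan {
        final String name;
        final double cost;

        Plan(String name, double cost) {
            this.name = name;
            this.cost = cost;
        }
    }

    /** Replaces the current best only on a strictly lower cost, so the first of several equal-cost plans wins. */
    static Plan pickBest(List<Plan> plansInCreationOrder) {
        Plan best = null;
        for (Plan p : plansInCreationOrder) {
            if (best == null || p.cost < best.cost) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Plan a = new Plan("A", 10.0);
        Plan b = new Plan("B", 10.0);
        // Same plans, different creation order (i.e. different rule firing order).
        System.out.println(pickBest(Arrays.asList(a, b)).name);
        System.out.println(pickBest(Arrays.asList(b, a)).name);
    }
}
```

For a fixed creation order the result is always the same, which is the sense 
in which the behavior is deterministic.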

> Maintain logical properties at RelSet (equivalent group) instead of RelNode
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties, 
> by definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level rather than on the RelNode. Such 
> properties shouldn't change when a new subset is created or a subset's best 
> is changed.
> Specifically, I think the built-in metadata below should fall into the 
> logical properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
