[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146599#comment-17146599
 ] 

Julian Hyde commented on CALCITE-3963:
--------------------------------------

bq. Do you have a concrete example where different RelNodes in a set have 
different unique keys and a union of those would make sense?

After a query has been rewritten to a join-aggregate materialized view, the MV 
might have stricter keys than can be inferred from the original query.
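
To illustrate why a union makes sense, here is a minimal sketch. It assumes each 
equivalent RelNode reports its unique keys as sets of column ordinals. The 
class and method names are hypothetical, and plain {{java.util}} collections 
stand in for Calcite's own types. Every key that holds for one member of the 
set holds for all of them, so the set-level answer is the union.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not part of Calcite's API. */
public final class UniqueKeyUnion {
  private UniqueKeyUnion() {}

  /** Builds one key from column ordinals. */
  static Set<Integer> key(Integer... columns) {
    return new HashSet<>(Arrays.asList(columns));
  }

  /**
   * Unions the unique keys reported by each equivalent RelNode.
   * Equivalent expressions produce the same rows, so a key proven
   * for any one of them is valid for the whole set.
   */
  static Set<Set<Integer>> unionKeys(List<Set<Set<Integer>>> perNodeKeys) {
    Set<Set<Integer>> result = new HashSet<>();
    for (Set<Set<Integer>> keys : perNodeKeys) {
      result.addAll(keys);
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumed example: the original query can only prove that
    // columns (0, 1) together are unique.
    Set<Set<Integer>> fromQuery = new HashSet<>();
    fromQuery.add(key(0, 1));

    // Assumed example: the materialized view proves the stricter key (0).
    Set<Set<Integer>> fromMv = new HashSet<>();
    fromMv.add(key(0));

    Set<Set<Integer>> merged = unionKeys(Arrays.asList(fromQuery, fromMv));
    System.out.println(merged.size() + " keys known for the set");
  }
}
```

The sketch does not prune keys that are supersets of a stricter key; a real 
implementation might choose to.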

bq. Regarding the minRowCount, how do we know the max value is the best or most 
accurate?

The max is the tightest bound we can prove. minRowCount is a lower bound (in 
the [mathematical 
sense|https://en.wikipedia.org/wiki/Upper_and_lower_bounds]). If one RelNode 
thinks a query returns at least 1 row, and another equivalent RelNode thinks 
that a query returns at least 2 rows, then the query must return at least 2 
rows.
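
A minimal sketch of that reasoning, with hypothetical class and method names 
(this is not Calcite's API). The lower-bound case is the one described above. 
The upper-bound case is my assumption by symmetry: the smallest maxRowCount is 
the tightest.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not part of Calcite's API. */
public final class RowCountBounds {
  private RowCountBounds() {}

  /**
   * Combines lower bounds from equivalent RelNodes. Each bound is valid
   * for the query, so the largest one is the tightest.
   */
  static double combineMinRowCount(List<Double> minRowCounts) {
    double result = 0d; // any relation returns at least 0 rows
    for (double m : minRowCounts) {
      result = Math.max(result, m);
    }
    return result;
  }

  /**
   * Combines upper bounds (assumed symmetric case): the smallest
   * upper bound is the tightest.
   */
  static double combineMaxRowCount(List<Double> maxRowCounts) {
    double result = Double.POSITIVE_INFINITY; // no bound known yet
    for (double m : maxRowCounts) {
      result = Math.min(result, m);
    }
    return result;
  }

  public static void main(String[] args) {
    // One RelNode says "at least 1 row", an equivalent one says "at least 2".
    System.out.println(combineMinRowCount(Arrays.asList(1d, 2d))); // prints 2.0
  }
}
```

No such rule works for estimatedRowCount, because an estimate bounds nothing.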

In my Project vs Join example I was talking about estimatedRowCount, which is 
different from minRowCount: it is neither an upper nor a lower bound, and is 
therefore more problematic to reason about.

bq. A MultiJoin has low confidence.

Not necessarily true. A MultiJoin of ({{select * from emp join dept using 
(deptno)}}) has high confidence (low stdev) if we have stats on emp and dept 
and know that deptno is a foreign key. In fact we know precisely how many rows 
it will return. Conversely, a Project will have low confidence (high stdev) if 
its input has low confidence. It is foolish to regard any RelNode as inherently 
high or low confidence based on its type.

So I claim that we should drop the whole idea of 'confidence', because we don't 
have the resources to do it properly.

> Maintain logical properties at RelSet (equivalent group) instead of RelNode
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties, 
> by definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level, rather than the RelNode level. And 
> such properties shouldn't change when a new subset is created or a subset's 
> best is changed.
> Specifically, I think the following built-in metadata should fall into the 
> logical properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)