[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137804#comment-17137804
 ] 

Xiening Dai commented on CALCITE-3963:
--------------------------------------

{quote}
I don't understand what you mean by 'associative property in RelSet'.
{quote}

Here is the mathematical definition of a semigroup:

A semigroup is a set S together with a binary operation that satisfies the 
associative property. 

My question is: what would the binary operation be if we modeled other RelSet 
statistics as a semigroup?
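
To make the definition concrete, here is a minimal Java sketch of a semigroup and an associativity check. The Semigroup and SemigroupDemo classes are hypothetical and not part of Calcite, and min is used only to illustrate the definition; it is not a proposal for which operation a RelSet statistic should use.

```java
import java.util.function.BinaryOperator;

/** Hypothetical helper, not part of Calcite: a value type plus one binary operation. */
final class Semigroup<T> {
  private final BinaryOperator<T> op;

  Semigroup(BinaryOperator<T> op) {
    this.op = op;
  }

  T combine(T a, T b) {
    return op.apply(a, b);
  }

  /** Checks (a . b) . c == a . (b . c) for one triple of sample values. */
  boolean isAssociativeFor(T a, T b, T c) {
    T left = combine(combine(a, b), c);
    T right = combine(a, combine(b, c));
    return left.equals(right);
  }
}

final class SemigroupDemo {
  // min is associative, so (Double, min) forms a semigroup.
  static final Semigroup<Double> MIN = new Semigroup<>(Double::min);
  // Subtraction is not associative, so (Double, -) does not.
  static final Semigroup<Double> MINUS = new Semigroup<>((a, b) -> a - b);

  public static void main(String[] args) {
    System.out.println(MIN.isAssociativeFor(100.0, 40.0, 75.0));   // true
    System.out.println(MINUS.isAssociativeFor(100.0, 40.0, 75.0)); // false
  }
}
```

The check only tests one triple, so it can refute associativity but not prove it.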

{quote}
 'highest confidence' does not have a satisfactory definition. Nor does 'first 
RelNode added to the set', in the case of sets merging or other perturbations 
of planning order.
{quote}

What would count as a satisfactory definition? The confidence level is provided 
by each RelNode, and it can be customized to reflect the accuracy of that 
node's estimates. For example, we can lower the confidence when a filter 
predicate contains a UDF, or when a join node has an input coming from a 
subquery. All the stats today are based on estimates. When you compare 
estimates, it is natural to also want the confidence level of each one. 
As for set merge, it works the same way as adding rel nodes to an existing 
set. 
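
A rough sketch of what I mean, in Java. Estimate and SetStats are hypothetical names used for illustration only (they are not existing Calcite classes), and the confidence values are made up.

```java
/** Hypothetical value type: a row-count estimate plus a confidence in [0, 1]. */
final class Estimate {
  final double rowCount;
  final double confidence;

  Estimate(double rowCount, double confidence) {
    this.rowCount = rowCount;
    this.confidence = confidence;
  }
}

/** Hypothetical stand-in for the statistics kept per RelSet; not Calcite's RelSet. */
final class SetStats {
  private Estimate best; // null until the first estimate is offered

  /** Replaces the retained estimate only if the new one is strictly more confident. */
  void offer(Estimate e) {
    if (best == null || e.confidence > best.confidence) {
      best = e;
    }
  }

  /** Merging another set is the same as offering its retained estimate. */
  void merge(SetStats other) {
    if (other.best != null) {
      offer(other.best);
    }
  }

  Estimate current() {
    return best;
  }
}

final class SetStatsDemo {
  public static void main(String[] args) {
    SetStats set = new SetStats();
    set.offer(new Estimate(1000.0, 0.3)); // e.g. filter with a UDF: low confidence
    set.offer(new Estimate(250.0, 0.9));  // e.g. plain filter: higher confidence

    SetStats other = new SetStats();
    other.offer(new Estimate(400.0, 0.5));
    set.merge(other);

    System.out.println(set.current().rowCount); // 250.0
  }
}
```

In this sketch ties are broken by arrival order, so a deterministic tie-break would be needed for the result to be independent of planning order.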

What would be your suggestion? Do you have a concrete example? The current 
approach, which uses RelSubset.best, is clearly wrong and buggy.

> Maintain logical properties at RelSet (equivalent group) instead of RelNode
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> by definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level rather than the RelNode level. Such 
> properties shouldn't change when a new subset is created or a subset's best 
> is changed.
> Specifically, I think the following built-in metadata should fall into the 
> logical properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
