[ 
https://issues.apache.org/jira/browse/CALCITE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188707#comment-17188707
 ] 

Rui Wang commented on CALCITE-4208:
-----------------------------------

I am not familiar with the context of the existing row count estimation model, 
but based just on the formula here, I think:

innerJoinRowCount = leftRowCount * rightRowCount * mq.getSelectivity(join, 
condition)

leftJoinRowCount = leftRowCount + innerJoinRowCount = leftRowCount * (1 + 
rightRowCount * mq.getSelectivity(join, condition)) 

and similarly for a RIGHT join, with the roles of left and right swapped.


So if rightRowCount * mq.getSelectivity(join, condition) is much larger than 1, 
that 1 can be ignored. If 1 is the dominant part, the row count estimate won't 
be a big number anyway. 

I think that is why at least INNER/LEFT/RIGHT share the same model. A similar 
argument could be made for FULL join.
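
To make the argument concrete, here is a minimal sketch of the two formulas 
above using plain doubles. It is not Calcite code: the class name 
JoinRowCountSketch, its method names, and the sample numbers are hypothetical, 
and the selectivity is passed in directly rather than obtained from 
mq.getSelectivity(join, condition).

```java
/** Illustrative only: plain-double versions of the formulas in this comment. */
final class JoinRowCountSketch {
  private JoinRowCountSketch() {}

  /** leftRowCount * rightRowCount * selectivity, as quoted in the issue. */
  static double innerJoinRowCount(double leftRowCount, double rightRowCount,
      double selectivity) {
    return leftRowCount * rightRowCount * selectivity;
  }

  /** Estimate from this comment: leftRowCount + innerJoinRowCount. */
  static double leftJoinRowCount(double leftRowCount, double rightRowCount,
      double selectivity) {
    return leftRowCount
        + innerJoinRowCount(leftRowCount, rightRowCount, selectivity);
  }

  /**
   * Ratio of the LEFT estimate to the INNER estimate, which equals
   * 1 + 1 / (rightRowCount * selectivity). Assumes all inputs are positive.
   */
  static double leftToInnerRatio(double leftRowCount, double rightRowCount,
      double selectivity) {
    return leftJoinRowCount(leftRowCount, rightRowCount, selectivity)
        / innerJoinRowCount(leftRowCount, rightRowCount, selectivity);
  }

  public static void main(String[] args) {
    // Case 1: rightRowCount * selectivity = 100, so the "1" is negligible.
    System.out.println(leftToInnerRatio(1000, 10000, 0.01));
    // Case 2: rightRowCount * selectivity = 0.01, so the "1" dominates,
    // but the LEFT estimate stays close to leftRowCount.
    System.out.println(leftJoinRowCount(1000, 10, 0.001));
  }
}
```

In the first case the two estimates differ by about 1%; in the second the 
ratio is large, but the LEFT estimate is only slightly above leftRowCount.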

> Improve metadata row count for Join
> -----------------------------------
>
>                 Key: CALCITE-4208
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4208
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Ruben Q L
>            Priority: Major
>
> Currently, the default metadata row count for join 
> {{RelMdRowCount#getRowCount(Join rel, RelMetadataQuery mq)}} relies on 
> {{RelMdUtil.getJoinRowCount}}. This method has several issues:
>  - In case of ANTI join, it returns the same estimation as a SEMI join
>  - In other cases (INNER, LEFT, RIGHT, FULL), it returns always the same 
> formula:
>  {{leftRowCount * rightRowCount * mq.getSelectivity(join, condition)}}
>  which seems valid for an INNER join, but not for LEFT / RIGHT / FULL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
