[jira] [Updated] (CALCITE-938) Make Aggregate return more accurate rowCount if groupSet is unique keys.

Maryann Xue (JIRA) Mon, 26 Oct 2015 11:55:53 -0700

     [ 
https://issues.apache.org/jira/browse/CALCITE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Maryann Xue updated CALCITE-938:
--------------------------------
    Attachment: CALCITE-938.patch

Unlike matching rules such as AggregateRemoveRule, metadata calculation 
needs a better way to return columnUniqueness for a RelSubset.
RelMdColumnUniqueness used to have an implementation for RelSubset, but it 
was deliberately taken out of real use. After running the tests, I figured 
the reason was that the old implementation could loop infinitely, since 
RelSubsets can contain cyclic links after ProjectRemoveRule is applied.
One option would be to improve the old implementation by detecting and 
breaking the cyclic links when making recursive calls. But for the purpose 
of calculating cost only, we might not need to return any meaningful value 
for a RelSubset that is still in an unimplementable state. So the current 
fix just returns the value for the "best" rel if it is available, and 
otherwise returns unknown.
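
A minimal sketch of the "best rel or unknown" idea, for illustration only. 
It is not the code in the attached patch: the classes Rel and Subset and the 
method areColumnsUnique below are hypothetical stand-ins rather than 
Calcite's real types, and representing "unknown" as a null Boolean is an 
assumption of the sketch. Because the lookup never recurses into the other 
members of the subset, cyclic links cannot make it loop.

```java
import java.util.Set;

/** Hypothetical stand-ins for illustration; these are not Calcite classes. */
final class SubsetUniquenessSketch {

  /** A rel whose unique keys are known, each key being a set of column ordinals. */
  static final class Rel {
    final Set<Set<Integer>> uniqueKeys;

    Rel(Set<Set<Integer>> uniqueKeys) {
      this.uniqueKeys = uniqueKeys;
    }
  }

  /** An equivalence subset; best stays null until an implementable rel is found. */
  static final class Subset {
    Rel best;
  }

  /**
   * Returns TRUE or FALSE based on the subset's best rel only,
   * or null (unknown) when no best rel is available yet.
   */
  static Boolean areColumnsUnique(Subset subset, Set<Integer> columns) {
    if (subset.best == null) {
      return null; // still unimplementable: report unknown
    }
    for (Set<Integer> key : subset.best.uniqueKeys) {
      if (columns.containsAll(key)) {
        return Boolean.TRUE; // the columns cover a unique key
      }
    }
    return Boolean.FALSE;
  }

  public static void main(String[] args) {
    Subset subset = new Subset();
    System.out.println(areColumnsUnique(subset, Set.of(0))); // null (unknown)

    subset.best = new Rel(Set.of(Set.of(0))); // column 0 is a unique key
    System.out.println(areColumnsUnique(subset, Set.of(0, 1))); // true
    System.out.println(areColumnsUnique(subset, Set.of(1)));    // false
  }
}
```

The trade-off is the one described above: a subset without a best rel 
yields no information, which should be acceptable if the value is only 
used for costing.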

> Make Aggregate return more accurate rowCount if groupSet is unique keys.
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-938
>                 URL: https://issues.apache.org/jira/browse/CALCITE-938
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>            Priority: Minor
>         Attachments: CALCITE-938.patch
>
>
> If the columns in a "select distinct" are already distinct, there can be 
> two equivalent sets of rels, before and after AggregateRemoveRule.
> {code}
> agg
>  |                  input
> input
> 10.0                100.0
> {code}
> Based on the default implementation of rel metadata, the rowCount of the 
> "before" rel is only 1/10 of that of the "after" rel, but meanwhile the 
> "after" rel is definitely cheaper. So the Volcano planner would most likely 
> either fail to pick the cheapest one or have an inconsistent state due to 
> CALCITE-830.
> An example (based on the EnumerableRel cost model):
> The plan for
> {code}
> select empno, d.deptno
> from "scott".emp
> join (select distinct deptno from "scott".dept) d
> using (deptno);
> {code}
> would be
> {code}
> EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
>   EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
>     EnumerableAggregate(group=[$0])
>       EnumerableTableScan(table=[[scott, DEPT]])
>     EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
>       EnumerableTableScan(table=[[scott, EMP]])
> {code}
> , while it should be
> {code}
> EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
>   EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
>     EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0])
>       EnumerableTableScan(table=[[scott, DEPT]])
>     EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
>       EnumerableTableScan(table=[[scott, EMP]])
> {code}
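
The row-count mismatch in the quoted description can be sketched roughly as 
follows. This is an illustration under stated assumptions, not Calcite's 
actual metadata code: the class and method names are hypothetical, and the 
1/10 factor is taken from the 10.0 versus 100.0 figures in the description.

```java
/** Hypothetical illustration of the row-count estimate; not Calcite code. */
final class AggregateRowCountSketch {

  /** Assumed default reduction: an aggregate is estimated at 1/10 of its input. */
  static final double DEFAULT_REDUCTION = 10.0;

  /**
   * Estimates the aggregate's row count. groupSetUnique is TRUE, FALSE,
   * or null when uniqueness of the group columns is unknown.
   */
  static double aggregateRowCount(double inputRowCount, Boolean groupSetUnique) {
    if (Boolean.TRUE.equals(groupSetUnique)) {
      // Grouping on unique keys cannot merge rows, so the count is unchanged.
      return inputRowCount;
    }
    // Unknown or not unique: fall back to the assumed default estimate.
    return inputRowCount / DEFAULT_REDUCTION;
  }

  public static void main(String[] args) {
    System.out.println(aggregateRowCount(100.0, null));         // 10.0
    System.out.println(aggregateRowCount(100.0, Boolean.TRUE)); // 100.0
  }
}
```

With the uniqueness-aware estimate, the "before" and "after" rels report 
the same row count, so the extra aggregate no longer looks cheaper than it 
is.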



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
