Aman Sinha created CALCITE-714:
----------------------------------
Summary: Decorrelation does not push join condition into subquery
Key: CALCITE-714
URL: https://issues.apache.org/jira/browse/CALCITE-714
Project: Calcite
Issue Type: Bug
Affects Versions: 1.1.0-incubating
Reporter: Aman Sinha
Assignee: Julian Hyde
When decorrelating a scalar aggregate subquery, we push copies of the tables
from the outer query block into the subquery but don't push the join condition,
thereby creating a cartesian join. This seems to be a regression.
Query:
{code}
select count(*)
from dfs.`/Users/asinha/data/tpch-sf1/nation` n,
dfs.`/Users/asinha/data/tpch-sf1/orders` o
where n.n_nationkey = o.o_orderkey
and n.n_nationkey > (select avg(ps.ps_suppkey) from
dfs.`/Users/asinha/data/tpch-sf1/partsupp` ps
where n.n_regionkey = ps.ps_partkey);
{code}
Here's the plan on an earlier version of Calcite (I am not sure of the version
number but it was before the decorrelation refactoring), Note the join between
nation and orders at the leaf level has a equi-join condition.
{code}
AggregateRel(group=[{}], EXPR$0=[COUNT()])
ProjectRel($f0=[$0])
ProjectRel($f0=[0])
FilterRel(condition=[>($1, $5)])
ProjectRel(*=[$0], n_nationkey=[$1], n_regionkey=[$2], *0=[$3],
o_orderkey=[$4], EXPR$0=[$6])
JoinRel(condition=[=($2, $5)], joinType=[left])
JoinRel(condition=[=($1, $4)], joinType=[inner])
EnumerableTableAccessRel(table=[[dfs,
/Users/asinha/data/tpch-sf1/nation]])
EnumerableTableAccessRel(table=[[dfs,
/Users/asinha/data/tpch-sf1/orders]])
AggregateRel(group=[{0}], EXPR$0=[AVG($1)])
ProjectRel($f0=[$1], ps_suppkey=[$0])
ProjectRel(ps_suppkey=[$2], $f0=[$3])
FilterRel(condition=[=($3, $1)])
JoinRel(condition=[true], joinType=[inner])
EnumerableTableAccessRel(table=[[dfs,
/Users/asinha/data/tpch-sf1/partsupp]])
AggregateRel(group=[{0}])
ProjectRel($f0=[$2])
JoinRel(condition=[=($1, $4)], joinType=[inner])
EnumerableTableAccessRel(table=[[dfs,
/Users/asinha/data/tpch-sf1/nation]])
EnumerableTableAccessRel(table=[[dfs,
/Users/asinha/data/tpch-sf1/orders]])
{code}
Here's the new plan (I am on version 1.1.0 but I think the plan has not changed
in the latest version). Note the join between nation and orders at the leaf
level does not have any join condition.
{code}
LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
LogicalProject($f0=[$0])
LogicalProject($f0=[0])
LogicalFilter(condition=[AND(=($1, $4), >($1, $5))])
LogicalProject(*=[$0], n_nationkey=[$1], n_regionkey=[$2], *0=[$3],
o_orderkey=[$4], EXPR$0=[$6])
LogicalJoin(condition=[=($2, $5)], joinType=[left])
LogicalJoin(condition=[true], joinType=[inner])
EnumerableTableScan(table=[[dfs,
/Users/asinha/data/tpch-sf1/nation]])
EnumerableTableScan(table=[[dfs,
/Users/asinha/data/tpch-sf1/orders]])
LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
LogicalProject(n_regionkey=[$1], ps_suppkey=[$0])
LogicalProject(ps_suppkey=[$2], n_regionkey=[$3])
LogicalFilter(condition=[=($3, $1)])
LogicalJoin(condition=[true], joinType=[inner])
EnumerableTableScan(table=[[dfs,
/Users/asinha/data/tpch-sf1/partsupp]])
LogicalAggregate(group=[{0}])
LogicalProject(n_regionkey=[$2])
LogicalJoin(condition=[true], joinType=[inner])
EnumerableTableScan(table=[[dfs,
/Users/asinha/data/tpch-sf1/nation]])
EnumerableTableScan(table=[[dfs,
/Users/asinha/data/tpch-sf1/orders]])
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)