The following RelOptRulesTest reproduces the problem:
@Test void testDecorrelate() {
final String query = "SELECT 3 in (SELECT deptno FROM emp)";
sql(query).withRule(
CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE,
CoreRules.JOIN_TO_CORRELATE)
.withLateDecorrelate(true)
.check();
}
This produces 3 plans: before, mid, after.
Before:
LogicalProject(EXPR$0=[IN(3, {
LogicalProject(DEPTNO=[$7])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
})])
LogicalValues(tuples=[[{ 0 }]])
Mid, after the two rules that I understand you have recommended:
LogicalProject(EXPR$0=[IS NOT NULL($1)])
LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
LogicalValues(tuples=[[{ 0 }]])
LogicalAggregate(group=[{0}])
LogicalProject(cs=[true])
LogicalFilter(condition=[=(3, $0)])
LogicalProject(DEPTNO=[$7])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
After decorrelation;
LogicalProject(EXPR$0=[IS NOT NULL($1)])
LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
LogicalValues(tuples=[[{ 0 }]])
LogicalAggregate(group=[{0}])
LogicalProject(cs=[true])
LogicalProject(DEPTNO=[$7])
LogicalFilter(condition=[=(3, $7)])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
Notice that the decorrelation has not removed the LogicalCorrelate.
________________________________
From: suibianwanwan <[email protected]>
Sent: Wednesday, August 14, 2024 7:59 PM
To: dev <[email protected]>
Subject: Re: Re:Query planner structure
I'm trying to reproduce the problem, but the result of my convert Query doesn't
have the LogicalCorrelate.
So I tried to create a test in calcite org.apache.calcite.tools.PlannerTest as
follows:
Planner planner = getPlanner(null);
final String sql = "SELECT 3 in (SELECT \"deptno\" FROM
\"emps\")";
SqlNode parse = planner.parse(sql);
SqlNode validate = planner.validate(parse);
RelNode convert = planner.rel(validate).rel;
relnode as follows:
LogicalProject.NONE.[](input=LogicalValues#6,exprs=[IN(3, {
LogicalProject(deptno=[$1])
LogicalTableScan(table=[[hr, emps]])
})])
Do you have a way to reproduce this in calcite?
Regards,
suibianwanwan
------------------ Original ------------------
From:
"dev"
<[email protected]>;
Date: Thu, Aug 15, 2024 06:23 AM
To: "dev"<[email protected]>;
Subject: Re: Re:Query planner structure
I have tried this approach, but the decorrelator fails to decorrelate many
queries. Here is a simple query where after decorrelation there are still
LogicalCorrelate nodes left:
SELECT 3 in (SELECT empno FROM emp)
Maybe I am missing some optimization steps?
>From my reading of the code, the Calcite decorrelator seems to be a heuristic
>decorrelator, which only handles a few types of patterns of correlated
>subqueries. If your query doesn't fall into one of these patterns, the
>decorrelator leaves it unchanged.
For example, this query, after converting it to use LogicalCorrelate as you
have described, becomes:
LogicalProject(EXPR$0=[IS NOT NULL($1)]), id = 211
LogicalCorrelate(correlation=[$cor0],
joinType=[left], requiredColumns=[{}]), id = 213
LogicalValues(tuples=[[{ 0 }]]), id
= 180
LogicalAggregate(group=[{0}]), id =
207
LogicalProject(cs=[true]), id = 205
LogicalFilter(condition=[=(3, $0)]), id = 203
LogicalProject(EMPNO=[$0]), id = 201
LogicalTableScan(table=[[schema, EMP]]), id = 182
The call to RelDecorrelator.decorrelateQuery(rel, builder) cannot decorrelate
this query.
Do you have a solution that handles this case?
BTW: I also think that the FILTER_INTO_JOIN rule is unsound , e.g.:
https://issues.apache.org/jira/browse/CALCITE-5627
Thank you,
Mihai
________________________________
From: suibianwanwan33 <[email protected]>
Sent: Tuesday, August 13, 2024 10:31 PM
To: dev <[email protected]>
Subject: Re:Query planner structure
Hi, this is one of the classic optimization cases in my work.
First you need to add all CORRELATE optimizations such as
FILTER_SUB_QUERY_TO_CORRELATE in a HepPlanner, after that you need to do a
decorrelateQuery, and finally use a HepPlanner to add the JOIN_CONDITION_PUSH
and the FILTER_INTO_JOIN. my guess is that the Cartesian product of this query
will be optimized
------------------&nbsp;Original&nbsp;------------------
From:
"dev"
<[email protected]&gt;;
Date:&nbsp;Wed, Aug 14, 2024 12:51 PM
To:&nbsp;"[email protected]"<[email protected]&gt;;
Subject:&nbsp;Query planner structure
Hello all,
Calcite contains many optimization rules. It is not obvious to me how to
assemble them into a good planner. Can people recommend tested query planners
based on Calcite that are open-source and that can be used as a starting point?
In particular, I am having trouble making a HEP-planner produce a good plan for
Q21 from TPC-H; the planner I assembled produces many cartesian products.
For reference, here is the variant of Q21 that I am using:
create view q21 (
&nbsp;&nbsp;&nbsp; s_name,
&nbsp;&nbsp;&nbsp; numwait
) as
select
&nbsp;&nbsp;&nbsp; s_name,
&nbsp;&nbsp;&nbsp; count(*) as numwait
from
&nbsp;&nbsp;&nbsp; supplier,
&nbsp;&nbsp;&nbsp; lineitem l1,
&nbsp;&nbsp;&nbsp; orders,
&nbsp;&nbsp;&nbsp; nation
where
&nbsp;&nbsp;&nbsp; s_suppkey = l1.l_suppkey
&nbsp;&nbsp;&nbsp; and o_orderkey = l1.l_orderkey
&nbsp;&nbsp;&nbsp; and o_orderstatus = 'F'
&nbsp;&nbsp;&nbsp; and l1.l_receiptdate &gt; l1.l_commitdate
&nbsp;&nbsp;&nbsp; and exists (
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; select
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
*
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; from
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
lineitem l2
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
l2.l_orderkey = l1.l_orderkey
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
and l2.l_suppkey <&gt; l1.l_suppkey
&nbsp;&nbsp;&nbsp; )
&nbsp;&nbsp;&nbsp; and not exists (
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; select
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
*
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; from
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
lineitem l3
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
l3.l_orderkey = l1.l_orderkey
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
and l3.l_suppkey <&gt; l1.l_suppkey
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
and l3.l_receiptdate &gt; l3.l_commitdate
&nbsp;&nbsp;&nbsp; )
&nbsp;&nbsp;&nbsp; and s_nationkey = n_nationkey
&nbsp;&nbsp;&nbsp; and n_name = 'GERMANY'
group by
&nbsp;&nbsp;&nbsp; s_name
order by
&nbsp;&nbsp;&nbsp; numwait desc,
&nbsp;&nbsp;&nbsp; s_name
LIMIT 100;
Thank you for any suggestions!
Mihai