Re: Re:Query planner structure

Mihai Budiu Fri, 16 Aug 2024 16:47:42 -0700

The following RelOptRulesTest reproduces the problem:

  @Test void testDecorrelate() {
    final String query = "SELECT 3 in (SELECT deptno FROM emp)";
    sql(query).withRule(
        CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE,
            CoreRules.JOIN_TO_CORRELATE)
        .withLateDecorrelate(true)
        .check();
  }


This produces 3 plans: before, mid, after.
Before:

LogicalProject(EXPR$0=[IN(3, {
LogicalProject(DEPTNO=[$7])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
})])
  LogicalValues(tuples=[[{ 0 }]])

Mid, after the two rules that I understand you have recommended:

LogicalProject(EXPR$0=[IS NOT NULL($1)])
  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
    LogicalValues(tuples=[[{ 0 }]])
    LogicalAggregate(group=[{0}])
      LogicalProject(cs=[true])
        LogicalFilter(condition=[=(3, $0)])
          LogicalProject(DEPTNO=[$7])
            LogicalTableScan(table=[[CATALOG, SALES, EMP]])

After decorrelation;

LogicalProject(EXPR$0=[IS NOT NULL($1)])
  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
    LogicalValues(tuples=[[{ 0 }]])
    LogicalAggregate(group=[{0}])
      LogicalProject(cs=[true])
        LogicalProject(DEPTNO=[$7])
          LogicalFilter(condition=[=(3, $7)])
            LogicalTableScan(table=[[CATALOG, SALES, EMP]])

Notice that the decorrelation has not removed the LogicalCorrelate.
________________________________
From: suibianwanwan <[email protected]>
Sent: Wednesday, August 14, 2024 7:59 PM
To: dev <[email protected]>
Subject: Re: Re:Query planner structure

I'm trying to reproduce the problem, but the result of my convert Query doesn't 
have the LogicalCorrelate.


So I tried to create a test in calcite org.apache.calcite.tools.PlannerTest as 
follows:


&nbsp; &nbsp; Planner planner = getPlanner(null);
&nbsp; &nbsp; final String sql = "SELECT 3 in (SELECT \"deptno\" FROM 
\"emps\")";
&nbsp; &nbsp; SqlNode parse = planner.parse(sql);
&nbsp; &nbsp; SqlNode validate = planner.validate(parse);
&nbsp; &nbsp; RelNode convert = planner.rel(validate).rel;

relnode as follows:
LogicalProject.NONE.[](input=LogicalValues#6,exprs=[IN(3, {
LogicalProject(deptno=[$1])
&nbsp; LogicalTableScan(table=[[hr, emps]])
})])



Do you have a way to reproduce this in calcite?


Regards,
suibianwanwan
------------------&nbsp;Original&nbsp;------------------
From:                                                                           
                                             "dev"                              
                                                      <[email protected]&gt;;
Date:&nbsp;Thu, Aug 15, 2024 06:23 AM
To:&nbsp;"dev"<[email protected]&gt;;

Subject:&nbsp;Re: Re:Query planner structure



I have tried this approach, but the decorrelator fails to decorrelate many 
queries. Here is a simple query where after decorrelation there are still 
LogicalCorrelate nodes left:

SELECT 3 in (SELECT empno FROM emp)

Maybe I am missing some optimization steps?

>From my reading of the code, the Calcite decorrelator seems to be a heuristic 
>decorrelator, which only handles a few types of patterns of correlated 
>subqueries. If your query doesn't fall into one of these patterns, the 
>decorrelator leaves it unchanged.

For example, this query, after converting it to use LogicalCorrelate as you 
have described, becomes:

&nbsp;&nbsp;&nbsp; LogicalProject(EXPR$0=[IS NOT NULL($1)]), id = 211
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; LogicalCorrelate(correlation=[$cor0], 
joinType=[left], requiredColumns=[{}]), id = 213
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; LogicalValues(tuples=[[{ 0 }]]), id 
= 180
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; LogicalAggregate(group=[{0}]), id = 
207
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
LogicalProject(cs=[true]), id = 205
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
LogicalFilter(condition=[=(3, $0)]), id = 203
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
LogicalProject(EMPNO=[$0]), id = 201
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 LogicalTableScan(table=[[schema, EMP]]), id = 182

The call to RelDecorrelator.decorrelateQuery(rel, builder) cannot decorrelate 
this query.

Do you have a solution that handles this case?

BTW: I also think that the FILTER_INTO_JOIN rule is unsound , e.g.: 
https://issues.apache.org/jira/browse/CALCITE-5627

Thank you,
Mihai
________________________________
From: suibianwanwan33 <[email protected]&gt;
Sent: Tuesday, August 13, 2024 10:31 PM
To: dev <[email protected]&gt;
Subject: Re:Query planner structure

Hi, this is one of the classic optimization cases in my work.
First you need to add all CORRELATE optimizations such as 
FILTER_SUB_QUERY_TO_CORRELATE in a HepPlanner, after that you need to do a 
decorrelateQuery, and finally use a HepPlanner to add the JOIN_CONDITION_PUSH 
and the FILTER_INTO_JOIN. my guess is that the Cartesian product of this query 
will be optimized




------------------&amp;nbsp;Original&amp;nbsp;------------------
From:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 
"dev"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 <[email protected]&amp;gt;;
Date:&amp;nbsp;Wed, Aug 14, 2024 12:51 PM
To:&amp;nbsp;"[email protected]"<[email protected]&amp;gt;;

Subject:&amp;nbsp;Query planner structure



Hello all,

Calcite contains many optimization rules. It is not obvious to me how to 
assemble them into a good planner. Can people recommend tested query planners 
based on Calcite that are open-source and that can be used as a starting point?

In particular, I am having trouble making a HEP-planner produce a good plan for 
Q21 from TPC-H; the planner I assembled produces many cartesian products.

For reference, here is the variant of Q21 that I am using:

create view q21 (
&amp;nbsp;&amp;nbsp;&amp;nbsp; s_name,
&amp;nbsp;&amp;nbsp;&amp;nbsp; numwait
) as
select
&amp;nbsp;&amp;nbsp;&amp;nbsp; s_name,
&amp;nbsp;&amp;nbsp;&amp;nbsp; count(*) as numwait
from
&amp;nbsp;&amp;nbsp;&amp;nbsp; supplier,
&amp;nbsp;&amp;nbsp;&amp;nbsp; lineitem l1,
&amp;nbsp;&amp;nbsp;&amp;nbsp; orders,
&amp;nbsp;&amp;nbsp;&amp;nbsp; nation
where
&amp;nbsp;&amp;nbsp;&amp;nbsp; s_suppkey = l1.l_suppkey
&amp;nbsp;&amp;nbsp;&amp;nbsp; and o_orderkey = l1.l_orderkey
&amp;nbsp;&amp;nbsp;&amp;nbsp; and o_orderstatus = 'F'
&amp;nbsp;&amp;nbsp;&amp;nbsp; and l1.l_receiptdate &amp;gt; l1.l_commitdate
&amp;nbsp;&amp;nbsp;&amp;nbsp; and exists (
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 *
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 lineitem l2
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 l2.l_orderkey = l1.l_orderkey
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 and l2.l_suppkey <&amp;gt; l1.l_suppkey
&amp;nbsp;&amp;nbsp;&amp;nbsp; )
&amp;nbsp;&amp;nbsp;&amp;nbsp; and not exists (
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 *
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 lineitem l3
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 l3.l_orderkey = l1.l_orderkey
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 and l3.l_suppkey <&amp;gt; l1.l_suppkey
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
 and l3.l_receiptdate &amp;gt; l3.l_commitdate
&amp;nbsp;&amp;nbsp;&amp;nbsp; )
&amp;nbsp;&amp;nbsp;&amp;nbsp; and s_nationkey = n_nationkey
&amp;nbsp;&amp;nbsp;&amp;nbsp; and n_name = 'GERMANY'
group by
&amp;nbsp;&amp;nbsp;&amp;nbsp; s_name
order by
&amp;nbsp;&amp;nbsp;&amp;nbsp; numwait desc,
&amp;nbsp;&amp;nbsp;&amp;nbsp; s_name
LIMIT 100;

Thank you for any suggestions!
Mihai

Re: Re:Query planner structure

Reply via email to