[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

James Starr (Jira) Thu, 17 Sep 2020 12:42:26 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197921#comment-17197921
 ]


James Starr commented on CALCITE-4242:
--------------------------------------

Sorry, about not validating the previous query on postgres.  The null 
conditional tripped me up.  The correct query for postgres would be:

{code:sql}
// Some comments here
public String getFoo()
 WITH
q_present AS (SELECT TRUE as present FROM q),
r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z),
q_present_given_r_present_given_z AS (
    SELECT distinct TRUE AS present, r_present_given_z.present AS r_present, 
r_present_given_z.z AS r_z
    FROM q_present
    LEFT JOIN r_present_given_z ON TRUE
)
SELECT my_p.x
FROM P my_p
LEFT JOIN q_present_given_r_present_given_z ON 
q_present_given_r_present_given_z.r_z = my_p.x
WHERE NOT(
  (q_present_given_r_present_given_z.present IS NOT NULL)
  AND NOT (q_present_given_r_present_given_z.r_present IS NOT NULL)
);
{code}

{code:sql}
CREATE TABLE P(x INTEGER);
CREATE TABLE Q(y INTEGER);
CREATE TABLE R(z INTEGER);
INSERT INTO P VALUES (1);
INSERT INTO Q VALUES (1);

 x 
---
 1
(1 row)

DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (1);
INSERT INTO P VALUES (0);
INSERT INTO P VALUES (0);
INSERT INTO Q VALUES (1);
INSERT INTO Q VALUES (0);
INSERT INTO R VALUES (1);
INSERT INTO R VALUES (0);

 x 
---
 1
 1
 0
 0
(4 rows)

DELETE FROM P;
DELETE FROM Q;
DELETE FROM R;
INSERT INTO P VALUES (1);

 x 
---
 1
(1 row)

{code}

The semi left join/distinct/aggregation handles the multiplicity.  All of the 
conditional need to hoisted to the top where clause.  If the joins are kept 
right-associative, then only top most query joined, in the example 
q_present_given_r_present_given_z, needs to have an aggregation.  However, I 
think this could result in cross product joins if there are 2 nested queries in 
a query that is already nested.  However, most likely the optimization rules 
should be able to rearrange the join such that it be right or left associative 
is not a question of performance, simply of correctness, assuming that the 
rewrite is in such a way that it does not block the rule. 

This query could be very expensive if it was right associative and the planner 
did not rearrange the joins.  
{code:sql}
CREATE TABLE t_1(t1_id INTEGER);
CREATE TABLE t_2(y INTEGER);
CREATE TABLE t_2_1(t1_id INTEGER);
CREATE TABLE t_2_2(t1_id INTEGER);

SELECT * FROM t_1
WHERE EXISTS (
  SELECT * FROM t_2
  WHERE EXISTS(SELECT * FROM t_2_1 WHERE t1.t1_id = t2_1.t1_id)
  AND EXISTS(SELECT * FROM t_2_2 WHERE t1.t1_id = t2_2.t1_id)
);
{code}

So I believe the algorithm works out to something like this:
{code:java}
//De-morgan the expression such that you have a normalized AND expression
List<RexNode> demorganSubExpression = demorganToAnd(subExpression);
List<RexNode> resultExpression = new ArrayList<>();
for(RexNode subExpresion:  demorganSubExpression){
  if(hasSubQuery(subExpression)) {
    bb.hoiste(subExpression);
  } else {
    resultExpression.add(subExpression);
  }
  return buildAndExpression(resultExpression);
{code}

With bb.hoiste traversing up the tree until until all elements are in scope.


> Wrong plan for nested NOT EXISTS subqueries
> -------------------------------------------
>
>                 Key: CALCITE-4242
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4242
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Martin Raszyk
>            Priority: Major
>
> Suppose we initialize an empty database as follows.
>  
> {code:java}
> CREATE TABLE P(x INTEGER);
> CREATE TABLE Q(y INTEGER);
> CREATE TABLE R(z INTEGER);
> INSERT INTO P VALUES (1);
> INSERT INTO Q VALUES (1);{code}
>  
> The following query is supposed to yield an empty table as the result.
>  
> {code:java}
> SELECT x FROM P
> WHERE NOT EXISTS (
>   SELECT y FROM Q
>   WHERE NOT EXISTS (
>     SELECT z FROM R
>     WHERE x = z
>   )
> ){code}
>  
> However, the query is parsed and converted to the following plan
> {code:java}
> LogicalProject(X=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
>     LogicalJoin(condition=[=($0, $1)], joinType=[left])
>       LogicalTableScan(table=[[Bug, P]])
>       LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         LogicalProject(Z=[$1], $f0=[true])
>           LogicalFilter(condition=[IS NULL($2)])
>             LogicalJoin(condition=[true], joinType=[left])
>               LogicalTableScan(table=[[Bug, Q]])
>               LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>                 LogicalProject(Z=[$0], $f0=[true])
>                   LogicalTableScan(table=[[Bug, R]])
> {code}
> that corresponds to the following SQL query
> {code:java}
> SELECT P.X
> FROM Bug.P
> LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1
> FROM Bug.Q
> LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1
> FROM Bug.R
> GROUP BY Z) AS t0 ON TRUE
> WHERE t0.$f1 IS NULL
> GROUP BY t0.Z) AS t3 ON P.X = t3.Z
> WHERE t3.$f1 IS NULL
> {code}
> which yields the (non-empty) table P as the result.
> Hence, the parsed and converted query is not equivalent to the input query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-4242) Wrong plan for nested NOT EXISTS subqueries

Reply via email to