[ https://issues.apache.org/jira/browse/CALCITE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197921#comment-17197921 ]
James Starr commented on CALCITE-4242: -------------------------------------- Sorry, about not validating the previous query on postgres. The null conditional tripped me up. The correct query for postgres would be: {code:sql} // Some comments here public String getFoo() WITH q_present AS (SELECT TRUE as present FROM q), r_present_given_z AS (SELECT TRUE as present, z FROM r GROUP BY z), q_present_given_r_present_given_z AS ( SELECT distinct TRUE AS present, r_present_given_z.present AS r_present, r_present_given_z.z AS r_z FROM q_present LEFT JOIN r_present_given_z ON TRUE ) SELECT my_p.x FROM P my_p LEFT JOIN q_present_given_r_present_given_z ON q_present_given_r_present_given_z.r_z = my_p.x WHERE NOT( (q_present_given_r_present_given_z.present IS NOT NULL) AND NOT (q_present_given_r_present_given_z.r_present IS NOT NULL) ); {code} {code:sql} CREATE TABLE P(x INTEGER); CREATE TABLE Q(y INTEGER); CREATE TABLE R(z INTEGER); INSERT INTO P VALUES (1); INSERT INTO Q VALUES (1); x --- 1 (1 row) DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); INSERT INTO P VALUES (1); INSERT INTO P VALUES (0); INSERT INTO P VALUES (0); INSERT INTO Q VALUES (1); INSERT INTO Q VALUES (0); INSERT INTO R VALUES (1); INSERT INTO R VALUES (0); x --- 1 1 0 0 (4 rows) DELETE FROM P; DELETE FROM Q; DELETE FROM R; INSERT INTO P VALUES (1); x --- 1 (1 row) {code} The semi left join/distinct/aggregation handles the multiplicity. All of the conditional need to hoisted to the top where clause. If the joins are kept right-associative, then only top most query joined, in the example q_present_given_r_present_given_z, needs to have an aggregation. However, I think this could result in cross product joins if there are 2 nested queries in a query that is already nested. However, most likely the optimization rules should be able to rearrange the join such that it be right or left associative is not a question of performance, simply of correctness, assuming that the rewrite is in such a way that it does not block the rule. This query could be very expensive if it was right associative and the planner did not rearrange the joins. {code:sql} CREATE TABLE t_1(t1_id INTEGER); CREATE TABLE t_2(y INTEGER); CREATE TABLE t_2_1(t1_id INTEGER); CREATE TABLE t_2_2(t1_id INTEGER); SELECT * FROM t_1 WHERE EXISTS ( SELECT * FROM t_2 WHERE EXISTS(SELECT * FROM t_2_1 WHERE t1.t1_id = t2_1.t1_id) AND EXISTS(SELECT * FROM t_2_2 WHERE t1.t1_id = t2_2.t1_id) ); {code} So I believe the algorithm works out to something like this: {code:java} //De-morgan the expression such that you have a normalized AND expression List<RexNode> demorganSubExpression = demorganToAnd(subExpression); List<RexNode> resultExpression = new ArrayList<>(); for(RexNode subExpresion: demorganSubExpression){ if(hasSubQuery(subExpression)) { bb.hoiste(subExpression); } else { resultExpression.add(subExpression); } return buildAndExpression(resultExpression); {code} With bb.hoiste traversing up the tree until until all elements are in scope. > Wrong plan for nested NOT EXISTS subqueries > ------------------------------------------- > > Key: CALCITE-4242 > URL: https://issues.apache.org/jira/browse/CALCITE-4242 > Project: Calcite > Issue Type: Bug > Reporter: Martin Raszyk > Priority: Major > > Suppose we initialize an empty database as follows. > > {code:java} > CREATE TABLE P(x INTEGER); > CREATE TABLE Q(y INTEGER); > CREATE TABLE R(z INTEGER); > INSERT INTO P VALUES (1); > INSERT INTO Q VALUES (1);{code} > > The following query is supposed to yield an empty table as the result. > > {code:java} > SELECT x FROM P > WHERE NOT EXISTS ( > SELECT y FROM Q > WHERE NOT EXISTS ( > SELECT z FROM R > WHERE x = z > ) > ){code} > > However, the query is parsed and converted to the following plan > {code:java} > LogicalProject(X=[$0]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[=($0, $1)], joinType=[left]) > LogicalTableScan(table=[[Bug, P]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$1], $f0=[true]) > LogicalFilter(condition=[IS NULL($2)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalTableScan(table=[[Bug, Q]]) > LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) > LogicalProject(Z=[$0], $f0=[true]) > LogicalTableScan(table=[[Bug, R]]) > {code} > that corresponds to the following SQL query > {code:java} > SELECT P.X > FROM Bug.P > LEFT JOIN (SELECT t0.Z, MIN(TRUE) AS $f1 > FROM Bug.Q > LEFT JOIN (SELECT Z, MIN(TRUE) AS $f1 > FROM Bug.R > GROUP BY Z) AS t0 ON TRUE > WHERE t0.$f1 IS NULL > GROUP BY t0.Z) AS t3 ON P.X = t3.Z > WHERE t3.$f1 IS NULL > {code} > which yields the (non-empty) table P as the result. > Hence, the parsed and converted query is not equivalent to the input query. -- This message was sent by Atlassian Jira (v8.3.4#803005)