So far, a suquery of ANY sublink located in WHERE/ON clause can't
reference vars exactly one level up, as long as pull-up into the join
tree is expected. Now that we have LATERAL subqueries (there seem to be
no specifics of SEMI JOIN when it comes to parameterization etc), I
think this restriction can be lifted. Thus a subplan should be avoided
often.

Not sure if something like that is applicable to EXISTS: various parts
are cut-off, so there are probably no vars having (varlevelsup == 1).

The attachments show cases where the SEMI JOIN should be inserted above
INNER JOIN and into the nullable side of OUTER JOIN respectively, each
before the patch is applied and after that.

So far I didn't test recursive processing, but don't expect problems here.

Can the change be as simple as this or do I neglect anything?

// Antonin Houska (Tony)
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 0df70c4..062175f 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1167,6 +1167,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	JoinExpr   *result;
 	Query	   *parse = root->parse;
 	Query	   *subselect = (Query *) sublink->subselect;
+	bool	subqueryLateral;
 	Relids		upper_varnos;
 	int			rtindex;
 	RangeTblEntry *rte;
@@ -1176,12 +1177,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
-	/*
-	 * The sub-select must not refer to any Vars of the parent query. (Vars of
-	 * higher levels should be okay, though.)
-	 */
-	if (contain_vars_of_level((Node *) subselect, 1))
-		return NULL;
+	subqueryLateral = contain_vars_of_level((Node *) subselect, 1);
 
 	/*
 	 * The test expression must contain some Vars of the parent query, else
@@ -1215,7 +1211,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	rte = addRangeTableEntryForSubquery(NULL,
 										subselect,
 										makeAlias("ANY_subquery", NIL),
-										false,
+										subqueryLateral,
 										false);
 	parse->rtable = lappend(parse->rtable, rte);
 	rtindex = list_length(parse->rtable);
EXPLAIN
SELECT	*
FROM	tab1
WHERE	i = ANY (SELECT  k
		FROM	tab2
		WHERE	k = j);
                          QUERY PLAN                          
--------------------------------------------------------------
 Seq Scan on tab1  (cost=0.00..39388.68 rows=1070 width=8)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on tab2  (cost=0.00..36.75 rows=11 width=4)
           Filter: (k = tab1.j)
(5 rows)

                             QUERY PLAN                             
--------------------------------------------------------------------
 Hash Join  (cost=73.64..76.50 rows=6 width=8)
   Hash Cond: (tab2.k = tab1.j)
   ->  HashAggregate  (cost=36.75..38.75 rows=200 width=4)
         ->  Seq Scan on tab2  (cost=0.00..31.40 rows=2140 width=4)
   ->  Hash  (cost=36.75..36.75 rows=11 width=8)
         ->  Seq Scan on tab1  (cost=0.00..36.75 rows=11 width=8)
               Filter: (j = i)
(7 rows)

EXPLAIN
SELECT	*
FROM	tab1 a
	LEFT JOIN
	tab1 b
	ON a.i = b.i
WHERE	a.i = ANY (
		SELECT  tab2.k
		FROM	tab2
		WHERE	k = a.j);
                              QUERY PLAN                              
----------------------------------------------------------------------
 Hash Left Join  (cost=58.15..39850.22 rows=11449 width=16)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on tab1 a  (cost=0.00..39388.68 rows=1070 width=8)
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Seq Scan on tab2  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (k = a.j)
   ->  Hash  (cost=31.40..31.40 rows=2140 width=8)
         ->  Seq Scan on tab1 b  (cost=0.00..31.40 rows=2140 width=8)
(9 rows)

                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Hash Right Join  (cost=76.57..116.64 rows=59 width=16)
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on tab1 b  (cost=0.00..31.40 rows=2140 width=8)
   ->  Hash  (cost=76.50..76.50 rows=6 width=8)
         ->  Hash Join  (cost=73.64..76.50 rows=6 width=8)
               Hash Cond: (tab2.k = a.i)
               ->  HashAggregate  (cost=36.75..38.75 rows=200 width=4)
                     ->  Seq Scan on tab2  (cost=0.00..31.40 rows=2140 width=4)
               ->  Hash  (cost=36.75..36.75 rows=11 width=8)
                     ->  Seq Scan on tab1 a  (cost=0.00..36.75 rows=11 width=8)
                           Filter: (i = j)
(11 rows)

CREATE TABLE tab1(i int, j int);
CREATE TABLE tab2(k int, l int);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to