Hi Tom:
Sorry for the delayed response! I think my knowledge has been refreshed
for this discussion.
> One thing I'm not at all clear about is whether we need to restrict
> the optimization so that it doesn't occur if the subquery contains
> outer references falling outside available_rels. I think that that
> case is covered by is_simple_subquery() deciding later to not pull up
> the subquery based on LATERAL restrictions, but maybe that misses
> something.
>
I think we need the restriction and that should be enough for this feature
. Given the query Richard provided before:
explain
select * from tenk1 A where exists
(select 1 from tenk2 B
where A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd));
It first can be converted to the below format without any issue.
SELECT * FROM tenk1 A SEMI JOIN tenk2 B
on A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd);
Then without the restriction, since we only pull the varnos from
sublink->testexpr, then it is {A}, so it convert to
SELECT * FROM
(tenk1 A SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C)
ON c.odd = b.odd AND a.hundred = v.hundred)
SEMI JOIN on tenk2 B ON TRUE;
then the above query is NOT A VALID QUERY since:
1. The above query is *not* same as
SELECT * FROM (tenk1 A SEMI JOIN tenk2 B) on true
SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C) v
ON v.odd = b.odd;
2. The above query requires b.odd when B is not available. So it is
right that an optimizer can't generate a plan for it. The fix would
be to do the restriction before applying this optimization.
I'm not sure pull-up-subquery can play any role here, IIUC, the bad thing
happens before pull-up-subquery.
I also write & analyze more test and found no issue by me
1. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up to rarg of the left join since A.hundred is not
available.
2. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
==> should not be pull-up to rarg of the left join since A.odd is not
available.
3. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should be pull-up to rarg of left join.
4. SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> pull-up as expected.
5. SELECT * FROM tenk1 A RIGHT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up into larg of left join since b.odd is not
available.
About the existing test case changes because of this patch, they do
requires on the sublink is planned to a subPlan, so I introduces the below
changes to keep the original intention.
Changes
A in (SELECT A FROM ..)
To
(A, random() > 0) in (SELECT a, random() > 0 FROM ..);
I'm also wondering whether the similar restriction in
> convert_EXISTS_sublink_to_join could be removed similarly.
> In this light it was a mistake for convert_EXISTS_sublink_to_join
> to manage the pullup itself rather than doing it in the two-step
> fashion that convert_ANY_sublink_to_join does it.
>
>
Yes, it is true! I prefer to believe this deserves a separate patch.
Any feedback is welcome!
--
Best Regards
Andy Fan
From 3e2c8b46928a7f246ee7066b9ad160cf18a2e952 Mon Sep 17 00:00:00 2001
From: Andy Fan <[email protected]>
Date: Wed, 5 Apr 2023 14:45:11 +0800
Subject: [PATCH v3] Pull up direct-correlated ANY_SUBLINK using lateral join.
---
.../postgres_fdw/expected/postgres_fdw.out | 6 +-
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +-
src/backend/optimizer/plan/subselect.c | 19 +++-
src/test/regress/expected/join.out | 14 +--
src/test/regress/expected/subselect.out | 102 ++++++++++++++++++
src/test/regress/sql/join.sql | 8 +-
src/test/regress/sql/subselect.sql | 32 ++++++
7 files changed, 164 insertions(+), 21 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 04a3ef450cf..d43f74e49bf 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11563,7 +11563,7 @@ CREATE FOREIGN TABLE foreign_tbl (b int)
CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
SERVER loopback OPTIONS (table_name 'base_tbl');
EXPLAIN (VERBOSE, COSTS OFF)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
QUERY PLAN
-----------------------------------------------------------------------------
Seq Scan on public.base_tbl
@@ -11571,7 +11571,7 @@ SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
Filter: (SubPlan 1)
SubPlan 1
-> Result
- Output: base_tbl.a
+ Output: base_tbl.a, (random() > '0'::double precision)
-> Append
-> Async Foreign Scan on public.foreign_tbl foreign_tbl_1
Remote SQL: SELECT NULL FROM public.base_tbl
@@ -11579,7 +11579,7 @@ SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
Remote SQL: SELECT NULL FROM public.base_tbl
(11 rows)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
a
---
1
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4f3088c03ea..8d775fdcf57 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3816,8 +3816,8 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
SERVER loopback OPTIONS (table_name 'base_tbl');
EXPLAIN (VERBOSE, COSTS OFF)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
-- Clean up
DROP FOREIGN TABLE foreign_tbl CASCADE;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 052263aea6d..4708a5363ac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1279,15 +1279,24 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
List *subquery_vars;
Node *quals;
ParseState *pstate;
+ Relids sub_ref_outer_relids = NULL;
+ bool use_lateral = false;
Assert(sublink->subLinkType == ANY_SUBLINK);
/*
- * The sub-select must not refer to any Vars of the parent query. (Vars of
- * higher levels should be okay, though.)
+ * If the sub-select refers to any Vars of the parent query, we have to
+ * treat it as LATERAL. (Vars of higher levels don't matter here.)
*/
- if (contain_vars_of_level((Node *) subselect, 1))
- return NULL;
+ sub_ref_outer_relids = pull_varnos_of_level(NULL, (Node *) subselect, 1);
+
+ if (!bms_is_empty(sub_ref_outer_relids))
+ {
+ if (bms_is_subset(sub_ref_outer_relids, available_rels))
+ use_lateral = true;
+ else
+ return NULL;
+ }
/*
* The test expression must contain some Vars of the parent query, else
@@ -1324,7 +1333,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
nsitem = addRangeTableEntryForSubquery(pstate,
subselect,
makeAlias("ANY_subquery", NIL),
- false,
+ use_lateral,
false);
rte = nsitem->p_rte;
parse->rtable = lappend(parse->rtable, rte);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5d59ed7890f..317d9240dfd 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4898,7 +4898,7 @@ reset enable_nestloop;
explain (costs off)
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
QUERY PLAN
----------------------------------------------------------
Hash Join
@@ -4914,7 +4914,7 @@ select a.unique1, b.unique2
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
unique1 | unique2
---------+---------
123 | 123
@@ -6686,12 +6686,12 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------
Nested Loop
Output: "*VALUES*".column1, t1.q1, t1.q2, ss2.q1, ss2.q2
-> Seq Scan on public.int8_tbl t1
@@ -6708,7 +6708,7 @@ lateral (select * from int8_tbl t1,
Filter: (SubPlan 3)
SubPlan 3
-> Result
- Output: t3.q2
+ Output: t3.q2, (random() > '0'::double precision)
One-Time Filter: $4
InitPlan 1 (returns $2)
-> Result
@@ -6725,7 +6725,7 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 22af8fafa17..49325058c5d 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1926,3 +1926,105 @@ select * from x for update;
Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3
(2 rows)
+-- Pull-up the direct-correlated ANY_SUBLINK
+explain (costs off)
+select * from tenk1 A where hundred in (select hundred from tenk2 B where B.odd = A.odd);
+ QUERY PLAN
+------------------------------------------------------------
+ Hash Join
+ Hash Cond: ((a.odd = b.odd) AND (a.hundred = b.hundred))
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> HashAggregate
+ Group Key: b.odd, b.hundred
+ -> Seq Scan on tenk2 b
+(7 rows)
+
+explain (costs off)
+select * from tenk1 A where exists
+(select 1 from tenk2 B
+where A.hundred in (select C.hundred FROM tenk2 C
+WHERE c.odd = b.odd));
+ QUERY PLAN
+---------------------------------
+ Nested Loop Semi Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Seq Scan on tenk2 b
+ SubPlan 1
+ -> Seq Scan on tenk2 c
+ Filter: (odd = b.odd)
+(8 rows)
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.hundred is not avaiable.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+---------------------------------
+ Nested Loop Left Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Seq Scan on tenk2 b
+ SubPlan 1
+ -> Seq Scan on tenk2 c
+ Filter: (odd = b.odd)
+(8 rows)
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.odd is not avaiable for this.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
+ QUERY PLAN
+---------------------------------
+ Nested Loop Left Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Seq Scan on tenk2 b
+ SubPlan 1
+ -> Seq Scan on tenk2 c
+ Filter: (odd = a.odd)
+(8 rows)
+
+-- should be able to pull up since all the references is available
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+------------------------------------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Hash Join
+ Hash Cond: ((b.odd = c.odd) AND (b.hundred = c.hundred))
+ -> Seq Scan on tenk2 b
+ -> Hash
+ -> HashAggregate
+ Group Key: c.odd, c.hundred
+ -> Seq Scan on tenk2 c
+(10 rows)
+
+-- we can pull up the sublink into the inner JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A INNER JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+-------------------------------------------------
+ Hash Join
+ Hash Cond: (c.odd = b.odd)
+ -> Hash Join
+ Hash Cond: (a.hundred = c.hundred)
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> HashAggregate
+ Group Key: c.odd, c.hundred
+ -> Seq Scan on tenk2 c
+ -> Hash
+ -> Seq Scan on tenk2 b
+(11 rows)
+
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index a630f58b571..e91d25c0915 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1721,11 +1721,11 @@ reset enable_nestloop;
explain (costs off)
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
--
-- test full-join strength reduction
@@ -2373,7 +2373,7 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
@@ -2382,7 +2382,7 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 40276708c99..b2dd094983d 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -968,3 +968,35 @@ select * from (with x as (select 2 as y) select * from x) ss;
explain (verbose, costs off)
with x as (select * from subselect_tbl)
select * from x for update;
+
+-- Pull-up the direct-correlated ANY_SUBLINK
+explain (costs off)
+select * from tenk1 A where hundred in (select hundred from tenk2 B where B.odd = A.odd);
+
+explain (costs off)
+select * from tenk1 A where exists
+(select 1 from tenk2 B
+where A.hundred in (select C.hundred FROM tenk2 C
+WHERE c.odd = b.odd));
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.hundred is not avaiable.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.odd is not avaiable for this.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
+
+-- should be able to pull up since all the references is available
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the sublink into the inner JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A INNER JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
--
2.21.0