Re: A new strategy for pull-up correlated ANY_SUBLINK

Alena Rybakina Fri, 13 Oct 2023 01:39:57 -0700

On 13.10.2023 10:04, Andy Fan wrote:


    It seems to me that the expressions "=" and "IN" are equivalent
    here due to the fact that the aggregated subquery returns only one
    value, and the result with the "IN" operation can be considered as
    the intersection of elements on the left and right. In this query,
    we have some kind of set on the left, among which there will be
    found or not only one element on the right.


Yes, they are equivalent at the final result, but there are some
differences at the execution level.  the '=' case will be transformed
to a Subplan whose subPlanType is EXPR_SUBLINK, so if there
is more than 1 rows is returned in the subplan, error will be raised.

select * from tenk1 where
  ten =  (select ten from tenk1 i where i.two = tenk1.two );

ERROR:  more than one row returned by a subquery used as an expression

However the IN case would not.
select * from tenk1 where
  ten =  (select ten from tenk1 i where i.two = tenk1.two ) is OK.

I think the test case you added is not related to this feature. the
difference is there even without the patch.  so I kept the code
you changed, but not for the test  case.

Yes, I understand and agree with you that we should delete the lastqueries, except to one.

The query below have a different result compared to master, and it iscorrect.



Without your patch:

explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Seq Scan on tenk2 b
               Filter: (SubPlan 2)
               SubPlan 2
                 ->  Result
                       InitPlan 1 (returns $1)
                         ->  Limit

-> Index Scan using tenk2_hundred ontenk2 c

                                     Index Cond: (hundred IS NOT NULL)
                                     Filter: (odd = b.odd)
(12 rows)


After your patch:

postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);

                           QUERY PLAN
--------------------------------------------------------------
  Nested Loop Left Join
    ->  Seq Scan on tenk1 a
    ->  Materialize
          ->  Nested Loop
                ->  Seq Scan on tenk2 b
*->  Subquery Scan on "ANY_subquery"
                      Filter: (b.hundred = "ANY_subquery".min)*
                      ->  Aggregate
                            ->  Seq Scan on tenk2 c
                                  Filter: (odd = b.odd)
(10 rows)

        I took the liberty of adding this to your patch and added
        myself as reviewer, if you don't mind.

    Sure, the patch after your modification looks better than the
    original.
    I'm not sure how the test case around "because of got one row" is
    relevant to the current changes.  After we reach to some agreement
    on the above discussion, I think v4 is good for committer to review!


    Thank you!) I am ready to discuss it.

Actually I meant to discuss the "Unfortunately, I found a request..",looks

we have reached an agreement there:)

Yes, we have)

--
Regards,
Alena Rybakina

diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 17df6b5dc9c..e41b728df83 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,3 +2028,27 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
          ->  Seq Scan on tenk2 b
 (11 rows)
 
+-- we can pull up the aggregate sublink into RHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on tenk2 b
+               ->  Memoize
+                     Cache Key: b.hundred, b.odd
+                     Cache Mode: binary
+                     ->  Subquery Scan on "ANY_subquery"
+                           Filter: (b.hundred = "ANY_subquery".min)
+                           ->  Result
+                                 InitPlan 1 (returns $1)
+                                   ->  Limit
+                                         ->  Index Scan using tenk2_hundred on tenk2 c
+                                               Index Cond: (hundred IS NOT NULL)
+                                               Filter: (odd = b.odd)
+(16 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 49ce5fc99a8..2f3601a0582 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -1000,3 +1000,8 @@ ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
 explain (costs off)
 SELECT * FROM tenk1 A INNER JOIN tenk2 B
 ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into RHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
\ No newline at end of file

Re: A new strategy for pull-up correlated ANY_SUBLINK

Reply via email to