Kevin Grittner <kgri...@ymail.com> writes:
> Kevin Grittner <kgri...@ymail.com> wrote:
>> I applied it to master and ran the regression tests, and one of
>> the subselect tests failed.
>> 
>> This query:
>> 
>> SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second
>> Field"
>>    FROM SUBSELECT_TBL upper
>>    WHERE f1 IN
>>      (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);

>> [ ... ] during the `make check` or `make install-check` [ ... ]
>> is missing the last two rows.  Oddly, if I go into the database
>> later and try it, the rows show up.  It's not immediately
>> apparent to me what's wrong.

> Using the v2 patch, with the default statistics from table
> creation, the query modified with an alias of "lower" for the
> second reference, just for clarity, yields a plan which generates
> incorrect results:

>  Hash Join  (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 
> rows=3 loops=1)
>    Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = 
> lower.f2))
>    ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16) 
> (actual time=0.006..0.007 rows=8 loops=1)
>    ->  Hash  (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 
> rows=5 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 1kB
>          ->  HashAggregate  (cost=32.12..34.12 rows=200 width=12) (actual 
> time=0.014..0.018 rows=6 loops=1)
>                ->  Seq Scan on subselect_tbl lower  (cost=0.00..27.70 
> rows=1770 width=12) (actual time=0.002..0.004 rows=8 loops=1)
>  Total runtime: 0.111 ms

FWIW, that plan isn't obviously wrong; if it is broken, most likely the
reason is that the HashAggregate is incorrectly unique-ifying the lower
table.  (Unfortunately, EXPLAIN doesn't show enough about the HashAgg
to know what it's doing exactly.)  The given query is, I think, in
principle equivalent to

 SELECT ...
  FROM SUBSELECT_TBL upper
  WHERE (f1, f2::float) IN
    (SELECT f2, f3 FROM SUBSELECT_TBL);

and if you ask unmodified HEAD to plan that you get

 Hash Join  (cost=41.55..84.83 rows=442 width=16)
   Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double precision 
= subselect_tbl.f3))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16)
   ->  Hash  (cost=38.55..38.55 rows=200 width=12)
         ->  HashAggregate  (cost=36.55..38.55 rows=200 width=12)
               ->  Seq Scan on subselect_tbl  (cost=0.00..27.70 rows=1770 
width=12)

which is the same thing at the visible level of detail ... but this
version computes the correct result.  The cost of the HashAggregate is
estimated higher, though, which suggests that maybe it's distinct'ing on
two columns where the bogus plan only does one.

Not sure about where Antonin's patch is going off the rails.  I suspect
it's too simple somehow, but it's also possible that it's OK and the
real issue is some previously undetected bug in LATERAL processing.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to