Kevin Grittner <kgri...@ymail.com> writes: > Kevin Grittner <kgri...@ymail.com> wrote: >> I applied it to master and ran the regression tests, and one of >> the subselect tests failed. >> >> This query: >> >> SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second >> Field" >> FROM SUBSELECT_TBL upper >> WHERE f1 IN >> (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);
>> [ ... ] during the `make check` or `make install-check` [ ... ] >> is missing the last two rows. Oddly, if I go into the database >> later and try it, the rows show up. It's not immediately >> apparent to me what's wrong. > Using the v2 patch, with the default statistics from table > creation, the query modified with an alias of "lower" for the > second reference, just for clarity, yields a plan which generates > incorrect results: > Hash Join (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 > rows=3 loops=1) > Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = > lower.f2)) > -> Seq Scan on subselect_tbl upper (cost=0.00..27.70 rows=1770 width=16) > (actual time=0.006..0.007 rows=8 loops=1) > -> Hash (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 > rows=5 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 1kB > -> HashAggregate (cost=32.12..34.12 rows=200 width=12) (actual > time=0.014..0.018 rows=6 loops=1) > -> Seq Scan on subselect_tbl lower (cost=0.00..27.70 > rows=1770 width=12) (actual time=0.002..0.004 rows=8 loops=1) > Total runtime: 0.111 ms FWIW, that plan isn't obviously wrong; if it is broken, most likely the reason is that the HashAggregate is incorrectly unique-ifying the lower table. (Unfortunately, EXPLAIN doesn't show enough about the HashAgg to know what it's doing exactly.) The given query is, I think, in principle equivalent to SELECT ... FROM SUBSELECT_TBL upper WHERE (f1, f2::float) IN (SELECT f2, f3 FROM SUBSELECT_TBL); and if you ask unmodified HEAD to plan that you get Hash Join (cost=41.55..84.83 rows=442 width=16) Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double precision = subselect_tbl.f3)) -> Seq Scan on subselect_tbl upper (cost=0.00..27.70 rows=1770 width=16) -> Hash (cost=38.55..38.55 rows=200 width=12) -> HashAggregate (cost=36.55..38.55 rows=200 width=12) -> Seq Scan on subselect_tbl (cost=0.00..27.70 rows=1770 width=12) which is the same thing at the visible level of detail ... but this version computes the correct result. The cost of the HashAggregate is estimated higher, though, which suggests that maybe it's distinct'ing on two columns where the bogus plan only does one. Not sure about where Antonin's patch is going off the rails. I suspect it's too simple somehow, but it's also possible that it's OK and the real issue is some previously undetected bug in LATERAL processing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers