On Thu, Feb 19, 2009 at 7:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Feb 19, 2009 at 1:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> [ back to planner stuff after a hiatus ]
>
>> Well, as I wrote upthread:
>
> What you're actually suggesting is modifying the executor to incorporate
> the unique-fication logic into hashjoin and/or mergejoin.  Maybe, but
> that code is way too complex already for my taste (especially mergejoin)
> and what we'd save is, hmm, four lines in the planner.

I'm not entirely following the implications for semijoins but I know
I've noticed more than a few cases where an option to Hash to only
gather unique values seems like it would be a win.

Consider cases like this where we hash the values twice:

postgres=# explain select * from generate_series(1,1000) as a(i) where i in (select * from generate_series(1,100) as b(i));
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Hash Join  (cost=19.50..45.75 rows=1000 width=4)
   Hash Cond: (a.i = b.i)
   ->  Function Scan on generate_series a  (cost=0.00..12.50 rows=1000 width=4)
   ->  Hash  (cost=17.00..17.00 rows=200 width=4)
         ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4)
               ->  Function Scan on generate_series b  (cost=0.00..12.50 rows=1000 width=4)
(6 rows)

It's tempting to have Hash cheat and just peek at the node beneath it
to see if it's a HashAggregate, in which case it could call a special
method to request the whole hash. But it would have to know that it's
just a plain uniquify and not implementing a GROUP BY.

-- 
greg
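
To make the "Hash that only gathers unique values" idea concrete, here is a rough sketch in Python rather than executor C. It is only an illustration of the technique, not PostgreSQL code: the function name hash_semijoin and the use of a Python set as the hash table are assumptions made for the example. The build phase drops duplicates while hashing, so no separate HashAggregate pass over the inner side is needed before the probe.

```python
def hash_semijoin(outer, inner):
    """Return the outer keys that have at least one match in inner.

    Hypothetical helper for illustration; it mimics a Hash node that
    keeps each inner key only once while building its table.
    """
    # Build phase: inserting into a set discards duplicate inner keys
    # as they arrive, so the inner side is hashed exactly once.
    unique_keys = set()
    for key in inner:
        unique_keys.add(key)

    # Probe phase: every outer key is looked up once. Because the table
    # holds unique keys, an outer row can never be emitted twice.
    return [key for key in outer if key in unique_keys]


# Same shape as the query above: a(i) is 1..1000, b(i) is 1..100.
# b is repeated three times here to show that duplicates do no harm.
a = range(1, 1001)
b = list(range(1, 101)) * 3
result = hash_semijoin(a, b)
```

This only covers the plain uniquify case; a HashAggregate that implements a GROUP BY with aggregates carries per-group state that a join hash table could not simply reuse.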