> Upon further review, it appears that a big part of this problem is > that cost_hashjoin() doesn't understand that it needs cost semi-joins > differently from inner or left joins. The bogus logic looks to be > right here: > startup_cost += hash_qual_cost.startup; > run_cost += hash_qual_cost.per_tuple * > outer_path_rows * clamp_row_est(inner_path_rows * > innerbucketsize) * 0.5; > > Of course, when the join type is JOIN_SEMI, we're going to stop > looking after we find the first match, so this estimate is really far > off.
The 8.3 version of cost_hashjoin() had a line like this: joininfactor = join_in_selectivity(&path->jpath, root); and a cost function like this: run_cost += hash_qual_cost.per_tuple * outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) * joininfactor * 0.5; This compensated for IN joins being able to stop scanning a bucket once a match is found. You may consider something similar for a semi-join. Having experimented with a lot of this code recently, there is some potential for improvement on the bucket sizes, etc., but it is a non-trivial problem. I tested a similar query on TPCH 100M1Z on version 8.3: select * from customer where c_custkey in (select o_custkey from orders) and found that hash aggregate was marginally faster. If you turn off aggregation, it selects an IN hash join which is about 5% slower and the planner is not too far off. So, it would be definitely possible to modify the cost function appropriately. > >>> It's tempting to have Hash cheat and just peek at the node beneath it > >>> to see if it's a HashAggregate, in which case it could call a special > >>> method to request the whole hash. But it would have to know that it's > >>> just a plain uniquify and not implementing a GROUP BY. If HashAggregate is faster, then the question is can you make it better by avoiding building the hash structure twice. I haven't considered all the possibilities, but the situation you have used as an example, an IN query, seems workable. Instead of translating to a hash aggregate/hash/hash join query plan, it may be possible to create a special hash join node that does uniquefy. The benefit is that the planner knows about it (instead of changing the execution plan), you can be more accurate on costs for the hash join, and you can optimize by using only one hash table construction. A challenge that must be dealt with is handling the multi-batch case. It appears that hash aggregate does not currently handle this, but I may be mistaken. -- Ramon Lawrence -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers