Andrew Gierth <and...@tao11.riddles.org.uk> writes:
> A quick test suggests that initializing the hash value to ~0 rather than
> 0 has a curious effect: the number of batches still explodes, but the
> performance does not suffer the same way.  (I think because almost all
> the batches end up empty.)  I think this is worth doing even in the
> absence of a more general solution; nulls are common enough and
> important enough that they shouldn't be the worst-case value if it can
> be avoided.
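
[Editorial note: the "almost all the batches end up empty" effect described above can be seen with a small standalone sketch. This is not PostgreSQL source; BUCKET_BITS and the bit-slicing below are a simplified stand-in for how nodeHash.c derives the batch number from a tuple's 32-bit hash value. The point it illustrates: every row that hashes to the same value is assigned to the same batch no matter how many times nbatch doubles, so all the null-keyed rows travel together whether their hash is 0 or ~0 --- the only difference is which single batch they overload.]

    #include <stdio.h>
    #include <stdint.h>

    /* assume 2^10 buckets per batch; the batch number uses the bits above those */
    #define BUCKET_BITS 10

    static int
    batch_for(uint32_t hashvalue, int nbatch)
    {
        /* nbatch is a power of two, so this selects log2(nbatch) hash bits */
        return (hashvalue >> BUCKET_BITS) & (nbatch - 1);
    }

    int
    main(void)
    {
        int nbatch;

        /* rows with hash 0 always land in batch 0; rows with hash ~0 always
         * land in the last batch; doubling nbatch never splits either group */
        for (nbatch = 2; nbatch <= 1024; nbatch *= 2)
            printf("nbatch %4d: hash 0 -> batch %4d, hash ~0 -> batch %4d\n",
                   nbatch, batch_for(0, nbatch), batch_for(~0U, nbatch));
        return 0;
    }

[So splitting batches never relieves the overloaded one; the remaining batches stay nearly empty, which matches the observation quoted above.]
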
I think that's unlikely to be a path to a good solution.  At least part
of the problem here is that estimate_hash_bucketsize() supposes that
nulls can be ignored --- which is normally true, and is what makes the
"nulls are common" argument less compelling in the general case, since
for most join types a null key never needs to enter the hash table at
all.  But in a RIGHT JOIN situation, nulls on the inner side have to be
considered as if they were regular keys.  Fixing that estimate would
probably be sufficient to dissuade the planner from choosing a hash join
in this example.

There may also be something we can do in the executor, but it would take
closer analysis to figure out what's going wrong.  I don't think kluging
the behavior for NULL in particular is the answer.

			regards, tom lane
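
[Editorial note: to make the planner-side point concrete, here is a toy, standalone illustration --- not the real estimate_hash_bucketsize() from selfuncs.c; the function and parameter names are invented, standing in for statistics the planner would normally pull from pg_statistic. If the null fraction is counted the way a most-common value would be, a nulls-heavy inner relation yields a far worse estimated bucket load, which is the kind of signal that would steer the planner away from a hash join.]

    #include <stdio.h>

    /*
     * Toy estimate of the fraction of inner rows landing in the most
     * heavily loaded hash bucket (invented illustration, not selfuncs.c).
     */
    static double
    worst_bucket_fraction(double mcv_freq,   /* freq. of most common non-null value */
                          double null_frac,  /* fraction of rows with a NULL key */
                          double avg_freq,   /* avg freq. per distinct non-null value */
                          int count_nulls)   /* treat nulls as regular keys? */
    {
        double worst = (mcv_freq > avg_freq) ? mcv_freq : avg_freq;

        if (count_nulls && null_frac > worst)
            worst = null_frac;  /* all nulls hash alike, hence pile into one bucket */
        return worst;
    }

    int
    main(void)
    {
        /* inner relation where 70% of the join keys are NULL */
        printf("nulls ignored: %.2f\n", worst_bucket_fraction(0.01, 0.70, 0.001, 0));
        printf("nulls counted: %.2f\n", worst_bucket_fraction(0.01, 0.70, 0.001, 1));
        return 0;
    }

[With nulls ignored the worst-bucket estimate stays at 1%; counting them as keys raises it to 70%, the kind of difference that would make the hash join look as expensive as it actually is in the RIGHT JOIN case.]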