Simon, * Simon Riggs (si...@2ndquadrant.com) wrote: > Previous discussions of Hash Joins have noted that the performance > decreases when the average number of tuples per bucket increases.
Having the hash table so small that we have hash bucket collisions with different 32bit hash values is extremely painful, however.. > The correct calculation that would match the objective set out in the > comment would be > > dbuckets = (hash_table_bytes / tupsize) / NTUP_PER_BUCKET; This looks to be driving the size of the hash table size off of "how many of this size tuple can I fit into memory?" and ignoring how many actual rows we have to hash. Consider a work_mem of 1GB with a small number of rows to actually hash- say 50. With a tupsize of 8 bytes, we'd be creating a hash table sized for some 13M buckets. > This solves the problem in earlier discussions since we get a lower > average number of tuples per bucket and yet we also get to keep the > current NTUP_PER_BUCKET value. Everybody wins. I agree that this should reduce the number of tuples per bucket, but I don't think it's doing what you expect and based on what it's doing, it seems wasteful. > Comments? Based on your argument that we want to have a bucket load which is, on average, the size of NTUP_PER_BUCKET, I have to '-1' on this. What we want is to size a table large enough that we never have any true collisions (cases where different 32-bit hash values end up in the same bucket) *for all tuples being hashed*, that includes both the side building the hash table and the side doing the scan. This should be done when we can avoid batching- if we don't have enough to avoid batching while having such a large table, we should consider adjusting the hash table size down some because, in those cases, we're memory constrained. When we have enough memory to avoid batching, we never want to have to check down through a bucket for a tuple which doesn't exist- we should simply be able to look in the hash table itself, determine (with a single fetch) that there are no entries for that hash value, and throw the tuple away and move on to the next. Thanks, Stephen
signature.asc
Description: Digital signature