On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1)
> once the table is built and there's free space in work_mem. The patch
> mentioned above makes implementing this possible / rather simple.
Another idea would be to start with NTUP_PER_BUCKET=1 and then, if we run out of memory, increase NTUP_PER_BUCKET. I'd like to think that the common case is that work_mem will be large enough that everything fits; and if you do it that way, then you save yourself the trouble of rehashing later, which, as you point out, might lose if there are only a few probes. If it turns out that you run short of memory, you can merge pairs of buckets up to three times, effectively doubling NTUP_PER_BUCKET each time.

Yet another idea is to stick with your scheme, but do the dynamic bucket splits lazily. Set a flag on each bucket indicating whether or not it needs a lazy split. When someone actually probes the hash table, if the flag is set for a particular bucket, move any tuples that don't belong as we scan the bucket. When we reach the end of the bucket chain, clear the flag.

I'm not sure either of these is better (though I kind of like the first one), but I thought I'd throw them out there...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company