Hi,

Thinking about this a bit more: do we really need to build the hash table on the first pass? Why not do this:

(1) batching
    - read the tuples, stuff them into a simple list
    - don't build the hash table yet

(2) building the hash table
    - we have all the tuples in a simple list, batching is done
    - we know the exact row count, so we can size the table properly
    - build the table

Also, maybe we could use a regular linear probing hash table [1], instead of the current implementation with NTUP_PER_BUCKET=1. (Although that'd be absolutely awful with duplicates.)

regards
Tomas

[1] http://en.wikipedia.org/wiki/Linear_probing

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
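
P.S. To make the two-phase idea a bit more concrete, here is a rough standalone sketch in C. It is not based on the actual nodeHash.c code: the Tuple, TupleList and ProbeTable types, all the function names, and the 50% load target are made up for illustration only. Phase 1 appends tuples to a plain growable list; phase 2 sizes an open-addressing table from the now-known row count and fills it using linear probing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuple: a precomputed hash plus a payload (not HashJoinTuple). */
typedef struct { uint32_t hash; int payload; } Tuple;

/* Phase 1 container: a plain growable list, no hashing involved. */
typedef struct { Tuple *items; size_t count, cap; } TupleList;

/* Append one tuple, doubling the array when full. Returns 0 on success. */
int list_append(TupleList *l, Tuple t)
{
    if (l->count == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 16;
        Tuple *p = realloc(l->items, ncap * sizeof(Tuple));
        if (p == NULL)
            return -1;
        l->items = p;
        l->cap = ncap;
    }
    l->items[l->count++] = t;
    return 0;
}

/* Phase 2 container: open-addressing table of pointers into the list. */
typedef struct { const Tuple **slots; size_t nslots; } ProbeTable;

/* Smallest power of two >= n. */
size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Build the table once the exact row count is known. The slot count is
 * chosen so the load stays below 50% (an assumed target), which also
 * guarantees that every probe sequence reaches an empty slot.
 */
int table_build(ProbeTable *t, const TupleList *l)
{
    t->nslots = next_pow2(l->count * 2 + 1);
    assert((t->nslots & (t->nslots - 1)) == 0);   /* power of two */
    t->slots = calloc(t->nslots, sizeof(*t->slots));
    if (t->slots == NULL)
        return -1;

    size_t mask = t->nslots - 1;
    for (size_t i = 0; i < l->count; i++) {
        size_t pos = l->items[i].hash & mask;
        while (t->slots[pos] != NULL)             /* linear probing */
            pos = (pos + 1) & mask;
        t->slots[pos] = &l->items[i];
    }
    return 0;
}

/* Count tuples with the given hash by walking the probe sequence. */
size_t table_count_matches(const ProbeTable *t, uint32_t hash)
{
    size_t mask = t->nslots - 1;
    size_t n = 0;
    for (size_t pos = hash & mask; t->slots[pos] != NULL; pos = (pos + 1) & mask)
        if (t->slots[pos]->hash == hash)
            n++;
    return n;
}
```

Because the table is sized only after the list is complete, it never needs to be resized during the build. The duplicates problem is visible here too: tuples with equal hash values pile up in adjacent slots, so every lookup for such a value has to walk the whole run.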