Neil Conway wrote: > You might find this patch useful: > > http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
Oh, I had forgot about that. > It implements the "just store the hash in the index" idea; it also sorts > the entries in a bucket by the hash value, which allows binary search to > be used to locate candidate matches. > > I was surprised that this didn't result in a performance improvement for > the benchmarks that I ran, but I never got around to investigating > further (either running more benchmarks or checking whether there was a > bug in the implementation). You did get a smaller index, which has cache benefits with larger tables. You didn't compare the size comparison against b-tree, but a smaller index is a good thing. I think you left some money on the table on that front. Since all the HashItems are fixed size, you can get rid of the line pointers altogether. Given that a sizeof(HashItemData) is 8 bytes, and a line pointer is 4 bytes, that should give a further 1/3 reduction in index size. If you're willing to give up on the alignment of HashItemData, you can get it down to 6 bytes. Even from a CPU point of view, as the table becomes bigger and the b-tree becomes deeper, the difference between O(1) and O(log n) lookup starts to become more significant. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings