On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: > Dear PostgreSQL Hackers: > > After following the hackers mailing list for quite a while, > I am going to start investigating what will need to be done > to improve hash index performance. Below are the pieces of > this project that I am currently considering: > > 1. Characterize the current hash index implementation against > the BTree index, with a focus on space utilization and > lookup performance against a collection of test data. This > will give a baseline performance test to evaluate the impact > of changes. I initially do not plan to bench the hash creation > process since my initial focus will be on lookup performance. >
Here are very basic results for a table with 1.6m entries: postgres=# CREATE TABLE dict (word varchar(100)); CREATE TABLE postgres=# COPY dict FROM '/tmp/words'; COPY 1648379 postgres=# select count(*) from dict; count --------- 1648379 (1 row) Time: 11187.418 ms postgres=# select count(*) from dict; count --------- 1648379 (1 row) Time: 6040.912 ms postgres=# CREATE INDEX wordhash ON dict USING hash (word); CREATE INDEX Time: 11108707.160 ms postgres=# select * from dict where word = 'avatar'; word -------- avatar (1 row) Time: 79.823 ms postgres=# select * from dict where word = 'zebra'; word ------- zebra (1 row) Time: 9.864 ms postgres=# select * from dict where word = 'turkey'; word -------- turkey (1 row) Time: 18.418 ms Time: 1.045 ms Time: 1.257 ms Time: 1.080 ms postgres=# CREATE INDEX wordbtree ON dict USING btree (word); CREATE INDEX Time: 25438.884 ms postgres=# select * from dict where word = 'avatar'; word -------- avatar (1 row) Time: 13.400 ms postgres=# select * from dict where word = 'zebra'; word ------- zebra (1 row) Time: 1.173 ms postgres=# select * from dict where word = 'turkey'; word -------- turkey (1 row) Time: 1.186 ms Time: 1.103 ms Time: 1.099 ms Time: 1.108 ms ------------------------------ Size of table = 87556096 Size of hash index = 268451840 Size of btree index = 53510144 >From my very small sample on an unloaded machine, a hash index lookup took the least amount of time. It had a much larger initial time which could be attributable to cache population effects. The size is 5X that of the Btree index. I will continue to improve the test suite as more granularity is needed. If anyone has a good data generator, please let me know. Otherwise I will just roll my own. Regards, Ken ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match