On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote: > On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: > > Dear PostgreSQL Hackers: > > > > After following the hackers mailing list for quite a while, > > I am going to start investigating what will need to be done > > to improve hash index performance. Below are the pieces of > > this project that I am currently considering: > > > > 1. Characterize the current hash index implementation against > > the BTree index, with a focus on space utilization and > > lookup performance against a collection of test data. This > > will give a baseline performance test to evaluate the impact > > of changes. I initially do not plan to bench the hash creation > > process since my initial focus will be on lookup performance. > > > > Here are very basic results for a table with 1.6m entries: > > postgres=# CREATE TABLE dict (word varchar(100)); > CREATE TABLE > > postgres=# COPY dict FROM '/tmp/words'; > COPY 1648379 > postgres=# select count(*) from dict; > count > --------- > 1648379 > (1 row) > > Time: 11187.418 ms > postgres=# select count(*) from dict; > count > --------- > 1648379 > (1 row) > > Time: 6040.912 ms > postgres=# CREATE INDEX wordhash ON dict USING hash (word); > CREATE INDEX > Time: 11108707.160 ms > > postgres=# select * from dict where word = 'avatar'; > word > -------- > avatar > (1 row) > > Time: 79.823 ms > postgres=# select * from dict where word = 'zebra'; > word > ------- > zebra > (1 row) > > Time: 9.864 ms > postgres=# select * from dict where word = 'turkey'; > word > -------- > turkey > (1 row) > > Time: 18.418 ms > Time: 1.045 ms > Time: 1.257 ms > Time: 1.080 ms > > postgres=# CREATE INDEX wordbtree ON dict USING btree (word); > CREATE INDEX > > Time: 25438.884 ms > > postgres=# select * from dict where word = 'avatar'; > word > -------- > avatar > (1 row) > > Time: 13.400 ms > postgres=# select * from dict where word = 'zebra'; > word > ------- > zebra > (1 row) > > Time: 1.173 ms > postgres=# select * from dict where word = 'turkey'; > word > -------- > turkey > (1 row) > > Time: 1.186 ms > Time: 1.103 ms > Time: 1.099 ms > Time: 1.108 ms > > ------------------------------ > Size of table = 87556096 > > Size of hash index = 268451840 > Size of btree index = 53510144 > > From my very small sample on an unloaded machine, a hash index lookup > took the least amount of time. It had a much larger initial time which > could be attributable to cache population effects. The size is 5X that > of the Btree index. I will continue to improve the test suite as more > granularity is needed. If anyone has a good data generator, please let > me know. Otherwise I will just roll my own. > > Regards, > Ken > I have just re-ran the previous benchmark against 8.3beta1 (versus 8.2) including the hash index sorted build patch and the results are below:
test=# CREATE TABLE dict (word varchar(100)); CREATE TABLE Time: 24.463 ms test=# COPY dict FROM '/tmp/cracklib-words'; LOG: checkpoints are occurring too frequently (21 seconds apart) HINT: Consider increasing the configuration parameter "checkpoint_segments". COPY 1648379 Time: 44470.235 ms test=# select count(*) from dict; count --------- 1648379 (1 row) Time: 4553.924 ms test=# CREATE INDEX wordhash ON dict USING hash (word); CREATE INDEX Time: 85905.960 ms test=# select * from dict where word = 'avatar'; word -------- avatar (1 row) Time: 592.906 ms test=# select * from dict where word = 'zebra'; word ------- zebra (1 row) Time: 21.458 ms test=# select * from dict where word = 'turkey'; word -------- turkey (1 row) Time: 22.532 ms Time: 1.224 ms Time: 1.200 ms Time: 1.264 ms test=# CREATE INDEX wordbtree ON dict USING btree (word); CREATE INDEX Time: 33714.874 ms test=# select * from dict where word = 'avatar'; word -------- avatar (1 row) Time: 24.296 ms test=# select * from dict where word = 'zebra'; word ------- zebra (1 row) Time: 1.400 ms test=# select * from dict where word = 'turkey'; word -------- turkey (1 row) Time: 28.302 ms Time: 1.284 ms Time: 1.313 ms --------------------------------------- Size of table = 69877760 Size of hash index = 268451840 Size of btree index = 48570368 I just wanted to have this baseline for future patches since 8.3 is just around the bend. As expected, the sorted build process is a huge improvement from the unsorted original. Regards, Ken Marshall ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly