I've been reviewing the hash index build patch submitted here: http://archives.postgresql.org/pgsql-patches/2007-10/msg00154.php
Although it definitely helps on large indexes, it's actually counterproductive on not-so-large ones. The test case I'm using is random integers generated like this: create table foo as select (random() * N)::int as f1 from generate_series(1,N); select count(*) from foo; -- force hint bit updates checkpoint; then timing create index fooi on foo using hash(f1); Using all-default configuration settings on some not-very-new hardware, at N = 1E6 I see 8.3.1: 30 sec With pre-expansion of index (CVS HEAD): 24 sec With sorting: 72 sec To build a btree index on same data: 34 sec Now this isn't amazingly surprising, because the original argument for doing sorting was to improve locality of access to the index during the build, and that only matters if you've got an index significantly bigger than memory. If the index fits in RAM then the sort is pure overhead. The obvious response to this is to use the sorting approach only when the estimated index size exceeds some threshold. One possible choice of threshold would be shared_buffers (or temp_buffers for a temp index) but I think that is probably too conservative, because in most scenarios the kernel's disk cache is available too. Plus you can't tweak that setting without a postmaster restart. I'm tempted to use effective_cache_size, which attempts to measure an appropriate number and can be set locally within the session doing the CREATE INDEX if necessary. Or we could invent a new GUC parameter, but that is probably overkill. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers