[HACKERS] Hash index build patch has worse performance at small table sizes

Tom Lane Sun, 16 Mar 2008 13:27:24 -0700

I've been reviewing the hash index build patch submitted here:
http://archives.postgresql.org/pgsql-patches/2007-10/msg00154.php


Although it definitely helps on large indexes, it's actually
counterproductive on not-so-large ones.  The test case I'm using
is random integers generated like this:
        create table foo as select (random() * N)::int as f1
          from generate_series(1,N);
        select count(*) from foo; -- force hint bit updates
        checkpoint;
then timing
        create index fooi on foo using hash(f1);

Using all-default configuration settings on some not-very-new hardware,
at N = 1E6 I see

8.3.1:                                  30 sec
With pre-expansion of index (CVS HEAD): 24 sec
With sorting:                           72 sec
To build a btree index on same data:    34 sec

Now this isn't amazingly surprising, because the original argument for
doing sorting was to improve locality of access to the index during
the build, and that only matters if you've got an index significantly
bigger than memory.  If the index fits in RAM then the sort is pure
overhead.

The obvious response to this is to use the sorting approach only when
the estimated index size exceeds some threshold.  One possible choice of
threshold would be shared_buffers (or temp_buffers for a temp index)
but I think that is probably too conservative, because in most scenarios
the kernel's disk cache is available too.  Plus you can't tweak that
setting without a postmaster restart.  I'm tempted to use
effective_cache_size, which attempts to measure an appropriate number
and can be set locally within the session doing the CREATE INDEX if
necessary.  Or we could invent a new GUC parameter, but that is probably
overkill.

Comments?

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Hash index build patch has *worse* performance at small table sizes

Reply via email to

[HACKERS] Hash index build patch has worse performance at small table sizes