Re: [HACKERS] GIN index build speed

Teodor Sigaev Tue, 02 Dec 2008 04:38:53 -0800

The issue is that the GIN index build code accumulates the lexemes into a binary tree, but there's nothing to keep the tree balanced. My test case, with almost monotonically increasing keys, happens to be a worst-case scenario, and the tree degenerates into almost a linked list that every insertion has to grovel through.

Agreed, but in most cases it works well, because lexemes in documents aren't
ordered.
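
To make the difference concrete, here is a minimal sketch (not the GIN build code; the function name and the list-based node layout are made up for illustration) that inserts keys into a plain unbalanced binary search tree and reports how deep it gets. With sorted keys every insertion walks the whole chain; with shuffled keys the depth stays small.

```python
import random


def bst_depth_after_inserts(keys):
    """Insert keys into a plain (unbalanced) BST and return its maximum depth.

    Hypothetical helper for illustration only. Each node is a list
    [key, left, right]; duplicates go to the right.
    """
    root = None
    max_depth = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2  # 1 = left, 2 = right
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        max_depth = max(max_depth, depth)
    return max_depth


if __name__ == "__main__":
    n = 1000
    ordered = list(range(n))
    shuffled = ordered[:]
    random.Random(0).shuffle(shuffled)
    # Sorted input degenerates into a chain as deep as the number of keys.
    print("sorted keys:  ", bst_depth_after_inserts(ordered))
    # Shuffled input gives a much shallower tree.
    print("shuffled keys:", bst_depth_after_inserts(shuffled))
```

Since each insertion costs as many steps as the depth it lands at, the sorted case does quadratic work overall, which matches the slow build on almost monotonic keys.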

The obvious fix is to use a balanced tree algorithm. I wrote a quick patch to turn the tree into a splay tree. That fixed the degenerative behavior, and the runtime of CREATE INDEX for the above test case fell from 40s to 1.5s.

BTW, your patch helps GIN's btree emulation. In typical uses of btree emulation, the scalar column will be more or less ordered.
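
For readers unfamiliar with splay trees, a rough sketch of splay insertion follows. It is a textbook recursive formulation, not the attached patch; `Node`, `splay`, `insert`, and `in_order` are illustrative names, and recursion is used only for brevity. The point is that each insert moves the touched node to the root, so with increasing keys the previous maximum is already at the root and the next insert stops after one comparison.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def rotate_right(x):
    y = x.left
    x.left = y.right
    y.right = x
    return y


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y


def splay(root, key):
    """Move key, or the last node on its search path, to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:  # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:  # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:  # zig-zig
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:  # zig-zag
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def insert(root, key):
    """Insert key and return the new root (the inserted or existing node)."""
    if root is None:
        return Node(key)
    root = splay(root, key)
    if root.key == key:
        return root
    node = Node(key)
    if key < root.key:
        node.right, node.left, root.left = root, root.left, None
    else:
        node.left, node.right, root.right = root, root.right, None
    return node


def in_order(root):
    """Iterative in-order traversal, safe for very deep trees."""
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out
```

Note that the tree itself can still be a long chain after monotonic inserts; what changes is that the insertion point is always near the root, so the build no longer walks the chain.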

Magnus kindly gave me a dump of the full-text-search tables from search.postgresql.org, for some real-world testing. Quick testing with that suggests that the patch unfortunately makes the index build 5-10% slower with that data set.

Do you see ways to improve that?

We're in commitfest, not supposed to be submitting new features, so I'm not going to pursue this further right now. Patch attached, however, which seems to work fine.

Personally, I don't object to improving that.
--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/
