On Aug 23, 2011, at 2:03 AM, Heikki Linnakangas wrote:
> While looking at Alexander's GiST fastbuild patch, which adds buffers to
> internal nodes to avoid random I/O during index build, it occurred to me that
> inserting the tuples to the leaf pages one at a time is quite inefficient
> too, even if the leaf pages are in cache. There's still the overhead of
> locking and WAL-logging each insertion separately. I think we could get a
> nice further speedup if we attach a small buffer (one block or so) to every
> leaf page we're currently writing tuples to, and update the leaf page in
> bulk. Conveniently, the code to insert multiple tuples to a page already
> exists in GiST code (because inserting a tuple sometimes splits the page into
> more than two parts, so you need to insert multiple downlinks to the parent),
> so this requires no changes to the low-level routines and WAL-logging.
>
> Let's finish off the main fastbuild patch first, but I wanted to get the idea
> out there.

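For illustration only, here is a toy Python model of the batching idea quoted above. It is not taken from the fastbuild patch and does not use any PostgreSQL API: LeafPage, BufferedLeaf, and both build functions are hypothetical names, and the model assumes each call that touches the page costs exactly one lock acquisition and one WAL record. It ignores page capacity and splits entirely.

```python
from dataclasses import dataclass, field


@dataclass
class LeafPage:
    """Toy stand-in for a leaf page (hypothetical, not a PostgreSQL structure)."""
    tuples: list = field(default_factory=list)
    lock_acquisitions: int = 0
    wal_records: int = 0

    def insert_many(self, new_tuples):
        # Assumption: one lock acquisition and one WAL record per call,
        # no matter how many tuples the call carries.
        self.lock_acquisitions += 1
        self.wal_records += 1
        self.tuples.extend(new_tuples)


class BufferedLeaf:
    """Hypothetical small buffer attached to a leaf page, flushed in bulk."""

    def __init__(self, page, buffer_size):
        self.page = page
        self.buffer_size = buffer_size
        self.pending = []

    def insert(self, tup):
        # Collect tuples locally; touch the page only when the buffer fills.
        self.pending.append(tup)
        if len(self.pending) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.page.insert_many(self.pending)
            self.pending = []


def build_unbuffered(tuples):
    """Insert one tuple at a time: one lock and one WAL record per tuple."""
    page = LeafPage()
    for tup in tuples:
        page.insert_many([tup])
    return page


def build_buffered(tuples, buffer_size=100):
    """Insert through the buffer: one lock and one WAL record per flush."""
    page = LeafPage()
    buffered = BufferedLeaf(page, buffer_size)
    for tup in tuples:
        buffered.insert(tup)
    buffered.flush()  # write out any partial final batch
    return page
```

In this model the per-page overhead drops from one lock and one WAL record per tuple to one per flush, roughly a factor of buffer_size. How much of that carries over to a real index build is something the sketch cannot say.
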
I've often wondered about the per-tuple overhead of all kinds of operations, not
just GiST index builds. For example, if you're doing a seqscan, ISTM it would be
a lot more efficient to memcpy an entire page into backend-local memory and
operate off of that lock-free. Similarly, for an index scan, you'd want to copy a
full leaf page if you think you'll be hitting it more than once or twice.

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell)                http://jim.nasby.net