Does that heuristic change the timings much? If not, it seems like it would be better to keep it simple and always do the same thing, like log the tuples
(if it is done under one WALInsertLock, which I am assuming it is...)

It is the logging of whole pages that makes it faster.
If you fill a page with tuples in one operation (while holding the exclusive lock) and then insert a WAL record for each tuple, there is no speed gain.

Inserting a full-page WAL record (since you just filled the page completely):

- only takes WALInsertLock once instead of once per tuple
- reduces WAL traffic
- is about 2x faster in my benchmark

And inserting a "clear new page" record (if the page was previously new/empty and the relation is fsync'd at the end):

- only takes WALInsertLock once instead of once per tuple
- reduces WAL traffic a lot
- is about 4x faster in my benchmark
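
Concretely, the full-page case can follow the pattern of the existing log_newpage() in heapam.c. In the sketch below, xl_heap_newpage, XLOG_HEAP_NEWPAGE, and the XLogRecData/XLogInsert interface are the real ones; log_full_page itself is just an illustrative name, and the caller is assumed to have filled the page while holding the buffer's exclusive lock:

/* Sketch only; assumes backend headers (access/heapam.h, access/xlog.h,
 * storage/bufpage.h).  Logs the whole page as one WAL record instead of
 * one XLOG_HEAP_INSERT record per tuple. */
static XLogRecPtr
log_full_page(Relation rel, BlockNumber blkno, Page page)
{
	xl_heap_newpage xlrec;
	XLogRecData rdata[2];
	XLogRecPtr	recptr;

	xlrec.node = rel->rd_node;
	xlrec.blkno = blkno;

	rdata[0].data = (char *) &xlrec;
	rdata[0].len = SizeOfHeapNewpage;
	rdata[0].buffer = InvalidBuffer;
	rdata[0].next = &(rdata[1]);

	/* the full page image replaces all the per-tuple records */
	rdata[1].data = (char *) page;
	rdata[1].len = BLCKSZ;
	rdata[1].buffer = InvalidBuffer;
	rdata[1].next = NULL;

	START_CRIT_SECTION();
	/* one XLogInsert call = WALInsertLock taken once for the whole page */
	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
	PageSetLSN(page, recptr);
	PageSetTLI(page, ThisTimeLineID);
	END_CRIT_SECTION();

	return recptr;
}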

Do you even need the new empty page record?  I think a zero page will be
handled correctly next time it is read into shared buffers, won't it?

I have no idea ;)

But I guess it is needed to avoid problems with partial page writes, which could leave the page in a state that is neither all zeros nor consistent.

Plus, empty page records make for very small WAL traffic and I didn't see any performance difference with or without them.

If the entire page is logged, would it have to be marked as not removable by
the log compression tool? Or can the tool recreate the needed delta?

No, the tool cannot recreate the data, since the idea is precisely to replace a lot of "tuple insert" records with one "entire page" record, which takes both less space and less time. The warm-standby replicas that receive this WAL need to know the page contents to replicate it... (also, it will probably be faster for them to redo one page write than to redo all the tuple inserts).

Here is what I'm thinking about now:

* have some kind of BulkInsertState which contains
- info about the relation, indexes, triggers, etc.
- a tuple queue.

The tuple queue may be a tuplestore, or simply tuple copies kept in a local memory context.

You'd have functions to:

- Set up the BulkInsertState
- Add a tuple to the BulkInsertState
- Finish the operation and clear the BulkInsertState

When adding a tuple, it is stored in the queue.
When the queue is full, a bulk insert operation takes place (hopefully filling an entire page at a time), and the queue is emptied.
After-insert triggers and index updates are also handled at that point.

When finished, the function that clears the state also inserts any tuples remaining in the queue.
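
A rough sketch of that API (every name here is invented for illustration; only heap_copytuple() and the memory-context calls are existing backend functions):

#define BULK_QUEUE_SIZE 1000		/* arbitrary; would need benchmarking */

typedef struct BulkInsertState
{
	Relation		rel;			/* target relation */
	ResultRelInfo  *resultRelInfo;	/* indexes, triggers, etc. */
	MemoryContext	queue_cxt;		/* holds the queued tuple copies */
	HeapTuple		queue[BULK_QUEUE_SIZE];
	int				nqueued;
} BulkInsertState;

extern BulkInsertState *bulk_insert_begin(Relation rel);
extern void bulk_insert_tuple(BulkInsertState *bistate, HeapTuple tup);
extern void bulk_insert_end(BulkInsertState *bistate);
static void bulk_insert_flush(BulkInsertState *bistate);

/*
 * Queue one tuple copy; when the queue fills, insert the whole batch,
 * filling entire pages and logging each as a single WAL record, then
 * run index updates and after-row triggers for the batch.
 */
void
bulk_insert_tuple(BulkInsertState *bistate, HeapTuple tup)
{
	MemoryContext oldcxt = MemoryContextSwitchTo(bistate->queue_cxt);

	bistate->queue[bistate->nqueued++] = heap_copytuple(tup);
	MemoryContextSwitchTo(oldcxt);

	if (bistate->nqueued == BULK_QUEUE_SIZE)
		bulk_insert_flush(bistate);		/* empties the queue */
}

bulk_insert_end() would then do one final flush of the partial batch before freeing the state, which is the "insert all remaining tuples" step above.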

With this you could also do something *really* interesting: bulk index updates...

Bulk index updates are probably mutually exclusive with after-row triggers, though.

Another angle of attack would be to make WAL writing more efficient...
