Does that heuristic change the timings much? If not, it seems like it would be better to keep it simple and always do the same thing, like log the tuples
(if it is done under one WALInsertLock, which I am assuming it is...)

It is the logging of whole pages that makes it faster.
If you fill a page with tuples in one operation (while holding the exclusive lock) and then insert a WAL record for each tuple, there is no speed gain.

Inserting a full-page WAL record (since you just filled the page completely):

- only takes WALInsertLock once instead of once per tuple
- reduces WAL traffic
- is about 2x faster in my benchmark

And inserting a "clear new page" record (if the page was previously new/empty and the relation is fsync'd at the end):

- only takes WALInsertLock once instead of once per tuple
- reduces WAL traffic a lot
- is about 4x faster in my benchmark
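
Concretely, the full-page case can follow the pattern of the existing log_newpage() in heapam.c. In the sketch below, xl_heap_newpage, XLOG_HEAP_NEWPAGE, and the XLogRecData/XLogInsert interface are the real ones; log_full_page itself is just an illustrative name, and the caller is assumed to have filled the page while holding the buffer's exclusive lock:

/* Sketch only; assumes backend headers (access/heapam.h, access/xlog.h,
 * storage/bufpage.h).  Logs the whole page as one WAL record instead of
 * one XLOG_HEAP_INSERT record per tuple. */
static XLogRecPtr
log_full_page(Relation rel, BlockNumber blkno, Page page)
{
	xl_heap_newpage xlrec;
	XLogRecData rdata[2];
	XLogRecPtr	recptr;

	xlrec.node = rel->rd_node;
	xlrec.blkno = blkno;

	rdata[0].data = (char *) &xlrec;
	rdata[0].len = SizeOfHeapNewpage;
	rdata[0].buffer = InvalidBuffer;
	rdata[0].next = &(rdata[1]);

	/* the full page image replaces all the per-tuple records */
	rdata[1].data = (char *) page;
	rdata[1].len = BLCKSZ;
	rdata[1].buffer = InvalidBuffer;
	rdata[1].next = NULL;

	START_CRIT_SECTION();
	/* one XLogInsert call = WALInsertLock taken once for the whole page */
	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
	PageSetLSN(page, recptr);
	PageSetTLI(page, ThisTimeLineID);
	END_CRIT_SECTION();

	return recptr;
}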

Do you even need the new empty page record?  I think a zero page will be
handled correctly next time it is read into shared buffers, won't it?

I have no idea ;)

But I guess it is needed to avoid problems with partial page writes, which could leave the page in a state that is neither all zeros nor consistent.

Plus, empty page records make for very small WAL traffic and I didn't see any performance difference with or without them.

If the entire page is logged, would it have to be marked as not removable by
the log compression tool? Or can the tool recreate the needed delta?

No, the tool cannot recreate the data, since the idea is precisely to replace a lot of "tuple insert" records with one "entire page" record, which takes both less space and less time. The warm-standby replicas that receive this WAL need to know the page contents to replicate it... (also, it will probably be faster for them to redo one page write than to redo all the tuple inserts).

Here is what I'm thinking about now:

* have some kind of BulkInsertState which contains
- info about the relation, indexes, triggers, etc.
- a tuple queue.

The tuple queue may be a tuplestore, or simply tuple copies kept in a local memory context.

You'd have functions to:

- Set up the BulkInsertState
- Add a tuple to the BulkInsertState
- Finish the operation and clear the BulkInsertState

When adding a tuple, it is stored in the queue.
When the queue is full, a bulk insert operation takes place (hopefully filling an entire page at a time), and the queue is emptied.
After-insert triggers and index updates are also handled at that point.

When finished, the function that clears the state also inserts any tuples remaining in the queue.
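
A rough sketch of that API (every name here is invented for illustration; only heap_copytuple() and the memory-context calls are existing backend functions):

#define BULK_QUEUE_SIZE 1000		/* arbitrary; would need benchmarking */

typedef struct BulkInsertState
{
	Relation		rel;			/* target relation */
	ResultRelInfo  *resultRelInfo;	/* indexes, triggers, etc. */
	MemoryContext	queue_cxt;		/* holds the queued tuple copies */
	HeapTuple		queue[BULK_QUEUE_SIZE];
	int				nqueued;
} BulkInsertState;

extern BulkInsertState *bulk_insert_begin(Relation rel);
extern void bulk_insert_tuple(BulkInsertState *bistate, HeapTuple tup);
extern void bulk_insert_end(BulkInsertState *bistate);
static void bulk_insert_flush(BulkInsertState *bistate);

/*
 * Queue one tuple copy; when the queue fills, insert the whole batch,
 * filling entire pages and logging each as a single WAL record, then
 * run index updates and after-row triggers for the batch.
 */
void
bulk_insert_tuple(BulkInsertState *bistate, HeapTuple tup)
{
	MemoryContext oldcxt = MemoryContextSwitchTo(bistate->queue_cxt);

	bistate->queue[bistate->nqueued++] = heap_copytuple(tup);
	MemoryContextSwitchTo(oldcxt);

	if (bistate->nqueued == BULK_QUEUE_SIZE)
		bulk_insert_flush(bistate);		/* empties the queue */
}

bulk_insert_end() would then do one final flush of the partial batch before freeing the state, which is the "insert all remaining tuples" step above.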

With this you could also do something *really* interesting: bulk index updates...

Bulk index updates are probably mutually exclusive with after-row triggers, though.

Another angle of attack would be to make WAL writing more efficient...
