On 04/03/2014 06:37 PM, Tom Lane wrote:
Also, IIRC there are places that WAL-log full pages that aren't in a
shared buffer at all (btree build does this I think).  How will that fit
into this model?

Hmm. We could provide a function for registering a block with given content, without a Buffer. Something like:

XLogRegisterPage(int id, RelFileNode, BlockNumber, Page)

Let's simplify that, and have one new function, XLogOpenBuffer, which
returns a return code that indicates which of the four cases we're
dealing with. A typical redo function looks like this:

        if (XLogOpenBuffer(0, &buffer) == BLK_REPLAY)
        {
                /* Modify the page */
                ...

                PageSetLSN(page, lsn);
                MarkBufferDirty(buffer);
        }
        if (BufferIsValid(buffer))
                UnlockReleaseBuffer(buffer);

The '0' in the XLogOpenBuffer call is the ID of the block reference
specified in the XLogRegisterBuffer call, when the WAL record was created.
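
The four cases are not spelled out above, so the following is only a rough standalone sketch of how such a return code might be derived. It assumes the cases are: a full-page image was restored, the block no longer exists, the page already carries the change (page LSN at or past the record's LSN), and the change must be replayed. Apart from the BLK_REPLAY idea, every name here (DemoBlkResult, DemoBlockState, demo_open_buffer) is hypothetical and not part of any PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t DemoLsn;       /* hypothetical stand-in for an LSN */

/* Assumed set of four outcomes; only the "replay" case is named in the text. */
typedef enum DemoBlkResult
{
    DEMO_BLK_REPLAY,            /* caller must apply the change to the page */
    DEMO_BLK_DONE,              /* page already contains the change */
    DEMO_BLK_RESTORED,          /* page was restored from a full-page image */
    DEMO_BLK_NOTFOUND           /* block no longer exists, nothing to do */
} DemoBlkResult;

/* Hypothetical summary of what replay knows about one block reference. */
typedef struct DemoBlockState
{
    bool        exists;                 /* block is still present on disk */
    bool        has_full_page_image;    /* record carries a full-page image */
    DemoLsn     page_lsn;               /* LSN currently stamped on the page */
} DemoBlockState;

/* Classify one block reference of a record that ends at record_lsn. */
DemoBlkResult
demo_open_buffer(const DemoBlockState *blk, DemoLsn record_lsn)
{
    assert(blk != NULL);

    if (blk->has_full_page_image)
        return DEMO_BLK_RESTORED;   /* image replaces the page wholesale */
    if (!blk->exists)
        return DEMO_BLK_NOTFOUND;   /* e.g. relation truncated later on */
    if (blk->page_lsn >= record_lsn)
        return DEMO_BLK_DONE;       /* change already applied */
    return DEMO_BLK_REPLAY;
}
```

With a classification like this, the redo function only has to handle the replay case explicitly, which is what keeps the typical redo routine above so short.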

+1, but one important step here is finding the data to be replayed.
That is, a large part of the complexity of replay routines has to do
with figuring out which parts of the WAL record were elided due to
full-page-images, and locating the remaining parts.  What can we do
to make that simpler?

We can certainly add more structure to the WAL records, but any extra information you add will make the records larger. It might be worth it, and would be lost in the noise for more complex records like page splits, but we should keep frequently-used records like heap insertions as lean as possible.

Ideally, if XLogOpenBuffer (bad name BTW) returns BLK_REPLAY, it would
also calculate and hand back the address/size of the logged data that
had been pointed to by the associated XLogRecData chain item.  The
trouble here is that there might've been multiple XLogRecData items
pointing to the same buffer.  Perhaps the magic ID number you give to
XLogOpenBuffer should be thought of as identifying an XLogRecData chain
item, not so much a buffer?  It's fairly easy to see what to do when
there's just one chain item per buffer, but I'm not sure what to do
if there's more than one.

Hmm. You could register a separate XLogRecData chain for each buffer. Along the lines of:

rdata[0].data = data for buffer
rdata[0].len = ...
rdata[0].next = &rdata[1];
rdata[1].data = more data for same buffer
rdata[1].len = ...
rdata[1].next = NULL;

XLogRegisterBuffer(0, buffer, &rdata[0]);

At replay:

if (XLogOpenBuffer(0, &buffer, &xldata, &len) == BLK_REPLAY)
{
        /* xldata points to the data registered for this buffer */
}

Plus one more chain for the data not associated with a buffer.
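
To make the per-buffer chain idea a bit more concrete, here is a minimal standalone sketch. It is a toy model, not PostgreSQL code: DemoRecData stands in for XLogRecData, and DemoRecord, demo_register_buffer_data and demo_get_buffer_data are hypothetical names invented for illustration. It assumes the chain registered for a block ID is flattened into one contiguous region of the record, so that replay can hand back a single pointer and length for that ID even when several chain items pointed at the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_BLOCKS  4
#define DEMO_MAX_PAYLOAD 256

/* Hypothetical stand-in for XLogRecData: one fragment of a data chain. */
typedef struct DemoRecData
{
    const char *data;
    size_t      len;
    struct DemoRecData *next;
} DemoRecData;

/* Hypothetical assembled record: per-block data stored back to back. */
typedef struct DemoRecord
{
    size_t      offset[DEMO_MAX_BLOCKS];    /* start of each block's data */
    size_t      len[DEMO_MAX_BLOCKS];       /* total length per block */
    size_t      used;                       /* bytes of payload in use */
    char        payload[DEMO_MAX_PAYLOAD];
} DemoRecord;

/* Insert side: flatten the whole chain registered for block reference id. */
void
demo_register_buffer_data(DemoRecord *rec, int id, const DemoRecData *chain)
{
    assert(id >= 0 && id < DEMO_MAX_BLOCKS);

    rec->offset[id] = rec->used;
    for (const DemoRecData *r = chain; r != NULL; r = r->next)
    {
        assert(rec->used + r->len <= DEMO_MAX_PAYLOAD);
        memcpy(rec->payload + rec->used, r->data, r->len);
        rec->used += r->len;
    }
    rec->len[id] = rec->used - rec->offset[id];
}

/* Replay side: return address and size of the data logged for block id. */
const char *
demo_get_buffer_data(const DemoRecord *rec, int id, size_t *len)
{
    assert(id >= 0 && id < DEMO_MAX_BLOCKS);

    *len = rec->len[id];
    return rec->payload + rec->offset[id];
}
```

In this model, registering a two-item chain for ID 0 and then asking for ID 0 at replay yields one pointer covering both fragments in order, so the redo routine never needs to know how many chain items were used when the record was built.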

- Heikki

