On 05/26/2014 02:26 PM, Greg Stark wrote:
> On Mon, May 26, 2014 at 1:22 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
>> The second record is generated before the checkpoint is finished and the
>> checkpoint record is written. So it will be there.
>>
>> (if you crash before the checkpoint is finished, the in-progress
>> checkpoint is no good for recovery anyway, and won't be used)
>
> Another idea would be to have separate checkpoints for each buffer
> partition. You would have to start recovery from the oldest checkpoint of
> any of the partitions.

Yeah. Simon suggested that when we talked about this, but I didn't
understand how that works at the time. I think I do now. The key to
making it work is distinguishing, when starting recovery from the latest
checkpoint, whether a record for a given page can be replayed safely. I
used flags on WAL records in my proposal to achieve this, but using
buffer partitions is simpler.
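
To make that distinction concrete, here is a minimal Python sketch, not PostgreSQL code. It assumes each buffer partition keeps its own redo pointer and that a page maps to a partition by page number modulo the partition count; the names partition_of, recovery_start and can_replay are made up for illustration.

```python
def partition_of(page_no, n_partitions):
    # Assumed page-to-partition mapping, for illustration only.
    return page_no % n_partitions


def recovery_start(redo_ptrs):
    # Recovery has to begin at the oldest redo pointer of any partition.
    return min(redo_ptrs)


def can_replay(record_lsn, page_no, redo_ptrs):
    # A record is replayed only if it lies at or after the redo pointer of
    # the partition owning its page. Older changes to that page were
    # flushed to disk by that partition's checkpoint.
    part = partition_of(page_no, len(redo_ptrs))
    return record_lsn >= redo_ptrs[part]
```

With redo pointers [100, 250], recovery would start at 100, and a record at LSN 120 would be replayed for page 4 but skipped for page 5.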
For simplicity, let's imagine that we have two Redo-pointers for each
checkpoint record: one for even-numbered pages, and another for
odd-numbered pages. When a checkpoint begins, we first set the
Even-redo pointer to the current WAL insert location, and then flush all
the even-numbered buffers in the buffer cache. Then we do the same for
the Odd-redo pointer and the odd-numbered buffers.
Recovery begins at the Even-redo pointer. Replay works as normal, but
until you reach the Odd-redo pointer, you refrain from replaying any
changes to odd-numbered pages. After reaching the Odd-redo pointer, you
replay everything as normal.
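
The even/odd scheme can be tried out in a toy model. The Python sketch below is only an illustration under stated assumptions: the WAL is a list whose index serves as the LSN, every change is a non-idempotent "add delta to page" record, and page LSNs, full-page images and buffer eviction are ignored. ToyDB, flush_half and recover are hypothetical names, not anything in the PostgreSQL source.

```python
class ToyDB:
    def __init__(self):
        self.wal = []    # list of (page_no, delta); LSN = list index
        self.cache = {}  # buffer cache: page_no -> value
        self.disk = {}   # on-disk pages: page_no -> value

    def update(self, page_no, delta):
        # Log the change first, then apply it in the buffer cache.
        self.wal.append((page_no, delta))
        self.cache[page_no] = self.cache.get(page_no, 0) + delta

    def flush_half(self, parity):
        # One checkpoint phase: note the current WAL insert location as
        # the redo pointer, then flush all buffers of the given parity.
        redo = len(self.wal)
        for page_no, value in self.cache.items():
            if page_no % 2 == parity:
                self.disk[page_no] = value
        return redo


def recover(disk, wal, even_redo, odd_redo):
    pages = dict(disk)
    # Replay starts at the Even-redo pointer.
    for lsn in range(even_redo, len(wal)):
        page_no, delta = wal[lsn]
        if page_no % 2 == 1 and lsn < odd_redo:
            # Odd page before the Odd-redo pointer: the flushed page
            # already contains this change, so skip it.
            continue
        pages[page_no] = pages.get(page_no, 0) + delta
    return pages


db = ToyDB()
db.update(2, 5)
db.update(3, 7)
even_redo = db.flush_half(0)   # checkpoint phase 1: even pages
db.update(2, 1)                # activity while the checkpoint runs
db.update(3, 1)
odd_redo = db.flush_half(1)    # checkpoint phase 2: odd pages
db.update(3, 10)
db.update(2, 10)

# Simulated crash: the cache is lost, only disk and WAL survive.
recovered = recover(db.disk, db.wal, even_redo, odd_redo)
assert recovered == db.cache
```

In this model, replaying the change to page 3 that was logged between the two pointers would apply it twice, which is why it has to be skipped.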
Hmm, that seems actually doable...
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)