On 05/27/2014 02:42 PM, Greg Stark wrote:
On Tue, May 27, 2014 at 10:07 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
On 05/26/2014 02:26 PM, Greg Stark wrote:
Another idea would be to have separate checkpoints for each buffer
partition. You would have to start recovery from the oldest checkpoint of
any of the partitions.
Yeah. Simon suggested that when we talked about this, but I didn't understand
how that works at the time. I think I do now. The key to making it work is
distinguishing, when starting recovery from the latest checkpoint, whether a
record for a given page can be replayed safely. I used flags on WAL records in
my proposal to achieve this, but using buffer partitions is simpler.
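A minimal sketch of the two rules discussed above, under one plausible reading of the idea: recovery starts at the oldest redo position of any partition, and a record is replayed for a page only if it is at least as new as that page's partition redo position. The names `Lsn`, `recovery_start_lsn` and `record_needs_replay` are hypothetical and are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; larger means later. */
typedef uint64_t Lsn;

/* Recovery has to begin at the oldest redo position of any partition. */
Lsn
recovery_start_lsn(const Lsn *partition_redo, size_t npartitions)
{
    assert(npartitions > 0);

    Lsn oldest = partition_redo[0];

    for (size_t i = 1; i < npartitions; i++)
    {
        if (partition_redo[i] < oldest)
            oldest = partition_redo[i];
    }
    return oldest;
}

/*
 * Assumption: a record older than its partition's redo position is
 * already covered by that partition's checkpoint, so it is skipped.
 */
bool
record_needs_replay(Lsn record_lsn, const Lsn *partition_redo,
                    size_t npartitions, size_t partition)
{
    assert(partition < npartitions);
    return record_lsn >= partition_redo[partition];
}
```

With redo positions {500, 200, 900}, replay would start at 200, and a record at position 600 would be applied to pages in partitions 0 and 1 but skipped for pages in partition 2.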
Interesting. I just thought of it independently.
Incidentally you wouldn't actually want to use the buffer partitions
per se since the new server might start up with a different number of
partitions. You would want an algorithm for partitioning the block
space that xlog replay can reliably reproduce regardless of the size
of the buffer lock partition table. It might make sense to set it up
so it coincidentally ensures all the buffers being flushed are in the
same partition or maybe the reverse would be better. Probably it
doesn't actually matter.
Since you will be flushing the buffers one "redo partition" at a time,
you would want to allow the OS to merge the writes within a partition
as much as possible. So my even-odd split would in fact be pretty bad.
Some sort of striping, e.g. mapping each contiguous 1 MB chunk to the
same partition, would be better.
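To illustrate the difference, here is a rough sketch of both mappings. It assumes 8 kB pages (PostgreSQL's default block size), so a 1 MB stripe covers 128 consecutive blocks. The names `even_odd_partition` and `striped_partition` are made up for this example. Neither mapping depends on the size of the buffer lock partition table, so replay could recompute it on any server.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE        8192u               /* assumed page size */
#define STRIPE_BYTES      (1024u * 1024u)     /* 1 MB per stripe */
#define BLOCKS_PER_STRIPE (STRIPE_BYTES / BLOCK_SIZE)   /* 128 blocks */

/* Even-odd split: adjacent blocks always land in different partitions. */
uint32_t
even_odd_partition(uint32_t block_no)
{
    return block_no % 2;
}

/*
 * Striping: every block of a contiguous 1 MB chunk maps to the same
 * partition, and consecutive chunks rotate through the partitions.
 */
uint32_t
striped_partition(uint32_t block_no, uint32_t npartitions)
{
    assert(npartitions > 0);
    return (block_no / BLOCKS_PER_STRIPE) % npartitions;
}
```

With four partitions, blocks 0 through 127 all map to partition 0 and block 128 is the first block of partition 1, so flushing one partition writes runs of 128 adjacent blocks that the OS can merge. Under the even-odd split no two adjacent blocks are ever flushed in the same pass.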
I'm assuming you would keep N checkpoint positions in the control
file. That also means we can double the checkpoint timeout with only a
marginal increase in the worst case recovery time, since the worst
case will be (1 + 1/n)*timeout's worth of WAL to replay rather than
2*timeout. The amount of time for recovery would be much more predictable.
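The arithmetic can be checked with a small sketch. It takes the single-checkpoint worst case to be 2*timeout (an assumption about what the comparison intends) and the partitioned worst case to be (1 + 1/n)*timeout, as stated above. The function names are hypothetical, and the unit is whatever the timeout is measured in.

```c
#include <assert.h>

/* Single checkpoint: assumed worst case of two timeouts' worth of WAL. */
double
worst_case_single(double timeout)
{
    return 2.0 * timeout;
}

/* n per-partition checkpoints: worst case of (1 + 1/n) timeouts. */
double
worst_case_partitioned(double timeout, unsigned n)
{
    assert(n > 0);
    return (1.0 + 1.0 / n) * timeout;
}
```

For example, with a 300 second timeout the single-checkpoint worst case is 600 seconds' worth of WAL. Doubling the timeout to 600 seconds with 16 partitions gives (1 + 1/16) * 600 = 637.5 seconds' worth, only slightly more. With n = 1 the two formulas agree.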
Good point.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)