On 2017-01-19 20:45:57 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > > WAL replay does do more work, generally speaking (the WAL has to be
> > > read, the checksum validated on it, and then the write has to go out,
> > > while the checkpointer just writes the page out from memory), but it's
> > > also dealing with less contention on the system (there aren't a bunch of
> > > backends hammering the disks to pull data in with reads when you're
> > > doing crash recovery...).
> >
> > There's a huge difference though: WAL replay is single threaded, whereas
> > generating WAL is not.
>
> I'm aware- but *checkpointing* is still single-threaded, unless, as I
> mentioned, you end up with backends pushing out their own changes to the
> heap to make room for new pages to come in.

Sure, but buffer checkpointing isn't necessarily that large a portion of
the work done in one checkpoint cycle, in comparison to all the WAL
being generated. Quite commonly a lot of the buffers will already have
been flushed to disk by backends and/or bgwriter, and are clean by the
time checkpointer gets to them. So I don't think checkpointer being
single threaded necessarily means much WRT replay performance.

> > Especially if there's synchronous IO required
> > (most commonly reading in data, because more data was modified in the
> > current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> > buffers), you can be significantly slower than generating the WAL.
>
> That is an interesting point, if I'm following what you're saying
> correctly- during the replay we can end up having more pages modified
> than fit in shared buffers, which means that we have to read back in
> pages that we pushed out to implement the non-FPI WAL changes to that
> page.

Right. (And not just during replay obviously, also during the initial
WAL generation).

> I wonder if we should have a way to configure the amount of memory
> allowed to be used for WAL replay, independent of shared_buffers?

I don't quite see how that'd work, especially with HS. We just use the
normal shared buffers code etc, and there we can't just resize the
amount of shared_buffers allocated after doing crash recovery.

> That said, I wonder if our eviction algorithm could be
> improved/changed when performing WAL replay too to reduce the chances
> that we'll have to read a page back in.

I don't think that's that promising an angle of attack. Having a
separate pre-fetching backend that parses the WAL and pre-reads
everything necessary seems more promising.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
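
To make the point about synchronous reads during replay concrete, here is
a toy model in Python. It is only a sketch under stated assumptions, not
PostgreSQL code: the buffer pool is a plain LRU, a WAL record is a
hypothetical (page, is_fpi) pair, and the names replay, capacity and
lookahead are made up for illustration. A record carrying a full page
image (FPI) never needs a read; a non-FPI record whose page has been
evicted forces the single replay process to stall on a read, unless a
prefetcher that scans ahead in the WAL has already pulled the page in.

```python
from collections import OrderedDict


def replay(records, capacity, lookahead=0):
    """Toy WAL replay over an LRU pool; returns (stalls, prefetched).

    records   -- list of (page, is_fpi) tuples (hypothetical record format)
    capacity  -- number of pages the pool holds (stand-in for shared_buffers)
    lookahead -- how many records ahead the imaginary prefetcher reads
    """
    pool = OrderedDict()  # page id -> None, least recently used first
    stalls = 0            # reads the replay process had to wait for
    prefetched = 0        # reads issued ahead of time by the prefetcher

    def touch(page):
        # Mark the page as most recently used, evicting the oldest if full.
        pool[page] = None
        pool.move_to_end(page)
        while len(pool) > capacity:
            pool.popitem(last=False)

    def prefetch(j):
        # Read the page for record j in advance if it would need a read.
        nonlocal prefetched
        if j < len(records):
            page, is_fpi = records[j]
            if not is_fpi and page not in pool:
                prefetched += 1
                touch(page)

    for j in range(lookahead):  # warm up the prefetcher before replay starts
        prefetch(j)

    for i, (page, is_fpi) in enumerate(records):
        if lookahead:
            prefetch(i + lookahead)
        if not is_fpi and page not in pool:
            stalls += 1  # synchronous read inside the replay loop
        touch(page)

    return stalls, prefetched


# First touch of each page after a checkpoint logs an FPI, later ones do not.
records = [(p, True) for p in range(8)] + [(p, False) for p in range(8)]

print(replay(records, capacity=4))               # pool smaller than working set
print(replay(records, capacity=8))               # working set fits in the pool
print(replay(records, capacity=4, lookahead=1))  # small pool plus prefetcher
```

In this model, when the pool is smaller than the set of modified pages,
every non-FPI record stalls on a read; with a large enough pool the FPIs
have pre-filled the buffers and nothing stalls; and with a small pool plus
a prefetcher the same reads still happen, but outside the replay loop.
Note that the lookahead has to stay small relative to the pool size, or
the prefetched pages get evicted again before replay reaches them.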