Re: [PATCHES] Load Distributed Checkpoints, take 3

Heikki Linnakangas Fri, 22 Jun 2007 14:24:50 -0700

Greg Smith wrote:

>> True, you'd have to replay 1.5 checkpoint intervals on average instead of 0.5 (more or less, assuming checkpoints had been short). I don't think we're in the business of optimizing crash recovery time though.
>
> If you're not, I think you should be. Keeping that replay interval time down was one of the reasons why the people I was working with were displeased with the implications of the very spread out style of some LDC tunings. They were already unhappy with the implied recovery time of how high they had to set checkpoint_settings for good performance, and making it that much bigger aggravates the issue. Given a knob where the LDC can be spread out a bit but not across the entire interval, that makes it easier to control how much expansion there is relative to the current behavior.

I agree on that one: we *should* optimize crash recovery time. It may not be the most important thing on earth, but it's a significant consideration for some systems.

However, I think shortening the checkpoint interval is a perfectly valid solution to that. It does lead to more full page writes, but in 8.3 more full page writes can actually make the recovery go faster, not slower, because we no longer read in the previous contents of the page when we restore it from a full page image. In any case, while people sometimes complain that we have a large WAL footprint, it's not usually a problem.

This is off-topic, but at PGCon in May, Itagaki-san and his colleagues, whose names I can't remember, pointed out to me very clearly that our recovery is *slow*. So slow that in the benchmarks they were running, their warm stand-by slave couldn't keep up with the master generating the WAL, even though both are running on the same kind of hardware.

The reason is simple: There can be tens of backends doing I/O and generating WAL, but in recovery we serialize them. If you have decent I/O hardware that could handle for example 10 concurrent random I/Os, at recovery we'll be issuing them one at a time. That's a scalability issue, and doesn't show up on a laptop or a small server with a single disk.
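
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only a toy model, not a measurement: the function name replay_time_ms, the I/O count, and the 5 ms latency are made up for illustration. It assumes every random I/O takes the same time and that the hardware can service a fixed number of them concurrently.

```python
import math


def replay_time_ms(num_ios, io_latency_ms, concurrency):
    """Estimate the time to complete num_ios random I/Os when at most
    `concurrency` of them are in flight at once (toy model)."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Each batch of up to `concurrency` I/Os finishes in one latency period.
    batches = math.ceil(num_ios / concurrency)
    return batches * io_latency_ms


# Hypothetical workload: 10,000 random page reads at 5 ms each.
serial = replay_time_ms(10_000, 5.0, concurrency=1)     # recovery: one at a time
parallel = replay_time_ms(10_000, 5.0, concurrency=10)  # ten I/Os in flight

print(serial, parallel, serial / parallel)
```

Under those assumptions the serialized replay takes ten times as long as the concurrent workload that produced the WAL, which is the kind of gap that would keep a stand-by from catching up.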

That's one of the first things I'm planning to tackle when the 8.4 dev cycle opens. And I'm planning to look at recovery times in general; I've never even measured it before so who knows what comes up.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

