Re: [HACKERS] [GENERAL] Slow PITR restore

Josh Berkus Thu, 13 Dec 2007 16:08:35 -0800

Tom,

> [ shrug... ]  This is not consistent with my experience.  I can't help
> suspecting misconfiguration; perhaps shared_buffers much smaller on the
> backup, for example.


You're only going to see it on SMP systems which have a high degree of CPU 
utilization.  That is, when you have 16 cores processing flat-out, then 
the *single* core which will replay that log could certainly have trouble 
keeping up.  And this wouldn't be an issue which would show up testing on 
a dual-core system.

I don't have extensive testing data on that myself (I depended on Koichi's 
as well) but I do have another real-world case where our slow recovery 
time is a serious problem: clustered filesystem failover configurations, 
e.g. RHCFS, OpenHACluster, Veritas.  For those configuratons, when one 
node fails PostgreSQL is started on a 2nd node against the same data ... 
and goes through recovery.  On very high-volume systems, the recovery can 
be quite slow, up to 15 minutes, which is a long time for a web site to be 
down.

I completely agree that we don't want to risk the reliability of recovery 
in attempts to speed it up, though, so maybe this isn't something we can 
do right now.  But I don't agree that it's not an issue for users.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

Re: [HACKERS] [GENERAL] Slow PITR restore

Reply via email to