Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

Simon Riggs Thu, 11 Feb 2010 02:41:22 -0800

On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > As I pointed out previously, the standby might restore a partially-filled
> > WAL file that is being archived by the primary, and cause a FATAL error.
> > And this happened in my box when I was testing the SR.
> > 
> >   sby [20088] FATAL:  archive file "000000010000000000000087" has wrong size: 14139392 instead of 16777216
> >   sby [20076] LOG:  startup process (PID 20088) exited with exit code 1
> >   sby [20076] LOG:  terminating any other active server processes
> >   act [18164] LOG:  received immediate shutdown request
> > 
> > If the startup process is in standby mode, I think that it should retry
> > starting replication instead of emitting an error when it finds a
> > partially-filled file in the archive. Then if the replication has been
> > terminated, it has only to restore the archived file again. Thoughts?
> 
> Hmm, so after running restore_command, check the file size and if it's
> too short, treat it the same as if restore_command returned non-zero?
> And it will be retried on the next iteration. Works for me, though OTOH
> it will then fail to complain about a WAL file that's genuinely
> truncated for some reason. I guess there's no way around that, even if
> you have a script as restore_command that does the file size check, it
> will have the same problem.
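
For illustration, the size check discussed in the quoted text could be done in a wrapper script used as restore_command. The following is only a rough sketch under stated assumptions, not what the server or pg_standby actually does: the function names, the archive directory argument, and the temporary-file naming are hypothetical, and the 16777216-byte segment size is taken from the FATAL message quoted above. The check is applied only to names that look like WAL segments (24 hex digits), since restore_command is also asked for history files, which are smaller. As noted in the quoted text, a script like this cannot tell a file that is still being archived from one that is genuinely truncated; it just reports failure in both cases so the restore can be retried.

```python
import os
import re
import shutil
import sys

# Expected segment size, taken from the FATAL message quoted above.
WAL_SEGMENT_SIZE = 16777216

# WAL segment file names are 24 hex digits; history files are named differently.
WAL_SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def restore_segment(archive_dir, wal_name, dest_path):
    """Copy wal_name from archive_dir to dest_path.

    Returns 0 on success and 1 on failure, mirroring the exit status a
    restore_command is expected to produce.
    """
    src = os.path.join(archive_dir, wal_name)
    if not os.path.isfile(src):
        return 1  # not in the archive (yet)

    is_segment = WAL_SEGMENT_NAME.fullmatch(wal_name) is not None
    if is_segment and os.path.getsize(src) != WAL_SEGMENT_SIZE:
        return 1  # partially filled or truncated: report failure

    # Copy to a temporary name first so the destination never holds a short file.
    tmp_path = dest_path + ".tmp"  # hypothetical naming choice
    shutil.copyfile(src, tmp_path)
    if is_segment and os.path.getsize(tmp_path) != WAL_SEGMENT_SIZE:
        os.remove(tmp_path)
        return 1
    os.replace(tmp_path, dest_path)
    return 0


def main(argv):
    """Entry point: argv is [script, archive_dir, wal_name, dest_path]."""
    if len(argv) != 4:
        return 2
    return restore_segment(argv[1], argv[2], argv[3])

# To use as a script, finish with: sys.exit(main(sys.argv))
```

With that last line enabled, such a script could be configured along the lines of restore_command = 'python /path/to/script.py /path/to/archive %f %p', where both paths are placeholders and %f and %p are the usual file name and destination path substitutions.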


Are we trying to re-invent pg_standby here?

-- 
 Simon Riggs           www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
