On Tue, 2009-06-02 at 14:54 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> > Tom Lane wrote:
> >> That's a good point; don't we recover files under names like
> >> RECOVERYXLOG, not under names that could possibly conflict with regular
> >> WAL files?
> 
> > Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar 
> > at the end of recovery, in exitArchiveRecovery().
> 
> > Thinking about this some more, I think we should've changed 
> > exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more 
> > robust if exitArchiveRecovery() always copied the last WAL file rather 
> > than just renamed it. It doesn't seem safe to rely on the file the 
> > symlink points to to be valid after recovery is finished, and we might 
> > write to it before it's recycled, so the current fix isn't complete.
> 
> Hmm.  I think really the reason it's coded that way is that we assumed
> the recovery command would be physically copying the file from someplace
> else.  pg_standby is violating the backend's expectations by using a
> symlink.  And I really doubt that the technique is saving anything, since
> the data has to be read in from the archive location anyway.
> 
> I'm leaning back to the position that pg_standby's -l option is simply a
> bad idea and should be removed.

ISTM we never clearly stated what the restore_command should do either
way. Even if you remove the pg_standby option, that will not fix the
problem for people who have written their own script, or for existing
users of pg_standby. The safe way is to do as Heikki suggests and copy
the final file into place, and I would add that we must then fsync it
as well. That should be back-patched, possibly as far back as 8.0.
Documenting a change is not nearly enough.
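
To make the copy-then-fsync idea concrete, here is a rough sketch in
Python rather than backend C. It is an illustration only, not a patch:
the function name and the ".tmp" suffix are made up for the example, and
it assumes a POSIX filesystem where a directory can be opened and
fsync'd.

```python
import os
import shutil


def copy_wal_segment_durably(src, dst):
    """Copy src to dst as a regular file and flush it to stable storage.

    Hypothetical helper for illustration; not PostgreSQL code.
    """
    tmp = dst + ".tmp"  # hypothetical temporary name in the same directory

    # open() follows symlinks, so the copy contains the segment's data
    # even when src is only a symlink into the archive.
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # force the file contents to disk

    # Atomically move the fully written copy into place.
    os.replace(tmp, dst)

    # fsync the containing directory so the rename itself is durable.
    dirfd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

The point is that the result is always a regular file owned by the
server, whatever the restore script left behind, so it no longer
matters whether the archive copy stays valid after recovery ends.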

Removing -l is a separate discussion. If there were a consensus against
it, I would suggest that we deprecate the option, so that it does
nothing.

As an aside, I would also be much more comfortable if there was an
option to not recycle the WAL files at all, as a safe mode for error
checking at least. The question is: why do we need to zero-fill the file
first anyway? We could save the 8 bytes for the prev pointer on every
record if we added enough zeroes onto every WAL write to zap any
pre-existing header data, causing recovery to fail.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
