On Tue, 2009-06-02 at 14:54 -0400, Tom Lane wrote: > Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes: > > Tom Lane wrote: > >> That's a good point; don't we recover files under names like > >> RECOVERYXLOG, not under names that could possibly conflict with regular > >> WAL files? > > > Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar > > at the end of recovery, in exitArchiveRecovery(). > > > Thinking about this some more, I think we should've changed > > exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more > > robust if exitArchiveRecovery() always copied the last WAL file rather > > than just renamed it. It doesn't seem safe to rely on the file the > > symlink points to to be valid after recovery is finished, and we might > > write to it before it's recycled, so the current fix isn't complete. > > Hmm. I think really the reason it's coded that way is that we assumed > the recovery command would be physically copying the file from someplace > else. pg_standby is violating the backend's expectations by using a > symlink. And I really doubt that the technique is saving anything, since > the data has to be read in from the archive location anyway. > > I'm leaning back to the position that pg_standby's -l option is simply a > bad idea and should be removed.
ISTM we didn't clearly state what the recovery_command should do either way. Even if you remove the pg_standby option that will not fix the problem for people who have written their own script, or existing users of pg_standby. The safe way is to do as Heikki suggests and copy the final file into place and I would add that we must then fsync it also. That should be back-patched possibly as far as 8.0. Documenting a change is not nearly enough. Removing -l is a separate discussion. If there was a consensus against it, I would suggest that we deprecate the option, so that it does nothing. As an aside, I would be also much more comfortable if there was an option to not recycle the WAL files at all, as a safe mode for error checking at least. The question is: why do we need to zero fill the file first anyway? We could save the 8 bytes for the prev pointer on every record if we added enough zeroes onto every WAL write to zap any pre-existing header data, causing recovery to fail. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers