On 26.04.2013 12:16, Simon Riggs wrote:
On 18 April 2013 19:11, Heikki Linnakangas<hlinnakan...@vmware.com>  wrote:

I just found out that if you use continuous archiving and online backups,
it's surprisingly difficult to restore a backup, without replaying any more
WAL than necessary.

I didn't add it myself because I don't see the need, if we think more carefully.

Why would you want your recovery end time to be governed solely by the
time that the *backup* ended? How can that have any bearing on what
you want at recovery time? If you have access to more WAL data, why
would you not apply them as well - unless you have some specific
reason not to - i.e. an incorrect xid or known problem time?

If you're storing only a few of the WAL files with the backup then it
will end naturally without assistance when the last file runs out.
What is the difference between stopping at an exact point in WAL half
way through a file and ending at the end of the file? If the end point
is arbitrary, why the need to specify it so closely?

I can't see a time when I have access to more WAL files *and* I want
to stop early at some imprecise point. But you could write a
restore_command script that stopped after a specific file forcing
recovery to end.

Well, I ran into this with VMware's Data Director, which manages backups among other things. In a typical setup, you have a WAL archive, and every now and then (daily, typically) a full backup is taken. Full backups are retained for some time, like a few weeks or months. The user can also manually request a full backup to be taken at any time.

There is an option to perform PITR. The system figures out the latest full backup that precedes the chosen point-in-time, sets recovery_target_time, and starts up Postgres. But there is also an operation to simply "restore a backup". The idea of that is to, well, restore to the chosen backup, and nothing more. In most cases, it probably wouldn't hurt if a one or two extra WAL files are replayed beyond the backup end time, but you certainly don't want to replay all the history. Yes, you could set recovery_target_time to the point where the backup ended, but that's complicated. You'd have to read the end-of-backup timestamp from the backup history file. And because timestamps are always a bit fuzzy, I think you'd have to add at least a few seconds to that to be sure.

To illustrate why it would be bad to replay more WAL than necessary, imagine that the user is about to perform some dangerous action he might want to undo later. For example, he's about to purge old data that isn't needed anymore, so with "DELETE FROM data WHERE year <= '2010'". The first thing he does is to take a backup with label "before-purging-2010". Immediately after the backup has finished, he performs the deletion. Now, the application stops working because it actually still needs the data, so he restores from the backup. If recovery decides to replay a few more WAL files after the end-of-backup, that could include the deletion, and that's no good.

One solution is to create restore point after the backup ends. Then you have a clearly defined point in time you can restore to. But it would be convenient to not have to do that. Or another way to think of this is that it would be convenient if there was an implicit restore point at the end of each backup.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to