Re: [GENERAL] incremental backups

Rick Gigger Wed, 15 Feb 2006 03:10:09 -0800

I've started writing some scripts to set up incremental backup to mytaste. I just discovered something and thought I would revisit thisthread briefly.

When you go to restore from a give base file system backup you needto know the start WAL file that you need and the end WAL file thatyou need. (You will most likely have many files beyond the "stop"file but you must have at least up to the "stop" file for the restoreto work.

Now if you try to restore but you don't have the "stop" WAL filepostges will die on recovery and tell you that it can't recoverforward far enough to make the backup consistent. But I wanted toknow the easiest way to verify if you indeed had the necessary fileswithout having to actually do a restore and have postgres tell you ifit succeeded or not.

Perhaps no one understood me because the answer I was looking for wastoo obvious. But what I really wanted to know was how do you knowwhat the "stop" file is. It informs you of the start file all overthe place when doing the base backups but I thought I would have todo something clever to figure out the stop file on my own. Butluckily I don't. The backup history file has too lines like this:


START WAL LOCATION: 0/88F21D0C (file 000000010000000000000088)
STOP WAL LOCATION: 0/88F21D50 (file 000000010000000000000088)

It was clear to me from the docs how to figure out what the startfile is but the end file was a mystery until I actually created abackup history file and looked in it. The only place I can find inthe Online Backup instructions where this is indicated is thisparagraph:

"To make use of this backup, you will need to keep around all the WALsegment files generated during and after the file system backup. Toaid you in doing this, the pg_stop_backup function creates a backuphistory file that is immediately stored into the WAL archive area.This file is named after the first WAL segment file that you need tohave to make use of the backup. For example, if the starting WAL fileis 0000000100001234000055CD the backup history file will be namedsomething like 0000000100001234000055CD.007C9330.backup. (The secondnumber in the file name stands for an exact position within the WALfile, and can ordinarily be ignored.) Once you have safely archivedthe file system backup and the WAL segment files used during thebackup (as specified in the backup history file), all archived WALsegments with names numerically less are no longer needed to recoverthe file system backup and may be deleted. However, you shouldconsider keeping several backup sets to be absolutely certain thatyou can recover your data. Keep in mind that only completed WALsegment files are archived, so there will be delay between runningpg_stop_backup and the archiving of all WAL segment files needed tomake the file system backup consistent."

Reading it now it seems obvious that the file would contain not onlythe start WAL file but also the Stop WAL file but when going over thedirections the first time it did not pick up on it. And it left methinking I would have to use some hack to figure it out if I everwanted to test a base backup. It would have been less confusing tome if it just said right in the docs: "The backup history filecontains both the start WAL file name and the Stop WAL file name" orsomething like that just to make it perfectly clear.

Now that I know this I can extract that filename from the backuphistory file, check to see if it has been archived and copy itsomewhere if it hasn't been archived yet. I'm pretty sure that I canassume that all files before the stop file have already beenarchived. So once I backup the stop file I can be positive that thebase backup I just made will be valid when I try to restore from it.

This lessens my need for the "get current WAL file" functionality inthis context. It will still be nice to have in the context ofbacking it up every five minutes or so in case a WAL file takes along time to fill up.

Anyway I would have been less confused if the docs had made it moreclear that the name of the stop wal file was in the backup history file.


Rick


On Jan 30, 2006, at 10:20 PM, Bruce Momjian wrote:


Yes, I think copying it while it is being written is safe.

---------------------------------------------------------------------------


Rick Gigger wrote:

Yes!  Thanks you!  That is exactly what I was looking for.

So I take it that this means that it is save to copy the current in
use WAL file even as it is being written to?
And it also means that if I copy it with my physical file system
backup then I should have the last file that I need to restore from
that physical backup?

So if I write my own backup_latest_WAL_file.sh script (I think I
found one on the list from Simon Riggs) then I can do what I need to
do before those todo items get done?  Or will I need to wait till
postgres gives me the ability to safely copy the file?



On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote:


Unfortunately, I think I understand your question.  :-)

These TODO items are what you need:

        * Point-In-Time Recovery (PITR)

          o Allow point-in-time recovery to archive partially filled
            write-ahead logs [pitr]

            Currently only full WAL files are archived. This means
that the
            most recent transactions aren't available for recovery
in case
            of a disk failure. This could be triggered by a user
command or
            a timer.

          o Automatically force archiving of partially-filled WAL
files when
            pg_stop_backup() is called or the server is stopped

            Doing this will allow administrators to know more
easily when
            the archive contains all the files needed for point-in-
time
            recovery.

I will try to push to have them done for 8.2.

----------------------------------------------------------------------

-----

Rick Gigger wrote:

I guess my email wasn't all that clear.  I will try to rephrase.  I
am moving from using the old style pg_dump for backups to using

incrementals and want to make sure I understand the processbefore I

go about writing a bunch of scritps.

To me setting up incremental backup consists of the following
components:

1) Setting up the WAL archiving.  This one is trivial.
2) Doing physical dumps of the $PGDATA directory.  This one is once
again trivial.
3) Knowing which physical dumps are Good and Not Good.  For a given
physical dump D there is are WAL archive files Dstart and Dend for

which you much have Dstart and Dend and all files in between.If you

have all those files then the physical dump is Good.  If you don't
have them then the dump is worthless to you.

4) Knowing which dumps and which archive files can be deleted.This

depends on a number of factors.
        a) How far back do you want to be able to do PITR
        b) How much space do you have / want to use for PITR
        c) Which physical dumps are Good and which are Not Good. (see #3)

Now I think I have a pretty good plan here except for #3 (and so #4
then also suffers).

Just as an example lets say I'm not concerned so much with PITRas I

am recovering from a db crash. I've got all the backups files saved
to my backup db server so I can failover to it if my primary db
server dies.  I just want to make sure I've got one physical dump

that is good. (This is not my actual situation but itillustrated my

point better.)

Now when I do a physical dump it is not a Good dump. That is Idon't

have the end archive file necessary to recover from that physical
dump.  That is to say that  when I call pg_backup_start() then copy
$PGDATA then call pg_backup_stop() postgres might be on say WAL
archive file #5.  Once the physical dump is completed WAL archive
file #5 hasn't been archived yet.  I only have up to #4.  So if I

delete my old physical dumps and all I've got is this mostrecent one

and my database crashes before #5 gets archived then I am hosed.  I
have no good physical backups to start from.

My main question is about the best way to figure out when aphysical

dump is Good.

One strategy is to always keep around lots of physical dumps.If you

keep around 100 dumps you can be pretty sure that in the space of

time that those physical dumps take place that at least one WALfilewas archived. In fact if you keep 2 physical dumps you can befairlycertain of this. If not then you really need to space our yourdumps

more.

Is this making sense at this point?

The problem is that the WAL archiving is triggered by postgres and
the rate at which the db is updated.  The physical dumps are

triggered by cron and on a purely time based schedule. So intheory

if you had the physical dumps happening once a day but for some odd
reason no one updated the database for 4 days then all of a sudden
you'd have 2 physical backups and neither of them are good.  If
you're db crashes during that time you are hosed.

Maybe I am arguing a point that is just stupid because this will

never happen in real life. But in that it is my backups systemthat

I will be using to recover from complete and total disaster I just
want to have all my bases covered.

So my ideas on how to determine if a physical dump is Good are as
follows.

1) When you do the physical backup (after dumping the $PGDATAdir but

before calling pg_stop_backup() ) determine the current WAL archive
file.  Mark somewhere in the backed up physical dump the last file
needed for the dump to be considered good.  Then your deletion
scripts can look at the WAL archive files you have and the last one

required for the dump to be Good and determine if the dump isGood or

not.

2) After doing the physical dump but before callingpg_stop_backup()

just copy the current WAL file to the physical dump.  If that file
later gets archived then the restore commands overwrites your

partially completed one so it doesn't hurt but you know thatwhen youcall pg_stop_backup() that that physical dump is good. (Is itok to

copy the current WAL file while it is still in use?)

Is anyone taking one of these or any other precautions to make sure
they've got a good physical dump or does everyone just keep a whole
bunch of dumps around, and then actually restore the dump to see if
it is good and if not go back to a previous dump?

I hope that makes more sense.

Thanks,

Rick

On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:

Rick Gigger wrote:

Um, no you didn't read my email at all. I am aware of all ofthat

and it is clearly outlined in the docs.  My email was about a
specific detail in the process.  Please read it if you want to
know what my actual question was.


I'm not sure your email is quite right as regards the process. You
need:
  1. the filesystem backup
  2. the WAL file indicated in the history-file
  3. all the WAL files later than that
to get up to "now".

If you don't want to replay up to "now" then you will not needsome

of the more recent WAL files. You can't afford to throw them away
though since you've got a rolling backup system running and the
whole point is so you can recover to any point you like.

You can however throw away any WAL files older than that indicated
in the history file for your current filesystem-backup. You can
then only restore from that point in time forward.

There is no "last one" in the WAL set unless you know the time you
want to restore to. Indeed, the "last one" might not be "full" yet
and therefore archived if you want to restore to 10 seconds ago.

Or am I mis-understanding your email too?

--
  Richard Huxton
  Archonet Ltd



---------------------------(end of
broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [email protected]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square,
Pennsylvania 19073

---------------------------(end ofbroadcast)---------------------------

TIP 4: Have you searched our list archives?

               http://archives.postgresql.org


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [email protected]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road

+ Christ can be your backup. | Newtown Square,Pennsylvania 19073

---------------------------(end ofbroadcast)---------------------------

TIP 4: Have you searched our list archives?

               http://archives.postgresql.org



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: [GENERAL] incremental backups

Reply via email to