Re: [GENERAL] Incremental Backups in postgres

2009-11-11 Thread akp geek
Hi All -
                  I have read the documentation and have a reasonable
understanding of the WAL process, but I still have some confusion
about it.

1. I have set up the archiving process. The archive files are now going
to a different mount point.
2. I have set up a job to create a backup of the PGDATA directory.

Are the two steps above enough for recovery? My confusion is why we
need the "Making a Base Backup" step. When you have time, can you
please clarify?
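For reference, the archiving in step 1 is typically enabled with settings along these lines in postgresql.conf; the /archive path below is a placeholder for the actual archive mount point, not taken from the original message:

```ini
# postgresql.conf (8.4-era settings); /archive is a placeholder mount point
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
```

The `test ! -f` guard makes the command refuse to overwrite an already-archived segment, which the manual recommends.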


Regards

On Tue, Nov 10, 2009 at 4:56 PM, Alan Hodgson ahodg...@simkin.ca wrote:

 On Tuesday 10 November 2009, akp geek akpg...@gmail.com wrote:
  So is it always good to have the backup using pg_dump instead of PITR, or
  a combination of both?
 

 I like to do both. Ongoing PITR, daily base backups (by updating an rsync
 copy), and weekly pg_dumps that in turn go to tape.

 PITR gives a very recent restore point in the event of server loss. As
 previously mentioned, the full (custom) backups let you restore individual
 tables. They're also a lot smaller than base backups + WAL logs.

 --
 No animals were harmed in the recording of this episode. We tried but that
 damn monkey was just too fast.

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Incremental Backups in postgres

2009-11-11 Thread Scott Mead
On Wed, Nov 11, 2009 at 12:51 PM, akp geek akpg...@gmail.com wrote:

 Hi All -
   I have read the documentation and have a reasonable
 understanding of the WAL process, but I still have some confusion
 about it.

 1. I have set up the archiving process. The archive files are now going
 to a different mount point.
 2. I have set up a job to create a backup of the PGDATA directory.


Before you back up PGDATA, you either need to stop the database or run
pg_start_backup('label');

After your backup, you need to run pg_stop_backup();
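As a sketch, the whole sequence Scott describes might be scripted like this; the paths, the 'nightly' label, and the use of rsync are assumptions for illustration, not part of the original advice:

```shell
#!/bin/bash
# Sketch of an 8.x-era low-level base backup; PGDATA and BACKUP_DIR
# are placeholders -- adjust them for your installation.
PGDATA=${PGDATA:-/var/lib/pgsql/data}
BACKUP_DIR=${BACKUP_DIR:-/backups/base}

take_base_backup() {
    # 1. Mark the start of the backup (forces a checkpoint, writes the label).
    psql -c "SELECT pg_start_backup('nightly');"
    # 2. Copy the cluster directory while the server keeps running.
    rsync -a "$PGDATA/" "$BACKUP_DIR/"
    # 3. Mark the end; the server then emits the backup history file.
    psql -c "SELECT pg_stop_backup();"
}
```

Nothing here is specific to rsync; any file-level copy between the two marker calls works.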





 Are the two steps above enough for recovery? My confusion is why we
 need the "Making a Base Backup" step. When you have time, can you
 please clarify?


  Remember, an 'incremental' backup is an increment to an original full
backup.  That's why you need to make a backup of the PGDATA directory.

--Scott


Re: [GENERAL] Incremental Backups in postgres

2009-11-11 Thread akp geek
I've almost got it. Thanks a lot. One final question, please bear with me.

1. select pg_start_backup('label')  == 10:00 AM
2. PGDATA folder backup == 10:05 AM
3. select pg_stop_backup() == 10:10 AM
4. The archiving will start writing files
5. If the disk crashes at 11:00 AM, what will happen to the data between
10:10 AM and 11:00 AM, since we issued pg_stop_backup at 10:10 AM?

Appreciate your help

Regards


On Wed, Nov 11, 2009 at 2:58 PM, Scott Mead
scott.li...@enterprisedb.com wrote:

 On Wed, Nov 11, 2009 at 12:51 PM, akp geek akpg...@gmail.com wrote:

 Hi All -
                   I have read the documentation and have a reasonable
 understanding of the WAL process, but I still have some confusion
 about it.

 1. I have set up the archiving process. The archive files are now going
 to a different mount point.
 2. I have set up a job to create a backup of the PGDATA directory.

 Before you back up PGDATA, you either need to stop the database or run
 pg_start_backup('label');
 After your backup, you need to run pg_stop_backup();



 Are the two steps above enough for recovery? My confusion is why we
 need the "Making a Base Backup" step. When you have time, can you
 please clarify?

   Remember, an 'incremental' backup is an increment to an original full
 backup.  That's why you need to make a backup of the PGDATA directory.
 --Scott




Re: [GENERAL] Incremental Backups in postgres

2009-11-11 Thread Greg Smith

akp geek wrote:

I've almost got it. Thanks a lot. One final question, please bear with me.

1. select pg_start_backup('label')  == 10:00 AM
2. PGDATA folder backup == 10:05 AM
3. select pg_stop_backup() == 10:10 AM
4. The archiving will start writing files
  
You've got step (4) in the wrong place.  The archiver will generate
files you are compelled to save as soon as you issue pg_start_backup in
(1).  That's how you're able to recover from a failure at any time after
that.  When you issue pg_stop_backup, it will tell you which files it
expects to be copied over to the standby system in order to recover from
a failure, and that list will go back to when you started the backup.
Saving those is actually part of the base backup process, as documented
in the manual if you read that section more carefully.
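For completeness, the restore side that replays those archived files is driven by a recovery.conf file dropped into the restored data directory. A minimal 8.4-era example follows; the /archive path is a placeholder, not from the thread:

```ini
# recovery.conf, placed in the restored $PGDATA before starting the server;
# the server fetches each needed WAL segment through this command
restore_command = 'cp /archive/%f %p'
```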


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.com




Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread Alban Hertroys

On 10 Nov 2009, at 3:48, akp geek wrote:


Dear all -

  Is there a way to create incremental backups in
postgres? I am currently using 8.4.1 on Solaris. I am new to
postgres. Can you please share your thoughts?


Regards



IMHO The simplest solution is to just write a dump to the same file  
every now and then and have the backup software take care of storing  
only the differences. It does have a few drawbacks; it means you'll  
have a file about as large as your database on your filesystem just  
for making backups and there is a risk that your backup software kicks  
in before the dump has finished writing.
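That approach might look like the following crontab fragment; the database name and paths are placeholders. Dumping to a temporary file and renaming it only on success reduces the partial-write risk Alban mentions, since the rename is atomic on the same filesystem:

```
# nightly logical dump at 03:00; the backup software only ever sees a
# complete mydb.sql because mv replaces it atomically
0 3 * * * pg_dump mydb > /backups/mydb.sql.tmp && mv /backups/mydb.sql.tmp /backups/mydb.sql
```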


As others mentioned, you can also go with a PITR solution, which is  
probably prettier but is a bit harder to set up.


Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.







Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread Greg Stark
On Tue, Nov 10, 2009 at 11:03 AM, Alban Hertroys
dal...@solfertje.student.utwente.nl wrote:
 IMHO The simplest solution is to just write a dump to the same file every
 now and then and have the backup software take care of storing only the
 differences. It does have a few drawbacks; it means you'll have a file about
 as large as your database on your filesystem just for making backups and
 there is a risk that your backup software kicks in before the dump has
 finished writing.

 As others mentioned, you can also go with a PITR solution, which is probably
 prettier but is a bit harder to set up.

It's always worth having the dump, even if you also implement PITR.
The dump allows you to restore just specific tables or to restore onto
a different type of system. The PITR backup is a physical
byte-for-byte copy which only works if you restore the whole database
and only on the same type of system.
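A sketch of that table-level restore with a custom-format dump; the database and table names here are hypothetical:

```
pg_dump -Fc mydb -f mydb.dump            # logical backup in custom format
pg_restore -d mydb -t orders mydb.dump   # later: restore just one table
```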


-- 
greg



Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread Scott Mead
On Tue, Nov 10, 2009 at 9:52 AM, Greg Stark gsst...@mit.edu wrote:


 It's always worth having the dump, even if you also implement PITR.
 The dump allows you to restore just specific tables or to restore onto
 a different type of system. The PITR backup is a physical
 byte-for-byte copy which only works if you restore the whole database
 and only on the same type of system.


Good point here. In addition to Greg's points, you really should have a
'logical' copy of your database around in case there is some kind of
physical corruption.

--Scott


Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread akp geek
I have tested the procedure in the URL and it worked fine. I accidentally
deleted my PGDATA folder after the backup procedure was done, and I was
able to restore it. But I still have a few questions.

Thanks for the help

Regards

On Mon, Nov 9, 2009 at 11:01 PM, Jing Tan j...@iheavy.com wrote:

 I wrote an article about PITR, incremental backups, and multiple timelines.
 Check it out at http://jinxter555.blogspot.com/; it should be an easy read.

 akp geek akpg...@gmail.com wrote:


 Dear all -

   Is there a way to create incremental backups in postgres? I
 am currently using 8.4.1 on Solaris. I am new to postgres. Can you please
 share your thoughts?

 Regards




 




Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread akp geek
So is it always good to have the backup using pg_dump instead of PITR, or a
combination of both?

Please advise.

Regards

On Tue, Nov 10, 2009 at 11:24 AM, Scott Mead
scott.li...@enterprisedb.comwrote:


  On Tue, Nov 10, 2009 at 9:52 AM, Greg Stark gsst...@mit.edu wrote:


 It's always worth having the dump, even if you also implement PITR.
 The dump allows you to restore just specific tables or to restore onto
 a different type of system. The PITR backup is a physical
 byte-for-byte copy which only works if you restore the whole database
 and only on the same type of system.


  Good point here. In addition to Greg's points, you really should have a
  'logical' copy of your database around in case there is some kind of
  physical corruption.

 --Scott



Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread Alan Hodgson
On Tuesday 10 November 2009, akp geek akpg...@gmail.com wrote:
  So is it always good to have the backup using pg_dump instead of PITR, or
  a combination of both?


I like to do both. Ongoing PITR, daily base backups (by updating an rsync 
copy), and weekly pg_dumps that in turn go to tape. 

PITR gives a very recent restore point in the event of server loss. As 
previously mentioned, the full (custom) backups let you restore individual 
tables. They're also a lot smaller than base backups + WAL logs.

-- 
No animals were harmed in the recording of this episode. We tried but that 
damn monkey was just too fast.



Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread silly8888
How about using replication instead of incremental backups?

On Tue, Nov 10, 2009 at 4:56 PM, Alan Hodgson ahodg...@simkin.ca wrote:
 On Tuesday 10 November 2009, akp geek akpg...@gmail.com wrote:
  So is it always good to have the backup using pg_dump instead of PITR, or
  a combination of both?


 I like to do both. Ongoing PITR, daily base backups (by updating an rsync
 copy), and weekly pg_dumps that in turn go to tape.

 PITR gives a very recent restore point in the event of server loss. As
 previously mentioned, the full (custom) backups let you restore individual
 tables. They're also a lot smaller than base backups + WAL logs.

 --
 No animals were harmed in the recording of this episode. We tried but that
 damn monkey was just too fast.





Re: [GENERAL] Incremental Backups in postgres

2009-11-10 Thread akp geek
I have set up replication using Bucardo. This is just an additional setup.

regards

On Tue, Nov 10, 2009 at 5:09 PM, silly silly8...@gmail.com wrote:

 How about using replication instead of incremental backups?

 On Tue, Nov 10, 2009 at 4:56 PM, Alan Hodgson ahodg...@simkin.ca wrote:
  On Tuesday 10 November 2009, akp geek akpg...@gmail.com wrote:
   So is it always good to have the backup using pg_dump instead of PITR, or
   a combination of both?
 
 
  I like to do both. Ongoing PITR, daily base backups (by updating an rsync
  copy), and weekly pg_dumps that in turn go to tape.
 
  PITR gives a very recent restore point in the event of server loss. As
  previously mentioned, the full (custom) backups let you restore
 individual
  tables. They're also a lot smaller than base backups + WAL logs.
 
  --
  No animals were harmed in the recording of this episode. We tried but
 that
  damn monkey was just too fast.
 
 




[GENERAL] Incremental Backups in postgres

2009-11-09 Thread akp geek
Dear all -

  Is there a way to create incremental backups in postgres? I
am currently using 8.4.1 on Solaris. I am new to postgres. Can you please
share your thoughts?

Regards


Re: [GENERAL] Incremental Backups in postgres

2009-11-09 Thread Ben Chobot
Saving off the transaction log (WAL) files is a good way to do this.
Read this part of the manual:


http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html

...and see if that answers your questions.

On Nov 9, 2009, at 6:48 PM, akp geek wrote:


Dear all -

  Is there a way to create incremental backups in
postgres? I am currently using 8.4.1 on Solaris. I am new to
postgres. Can you please share your thoughts?


Regards





Re: [GENERAL] Incremental Backups in postgres

2009-11-09 Thread Richard Broersma
On Mon, Nov 9, 2009 at 6:48 PM, akp geek akpg...@gmail.com wrote:

    Is there a way to create incremental backups in postgres? I
  am currently using 8.4.1 on Solaris. I am new to postgres. Can you please
  share your thoughts?

I've read more about continuous backups:
http://www.postgresql.org/docs/8.4/static/continuous-archiving.html

However, I see there is a section on incremental backups as well:
http://www.postgresql.org/docs/8.4/static/warm-standby.html#BACKUP-INCREMENTAL-UPDATED


-- 
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug



Re: [GENERAL] Incremental Backups in postgres

2009-11-09 Thread Jing Tan

I wrote an article about PITR, incremental backups, and multiple timelines.
Check it out at http://jinxter555.blogspot.com/; it should be an easy read.

akp geek akpg...@gmail.com wrote:


Dear all -

  Is there a way to create incremental backups in postgres? I
am currently using 8.4.1 on Solaris. I am new to postgres. Can you please
share your thoughts?

Regards










Re: [GENERAL] Incremental backups

2007-04-19 Thread Kev
On Apr 17, 10:27 am, [EMAIL PROTECTED] (Mageshwaran) wrote:
 hi everyone,

 Please, can anyone suggest methods for doing incremental backups? It is
 urgent; please help me.

 Regards
 J Mageshwaran

Sorry, I don't have anything implemented, but I've been wondering
about this too.  One way (not necessarily the best) might be an audit
trail if done in such a way that you could rebuild the entire database
from the audit trail.  Then you need only back up the audit entries
since the last backup.  However, you have to keep all backups, or once
in a while do a 'fresh start' backup where you take all audit
entries.  There's another thread starting about audit trails, you
might want to check there.

Kev


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Incremental backups

2007-04-19 Thread Kev
On Apr 19, 9:41 am, Kev [EMAIL PROTECTED] wrote:
 On Apr 17, 10:27 am, [EMAIL PROTECTED] (Mageshwaran) wrote:

  hi everyone,

  Please, can anyone suggest methods for doing incremental backups? It is
  urgent; please help me.

  Regards
  J Mageshwaran

 Sorry, I don't have anything implemented, but I've been wondering
 about this too.  One way (not necessarily the best) might be an audit
 trail if done in such a way that you could rebuild the entire database
 from the audit trail.  Then you need only back up the audit entries
 since the last backup.  However, you have to keep all backups, or once
 in a while do a 'fresh start' backup where you take all audit
 entries.  There's another thread starting about audit trails, you
 might want to check there.

 Kev

Actually, check out WAL archiving, that might be more appropriate.  I
don't know much about it though.

Kev





[GENERAL] Incremental backups

2007-04-17 Thread Mageshwaran

hi everyone,

Please, can anyone suggest methods for doing incremental backups? It is
urgent; please help me.



Regards
J Mageshwaran





Re: [GENERAL] incremental backups

2006-02-24 Thread Bruce Momjian

I have applied the following patch, which adds to the paragraph after the
one you quoted below.  I just added a mention that the start/stop times
_and_ WAL file names are in the history file.

---

Rick Gigger wrote:
 I've started writing some scripts to set up incremental backup to my  
 taste. I just discovered something and thought I would revisit this  
 thread briefly.
 
 When you go to restore from a given base file system backup you need
 to know the start WAL file that you need and the end WAL file that
 you need.  (You will most likely have many files beyond the stop
 file but you must have at least up to the stop file for the restore
 to work.)
 
 Now if you try to restore but you don't have the stop WAL file,
 postgres will die on recovery and tell you that it can't recover
 forward far enough to make the backup consistent.  But I wanted to
 know the easiest way to verify if you indeed had the necessary files
 without having to actually do a restore and have postgres tell you if
 it succeeded or not.
 
 Perhaps no one understood me because the answer I was looking for was
 too obvious.  But what I really wanted to know was how do you know
 what the stop file is.  It informs you of the start file all over
 the place when doing the base backups but I thought I would have to
 do something clever to figure out the stop file on my own.  But
 luckily I don't.  The backup history file has two lines like this:
 
 START WAL LOCATION: 0/88F21D0C (file 00010088)
 STOP WAL LOCATION: 0/88F21D50 (file 00010088)
 
 It was clear to me from the docs how to figure out what the start  
 file is but the end file was a mystery until I actually created a  
 backup history file and looked in it.  The only place I can find in  
 the Online Backup instructions where this is indicated is this  
 paragraph:
 
 To make use of this backup, you will need to keep around all the WAL  
 segment files generated during and after the file system backup. To  
 aid you in doing this, the pg_stop_backup function creates a backup  
 history file that is immediately stored into the WAL archive area.  
 This file is named after the first WAL segment file that you need to  
 have to make use of the backup. For example, if the starting WAL file  
 is 0001123455CD the backup history file will be named  
 something like 0001123455CD.007C9330.backup. (The second  
 number in the file name stands for an exact position within the WAL  
 file, and can ordinarily be ignored.) Once you have safely archived  
 the file system backup and the WAL segment files used during the  
 backup (as specified in the backup history file), all archived WAL  
 segments with names numerically less are no longer needed to recover  
 the file system backup and may be deleted. However, you should  
 consider keeping several backup sets to be absolutely certain that  
 you can recover your data. Keep in mind that only completed WAL  
 segment files are archived, so there will be delay between running  
 pg_stop_backup and the archiving of all WAL segment files needed to  
 make the file system backup consistent.
 
 Reading it now it seems obvious that the file would contain not only
 the start WAL file but also the stop WAL file, but when going over the
 directions the first time I did not pick up on it.  And it left me
 thinking I would have to use some hack to figure it out if I ever
 wanted to test a base backup.  It would have been less confusing to
 me if it just said right in the docs: "The backup history file
 contains both the start WAL file name and the stop WAL file name", or
 something like that, just to make it perfectly clear.
 
 Now that I know this I can extract that filename from the backup  
 history file, check to see if it has been archived and copy it  
 somewhere if it hasn't been archived yet.  I'm pretty sure that I can  
 assume that all files before the stop file have already been  
 archived.  So once I backup the stop file I can be positive that the  
 base backup I just made will be valid when I try to restore from it.
 
 This lessens my need for the get current WAL file functionality in  
 this context.  It will still be nice to have in the context of  
 backing it up every five minutes or so in case a WAL file takes a  
 long time to fill up.
 
 Anyway I would have been less confused if the docs had made it more  
 clear that the name of the stop wal file was in the backup history file.
 
 Rick
 
 
 On Jan 30, 2006, at 10:20 PM, Bruce Momjian wrote:
 
 
  Yes, I think copying it while it is being written is safe.
 
  -- 
  -
 
  Rick Gigger wrote:
  Yes!  Thank you!  That is exactly what I was looking for.
 
  So I take it that this means that it is safe to copy the currently
  in-use WAL file even as it is being 

Re: [GENERAL] incremental backups

2006-02-15 Thread Rick Gigger
I've started writing some scripts to set up incremental backup to my  
taste. I just discovered something and thought I would revisit this  
thread briefly.


When you go to restore from a given base file system backup you need
to know the start WAL file that you need and the end WAL file that
you need.  (You will most likely have many files beyond the stop
file but you must have at least up to the stop file for the restore
to work.)


Now if you try to restore but you don't have the stop WAL file,
postgres will die on recovery and tell you that it can't recover
forward far enough to make the backup consistent.  But I wanted to
know the easiest way to verify if you indeed had the necessary files
without having to actually do a restore and have postgres tell you if
it succeeded or not.


Perhaps no one understood me because the answer I was looking for was
too obvious.  But what I really wanted to know was how do you know
what the stop file is.  It informs you of the start file all over
the place when doing the base backups but I thought I would have to
do something clever to figure out the stop file on my own.  But
luckily I don't.  The backup history file has two lines like this:


START WAL LOCATION: 0/88F21D0C (file 00010088)
STOP WAL LOCATION: 0/88F21D50 (file 00010088)

It was clear to me from the docs how to figure out what the start  
file is but the end file was a mystery until I actually created a  
backup history file and looked in it.  The only place I can find in  
the Online Backup instructions where this is indicated is this  
paragraph:


To make use of this backup, you will need to keep around all the WAL  
segment files generated during and after the file system backup. To  
aid you in doing this, the pg_stop_backup function creates a backup  
history file that is immediately stored into the WAL archive area.  
This file is named after the first WAL segment file that you need to  
have to make use of the backup. For example, if the starting WAL file  
is 0001123455CD the backup history file will be named  
something like 0001123455CD.007C9330.backup. (The second  
number in the file name stands for an exact position within the WAL  
file, and can ordinarily be ignored.) Once you have safely archived  
the file system backup and the WAL segment files used during the  
backup (as specified in the backup history file), all archived WAL  
segments with names numerically less are no longer needed to recover  
the file system backup and may be deleted. However, you should  
consider keeping several backup sets to be absolutely certain that  
you can recover your data. Keep in mind that only completed WAL  
segment files are archived, so there will be delay between running  
pg_stop_backup and the archiving of all WAL segment files needed to  
make the file system backup consistent.


Reading it now it seems obvious that the file would contain not only
the start WAL file but also the stop WAL file, but when going over the
directions the first time I did not pick up on it.  And it left me
thinking I would have to use some hack to figure it out if I ever
wanted to test a base backup.  It would have been less confusing to
me if it just said right in the docs: "The backup history file
contains both the start WAL file name and the stop WAL file name", or
something like that, just to make it perfectly clear.


Now that I know this I can extract that filename from the backup  
history file, check to see if it has been archived and copy it  
somewhere if it hasn't been archived yet.  I'm pretty sure that I can  
assume that all files before the stop file have already been  
archived.  So once I backup the stop file I can be positive that the  
base backup I just made will be valid when I try to restore from it.


This lessens my need for the "get current WAL file" functionality in
this context.  It will still be nice to have in the context of
backing it up every five minutes or so, in case a WAL file takes a
long time to fill up.


Anyway I would have been less confused if the docs had made it more  
clear that the name of the stop wal file was in the backup history file.
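Rick's check can be scripted. This sketch pulls both segment names out of a backup history file; the file contents below reuse the WAL locations quoted above, but with hypothetical full-length (24-character) segment names, since the archive truncated the originals:

```shell
#!/bin/bash
# Extract the start/stop WAL segment names from a backup history file.
hist=$(mktemp)
cat > "$hist" <<'EOF'
START WAL LOCATION: 0/88F21D0C (file 000000010000000000000088)
STOP WAL LOCATION: 0/88F21D50 (file 000000010000000000000088)
EOF
start=$(sed -n 's/^START WAL LOCATION:.*(file \(.*\))/\1/p' "$hist")
stop=$(sed -n 's/^STOP WAL LOCATION:.*(file \(.*\))/\1/p' "$hist")
echo "need WAL segments $start through $stop"
```

Once the stop segment named by $stop shows up in the archive, the base backup is usable.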


Rick


On Jan 30, 2006, at 10:20 PM, Bruce Momjian wrote:



Yes, I think copying it while it is being written is safe.

-- 
-


Rick Gigger wrote:

Yes!  Thank you!  That is exactly what I was looking for.

So I take it that this means that it is safe to copy the currently
in-use WAL file even as it is being written to?
And it also means that if I copy it with my physical file system
backup then I should have the last file that I need to restore from
that physical backup?

So if I write my own backup_latest_WAL_file.sh script (I think I
found one on the list from Simon Riggs) then I can do what I need to
do before those todo items get done?  Or will I need to wait till

Re: [GENERAL] incremental backups

2006-01-31 Thread Tom Lane
Rick Gigger [EMAIL PROTECTED] writes:
 That's what I mean by invalid.  Let's say I do something stupid and
 do a physical backup and I don't grab the current WAL file.  All I
 have is the last one to be archived before I did my backup, which is
 not late enough to do a valid restore.  Will postgres know that the
 restore process failed because I didn't have that last necessary WAL
 file, or will it just start up in a potentially inconsistent state?

Yes:

/*
 * Complain if we did not roll forward far enough to render the backup
 * dump consistent.
 */
if (XLByteLT(EndOfLog, recoveryMinXlogOffset))
{
    if (needNewTimeLine)        /* stopped because of stop request */
        ereport(FATAL,
                (errmsg("requested recovery stop point is before end time "
                        "of backup dump")));
    else
        /* ran off end of WAL */
        ereport(FATAL,
                (errmsg("WAL ends before end time of backup dump")));
}

regards, tom lane



Re: [GENERAL] incremental backups

2006-01-31 Thread Rick Gigger

Wonderful.  That is good news.  Thanks.

Rick

On Jan 31, 2006, at 7:14 AM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:

That's what I mean by invalid.  Let's say I do something stupid and
do a physical backup and I don't grab the current WAL file.  All I
have is the last one to be archived before I did my backup, which is
not late enough to do a valid restore.  Will postgres know that the
restore process failed because I didn't have that last necessary WAL
file, or will it just start up in a potentially inconsistent state?


Yes:

/*
 * Complain if we did not roll forward far enough to render the backup
 * dump consistent.
 */
if (XLByteLT(EndOfLog, recoveryMinXlogOffset))
{
    if (needNewTimeLine)        /* stopped because of stop request */
        ereport(FATAL,
                (errmsg("requested recovery stop point is before end time "
                        "of backup dump")));
    else
        /* ran off end of WAL */
        ereport(FATAL,
                (errmsg("WAL ends before end time of backup dump")));
}

regards, tom lane






Re: [GENERAL] incremental backups

2006-01-30 Thread Bruce Momjian

Unfortunately, I think I understand your question.  :-)

These TODO items are what you need:

* Point-In-Time Recovery (PITR)

  o Allow point-in-time recovery to archive partially filled
write-ahead logs [pitr]

Currently only full WAL files are archived. This means that the
most recent transactions aren't available for recovery in case
of a disk failure. This could be triggered by a user command or
a timer.

  o Automatically force archiving of partially-filled WAL files when
pg_stop_backup() is called or the server is stopped

Doing this will allow administrators to know more easily when
the archive contains all the files needed for point-in-time
recovery.

I will try to push to have them done for 8.2.

---

Rick Gigger wrote:
 I guess my email wasn't all that clear.  I will try to rephrase.  I  
 am moving from using the old style pg_dump for backups to using  
 incrementals and want to make sure I understand the process before I  
 go about writing a bunch of scripts.
 
 To me setting up incremental backup consists of the following  
 components:
 
 1) Setting up the WAL archiving.  This one is trivial.
 2) Doing physical dumps of the $PGDATA directory.  This one is once  
 again trivial.
 3) Knowing which physical dumps are Good and Not Good.  For a given  
 physical dump D there are WAL archive files Dstart and Dend, for  
 which you must have Dstart and Dend and all files in between.  If you  
 have all those files then the physical dump is Good.  If you don't  
 have them then the dump is worthless to you.
 4) Knowing which dumps and which archive files can be deleted.  This  
 depends on a number of factors.
   a) How far back do you want to be able to do PITR
   b) How much space do you have / want to use for PITR
   c) Which physical dumps are Good and which are Not Good. (see #3)
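
A minimal sketch of steps 1 and 2, assuming the 8.x pg_start_backup()/pg_stop_backup() API; the archive path, backup path, database name, and the 'nightly' label are all made up for illustration:

```shell
# Sketch of a nightly base backup (paths and database name are made up).
# Assumes WAL archiving is already on in postgresql.conf, e.g.:
#   archive_command = 'cp %p /mnt/wal_archive/%f'
base_backup() {
    backup_dir=/mnt/backups/base-$(date +%Y%m%d)

    # Tell the server a base backup is starting.
    psql -d postgres -c "SELECT pg_start_backup('nightly');"

    # Copy the cluster; pg_xlog contents are covered by the WAL archive.
    rsync -a --exclude=pg_xlog "$PGDATA/" "$backup_dir/"

    # Finish the backup; this emits the .backup history file that names
    # the WAL range this copy needs to be restorable.
    psql -d postgres -c "SELECT pg_stop_backup();"
}
```

Run something like this from cron; the interesting question, as below, is what makes the resulting dump Good.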
 
 Now I think I have a pretty good plan here except for #3 (and so #4  
 then also suffers).
 
 Just as an example, let's say I'm not concerned so much with PITR as I  
 am recovering from a db crash. I've got all the backups files saved  
 to my backup db server so I can failover to it if my primary db  
 server dies.  I just want to make sure I've got one physical dump  
 that is good.  (This is not my actual situation but it illustrates my  
 point better.)
 
 Now when I do a physical dump it is not a Good dump.  That is I don't  
 have the end archive file necessary to recover from that physical  
 dump.  That is to say that when I call pg_start_backup() then copy  
 $PGDATA then call pg_stop_backup() postgres might be on, say, WAL  
 archive file #5.  Once the physical dump is completed WAL archive  
 file #5 hasn't been archived yet.  I only have up to #4.  So if I  
 delete my old physical dumps and all I've got is this most recent one  
 and my database crashes before #5 gets archived then I am hosed.  I  
 have no good physical backups to start from.
 
 My main question is about the best way to figure out when a physical  
 dump is Good.
 
 One strategy is to always keep around lots of physical dumps.  If you  
 keep around 100 dumps you can be pretty sure that in the space of  
 time that those physical dumps take place that at least one WAL file  
 was archived.  In fact if you keep 2 physical dumps you can be fairly  
 certain of this.  If not then you really need to space out your dumps  
 more.
 
 Is this making sense at this point?
 
 The problem is that the WAL archiving is triggered by postgres and  
 the rate at which the db is updated.  The physical dumps are  
 triggered by cron and on a purely time based schedule.  So in theory  
 if you had the physical dumps happening once a day but for some odd  
 reason no one updated the database for 4 days then all of a sudden  
 you'd have 2 physical backups and neither of them is good.  If  
 your db crashes during that time you are hosed.  I  
 
 Maybe I am arguing a point that is just stupid because this will  
 never happen in real life.  But in that it is my backups system that  
 I will be using to recover from complete and total disaster I just  
 want to have all my bases covered.
 
 So my ideas on how to determine if a physical dump is Good are as  
 follows.
 
 1) When you do the physical backup (after dumping the $PGDATA dir but  
 before calling pg_stop_backup() ) determine the current WAL archive  
 file.  Mark somewhere in the backed up physical dump the last file  
 needed for the dump to be considered good.  Then your deletion  
 scripts can look at the WAL archive files you have and the last one  
 required for the dump to be Good and determine if the dump is Good or  
 not.
 
 2) After doing the physical dump but before calling pg_stop_backup()  
 just copy the current WAL file to the physical dump.  If that file  
 later gets 
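
Idea 1 can be sketched roughly like this, assuming pg_stop_backup() drops a .backup history file into the WAL archive containing a "STOP WAL LOCATION: ... (file XXX)" line, and that archive_command never skips a segment (so the stop segment arriving implies everything before it arrived too):

```shell
# backup_is_good HISTORY_FILE ARCHIVE_DIR
# Succeeds once the last WAL segment the base backup needs has been
# archived; until then the dump is "Not Good" and must not replace
# older dumps.
backup_is_good() {
    local stop
    # History files contain a line like:
    #   STOP WAL LOCATION: 0/6000068 (file 000000010000000000000006)
    stop=$(sed -n 's/^STOP WAL LOCATION:.*(file \([0-9A-F]*\)).*/\1/p' "$1")
    [ -n "$stop" ] && [ -e "$2/$stop" ]
}
```

A deletion script would then only start pruning older physical dumps once backup_is_good succeeds for the newest one.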

Re: [GENERAL] incremental backups

2006-01-30 Thread Rick Gigger

Yes!  Thank you!  That is exactly what I was looking for.

So I take it that this means that it is safe to copy the current in-  
use WAL file even as it is being written to?
And it also means that if I copy it with my physical file system  
backup then I should have the last file that I need to restore from  
that physical backup?


So if I write my own backup_latest_WAL_file.sh script (I think I  
found one on the list from Simon Riggs) then I can do what I need to  
do before those todo items get done?  Or will I need to wait till  
postgres gives me the ability to safely copy the file?





Re: [GENERAL] incremental backups

2006-01-30 Thread Rick Gigger
And here is the real million dollar question.  Let's say for some  
reason I don't have the last WAL file I need for my backup to be  
valid.  Will it die and tell me it's bad or will it just start up  
with a screwed up data directory?




Re: [GENERAL] incremental backups

2006-01-30 Thread Tom Lane
Rick Gigger [EMAIL PROTECTED] writes:
 And here is the real million dollar question.  Let's say for some  
 reason I don't have the last WAL file I need for my backup to be  
 valid.  Will it die and tell me it's bad or will it just start up  
 with a screwed up data directory?

It'll restore up to the end of the data it has.  The only case that's
actually invalid is not restoring far enough to cover the time window
that the original base backup was taken over.  Otherwise it's just a
situation of restoring up to a particular point in time...

regards, tom lane



Re: [GENERAL] incremental backups

2006-01-30 Thread Rick Gigger

On Jan 30, 2006, at 6:58 PM, Tom Lane wrote:


Rick Gigger [EMAIL PROTECTED] writes:

And here is the real million dollar question.  Let's say for some
reason I don't have the last WAL file I need for my backup to be
valid.  Will it die and tell me it's bad or will it just start up
with a screwed up data directory?


It'll restore up to the end of the data it has.  The only case that's
actually invalid is not restoring far enough to cover the time window
that the original base backup was taken over.  Otherwise it's just a
situation of restoring up to a particular point in time...



That's what I mean by invalid.  Let's say I do something stupid and  
do a physical backup and I don't grab the current WAL file.  All I  
have is the last one to be archived before I did my backup, which is  
not late enough to do a valid restore.  Will postgres know that the  
restore process failed because I didn't have that last necessary WAL  
file or will it just start up in a potentially inconsistent state?   
Obviously that would be my fault not postgres' since I am the one  
that didn't give it the data it needed to do a full restore.  But I  
am just wondering if that is a potential area to shoot yourself in  
the foot or if postgres will put the safety on for me.


Rick



Re: [GENERAL] incremental backups

2006-01-30 Thread Bruce Momjian

Yes, I think copying it while it is being written is safe.


Re: [GENERAL] incremental backups

2006-01-27 Thread Csaba Nagy
OK, that was before going home from work, so it could be excusable :-D
I've read your mail in more detail now, and I can't answer it other than
to say that we use a standby database here, based on WAL log shipping; the
procedure for building the standby finishes with a script
inserting/deleting a few 1000s of rows in a bogus table so there is for
sure a WAL file archived. That might fit your needs or might not...
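
The bogus-table trick might look roughly like this (the table name, database name, and row count are made up; before 8.2 there is no pg_switch_xlog(), so generating throwaway traffic is the workaround):

```shell
# force_wal_archive: churn a scratch table until at least one 16 MB WAL
# segment fills and is handed to archive_command.  Run it right after a
# base backup so the segments the backup needs get archived promptly.
force_wal_archive() {
    psql -d postgres -q <<'SQL'
CREATE TABLE wal_filler (junk text);
-- roughly 20 MB of row data: enough to cross a segment boundary
INSERT INTO wal_filler SELECT repeat('x', 1000) FROM generate_series(1, 20000);
DROP TABLE wal_filler;
SQL
}
```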

Cheers,
Csaba.


On Thu, 2006-01-26 at 18:48, Rick Gigger wrote:
 Um, no you didn't read my email at all.  I am aware of all of that  
 and it is clearly outlined in the docs.  My email was about a  
 specific detail in the process.  Please read it if you want to know  
 what my actual question was.
 
 Thanks,
 
 Rick
 
 On Jan 26, 2006, at 10:41 AM, Csaba Nagy wrote:
 
  I didn't read your mail very carefully, but I guess you want:
 
- turn on WAL archiving, and archive all WAL logs;
- take the file system backup at regular time points, optionally you
  can keep them also for point in time recovery;
 
  Then you always have all the WAL files you need to recover to any  
  point
  in time you need. You can then supply all the WAL files which are  
  needed
  by the last file system backup to recover after a crash, or you can
  supply all the WAL files up to the time point just before your student
  DBA deleted all your data.
 
  HTH,
  Csaba.
 
 
  On Thu, 2006-01-26 at 18:33, Rick Gigger wrote:
  I am looking into using WAL archiving for incremental backups.  It
  all seems fairly straightforward except for one thing.
 
  So you set up the archiving of the WAL files.  Then you set up cron
  or something to regularly do a physical backup of the data
  directory.  But when you do the physical backup you don't have the
  last WAL file archived yet that you need to restore that physical
  backup.  So you always need to keep at least two physical backups
  around so that you know that at least one of them has the WAL files
  needed for recovery.
 
  The question I have is: how do I know if I can use the latest one?
  That is if I first do physical backup A and then later do physical
  backup B and then I want to do a restore.  How do I know when I've
  got the files I need to use B so that I don't have to go all the way
  back to A?
 
  My initial thoughts are that I could:
 
  a) just before or after calling pg_stop_backup check the file system
  to see what the last archived WAL file is on disk and make sure
  that I get the next one before I try restoring from that backup.
 
  b) just before or after calling pg_stop_backup check postgres
  to see what the current active WAL file is and make sure it has been
  archived before I try to restore from that backup.
 
  c) Always just use backup A.
 
  Now c seems the easiest, but is it even fail-safe?  I realize it
  wouldn't really ever happen in an active production environment that
  was set up right but say you did backup A and backup B and during
  that whole time you had so few writes in postgres that you never filled
  up a whole WAL file so both of the backups are invalid.  Then you
  would have to always go to option a or b above to verify that a given
  backup was good so that any previous backups could be deleted.
 
  Wouldn't it make things a lot easier if the backup history file not
  only gave you the name of the first file that you need but also the
  last one?  Then you could look at a given backup and say I need this
  start file and this end file.  Then you could delete all archived WAL
  files before start file.  And you could delete any old physical dumps
  because you know that your last physical dump was good.  It would
  just save you the step in the backups process of figuring out what
  that file is.  And it seems like pg_stop_backup could determine that
  on its own.
 
  Does that make sense?  Am I totally off base here?
 
  Rick
 


Re: [GENERAL] incremental backups

2006-01-27 Thread Richard Huxton

Rick Gigger wrote:
Um, no you didn't read my email at all.  I am aware of all of that and 
it is clearly outlined in the docs.  My email was about a specific 
detail in the process.  Please read it if you want to know what my 
actual question was.


I'm not sure your email is quite right as regards the process. You need:
  1. the filesystem backup
  2. the WAL file indicated in the history-file
  3. all the WAL files later than that
to get up to now.

If you don't want to replay up to now then you will not need some of 
the more recent WAL files. You can't afford to throw them away though 
since you've got a rolling backup system running and the whole point is 
so you can recover to any point you like.


You can however throw away any WAL files older than that indicated in 
the history file for your current filesystem-backup. You can then only 
restore from that point in time forward.
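
That pruning rule might be sketched like this, assuming archive_command copies segments into one flat directory, the newest *.backup history file names the first segment the current base backup needs, and segment file names sort lexicographically in WAL order:

```shell
# prune_wal_archive DIR
# Remove archived WAL segments older than the start segment recorded in
# the newest *.backup history file, i.e. segments no base backup needs.
prune_wal_archive() {
    local archive=$1
    local history start base
    history=$(ls -t "$archive"/*.backup 2>/dev/null | head -n 1)
    [ -n "$history" ] || { echo "no backup history file in $archive" >&2; return 1; }
    # History files contain a line like:
    #   START WAL LOCATION: 0/5000020 (file 000000010000000000000005)
    start=$(sed -n 's/^START WAL LOCATION:.*(file \([0-9A-F]*\)).*/\1/p' "$history")
    for f in "$archive"/[0-9A-F]*; do
        [ -e "$f" ] || continue
        base=${f##*/}
        case "$base" in *.backup) continue ;; esac
        # Segment names sort in WAL order, so a plain string compare works.
        if [[ "$base" < "$start" ]]; then
            rm -- "$f"
        fi
    done
}
```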


There is no "last one" in the WAL set unless you know the time you want 
to restore to. Indeed, the last one might not be full yet, and 
therefore not yet archived, if you want to restore to 10 seconds ago.


Or am I mis-understanding your email too?

--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] incremental backups

2006-01-27 Thread Rick Gigger
Sorry for my sharp reply!  It looks like we are after the same thing  
so that does help a little although it doesn't really answer my  
question.  I set up my backups system using pg_dump back in 7.3  
because that's all there was.  I am finally moving to 8.1 and want to  
switch to doing incrementals because the dumps are just getting too  
big.  If you didn't mind showing me your scripts it would probably be  
a good staring point for me.


Thanks,

Rick



Re: [GENERAL] incremental backups

2006-01-27 Thread Rick Gigger
I guess my email wasn't all that clear.  I will try to rephrase.  I  
am moving from using the old style pg_dump for backups to using  
incrementals and want to make sure I understand the process before I  
go about writing a bunch of scripts.


To me setting up incremental backup consists of the following  
components:


1) Setting up the WAL archiving.  This one is trivial.
2) Doing physical dumps of the $PGDATA directory.  This one is once  
again trivial.
3) Knowing which physical dumps are Good and Not Good.  For a given  
physical dump D there are WAL archive files Dstart and Dend, for  
which you must have Dstart and Dend and all files in between.  If you  
have all those files then the physical dump is Good.  If you don't  
have them then the dump is worthless to you.
4) Knowing which dumps and which archive files can be deleted.  This  
depends on a number of factors.

a) How far back do you want to be able to do PITR
b) How much space do you have / want to use for PITR
c) Which physical dumps are Good and which are Not Good. (see #3)

Now I think I have a pretty good plan here except for #3 (and so #4  
then also suffers).


Just as an example, let's say I'm not concerned so much with PITR as I  
am recovering from a db crash. I've got all the backups files saved  
to my backup db server so I can failover to it if my primary db  
server dies.  I just want to make sure I've got one physical dump  
that is good.  (This is not my actual situation but it illustrates my  
point better.)


Now when I do a physical dump it is not a Good dump.  That is I don't  
have the end archive file necessary to recover from that physical  
dump.  That is to say, when I call pg_start_backup(), then copy  
$PGDATA, then call pg_stop_backup(), postgres might be on, say, WAL  
archive file #5.  Once the physical dump is completed WAL archive  
file #5 hasn't been archived yet.  I only have up to #4.  So if I  
delete my old physical dumps and all I've got is this most recent one  
and my database crashes before #5 gets archived then I am hosed.  I  
have no good physical backups to start from.


My main question is about the best way to figure out when a physical  
dump is Good.


One strategy is to always keep around lots of physical dumps.  If you  
keep around 100 dumps you can be pretty sure that in the span of  
time those physical dumps cover, at least one WAL file  
was archived.  In fact if you keep 2 physical dumps you can be fairly  
certain of this.  If not then you really need to space out your dumps  
more.


Is this making sense at this point?

The problem is that the WAL archiving is triggered by postgres and  
the rate at which the db is updated.  The physical dumps are  
triggered by cron on a purely time-based schedule.  So in theory  
if you had the physical dumps happening once a day but for some odd  
reason no one updated the database for 4 days then all of a sudden  
you'd have 2 physical backups and neither of them is good.  If  
your db crashes during that time you are hosed.


Maybe I am arguing a point that is just stupid because this will  
never happen in real life.  But since it is my backup system that  
I will be using to recover from complete and total disaster, I just  
want to have all my bases covered.


So my ideas on how to determine if a physical dump is Good are as  
follows.


1) When you do the physical backup (after dumping the $PGDATA dir but  
before calling pg_stop_backup()) determine the current WAL archive  
file.  Mark somewhere in the backed up physical dump the last file  
needed for the dump to be considered good.  Then your deletion  
scripts can look at the WAL archive files you have and the last one  
required for the dump to be Good and determine if the dump is Good or  
not.


2) After doing the physical dump but before calling pg_stop_backup()  
just copy the current WAL file to the physical dump.  If that file  
later gets archived then the restore command overwrites your  
partially copied one, so it doesn't hurt, but you know that when you  
call pg_stop_backup() that physical dump is good.  (Is it okay to  
copy the current WAL file while it is still in use?)
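
Idea 1's deletion script reduces to a string comparison, since WAL segment file names are fixed-width hex and sort lexicographically. A sketch, assuming you record the "last file needed" name with each dump (a convention of my own, not something pg_stop_backup writes for you):

```shell
#!/bin/sh
# A dump is complete once the newest archived segment sorts at or past
# the "last file needed" name recorded with the dump.
dump_complete() {   # usage: dump_complete ARCHIVE_DIR LAST_NEEDED
    newest=$(ls "$1" | grep -E '^[0-9A-F]{24}$' | sort | tail -n 1)
    [ -n "$newest" ] || return 1
    # Compare by sorting the two names; the newest must come last.
    [ "$(printf '%s\n%s\n' "$newest" "$2" | sort | tail -n 1)" = "$newest" ]
}

# Demo with made-up segment names in a throwaway directory:
arch=$(mktemp -d)
touch "$arch/000000010000000000000004" "$arch/000000010000000000000005"
dump_complete "$arch" 000000010000000000000005 && echo "safe to delete older dumps"
dump_complete "$arch" 000000010000000000000006 || echo "keep waiting"
rm -r "$arch"
```

Once a dump tests complete, everything archived before its recorded start segment can be pruned.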


Is anyone taking one of these or any other precautions to make sure  
they've got a good physical dump or does everyone just keep a whole  
bunch of dumps around, and then actually restore the dump to see if  
it is good and if not go back to a previous dump?


I hope that makes more sense.

Thanks,

Rick

On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote:


Rick Gigger wrote:
Um, no you didn't read my email at all.  I am aware of all of that  
and it is clearly outlined in the docs.  My email was about a  
specific detail in the process.  Please read it if you want to  
know what my actual question was.


I'm not sure your email is quite right as regards the process. You  
need:

  1. the filesystem backup
  2. the WAL file indicated in the 

[GENERAL] incremental backups

2006-01-26 Thread Rick Gigger
I am looking into using WAL archiving for incremental backups.  It  
all seems fairly straightforward except for one thing.


So you set up the archiving of the WAL files.  Then you set up cron  
or something to regularly do a physical backup of the data  
directory.  But when you do the physical backup you don't have the  
last WAL file archived yet that you need to restore that physical  
backup.  So you always need to keep at least two physical backups  
around so that you know that at least one of them has the WAL files  
needed for recovery.


The question I have is: how do I know if I can use the latest one?   
That is if I first do physical backup A and then later do physical  
backup B and then I want to do a restore.  How do I know when I've  
got the files I need to use B so that I don't have to go all the way  
back to A?


My initial thoughts are that I could:

a) just before or after calling pg_stop_backup check the file system  
to see what the last archived WAL file is on disk and make sure  
that I get the next one before I try restoring from that backup.


b) just before or after calling pg_stop_backup check postgres to see  
what the current active WAL file is and make sure it has been  
archived before I try to restore from that backup.


c) Always just use backup A.
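
Option (b) could look roughly like this; the pg_xlogfile_name(pg_current_xlog_location()) call assumes a server release that provides those functions, so treat this as a sketch rather than a recipe:

```shell
#!/bin/sh
# Ask the server which WAL segment is currently active (assumed
# function names; adjust for your release), then poll the archive
# directory until that segment has been archived.
current_wal_segment() {
    psql -Atc "SELECT pg_xlogfile_name(pg_current_xlog_location())"
}
wal_is_archived() {   # usage: wal_is_archived SEGMENT ARCHIVE_DIR
    [ -f "$2/$1" ]
}

# Intended use after pg_stop_backup():
#   seg=$(current_wal_segment)
#   until wal_is_archived "$seg" /mnt/archive; do sleep 10; done
```

Until that loop finishes, the just-taken backup should be treated as Not Good and the previous one kept.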

Now c seems the easiest, but is that even fail-safe?  I realize it  
wouldn't really ever happen in an active production environment that  
was set up right, but say you did backup A and backup B and during  
that whole time you had so few writes in postgres that you never  
filled up a whole WAL file, so both of the backups are invalid.  Then you  
would have to always go to option a or b above to verify that a given  
backup was good so that any previous backups could be deleted.


Wouldn't it make things a lot easier if the backup history file not  
only gave you the name of the first file that you need but also the  
last one?  Then you could look at a given backup and say I need this  
start file and this end file.  Then you could delete all archived WAL  
files before start file.  And you could delete any old physical dumps  
because you know that your last physical dump was good.  It would  
just save you the step in the backups process of figuring out what  
that file is.  And it seems like pg_stop_backup could determine that  
on its own.


Does that make sense?  Am I totally off base here?

Rick



Re: [GENERAL] incremental backups

2006-01-26 Thread Csaba Nagy
I didn't read your mail very carefully, but I guess you want:

  - turn on WAL archiving, and archive all WAL logs;
  - take the file system backup at regular time points, optionally you
can keep them also for point in time recovery;

Then you always have all the WAL files you need to recover to any point
in time you need. You can then supply all the WAL files which are needed
by the last file system backup to recover after a crash, or you can
supply all the WAL files up to the time point just before your student
DBA deleted all your data.

HTH,
Csaba.
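
For reference, the first step above amounts to a couple of postgresql.conf lines; the destination path here is only an assumption:

```
# postgresql.conf -- illustrative only; the archive path is made up
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'
```

The `test ! -f` guard refuses to overwrite a segment that was already archived.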


On Thu, 2006-01-26 at 18:33, Rick Gigger wrote:
 I am looking into using WAL archiving for incremental backups.  It  
 all seems fairly straightforward except for one thing.
 
 So you set up the archiving of the WAL files.  Then you set up cron  
 or something to regularly do a physical backup of the data  
 directory.  But when you do the physical backup you don't have the  
 last WAL file archived yet that you need to restore that physical  
 backup.  So you always need to keep at least two physical backups  
 around so that you know that at least one of them has the WAL files  
 needed for recovery.
 
 The question I have is: how do I know if I can use the latest one?   
 That is if I first do physical backup A and then later do physical  
 backup B and then I want to do a restore.  How do I know when I've  
 got the files I need to use B so that I don't have to go all the way  
 back to A?
 
 My initial thoughts are that I could:
 
 a) just before or after calling pg_stop_backup check the file system  
 to see what the last archived WAL file is on disk and make sure  
 that I get the next one before I try restoring from that backup.
 
 b) just before or after calling pg_stop_backup check postgres to see  
 what the current active WAL file is and make sure it has been  
 archived before I try to restore from that backup.
 
 c) Always just use backup A.
 
 Now c seems the easiest, but is that even fail-safe?  I realize it  
 wouldn't really ever happen in an active production environment that  
 was set up right but say you did backup A and backup B and during  
 that whole time you had few writes in postgres that you never filled  
 up a whole WAL file so both of the backups are invalid.  Then you  
 would have to always go to option a or b above to verify that a given  
 backup was good so that any previous backups could be deleted.
 
 Wouldn't it make things a lot easier if the backup history file not  
 only gave you the name of the first file that you need but also the  
 last one?  Then you could look at a given backup and say I need this  
 start file and this end file.  Then you could delete all archived WAL  
 files before start file.  And you could delete any old physical dumps  
 because you know that your last physical dump was good.  It would  
 just save you the step in the backups process of figuring out what  
 that file is.  And it seems like pg_stop_backup could determine that  
 on its own.
 
 Does that make sense?  Am I totally off base here?
 
 Rick
 


Re: [GENERAL] incremental backups

2006-01-26 Thread Rick Gigger
Um, no you didn't read my email at all.  I am aware of all of that  
and it is clearly outlined in the docs.  My email was about a  
specific detail in the process.  Please read it if you want to know  
what my actual question was.


Thanks,

Rick

On Jan 26, 2006, at 10:41 AM, Csaba Nagy wrote:


I didn't read your mail very carefully, but I guess you want:

  - turn on WAL archiving, and archive all WAL logs;
  - take the file system backup at regular time points, optionally you
can keep them also for point in time recovery;

Then you always have all the WAL files you need to recover to any  
point
in time you need. You can then supply all the WAL files which are  
needed

by the last file system backup to recover after a crash, or you can
supply all the WAL files up to the time point just before your student
DBA deleted all your data.

HTH,
Csaba.


On Thu, 2006-01-26 at 18:33, Rick Gigger wrote:

I am looking into using WAL archiving for incremental backups.  It
all seems fairly straightforward except for one thing.

So you set up the archiving of the WAL files.  Then you set up cron
or something to regularly do a physical backup of the data
directory.  But when you do the physical backup you don't have the
last WAL file archived yet that you need to restore that physical
backup.  So you always need to keep at least two physical backups
around so that you know that at least one of them has the WAL files
needed for recovery.

The question I have is: how do I know if I can use the latest one?
That is if I first do physical backup A and then later do physical
backup B and then I want to do a restore.  How do I know when I've
got the files I need to use B so that I don't have to go all the way
back to A?

My initial thoughts are that I could:

a) just before or after calling pg_stop_backup check the file system
to see what the last archived WAL file is on disk and make sure
that I get the next one before I try restoring from that backup.

b) just before or after calling pg_stop_backup check postgres to see
what the current active WAL file is and make sure it has been
archived before I try to restore from that backup.

c) Always just use backup A.

Now c seems the easiest, but is that even fail-safe?  I realize it
wouldn't really ever happen in an active production environment that
was set up right but say you did backup A and backup B and during
that whole time you had few writes in postgres that you never filled
up a whole WAL file so both of the backups are invalid.  Then you
would have to always go to option a or b above to verify that a given
backup was good so that any previous backups could be deleted.

Wouldn't it make things a lot easier if the backup history file not
only gave you the name of the first file that you need but also the
last one?  Then you could look at a given backup and say I need this
start file and this end file.  Then you could delete all archived WAL
files before start file.  And you could delete any old physical dumps
because you know that your last physical dump was good.  It would
just save you the step in the backups process of figuring out what
that file is.  And it seems like pg_stop_backup could determine that
on its own.

Does that make sense?  Am I totally off base here?

Rick



Re: [GENERAL] incremental backups?

2004-07-06 Thread Jan Wieck
On 7/3/2004 9:11 AM, Martin Marques wrote:
On Fri, 02 Jul 2004 18:39, Jan Wieck wrote:
On 6/22/2004 11:51 PM, mike g wrote:
 Slony version 1 is supposed to be live very soon.  You can test beta3 if
 you like.
Slony-I version 1.0 is out now. It does not contain incremental backup.
This feature is on the TODO list for 1.1.
I'm very interested in this.
How is it that a replication system would provide the ability to do 
incremental backups?
The information Slony collects with triggers basically consists of 
INSERT, UPDATE and DELETE statements that cause exactly the same changes 
on a slave that have been performed on the master. If it is possible to 
extract a consistent data snapshot and to know exactly what actions are 
included in that, and which need to be redone after ...

There isn't much of a difference between applying the changes to a 
database and writing SQL statements into files.

How would these incremental backups compare to Informix's level backups (level 
0 is a full backup, 1 are the differences from the last 0, and 2 are the 
differences from the last level 1 BU)?
I wonder how Informix does that sort of level backup. It sounds very 
much like a filesystem dump, but on a database level I can't really 
imagine this. Especially with an MVCC database like PostgreSQL.

There will be no levels. Slony will provide a dump, and then 
incrementals. One advantage will be that the incrementals are just plain 
SQL scripts containing INSERT, UPDATE and DELETE operations. One usually 
does PITR because a misbehaving application corrupted the data from a 
logical point of view. Finding the offending statement with grep in 
those files will ultimately lead to the exact point in time to which the 
DB can be rolled forward. Plus, it will be easy to analyze what had been 
done after that point, based on the wrong data.
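
The grep step Jan describes might look like this; the log directory layout, file naming, and the offending statement are all invented for illustration:

```shell
#!/bin/sh
# Search a series of Slony-style incremental SQL logs for the offending
# statement; the first matching file (in name order) brackets the point
# in time to roll the database forward to.
first_log_containing() {   # usage: first_log_containing PATTERN LOG_DIR
    grep -l "$1" "$2"/*.sql 2>/dev/null | sort | head -n 1
}

# e.g.  first_log_containing "DELETE FROM employees" /var/lib/slony/logs
```

Everything in files that sort after the match is what happened on top of the bad data, which is what you would want to analyze next.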

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [GENERAL] incremental backups?

2004-07-03 Thread Martin Marques
On Fri, 02 Jul 2004 18:39, Jan Wieck wrote:
 On 6/22/2004 11:51 PM, mike g wrote:
  Slony version 1 is supposed to be live very soon.  You can test beta3 if
  you like.

 Slony-I version 1.0 is out now. It does not contain incremental backup.
 This feature is on the TODO list for 1.1.

I'm very interested in this.
How is it that a replication system would provide the ability to do 
incremental backups?
How would these incremental backups compare to Informix's level backups (level 
0 is a full backup, 1 are the differences from the last 0, and 2 are the 
differences from the last level 1 BU)?

-- 
 10:05:02 up 1 day,  1:37,  1 user,  load average: 1.17, 0.93, 0.71
-
Martín Marqués| select 'mmarques' || '@' || 'unl.edu.ar'
Centro de Telematica  |  DBA, Programador, Administrador
 Universidad Nacional
  del Litoral
-




Re: [GENERAL] incremental backups?

2004-07-03 Thread Christopher Browne
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Martin Marques) belched 
out:
 On Fri, 02 Jul 2004 18:39, Jan Wieck wrote:
 On 6/22/2004 11:51 PM, mike g wrote:
  Slony version 1 is supposed to be live very soon.  You can test beta3 if
  you like.

 Slony-I version 1.0 is out now. It does not contain incremental backup.
 This feature is on the TODO list for 1.1.

 I'm very interested in this.
 How is it that a replication system would provide the ability to do 
 incremental backups?

The idea would be that you take a pg_dump at a point in time, which
provides, if you will, a baseline.  

You then take the series of logs containing Slony-I updates, which,
themselves, are a set of SQL Insert/Update/Delete statements.  They
represent incremental updates.

The clever part was noticing that it would be useful to record those
updates in text form _as incremental SQL logs_.

 How would these incremental backups compare to Informix's level
 backups (level 0 is a full backup, 1 are the differences from the
 last 0, and 2 are the differences from the last level 1 BU)?

Well, supposing you take a pg_dump from a particular node starting at
time T; that's a level 0 backup.

Slony-I then has a series of (say) 42 log files each dated after time
T, and going to time T + n.  Together, they represent the level 1
differences between the level 0 backup at time T and the present.

I suppose that the 42nd one might be considered to represent a level
42 backup, as it depends on the preceding 41 logs as well as that
level 0 backup.

It would be unsurprising for there to be hundreds, if not thousands of
such files per day...
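
The restore sequence implied above is "load the level 0 dump, then apply the logs in file-name order". A sketch, with paths and names invented:

```shell
#!/bin/sh
# Apply Slony-style incremental SQL logs in name order.  CMD is the
# command to run on each file (pluggable so it can be tested without a
# database).
apply_logs_in_order() {   # usage: apply_logs_in_order LOG_DIR CMD [ARGS...]
    dir=$1; shift
    for f in $(ls "$dir"/*.sql | sort); do
        "$@" "$f"          # e.g. CMD = psql mydb -f
    done
}

# Intended use (names invented):
#   psql mydb < base_dump.sql
#   apply_logs_in_order /var/slony/logs psql mydb -f
```

Stopping the loop just before the log that contains the bad statement gives the point-in-time behavior discussed earlier in the thread.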
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://www.ntlug.org/~cbbrowne/internet.html
Who's afraid of the garbage collector?



Re: [GENERAL] incremental backups?

2004-07-02 Thread Jan Wieck
On 6/22/2004 11:51 PM, mike g wrote:
Slony version 1 is supposed to be live very soon.  You can test beta3 if
you like.
Slony-I version 1.0 is out now. It does not contain incremental backup. 
This feature is on the TODO list for 1.1.

Jan
Perhaps pgpool could help you.  Version 2 was just released.
On Tue, 2004-06-22 at 22:28, Joel Matthew wrote:
My boss was asking about incremental backups.
I was scratching my head, thinking that the transaction log and a backup
policy (script) for each record set (sorry about the archaic terminology)
was the usual solution. But there is some resistance against writing
more code, so I'm wondering what the current state of affairs with
postgresql in regards to incremental backup would be.
A quick search of the lists produced the following:
Bruce talks in November 2002 about plans for point-in-time recovery in v.
7.4, but last December says it isn't there yet.
Jan mentions Slony-I replication back last January.
Somebody threw in some chatter about XLog.
Scott mentioned briefly last August the possibility of combining a live
data server with an archive server, or of using a daily schema.
What's the typical user doing for incrementals, besides going to a
commercial server?

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [GENERAL] Incremental backups, and backup history

2003-06-20 Thread Nigel J. Andrews
On Thu, 19 Jun 2003, Matthew Nuzum wrote:

 Regarding backup history:
 
 I have an application designed for novices.  Apparently it's easy to hit the
 Delete button, and then say yes to the Are you sure you want to delete
 this? question even when they don't want to. Therefore I simply mark a
 record as deleted.  For example, 
 UPDATE table SET deleted='t' WHERE something=true;
 
 Then my application logic pretends it doesn't really exist until two days
 later the user decides they want it back.
 
 It works very well for me.
 

But are you also taking care of the referential integrity issues, i.e.
disallowing tuples with deleted = true from being referenced, and ensuring
nothing references them at the time they are marked as deleted?

It is a useful idea, but as I know from a current project, it requires
reimplementing foreign-key functionality. In this case the middleware only uses
functions, one per statement, and nothing else, so I have been able to do much
of this in those functions, but it's still a pain. I even wrote a utility to
take some of the legwork out of generating and maintaining quite a few
functions, but if I'd had time [and thought about these basically being foreign
key constraints] I'd have looked at the existing foreign key code and seen if I
could copy and amend it, or just amend it in place.


--
Nigel Andrews





Re: [GENERAL] Incremental backups, and backup history

2003-06-19 Thread Greg Stark

Antonios Christofides [EMAIL PROTECTED] writes:

 Is this filenames-instead-of-BLOBs for easier backup common practice?
 Any other ideas or comments?

This is a major point of contention. Some people think keeping all data in the
database is a better approach; others think data that isn't inherently
relational and doesn't need the protection of transactions doesn't really
belong in the database.

I happen to be firmly in the camp against putting such files in the database.
But it depends a lot on what your needs are. In every case I've faced this it
was simply useful to have the files accessible without piping them through the
database protocol every time. I could point the web server at the directory
and serve them up directly, or rsync them to the machines that could do that.
They could even be served up from a separate lightweight web server without
any database access at all, which would have been impossible if the only
way to access them was via the database.

If you need to be able to update portions of your blobs, or if you need
transactional safety then you may need the database.

 My second question is a general relational database backup question, not
 specifically related to pgsql. Sometimes a user accidentally
 deletes/corrupts a file, and discovers it three days later. After they
 come panicing to me, I can give them their file as it was three days
 ago, because of the backup tape rotation. Now suppose a user deletes
 ten employees from the database, and three days later they understand
 that this was a bad thing. Now what? I can restore the entire database
 and make it as it was three days ago, but I can't restore the particular
 deleted records in the current database, as the relations make the
 database behave as a single unit.

What's worse is you may have several related changes to multiple tables. And
then you won't know if any other changes to other tables were later done that
depended on that data. There will be no way to do this perfectly in general.

If your data is particularly amenable to this form of normalization then it
can be useful though. For example, instead of storing counters that are
incremented, even when the total is the only interesting statistic, I normally
insert new records for every event. If ever it turns out something was wrong
and the events should be disregarded for a particular time period, or with
particular other properties, I have the complete history and can do that.
Inserts are also more efficient for the database to do than updates.

But for something like an employee table you're probably going to be stuck
with restoring the table to a new name, and having a human look over the old
data and the current data and updating the current data appropriately.

You may want to consider having a deleted flag column and not ever deleting
records. So at least undelete can be an easy operation that doesn't even need
restoring backups.

-- 
greg




Re: [GENERAL] Incremental backups, and backup history

2003-06-19 Thread Dennis Gearon
On your second question:

Keeping old data helps with data analysis, i.e., data mining. I would record the hired/fired dates as transactions. To see if an employee is still an employee, look for the latest transaction: hired, rehired, contracted as a temp/consultant, fired, laid off, etc.

Antonios Christofides wrote:

Hi,

I have two backup questions, not much related to each other; here they
are.
First: With PostgreSQL, I can't do incremental backups. pg_dump will
dump the entire database. Thus, if I want to keep daily backups on tape,
I'm in trouble because I'll have to do a full backup every day, which
may need several hours and several tapes.
One workaround I'm thinking is to not store BLOBs in the database, but
store them in the filesystem and store the filenames in the database
instead.  This needs some additional work (for example, a garbage
collector to periodically delete unreferenced files), but will move a
large amount of space from the database into the filesystem, which is
capable of incremental backups. Only BLOBs, that is; for some tables
that will have several tens of millions of small rows, I can't think of
any workaround.
Is this filenames-instead-of-BLOBs for easier backup common practice?
Any other ideas or comments?
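
The garbage collector mentioned above is essentially a set difference between the files on disk and the filenames still referenced in the database. A sketch; the psql query, table, and column names are invented for illustration:

```shell
#!/bin/sh
# List files in DIR whose names do not appear in REFLIST (one name per
# line, already sorted, as comm requires); these are the unreferenced
# files a collector could delete.
unreferenced_files() {   # usage: unreferenced_files DIR REFLIST
    ls "$1" | sort | comm -23 - "$2"
}

# In practice REFLIST would come from the database, e.g. (invented names):
#   psql -Atc "SELECT filename FROM documents ORDER BY filename" > /tmp/ref
#   unreferenced_files /srv/blobs /tmp/ref | xargs rm -f --
```
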
My second question is a general relational database backup question, not
specifically related to pgsql. Sometimes a user accidentally
deletes/corrupts a file, and discovers it three days later. After they
come panicking to me, I can give them their file as it was three days
ago, because of the backup tape rotation. Now suppose a user deletes
ten employees from the database, and three days later they understand
that this was a bad thing. Now what? I can restore the entire database
and make it as it was three days ago, but I can't restore the particular
deleted records in the current database, as the relations make the
database behave as a single unit.
A colleague suggested, instead of updating or deleting rows, to only
insert rows with a timestamp; for example, instead of updating the
employee row with id=55, you insert a new row with id=55 with the
updated data, and change the primary key to (id, dateinserted). You then
always select the row with max dateinserted. A garbage collector is also
needed to periodically delete obsolete rows older than, say, six months.
Improvements can be made (such as using dateobsoleted instead of
dateinserted or moving the old rows to another table), but even in the
simplest cases I think it will be extremely hard to implement such a
system, again because of the relations.
So, is it a matter of database design? Do I have to design the database
so that it keeps the history of what happened?
Thanks everyone for the answers.



Re: [GENERAL] Incremental backups, and backup history

2003-06-19 Thread Matthew Nuzum
Regarding backup history:

I have an application designed for novices.  Apparently it's easy to hit the
Delete button, and then say yes to the "Are you sure you want to delete
this?" question even when they don't want to. Therefore I simply mark a
record as deleted.  For example, 
UPDATE table SET deleted='t' WHERE something=true;

Then my application logic pretends it doesn't really exist until two days
later the user decides they want it back.

It works very well for me.

--
Matthew Nuzum
www.bearfruit.org
[EMAIL PROTECTED]
 

 -Original Message-
 From: Bruno Wolff III [mailto:[EMAIL PROTECTED]
 Sent: Thursday, June 19, 2003 6:59 AM
 To: Antonios Christofides
 Cc: [EMAIL PROTECTED]
 Subject: Re: Incremental backups, and backup history
 
 On Thu, Jun 19, 2003 at 11:42:28 +0300,
   Antonios Christofides [EMAIL PROTECTED] wrote:
 
  My second question is a general relational database backup question, not
  specifically related to pgsql. Sometimes a user accidentally
  deletes/corrupts a file, and discovers it three days later. After they
  come panicking to me, I can give them their file as it was three days
  ago, because of the backup tape rotation. Now suppose a user deletes
  ten employees from the database, and three days later they understand
  that this was a bad thing. Now what? I can restore the entire database
  and make it as it was three days ago, but I can't restore the particular
  deleted records in the current database, as the relations make the
  database behave as a single unit.
 
 This isn't a good situation to be in. If you log all of the transactions,
 one option may be to restore the database as of the last time it was
 valid, remove the bogus transactions from the transaction log and then
 replay
 the transactions from the time of the backup to the present.
 This will mean downtime for the database and it is possible that removing
 the bogus transactions will affect things as you reapply other
 transactions in ways you don't want.
 
 Another option is to figure out what the bogus transactions did (which
 keeping a history will help out with) and try to undo them as best as
 possible with new transactions.
 
 Either of the above are going to need manual oversight.

