Re: [GENERAL] : PostgreSQL Online Backup
Sorry for not responding to this email for so long. Alan, We had mentioned the following line in recovery.conf file (we had given pg_xlog location since we did not have WAL archives) - restore_command = 'cp /pg_xlog/%f %p' We found where the problem was - Here is what i did - 1. We had taken a full backup using pg_start_backup() and pg_stop_backup() on Day 1 2. Rest of the days (from Day 2 - Day 15), we had incrementally backed-up ( this is also using pg_start_backup() and pg_stop_backup()) 3. On Day-16th, when i started the recovery, PG was asking Day 1's WAL archive file, which we did not have. A fresh complete backup with change in our backup strategy resolved the issue. Thanks a lot for all your inputs and help on this !! Regards, VB 2011/10/4 Alan Hodgson > > > rsync works fine. Why exactly can't the recovery find the backed up > copy > > > of 000105390076? Please post your archive_command settings, > > > the contents of any script(s) called by that, and the recovery.conf > file > > > you're using that's having problems, as well as the complete process > you > > > followed to > > > initiate recovery. I strongly suspect you're missing part of the > process > > > of actually saving the WAL files needed for recovery. > > > The recovery is unable to find the WAL archive because, it was generated > on > > 26th September. > > > > Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the > files. > > > > I do not have that WAL archive copy. > > > > The problem area - > > > > I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not > > 256K). > > > > I'm going to need the rest of what I asked for to offer any further > suggestions > - especially the full and exact steps you took to initiate recovery and the > contents of recovery.conf. Also, please don't top-post. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : PostgreSQL Online Backup
> > rsync works fine. Why exactly can't the recovery find the backed up copy > > of 000105390076? Please post your archive_command settings, > > the contents of any script(s) called by that, and the recovery.conf file > > you're using that's having problems, as well as the complete process you > > followed to > > initiate recovery. I strongly suspect you're missing part of the process > > of actually saving the WAL files needed for recovery. > The recovery is unable to find the WAL archive because, it was generated on > 26th September. > > Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the files. > > I do not have that WAL archive copy. > > The problem area - > > I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not > 256K). > I'm going to need the rest of what I asked for to offer any further suggestions - especially the full and exact steps you took to initiate recovery and the contents of recovery.conf. Also, please don't top-post. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] : PostgreSQL Online Backup
The recovery is unable to find the WAL archive because, it was generated on 26th September. Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the files. I do not have that WAL archive copy. The problem area - I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not 256K). Thanks VB 2011/10/3 Alan Hodgson > On October 3, 2011 05:33:35 AM Venkat Balaji wrote: > > Did anyone observe this behavior ?? Please help ! > > > > This is critical for us. I want to recommend not to use "rsync" (use cp > or > > scp instead) for production backup. > > > > rsync works fine. Why exactly can't the recovery find the backed up copy of > 000105390076? Please post your archive_command settings, the > contents of any script(s) called by that, and the recovery.conf file you're > using that's having problems, as well as the complete process you followed > to > initiate recovery. I strongly suspect you're missing part of the process of > actually saving the WAL files needed for recovery. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : PostgreSQL Online Backup
On October 3, 2011 05:33:35 AM Venkat Balaji wrote: > Did anyone observe this behavior ?? Please help ! > > This is critical for us. I want to recommend not to use "rsync" (use cp or > scp instead) for production backup. > rsync works fine. Why exactly can't the recovery find the backed up copy of 000105390076? Please post your archive_command settings, the contents of any script(s) called by that, and the recovery.conf file you're using that's having problems, as well as the complete process you followed to initiate recovery. I strongly suspect you're missing part of the process of actually saving the WAL files needed for recovery. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] : PostgreSQL Online Backup
Another problem in recovery (probably because of "rsync") - As said earlier, we are taking a production backup everyday incrementally using "rsync". But, Postgres some how misses to sync few files in between and keeps on asking the back dated archive files (more than 1 week ago). I restored October 2nd backup and PG is asking for September 26th archive file with the last known time as 26th Sep, 2011. 2011-10-03 07:17:12 CDT [12705]: [1-1] LOG: database system was interrupted; last known up at 2011-09-26 09:01:36 CDT 2011-10-03 07:17:12 CDT [12705]: [2-1] LOG: starting archive recovery cp: cannot stat `/usr/local/pgsql9.0.1/obtdata/data/pg_xlog/000105390076': No such file or directory 2011-10-03 07:17:12 CDT [12705]: [3-1] LOG: could not open file "pg_xlog/000105390076" (log file 1337, segment 118): No such file or directory 2011-10-03 07:17:12 CDT [12705]: [4-1] LOG: invalid checkpoint record 2011-10-03 07:17:12 CDT [12705]: [5-1] PANIC: could not locate required checkpoint record 2011-10-03 07:17:12 CDT [12705]: [6-1] HINT: If you are not restoring from a backup, try removing the file "/usr/local/pgsql9.0.1/obtdata/data/backup_label". 2011-10-03 07:17:12 CDT [12702]: [1-1] LOG: startup process (PID 12705) was terminated by signal 6: Aborted 2011-10-03 07:17:12 CDT [12702]: [2-1] LOG: aborting startup due to startup process failure I always see pg_clog files and some base files not getting synced. Below is what we are doing - pg_start_backup() rsync the data directory pg_stop_backup() The first time "rsync" is fine, but, the subsequent runs are generating in-consistency. We do the same every day to backup the data directory incrementally. What i observed is PG records the TXN id when ever backup starts and stops + backup label. The next day when PG records the start backup time and TXN id, i think some of the TXN ids and pg_clog files generated between last stop time and the next start time are missed. Did anyone observe this behavior ?? Please help ! This is critical for us. I want to recommend not to use "rsync" (use cp or scp instead) for production backup. Thanks VB On Tue, Sep 27, 2011 at 2:36 PM, Albe Laurenz wrote: > Venkat Balaji wrote: > > Our problem is - > > > > We had mistakenly executed "rsync" on the running PostgreSQL data > directory (production) and we did > > not run "pg_start_backup()". > > > > Will this harm production ? can this lead to corruption ? > > I assume that you used rsync to copy *from* the data directory. > > This cannot lead to data corruption. > Only performance might suffer temporarily due to the additional I/O. > > The backup made with rsync will be unusable without pg_start_backup(). > > Yours, > Laurenz Albe >
Re: [GENERAL] : PostgreSQL Online Backup
Venkat Balaji wrote: > Our problem is - > > We had mistakenly executed "rsync" on the running PostgreSQL data directory (production) and we did > not run "pg_start_backup()". > > Will this harm production ? can this lead to corruption ? I assume that you used rsync to copy *from* the data directory. This cannot lead to data corruption. Only performance might suffer temporarily due to the additional I/O. The backup made with rsync will be unusable without pg_start_backup(). Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] : PostgreSQL Online Backup
Thanks for all your inputs ! Our problem is - We had mistakenly executed "rsync" on the running PostgreSQL data directory (production) and we did not run "pg_start_backup()". Will this harm production ? can this lead to corruption ? Thanks - On Mon, Sep 26, 2011 at 10:29 PM, Alan Hodgson wrote: > On September 26, 2011 05:49:50 AM Venkat Balaji wrote: > > I tried restoring the backup, after taking the full backup. > > > > Below is what i see in the "archive destination". > > > > Postgres was asking for "00010193006F" and i tried to find > the > > same and below is what i find... > > > > -rw--- 1 postgres postgres 3.3M Sep 26 02:06 > > 00010193006F.gz -rw--- 1 postgres postgres 219 Sep 26 > > 02:53 > > 00010193006F.00328508.backup.gz > > > > Why is PG (9.0) putting an extension for the WAL Archive file as > > > "backup.gz" ?? > > > > The archive files are created by your archive_command, as specified in > postgresql.conf. My guess would be that your archive command runs the files > through gzip as part of archiving (which is fine). > > However, the restore_command you specify in recovery.conf must undo this > compression. So instead of (for example) 'cp -f "%f" "%p"', it might > instead > need to look like 'zcat "%f" > "%p"'. > > Hope this helps. > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] : PostgreSQL Online Backup
On September 26, 2011 05:49:50 AM Venkat Balaji wrote: > I tried restoring the backup, after taking the full backup. > > Below is what i see in the "archive destination". > > Postgres was asking for "00010193006F" and i tried to find the > same and below is what i find... > > -rw--- 1 postgres postgres 3.3M Sep 26 02:06 > 00010193006F.gz -rw--- 1 postgres postgres 219 Sep 26 > 02:53 > 00010193006F.00328508.backup.gz > > Why is PG (9.0) putting an extension for the WAL Archive file as > > "backup.gz" ?? > The archive files are created by your archive_command, as specified in postgresql.conf. My guess would be that your archive command runs the files through gzip as part of archiving (which is fine). However, the restore_command you specify in recovery.conf must undo this compression. So instead of (for example) 'cp -f "%f" "%p"', it might instead need to look like 'zcat "%f" > "%p"'. Hope this helps. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] : PostgreSQL Online Backup
Venkat Balaji wrote: > We have had situations where-in "rsync" was executed without executing "pg_start_backup()" on the > production data directory and on the next runs, "pg_start_backup()" has been executed with "rsync". > This was to avoid high IO load on production. We ended up getting unmatched files (especially in > pg_clog) and not sure about "base" directory. > > Postgres is asking for WAL Archive files dated sometime around 15 days ago. We are absolutely not sure > whats going on. > > Is this dangerous for production (like corruption) ? or just the backup will be invalid ? Please help > us know if we have to perform any precautionary checks on the production cluster. > > Apart from firing a checkpoint, does "pg_start_backup()" updates any dictionary tables or views ? or > it updates anything in "pg_xlog" > > Looking forward for your help ! I am not sure what the problem is. Do you have problems starting the original PostgreSQL cluster, or do you have problems restoring a backup? Running pg_start_backup() will not harm the cluster. End online backup mode by running pg_stop_backup() or removing the backup_label file in the cluster directory. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] : PostgreSQL Online Backup
I tried restoring the backup, after taking the full backup. Below is what i see in the "archive destination". Postgres was asking for "00010193006F" and i tried to find the same and below is what i find... -rw--- 1 postgres postgres 3.3M Sep 26 02:06 00010193006F.gz -rw--- 1 postgres postgres 219 Sep 26 02:53 00010193006F.00328508.backup.gz Why is PG (9.0) putting an extension for the WAL Archive file as "backup.gz" ?? Please help ! Thanks VB On Mon, Sep 26, 2011 at 5:11 PM, Venkat Balaji wrote: > Hello Everyone, > > We have had situations where-in "rsync" was executed without executing > "pg_start_backup()" on the production data directory and on the next runs, > "pg_start_backup()" has been executed with "rsync". This was to avoid high > IO load on production. We ended up getting unmatched files (especially in > pg_clog) and not sure about "base" directory. > > Postgres is asking for WAL Archive files dated sometime around 15 days ago. > We are absolutely not sure whats going on. > > Is this dangerous for production (like corruption) ? or just the backup > will be invalid ? Please help us know if we have to perform any > precautionary checks on the production cluster. > > Apart from firing a checkpoint, does "pg_start_backup()" updates any > dictionary tables or views ? or it updates anything in "pg_xlog" > > Looking forward for your help ! > > Thanks > VB > > >
[GENERAL] : PostgreSQL Online Backup
Hello Everyone, We have had situations where-in "rsync" was executed without executing "pg_start_backup()" on the production data directory and on the next runs, "pg_start_backup()" has been executed with "rsync". This was to avoid high IO load on production. We ended up getting unmatched files (especially in pg_clog) and not sure about "base" directory. Postgres is asking for WAL Archive files dated sometime around 15 days ago. We are absolutely not sure whats going on. Is this dangerous for production (like corruption) ? or just the backup will be invalid ? Please help us know if we have to perform any precautionary checks on the production cluster. Apart from firing a checkpoint, does "pg_start_backup()" updates any dictionary tables or views ? or it updates anything in "pg_xlog" Looking forward for your help ! Thanks VB