Re: [GENERAL] : PostgreSQL Online Backup

2011-10-24 Thread Venkat Balaji
Sorry for not responding to this email for so long.

Alan,

We had mentioned the following line in recovery.conf file (we had given
pg_xlog location since we did not have WAL archives) -

restore_command = 'cp /pg_xlog/%f %p'

We found where the problem was -

Here is what i did -

1. We had taken a full backup using pg_start_backup() and pg_stop_backup()
on Day 1
2. Rest of the days (from Day 2 - Day 15), we had incrementally backed-up (
this is also using pg_start_backup() and pg_stop_backup())
3. On Day-16th, when i started the recovery, PG was asking Day 1's WAL
archive file, which we did not have.

A fresh complete backup with change in our backup strategy resolved the
issue.

Thanks a lot for all your inputs and help on this !!

Regards,
VB

2011/10/4 Alan Hodgson 

> > > rsync works fine. Why exactly can't the recovery find the backed up
> copy
> > > of 000105390076? Please post your archive_command settings,
> > > the contents of any script(s) called by that, and the recovery.conf
> file
> > > you're using that's having problems, as well as the complete process
> you
> > > followed to
> > > initiate recovery. I strongly suspect you're missing part of the
> process
> > > of actually saving the WAL files needed for recovery.
>
> > The recovery is unable to find the WAL archive because, it was generated
> on
> > 26th September.
> >
> > Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the
> files.
> >
> > I do not have that WAL archive copy.
> >
> > The problem area -
> >
> > I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not
> > 256K).
> >
>
> I'm going to need the rest of what I asked for to offer any further
> suggestions
> - especially the full and exact steps you took to initiate recovery and the
> contents of recovery.conf. Also, please don't top-post.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] : PostgreSQL Online Backup

2011-10-04 Thread Alan Hodgson
> > rsync works fine. Why exactly can't the recovery find the backed up copy
> > of 000105390076? Please post your archive_command settings,
> > the contents of any script(s) called by that, and the recovery.conf file
> > you're using that's having problems, as well as the complete process you
> > followed to
> > initiate recovery. I strongly suspect you're missing part of the process
> > of actually saving the WAL files needed for recovery.

> The recovery is unable to find the WAL archive because, it was generated on
> 26th September.
> 
> Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the files.
> 
> I do not have that WAL archive copy.
> 
> The problem area -
> 
> I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not
> 256K).
> 

I'm going to need the rest of what I asked for to offer any further suggestions 
- especially the full and exact steps you took to initiate recovery and the 
contents of recovery.conf. Also, please don't top-post.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] : PostgreSQL Online Backup

2011-10-03 Thread Venkat Balaji
The recovery is unable to find the WAL archive because, it was generated on
26th September.

Whereas the backup is as taken on Oct 2nd, 2011. We deleted all the files.

I do not have that WAL archive copy.

The problem area -

I found that a pg_clog file dated 26th Sep, 2011 is not synced (its not
256K).

Thanks
VB

2011/10/3 Alan Hodgson 

> On October 3, 2011 05:33:35 AM Venkat Balaji wrote:
> > Did anyone observe this behavior ?? Please help !
> >
> > This is critical for us. I want to recommend not to use "rsync" (use cp
> or
> > scp instead) for production backup.
> >
>
> rsync works fine. Why exactly can't the recovery find the backed up copy of
> 000105390076? Please post your archive_command settings, the
> contents of any script(s) called by that, and the recovery.conf file you're
> using that's having problems, as well as the complete process you followed
> to
> initiate recovery. I strongly suspect you're missing part of the process of
> actually saving the WAL files needed for recovery.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] : PostgreSQL Online Backup

2011-10-03 Thread Alan Hodgson
On October 3, 2011 05:33:35 AM Venkat Balaji wrote:
> Did anyone observe this behavior ?? Please help !
> 
> This is critical for us. I want to recommend not to use "rsync" (use cp or
> scp instead) for production backup.
> 

rsync works fine. Why exactly can't the recovery find the backed up copy of 
000105390076? Please post your archive_command settings, the 
contents of any script(s) called by that, and the recovery.conf file you're 
using that's having problems, as well as the complete process you followed to 
initiate recovery. I strongly suspect you're missing part of the process of 
actually saving the WAL files needed for recovery.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] : PostgreSQL Online Backup

2011-10-03 Thread Venkat Balaji
Another problem in recovery (probably because of "rsync") -

As said earlier, we are taking a production backup everyday incrementally
using "rsync".

But, Postgres some how misses to sync few files in between and keeps on
asking the back dated archive files (more than 1 week ago).

I restored October 2nd backup and PG is asking for September 26th archive
file with the last known time as 26th Sep, 2011.

2011-10-03 07:17:12 CDT [12705]: [1-1] LOG:  database system was
interrupted; last known up at 2011-09-26 09:01:36 CDT
2011-10-03 07:17:12 CDT [12705]: [2-1] LOG:  starting archive recovery
cp: cannot stat
`/usr/local/pgsql9.0.1/obtdata/data/pg_xlog/000105390076': No
such file or directory
2011-10-03 07:17:12 CDT [12705]: [3-1] LOG:  could not open file
"pg_xlog/000105390076" (log file 1337, segment 118): No such
file or directory
2011-10-03 07:17:12 CDT [12705]: [4-1] LOG:  invalid checkpoint record
2011-10-03 07:17:12 CDT [12705]: [5-1] PANIC:  could not locate required
checkpoint record
2011-10-03 07:17:12 CDT [12705]: [6-1] HINT:  If you are not restoring from
a backup, try removing the file
"/usr/local/pgsql9.0.1/obtdata/data/backup_label".
2011-10-03 07:17:12 CDT [12702]: [1-1] LOG:  startup process (PID 12705) was
terminated by signal 6: Aborted
2011-10-03 07:17:12 CDT [12702]: [2-1] LOG:  aborting startup due to startup
process failure


I always see pg_clog files and some base files not getting synced.

Below is what we are doing -

pg_start_backup()
rsync the data directory
pg_stop_backup()

The first time "rsync" is fine, but, the subsequent runs are generating
in-consistency.

We do the same every day to backup the data directory incrementally.

What i observed is PG records the TXN id when ever backup starts and stops +
backup label. The next day when PG records the start backup time and TXN id,
i think some of the TXN ids and pg_clog files generated between last stop
time and the next start time are missed.

Did anyone observe this behavior ?? Please help !

This is critical for us. I want to recommend not to use "rsync" (use cp or
scp instead) for production backup.

Thanks
VB

On Tue, Sep 27, 2011 at 2:36 PM, Albe Laurenz wrote:

> Venkat Balaji wrote:
> > Our problem is -
> >
> > We had mistakenly executed  "rsync" on the running PostgreSQL data
> directory (production) and we did
> > not run "pg_start_backup()".
> >
> > Will this harm production ? can this lead to corruption ?
>
> I assume that you used rsync to copy *from* the data directory.
>
> This cannot lead to data corruption.
> Only performance might suffer temporarily due to the additional I/O.
>
> The backup made with rsync will be unusable without pg_start_backup().
>
> Yours,
> Laurenz Albe
>


Re: [GENERAL] : PostgreSQL Online Backup

2011-09-27 Thread Albe Laurenz
Venkat Balaji wrote:
> Our problem is -
> 
> We had mistakenly executed  "rsync" on the running PostgreSQL data
directory (production) and we did
> not run "pg_start_backup()".
> 
> Will this harm production ? can this lead to corruption ?

I assume that you used rsync to copy *from* the data directory.

This cannot lead to data corruption.
Only performance might suffer temporarily due to the additional I/O.

The backup made with rsync will be unusable without pg_start_backup().

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Venkat Balaji
Thanks for all your inputs !

Our problem is -

We had mistakenly executed  "rsync" on the running PostgreSQL data directory
(production) and we did not run "pg_start_backup()".

Will this harm production ? can this lead to corruption ?

Thanks -

On Mon, Sep 26, 2011 at 10:29 PM, Alan Hodgson  wrote:

> On September 26, 2011 05:49:50 AM Venkat Balaji wrote:
> > I tried restoring the backup, after taking the full backup.
> >
> > Below is what i see in the "archive destination".
> >
> > Postgres was asking for "00010193006F" and i tried to find
> the
> > same and below is what i find...
> >
> > -rw--- 1 postgres postgres 3.3M Sep 26 02:06
> > 00010193006F.gz -rw--- 1 postgres postgres  219 Sep 26
> > 02:53
> > 00010193006F.00328508.backup.gz
> >
> > Why is PG (9.0) putting an extension for the WAL Archive file as
> > > "backup.gz" ??
> >
>
> The archive files are created by your archive_command, as specified in
> postgresql.conf. My guess would be that your archive command runs the files
> through gzip as part of archiving (which is fine).
>
> However, the restore_command you specify in recovery.conf  must undo this
> compression. So instead of (for example) 'cp -f "%f" "%p"', it might
> instead
> need to look like 'zcat "%f" > "%p"'.
>
> Hope this helps.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Alan Hodgson
On September 26, 2011 05:49:50 AM Venkat Balaji wrote:
> I tried restoring the backup, after taking the full backup.
> 
> Below is what i see in the "archive destination".
> 
> Postgres was asking for "00010193006F" and i tried to find the
> same and below is what i find...
> 
> -rw--- 1 postgres postgres 3.3M Sep 26 02:06
> 00010193006F.gz -rw--- 1 postgres postgres  219 Sep 26
> 02:53
> 00010193006F.00328508.backup.gz
> 
> Why is PG (9.0) putting an extension for the WAL Archive file as
> > "backup.gz" ??
>

The archive files are created by your archive_command, as specified in 
postgresql.conf. My guess would be that your archive command runs the files 
through gzip as part of archiving (which is fine).

However, the restore_command you specify in recovery.conf  must undo this 
compression. So instead of (for example) 'cp -f "%f" "%p"', it might instead 
need to look like 'zcat "%f" > "%p"'.

Hope this helps.

 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Albe Laurenz
Venkat Balaji wrote:
> We have had situations where-in "rsync" was executed without executing
"pg_start_backup()" on the
> production data directory and on the next runs, "pg_start_backup()"
has been executed with "rsync".
> This was to avoid high IO load on production. We ended up getting
unmatched files (especially in
> pg_clog) and not sure about "base" directory.
> 
> Postgres is asking for WAL Archive files dated sometime around 15 days
ago. We are absolutely not sure
> whats going on.
> 
> Is this dangerous for production (like corruption) ? or just the
backup will be invalid ? Please help
> us know if we have to perform any precautionary checks on the
production cluster.
> 
> Apart from firing a checkpoint, does "pg_start_backup()" updates any
dictionary tables or views ? or
> it updates anything in "pg_xlog"
> 
> Looking forward for your help !

I am not sure what the problem is.

Do you have problems starting the original PostgreSQL cluster,
or do you have problems restoring a backup?

Running pg_start_backup() will not harm the cluster.
End online backup mode by running pg_stop_backup() or removing
the backup_label file in the cluster directory.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Venkat Balaji
I tried restoring the backup, after taking the full backup.

Below is what i see in the "archive destination".

Postgres was asking for "00010193006F" and i tried to find the
same and below is what i find...

-rw--- 1 postgres postgres 3.3M Sep 26 02:06 00010193006F.gz
-rw--- 1 postgres postgres  219 Sep 26 02:53
00010193006F.00328508.backup.gz

Why is PG (9.0) putting an extension for the WAL Archive file as "backup.gz"
??

Please help !

Thanks
VB

On Mon, Sep 26, 2011 at 5:11 PM, Venkat Balaji wrote:

> Hello Everyone,
>
> We have had situations where-in "rsync" was executed without executing
> "pg_start_backup()" on the production data directory and on the next runs,
> "pg_start_backup()" has been executed with "rsync". This was to avoid high
> IO load on production. We ended up getting unmatched files (especially in
> pg_clog) and not sure about "base" directory.
>
> Postgres is asking for WAL Archive files dated sometime around 15 days ago.
> We are absolutely not sure whats going on.
>
> Is this dangerous for production (like corruption) ? or just the backup
> will be invalid ? Please help us know if we have to perform any
> precautionary checks on the production cluster.
>
> Apart from firing a checkpoint, does "pg_start_backup()" updates any
> dictionary tables or views ? or it updates anything in "pg_xlog"
>
> Looking forward for your help !
>
> Thanks
> VB
>
>
>


[GENERAL] : PostgreSQL Online Backup

2011-09-26 Thread Venkat Balaji
Hello Everyone,

We have had situations where-in "rsync" was executed without executing
"pg_start_backup()" on the production data directory and on the next runs,
"pg_start_backup()" has been executed with "rsync". This was to avoid high
IO load on production. We ended up getting unmatched files (especially in
pg_clog) and not sure about "base" directory.

Postgres is asking for WAL Archive files dated sometime around 15 days ago.
We are absolutely not sure whats going on.

Is this dangerous for production (like corruption) ? or just the backup will
be invalid ? Please help us know if we have to perform any precautionary
checks on the production cluster.

Apart from firing a checkpoint, does "pg_start_backup()" updates any
dictionary tables or views ? or it updates anything in "pg_xlog"

Looking forward for your help !

Thanks
VB