Thanks Craig! Below is what I did:
1. pg_start_backup()
2. rsync the data dir
3. pg_stop_backup()

I believe the backup is valid because I was able to bring up the cluster
without any issues (of course with data loss).

+ve signs:
- I am able to bring up the cluster with the online backup, but only with
  some loss of data.

-ve signs, and things to watch for closely while testing backups:
- pg_clog files were not synced. I suspect they were being written at the
  time of the backup; I might have rsynced the data dir while the pg_clog
  files were half filled.
- Though the WAL archives are there, Postgres does not try to recover
  beyond the timestamp at which pg_clog was missing.
- Even if I replace the missing pg_clog files (which I did), Postgres asks
  for the corresponding WAL archive files.

Yes. What I learnt is that we need to ensure that all the pg_clog files
are fully copied as of the backup time. We cannot afford to miss any of
them.

Thanks
Venkat

On Wed, Aug 31, 2011 at 5:46 AM, Craig Ringer <ring...@ringerc.id.au> wrote:
> On 30/08/2011 6:59 PM, Venkat Balaji wrote:
>
>> Hello Everyone,
>>
>> I have a situation here -
>>
>> I am trying to restore the production online backup and recover the same.
>>
>> - I had initially rsynced the data directory (excluding pg_log) and then
>> tarred and zipped the same
>
> Did you do that after pg_start_backup() or on a stopped database server?
>
> If you did it on a running database server without first running
> pg_start_backup(), your backup is invalid.
>
> Personally I like to take my base backups from an LVM snapshot of the
> datadir just to be extra safe. That isn't necessary, though, and a regular
> rsync or tar or whatever of a datadir after pg_start_backup() is fine.
>
> Remember to run pg_stop_backup() afterwards.
>
>> - I got an error "unable to read <filename> from pg_clog location"
>> (file size is around 160K)
>
> ... from PostgreSQL, when you tried to start it?
>
> What emitted that error message?
>> What I understood is that rsync somehow missed syncing the files in
>> "pg_clog", so I manually copied the missing pg_clog file from production
>> and tried recovery.
>
> That won't work. You need a consistent snapshot of all the files in the
> data dir. You cannot just mix and match copies taken at different times.
>
> For efficiency reasons PostgreSQL will recycle used clog files. You can't
> just copy a file over and hope that because it has the same name, it still
> contains the data you want.
>
> Your backup *failed* at the point where you got an incomplete copy of the
> data directory.
>
>> Do I need to get that particular WAL archive which is from before the
>> online backup time?
>
> No, you need to get the missing clog files. If you cannot do that, try
> using pg_resetxlog, but be aware that it may lose transactions and can
> potentially cause corruption of tables and indexes.
>
>> By this experience, what I understand is that PostgreSQL stores committed
>> and uncommitted transactions in pg_xlog / WAL archive files, and
>> information (not the transaction data) about transaction commit status
>> is stored in pg_clog. Am I correct?
>
> That sounds right to me, but I don't know as much about how Pg stores
> things as I should.
>
>> I am in the process of designing a disaster recovery plan for our
>> production systems.
>
> Congratulations!
>
> Be extremely glad this didn't happen in a real recovery scenario. This is
> a marvellous example of why you should always test your backups - you
> actually did, and found a problem that would've been a critical issue if
> the backup were actually needed.
>
> --
> Craig Ringer
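For reference, the base-backup sequence discussed above (pg_start_backup,
rsync the data dir, pg_stop_backup) can be sketched roughly as the script
below. This is a hedged sketch, not a production script: the paths, the
backup label, and the psql user are placeholders I made up, and it assumes
the exclusive-backup API of the PostgreSQL versions current in this thread.
It cannot run without a live cluster; it only illustrates the ordering.

```shell
#!/bin/sh
# Sketch of the online base-backup sequence from this thread.
# PGDATA and BACKUP_DIR are placeholders - adjust for your system.
set -e
PGDATA=/var/lib/pgsql/data      # placeholder: your data directory
BACKUP_DIR=/backups/base        # placeholder: destination for the copy

# 1. Tell PostgreSQL a base backup is starting (performs a checkpoint).
psql -U postgres -c "SELECT pg_start_backup('base_backup');"

# 2. Copy the entire data directory. Excluding pg_log (the plain-text
#    server logs) is safe; excluding or only partially copying pg_clog
#    (transaction commit status) is not - a half-copied pg_clog file is
#    exactly what broke the recovery described in this thread.
rsync -a --exclude 'pg_log/*' "$PGDATA/" "$BACKUP_DIR/"

# 3. Tell PostgreSQL the backup is finished. In a real script this should
#    run even if the rsync fails partway (e.g. via a trap), so the server
#    is not left in backup mode.
psql -U postgres -c "SELECT pg_stop_backup();"
```

The similarly named pg_log and pg_clog directories are easy to confuse:
pg_log holds discardable text logs, while pg_clog is part of the cluster's
state and must be captured completely and consistently in the backup.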