Hi all,

I have a problem with a corrupt backup, fortunately I was only testing so I did not loose any data. Unfortunetely what I did is to follow the backup guidelines in the documentation, which I thought should work reliably. Here are the details:

I am running a postgreSQL 8.4 database on a Debian Squeeze system. For Backups I am using the warm standby and "Incrementally Updated Backup" method as described in chapter 24.4 of the documentation. So my Setup is as follows:

Server 1 (Main): PostgreSQL 8.4 Database with archive_mode enabled shipping WAL files to a NFS drive. Size of database is about 370 GB and growing.

Server 2 (Replica): PostgreSQL 8.4 Database in recovery mode. Using pg_standby in recovery.conf and getting WAL files from Server 1 NFS drive.

All this works fine and runs without errors.

The replica is backed up once a week using rsync, a full backup runs about 10 hours, so I also keep at least 24h of WAL files to make sure I have a consistent backup.

The backup process also runs fine without errors, only the time (10h) it takes is quite long, so I decided to test the backup:

1) Restored the full backup to a test directory
(var/lib/postgresql/8.4/test)

2) Copied the configuration of the main server to
/etc/postgresql/8.4/test/
Altered port number, paths and turned off archive mode in postgresql.conf.

3) Added a recovery.conf in the test servers data dir. Recovering from my backed up WAL files:
restore_command = 'cp /var/postgresql-wal-test/%f "%p"'

4) Started the test server (pg_ctlcluster 8.4 test start)

5) Waited until recovery was done (everything worked fine until then)
2012-09-25 08:26:41 UTC LOG:  database system is ready to accept connections
2012-09-25 08:26:41 UTC LOG:  autovacuum launcher started

6) Connected via psql to the database and tried a \d to see my tables which did *not* work!
Here is the output:
2012-09-25 08:27:03 UTC ERROR: could not access status of transaction 500185903 2012-09-25 08:27:03 UTC DETAIL: Could not open file "pg_clog/01DD": No such file or directory.

Also trying to SELECT data from the database tables failed with the same error.
The backup is corrupt. So my question is, what went wrong:
Obviously as the rsync started it copied everything from the pg_clog (which at this point was until pg_clog/01DC) and then went on for another 10+ hours backing up all the rest of the database. At the time the backup ended, the database content changed but the newer clog files did not go into the backup. When restoring the backup and starting the server, the recovery process started at a point where pg_clog was at state 01DE or even further and thus the data from 01DD was missing.

So what I do from now, is an extra daily backup of my clog directory to make sure to have working backups. This is not documented in the postgreSQL documentation, and since the result in not doing so can be quite severe I think you should consider this in future PostgreSQL documentation versions.

Regards,
Juergen

Additional information:

The EXACT PostgreSQL version you are running:
PostgreSQL 8.4.13 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 32-bit

How you installed PostgreSQL

From Linux distro package management: Debian/Aptitude
If so, what repository?
deb http://ftp.at.debian.org/debian/ squeeze main non-free contrib
deb-src http://ftp.at.debian.org/debian/ squeeze main non-free contrib
deb http://security.debian.org/ squeeze/updates main contrib non-free
deb-src http://security.debian.org/ squeeze/updates main contrib non-free
deb http://ftp.debian.org/debian squeeze-updates main contrib non-free
deb-src http://ftp.debian.org/debian squeeze-updates main contrib non-free

Changes made to the settings in the postgresql.conf file:
name                  |  current_setting
----------------------+-------------------------------------------------
version | PostgreSQL 8.4.13 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 32-bit archive_command | cp -i %p /var/postgres-wal/%f </dev/null && cp -i %p /var/postgres-wal/bak/%f </dev/null && gzip /var/postgres-wal/bak/%f
 archive_mode         | on
 archive_timeout      | 0
 client_encoding      | utf8
 effective_cache_size | 1000MB
 lc_collate           | en_US.UTF-8
 lc_ctype             | en_US.UTF-8
 listen_addresses     | *
 log_line_prefix      | %t
 maintenance_work_mem | 256MB
 max_connections      | 100
 max_stack_depth      | 2MB
 password_encryption  | on
 port                 | 5432
 server_encoding      | UTF8
 shared_buffers       | 650MB
 ssl                  | on
 synchronous_commit   | off
 TimeZone             | UTC
 work_mem             | 40MB


Operating system and version
            Linux distro and version:
Debian 6.0.6 (squeeze)
            Kernel details:
Linux wegc203094 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686 GNU/Linux

What program you're using to connect to PostgreSQL:
psql and phpPgAdmin

--
| Juergen Fuchsberger
| Wegener Center for Climate and Global Change
| Karl-Franzens-University Graz
| Leechgasse 25, A-8010 Graz
| phone: +43-316-380-8438
|   fax: +43-316-380-9830
| eMail: juergen.fuchsber...@uni-graz.at
|   web: www.wegcenter.at


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to