On a nightly basis, we shut the database down and do a file system backup.
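For reference, the nightly backup is essentially this (paths and script
details are illustrative, not our exact cron job):

    # stop the postmaster, archive the data directory, restart
    pg_ctl -D /usr/local/pgsql/data stop -m fast
    tar -czf /backup/pgdata-`date +%Y%m%d`.tar.gz /usr/local/pgsql/data
    pg_ctl -D /usr/local/pgsql/data start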
A short chronology of our database problem:
8/21 - count(*) of all user tables succeeded (we run this once a week to
gather statistics; a sketch of the check follows this chronology)
8/23 - A search on one specific value (out of more than 2 million) caused
the postmaster to SEGV. I dropped the index in question and rebuilt it, and
all appeared to be OK.
8/28 - count(*) failed - the postmaster aborted on a SEGV. I attempted a
pg_dumpall, and the postmaster aborted with a SEGV about 240 MB into the
dump (the database is about 50 GB).
I reloaded the 8/25 database and attempted a pg_dumpall. Same failure.
I reloaded from the 8/21 file system dump; pg_dumpall worked on that one.
I then spent almost 48 hours dumping and reloading 240 million rows and
reprocessing the several million additions made since 8/21. I didn't dare
use the 8/21 database without a full dump and reload, because I had no way
of knowing whether it was good.
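For what it's worth, the weekly count(*) check is nothing fancy - it just
forces a full read of every user table, something like this (database name
is illustrative):

    # any SQL error or backend SEGV here is our first warning
    TABLES=`psql -t -d proddb -c "SELECT tablename FROM pg_tables WHERE tablename NOT LIKE 'pg_%'"`
    for t in $TABLES; do
        psql -d proddb -c "SELECT count(*) FROM $t" > /dev/null || echo "FAILED: $t"
    done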
Question:
How can we tell that a database is intact? In the example above, pg_dumpall
worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23, or
was it already corrupt and simply got worse? A pg_dumpall also tells you
nothing about the condition of indexes. Could a corrupt index corrupt data
blocks?
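The best I've come up with for at least touching an index is to disable
sequential scans and probe the indexed column with a range qualifier, along
these lines (table and column names are made up):

    SET enable_seqscan = off;
    SELECT count(*) FROM big_table WHERE id > 0;

But that only proves the planner can walk that one index for that one
qualifier - it doesn't prove every index page is sound.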
I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time at which we know we have a recoverable database. Once the database
reaches several hundred GB and over a billion rows, though, that isn't a
great solution, and it still doesn't address overall database integrity.
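The weekly run would look something like this (path illustrative, and I'm
assuming pg_dumpall's exit status reflects a failed dump):

    # a nonzero exit, or a SEGV in the server log, means the database
    # is no longer known-recoverable
    pg_dumpall > /backup/weekly-`date +%Y%m%d`.sql \
        || echo "weekly pg_dumpall FAILED"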
Back to the original question: how can I verify the complete integrity of a
database - especially a very large one, where a reload or full index rebuild
could take on the order of days?
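For completeness, the full index rebuild I mean is along the lines of
(table name illustrative):

    -- once per table, or DROP INDEX / CREATE INDEX for each index by hand
    REINDEX TABLE big_table;

which is exactly the operation that would take days at this size.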
Wes