Greg Smith <g...@2ndquadrant.com> wrote:

> But if you need all that infrastructure just to get the feature
> launched, that's a bit hard to stomach.

Triggering a vacuum or some hypothetical "scrubbing" feature?

> Also, as someone who follows Murphy's Law as my chosen religion,

If you don't think I pay attention to Murphy's Law, I should recap
our backup procedures -- which involve three separate forms of
backup, each to multiple servers in different buildings, real-time,
plus idle-time comparison of the databases of origin to all replicas
with reporting of any discrepancies. And off-line "snapshot" backups
on disk at a records center controlled by a different department.
That's in addition to RAID redundancy and hardware health and
performance monitoring. Some people think I border on the paranoid
on this issue.

> I would expect this situation could be exactly how flaky hardware
> would first manifest itself: server crash and a bad CRC on the
> last thing written out. And in that case, the last thing you want
> to do is assume things are fine, then kick off a VACUUM that might
> overwrite more good data with bad.

Are you arguing that autovacuum should be disabled after crash
recovery? I guess if you are arguing that a database VACUUM might
destroy recoverable data when hardware starts to fail, I can't
argue. And certainly there are way too many people who don't ensure
that they have a good backup before firing up PostgreSQL after a
failure, so I can see not making autovacuum more aggressive than
usual, and perhaps even disabling it until there is some sort of
confirmation (I have no idea how) that a backup has been made.

That said, a database VACUUM would be one of my first steps after
ensuring that I had a copy of the data directory tree, personally.
I guess I could even live with that as recommended procedure rather
than something triggered through autovacuum and not feel that the
rest of my posts on this are too far off track.

> The main way I expect to validate this sort of thing is with an as
> yet unwritten function to grab information about a data block from
> a standby server for this purpose, something like this:
>
> Master:  Computed CRC A, Stored CRC B; error raised because A!=B
> Standby: Computed CRC C, Stored CRC D
>
> If C==D && A==C, the corruption is probably overwritten bits of
> the CRC B.

Are you arguing we need *that* infrastructure to get the feature
launched?

-Kevin
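
For archive readers, the master/standby comparison quoted above can be sketched roughly as follows. This is only an illustration of the single rule stated in the quote, not anything that exists in PostgreSQL: the `BlockCrc` type and the `stored_crc_probably_damaged` function are hypothetical names, and `zlib.crc32` is used purely as a stand-in checksum for the demo. Every case other than "C==D && A==C" is left undetermined, since the quote does not cover it.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCrc:
    """CRC facts about one copy of a data block (hypothetical type)."""
    computed: int  # CRC recomputed from the block contents as read
    stored: int    # CRC value recorded with the block


def stored_crc_probably_damaged(master: BlockCrc, standby: BlockCrc) -> bool:
    """Apply the one rule from the quoted text.

    Master:  computed A, stored B, with A != B (the error that was raised).
    Standby: computed C, stored D.
    If C == D and A == C, the damage is probably confined to the stored
    CRC B on the master. Any other combination returns False, meaning
    "this rule does not apply", not "the block is fine".
    """
    if master.computed == master.stored:
        raise ValueError("master block shows no CRC mismatch to diagnose")
    standby_consistent = standby.computed == standby.stored   # C == D
    contents_agree = master.computed == standby.computed      # A == C
    return standby_consistent and contents_agree


if __name__ == "__main__":
    page = bytes(8192)               # stand-in for a data block
    good = zlib.crc32(page)          # stand-in checksum, an assumption
    master = BlockCrc(computed=good, stored=good ^ 0x1)  # stored CRC bit flipped
    standby = BlockCrc(computed=good, stored=good)
    print(stored_crc_probably_damaged(master, standby))
```

Returning a plain boolean keeps the sketch limited to what the quote actually claims; a real diagnostic would need to say something about the remaining combinations, which this thread does not spell out.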

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers