On Mon, 2012-11-19 at 18:30 +0100, Andres Freund wrote:
> Yes, definitely.
OK. I suppose that makes sense for large writes.

> > If that is not true, then I'm concerned about replicating corruption,
> > or backing up corrupt blocks over good ones. How do we prevent that?
> > It seems like a pretty major hole if we can't, because it means the
> > only safe replication is streaming replication; a base-backup is
> > essentially unsafe. And it means that even an online background
> > checking utility would be quite hard to do properly.
>
> I am not sure I see the danger in the base backup case here? Why would
> we have corrupted backup blocks? While postgres is running we won't see
> such torn pages because its all done under proper locks...

Yes, blocks written *after* the checkpoint might have a bad checksum
that will be fixed during recovery. But blocks written *before* the
checkpoint should have a valid checksum; if they don't, recovery doesn't
know about them.

So we can't verify the checksums in the base backup, because some blocks
are expected to fail the check and be fixed during recovery. That gives
us no protection for blocks that were truly corrupted and written long
before the last checkpoint.

I suppose if we could somehow differentiate the blocks, that might work:
look at the page LSN and only validate blocks written before the
checkpoint (a sketch of that idea follows below). But that's a problem
in itself, because a corrupt block might have the wrong LSN (in fact,
it's likely to, since garbage is more likely to make the LSN too high
than too low).
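To make that concrete, here is a minimal sketch, in plain C, of what a
backup-side verification filter could look like. This is not PostgreSQL
code: the page header struct, the toy checksum, and the names
verify_backup_page/checkpoint_lsn are all simplified stand-ins for
illustration. It also exhibits the hole described above: a corrupt page
whose garbage LSN happens to be higher than the checkpoint LSN is
silently skipped.

#include <stdint.h>
#include <stddef.h>

typedef uint64_t XLogRecPtr;        /* WAL location (LSN) */

/* Simplified stand-in for a page header; the real layout differs. */
typedef struct PageHeader
{
    XLogRecPtr  pd_lsn;             /* LSN of last WAL record touching the page */
    uint16_t    pd_checksum;        /* stored page checksum */
    /* ... remaining header fields and tuple data follow ... */
} PageHeader;

/*
 * Toy checksum (not the real algorithm): sum the page bytes, skipping
 * the stored checksum field so it doesn't feed into its own value.
 */
static uint16_t
compute_page_checksum(const char *page, size_t len)
{
    const size_t off = offsetof(PageHeader, pd_checksum);
    uint32_t     sum = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (i >= off && i < off + sizeof(uint16_t))
            continue;               /* skip the stored checksum bytes */
        sum = (sum + (uint8_t) page[i]) % 65521;
    }
    return (uint16_t) sum;
}

typedef enum
{
    PAGE_OK,                        /* checksum verified */
    PAGE_SKIPPED,                   /* written after checkpoint; recovery fixes it */
    PAGE_CORRUPT                    /* pre-checkpoint page with a bad checksum */
} VerifyResult;

/*
 * Verify one page copied during a base backup. Pages whose LSN is newer
 * than the backup's start checkpoint may legitimately be torn, so skip
 * them and rely on WAL replay to repair them.
 */
static VerifyResult
verify_backup_page(const char *page, size_t len, XLogRecPtr checkpoint_lsn)
{
    const PageHeader *hdr = (const PageHeader *) page;

    if (hdr->pd_lsn > checkpoint_lsn)
        return PAGE_SKIPPED;        /* may be torn; recovery will fix it */

    if (compute_page_checksum(page, len) != hdr->pd_checksum)
        return PAGE_CORRUPT;        /* written before checkpoint: real damage */

    return PAGE_OK;
}

Note that the decision hinges entirely on trusting pd_lsn, which is
exactly the field that garbage is likely to inflate, so the filter fails
open for precisely the corruption it's meant to catch.

Regards,
	Jeff Davis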