Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra <tomas.von...@2ndquadrant.com>
> wrote:
> > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote:
> > >Hi,
> > >
> > >On 2019-03-28 21:09:22 +0100, Michael Banck wrote:
> > >> I agree that the current patch might have some corner-cases where it
> > >> does not guarantee 100% accuracy in online mode, but I hope the current
> > >> version at least has no more false negatives.
> > >
> > >False positives are *bad*. We shouldn't integrate code that has them.
> >
> > Yeah, I agree. I'm a bit puzzled by the reluctance to make the online mode
> > communicate with the server, which would presumably address these issues.
> > Can someone explain why not to do that?
>
> I agree that this effort seems better spent on fixing those issues there
> (of which many are the same), and then re-use that.

This really seems like it depends on which of the options we're talking
about.. Connecting to the server and asking what the current insert
point is, so we can check that the LSN isn't completely insane, seems
reasonable, but at least one option being discussed was to have
pg_basebackup actually *lock the page* (even if just for I/O..) and then
re-read it, and having an external tool doing that instead of the
backend seems like a whole different level to me. That would involve
having an SQL function for "lock this page against I/O" and then another
for "unlock this page", wouldn't it?

> > FWIW I've initially argued against that, believing that we can address
> > those issues in some other way, and I'd love if that was possible. But
> > considering we're still trying to make that work reliably I think the
> > reasonable conclusion is that Andres was right communicating with the
> > server is necessary.

As part of a backup, you could check against the pages written out into
the WAL as a cross-check and be able to be confident that at least
everything which was backed up had been checked. That doesn't cover
things like unlogged tables though.

For my part, at least, adding additional checks around the LSN seems
like a good solution (though we can't allow those checks to turn into
false positives...) and would seriously reduce the risk that we have
false negatives (we can *not* completely eliminate false negatives
entirely.. we could possibly get to a point where at least we don't have
any more false negatives than PG itself has but it looks like an awful
lot of work and ends up adding its own risks...).

As I've said before, I'd certainly support a background worker which
performs ongoing checksum validation of pages and that would be able to
use the same approach as what we do with pg_basebackup, but having an
external tool locking pages seems really unlikely to be reasonable.

Thanks!

Stephen
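
For illustration only, the LSN sanity check discussed above could be sketched roughly as follows. This is not code from the patch or from pg_basebackup: the function names, the three-way classification, and the choice of the backup start LSN as the lower bound are all assumptions made for the example. It assumes the current insert point has already been fetched from the server as text (for instance via pg_current_wal_insert_lsn()) and that the page LSN has been read from the header of a page whose checksum did not verify.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/16B3748' into a 64-bit integer.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def classify_checksum_failure(page_lsn: str, start_lsn: str, insert_lsn: str) -> str:
    """Hypothetical helper: decide how to treat a page whose checksum failed.

    page_lsn   -- LSN found in the page header
    start_lsn  -- LSN at which the check or backup started (assumed lower bound)
    insert_lsn -- current WAL insert point reported by the server
    """
    page = parse_lsn(page_lsn)
    start = parse_lsn(start_lsn)
    insert = parse_lsn(insert_lsn)

    if page > insert:
        # The page claims WAL that has not been generated yet, so the
        # LSN itself is garbage and the failure should be reported.
        return "report: LSN beyond insert point"
    if page > start:
        # The page may have been written concurrently with the read;
        # a torn read is plausible, so re-read before reporting.
        return "recheck: possibly torn read"
    # The page was not modified after the start point, so a concurrent
    # write does not explain the failure.
    return "report: checksum failure"


if __name__ == "__main__":
    print(classify_checksum_failure("2/0", "0/1000000", "0/2000000"))
    print(classify_checksum_failure("0/1800000", "0/1000000", "0/2000000"))
    print(classify_checksum_failure("0/0800000", "0/1000000", "0/2000000"))
```

The upper bound is what narrows the window for false negatives: without it, a corrupted header carrying an arbitrarily large LSN would look like a page that is merely being written concurrently.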