On Wed, Jun 16, 2010 at 22:26, Robert Haas <robertmh...@gmail.com> wrote:
>>> and this just
>>> makes it more likely. After the most recent crash, the master thought
>>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>>> pg_last_xlog_receive_location() was 1/8733C000. After reconnecting to
>>> the master, the slave then thought that
>>> pg_last_xlog_receive_location() was 1/87000000.
>>
>> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
>> have actually prevented the slave from being corrupted.
>>
>> My question, though, is detecting out-of-sequence xlogs *enough*? Are
>> there any crash conditions on the master which would cause the master to
>> reuse the same locations for different records, for example? I don't
>> think so, but I'd like to be certain.
>
> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots. I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd (Tom suggested on another thread that
> this might be necessary, and I think it's now clear that it is 100%
> necessary). But I'm not sure how this will play with fsync=off - if
> we never fsync, then we can't ever really send any WAL without risking

Well, at this point we can just prevent streaming replication with
fsync=off if we can't think of an easy fix, and then design a "proper
fix" for 9.1, given how late we are in the cycle.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
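
For reference, the three locations quoted above can be compared mechanically. What follows is only a minimal sketch in Python, not anything from the PostgreSQL source: the helper names (parse_xlog_location, slave_is_ahead, bytes_ahead) are made up for illustration, and it assumes the "xlogid/xrecoff" text format with both halves in hex. It only subtracts offsets inside the same xlogid, so it makes no assumption about how bytes are counted across log file boundaries.

```python
def parse_xlog_location(text):
    """Parse an 'xlogid/xrecoff' string (both parts hex) into a tuple.

    Tuples compare element by element, so the result orders locations
    correctly without any byte arithmetic across xlogid boundaries.
    """
    xlogid, xrecoff = text.split("/")
    return (int(xlogid, 16), int(xrecoff, 16))


def slave_is_ahead(master_current, slave_received):
    """True if the slave claims to have received WAL the master has not written."""
    return parse_xlog_location(slave_received) > parse_xlog_location(master_current)


def bytes_ahead(master_current, slave_received):
    """Offset difference, defined here only when both share the same xlogid."""
    m_id, m_off = parse_xlog_location(master_current)
    s_id, s_off = parse_xlog_location(slave_received)
    if m_id != s_id:
        raise ValueError("locations are in different xlogids")
    return s_off - m_off


# Values taken from the report quoted in this thread.
master_current = "1/86CD4000"         # pg_current_xlog_location() on the master
slave_received = "1/8733C000"         # pg_last_xlog_receive_location() on the slave
slave_after_reconnect = "1/87000000"  # same function, after reconnecting

print(slave_is_ahead(master_current, slave_received))         # True
print(bytes_ahead(master_current, slave_received))            # 6717440
print(parse_xlog_location(slave_after_reconnect)
      < parse_xlog_location(slave_received))                  # True: moved backwards
```

So in the reported case the slave was about 6.4 MB past the master's own idea of the end of WAL, and its receive location then moved backwards on reconnect. A check along these lines is the kind of out-of-sequence detection discussed above; it would not by itself cover the fsync=off case.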