Re: [HACKERS] streaming replication breaks horribly if master crashes

Magnus Hagander Wed, 16 Jun 2010 13:33:10 -0700

On Wed, Jun 16, 2010 at 22:26, Robert Haas <[email protected]> wrote:
>>> and this just
>>> makes it more likely.  After the most recent crash, the master thought
>>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>>> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
>>> the master, the slave then thought that
>>> pg_last_xlog_receive_location() was 1/87000000.
>>
>> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
>> have actually prevented the slave from being corrupted.
>>
>> My question, though, is detecting out-of-sequence xlogs *enough*?  Are
>> there any crash conditions on the master which would cause the master to
>> reuse the same locations for different records, for example?  I don't
>> think so, but I'd like to be certain.
>
> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots.  I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd (Tom suggested on another thread that
> this might be necessary, and I think it's now clear that it is 100%
> necessary).  But I'm not sure how this will play with fsync=off - if
> we never fsync, then we can't ever really send any WAL without risking


Well, at this point we can just prevent streaming replication with
fsync=off if we can't think of an easy fix, and then design a "proper
fix" for 9.1. Given how late we are in the cycle.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] streaming replication breaks horribly if master crashes

Reply via email to