Re: [HACKERS] Inconsistent DB data in Streaming Replication

Andres Freund Fri, 12 Apr 2013 03:57:58 -0700

On 2013-04-12 02:29:01 +0900, Fujii Masao wrote:
> On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing <[email protected]> wrote:
> >
> > You just shut down the old master and let the standby catch
> > up (takes a few microseconds ;) ) before you promote it.
> >
> > After this you can start up the former master with recovery.conf
> > and it will follow nicely.
>
> No. When you shut down the old master, it might not have been
> able to send all the WAL records to the standby. I have observed
> this situation several times. So in your approach, new standby
> might fail to catch up with the master nicely.


It seems most of this thread is focusing on the wrong thing then. If we
really are only talking about planned failover, then we need to solve
*that*, not some ominous "don't flush data too early" which has
noticeable performance and implementation complexity problems.

I guess you're observing that not everything is replicated because you're
doing an immediate shutdown - probably because performing the shutdown
checkpoint would take too long. This seems solvable by implementing a
recovery connection command which initiates a shutdown that just
disables future WAL inserts and returns the last LSN that has been
written. Then you can fail over as soon as that LSN has been reached
and can make the previous master follow from there on without problems.
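
To make the proposed handoff concrete, here is a toy model of it in
Python. It is only a sketch under stated assumptions, not PostgreSQL
code: the names Primary, Standby, begin_handoff() and
planned_failover() are all made up for illustration, the WAL is a plain
list, and an LSN is simply the number of records written so far.

```python
from dataclasses import dataclass, field


class InsertsDisabled(Exception):
    """Raised when a WAL insert is attempted after the handoff began."""


@dataclass
class Primary:
    # Toy WAL: the record at index i has LSN i + 1 (assumption for the sketch).
    wal: list = field(default_factory=list)
    inserts_enabled: bool = True

    def insert(self, record):
        if not self.inserts_enabled:
            raise InsertsDisabled("primary is handing off")
        self.wal.append(record)
        return len(self.wal)  # LSN of the record just written

    def begin_handoff(self):
        """Hypothetical command: disable future inserts, return the last LSN."""
        self.inserts_enabled = False
        return len(self.wal)

    def read_from(self, lsn, limit):
        # Return up to `limit` records that follow the given LSN.
        return self.wal[lsn:lsn + limit]


@dataclass
class Standby:
    wal: list = field(default_factory=list)
    promoted: bool = False

    @property
    def replayed_lsn(self):
        return len(self.wal)

    def stream_from(self, primary, batch=2):
        # Pull the next batch of records after what has been replayed.
        self.wal.extend(primary.read_from(self.replayed_lsn, batch))

    def promote_when_caught_up(self, primary, target_lsn):
        # Keep streaming until the handoff LSN is reached, then promote.
        while self.replayed_lsn < target_lsn:
            self.stream_from(primary)
        self.promoted = True


def planned_failover(primary, standby):
    # 1. Stop new WAL on the old primary and learn where its WAL ends.
    target = primary.begin_handoff()
    # 2. Promote the standby only once it has everything up to that point.
    standby.promote_when_caught_up(primary, target)
    return target
```

The point of the sketch is the ordering: because inserts stop before the
final LSN is reported, the standby has a fixed target to reach, and the
old primary holds no WAL that the new primary lacks.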

You could even teach the standby not to increment the timeline in that
case, since that's safe.

The biggest issue seems to be how to implement this without another
spinlock acquisition for every XLogInsert(), but that seems possible.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


