On 04/14/2013 05:56 PM, Fujii Masao wrote:
On Fri, Apr 12, 2013 at 7:57 PM, Andres Freund <and...@2ndquadrant.com> wrote:
On 2013-04-12 02:29:01 +0900, Fujii Masao wrote:
On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing <ha...@2ndquadrant.com> wrote:
You just shut down the old master and let the standby catch
up (takes a few microseconds ;) ) before you promote it.
After this you can start up the former master with recovery.conf
and it will follow nicely.
No. When you shut down the old master, it might not have been
able to send all the WAL records to the standby. I have observed
this situation several times. So in your approach, the new standby
might fail to catch up with the master nicely.
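
One way to make the "did the standby get everything?" question concrete
is to compare WAL locations before promoting. The following is only a
minimal sketch: the helper names are made up for illustration, and how
the two location strings are obtained is an assumption (on the standby,
something like pg_last_xlog_receive_location() could supply one of them;
the master's final location has to come from wherever it was recorded at
shutdown). It only compares the usual "X/Y" hexadecimal text form.

```python
def parse_lsn(text):
    """Turn an 'X/Y' WAL location string into a comparable (hi, lo) tuple.

    Hypothetical helper: both halves are hexadecimal, as printed by the
    server's WAL location functions.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def standby_caught_up(master_final_lsn, standby_received_lsn):
    """True if the standby has received at least up to the master's final location."""
    return parse_lsn(standby_received_lsn) >= parse_lsn(master_final_lsn)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    print(standby_caught_up("0/3000060", "0/2FFFFF8"))
    print(standby_caught_up("0/3000060", "0/3000060"))
```

If this check fails, the old master holds WAL that the standby never
received, which is the situation described above.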
It seems most of this thread is focusing on the wrong thing then. If we
really are only talking about planned failover then we need to solve
*that*, not some ominous "don't flush data too early", which has
noticeable performance and implementation complexity problems.
At least I'd like to talk about not only planned failover but also normal
failover.
I guess you're observing that not everything is replicated because you're
doing an immediate shutdown.
No. I did fast shutdown.
At fast shutdown, after walsender sends the checkpoint record and
closes the replication connection, walreceiver can detect the close
of connection before receiving all WAL records. This means that,
even if walsender sends all WAL records, walreceiver cannot always
receive all of them.
Seems very much like a bug, or at least a missing mode -
synchronous shutdown - where the master will wait for ack from standby(s)
before closing client connection.
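
To illustrate what I mean by a synchronous shutdown, here is a toy
model, not PostgreSQL code. ToyLink, shutdown() and polls_before_close
are all invented for the sketch, and the assumption that the standby
stops reading as soon as it notices the close is just a stand-in for
the behaviour Fujii describes. The only point is the ordering: wait for
the standby's ack first, close the connection second.

```python
from collections import deque


class ToyLink:
    """Toy replication link: records sit in flight until the standby reads them."""

    def __init__(self):
        self.in_flight = deque()
        self.closed = False
        self.standby_received = 0  # highest record number the standby has read

    def send(self, record_no):
        self.in_flight.append(record_no)

    def standby_poll(self):
        """Standby reads one record, unless it has already noticed the close."""
        if self.closed or not self.in_flight:
            return
        self.standby_received = self.in_flight.popleft()

    def close(self):
        self.closed = True


def shutdown(link, last_record, wait_for_ack, polls_before_close=2):
    """Send records 1..last_record, optionally wait for the ack, then close.

    Returns the highest record number the standby ended up with.
    """
    for record_no in range(1, last_record + 1):
        link.send(record_no)
    # The standby gets a little time to read before the master closes.
    for _ in range(polls_before_close):
        link.standby_poll()
    if wait_for_ack:
        # Synchronous shutdown: keep the connection open until the standby
        # has confirmed everything that was sent.
        while link.standby_received < last_record:
            link.standby_poll()
    link.close()
    link.standby_poll()  # too late, the standby has seen the close
    return link.standby_received


if __name__ == "__main__":
    print(shutdown(ToyLink(), 5, wait_for_ack=False))
    print(shutdown(ToyLink(), 5, wait_for_ack=True))
```

A real implementation would also need a timeout, so that an unreachable
standby cannot block the shutdown forever.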
You could even teach the standby not to increment the timeline in that
case since that's safe.
I don't think this is required, thanks to Heikki's recent great efforts on
timelines.
Regards,