On Wed, Jun 12, 2013 at 6:41 AM, Amit Kapila <amit.kap...@huawei.com> wrote:
> On Wednesday, June 12, 2013 4:23 AM Fujii Masao wrote:
>> Hi,
>>
>> In streaming replication, when we shutdown the master, walsender tries
>> to send all the outstanding WAL records including the shutdown
>> checkpoint record to the standby, and then to exit. This basically
>> means that all the WAL records are fully synced between two servers
>> after the clean shutdown of the master. So, after promoting the standby
>> to new master, we can restart the stopped master as new standby without
>> the need for a fresh backup from new master.
>>
>> But there is one problem: though walsender tries to send all the
>> outstanding WAL records, it doesn't wait for them to be replicated to
>> the standby. IOW, walsender closes the replication connection as soon
>> as it sends WAL records.
>> Then, before receiving all the WAL records, walreceiver can detect the
>> closure of connection and exit. We cannot guarantee that there is no
>> missing WAL in the standby after clean shutdown of the master. In this
>> case, backup from new master is required when restarting the stopped
>> master as new standby. I have experienced this case several times,
>> especially when enabling WAL archiving.
>>
>> The attached patch fixes this problem. It just changes walsender so
>> that it waits for all the outstanding WAL records to be replicated to
>> the standby before closing the replication connection.
>>
>> You may be concerned the case where the standby gets stuck and the
>> walsender keeps waiting for the reply from that standby. In this case,
>> wal_sender_timeout detects such inactive standby and then walsender
>> ends. So even in that case, the shutdown can end.
>
> Do you think it can impact time to complete shutdown?
> After completing shutdown, user will promote standby to master, so if there
> is delay in shutdown, it can cause delay in switchover.

I'd expect a controlled switchover to happen without data loss. Yes, this
could make the shutdown take a bit longer, but it guarantees that you don't
lose data. ISTM that if you don't care about the potential data loss, you
can just use a faster shutdown method (e.g. immediate).

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
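
For anyone following the thread, the shutdown behaviour described in the quoted patch summary can be sketched roughly as below. This is a toy Python model and not the patch's C code: the function name `wait_for_standby_flush`, the `poll_reply` callback, and the use of plain integers for WAL positions are all assumptions made for illustration. Only `wal_sender_timeout` is named in the thread, and it is modelled here as "give up if no reply has arrived for this long".

```python
import time
from typing import Callable, Optional


def wait_for_standby_flush(
    sent_lsn: int,
    poll_reply: Callable[[], Optional[int]],
    timeout_s: float,
    poll_interval_s: float = 0.01,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical model of a sender waiting at shutdown.

    sent_lsn   -- last WAL position the sender has sent (illustrative int).
    poll_reply -- returns the position the standby reports as replicated,
                  or None if no new reply has arrived since the last poll.
    timeout_s  -- stands in for wal_sender_timeout: maximum time without
                  any reply before the sender stops waiting.

    Returns True if the standby confirmed everything that was sent,
    False if the standby went silent and the timeout expired.
    """
    last_reply_at = clock()
    while True:
        reply = poll_reply()
        now = clock()
        if reply is not None:
            # Any reply counts as activity and resets the inactivity timer.
            last_reply_at = now
            if reply >= sent_lsn:
                return True  # safe to close the connection
        if now - last_reply_at >= timeout_s:
            return False  # stuck standby: give up so shutdown can finish
        sleep(poll_interval_s)
```

In this model the trade-off discussed above is visible directly: a healthy standby delays shutdown only until its confirmation arrives, while a stuck standby delays it by at most `timeout_s`.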