On Tue, Sep 19, 2017 at 2:26 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Tue, Sep 19, 2017 at 7:50 AM, Xin Zhang <xzh...@pivotal.io> wrote: >> If primary crashed at that moment, and failover to standby, the foo table is >> lost, even though the replication is synced according to >> `pg_stat_replication` view. > > GUC parameters are reloaded each time a query is run, and so > SyncRepConfig is filled with the parsed data of SyncRepStandbyNames > once the parameter is reloaded for the process. Still, here, a commit > is waiting for a signal from a WAL sender that the wanted LSN has been > correctly flushed on a standby so this code path does not care about > the state of SyncRepConfig saved in the context of the process, we > want to know what the checkpointer thinks about it. Hence using WAL > sender data or sync_standbys_defined as a source of truth looks like a > correct concept to me, making the problem of this bug legit. >
I agree with the analysis and the approach of this patch. > The check with SyncRepRequested() still holds truth: max_wal_senders > needs a restart to be updated. Also, the other caller of > SyncStandbysDefined() requires SyncRepConfig to be set, so this caller > is fine. Yeah, after reloaded the config there might be some wal senders that don't reflect it yet but I think it cannot be a problem. > I have looked at your patch and tested it, but found no problems > associated with it. A backpatch would be required, so I have added an > entry in the next commit fest with status set to "ready for committer" > so as this bug does not fall into the cracks. Also I found no problems in the patch. > >> A separate question, is the `pg_stat_replication` view the reliable way to >> find when to failover to a standby, or there are some other ways to ensure >> the standby is in-sync with the primary? > > It shows at SQL level what is currently present in shared memory by > scanning all the WAL sender entries, so this report uses the same data > as the backend themselves, so that's a reliable source. In Postgres > 10, pg_stat_activity is also able to show to users what are the > backends waiting for a change to be flushed/applied on the standby > using the wait event called SyncRep. You could make some use of that > as well. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers