Hi Benoit,

Your SQL works fine as long as there are never any communication problems 
between the master and slave.  But if your slave loses communication with the 
master, then the SQL you provided will not report any lag despite the potential 
that the slave could be very far behind the master.

This is because that SQL is asking the slave how far it is behind the master 
based on logs it has received from the master. However if the slave has not 
received logs from the master it cannot tell if the master has simply gone idle 
or if it has lost communication and is unable to send updates.

By introducing a mechanism to manually force the replication stream to be sent 
periodically, which is what I use the NOTIFY command to do on the master, I can 
ensure that the slave is updating pg_last_xact_replay_timestamp at least as 
often as the NOTIFY is executed on the master (say every 30 seconds) and is 
successfully replayed on the slave.  This enables an accurate measurement of 
replication lag whatever the circumstances.

If the master has simply gone idle because there is no update activity, the 
NOTIFY command will still force the replication stream to replay on the slave 
every 30 seconds and the replay timestamp will be updated.

But if communication to the master is lost, for whatever reason, the replay 
timestamp will not be updated on the slave and then lag can be accurately 
measured and alerted.

I hope this helps!

--Rob

From: Benoit Lobréau <benoit.lobr...@gmail.com>
Date: Thursday, February 23, 2017 at 9:23 AM
To: "David G. Johnston" <david.g.johns...@gmail.com>
Cc: Rob Brucks <rob.bru...@rackspace.com>, Tom Lane <t...@sss.pgh.pa.us>, 
François Beaulieu <fr...@tzone.org>, Adrian Klaver <adrian.kla...@aklaver.com>, 
"pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] NOTIFY command impact

Hi,

I might have missed something.
This should be enough to solve the problem no ?

SELECT
CASE
WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER
END
AS replication_lag;

Benoit.

Reply via email to