On Wed, Feb 1, 2017 at 5:21 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> Ok. I see that there is a new compelling reason to move the ring
>> buffer to the sender side: then I think lag tracking will work
>> automatically for the new logical replication that just landed on
>> master. I will try it that way. Thanks for the feedback!
>
> Seeing no new patches, marked as returned with feedback. Feel free of
> course to refresh the CF entry once you have a new patch!

Here is a new version with the buffer on the sender side as requested.
Since it now shows write, flush and replay lag, not just replay, I
decided to rename it and start counting versions at 1 again.
replication-lag-v1.patch is less than half the size of
replay-lag-v16.patch and considerably simpler. There is no more GUC
and no more protocol change.

While the write and flush locations are already sent back at the right
times, I had to figure out how to get replies sent at the right time
when WAL is replayed too. Without doing anything special for that, you
get the following cases:

1. A busy system: replies flow regularly due to write and flush
feedback, and those replies include the replay position, so there is
no problem.

2. A system that has just streamed a lot of WAL, causing the standby
to fall behind in replaying, but the primary is now idle: there will
only be replies every 10 seconds (wal_receiver_status_interval), so
pg_stat_replication.replay_lag only updates with that frequency.
(That was already the case for replay_location.)

3. An idle system that has just replayed some WAL and is now fully
caught up: there is no reply until the next
wal_receiver_status_interval, so replay_lag shows a bogus number
over 10 seconds. Oops.

Case 1 is good, and I suppose that 2 is OK, but I needed to do
something about 3. The solution I came up with was to force one reply
to be sent whenever recovery runs out of WAL to replay and enters
WaitForWALToBecomeAvailable(). This seems to work pretty well in
initial testing.

Thoughts?

--
Thomas Munro
http://www.enterprisedb.com
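
For readers following along, the sender-side idea can be sketched roughly as below. This is a minimal illustration and not code from replication-lag-v1.patch: the type and function names (LagTracker, lag_record, lag_on_reply), the buffer capacity, and the use of plain integers for LSNs and timestamps are all assumptions made for the example. The sender remembers when each WAL position was sent, and when a reply reports a position, it measures the time elapsed since the newest remembered sample at or before that position.

```c
#include <stdbool.h>
#include <stdint.h>

#define LAG_BUFFER_SIZE 8        /* hypothetical capacity */

typedef uint64_t Lsn;            /* stand-in for a WAL position */
typedef int64_t TimeUs;          /* stand-in for a timestamp, in microseconds */

typedef struct
{
    Lsn     lsn;                 /* WAL position that was sent */
    TimeUs  sent_at;             /* sender's clock when it was sent */
} LagSample;

typedef struct
{
    LagSample samples[LAG_BUFFER_SIZE];
    int     head;                /* next slot to write */
    int     tail;                /* oldest unconsumed sample */
    int     count;               /* number of unconsumed samples */
} LagTracker;

void
lag_init(LagTracker *t)
{
    t->head = t->tail = t->count = 0;
}

/* Remember that WAL up to 'lsn' was sent at 'now'; drop the sample if full. */
bool
lag_record(LagTracker *t, Lsn lsn, TimeUs now)
{
    if (t->count == LAG_BUFFER_SIZE)
        return false;
    t->samples[t->head].lsn = lsn;
    t->samples[t->head].sent_at = now;
    t->head = (t->head + 1) % LAG_BUFFER_SIZE;
    t->count++;
    return true;
}

/*
 * A reply arrived at 'now' reporting 'reported' as written, flushed or
 * replayed.  Consume every sample at or before that position and return
 * the lag measured from the newest one consumed, or -1 if no sample
 * was consumed.
 */
TimeUs
lag_on_reply(LagTracker *t, Lsn reported, TimeUs now)
{
    TimeUs  lag = -1;

    while (t->count > 0 && t->samples[t->tail].lsn <= reported)
    {
        lag = now - t->samples[t->tail].sent_at;
        t->tail = (t->tail + 1) % LAG_BUFFER_SIZE;
        t->count--;
    }
    return lag;
}
```

In this sketch the lag is computed when the reply arrives, not when the standby actually replayed the WAL. If the standby replays promptly but only reports it at the next wal_receiver_status_interval, the computed value includes that waiting time, which is the effect described in case 3.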

Attachment: replication-lag-v1.patch (binary data)