Hi,

Yesterday we (that's me and my colleague Ricardo Gomez) were working on
an issue where a monitoring script was returning increasing lag
information on a primary instead of a NULL value.

The query involved the following functions (it has since been amended
to work around the issue I'm reporting here):

pg_last_wal_receive_lsn()
pg_last_wal_replay_lsn()
pg_last_xact_replay_timestamp()

Under normal circumstances we would expect all three functions to
return NULL on a primary node, and the code comments back this up.

The problem is: what if the node is a standby that was promoted without
restarting, or a node that had to perform crash recovery?

While the node is recovering, the values in `XLogCtl` are updated with
recovery information. Once recovery finishes, either because crash
recovery reached a consistent state or because a standby was promoted,
those values are not reset to their startup defaults.

That's when you start seeing non-NULL values returned by
`pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.

Now, I don't know if we should call this a bug or an undocumented
anomaly. We could fix it by resetting the values in `XLogCtl` after
recovery finishes, or document that these functions might return
non-NULL values in certain cases.
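
Until one of those happens, a monitoring script can protect itself by
checking `pg_is_in_recovery()` before trusting the replay functions.
Here is a minimal sketch of that kind of guard in Python. The query text
and the helper name `replay_lag` are illustrative assumptions, not the
actual script we were debugging, and how the query gets executed is left
to whatever driver the script already uses.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical monitoring query: the replay timestamp is only reported
# while the node is in recovery, so a promoted or crash-recovered
# primary yields NULL instead of a stale value.
LAG_QUERY = """
SELECT pg_is_in_recovery() AS in_recovery,
       CASE WHEN pg_is_in_recovery()
            THEN pg_last_xact_replay_timestamp()
       END AS last_replay_ts
"""


def replay_lag(in_recovery: bool,
               last_replay_ts: Optional[datetime],
               now: datetime) -> Optional[timedelta]:
    """Return the replay lag, or None when the node is not a standby.

    The arguments are the values a client would read from LAG_QUERY plus
    the current time; the same guard is repeated here so the function is
    safe even if it is fed unguarded values.
    """
    if not in_recovery or last_replay_ts is None:
        return None
    # Clamp small negative values caused by clock skew to zero.
    return max(now - last_replay_ts, timedelta(0))
```

With this guard, a stale timestamp left behind after promotion is
ignored because `in_recovery` is false, so the script reports no lag on
a primary rather than an ever-growing one.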

Regards,

-- 
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

