On Mon, Jan 10, 2022 at 04:25:27PM -0500, Tom Lane wrote:
> Apropos of that, it's worth noting that wait_for_catchup *is*
> dependent on up-to-date stats, and here's a recent run where
> it sure looks like the timeout cause is AWOL stats collector:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2022-01-10%2004%3A51%3A34
>
> I wonder if we should refactor wait_for_catchup to probe the
> standby directly instead of relying on the upstream's view.

It would be nice. For logical replication tests, do we have a monitoring API independent of the stats collector? If not and we don't want to add one, a hacky alternative might be for wait_for_catchup to run a WAL-writing command every ~20s. That way, if the stats collector misses the datagram about the standby reaching a certain LSN, the stats collector would have more chances.
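
To make the hacky alternative concrete, here is a rough sketch of the loop I have in mind. It is Python rather than the Perl of the TAP framework, and every name in it is made up for illustration: caught_up stands for whatever stats-based query wait_for_catchup polls today, write_wal stands for whichever WAL-writing command we would pick, and the default timings are placeholders apart from the ~20s nudge interval.

```python
import time
from typing import Callable


def wait_for_catchup_with_nudge(
    caught_up: Callable[[], bool],    # hypothetical: the existing stats-based check
    write_wal: Callable[[], None],    # hypothetical: runs some WAL-writing command
    timeout: float = 180.0,           # placeholder overall timeout, in seconds
    nudge_interval: float = 20.0,     # the "every ~20s" from the proposal
    poll_interval: float = 0.1,       # placeholder delay between polls
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll caught_up() until it is true or the timeout expires.

    Every nudge_interval seconds, call write_wal() so that new WAL is
    generated and the standby's progress gets reported again, giving a
    lost stats message another chance to be replaced.
    """
    start = clock()
    last_nudge = start
    while True:
        if caught_up():
            return True
        now = clock()
        if now - start >= timeout:
            return False
        if now - last_nudge >= nudge_interval:
            write_wal()
            last_nudge = now
        sleep(poll_interval)
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting. Note that this does nothing to reduce the dependency on the stats collector; it only retries around a single lost message.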
