On Wed, Jul 23, 2025 at 4:27 PM Nick Cleaton <n...@cleaton.net> wrote:
>
> On Fri, 18 Jul 2025 at 21:29, Jon Zeppieri <zeppi...@gmail.com> wrote:
> >
> > I just had a situation where physical replication fell far behind
> > (hours). The write and flush lag times were 0, but replay_lag was
> > high. The replica has hot_standby_feedback on, and both
> > max_standby_streaming_delay and max_standby_archive_delay are set to
> > 30s.
> >
> > What could cause a situation like this? If the network were a problem,
> > I'd expect the other _lag times to be high. So it appears that the
> > replica was getting the WAL but was unable to apply it. Are there
> > situations where the replica cannot apply WAL other than the kinds of
> > conflicts that would be addressed by the _delay settings?
> >
> > I checked pg_stat_database_conflicts, but there was nothing in it -- all
> > zeros.
>
> This can happen when there are several busy writing processes on the
> primary. The single replay process on the replica can't keep up with
> the writes.

Thanks for the response, Nick. I'm curious why the situation you
describe wouldn't also cause write_lag and flush_lag to be high. If the
problem is simply keeping up with the primary, wouldn't you expect all
three lag times to be elevated?

- Jon
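
For reference, one way to see where a backlog like this sits is to
compare the LSN columns of pg_stat_replication on the primary. A small
gap between sent_lsn and flush_lsn together with a large gap between
flush_lsn and replay_lsn would match the symptoms above: WAL has been
received and flushed but not yet applied. The following is only a
minimal Python sketch. It assumes the LSN strings have already been
fetched, and the helper names and sample values are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lsn_gap_bytes(ahead: str, behind: str) -> int:
    """Return how many bytes of WAL separate two LSNs."""
    return lsn_to_int(ahead) - lsn_to_int(behind)


# Hypothetical sample values, not taken from the system discussed here.
sent_lsn = "16/B374D848"
flush_lsn = "16/B374D848"
replay_lsn = "16/A0000000"

# WAL sent by the primary but not yet flushed to disk on the replica.
transport_gap = lsn_gap_bytes(sent_lsn, flush_lsn)
# WAL flushed on the replica but not yet replayed.
replay_gap = lsn_gap_bytes(flush_lsn, replay_lsn)

print(f"sent but not flushed:    {transport_gap} bytes")
print(f"flushed but not replayed: {replay_gap} bytes")
```

With these sample values the first gap is zero and the second is not,
which is the pattern that points at replay rather than at the network
or at disk writes on the replica.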