On Tue, Nov 15, 2022 at 12:14 PM Nathan Bossart <nathandboss...@gmail.com> wrote: > On Tue, Nov 15, 2022 at 09:42:26AM +1300, Thomas Munro wrote: > > That works for 020_pg_receivewal. I wonder if there are also tests > > that stream a bit of WAL first and then do wait_for_catchup that were > > previously benefiting from the 100ms-after-startup message by > > scheduling luck (as in, that was usually enough for replay)? I might > > go and teach Cluster.pm to log how much time is wasted in > > wait_for_catchup to get some observability, and then try to figure out > > how to optimise it properly. We could perhaps put the 100ms duct tape > > back temporarily though, if necessary. > > Oh, I see. Since we don't check the apply position when determining > whether to send a reply, tests may need to wait a full > wal_receiver_status_interval. FWIW with my patch, the runtime of the > src/test/recovery tests seems to be back to what it was on my machine, but > I certainly wouldn't rule out scheduling luck.
Yeah, what you have looks good in my experiments. Here are some statistics on waits on my system. Times in seconds, first row is before the new time logic went in, your patch is the last row. name | count | sum | avg | stddev | min | max ---------------+-------+---------+-------+--------+-------+-------- before | 718 | 37.375 | 0.052 | 0.135 | 0.006 | 1.110 master | 718 | 173.100 | 0.241 | 1.374 | 0.006 | 10.004 initial-100ms | 718 | 37.169 | 0.052 | 0.126 | 0.006 | 0.676 initial-0ms | 718 | 35.276 | 0.049 | 0.123 | 0.006 | 0.661 The difference on master is explained by these 14 outliers: name | time ------------------------------------------+-------- testrun/recovery/019_replslot_limit | 10.004 testrun/recovery/033_replay_tsp_drops | 9.917 testrun/recovery/033_replay_tsp_drops | 9.957 testrun/pg_basebackup/020_pg_receivewal | 9.899 testrun/pg_rewind/003_extrafiles | 9.961 testrun/pg_rewind/003_extrafiles | 9.916 testrun/pg_rewind/008_min_recovery_point | 9.929 recovery/019_replslot_limit | 10.004 recovery/033_replay_tsp_drops | 9.917 recovery/033_replay_tsp_drops | 9.957 pg_basebackup/020_pg_receivewal | 9.899 pg_rewind/003_extrafiles | 9.961 pg_rewind/003_extrafiles | 9.916 pg_rewind/008_min_recovery_point | 9.929 (14 rows) Now that I can see what is going on I'm going to try to see how much I can squeeze out of this...