On Wednesday, May 10, 2023 3:03 AM Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> Unfortunately I have found the following commit to have caused a performance
> regression:
>
> commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
> Author: Andres Freund <and...@anarazel.de>
> Date:   2023-04-08 00:24:24 -0700
>
>     For cascading replication, wake physical and logical walsenders separately
>
>     Physical walsenders can't send data until it's been flushed; logical
>     walsenders can't decode and send data until it's been applied. On the
>     standby, the WAL is flushed first, which will only wake up physical
>     walsenders; and then applied, which will only wake up logical walsenders.
>
>     Previously, all walsenders were awakened when the WAL was flushed. That
>     was fine for logical walsenders on the primary; but on the standby the
>     flushed WAL would not have been applied yet, so logical walsenders were
>     awakened too early.
>
>     Per idea from Jeff Davis and Amit Kapila.
>
>     Author: "Drouvot, Bertrand" <bertranddrouvot...@gmail.com>
>     Reviewed-By: Jeff Davis <pg...@j-davis.com>
>     Reviewed-By: Robert Haas <robertmh...@gmail.com>
>     Reviewed-by: Amit Kapila <amit.kapil...@gmail.com>
>     Reviewed-by: Masahiko Sawada <sawada.m...@gmail.com>
>     Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbw7hqh...@mail.gmail.com
>
> The problem is that, on a standby, after the change - as needed for the
> approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens
> for every record, instead of only happening when the timeline is changed
> (or WAL is flushed or ...).
>
> WalSndWakeup() iterates over all walsender slots, regardless of whether
> they are in use. For each of the walsender slots it acquires a spinlock.
>
> When replaying a lot of small-ish WAL records I found the startup process
> to spend the majority of the time in WalSndWakeup(). I've not measured it
> very precisely yet, but the overhead is significant (~35% slowdown), even
> with the default max_wal_senders. If that's increased substantially, it
> obviously gets worse.
I did some simple tests for this to see the performance impact on streaming
replication; just sharing them here for reference.

1) Sync primary-standby setup: load data on the primary and count the time
spent on replication. The degradation becomes more obvious as the value of
max_wal_senders increases.

max_wal_senders    before(ms)     after(ms)     degradation
100                13394.4013     14141.2615    5.58%
200                13280.8507     14597.1173    9.91%
300                13535.0232     16735.7379    23.65%

2) Similar to 1), but count the time that the standby startup process spent
replaying WAL (via gprof).

10 senders
===========
before
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 4.12      0.45      0.11        1     0.11     2.46  PerformWalRecovery

after
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
17.99      0.59      0.59  4027383     0.00     0.00  WalSndWakeup
 8.23      0.86      0.27        1     0.27     3.11  PerformWalRecovery

100 senders
===========
before
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 5.56      0.36      0.18        1     0.18     2.91  PerformWalRecovery

after
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
64.65      4.39      4.39  4027383     0.00     0.00  WalSndWakeup
 2.95      4.59      0.20        1     0.20     6.62  PerformWalRecovery

I will test again after applying the latest patch in this thread.

Best Regards,
Hou zj