On Wednesday, May 10, 2023 3:03 AM Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> Unfortunately I have found the following commit to have caused a performance
> regression:
>
> commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
> Author: Andres Freund <and...@anarazel.de>
> Date:   2023-04-08 00:24:24 -0700
>
>     For cascading replication, wake physical and logical walsenders separately
>
>     Physical walsenders can't send data until it's been flushed; logical
>     walsenders can't decode and send data until it's been applied. On the
>     standby, the WAL is flushed first, which will only wake up physical
>     walsenders; and then applied, which will only wake up logical walsenders.
>
>     Previously, all walsenders were awakened when the WAL was flushed. That
>     was fine for logical walsenders on the primary; but on the standby the
>     flushed WAL would not have been applied yet, so logical walsenders were
>     awakened too early.
>
>     Per idea from Jeff Davis and Amit Kapila.
>
>     Author: "Drouvot, Bertrand" <bertranddrouvot...@gmail.com>
>     Reviewed-By: Jeff Davis <pg...@j-davis.com>
>     Reviewed-By: Robert Haas <robertmh...@gmail.com>
>     Reviewed-by: Amit Kapila <amit.kapil...@gmail.com>
>     Reviewed-by: Masahiko Sawada <sawada.m...@gmail.com>
>     Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbw7hqh...@mail.gmail.com
>
> The problem is that, on a standby, after the change - as needed for the
> approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens
> for every record, instead of only happening when the timeline is changed
> (or WAL is flushed or ...).
>
> WalSndWakeup() iterates over all walsender slots, regardless of whether
> they are in use. For each of the walsender slots it acquires a spinlock.
>
> When replaying a lot of small-ish WAL records I found the startup process
> to spend the majority of the time in WalSndWakeup(). I've not measured it
> very precisely yet, but the overhead is significant (~35% slowdown), even
> with the default max_wal_senders. If that's increased substantially, it
> obviously gets worse.
I did some simple tests for this to see the performance impact on streaming
replication; just sharing them here for reference.

1) Sync primary-standby setup: load data on the primary and count the time
spent on replication. The degradation becomes more obvious as the value of
max_wal_senders increases.

max_wal_senders    before(ms)     after(ms)     degradation
100                13394.4013     14141.2615    5.58%
200                13280.8507     14597.1173    9.91%
300                13535.0232     16735.7379    23.65%

2) Similar to 1), but count the time that the standby startup process spent
replaying WAL (via gprof).

10 senders
===========
before
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 4.12      0.45      0.11        1     0.11     2.46  PerformWalRecovery

after
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
17.99      0.59      0.59  4027383     0.00     0.00  WalSndWakeup
 8.23      0.86      0.27        1     0.27     3.11  PerformWalRecovery

100 senders
===========
before
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 5.56      0.36      0.18        1     0.18     2.91  PerformWalRecovery

after
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
64.65      4.39      4.39  4027383     0.00     0.00  WalSndWakeup
 2.95      4.59      0.20        1     0.20     6.62  PerformWalRecovery

I will test again after applying the latest patch in this thread.

Best Regards,
Hou zj