Hi,

On 5/10/23 8:36 AM, Bharath Rupireddy wrote:
On Wed, May 10, 2023 at 12:33 AM Andres Freund <and...@anarazel.de> wrote:

Unfortunately I have found the following commit to have caused a performance
regression:

commit e101dfac3a53c20bfbf1ca85d30a368c2954facf

The problem is that, on a standby, after the change - as needed to for the
approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens for
every record, instead of only happening when the timeline is changed (or WAL
is flushed or ...).

WalSndWakeup() iterates over all walsender slots, regardless of whether in
use. For each of the walsender slots it acquires a spinlock.

When replaying a lot of small-ish WAL records I found the startup process to
spend the majority of the time in WalSndWakeup(). I've not measured it very
precisely yet, but the overhead is significant (~35% slowdown), even with the
default max_wal_senders. If that's increased substantially, it obviously gets
worse.


Thanks Andres for the call out! I do agree that this is a concern.

The only saving grace is that this is not an issue on the primary.

Yeah.

+1


My current guess is that mis-using a condition variable is the best bet. I
think it should work to use ConditionVariablePrepareToSleep() before a
WalSndWait(), and then ConditionVariableCancelSleep(). I.e. to never use
ConditionVariableSleep(). The latch set from ConditionVariableBroadcast()
would still cause the necessary wakeup.

Yeah, I think that "mis-using" a condition variable is a valid option. Unless 
I'm missing
something, I don't think there is anything wrong with using a CV that way (aka 
not using
ConditionVariableTimedSleep() or ConditionVariableSleep() in this particular 
case).


How about something like the attached? Recovery and subscription tests
don't complain with the patch.

Thanks Bharath for looking at it!

I launched a full Cirrus CI test with it but it failed on one environment (did 
not look in details,
just sharing this here): https://cirrus-ci.com/task/6570140767092736

Also I have a few comments:

@@ -1958,7 +1959,7 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
         * ------
         */
        if (AllowCascadeReplication())
-               WalSndWakeup(switchedTLI, true);
+               ConditionVariableBroadcast(&WalSndCtl->cv);

I think the comment above this change needs to be updated.


@@ -3368,9 +3370,13 @@ WalSndWait(uint32 socket_events, long timeout, uint32 
wait_event)
        WaitEvent       event;

        ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, socket_events, NULL);
+
+       ConditionVariablePrepareToSleep(&WalSndCtl->cv);
        if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 
&&
                (event.events & WL_POSTMASTER_DEATH))
                proc_exit(1);
+
+       ConditionVariableCancelSleep();

May be worth to update the comment above WalSndWait() too? (to explain why a CV 
handling is part of it).


@@ -108,6 +109,8 @@ typedef struct
         */
        bool            sync_standbys_defined;

+       ConditionVariable cv;

Worth to give it a more meaning full name? (to give a clue as when it is used)


I think we also need to update the comment above WalSndWakeup():

"
 * For cascading replication we need to wake up physical walsenders separately
 * from logical walsenders (see the comment before calling WalSndWakeup() in
 * ApplyWalRecord() for more details).
"

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to