On Mon, Nov 14, 2022 at 12:11 PM Thomas Munro <[email protected]> wrote:
> On Mon, Nov 14, 2022 at 11:26 AM Nathan Bossart
> <[email protected]> wrote:
> > On Sun, Nov 13, 2022 at 05:08:04PM -0500, Tom Lane wrote:
> > > There is something very seriously wrong with this patch.
> > >
> > > On my machine, running "make -j10 check-world" (with compilation
> > > already done) has been taking right about 2 minutes for some time.
> > > Since this patch, it's taking around 2:45 --- I did a bisect run
> > > to confirm that this patch is where it changed.
> >
> > I've been looking into this. I wrote a similar patch for logical/worker.c
> > before noticing that check-world was taking much longer. The problem in
> > that case seems to be that process_syncing_tables() isn't called as often.
> > It wouldn't surprise me if there's also something in walreceiver.c that
> > depends upon the frequent wakeups. I suspect this will require a revert.
>
> In the case of "meson test pg_basebackup/020_pg_receivewal" it looks
> like wait_for_catchup hangs around for 10 seconds waiting for HS
> feedback. I'm wondering if we need to go back to high frequency
> wakeups until it's caught up, or (probably better) arrange for a
> proper event for progress.  Digging...

Maybe there is a better way to code this (I mean, who likes global
variables?) and I need to test some more, but I suspect the attached
is approximately what we missed.

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 8bd2ba37dd..fed2cc6e6f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1080,6 +1080,9 @@ XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli)
 	recvFile = -1;
 }
 
+static XLogRecPtr writePtr = 0;
+static XLogRecPtr flushPtr = 0;
+
 /*
  * Send reply message to primary, indicating our current WAL locations, oldest
  * xmin and the current time.
@@ -1096,8 +1099,6 @@ XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli)
 static void
 XLogWalRcvSendReply(bool force, bool requestReply)
 {
-	static XLogRecPtr writePtr = 0;
-	static XLogRecPtr flushPtr = 0;
 	XLogRecPtr	applyPtr;
 	TimestampTz now;
 
@@ -1334,6 +1335,9 @@ WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now)
 		case WALRCV_WAKEUP_REPLY:
 			if (wal_receiver_status_interval <= 0)
 				wakeup[reason] = PG_INT64_MAX;
+			else if (writePtr != LogstreamResult.Write ||
+					 flushPtr != LogstreamResult.Flush)
+				wakeup[reason] = now + 100000;	/* frequent replies, not yet caught up */
 			else
 				wakeup[reason] = now + wal_receiver_status_interval * INT64CONST(1000000);
 			break;
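
For anyone who wants to poke at the wakeup rule outside the tree, here is a
rough standalone sketch of the decision the last hunk makes. It is not the
walreceiver code: the Sketch* types, the macro names and
sketch_next_reply_wakeup() are made up for illustration, and timestamps are
assumed to be microseconds, as the 1000000 multiplier in the patch implies.

```c
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and TimestampTz (microseconds). */
typedef uint64_t SketchLsn;
typedef int64_t SketchTimestamp;

/* 100000 usec = 100 ms, the value used in the patch above. */
#define SKETCH_CATCHUP_REPLY_USECS INT64_C(100000)
#define SKETCH_USECS_PER_SEC INT64_C(1000000)

/* Write/flush positions, either as last reported or as currently known. */
typedef struct
{
	SketchLsn	write;
	SketchLsn	flush;
} SketchPositions;

/*
 * Return the time of the next reply wakeup.
 *
 * - Status replies disabled (interval <= 0): never wake up for this reason.
 * - Reported positions lag the current ones: wake up again in 100 ms.
 * - Otherwise: wake up after the configured status interval.
 */
SketchTimestamp
sketch_next_reply_wakeup(SketchTimestamp now, int status_interval_secs,
						 SketchPositions reported, SketchPositions current)
{
	if (status_interval_secs <= 0)
		return INT64_MAX;
	if (reported.write != current.write || reported.flush != current.flush)
		return now + SKETCH_CATCHUP_REPLY_USECS;
	return now + status_interval_secs * SKETCH_USECS_PER_SEC;
}
```

With a 10 second interval, a receiver whose reported write position is behind
gets a wakeup 100 ms out, and one that has reported everything goes back to
the 10 second cadence.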