On 2020-Apr-08, Kyotaro Horiguchi wrote:

> I understand how it happens.
> 
> The latch set by the checkpoint request from the CHECKPOINT command
> has been absorbed by ConditionVariableSleep() in
> InvalidateObsoleteReplicationSlots.  The attached allows the
> checkpointer to use MyLatch for purposes other than checkpoint
> requests while a checkpoint is running.

Hmm, that explanation makes sense, but I couldn't reproduce it with the
steps you provided.  Perhaps I'm missing something.

Anyway I think this patch should fix it also -- instead of adding a new
flag, we just rely on the existing flags (since do_checkpoint must have
been set correctly from the flags earlier in that block.)
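
To make the failure mode concrete, here is a toy model of the lost
wakeup.  It is only a sketch and is not the attached patch: every name
in it (ToyCheckpointer, toy_request_checkpoint, toy_nested_wait,
toy_run) is hypothetical and nothing here is PostgreSQL's real API.  It
assumes a single latch bit plus a separate "request pending" flag, and
it assumes the nested wait simply clears the latch.  The
recheck_before_sleep switch models one possible way of relying on the
flag instead of on the latch alone.

```c
#include <stdbool.h>

/* Toy stand-ins only; none of these are PostgreSQL's real types. */
typedef struct
{
	bool	latch_set;			/* models the process latch */
	bool	request_pending;	/* models a pending checkpoint request flag */
	int		checkpoints_done;
} ToyCheckpointer;

/* A requester records the request, then sets the latch as a wakeup. */
static void
toy_request_checkpoint(ToyCheckpointer *cp)
{
	cp->request_pending = true;
	cp->latch_set = true;
}

/* Models a nested wait that clears the latch, swallowing any wakeup. */
static void
toy_nested_wait(ToyCheckpointer *cp)
{
	cp->latch_set = false;
}

/* One checkpoint; optionally a new request arrives before the nested wait. */
static void
toy_do_checkpoint(ToyCheckpointer *cp, bool request_arrives_midway)
{
	if (request_arrives_midway)
		toy_request_checkpoint(cp);
	toy_nested_wait(cp);
	cp->checkpoints_done++;
}

/*
 * Run the toy main loop until it would go to sleep, and return how many
 * checkpoints completed.  Two requests are issued in total: one up front
 * and one in the middle of the first checkpoint.
 */
int
toy_run(bool recheck_before_sleep)
{
	ToyCheckpointer cp = {false, false, 0};
	bool		first = true;

	toy_request_checkpoint(&cp);

	for (;;)
	{
		cp.latch_set = false;	/* clear the latch at the top of the loop */

		if (cp.request_pending)
		{
			cp.request_pending = false;
			toy_do_checkpoint(&cp, first);
			first = false;
		}

		/* Assumed remedy: consult the flag rather than trusting the latch. */
		if (recheck_before_sleep && cp.request_pending)
			continue;

		if (!cp.latch_set)
			break;				/* no wakeup pending: the loop would sleep */
	}

	return cp.checkpoints_done;
}
```

In this model toy_run(false) goes to sleep with the second request still
pending, while toy_run(true) services both requests before sleeping.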

I think it'd be worth verifying this bugfix in a new test.  Would you
have time to produce that?  I could try in a couple of days ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 511c22043846c7453cea8b00bf911705417609eb Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Mon, 27 Apr 2020 19:35:15 -0400
Subject: [PATCH] Don't freeze on checkpoints

---
 src/backend/postmaster/checkpointer.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e354a78725..5cf5e9fe08 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -494,6 +494,13 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/*
+		 * Don't sleep if our latch was set for reasons other than a
+		 * checkpoint request.
+		 */
+		if (!do_checkpoint)
+			continue;
+
 		/*
 		 * Sleep until we are signaled or it's time for another checkpoint or
 		 * xlog file switch.
-- 
2.20.1
