On 2020-Apr-08, Kyotaro Horiguchi wrote:

> I understand how it happens.
>
> The latch triggered by checkpoint request by CHECKPOINT command has
> been absorbed by ConditionVariableSleep() in
> InvalidateObsoleteReplicationSlots.  The attached allows checkpointer
> use MyLatch for other than checkpoint request while a checkpoint is
> running.

Hmm, that explanation makes sense, but I couldn't reproduce it with the
steps you provided.  Perhaps I'm missing something.

Anyway, I think this patch should also fix it -- instead of adding a new
flag, we just rely on the existing flags (since do_checkpoint must have
been set correctly from the flags earlier in that block).

I think it'd be worth verifying this bugfix in a new test.  Would you
have time to produce that?  I could try in a couple of days ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 511c22043846c7453cea8b00bf911705417609eb Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Mon, 27 Apr 2020 19:35:15 -0400
Subject: [PATCH] Don't freeze on checkpoints

---
 src/backend/postmaster/checkpointer.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e354a78725..5cf5e9fe08 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -494,6 +494,13 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/*
+		 * Don't sleep if our latch was set for reasons other than a
+		 * checkpoint request.
+		 */
+		if (!do_checkpoint)
+			continue;
+
 		/*
 		 * Sleep until we are signaled or it's time for another checkpoint or
 		 * xlog file switch.
-- 
2.20.1