Re: Unduly short fuse in RequestCheckpoint

Tom Lane Sun, 17 Mar 2019 12:42:07 -0700

I wrote:
> The cause of this error is that RequestCheckpoint will give up and fail
> after just 2 seconds, which evidently is not long enough on slow or
> heavily loaded machines.  Since there isn't any good reason why the
> checkpointer wouldn't be running, I'm inclined to swing a large hammer
> and kick this timeout up to 60 seconds.  Thoughts?


So I had imagined this as about a 2-line patch, s/2/60/g and be done.
Looking closer, though, there's other pre-existing problems in this code:

1. As it's currently coded, the requesting process can wait for up to 2
seconds for the checkpointer to start *even if the caller did not say
CHECKPOINT_WAIT*.  That seems a little bogus, and an unwanted 60-second
wait would be a lot bogus.

2. If the timeout does elapse, and we didn't have the CHECKPOINT_WAIT
flag, we just log the failure and return.  When the checkpointer
ultimately does start, it will have no idea that it should set to work
right away on a checkpoint.  (I wonder if this accounts for any other
of the irreproducible buildfarm failures we get on slow machines.  From
the calling code's viewpoint, it'd seem like it was taking a darn long
time to perform a successfully-requested checkpoint.  Given that most
checkpoint requests are non-WAIT, this seems not very nice.)

After some thought I came up with the attached proposed patch.  The
basic idea here is that we record a checkpoint request by ensuring
that the shared-memory ckpt_flags word is nonzero.  (It's not clear
to me that a valid request would always have at least one of the
existing flag bits set, so I just added an extra always-set bit to
guarantee this.)  Then, whether the signal gets sent or not, there is
a persistent record of the request in shmem, which the checkpointer
will eventually notice.  In the expected case where the problem is
that the checkpointer hasn't started just yet, it will see the flag
during its first main loop and begin a checkpoint right away.
I took out the local checkpoint_requested flag altogether.

A possible objection to this fix is that up to now, it's been possible
to trigger a checkpoint just by sending SIGINT to the checkpointer
process, without touching shmem at all.  None of the core code depends
on that, and since the checkpointer's PID is difficult to find out
from "outside", it's hard to believe that anybody's got custom tooling
that depends on it, but perhaps they do.  I thought about keeping the
checkpoint_requested flag to allow that to continue to work, but if
we do so then we have a race condition: the checkpointer could see the
shmem flag set and start a checkpoint, then receive the signal a moment
later and believe that that represents a second, independent request
requiring a second checkpoint.  So I think we should just blow off that
hypothetical possibility and do it like this.

Comments?

                        regards, tom lane

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3d5b382..9d0d05a 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -153,7 +153,6 @@ double		CheckPointCompletionTarget = 0.5;
  * Flags set by interrupt handlers for later service in the main loop.
  */
 static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t checkpoint_requested = false;
 static volatile sig_atomic_t shutdown_requested = false;
 
 /*
@@ -370,12 +369,6 @@ CheckpointerMain(void)
 			 */
 			UpdateSharedMemoryConfig();
 		}
-		if (checkpoint_requested)
-		{
-			checkpoint_requested = false;
-			do_checkpoint = true;
-			BgWriterStats.m_requested_checkpoints++;
-		}
 		if (shutdown_requested)
 		{
 			/*
@@ -390,6 +383,17 @@ CheckpointerMain(void)
 		}
 
 		/*
+		 * Detect a pending checkpoint request by checking whether the flags
+		 * word in shared memory is nonzero.  We shouldn't need to acquire the
+		 * ckpt_lck for this.
+		 */
+		if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
+		{
+			do_checkpoint = true;
+			BgWriterStats.m_requested_checkpoints++;
+		}
+
+		/*
 		 * Force a checkpoint if too much time has elapsed since the last one.
 		 * Note that we count a timed checkpoint in stats only when this
 		 * occurs without an external request, but we set the CAUSE_TIME flag
@@ -630,17 +634,14 @@ CheckArchiveTimeout(void)
 static bool
 ImmediateCheckpointRequested(void)
 {
-	if (checkpoint_requested)
-	{
-		volatile CheckpointerShmemStruct *cps = CheckpointerShmem;
+	volatile CheckpointerShmemStruct *cps = CheckpointerShmem;
 
-		/*
-		 * We don't need to acquire the ckpt_lck in this case because we're
-		 * only looking at a single flag bit.
-		 */
-		if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
-			return true;
-	}
+	/*
+	 * We don't need to acquire the ckpt_lck in this case because we're only
+	 * looking at a single flag bit.
+	 */
+	if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
+		return true;
 	return false;
 }
 
@@ -843,7 +844,10 @@ ReqCheckpointHandler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
-	checkpoint_requested = true;
+	/*
+	 * The signalling process should have set ckpt_flags nonzero, so all we
+	 * need do is ensure that our main loop gets kicked out of any wait.
+	 */
 	SetLatch(MyLatch);
 
 	errno = save_errno;
@@ -984,22 +988,25 @@ RequestCheckpoint(int flags)
 
 	old_failed = CheckpointerShmem->ckpt_failed;
 	old_started = CheckpointerShmem->ckpt_started;
-	CheckpointerShmem->ckpt_flags |= flags;
+	CheckpointerShmem->ckpt_flags |= (flags | CHECKPOINT_REQUESTED);
 
 	SpinLockRelease(&CheckpointerShmem->ckpt_lck);
 
 	/*
 	 * Send signal to request checkpoint.  It's possible that the checkpointer
 	 * hasn't started yet, or is in process of restarting, so we will retry a
-	 * few times if needed.  Also, if not told to wait for the checkpoint to
-	 * occur, we consider failure to send the signal to be nonfatal and merely
-	 * LOG it.
+	 * few times if needed.  (Actually, more than a few times, since on slow
+	 * or overloaded buildfarm machines, it's been observed that the
+	 * checkpointer can take several seconds to start.)  Also, if not told to
+	 * wait for the checkpoint to occur, we consider failure to send the
+	 * signal to be nonfatal and merely LOG it.
 	 */
+#define MAX_SIGNAL_TRIES 600	/* max wait 60.0 sec */
 	for (ntries = 0;; ntries++)
 	{
 		if (CheckpointerShmem->checkpointer_pid == 0)
 		{
-			if (ntries >= 20)	/* max wait 2.0 sec */
+			if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT))
 			{
 				elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG,
 					 "could not request checkpoint because checkpointer not running");
@@ -1008,7 +1015,7 @@ RequestCheckpoint(int flags)
 		}
 		else if (kill(CheckpointerShmem->checkpointer_pid, SIGINT) != 0)
 		{
-			if (ntries >= 20)	/* max wait 2.0 sec */
+			if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT))
 			{
 				elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG,
 					 "could not signal for checkpoint: %m");
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 2f4e8f5..12ab9f1 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -218,6 +218,8 @@ extern bool XLOG_DEBUG;
 /* These indicate the cause of a checkpoint request */
 #define CHECKPOINT_CAUSE_XLOG	0x0040	/* XLOG consumption */
 #define CHECKPOINT_CAUSE_TIME	0x0080	/* Elapsed time */
+/* We set this to ensure that ckpt_flags is not 0 if a request has been made */
+#define CHECKPOINT_REQUESTED	0x0100	/* Checkpoint request has been made */
 
 /*
  * Flag bits for the record being inserted, set using XLogSetRecordFlags().

Re: Unduly short fuse in RequestCheckpoint

Reply via email to