I wrote: > The cause of this error is that RequestCheckpoint will give up and fail > after just 2 seconds, which evidently is not long enough on slow or > heavily loaded machines. Since there isn't any good reason why the > checkpointer wouldn't be running, I'm inclined to swing a large hammer > and kick this timeout up to 60 seconds. Thoughts?
So I had imagined this as about a 2-line patch, s/2/60/g and be done. Looking closer, though, there's other pre-existing problems in this code: 1. As it's currently coded, the requesting process can wait for up to 2 seconds for the checkpointer to start *even if the caller did not say CHECKPOINT_WAIT*. That seems a little bogus, and an unwanted 60-second wait would be a lot bogus. 2. If the timeout does elapse, and we didn't have the CHECKPOINT_WAIT flag, we just log the failure and return. When the checkpointer ultimately does start, it will have no idea that it should set to work right away on a checkpoint. (I wonder if this accounts for any other of the irreproducible buildfarm failures we get on slow machines. From the calling code's viewpoint, it'd seem like it was taking a darn long time to perform a successfully-requested checkpoint. Given that most checkpoint requests are non-WAIT, this seems not very nice.) After some thought I came up with the attached proposed patch. The basic idea here is that we record a checkpoint request by ensuring that the shared-memory ckpt_flags word is nonzero. (It's not clear to me that a valid request would always have at least one of the existing flag bits set, so I just added an extra always-set bit to guarantee this.) Then, whether the signal gets sent or not, there is a persistent record of the request in shmem, which the checkpointer will eventually notice. In the expected case where the problem is that the checkpointer hasn't started just yet, it will see the flag during its first main loop and begin a checkpoint right away. I took out the local checkpoint_requested flag altogether. A possible objection to this fix is that up to now, it's been possible to trigger a checkpoint just by sending SIGINT to the checkpointer process, without touching shmem at all. None of the core code depends on that, and since the checkpointer's PID is difficult to find out from "outside", it's hard to believe that anybody's got custom tooling that depends on it, but perhaps they do. I thought about keeping the checkpoint_requested flag to allow that to continue to work, but if we do so then we have a race condition: the checkpointer could see the shmem flag set and start a checkpoint, then receive the signal a moment later and believe that that represents a second, independent request requiring a second checkpoint. So I think we should just blow off that hypothetical possibility and do it like this. Comments? regards, tom lane
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 3d5b382..9d0d05a 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -153,7 +153,6 @@ double CheckPointCompletionTarget = 0.5; * Flags set by interrupt handlers for later service in the main loop. */ static volatile sig_atomic_t got_SIGHUP = false; -static volatile sig_atomic_t checkpoint_requested = false; static volatile sig_atomic_t shutdown_requested = false; /* @@ -370,12 +369,6 @@ CheckpointerMain(void) */ UpdateSharedMemoryConfig(); } - if (checkpoint_requested) - { - checkpoint_requested = false; - do_checkpoint = true; - BgWriterStats.m_requested_checkpoints++; - } if (shutdown_requested) { /* @@ -390,6 +383,17 @@ CheckpointerMain(void) } /* + * Detect a pending checkpoint request by checking whether the flags + * word in shared memory is nonzero. We shouldn't need to acquire the + * ckpt_lck for this. + */ + if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags) + { + do_checkpoint = true; + BgWriterStats.m_requested_checkpoints++; + } + + /* * Force a checkpoint if too much time has elapsed since the last one. * Note that we count a timed checkpoint in stats only when this * occurs without an external request, but we set the CAUSE_TIME flag @@ -630,17 +634,14 @@ CheckArchiveTimeout(void) static bool ImmediateCheckpointRequested(void) { - if (checkpoint_requested) - { - volatile CheckpointerShmemStruct *cps = CheckpointerShmem; + volatile CheckpointerShmemStruct *cps = CheckpointerShmem; - /* - * We don't need to acquire the ckpt_lck in this case because we're - * only looking at a single flag bit. - */ - if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) - return true; - } + /* + * We don't need to acquire the ckpt_lck in this case because we're only + * looking at a single flag bit. + */ + if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) + return true; return false; } @@ -843,7 +844,10 @@ ReqCheckpointHandler(SIGNAL_ARGS) { int save_errno = errno; - checkpoint_requested = true; + /* + * The signalling process should have set ckpt_flags nonzero, so all we + * need do is ensure that our main loop gets kicked out of any wait. + */ SetLatch(MyLatch); errno = save_errno; @@ -984,22 +988,25 @@ RequestCheckpoint(int flags) old_failed = CheckpointerShmem->ckpt_failed; old_started = CheckpointerShmem->ckpt_started; - CheckpointerShmem->ckpt_flags |= flags; + CheckpointerShmem->ckpt_flags |= (flags | CHECKPOINT_REQUESTED); SpinLockRelease(&CheckpointerShmem->ckpt_lck); /* * Send signal to request checkpoint. It's possible that the checkpointer * hasn't started yet, or is in process of restarting, so we will retry a - * few times if needed. Also, if not told to wait for the checkpoint to - * occur, we consider failure to send the signal to be nonfatal and merely - * LOG it. + * few times if needed. (Actually, more than a few times, since on slow + * or overloaded buildfarm machines, it's been observed that the + * checkpointer can take several seconds to start.) Also, if not told to + * wait for the checkpoint to occur, we consider failure to send the + * signal to be nonfatal and merely LOG it. */ +#define MAX_SIGNAL_TRIES 600 /* max wait 60.0 sec */ for (ntries = 0;; ntries++) { if (CheckpointerShmem->checkpointer_pid == 0) { - if (ntries >= 20) /* max wait 2.0 sec */ + if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT)) { elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG, "could not request checkpoint because checkpointer not running"); @@ -1008,7 +1015,7 @@ RequestCheckpoint(int flags) } else if (kill(CheckpointerShmem->checkpointer_pid, SIGINT) != 0) { - if (ntries >= 20) /* max wait 2.0 sec */ + if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT)) { elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG, "could not signal for checkpoint: %m"); diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 2f4e8f5..12ab9f1 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -218,6 +218,8 @@ extern bool XLOG_DEBUG; /* These indicate the cause of a checkpoint request */ #define CHECKPOINT_CAUSE_XLOG 0x0040 /* XLOG consumption */ #define CHECKPOINT_CAUSE_TIME 0x0080 /* Elapsed time */ +/* We set this to ensure that ckpt_flags is not 0 if a request has been made */ +#define CHECKPOINT_REQUESTED 0x0100 /* Checkpoint request has been made */ /* * Flag bits for the record being inserted, set using XLogSetRecordFlags().