On Tue, Dec 04, 2018 at 08:40:40PM +0000, Bossart, Nathan wrote: > Thanks for the updated patch! The code looks good to me, the patch > applies cleanly and builds without warnings, and it seems to work well > in my manual tests. I just have a few wording suggestions.
How are you testing this? I just stop the server and manually touch some fake status files in archive_status :) > I would phrase this comment this way: > > Since archive_status files are not durably removed, a system > crash could leave behind .ready files for WAL segments that > have already been recycled or removed. In this case, simply > remove the orphan status file and move on. Fine for me. Thanks! > + ereport(WARNING, > + (errmsg("removed orphan archive status file %s", > + xlogready))); > > I think we should put quotes around the file name like we do elsewhere > in pgarch_ArchiverCopyLoop(). Done. > + ereport(WARNING, > + (errmsg("failed removal > of \"%s\" too many times, will try again later", > + > xlogready))); > > I'd suggest mirroring the log statement for failed archiving commands > and saying something like, "removing orphan archive status file \"%s\" > failed too many times, will try again later." IMO that makes it > clearer what is failing and why we are removing it in the first place. "removal of" is more consistent here I think, so changed this way with your wording merged. -- Michael
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 844b9d1b0e..90817c8598 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -28,6 +28,7 @@ #include <fcntl.h> #include <signal.h> #include <time.h> +#include <sys/stat.h> #include <sys/time.h> #include <sys/wait.h> #include <unistd.h> @@ -60,6 +61,7 @@ * failed archiver; in seconds. */ #define NUM_ARCHIVE_RETRIES 3 +#define NUM_STATUS_CLEANUP_RETRIES 3 /* ---------- @@ -424,9 +426,13 @@ pgarch_ArchiverCopyLoop(void) while (pgarch_readyXlog(xlog)) { int failures = 0; + int failures_unlink = 0; for (;;) { + struct stat stat_buf; + char pathname[MAXPGPATH]; + /* * Do not initiate any more archive commands after receiving * SIGTERM, nor after the postmaster has died unexpectedly. The @@ -456,6 +462,43 @@ pgarch_ArchiverCopyLoop(void) return; } + /* + * Since archive status files are not durably removed, a system + * crash could leave behind .ready files for WAL segments that + * have already been recycled or removed. In this case, simply + * remove the orphan status file and move on. + */ + snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog); + if (stat(pathname, &stat_buf) != 0 && errno == ENOENT) + { + char xlogready[MAXPGPATH]; + + StatusFilePath(xlogready, xlog, ".ready"); + if (durable_unlink(xlogready, WARNING) == 0) + { + ereport(WARNING, + (errmsg("removed orphan archive status file \"%s\"", + xlogready))); + + /* leave loop and move to the next status file */ + break; + } + + if (++failures_unlink >= NUM_STATUS_CLEANUP_RETRIES) + { + ereport(WARNING, + (errmsg("removal of orphan archive status file \"%s\" failed too many times, will try again later", + xlogready))); + + /* give up cleanup of orphan status files */ + return; + } + + /* wait a bit before retrying */ + pg_usleep(1000000L); + continue; + } + if (pgarch_archiveXlog(xlog)) { /* successful */
signature.asc
Description: PGP signature