Thanks for the feedback.

> The latest post on this thread contained a link to this one, and it
> made me want to rewind to this point in the discussion. Suppose we
> have the following alternative scenario:
>
> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
> yet.  The following directory scan ends up finding 12.ready.  Just
> before we update the PgArch state, XLogArchiveNotify() is called and
> creates 11.ready.  However, pg_readyXlog() has already decided to
> return WAL segment 12 and update the state to look for 13 next.
>
> Now, if I'm not mistaken, using <= doesn't help at all.
>
> In my opinion, the problem here is that the natural way to ask "is
> this file being archived out of order?" is to ask yourself "is the
> file that I'm marking as ready for archiving now the one that
> immediately follows the last one I marked as ready for archiving?" and
> then invert the result. That is, if I last marked 10 as ready, and now
> I'm marking 11 as ready, then it's in order, but if I'm now marking
> anything else whatsoever, then it's out of order. But that's not what
> this does. Instead of comparing what it's doing now to what it did
> last, it compares what it did now to what the archiver did last.

I agree that when we are creating a .ready file we should compare
the current .ready file with the last .ready file to check if this file is
created out of order. We can store the state of the last .ready file
in shared memory and compare it with the current .ready file. I
believe that archiver specific shared memory area can be used
to store the state of the last .ready file unless I am missing
something and this needs to be stored in a separate shared
memory area.

With this change, we have the flexibility to move the current archiver
state out of shared memory and keep it local to archiver. I have
incorporated these changes and updated a new patch.


> > And it's really not obvious that that's correct. I think that the
> > above argument actually demonstrates a flaw in the logic, but even if
> > not, or even if it's too small a flaw to be a problem in practice, it
> > seems a lot harder to reason about.
>
> I certainly agree that it's harder to reason about.  If we were to go
> the keep-trying-the-next-file route, we could probably minimize a lot
> of the handling for these rare cases by banking on the "fallback"
> directory scans.  Provided we believe these situations are extremely
> rare, some extra delay for an archive every once in a while might be
> acceptable.

+1. We are forcing a directory scan at the checkpoint and it will make sure
that any missing file gets archived within the checkpoint boundaries.

Please find the attached patch.

Thanks,
Dipesh
From f05b223b368e40594f0ed8440c0704fb7b970ee0 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit <dipesh.pan...@enterprisedb.com>
Date: Wed, 8 Sep 2021 21:47:16 +0530
Subject: [PATCH] keep trying the next file approach

---
 src/backend/access/transam/xlog.c        |  20 +++
 src/backend/access/transam/xlogarchive.c |  25 ++++
 src/backend/postmaster/pgarch.c          | 234 ++++++++++++++++++++++++++-----
 src/include/access/xlogdefs.h            |   2 +
 src/include/postmaster/pgarch.h          |   5 +
 5 files changed, 254 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a7..7dd4b96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9285,6 +9285,16 @@ CreateCheckPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per checkpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
@@ -9650,6 +9660,16 @@ CreateRestartPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per restartpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 26b023e..7756b87 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -489,6 +489,31 @@ XLogArchiveNotify(const char *xlog)
 		return;
 	}
 
+	/* Force a directory scan if we are archiving anything but a regular
+	 * WAL file or if this WAL file is being created out-of-order.
+	 */
+	if (!IsXLogFileName(xlog))
+		PgArchForceDirScan();
+	else
+	{
+		TimeLineID tli;
+		XLogSegNo last_segno;
+		XLogSegNo this_segno;
+
+		last_segno = PgArchGetLastReadySegNo();
+		XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
+
+		/*
+		 * Force a directory scan in case if this .ready file created out of
+		 * order.
+		 */
+		last_segno++;
+		if (last_segno != this_segno)
+			PgArchForceDirScan();
+
+		PgArchSetLastReadySegNo(this_segno);
+	}
+
 	/* Notify archiver that it's got something to do */
 	if (IsUnderPostmaster)
 		PgArchWakeup();
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c..de6ceb0 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -47,6 +47,7 @@
 #include "storage/proc.h"
 #include "storage/procsignal.h"
 #include "storage/shmem.h"
+#include "storage/spin.h"
 #include "utils/guc.h"
 #include "utils/ps_status.h"
 
@@ -76,6 +77,20 @@
 typedef struct PgArchData
 {
 	int			pgprocno;		/* pgprocno of archiver process */
+
+	/*
+	 * Forces a directory scan in pgarch_readyXlog().  Protected by
+	 * arch_lck.
+	 */
+	bool		force_dir_scan;
+
+	/*
+	 * Segment number of last .ready file created by backend, protected by
+	 * arch_lck.
+	 */
+	XLogSegNo	lastReadySegNo;
+
+	slock_t		arch_lck;
 } PgArchData;
 
 
@@ -87,6 +102,16 @@ static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
 
 /*
+ * Current archiver state, the segment number and timeline ID corresponding to
+ * last WAL file found by archiver.
+ */
+typedef struct ArchXLogState
+{
+	XLogSegNo	lastSegNo;
+	TimeLineID	lastTli;
+} ArchXLogState;
+
+/*
  * Flags set by interrupt handlers for later service in the main loop.
  */
 static volatile sig_atomic_t ready_to_stop = false;
@@ -97,12 +122,13 @@ static volatile sig_atomic_t ready_to_stop = false;
  */
 static void pgarch_waken_stop(SIGNAL_ARGS);
 static void pgarch_MainLoop(void);
-static void pgarch_ArchiverCopyLoop(void);
+static void pgarch_ArchiverCopyLoop(ArchXLogState *xlogState);
 static bool pgarch_archiveXlog(char *xlog);
-static bool pgarch_readyXlog(char *xlog);
+static bool pgarch_readyXlog(char *xlog, ArchXLogState *xlogState);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
+static bool higher_arch_priority(const char *a, const char *b);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -129,6 +155,8 @@ PgArchShmemInit(void)
 		/* First time through, so initialize */
 		MemSet(PgArch, 0, PgArchShmemSize());
 		PgArch->pgprocno = INVALID_PGPROCNO;
+		PgArch->lastReadySegNo = MaxXLogSegNo;
+		SpinLockInit(&PgArch->arch_lck);
 	}
 }
 
@@ -245,6 +273,11 @@ pgarch_MainLoop(void)
 {
 	pg_time_t	last_copy_time = 0;
 	bool		time_to_stop;
+	ArchXLogState	xlogState;
+
+	/* Initialize the current state of archiver */
+	xlogState.lastSegNo = MaxXLogSegNo;
+	xlogState.lastTli = MaxTimeLineID;
 
 	/*
 	 * There shouldn't be anything for the archiver to do except to wait for a
@@ -280,7 +313,7 @@ pgarch_MainLoop(void)
 		}
 
 		/* Do what we're here for */
-		pgarch_ArchiverCopyLoop();
+		pgarch_ArchiverCopyLoop(&xlogState);
 		last_copy_time = time(NULL);
 
 		/*
@@ -321,7 +354,7 @@ pgarch_MainLoop(void)
  * Archives all outstanding xlogs then returns
  */
 static void
-pgarch_ArchiverCopyLoop(void)
+pgarch_ArchiverCopyLoop(ArchXLogState *xlogState)
 {
 	char		xlog[MAX_XFN_CHARS + 1];
 
@@ -331,7 +364,7 @@ pgarch_ArchiverCopyLoop(void)
 	 * some backend will add files onto the list of those that need archiving
 	 * while we are still copying earlier archives
 	 */
-	while (pgarch_readyXlog(xlog))
+	while (pgarch_readyXlog(xlog, xlogState))
 	{
 		int			failures = 0;
 		int			failures_orphan = 0;
@@ -432,6 +465,13 @@ pgarch_ArchiverCopyLoop(void)
 					ereport(WARNING,
 							(errmsg("archiving write-ahead log file \"%s\" failed too many times, will try again later",
 									xlog)));
+					/*
+					 * Failed to archive, make sure that archiver performs a
+					 * full directory scan in the next cycle to avoid missing
+					 * the WAL file which could not be archived due to some
+					 * failure in current cycle.
+					 */
+					PgArchForceDirScan();
 					return;		/* give up archiving for now */
 				}
 				pg_usleep(1000000L);	/* wait a bit before retrying */
@@ -596,30 +636,91 @@ pgarch_archiveXlog(char *xlog)
  * larger ID; the net result being that past timelines are given higher
  * priority for archiving.  This seems okay, or at least not obviously worth
  * changing.
+ *
+ * WAL files are generated in a specific order of log segment number. The
+ * directory scan for each WAL file can be minimised by identifying the next
+ * WAL file in the sequence. This can be achieved by maintaining log segment
+ * number and timeline ID corresponding to WAL file currently being archived.
+ * The log segment number of current WAL file can be incremented by '1' to
+ * point to the next WAL file in a sequence. Full directory scan can be avoided
+ * by checking the availability of next WAL file. "xlogState" specifies the
+ * segment number and timeline ID corresponding to the most recent WAL file
+ * successfully archived.
+ *
+ * However, a full directory scan is performed in some special cases where it
+ * requires us to archive files which takes precedence over the next anticipated
+ * log segment. For example, history file takes precedence over archiving WAL
+ * files on older timeline or an older WAL file which is being left out because
+ * corresponding .ready file is created out of order or archiving a backup
+ * history file created during backup.
+ *
+ * Returns "true" if a segment is ready for archival, "xlog" represents the
+ * name of the segment.
  */
 static bool
-pgarch_readyXlog(char *xlog)
+pgarch_readyXlog(char *xlog, ArchXLogState *xlogState)
 {
-	/*
-	 * open xlog status directory and read through list of xlogs that have the
-	 * .ready suffix, looking for earliest file. It is possible to optimise
-	 * this code, though only a single file is expected on the vast majority
-	 * of calls, so....
-	 */
+	char		basename[MAX_XFN_CHARS + 1];
 	char		XLogArchiveStatusDir[MAXPGPATH];
 	DIR		   *rldir;
 	struct dirent *rlde;
 	bool		found = false;
-	bool		historyFound = false;
+	bool		force_dir_scan;
+	TimeLineID	lastTli;
+	XLogSegNo	lastSegNo;
+
+	/* Obtain current archiver state and reset force_dir_scan. */
+	SpinLockAcquire(&PgArch->arch_lck);
+
+	/*
+	 * Scan archive status directory if it is directed by shared memory flag or
+	 * until we find the first log segment.
+	 */
+	force_dir_scan = PgArch->force_dir_scan ||
+					 xlogState->lastSegNo == MaxXLogSegNo;
+	PgArch->force_dir_scan = false;
+
+	SpinLockRelease(&PgArch->arch_lck);
+
+	lastSegNo = xlogState->lastSegNo;
+	lastTli = xlogState->lastTli;
+
+	/* Try to skip the directory scan if possible. */
+	if (!force_dir_scan)
+	{
+		struct stat	st;
+		char		readyfile[MAXPGPATH];
+
+		Assert(lastTli != MaxTimeLineID);
+		Assert(lastSegNo != MaxXLogSegNo);
+
+		lastSegNo++;
+		XLogFileName(basename, lastTli, lastSegNo, wal_segment_size);
 
+		StatusFilePath(readyfile, basename, ".ready");
+		if (stat(readyfile, &st) == 0)
+		{
+			strcpy(xlog, basename);
+			xlogState->lastSegNo++;
+
+			return true;
+		}
+		else if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not stat \"%s\": %m", readyfile)));
+	}
+
+	/*
+	 * Open the archive status directory and read through the list of files
+	 * with the .ready suffix, looking for the earliest file.
+	 */
 	snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
 	rldir = AllocateDir(XLogArchiveStatusDir);
 
 	while ((rlde = ReadDir(rldir, XLogArchiveStatusDir)) != NULL)
 	{
 		int			basenamelen = (int) strlen(rlde->d_name) - 6;
-		char		basename[MAX_XFN_CHARS + 1];
-		bool		ishistory;
 
 		/* Ignore entries with unexpected number of characters */
 		if (basenamelen < MIN_XFN_CHARS ||
@@ -638,35 +739,69 @@ pgarch_readyXlog(char *xlog)
 		memcpy(basename, rlde->d_name, basenamelen);
 		basename[basenamelen] = '\0';
 
-		/* Is this a history file? */
-		ishistory = IsTLHistoryFileName(basename);
-
-		/*
-		 * Consume the file to archive.  History files have the highest
-		 * priority.  If this is the first file or the first history file
-		 * ever, copy it.  In the presence of a history file already chosen as
-		 * target, ignore all other files except history files which have been
-		 * generated for an older timeline than what is already chosen as
-		 * target to archive.
-		 */
-		if (!found || (ishistory && !historyFound))
+		if (!found || higher_arch_priority(basename, xlog))
 		{
 			strcpy(xlog, basename);
 			found = true;
-			historyFound = ishistory;
 		}
-		else if (ishistory || !historyFound)
+	}
+	FreeDir(rldir);
+
+	if (found)
+	{
+		if (IsXLogFileName(xlog))
+			XLogFromFileName(xlog, &xlogState->lastTli, &xlogState->lastSegNo,
+					wal_segment_size);
+		else
 		{
-			if (strcmp(basename, xlog) < 0)
-				strcpy(xlog, basename);
+			/* Continue directory scan until we find a regular WAL file */
+			SpinLockAcquire(&PgArch->arch_lck);
+			PgArch->force_dir_scan = true;
+			SpinLockRelease(&PgArch->arch_lck);
 		}
+
+		ereport(DEBUG3,
+				errmsg("directory scan to archive write ahead log file \"%s\"", xlog));
 	}
-	FreeDir(rldir);
 
 	return found;
 }
 
 /*
+ * higher_arch_priority
+ *
+ * Compares archival priority of the two file names.  If "a" has a higher
+ * priority than "b", true is returned.  If "b" has a higher priority than
+ * "a" false is returned.
+ */
+static bool
+higher_arch_priority(const char *a, const char *b)
+{
+	bool a_ishistory = IsTLHistoryFileName(a);
+	bool b_ishistory = IsTLHistoryFileName(b);
+	bool a_isbackuphistory = IsBackupHistoryFileName(a);
+	bool b_isbackuphistory = IsBackupHistoryFileName(b);
+
+	/*
+	 * Timeline history files have a higher priority than everything else.
+	 * Backup history files are given the second highest priority so that
+	 * the archiver picks them up when a directory scan is forced.
+	 */
+	if (a_ishistory || b_ishistory)
+	{
+		if (a_ishistory != b_ishistory)
+			return a_ishistory;
+	}
+	else if (a_isbackuphistory || b_isbackuphistory)
+	{
+		if (a_isbackuphistory != b_isbackuphistory)
+			return a_isbackuphistory;
+	}
+
+	return (strcmp(a, b) < 0);
+}
+
+/*
  * pgarch_archiveDone
  *
  * Emit notification that an xlog file has been successfully archived.
@@ -716,3 +851,38 @@ HandlePgArchInterrupts(void)
 		ProcessConfigFile(PGC_SIGHUP);
 	}
 }
+
+/*
+ * PgArchForceDirScan
+ *
+ * When called, the next call to pgarch_readyXlog() will perform a
+ * directory scan.  This is useful for ensuring that imporant files such
+ * as timeline history files are archived as quickly as possible.
+ */
+void
+PgArchForceDirScan(void)
+{
+	SpinLockAcquire(&PgArch->arch_lck);
+	PgArch->force_dir_scan = true;
+	SpinLockRelease(&PgArch->arch_lck);
+}
+
+XLogSegNo
+PgArchGetLastReadySegNo(void)
+{
+	XLogSegNo ret;
+
+	SpinLockAcquire(&PgArch->arch_lck);
+	ret = PgArch->lastReadySegNo;
+	SpinLockRelease(&PgArch->arch_lck);
+
+	return ret;
+}
+
+void
+PgArchSetLastReadySegNo(XLogSegNo segno)
+{
+	SpinLockAcquire(&PgArch->arch_lck);
+	PgArch->lastReadySegNo = segno;
+	SpinLockRelease(&PgArch->arch_lck);
+}
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index 60348d1..6122891 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -46,6 +46,7 @@ typedef uint64 XLogRecPtr;
  * XLogSegNo - physical log file sequence number.
  */
 typedef uint64 XLogSegNo;
+#define MaxXLogSegNo    ((XLogSegNo) 0xFFFFFFFFFFFFFFFF)
 
 /*
  * TimeLineID (TLI) - identifies different database histories to prevent
@@ -57,6 +58,7 @@ typedef uint64 XLogSegNo;
  * sequence that was generated in the previous incarnation.
  */
 typedef uint32 TimeLineID;
+#define MaxTimeLineID	((TimeLineID) 0xFFFFFFFF)
 
 /*
  * Replication origin id - this is located in this file to avoid having to
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 1e47a14..ffda0de 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -13,6 +13,8 @@
 #ifndef _PGARCH_H
 #define _PGARCH_H
 
+#include "access/xlogdefs.h"
+
 /* ----------
  * Archiver control info.
  *
@@ -31,5 +33,8 @@ extern void PgArchShmemInit(void);
 extern bool PgArchCanRestart(void);
 extern void PgArchiverMain(void) pg_attribute_noreturn();
 extern void PgArchWakeup(void);
+extern void PgArchForceDirScan(void);
+extern XLogSegNo PgArchGetLastReadySegNo(void);
+extern void PgArchSetLastReadySegNo(XLogSegNo segno);
 
 #endif							/* _PGARCH_H */
-- 
1.8.3.1

Reply via email to