On Sat, Nov 4, 2023 at 6:13 AM Andres Freund <and...@anarazel.de> wrote:
>
> > +     cur_lsn = GetFlushRecPtr(NULL);
> > +     if (unlikely(startptr > cur_lsn))
> > +             elog(ERROR, "WAL start LSN %X/%X specified for reading from 
> > WAL buffers must be less than current database system WAL LSN %X/%X",
> > +                      LSN_FORMAT_ARGS(startptr), LSN_FORMAT_ARGS(cur_lsn));
>
> Hm, why does this check belong here? For some tools it might be legitimate to
> read the WAL before it was fully flushed.

Agreed and removed the check.

> > +     /*
> > +      * Holding WALBufMappingLock ensures inserters don't overwrite this 
> > value
> > +      * while we are reading it. We try to acquire it in shared mode so 
> > that
> > +      * the concurrent WAL readers are also allowed. We try to do as less 
> > work
> > +      * as possible while holding the lock to not impact concurrent WAL 
> > writers
> > +      * much. We quickly exit to not cause any contention, if the lock 
> > isn't
> > +      * immediately available.
> > +      */
> > +     if (!LWLockConditionalAcquire(WALBufMappingLock, LW_SHARED))
> > +             return 0;
>
> That seems problematic - that lock is often heavily contended.  We could
> instead check IsWALRecordAvailableInXLogBuffers() once before reading the
> page, then read the page contents *without* holding a lock, and then check
> IsWALRecordAvailableInXLogBuffers() again - if the page was replaced in the
> interim we read bogus data, but that's a bit of a wasted effort.

In the new approach described upthread here
https://www.postgresql.org/message-id/c3455ab9da42e09ca9d059879b5c512b2d1f9681.camel%40j-davis.com,
there's no lock required for reading from WAL buffers. PSA patches for
more details.

> > +             /*
> > +              * The fact that we acquire WALBufMappingLock while reading 
> > the WAL
> > +              * buffer page itself guarantees that no one else initializes 
> > it or
> > +              * makes it ready for next use in AdvanceXLInsertBuffer(). 
> > However, we
> > +              * need to ensure that we are not reading a page that just got
> > +              * initialized. For this, we look at the needed page header.
> > +              */
> > +             phdr = (XLogPageHeader) page;
> > +
> > +             /* Return, if WAL buffer page doesn't look valid. */
> > +             if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC &&
> > +                       phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) &&
> > +                       phdr->xlp_tli == tli))
> > +                     break;
>
> I don't think this code should ever encounter a page where this is not the
> case?  We particularly shouldn't do so silently, seems that could hide all
> kinds of problems.

I think it's possible to read a "just got initialized" page with the
new approach to read WAL buffer pages without WALBufMappingLock if the
page is read right after it is initialized and xlblocks is filled in
AdvanceXLInsertBuffer() but before actual WAL is written.

> > +             /*
> > +              * Note that we don't perform all page header checks here to 
> > avoid
> > +              * extra work in production builds; callers will anyway do 
> > those
> > +              * checks extensively. However, in an assert-enabled build, 
> > we perform
> > +              * all the checks here and raise an error if failed.
> > +              */
>
> Why?

Minimal page header checks are performed to ensure we don't read the
page that just got initialized unlike what
XLogReaderValidatePageHeader(). Are you suggesting to remove page
header checks with XLogReaderValidatePageHeader() for assert-enabled
builds? Or are you suggesting to do page header checks with
XLogReaderValidatePageHeader() for production builds too?

PSA v16 patch set. Note that 0004 patch adds support for WAL read
stats (both from WAL file and WAL buffers) to walsenders and may not
necessarily the best approach to capture WAL read stats in light of
https://www.postgresql.org/message-id/CALj2ACU_f5_c8F%2BxyNR4HURjG%3DJziiz07wCpQc%3DAqAJUFh7%2B8w%40mail.gmail.com
which adds WAL read/write/fsync stats to pg_stat_io.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b678edfccf7ecf490cb792391249cbf85ba0db29 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 7 Nov 2023 19:20:00 +0000
Subject: [PATCH v16] Use 64-bit atomics for xlblocks array elements

In AdvanceXLInsertBuffer(), xlblocks value of a WAL buffer page is
updated only at the end after the page is initialized with all
zeros. A problem with this approach is that anyone reading
xlblocks and WAL buffer page without holding WALBufMappingLock
will see the wrong page contents if the read happens before the
xlblocks is marked with a new entry in AdvanceXLInsertBuffer() at
the end.

To fix this issue, xlblocks is made to use 64-bit atomics instead
of XLogRecPtr and the xlblocks value is marked with
InvalidXLogRecPtr just before the page initialization begins. Once
the page initialization finishes, only then the actual value of
the newly initialized page is marked in xlblocks. A write barrier
is placed in between xlblocks update with InvalidXLogRecPtr and
the page initialization to not cause any memory ordering problems.

With this fix, one can read xlblocks and WAL buffer page without
WALBufMappingLock in the following manner:

  endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);

  /* Requested WAL isn't available in WAL buffers. */
  if (expectedEndPtr != endptr)
      break;

  page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
  data = page + ptr % XLOG_BLCKSZ;
  ...
  pg_read_barrier();
  ...
  memcpy(buf, data, bytes_to_read);
  ...
  pg_read_barrier();

  /* Recheck if the page still exists in WAL buffers. */
  endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);

  /* Return if the page got initalized while we were reading it */
  if (expectedEndPtr != endptr)
      break;
---
 src/backend/access/transam/xlog.c | 55 ++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b541be8eec..5fe4f101e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -501,7 +501,7 @@ typedef struct XLogCtlData
 	 * WALBufMappingLock.
 	 */
 	char	   *pages;			/* buffers for unwritten XLOG pages */
-	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
+	pg_atomic_uint64 *xlblocks; /* 1st byte ptr-s + XLOG_BLCKSZ */
 	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
 
 	/*
@@ -1634,20 +1634,19 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
 	 * out to disk and evicted, and the caller is responsible for making sure
 	 * that doesn't happen.
 	 *
-	 * However, we don't hold a lock while we read the value. If someone has
-	 * just initialized the page, it's possible that we get a "torn read" of
-	 * the XLogRecPtr if 64-bit fetches are not atomic on this platform. In
-	 * that case we will see a bogus value. That's ok, we'll grab the mapping
-	 * lock (in AdvanceXLInsertBuffer) and retry if we see anything else than
-	 * the page we're looking for. But it means that when we do this unlocked
-	 * read, we might see a value that appears to be ahead of the page we're
-	 * looking for. Don't PANIC on that, until we've verified the value while
-	 * holding the lock.
+	 * However, we don't hold a lock while we read the value. If someone is
+	 * just about to initialize or has just initialized the page, it's
+	 * possible that we get InvalidXLogRecPtr. That's ok, we'll grab the
+	 * mapping lock (in AdvanceXLInsertBuffer) and retry if we see anything
+	 * else than the page we're looking for. But it means that when we do this
+	 * unlocked read, we might see a value that appears to be ahead of the
+	 * page we're looking for. Don't PANIC on that, until we've verified the
+	 * value while holding the lock.
 	 */
 	expectedEndPtr = ptr;
 	expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
 
-	endptr = XLogCtl->xlblocks[idx];
+	endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
 	if (expectedEndPtr != endptr)
 	{
 		XLogRecPtr	initializedUpto;
@@ -1678,7 +1677,7 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
 		WALInsertLockUpdateInsertingAt(initializedUpto);
 
 		AdvanceXLInsertBuffer(ptr, tli, false);
-		endptr = XLogCtl->xlblocks[idx];
+		endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
 
 		if (expectedEndPtr != endptr)
 			elog(PANIC, "could not find WAL buffer for %X/%X",
@@ -1865,7 +1864,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 		 * be zero if the buffer hasn't been used yet).  Fall through if it's
 		 * already written out.
 		 */
-		OldPageRqstPtr = XLogCtl->xlblocks[nextidx];
+		OldPageRqstPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]);
 		if (LogwrtResult.Write < OldPageRqstPtr)
 		{
 			/*
@@ -1934,6 +1933,20 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 
 		NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) XLOG_BLCKSZ);
 
+		/*
+		 * Make sure to mark the xlblocks with InvalidXLogRecPtr before the
+		 * initialization of the page begins so that others, reading xlblocks
+		 * without holding a lock, will know that the page initialization has
+		 * just begun.
+		 */
+		pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
+
+		/*
+		 * A write barrier here helps to not reorder the above xlblocks atomic
+		 * write with below page initialization.
+		 */
+		pg_write_barrier();
+
 		/*
 		 * Be sure to re-zero the buffer so that bytes beyond what we've
 		 * written will look like zeroes and not valid XLOG records...
@@ -1987,8 +2000,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 		 */
 		pg_write_barrier();
 
-		*((volatile XLogRecPtr *) &XLogCtl->xlblocks[nextidx]) = NewPageEndPtr;
-
+		pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
 		XLogCtl->InitializedUpTo = NewPageEndPtr;
 
 		npages++;
@@ -2187,7 +2199,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 		 * if we're passed a bogus WriteRqst.Write that is past the end of the
 		 * last page that's been initialized by AdvanceXLInsertBuffer.
 		 */
-		XLogRecPtr	EndPtr = XLogCtl->xlblocks[curridx];
+		XLogRecPtr	EndPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[curridx]);
 
 		if (LogwrtResult.Write >= EndPtr)
 			elog(PANIC, "xlog write request %X/%X is past end of log %X/%X",
@@ -4675,10 +4687,13 @@ XLOGShmemInit(void)
 	 * needed here.
 	 */
 	allocptr = ((char *) XLogCtl) + sizeof(XLogCtlData);
-	XLogCtl->xlblocks = (XLogRecPtr *) allocptr;
-	memset(XLogCtl->xlblocks, 0, sizeof(XLogRecPtr) * XLOGbuffers);
-	allocptr += sizeof(XLogRecPtr) * XLOGbuffers;
+	XLogCtl->xlblocks = (pg_atomic_uint64 *) allocptr;
+	allocptr += sizeof(pg_atomic_uint64) * XLOGbuffers;
 
+	for (i = 0; i < XLOGbuffers; i++)
+	{
+		pg_atomic_init_u64(&XLogCtl->xlblocks[i], InvalidXLogRecPtr);
+	}
 
 	/* WAL insertion locks. Ensure they're aligned to the full padded size */
 	allocptr += sizeof(WALInsertLockPadded) -
@@ -5715,7 +5730,7 @@ StartupXLOG(void)
 		memcpy(page, endOfRecoveryInfo->lastPage, len);
 		memset(page + len, 0, XLOG_BLCKSZ - len);
 
-		XLogCtl->xlblocks[firstIdx] = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
+		pg_atomic_write_u64(&XLogCtl->xlblocks[firstIdx], endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ);
 		XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
 	}
 	else
-- 
2.34.1

From 3f7c2ec6ed281c827105d46b12276cfb54da8441 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 7 Nov 2023 21:02:35 +0000
Subject: [PATCH v16] Allow WAL reading from WAL buffers

This commit adds WALRead() the capability to read WAL from WAL
buffers when possible. When requested WAL isn't available in WAL
buffers, the WAL is read from the WAL file as usual.

This commit benefits the callers of WALRead(), that are walsenders
and pg_walinspect. They can now avoid reading WAL from the WAL
file (possibly avoiding disk IO). Tests show that the WAL buffers
hit ratio stood at 95% for 1 primary, 1 sync standby, 1 async
standby, with pgbench --scale=300 --client=32 --time=900. In other
words, the walsenders avoided 95% of the time reading from the
file/avoided pread system calls:
https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com

This commit also benefits when direct IO is enabled for WAL.
Reading WAL from WAL buffers puts back the performance close to
that of without direct IO for WAL:
https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40mail.gmail.com

This commit paves the way for the following features in future:
- Improves synchronous replication performance by replicating
directly from WAL buffers.
- A opt-in way for the walreceivers to receive unflushed WAL.
More details here:
https://www.postgresql.org/message-id/20231011224353.cl7c2s222dw3de4j%40awork3.anarazel.de

Author: Bharath Rupireddy
Reviewed-by: Dilip Kumar, Andres Freund
Reviewed-by: Nathan Bossart, Kuntal Ghosh
Discussion: https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c       | 167 ++++++++++++++++++++++++
 src/backend/access/transam/xlogreader.c |  41 +++++-
 src/include/access/xlog.h               |   6 +
 3 files changed, 212 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5fe4f101e8..2c1ddf235f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1705,6 +1705,173 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
 	return cachedPos + ptr % XLOG_BLCKSZ;
 }
 
+/*
+ * Read WAL from WAL buffers.
+ *
+ * Read 'count' bytes of WAL from WAL buffers into 'buf', starting at location
+ * 'startptr', on timeline 'tli' and return total read bytes.
+ *
+ * This function returns quickly in the following cases:
+ * - When passed-in timeline is different than server's current insertion
+ * timeline as WAL is always inserted into WAL buffers on insertion timeline.
+ *
+ * - When server is in recovery as WAL buffers aren't currently used in
+ * recovery.
+ *
+ * Note that this function reads as much as it can from WAL buffers, meaning,
+ * it may not read all the requested 'count' bytes. Caller must be aware of
+ * this and deal with it.
+ *
+ * Note that function reads WAL from WAL buffers without holding any lock.
+ * First it reads xlblocks atomically for checking page existence, then it
+ * reads the page contents, validates. Finally, it rechecks the page existence
+ * by rereading xlblocks, if the read page is replaced, it discards read page
+ * and returns.
+ *
+ * Note that this function is not available for frontend code as WAL buffers is
+ * an internal mechanism to the server.
+ */
+Size
+XLogReadFromBuffers(XLogReaderState *state,
+					XLogRecPtr startptr,
+					TimeLineID tli,
+					Size count,
+					char *buf)
+{
+	XLogRecPtr	ptr;
+	Size		nbytes;
+	Size		ntotal;
+	char	   *dst;
+
+	if (RecoveryInProgress())
+		return 0;
+
+	if (tli != GetWALInsertionTimeLine())
+		return 0;
+
+	Assert(!XLogRecPtrIsInvalid(startptr));
+
+	ptr = startptr;
+	nbytes = count;				/* Total bytes requested to be read by caller. */
+	ntotal = 0;					/* Total bytes read. */
+	dst = buf;
+
+	while (nbytes > 0)
+	{
+		XLogRecPtr	expectedEndPtr;
+		XLogRecPtr	endptr;
+		int			idx;
+		char	   *page;
+		char	   *data;
+		XLogPageHeader phdr;
+		Size		nread;
+
+		idx = XLogRecPtrToBufIdx(ptr);
+		expectedEndPtr = ptr;
+		expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
+		endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
+
+		/* Requested WAL isn't available in WAL buffers. */
+		if (expectedEndPtr != endptr)
+			break;
+
+		/*
+		 * We found WAL buffer page containing given XLogRecPtr. Get starting
+		 * address of the page and a pointer to the right location of given
+		 * XLogRecPtr in that page.
+		 */
+		page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
+		data = page + ptr % XLOG_BLCKSZ;
+
+		/* Make sure to not read a page that just got initialized. */
+		phdr = (XLogPageHeader) page;
+		if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC &&
+			  phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) &&
+			  phdr->xlp_tli == tli))
+			break;
+
+		/*
+		 * Note that we don't perform all page header checks here to avoid
+		 * extra work in production builds; callers will anyway do those
+		 * checks extensively. However, in an assert-enabled build, we perform
+		 * all the checks here and raise an error if failed.
+		 */
+#ifdef USE_ASSERT_CHECKING
+		if (unlikely(state != NULL &&
+					 !XLogReaderValidatePageHeader(state, (endptr - XLOG_BLCKSZ),
+												   (char *) phdr)))
+		{
+			if (state->errormsg_buf[0])
+				ereport(ERROR,
+						(errcode(ERRCODE_INTERNAL_ERROR),
+						 errmsg_internal("%s", state->errormsg_buf)));
+			else
+				ereport(ERROR,
+						(errcode(ERRCODE_INTERNAL_ERROR),
+						 errmsg_internal("could not read WAL from WAL buffers")));
+		}
+#endif
+
+		/*
+		 * Make sure we don't read xlblocks up above before the page contents
+		 * down below.
+		 */
+		pg_read_barrier();
+
+		nread = 0;
+
+		/* Read what is wanted, not the whole page. */
+		if ((data + nbytes) <= (page + XLOG_BLCKSZ))
+		{
+			/* All the bytes are in one page. */
+			nread = nbytes;
+		}
+		else
+		{
+			/*
+			 * All the bytes are not in one page. Read available bytes on the
+			 * current page, copy them over to output buffer and continue to
+			 * read remaining bytes.
+			 */
+			nread = XLOG_BLCKSZ - (data - page);
+			Assert(nread > 0 && nread <= nbytes);
+		}
+
+		Assert(nread > 0);
+		memcpy(dst, data, nread);
+
+		/*
+		 * Make sure we don't read xlblocks down below before the page
+		 * contents up above.
+		 */
+		pg_read_barrier();
+
+		/* Recheck if the read page still exists in WAL buffers. */
+		endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
+
+		/* Return if the page got initalized while we were reading it. */
+		if (expectedEndPtr != endptr)
+			break;
+
+		dst += nread;
+		ptr += nread;
+		ntotal += nread;
+		nbytes -= nread;
+	}
+
+	/* We never read more than what the caller has asked for. */
+	Assert(ntotal <= count);
+
+#ifdef WAL_DEBUG
+	if (XLOG_DEBUG)
+		ereport(DEBUG1,
+				(errmsg_internal("read %zu bytes out of %zu bytes from WAL buffers for given start LSN %X/%X, timeline ID %u",
+								 ntotal, count, LSN_FORMAT_ARGS(startptr), tli)));
+#endif
+
+	return ntotal;
+}
+
 /*
  * Converts a "usable byte position" to XLogRecPtr. A usable byte position
  * is the position starting from the beginning of WAL, excluding all WAL
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index e0baa86bd3..5820c5eedc 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1473,8 +1473,10 @@ err:
  * Returns true if succeeded, false if an error occurs, in which case
  * 'errinfo' receives error details.
  *
- * XXX probably this should be improved to suck data directly from the
- * WAL buffers when possible.
+ * When possible, this function reads data directly from WAL buffers. When
+ * requested WAL isn't available in WAL buffers, the WAL is read from the WAL
+ * file as usual. The callers may avoid reading WAL from the WAL file thus
+ * reducing read system calls or even disk IOs.
  */
 bool
 WALRead(XLogReaderState *state,
@@ -1484,6 +1486,41 @@ WALRead(XLogReaderState *state,
 	char	   *p;
 	XLogRecPtr	recptr;
 	Size		nbytes;
+#ifndef FRONTEND
+	Size		nread;
+#endif
+
+#ifndef FRONTEND
+
+	/*
+	 * Try reading WAL from WAL buffers. Frontend code has no idea of WAL
+	 * buffers.
+	 */
+	nread = XLogReadFromBuffers(state, startptr, tli, count, buf);
+
+	Assert(nread >= 0);
+
+	/*
+	 * Check if we have read fully (hit), partially (partial hit) or nothing
+	 * (miss) from WAL buffers. If we have read either partially or nothing,
+	 * then continue to read the remaining bytes the usual way, that is, read
+	 * from WAL file.
+	 *
+	 * XXX: It might be worth to expose WAL buffer read stats.
+	 */
+	if (count == nread)
+		return true;			/* Buffer hit, so return. */
+	else if (count > nread)
+	{
+		/*
+		 * Buffer partial hit, so reset the state to count the read bytes and
+		 * continue.
+		 */
+		buf += nread;
+		startptr += nread;
+		count -= nread;
+	}
+#endif
 
 	p = buf;
 	recptr = startptr;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index a14126d164..18167c36b4 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -251,6 +251,12 @@ extern XLogRecPtr GetLastImportantRecPtr(void);
 
 extern void SetWalWriterSleeping(bool sleeping);
 
+extern Size XLogReadFromBuffers(struct XLogReaderState *state,
+								XLogRecPtr startptr,
+								TimeLineID tli,
+								Size count,
+								char *buf);
+
 /*
  * Routines used by xlogrecovery.c to call back into xlog.c during recovery.
  */
-- 
2.34.1

From 3eaf18789e5d0b0a7c95d59ac13d71f3ef51680c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 7 Nov 2023 21:05:12 +0000
Subject: [PATCH v16] Add test module for verifying WAL read from WAL buffers

This commit adds a test module to verify WAL read from WAL
buffers.

Author: Bharath Rupireddy
Reviewed-by: Dilip Kumar
Discussion: https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com
---
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 .../test_wal_read_from_buffers/.gitignore     |  4 ++
 .../test_wal_read_from_buffers/Makefile       | 23 ++++++++++
 .../test_wal_read_from_buffers/meson.build    | 33 ++++++++++++++
 .../test_wal_read_from_buffers/t/001_basic.pl | 43 +++++++++++++++++++
 .../test_wal_read_from_buffers--1.0.sql       | 16 +++++++
 .../test_wal_read_from_buffers.c              | 37 ++++++++++++++++
 .../test_wal_read_from_buffers.control        |  4 ++
 9 files changed, 162 insertions(+)
 create mode 100644 src/test/modules/test_wal_read_from_buffers/.gitignore
 create mode 100644 src/test/modules/test_wal_read_from_buffers/Makefile
 create mode 100644 src/test/modules/test_wal_read_from_buffers/meson.build
 create mode 100644 src/test/modules/test_wal_read_from_buffers/t/001_basic.pl
 create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql
 create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
 create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index e81873cb5a..f5aedb95a4 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -31,6 +31,7 @@ SUBDIRS = \
 		  test_rls_hooks \
 		  test_shm_mq \
 		  test_slru \
+		  test_wal_read_from_buffers \
 		  unsafe_tests \
 		  worker_spi
 
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index fcd643f6f1..86fd74ab50 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -28,5 +28,6 @@ subdir('test_regex')
 subdir('test_rls_hooks')
 subdir('test_shm_mq')
 subdir('test_slru')
+subdir('test_wal_read_from_buffers')
 subdir('unsafe_tests')
 subdir('worker_spi')
diff --git a/src/test/modules/test_wal_read_from_buffers/.gitignore b/src/test/modules/test_wal_read_from_buffers/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_wal_read_from_buffers/Makefile b/src/test/modules/test_wal_read_from_buffers/Makefile
new file mode 100644
index 0000000000..7472494501
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_wal_read_from_buffers/Makefile
+
+MODULE_big = test_wal_read_from_buffers
+OBJS = \
+	$(WIN32RES) \
+	test_wal_read_from_buffers.o
+PGFILEDESC = "test_wal_read_from_buffers - test module to read WAL from WAL buffers"
+
+EXTENSION = test_wal_read_from_buffers
+DATA = test_wal_read_from_buffers--1.0.sql
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_wal_read_from_buffers
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_wal_read_from_buffers/meson.build b/src/test/modules/test_wal_read_from_buffers/meson.build
new file mode 100644
index 0000000000..40bd5dcd33
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+test_wal_read_from_buffers_sources = files(
+  'test_wal_read_from_buffers.c',
+)
+
+if host_system == 'windows'
+  test_wal_read_from_buffers_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_wal_read_from_buffers',
+    '--FILEDESC', 'test_wal_read_from_buffers - test module to read WAL from WAL buffers',])
+endif
+
+test_wal_read_from_buffers = shared_module('test_wal_read_from_buffers',
+  test_wal_read_from_buffers_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_wal_read_from_buffers
+
+test_install_data += files(
+  'test_wal_read_from_buffers.control',
+  'test_wal_read_from_buffers--1.0.sql',
+)
+
+tests += {
+  'name': 'test_wal_read_from_buffers',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_basic.pl',
+    ],
+  },
+}
diff --git a/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl b/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl
new file mode 100644
index 0000000000..80f6947d1c
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl
@@ -0,0 +1,43 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('test');
+
+$node->init;
+
+# Ensure nobody interferes with us so that the WAL in WAL buffers don't get
+# overwritten while running tests.
+$node->append_conf(
+	'postgresql.conf', qq(
+autovacuum = off
+checkpoint_timeout = 1h
+wal_writer_delay = 10000ms
+wal_writer_flush_after = 1GB
+));
+$node->start;
+
+# Setup.
+$node->safe_psql('postgres', 'CREATE EXTENSION test_wal_read_from_buffers;');
+
+# Get current insert LSN. After this, we generate some WAL which is guranteed
+# to be in WAL buffers as there is no other WAL generating activity is
+# happening on the server. We then verify if we can read the WAL from WAL
+# buffers using this LSN.
+my $lsn = $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+
+# Generate minimal WAL so that WAL buffers don't get overwritten.
+$node->safe_psql('postgres',
+	"CREATE TABLE t (c int); INSERT INTO t VALUES (1);");
+
+# Check if WAL is successfully read from WAL buffers.
+my $result = $node->safe_psql('postgres',
+	qq{SELECT test_wal_read_from_buffers('$lsn');});
+is($result, 't', "WAL is successfully read from WAL buffers");
+
+done_testing();
diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql
new file mode 100644
index 0000000000..c6ffb3fa65
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql
@@ -0,0 +1,16 @@
+/* src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_wal_read_from_buffers" to load this file. \quit
+
+--
+-- test_wal_read_from_buffers()
+--
+-- Returns true if WAL data at a given LSN can be read from WAL buffers.
+-- Otherwise returns false.
+--
+CREATE FUNCTION test_wal_read_from_buffers(IN lsn pg_lsn,
+    read_successful OUT boolean
+)
+AS 'MODULE_PATHNAME', 'test_wal_read_from_buffers'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
new file mode 100644
index 0000000000..2307cbff7a
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
@@ -0,0 +1,37 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_wal_read_from_buffers.c
+ *		Test module to read WAL from WAL buffers.
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/xlog.h"
+#include "fmgr.h"
+#include "utils/pg_lsn.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * SQL function for verifying that WAL data at a given LSN can be read from WAL
+ * buffers. Returns true if read from WAL buffers, otherwise false.
+ */
+PG_FUNCTION_INFO_V1(test_wal_read_from_buffers);
+Datum
+test_wal_read_from_buffers(PG_FUNCTION_ARGS)
+{
+	char		data[XLOG_BLCKSZ] = {0};
+	Size		nread;
+
+	nread = XLogReadFromBuffers(NULL, PG_GETARG_LSN(0),
+								GetWALInsertionTimeLine(),
+								XLOG_BLCKSZ, data);
+
+	PG_RETURN_BOOL(nread > 0);
+}
diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control
new file mode 100644
index 0000000000..eda8d47954
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control
@@ -0,0 +1,4 @@
+comment = 'Test module to read WAL from WAL buffers'
+default_version = '1.0'
+module_pathname = '$libdir/test_wal_read_from_buffers'
+relocatable = true
-- 
2.34.1

Attachment: v16-0004-Add-support-for-collecting-WAL-read-stats-for-wa.patch
Description: Binary data

Reply via email to