Re: Improve WALRead() to suck data directly from WAL buffers when possible

Bharath Rupireddy Wed, 10 Jan 2024 06:29:57 -0800

On Fri, Jan 5, 2024 at 7:20 AM Jeff Davis <pg...@j-davis.com> wrote:
>
> On Wed, 2023-12-20 at 15:36 +0530, Bharath Rupireddy wrote:
> > Thanks. Attaching remaining patches as v18 patch-set after commits
> > c3a8e2a7cb16 and 766571be1659.
>
> Comments:


Thanks for reviewing.

> I still think the right thing for this patch is to call
> XLogReadFromBuffers() directly from the callers who need it, and not
> change WALRead(). I am open to changing this later, but for now that
> makes sense to me so that we can clearly identify which callers benefit
> and why. I have brought this up a few times before[1][2], so there must
> be some reason that I don't understand -- can you explain it?

IMO, WALRead() is the best place to have XLogReadFromBuffers() for 2
reasons: 1) All of the WALRead() callers (except FRONTEND tools) will
benefit if WAL is read from WAL buffers. I don't see any reason for a
caller to skip reading from WAL buffers. If there's a caller (in
future) wanting to skip reading from WAL buffers, I'm open to adding a
flag in XLogReaderState to skip.  2) The amount of code is reduced if
XLogReadFromBuffers() sits in WALRead().
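
As a rough illustration of point 2, here is a standalone sketch of the
buffers-first, file-fallback flow kept in one place. It is not code from
the patch: all demo_* names, the in-memory "file" and the cached range
are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-ins: 'demo_wal_file' holds every byte, while only the
 * byte range [demo_cached_lo, demo_cached_hi) is still in the buffers.
 */
static const char demo_wal_file[] = "ABCDEFGHIJKLMNOP";
static const size_t demo_cached_lo = 4;
static const size_t demo_cached_hi = 10;
static size_t demo_file_reads = 0;	/* counts fallbacks to the file */

/* Read as much as possible from the buffers; may return fewer bytes or 0. */
static size_t
demo_read_from_buffers(size_t start, size_t count, char *dst)
{
	size_t		n;

	if (start < demo_cached_lo || start >= demo_cached_hi)
		return 0;				/* no read */

	n = demo_cached_hi - start;
	if (n > count)
		n = count;
	memcpy(dst, demo_wal_file + start, n);
	return n;					/* full or short read */
}

/* Read exactly 'count' bytes from the file. */
static void
demo_read_from_file(size_t start, size_t count, char *dst)
{
	assert(start + count <= sizeof(demo_wal_file) - 1);
	memcpy(dst, demo_wal_file + start, count);
	demo_file_reads++;
}

/* Single entry point: try the buffers first, then read the rest from file. */
static void
demo_wal_read(size_t start, size_t count, char *dst)
{
	size_t		nread = demo_read_from_buffers(start, count, dst);

	if (nread == count)
		return;					/* full read from buffers */

	/* Short read or no read: fetch the remaining bytes from the file. */
	demo_read_from_file(start + nread, count - nread, dst + nread);
}
```

With this shape, callers issue one read and do not need to know where
the bytes came from.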

> The XLogReadFromBuffersResult is never used. I can see how it might be
> useful for testing or asserts, but it's not used even in the test
> module. I don't think we should clutter the API with that kind of thing
> -- let's just return the nread.

Removed.

> I also do not like the terminology "partial hit" to be used in this
> way. Perhaps "short read" or something about hitting the end of
> readable WAL would be better?

"short read" seems good. Done that way in the new patch.

> I like how the callers of WALRead() are being more precise about the
> bytes they are requesting.
>
> You've added several spinlock acquisitions to the loop. Two explicitly,
> and one implicitly in WaitXLogInsertionsToFinish(). These may allow you
> to read slightly further, but introduce performance risk. Was this
> discussed?

I opted to read slightly further, assuming the loops would not run long
enough for the spinlock acquisitions to become costly. Basically, I
wasn't sure which approach was best. Now that there's an opinion in
favor of keeping them outside the loop, I agree with it. Done that way
in the new patch.
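
Roughly, the shape is: take the spinlock once to snapshot the insert
position, then walk the pages without further locking. A standalone
sketch of that idea follows, using a C11 atomic_flag as a stand-in
spinlock. DemoShared, demo_plan_read() and the other demo_* names are
hypothetical, not names from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 8192

typedef struct DemoShared
{
	atomic_flag lock;			/* stand-in spinlock */
	uint64_t	insert_pos;		/* protected by 'lock' */
	unsigned	lock_acquisitions;	/* demo-only counter */
} DemoShared;

static void
demo_shared_init(DemoShared *s, uint64_t insert_pos)
{
	atomic_flag_clear(&s->lock);
	s->insert_pos = insert_pos;
	s->lock_acquisitions = 0;
}

/* Take the lock once and copy out the current insert position. */
static uint64_t
demo_snapshot_insert_pos(DemoShared *s)
{
	uint64_t	pos;

	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;						/* spin */
	s->lock_acquisitions++;
	pos = s->insert_pos;
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
	return pos;
}

/*
 * Split [start, start + count) into per-page chunks. The lock is taken
 * once, before the loop; the loop itself is lock-free. Returns the number
 * of bytes covered and stores the number of chunks in *nchunks.
 */
static size_t
demo_plan_read(DemoShared *s, uint64_t start, size_t count, size_t *nchunks)
{
	uint64_t	limit = demo_snapshot_insert_pos(s);
	uint64_t	ptr = start;
	size_t		total = 0;

	*nchunks = 0;

	/* Requested range extends past the insert position: read nothing. */
	if (start + count > limit)
		return 0;

	while (total < count)
	{
		size_t		in_page = DEMO_PAGE_SIZE - (size_t) (ptr % DEMO_PAGE_SIZE);
		size_t		n = (count - total < in_page) ? count - total : in_page;

		ptr += n;
		total += n;
		(*nchunks)++;
	}
	return total;
}
```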

> The callers are not checking for XLREADBUGS_UNINITIALIZED_WAL, so it
> seems like there's a risk of getting partially-written data? And it's
> not clear to me the check of the wal page headers is the right one
> anyway.
>
> It seems like all of this would be simpler if you checked first how far
> you can safely read data, and then just loop and read that far.  I'm not
> sure that it's worth it to try to mix the validity checks with the
> reading of the data.

XLogReadFromBuffers() needs the page header check after reading the
page from WAL buffers. Typically, we should never see a WAL buffer page
that just got initialized, because we have already waited for the
in-progress WAL insertions to finish above. However, there is a slight
window after that wait finishes in which the buffer page being read can
get replaced, especially under high WAL generation rates. After all, we
are reading from WAL buffers without any locks here. So, let's not
count such a page in.
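
The read protocol, in a simplified standalone form, looks like the
sketch below. C11 atomics stand in for pg_atomic_read_u64() and
pg_read_barrier(); DemoSlot, DemoPageHeader and the demo_* functions
are hypothetical, and unlike the patch the sketch validates the header
of the private copy rather than of the shared page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE	64
#define DEMO_PAGE_MAGIC 0xD116u

typedef struct DemoPageHeader
{
	uint32_t	magic;
	uint64_t	page_addr;
} DemoPageHeader;

/* One buffer slot; 'end_ptr' plays the role of xlblocks[idx]. */
typedef struct DemoSlot
{
	_Atomic uint64_t end_ptr;
	char		page[DEMO_PAGE_SIZE];
} DemoSlot;

/* Writer side (single-threaded in this demo): install a page in the slot. */
static void
demo_install_page(DemoSlot *slot, uint64_t page_addr, uint32_t magic)
{
	DemoPageHeader hdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.magic = magic;
	hdr.page_addr = page_addr;
	memset(slot->page, 0, DEMO_PAGE_SIZE);
	memcpy(slot->page, &hdr, sizeof(hdr));
	atomic_store_explicit(&slot->end_ptr, page_addr + DEMO_PAGE_SIZE,
						  memory_order_release);
}

/* Reader side: copy the page without a lock, then verify the copy. */
static bool
demo_read_page(DemoSlot *slot, uint64_t page_addr, char *dst)
{
	uint64_t	expected_end = page_addr + DEMO_PAGE_SIZE;
	DemoPageHeader hdr;

	/* Step 1: is the wanted page in this slot right now? */
	if (atomic_load_explicit(&slot->end_ptr, memory_order_relaxed) != expected_end)
		return false;

	/* The copy below must not be ordered before the check above. */
	atomic_thread_fence(memory_order_acquire);

	/* Step 2: copy the page contents. */
	memcpy(dst, slot->page, DEMO_PAGE_SIZE);

	/* The recheck below must not be ordered before the copy above. */
	atomic_thread_fence(memory_order_acquire);

	/* Step 3: discard the copy if the slot was reused meanwhile. */
	if (atomic_load_explicit(&slot->end_ptr, memory_order_relaxed) != expected_end)
		return false;

	/* Step 4: sanity-check the page header of the copy we took. */
	memcpy(&hdr, dst, sizeof(hdr));
	return hdr.magic == DEMO_PAGE_MAGIC && hdr.page_addr == page_addr;
}
```

In strict C11 terms the plain memcpy() of a page that a writer may be
replacing is a data race; the sketch relies on the recheck to discard
such a copy.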

I've addressed the above review comments and attached v19 patch-set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From e03af5726957437c15361bdb1b373fe8982f5c7c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 10 Jan 2024 14:12:17 +0000
Subject: [PATCH v19] Allow WAL reading from WAL buffers

This commit adds to PostgreSQL the capability to read WAL from WAL
buffers. When the requested WAL isn't available in WAL buffers, it
is read from the WAL file as usual.

This commit benefits the callers of WALRead(), such as
walsenders, pg_walinspect etc. They all can now avoid reading WAL
from the WAL file (possibly avoiding disk IO). Tests show that the
WAL buffers hit ratio stood at 95% for 1 primary, 1 sync standby,
1 async standby, with pgbench --scale=300 --client=32 --time=900.
In other words, the walsenders avoided reading from the file (and
thus pread system calls) 95% of the time:
https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com

This commit also helps when direct IO is enabled for WAL.
Reading WAL from WAL buffers brings performance back close to
that without direct IO for WAL:
https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40mail.gmail.com

This commit paves the way for the following features in the future:
- Improving synchronous replication performance by replicating
directly from WAL buffers.
- An opt-in way for the walreceivers to receive unflushed WAL.
More details here:
https://www.postgresql.org/message-id/20231011224353.cl7c2s222dw3de4j%40awork3.anarazel.de

Author: Bharath Rupireddy
Reviewed-by: Dilip Kumar, Andres Freund
Reviewed-by: Nathan Bossart, Kuntal Ghosh
Discussion: https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c       | 173 ++++++++++++++++++++++++
 src/backend/access/transam/xlogreader.c |  40 +++++-
 src/backend/access/transam/xlogutils.c  |  11 +-
 src/backend/postmaster/walsummarizer.c  |  10 +-
 src/backend/replication/walsender.c     |  10 +-
 src/include/access/xlog.h               |   3 +
 6 files changed, 231 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 478377c4a2..886eaf12e3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1705,6 +1705,179 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
 	return cachedPos + ptr % XLOG_BLCKSZ;
 }
 
+/*
+ * Read WAL from WAL buffers.
+ *
+ * This function reads 'count' bytes of WAL from WAL buffers into 'buf'
+ * starting at location 'startptr' on timeline 'tli' and returns total bytes
+ * read.
+ *
+ * Points to note:
+ *
+ * - This function reads as much as it can from WAL buffers, meaning, it may
+ * not read all the requested 'count' bytes. Caller must be aware of this and
+ * deal with it.
+ *
+ * - This function reads WAL from WAL buffers without holding any lock. First
+ * it reads xlblocks atomically for checking page existence, then it reads the
+ * page contents and validates. Finally, it rechecks the page existence by
+ * re-reading xlblocks; if the read page is replaced, it discards it and
+ * returns.
+ *
+ * - This function is not available for frontend code as WAL buffers are
+ * internal to the server.
+ *
+ * - This function waits for any in-progress WAL insertions to WAL buffers to
+ * finish.
+ */
+Size
+XLogReadFromBuffers(XLogRecPtr startptr, TimeLineID tli, Size count,
+					char *buf)
+{
+	XLogRecPtr	ptr;
+	Size		nbytes;
+	Size		ntotal = 0;
+	char	   *dst;
+	uint64		bytepos;
+	XLogRecPtr	reservedUpto;
+	XLogwrtResult LogwrtResult;
+
+	/*
+	 * Fast paths for the following reasons: 1) WAL buffers aren't in use when
+	 * server is in recovery. 2) WAL is inserted into WAL buffers on current
+	 * server's insertion TLI. 3) Invalid starting WAL location.
+	 */
+	if (RecoveryInProgress() ||
+		tli != GetWALInsertionTimeLine() ||
+		XLogRecPtrIsInvalid(startptr))
+		return ntotal;
+
+	/* Read the current insert position */
+	SpinLockAcquire(&XLogCtl->Insert.insertpos_lck);
+	bytepos = XLogCtl->Insert.CurrBytePos;
+	SpinLockRelease(&XLogCtl->Insert.insertpos_lck);
+
+	reservedUpto = XLogBytePosToEndRecPtr(bytepos);
+
+	/*
+	 * WAL being read doesn't yet exist i.e. past the current insert position.
+	 */
+	if ((startptr + count) > reservedUpto)
+		return ntotal;
+
+	SpinLockAcquire(&XLogCtl->info_lck);
+	LogwrtResult = XLogCtl->LogwrtResult;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	/* Wait for any in-progress WAL insertions to WAL buffers to finish. */
+	if ((startptr + count) > LogwrtResult.Write &&
+		(startptr + count) <= reservedUpto)
+		WaitXLogInsertionsToFinish(startptr + count);
+
+	ptr = startptr;
+	nbytes = count;
+	dst = buf;
+
+	while (nbytes > 0)
+	{
+		XLogRecPtr	expectedEndPtr;
+		XLogRecPtr	endptr;
+		int			idx;
+		char	   *page;
+		char	   *data;
+		Size		nread;
+		XLogPageHeader phdr;
+
+		idx = XLogRecPtrToBufIdx(ptr);
+		expectedEndPtr = ptr;
+		expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
+		endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
+
+		/* Requested WAL isn't available in WAL buffers. */
+		if (expectedEndPtr != endptr)
+			break;
+
+		/*
+		 * We found WAL buffer page containing given XLogRecPtr. Get starting
+		 * address of the page and a pointer to the right location of given
+		 * XLogRecPtr in that page.
+		 */
+		page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
+		data = page + ptr % XLOG_BLCKSZ;
+
+		/*
+		 * Make sure the page contents down below are not read before the
+		 * xlblocks value up above.
+		 */
+		pg_read_barrier();
+
+		nread = 0;
+
+		/* Read what is wanted, not the whole page. */
+		if ((data + nbytes) <= (page + XLOG_BLCKSZ))
+		{
+			/* All the bytes are in one page. */
+			nread = nbytes;
+		}
+		else
+		{
+			/*
+			 * All the bytes are not in one page. Read available bytes on the
+			 * current page, copy them over to output buffer and continue to
+			 * read remaining bytes.
+			 */
+			nread = XLOG_BLCKSZ - (data - page);
+			Assert(nread > 0 && nread <= nbytes);
+		}
+
+		Assert(nread > 0);
+		memcpy(dst, data, nread);
+
+		/*
+		 * Make sure we don't read xlblocks down below before the page
+		 * contents up above.
+		 */
+		pg_read_barrier();
+
+		/* Recheck if the read page still exists in WAL buffers. */
+		endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
+
+		/* Return if the page got initialized while we were reading it. */
+		if (expectedEndPtr != endptr)
+			break;
+
+		/*
+		 * Typically, we should never see a WAL buffer page that just got
+		 * initialized, because we waited for the in-progress WAL insertions
+		 * to finish above. However, there is a slight window after that wait
+		 * finishes in which the buffer page being read can get replaced,
+		 * especially under high WAL generation rates. After all, we are
+		 * reading from WAL buffers without any locks here. So, let's not
+		 * count such a page in.
+		 */
+		phdr = (XLogPageHeader) page;
+		if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC &&
+			  phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) &&
+			  phdr->xlp_tli == tli))
+			break;
+
+		dst += nread;
+		ptr += nread;
+		ntotal += nread;
+		nbytes -= nread;
+	}
+
+	/* We never read more than what the caller has asked for. */
+	Assert(ntotal <= count);
+
+	ereport(DEBUG1,
+			errmsg_internal("read %zu bytes out of %zu bytes from WAL buffers for given start LSN %X/%X, timeline ID %u",
+							ntotal, count,
+							LSN_FORMAT_ARGS(startptr), tli));
+
+	return ntotal;
+}
+
 /*
  * Converts a "usable byte position" to XLogRecPtr. A usable byte position
  * is the position starting from the beginning of WAL, excluding all WAL
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 7190156f2f..639bba2ad9 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1501,17 +1501,47 @@ err:
  * Returns true if succeeded, false if an error occurs, in which case
  * 'errinfo' receives error details.
  *
- * XXX probably this should be improved to suck data directly from the
- * WAL buffers when possible.
+ * When possible, this function reads WAL from WAL buffers. When requested WAL
+ * isn't available in WAL buffers, it is read from the WAL file as usual.
  */
 bool
-WALRead(XLogReaderState *state,
-		char *buf, XLogRecPtr startptr, Size count, TimeLineID tli,
-		WALReadError *errinfo)
+WALRead(XLogReaderState *state, char *buf, XLogRecPtr startptr,
+		Size count, TimeLineID tli, WALReadError *errinfo)
 {
 	char	   *p;
 	XLogRecPtr	recptr;
 	Size		nbytes;
+#ifndef FRONTEND
+	Size		nread;
+#endif
+
+#ifndef FRONTEND
+
+	/*
+	 * Try reading WAL from WAL buffers. Frontend code has no idea of WAL
+	 * buffers.
+	 */
+	nread = XLogReadFromBuffers(startptr, tli, count, buf);
+
+	if (nread > 0)
+	{
+		/*
+		 * Check if it's a full read, short read or no read from WAL buffers.
+		 * For short read or no read, continue to read the remaining bytes
+		 * from WAL file.
+		 *
+		 * XXX: It might be worth exposing WAL buffer read stats.
+		 */
+		if (nread == count)		/* full read */
+			return true;
+		else if (nread < count) /* short read */
+		{
+			buf += nread;
+			startptr += nread;
+			count -= nread;
+		}
+	}
+#endif
 
 	p = buf;
 	recptr = startptr;
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index aa8667abd1..fafab9aa32 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -1007,12 +1007,13 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 	}
 
 	/*
-	 * Even though we just determined how much of the page can be validly read
-	 * as 'count', read the whole page anyway. It's guaranteed to be
-	 * zero-padded up to the page boundary if it's incomplete.
+	 * We determined how much of the page can be validly read as 'count', so
+	 * read only that much, not the entire page. WALRead() can now read the
+	 * page from WAL buffers, in which case the page is not guaranteed to be
+	 * zero-padded up to the page boundary because of concurrent
+	 * insertions.
 	 */
-	if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ, tli,
-				 &errinfo))
+	if (!WALRead(state, cur_page, targetPagePtr, count, tli, &errinfo))
 		WALReadRaiseError(&errinfo);
 
 	/* number of valid bytes in the buffer */
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index f828cc436a..d465848bc9 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -1254,11 +1254,13 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
 	}
 
 	/*
-	 * Even though we just determined how much of the page can be validly read
-	 * as 'count', read the whole page anyway. It's guaranteed to be
-	 * zero-padded up to the page boundary if it's incomplete.
+	 * We determined how much of the page can be validly read as 'count', so
+	 * read only that much, not the entire page. WALRead() can now read the
+	 * page from WAL buffers, in which case the page is not guaranteed to be
+	 * zero-padded up to the page boundary because of concurrent
+	 * insertions.
 	 */
-	if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ,
+	if (!WALRead(state, cur_page, targetPagePtr, count,
 				 private_data->tli, &errinfo))
 		WALReadRaiseError(&errinfo);
 
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 087031e9dc..b35406bcdf 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1095,11 +1095,17 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	else
 		count = flushptr - targetPagePtr;	/* part of the page available */
 
-	/* now actually read the data, we know it's there */
+	/*
+	 * We determined how much of the page can be validly read as 'count', so
+	 * read only that much, not the entire page. WALRead() can now read the
+	 * page from WAL buffers, in which case the page is not guaranteed to be
+	 * zero-padded up to the page boundary because of concurrent
+	 * insertions.
+	 */
 	if (!WALRead(state,
 				 cur_page,
 				 targetPagePtr,
-				 XLOG_BLCKSZ,
+				 count,
 				 currTLI,		/* Pass the current TLI because only
 								 * WalSndSegmentOpen controls whether new TLI
 								 * is needed. */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 301c5fa11f..fa760a92d5 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -252,6 +252,9 @@ extern XLogRecPtr GetLastImportantRecPtr(void);
 
 extern void SetWalWriterSleeping(bool sleeping);
 
+extern Size XLogReadFromBuffers(XLogRecPtr startptr, TimeLineID tli,
+								Size count, char *buf);
+
 /*
  * Routines used by xlogrecovery.c to call back into xlog.c during recovery.
  */
-- 
2.34.1

From 35e9c0afe130d79e4f74dfbe3a445cf3d594ec14 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 10 Jan 2024 13:36:15 +0000
Subject: [PATCH v19] Add test module for verifying WAL read from WAL buffers

This commit adds a test module to verify WAL read from WAL
buffers.

Author: Bharath Rupireddy
Reviewed-by: Dilip Kumar
Discussion: https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com
---
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 .../test_wal_read_from_buffers/.gitignore     |  4 ++
 .../test_wal_read_from_buffers/Makefile       | 23 ++++++++
 .../test_wal_read_from_buffers/meson.build    | 33 ++++++++++++
 .../test_wal_read_from_buffers/t/001_basic.pl | 54 +++++++++++++++++++
 .../test_wal_read_from_buffers--1.0.sql       | 16 ++++++
 .../test_wal_read_from_buffers.c              | 44 +++++++++++++++
 .../test_wal_read_from_buffers.control        |  4 ++
 9 files changed, 180 insertions(+)
 create mode 100644 src/test/modules/test_wal_read_from_buffers/.gitignore
 create mode 100644 src/test/modules/test_wal_read_from_buffers/Makefile
 create mode 100644 src/test/modules/test_wal_read_from_buffers/meson.build
 create mode 100644 src/test/modules/test_wal_read_from_buffers/t/001_basic.pl
 create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql
 create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
 create mode 100644 src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5d33fa6a9a..64a051ce1c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -33,6 +33,7 @@ SUBDIRS = \
 		  test_rls_hooks \
 		  test_shm_mq \
 		  test_slru \
+		  test_wal_read_from_buffers \
 		  unsafe_tests \
 		  worker_spi \
 		  xid_wraparound
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 00ff1d77d1..d5ec3bd3a9 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -30,6 +30,7 @@ subdir('test_resowner')
 subdir('test_rls_hooks')
 subdir('test_shm_mq')
 subdir('test_slru')
+subdir('test_wal_read_from_buffers')
 subdir('unsafe_tests')
 subdir('worker_spi')
 subdir('xid_wraparound')
diff --git a/src/test/modules/test_wal_read_from_buffers/.gitignore b/src/test/modules/test_wal_read_from_buffers/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_wal_read_from_buffers/Makefile b/src/test/modules/test_wal_read_from_buffers/Makefile
new file mode 100644
index 0000000000..7472494501
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_wal_read_from_buffers/Makefile
+
+MODULE_big = test_wal_read_from_buffers
+OBJS = \
+	$(WIN32RES) \
+	test_wal_read_from_buffers.o
+PGFILEDESC = "test_wal_read_from_buffers - test module to read WAL from WAL buffers"
+
+EXTENSION = test_wal_read_from_buffers
+DATA = test_wal_read_from_buffers--1.0.sql
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_wal_read_from_buffers
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_wal_read_from_buffers/meson.build b/src/test/modules/test_wal_read_from_buffers/meson.build
new file mode 100644
index 0000000000..40bd5dcd33
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+test_wal_read_from_buffers_sources = files(
+  'test_wal_read_from_buffers.c',
+)
+
+if host_system == 'windows'
+  test_wal_read_from_buffers_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_wal_read_from_buffers',
+    '--FILEDESC', 'test_wal_read_from_buffers - test module to read WAL from WAL buffers',])
+endif
+
+test_wal_read_from_buffers = shared_module('test_wal_read_from_buffers',
+  test_wal_read_from_buffers_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_wal_read_from_buffers
+
+test_install_data += files(
+  'test_wal_read_from_buffers.control',
+  'test_wal_read_from_buffers--1.0.sql',
+)
+
+tests += {
+  'name': 'test_wal_read_from_buffers',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'tests': [
+      't/001_basic.pl',
+    ],
+  },
+}
diff --git a/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl b/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl
new file mode 100644
index 0000000000..1d842bb02e
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/t/001_basic.pl
@@ -0,0 +1,54 @@
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('test');
+
+$node->init;
+
+# Ensure nobody interferes with us so that the WAL in WAL buffers doesn't get
+# overwritten while running tests.
+$node->append_conf(
+	'postgresql.conf', qq(
+autovacuum = off
+checkpoint_timeout = 1h
+wal_writer_delay = 10000ms
+wal_writer_flush_after = 1GB
+));
+$node->start;
+
+# Setup.
+$node->safe_psql('postgres', 'CREATE EXTENSION test_wal_read_from_buffers;');
+
+# Get current insert LSN. After this, we generate some WAL which is guaranteed
+# to be in WAL buffers as there is no other WAL generating activity
+# happening on the server. We then verify if we can read the WAL from WAL
+# buffers using this LSN.
+my $lsn = $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+
+# Generate minimal WAL so that WAL buffers don't get overwritten.
+$node->safe_psql('postgres',
+	"CREATE TABLE t (c int); INSERT INTO t VALUES (1);");
+
+# Check if WAL is successfully read from WAL buffers.
+my $result = $node->safe_psql('postgres',
+	qq{SELECT test_wal_read_from_buffers('$lsn');});
+is($result, 't', "WAL is successfully read from WAL buffers");
+
+# Check with a WAL that doesn't yet exist.
+$lsn = $node->safe_psql('postgres', 'SELECT pg_current_wal_flush_lsn()+8192;');
+$result = $node->safe_psql('postgres',
+	qq{SELECT test_wal_read_from_buffers('$lsn');});
+is($result, 'f', "WAL that doesn't yet exist is not read from WAL buffers");
+
+# Check with invalid input.
+$result = $node->safe_psql('postgres',
+	qq{SELECT test_wal_read_from_buffers('0/0');});
+is($result, 'f', "WAL is not read from WAL buffers with invalid input");
+
+done_testing();
diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql
new file mode 100644
index 0000000000..c6ffb3fa65
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql
@@ -0,0 +1,16 @@
+/* src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_wal_read_from_buffers" to load this file. \quit
+
+--
+-- test_wal_read_from_buffers()
+--
+-- Returns true if WAL data at a given LSN can be read from WAL buffers.
+-- Otherwise returns false.
+--
+CREATE FUNCTION test_wal_read_from_buffers(IN lsn pg_lsn,
+    read_successful OUT boolean
+)
+AS 'MODULE_PATHNAME', 'test_wal_read_from_buffers'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
new file mode 100644
index 0000000000..e54c64236d
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
@@ -0,0 +1,44 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_wal_read_from_buffers.c
+ *		Test module to read WAL from WAL buffers.
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/xlog.h"
+#include "fmgr.h"
+#include "utils/pg_lsn.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * SQL function for verifying that WAL data at a given LSN can be read from WAL
+ * buffers. Returns true if read from WAL buffers, otherwise false.
+ */
+PG_FUNCTION_INFO_V1(test_wal_read_from_buffers);
+Datum
+test_wal_read_from_buffers(PG_FUNCTION_ARGS)
+{
+	char		data[XLOG_BLCKSZ] = {0};
+	Size		nread;
+	bool		is_read;
+
+	nread = XLogReadFromBuffers(PG_GETARG_LSN(0),
+								GetWALInsertionTimeLine(),
+								XLOG_BLCKSZ,
+								data);
+
+	if (nread > 0)
+		is_read = true;
+	else
+		is_read = false;
+
+	PG_RETURN_BOOL(is_read);
+}
diff --git a/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control
new file mode 100644
index 0000000000..eda8d47954
--- /dev/null
+++ b/src/test/modules/test_wal_read_from_buffers/test_wal_read_from_buffers.control
@@ -0,0 +1,4 @@
+comment = 'Test module to read WAL from WAL buffers'
+default_version = '1.0'
+module_pathname = '$libdir/test_wal_read_from_buffers'
+relocatable = true
-- 
2.34.1
