I've attached v27 of the patch.

I've renamed IOPATH to IOCONTEXT. I have also added assertions to
confirm that unexpected statistics are not being accumulated.

There are also assorted other cleanups and changes.

It would be good to confirm that the rows being skipped and cells that
are NULL in the view are the correct ones.
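For reference, a spot-check could look something like the query below.
The view name pg_stat_io comes from the patch, but the column names here
are illustrative assumptions only and may not match v27:

```sql
-- Hypothetical spot-check; backend_type, io_context, and fsyncs are
-- assumed column names, for illustration only.
-- Invalid combinations (e.g. fsyncs in the local or strategy contexts)
-- should show up as NULL rather than 0.
SELECT backend_type, io_context, fsyncs
FROM pg_stat_io
WHERE fsyncs IS NULL;
```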
The startup process will never use a BufferAccessStrategy, right?


On Wed, Jul 20, 2022 at 12:50 PM Andres Freund <and...@anarazel.de> wrote:

>
> > Subject: [PATCH v26 3/4] Track IO operation statistics
>
> > @@ -978,8 +979,17 @@ ReadBuffer_common(SMgrRelation smgr, char
> relpersistence, ForkNumber forkNum,
> >
> >       bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) :
> BufHdrGetBlock(bufHdr);
> >
> > +     if (isLocalBuf)
> > +             io_path = IOPATH_LOCAL;
> > +     else if (strategy != NULL)
> > +             io_path = IOPATH_STRATEGY;
> > +     else
> > +             io_path = IOPATH_SHARED;
>
> Seems a bit ugly to have an if (isLocalBuf) just after an isLocalBuf ?.
>

Changed this.


>
>
> > +                     /*
> > +                      * When a strategy is in use, reused buffers from
> the strategy ring will
> > +                      * be counted as allocations for the purposes of
> IO Operation statistics
> > +                      * tracking.
> > +                      *
> > +                      * However, even when a strategy is in use, if a
> new buffer must be
> > +                      * allocated from shared buffers and added to the
> ring, this is counted
> > +                      * as a IOPATH_SHARED allocation.
> > +                      */
>
> There's a bit too much duplication between the paragraphs...
>

I actually think the two paragraphs are making separate points. I've
edited this, so see if you like it better now.


>
> > @@ -628,6 +637,9 @@ pgstat_report_stat(bool force)
> >       /* flush database / relation / function / ... stats */
> >       partial_flush |= pgstat_flush_pending_entries(nowait);
> >
> > +     /* flush IO Operations stats */
> > +     partial_flush |= pgstat_flush_io_ops(nowait);
>
> Could you either add a note to the commit message that the stats file
> version needs to be increased, or just include that in the patch.
>
>
Bumped the stats file version in attached patchset.

- Melanie
From b382e216b4a3f1dae91b043c5c8d647ea17821b7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 11 Aug 2022 18:28:46 -0400
Subject: [PATCH v27 3/4] Track IO operation statistics

Introduce "IOOp", an IO operation done by a backend, and "IOContext",
the source or target of the IO done by a backend. For example, the
checkpointer may write out a shared buffer. This would be counted as an
IOOp "write" in the IOCONTEXT_SHARED IOContext by BackendType
"checkpointer".

Each IOOp (alloc, extend, fsync, read, write) is counted per IOContext
(local, shared, or strategy) through a call to pgstat_count_io_op().

The primary concern of these statistics is IO operations on data blocks
during the course of normal database operations. IO done by, for
example, the archiver or syslogger is not counted in these statistics.

IOCONTEXT_LOCAL and IOCONTEXT_SHARED IOContexts concern operations on
local and shared buffers.

The IOCONTEXT_STRATEGY IOContext concerns buffers
alloc'd/extended/fsync'd/read/written as part of a BufferAccessStrategy.

IOOP_ALLOC is counted for IOCONTEXT_SHARED and IOCONTEXT_LOCAL whenever
a buffer is acquired through [Local]BufferAlloc(). IOOP_ALLOC for
IOCONTEXT_STRATEGY is counted whenever a buffer already in the strategy
ring is reused. And IOOP_WRITE for IOCONTEXT_STRATEGY is counted
whenever the reused dirty buffer is written out.

Stats on IOOps for all IOContexts for a backend are initially
accumulated locally.

Later they are flushed to shared memory and accumulated with those from
all other backends, exited and live. The accumulated stats in shared
memory could be extended in the future with per-backend stats -- useful
for per connection IO statistics and monitoring.

Some BackendTypes do not flush their pending statistics at regular
intervals, so they explicitly call pgstat_flush_io_ops() during the
course of normal operations in order to flush their backend-local IO
Operation statistics to shared memory in a timely manner.

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Justin Pryzby <pry...@telsasoft.com>, Kyotaro Horiguchi <horikyota....@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml                  |   2 +
 src/backend/postmaster/checkpointer.c         |  12 +
 src/backend/storage/buffer/bufmgr.c           |  64 +++-
 src/backend/storage/buffer/freelist.c         |  23 +-
 src/backend/storage/buffer/localbuf.c         |   5 +
 src/backend/storage/sync/sync.c               |   9 +
 src/backend/utils/activity/Makefile           |   1 +
 src/backend/utils/activity/pgstat.c           |  31 ++
 src/backend/utils/activity/pgstat_bgwriter.c  |   7 +-
 .../utils/activity/pgstat_checkpointer.c      |   7 +-
 src/backend/utils/activity/pgstat_io_ops.c    | 297 ++++++++++++++++++
 src/backend/utils/activity/pgstat_relation.c  |  15 +-
 src/backend/utils/activity/pgstat_shmem.c     |   4 +
 src/backend/utils/activity/pgstat_wal.c       |   4 +-
 src/backend/utils/adt/pgstatfuncs.c           |   4 +-
 src/include/miscadmin.h                       |   2 +
 src/include/pgstat.h                          | 103 +++++-
 src/include/storage/buf_internals.h           |   2 +-
 src/include/utils/pgstat_internal.h           |  29 ++
 19 files changed, 601 insertions(+), 20 deletions(-)
 create mode 100644 src/backend/utils/activity/pgstat_io_ops.c

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a6e7e3b69d..14d97ec92c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5360,6 +5360,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         the <structname>pg_stat_bgwriter</structname>
         view, <literal>archiver</literal> to reset all the counters shown in
         the <structname>pg_stat_archiver</structname> view,
+        <literal>io</literal> to reset all the counters shown in the
+        <structname>pg_stat_io</structname> view,
         <literal>wal</literal> to reset all the counters shown in the
         <structname>pg_stat_wal</structname> view or
         <literal>recovery_prefetch</literal> to reset all the counters shown
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5fc076fc14..bd2e1de7c2 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -1116,6 +1116,18 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 		if (!AmBackgroundWriterProcess())
 			CheckpointerShmem->num_backend_fsync++;
 		LWLockRelease(CheckpointerCommLock);
+
+		/*
+		 * We have no way of knowing if the current IOContext is
+		 * IOCONTEXT_SHARED or IOCONTEXT_STRATEGY at this point, so count the
+		 * fsync as being in the IOCONTEXT_SHARED IOContext. This is probably
+		 * okay, because the number of backend fsyncs doesn't say anything
+		 * about the efficacy of the BufferAccessStrategy. And counting both
+		 * fsyncs done in IOCONTEXT_SHARED and IOCONTEXT_STRATEGY under
+		 * IOCONTEXT_SHARED is likely clearer when investigating the number of
+		 * backend fsyncs.
+		 */
+		pgstat_count_io_op(IOOP_FSYNC, IOCONTEXT_SHARED);
 		return false;
 	}
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8ef0436c52..d4c9bf7c4f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -482,7 +482,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr,
 							   BlockNumber blockNum,
 							   BufferAccessStrategy strategy,
 							   bool *foundPtr);
-static void FlushBuffer(BufferDesc *buf, SMgrRelation reln);
+static void FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOContext io_context);
 static void FindAndDropRelationBuffers(RelFileLocator rlocator,
 									   ForkNumber forkNum,
 									   BlockNumber nForkBlock,
@@ -823,6 +823,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	BufferDesc *bufHdr;
 	Block		bufBlock;
 	bool		found;
+	IOContext	io_context;
 	bool		isExtend;
 	bool		isLocalBuf = SmgrIsTemp(smgr);
 
@@ -986,10 +987,25 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 */
 	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	if (isLocalBuf)
+	{
+		bufBlock = LocalBufHdrGetBlock(bufHdr);
+		io_context = IOCONTEXT_LOCAL;
+	}
+	else
+	{
+		bufBlock = BufHdrGetBlock(bufHdr);
+
+		if (strategy != NULL)
+			io_context = IOCONTEXT_STRATEGY;
+		else
+			io_context = IOCONTEXT_SHARED;
+	}
 
 	if (isExtend)
 	{
+		pgstat_count_io_op(IOOP_EXTEND, io_context);
+
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
 		/* don't set checksum for all-zero page */
@@ -1020,6 +1036,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
 
+			pgstat_count_io_op(IOOP_READ, io_context);
+
 			if (track_io_timing)
 			{
 				INSTR_TIME_SET_CURRENT(io_time);
@@ -1190,6 +1208,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	/* Loop here in case we have to try another victim buffer */
 	for (;;)
 	{
+		bool		from_ring;
+
 		/*
 		 * Ensure, while the spinlock's not yet held, that there's a free
 		 * refcount entry.
@@ -1200,7 +1220,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * Select a victim buffer.  The buffer is returned with its header
 		 * spinlock still held!
 		 */
-		buf = StrategyGetBuffer(strategy, &buf_state);
+		buf = StrategyGetBuffer(strategy, &buf_state, &from_ring);
 
 		Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
 
@@ -1237,6 +1257,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			if (LWLockConditionalAcquire(BufferDescriptorGetContentLock(buf),
 										 LW_SHARED))
 			{
+				IOContext	io_context;
+
 				/*
 				 * If using a nondefault strategy, and writing the buffer
 				 * would require a WAL flush, let the strategy decide whether
@@ -1263,13 +1285,28 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 					}
 				}
 
+				/*
+				 * When a strategy is in use, if the target dirty buffer is an
+				 * existing strategy buffer being reused, count this as a
+				 * strategy write for the purposes of IO Operations statistics
+				 * tracking.
+				 *
+				 * All dirty shared buffers upon first being added to the ring
+				 * will be counted as shared buffer writes.
+				 *
+				 * When a strategy is not in use, the write can only be a
+				 * "regular" write of a dirty shared buffer.
+				 */
+
+				io_context = from_ring ? IOCONTEXT_STRATEGY : IOCONTEXT_SHARED;
+
 				/* OK, do the I/O */
 				TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum,
 														  smgr->smgr_rlocator.locator.spcOid,
 														  smgr->smgr_rlocator.locator.dbOid,
 														  smgr->smgr_rlocator.locator.relNumber);
 
-				FlushBuffer(buf, NULL);
+				FlushBuffer(buf, NULL, io_context);
 				LWLockRelease(BufferDescriptorGetContentLock(buf));
 
 				ScheduleBufferTagForWriteback(&BackendWritebackContext,
@@ -2573,7 +2610,7 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
 	PinBuffer_Locked(bufHdr);
 	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOCONTEXT_SHARED);
 
 	LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 
@@ -2820,9 +2857,12 @@ BufferGetTag(Buffer buffer, RelFileLocator *rlocator, ForkNumber *forknum,
  *
  * If the caller has an smgr reference for the buffer's relation, pass it
  * as the second parameter.  If not, pass NULL.
+ *
+ * The IOContext is IOCONTEXT_SHARED unless a buffer access strategy is in
+ * use and the buffer being flushed is one from the strategy ring.
  */
 static void
-FlushBuffer(BufferDesc *buf, SMgrRelation reln)
+FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOContext io_context)
 {
 	XLogRecPtr	recptr;
 	ErrorContextCallback errcallback;
@@ -2902,6 +2942,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 	 */
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
+	pgstat_count_io_op(IOOP_WRITE, io_context);
+
 	if (track_io_timing)
 		INSTR_TIME_SET_CURRENT(io_start);
 
@@ -3549,6 +3591,8 @@ FlushRelationBuffers(Relation rel)
 						  localpage,
 						  false);
 
+				pgstat_count_io_op(IOOP_WRITE, IOCONTEXT_LOCAL);
+
 				buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 
@@ -3584,7 +3628,7 @@ FlushRelationBuffers(Relation rel)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, RelationGetSmgr(rel));
+			FlushBuffer(bufHdr, RelationGetSmgr(rel), IOCONTEXT_SHARED);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr, true);
 		}
@@ -3679,7 +3723,7 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, srelent->srel);
+			FlushBuffer(bufHdr, srelent->srel, IOCONTEXT_SHARED);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr, true);
 		}
@@ -3883,7 +3927,7 @@ FlushDatabaseBuffers(Oid dbid)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, NULL);
+			FlushBuffer(bufHdr, NULL, IOCONTEXT_SHARED);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr, true);
 		}
@@ -3910,7 +3954,7 @@ FlushOneBuffer(Buffer buffer)
 
 	Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOCONTEXT_SHARED);
 }
 
 /*
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 990e081aae..237a48e8d8 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -15,6 +15,7 @@
  */
 #include "postgres.h"
 
+#include "pgstat.h"
 #include "port/atomics.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
@@ -198,13 +199,15 @@ have_free_buffer(void)
  *	return the buffer with the buffer header spinlock still held.
  */
 BufferDesc *
-StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
+StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_ring)
 {
 	BufferDesc *buf;
 	int			bgwprocno;
 	int			trycounter;
 	uint32		local_buf_state;	/* to avoid repeated (de-)referencing */
 
+	*from_ring = false;
+
 	/*
 	 * If given a strategy object, see whether it can select a buffer. We
 	 * assume strategy objects don't need buffer_strategy_lock.
@@ -212,8 +215,23 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	if (strategy != NULL)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
-		if (buf != NULL)
+		*from_ring = buf != NULL;
+		if (*from_ring)
+		{
+			/*
+			 * When a strategy is in use, reused buffers from the strategy
+			 * ring will be counted as IOCONTEXT_STRATEGY allocations for the
+			 * purposes of IO Operation statistics tracking.
+			 *
+			 * However, even when a strategy is in use, if a new buffer must
+			 * be allocated from shared buffers and added to the ring, this is
+			 * counted instead as an IOCONTEXT_SHARED allocation. So, only
+			 * reused buffers are counted as being in the IOCONTEXT_STRATEGY
+			 * IOContext.
+			 */
+			pgstat_count_io_op(IOOP_ALLOC, IOCONTEXT_STRATEGY);
 			return buf;
+		}
 	}
 
 	/*
@@ -247,6 +265,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	 * the rate of buffer consumption.  Note that buffers recycled by a
 	 * strategy object are intentionally not counted here.
 	 */
+	pgstat_count_io_op(IOOP_ALLOC, IOCONTEXT_SHARED);
 	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
 
 	/*
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 014f644bf9..a3d76599bf 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -15,6 +15,7 @@
  */
 #include "postgres.h"
 
+#include "pgstat.h"
 #include "access/parallel.h"
 #include "catalog/catalog.h"
 #include "executor/instrument.h"
@@ -196,6 +197,8 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 				LocalRefCount[b]++;
 				ResourceOwnerRememberBuffer(CurrentResourceOwner,
 											BufferDescriptorGetBuffer(bufHdr));
+
+				pgstat_count_io_op(IOOP_ALLOC, IOCONTEXT_LOCAL);
 				break;
 			}
 		}
@@ -226,6 +229,8 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 				  localpage,
 				  false);
 
+		pgstat_count_io_op(IOOP_WRITE, IOCONTEXT_LOCAL);
+
 		/* Mark not-dirty now in case we error out below */
 		buf_state &= ~BM_DIRTY;
 		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index 9d6a9e9109..f310b7a435 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -432,6 +432,15 @@ ProcessSyncRequests(void)
 					total_elapsed += elapsed;
 					processed++;
 
+					/*
+					 * Note that if a backend using a BufferAccessStrategy is
+					 * forced to do its own fsync (as opposed to the
+					 * checkpointer doing it), it will not be counted as an
+					 * IOCONTEXT_STRATEGY IOOP_FSYNC and instead will be
+					 * counted as an IOCONTEXT_SHARED IOOP_FSYNC.
+					 */
+					pgstat_count_io_op(IOOP_FSYNC, IOCONTEXT_SHARED);
+
 					if (log_checkpoints)
 						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f ms",
 							 processed,
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index a2e8507fd6..0098785089 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
 	pgstat_function.o \
+	pgstat_io_ops.o \
 	pgstat_relation.o \
 	pgstat_replslot.o \
 	pgstat_shmem.o \
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 88e5dd1b2b..c30954d90a 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -359,6 +359,15 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.snapshot_cb = pgstat_checkpointer_snapshot_cb,
 	},
 
+	[PGSTAT_KIND_IOOPS] = {
+		.name = "io_ops",
+
+		.fixed_amount = true,
+
+		.reset_all_cb = pgstat_io_ops_reset_all_cb,
+		.snapshot_cb = pgstat_io_ops_snapshot_cb,
+	},
+
 	[PGSTAT_KIND_SLRU] = {
 		.name = "slru",
 
@@ -628,6 +637,9 @@ pgstat_report_stat(bool force)
 	/* flush database / relation / function / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
+	/* flush IO Operations stats */
+	partial_flush |= pgstat_flush_io_ops(nowait);
+
 	/* flush wal stats */
 	partial_flush |= pgstat_flush_wal(nowait);
 
@@ -1312,6 +1324,14 @@ pgstat_write_statsfile(void)
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_CHECKPOINTER);
 	write_chunk_s(fpout, &pgStatLocal.snapshot.checkpointer);
 
+	/*
+	 * Write IO Operations stats struct
+	 */
+	pgstat_build_snapshot_fixed(PGSTAT_KIND_IOOPS);
+	write_chunk_s(fpout, &pgStatLocal.snapshot.io_ops.stat_reset_timestamp);
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+		write_chunk_s(fpout, &pgStatLocal.snapshot.io_ops.stats[i]);
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -1486,6 +1506,17 @@ pgstat_read_statsfile(void)
 	if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
 		goto error;
 
+	/*
+	 * Read IO Operations stats struct
+	 */
+	if (!read_chunk_s(fpin, &shmem->io_ops.stat_reset_timestamp))
+		goto error;
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		if (!read_chunk_s(fpin, &shmem->io_ops.stats[i].data))
+			goto error;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c
index fbb1edc527..3d7f90a1b7 100644
--- a/src/backend/utils/activity/pgstat_bgwriter.c
+++ b/src/backend/utils/activity/pgstat_bgwriter.c
@@ -24,7 +24,7 @@ PgStat_BgWriterStats PendingBgWriterStats = {0};
 
 
 /*
- * Report bgwriter statistics
+ * Report bgwriter and IO Operation statistics
  */
 void
 pgstat_report_bgwriter(void)
@@ -56,6 +56,11 @@ pgstat_report_bgwriter(void)
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&PendingBgWriterStats, 0, sizeof(PendingBgWriterStats));
+
+	/*
+	 * Report IO Operations statistics
+	 */
+	pgstat_flush_io_ops(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index af8d513e7b..cfcf127210 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -24,7 +24,7 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0};
 
 
 /*
- * Report checkpointer statistics
+ * Report checkpointer and IO Operation statistics
  */
 void
 pgstat_report_checkpointer(void)
@@ -62,6 +62,11 @@ pgstat_report_checkpointer(void)
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&PendingCheckpointerStats, 0, sizeof(PendingCheckpointerStats));
+
+	/*
+	 * Report IO Operation statistics
+	 */
+	pgstat_flush_io_ops(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io_ops.c b/src/backend/utils/activity/pgstat_io_ops.c
new file mode 100644
index 0000000000..72d02f4dda
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_io_ops.c
@@ -0,0 +1,297 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_io_ops.c
+ *	  Implementation of IO operation statistics.
+ *
+ * This file contains the implementation of IO operation statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of
+ * statistics.
+ *
+ * Copyright (c) 2001-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_io_ops.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+static PgStat_IOContextOps pending_IOOpStats;
+bool		have_ioopstats = false;
+
+void
+pgstat_count_io_op(IOOp io_op, IOContext io_context)
+{
+	PgStat_IOOpCounters *pending_counters = &pending_IOOpStats.data[io_context];
+
+	Assert(pgstat_expect_io_op(MyBackendType, io_context, io_op));
+
+	switch (io_op)
+	{
+		case IOOP_ALLOC:
+			pending_counters->allocs++;
+			break;
+		case IOOP_EXTEND:
+			pending_counters->extends++;
+			break;
+		case IOOP_FSYNC:
+			pending_counters->fsyncs++;
+			break;
+		case IOOP_READ:
+			pending_counters->reads++;
+			break;
+		case IOOP_WRITE:
+			pending_counters->writes++;
+			break;
+	}
+
+	have_ioopstats = true;
+}
+
+PgStat_BackendIOContextOps *
+pgstat_fetch_backend_io_context_ops(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_IOOPS);
+
+	return &pgStatLocal.snapshot.io_ops;
+}
+
+/*
+ * Flush out locally pending IO Operation statistics entries
+ *
+ * If no stats have been recorded, this function returns false.
+ *
+ * If nowait is true and the lock could not be acquired, this function
+ * returns true. Otherwise, this function returns false.
+ */
+bool
+pgstat_flush_io_ops(bool nowait)
+{
+	PgStatShared_IOContextOps *type_shstats;
+
+	if (!have_ioopstats)
+		return false;
+
+	type_shstats =
+		&pgStatLocal.shmem->io_ops.stats[MyBackendType];
+
+	if (!nowait)
+		LWLockAcquire(&type_shstats->lock, LW_EXCLUSIVE);
+	else if (!LWLockConditionalAcquire(&type_shstats->lock, LW_EXCLUSIVE))
+		return true;
+
+	for (int i = 0; i < IOCONTEXT_NUM_TYPES; i++)
+	{
+		PgStat_IOOpCounters *sharedent = &type_shstats->data[i];
+		PgStat_IOOpCounters *pendingent = &pending_IOOpStats.data[i];
+
+#define IO_OP_ACC(fld) sharedent->fld += pendingent->fld
+		IO_OP_ACC(allocs);
+		IO_OP_ACC(extends);
+		IO_OP_ACC(fsyncs);
+		IO_OP_ACC(reads);
+		IO_OP_ACC(writes);
+#undef IO_OP_ACC
+	}
+
+	LWLockRelease(&type_shstats->lock);
+
+	memset(&pending_IOOpStats, 0, sizeof(pending_IOOpStats));
+
+	have_ioopstats = false;
+
+	return false;
+}
+
+const char *
+pgstat_io_context_desc(IOContext io_context)
+{
+	switch (io_context)
+	{
+		case IOCONTEXT_LOCAL:
+			return "Local";
+		case IOCONTEXT_SHARED:
+			return "Shared";
+		case IOCONTEXT_STRATEGY:
+			return "Strategy";
+	}
+
+	elog(ERROR, "unrecognized IOContext value: %d", io_context);
+}
+
+const char *
+pgstat_io_op_desc(IOOp io_op)
+{
+	switch (io_op)
+	{
+		case IOOP_ALLOC:
+			return "Alloc";
+		case IOOP_EXTEND:
+			return "Extend";
+		case IOOP_FSYNC:
+			return "Fsync";
+		case IOOP_READ:
+			return "Read";
+		case IOOP_WRITE:
+			return "Write";
+	}
+
+	elog(ERROR, "unrecognized IOOp value: %d", io_op);
+}
+
+void
+pgstat_io_ops_reset_all_cb(TimestampTz ts)
+{
+	PgStatShared_BackendIOContextOps *backends_stats_shmem = &pgStatLocal.shmem->io_ops;
+
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		PgStatShared_IOContextOps *stats_shmem = &backends_stats_shmem->stats[i];
+
+		LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
+
+		/*
+		 * Use the lock in the first BackendType's PgStat_IOContextOps to
+		 * protect the reset timestamp as well.
+		 */
+		if (i == 0)
+			backends_stats_shmem->stat_reset_timestamp = ts;
+
+		memset(stats_shmem->data, 0, sizeof(stats_shmem->data));
+		LWLockRelease(&stats_shmem->lock);
+	}
+}
+
+void
+pgstat_io_ops_snapshot_cb(void)
+{
+	PgStatShared_BackendIOContextOps *backends_stats_shmem = &pgStatLocal.shmem->io_ops;
+	PgStat_BackendIOContextOps *backends_stats_snap = &pgStatLocal.snapshot.io_ops;
+
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		PgStatShared_IOContextOps *stats_shmem = &backends_stats_shmem->stats[i];
+		PgStat_IOContextOps *stats_snap = &backends_stats_snap->stats[i];
+
+		LWLockAcquire(&stats_shmem->lock, LW_SHARED);
+
+		/*
+		 * Use the lock in the first BackendType's PgStat_IOContextOps to
+		 * protect the reset timestamp as well.
+		 */
+		if (i == 0)
+			backends_stats_snap->stat_reset_timestamp = backends_stats_shmem->stat_reset_timestamp;
+
+		memcpy(stats_snap->data, stats_shmem->data, sizeof(stats_shmem->data));
+		LWLockRelease(&stats_shmem->lock);
+	}
+
+}
+
+/*
+ * IO Operation statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the cumulative stats
+ * subsystem or do not do IO operations worth reporting statistics on:
+ * - Syslogger because it is not connected to shared memory
+ * - Archiver because most relevant archiving IO is delegated to a
+ *   specialized command or module
+ * - WAL Receiver and WAL Writer IO is not tracked in pg_stat_io for now
+ *
+ * Function returns true if BackendType participates in the cumulative stats
+ * subsystem for IO Operations and false if it does not.
+ */
+bool
+pgstat_io_op_stats_collected(BackendType bktype)
+{
+	return bktype != B_INVALID && bktype != B_ARCHIVER && bktype != B_LOGGER &&
+		bktype != B_WAL_RECEIVER && bktype != B_WAL_WRITER;
+}
+
+bool
+pgstat_bktype_io_context_valid(BackendType bktype, IOContext io_context)
+{
+	bool		no_strategy;
+	bool		no_local;
+
+	/*
+	 * Not all BackendTypes will use a BufferAccessStrategy.
+	 */
+	no_strategy = bktype == B_AUTOVAC_LAUNCHER || bktype ==
+		B_BG_WRITER || bktype == B_CHECKPOINTER;
+
+	/*
+	 * Only regular backends and WAL Sender processes executing queries should
+	 * use local buffers.
+	 */
+	no_local = bktype == B_AUTOVAC_LAUNCHER || bktype ==
+		B_BG_WRITER || bktype == B_CHECKPOINTER || bktype ==
+		B_AUTOVAC_WORKER || bktype == B_BG_WORKER || bktype ==
+		B_STANDALONE_BACKEND || bktype == B_STARTUP;
+
+	if (io_context == IOCONTEXT_STRATEGY && no_strategy)
+		return false;
+
+	if (io_context == IOCONTEXT_LOCAL && no_local)
+		return false;
+
+	return true;
+}
+
+bool
+pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op)
+{
+	if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) && io_op ==
+		IOOP_READ)
+		return false;
+
+	if ((bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER || bktype ==
+		 B_CHECKPOINTER) && io_op == IOOP_EXTEND)
+		return false;
+
+	return true;
+}
+
+bool
+pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op)
+{
+	/*
+	 * Temporary tables using local buffers are not logged and thus do not
+	 * require fsync'ing. Set this cell to NULL to differentiate between an
+	 * invalid combination and 0 observed IO Operations.
+	 *
+	 * IOOP_FSYNC IOOps done by a backend using a BufferAccessStrategy are
+	 * counted in the IOCONTEXT_SHARED IOContext. See comment in
+	 * ForwardSyncRequest() for more details.
+	 */
+	if ((io_context == IOCONTEXT_LOCAL || io_context == IOCONTEXT_STRATEGY) &&
+		io_op == IOOP_FSYNC)
+		return false;
+
+	return true;
+}
+
+bool
+pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op)
+{
+	if (!pgstat_io_op_stats_collected(bktype))
+		return false;
+
+	if (!pgstat_bktype_io_context_valid(bktype, io_context))
+		return false;
+
+	if (!pgstat_bktype_io_op_valid(bktype, io_op))
+		return false;
+
+	if (!pgstat_io_context_io_op_valid(io_context, io_op))
+		return false;
+
+	/*
+	 * There are currently no cases where a specific combination of
+	 * BackendType, IOContext, and IOOp is invalid beyond the checks above.
+	 */
+	return true;
+}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index a846d9ffb6..7a2fd1ccf9 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -205,7 +205,7 @@ pgstat_drop_relation(Relation rel)
 }
 
 /*
- * Report that the table was just vacuumed.
+ * Report that the table was just vacuumed and flush IO Operation statistics.
  */
 void
 pgstat_report_vacuum(Oid tableoid, bool shared,
@@ -257,10 +257,18 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	}
 
 	pgstat_unlock_entry(entry_ref);
+
+	/*
+	 * Flush IO Operations statistics now. pgstat_report_stat() will flush IO
+	 * Operation stats, however this will not be called after an entire
+	 * autovacuum cycle is done -- which will likely vacuum many relations --
+	 * or until the VACUUM command has processed all tables and committed.
+	 */
+	pgstat_flush_io_ops(false);
 }
 
 /*
- * Report that the table was just analyzed.
+ * Report that the table was just analyzed and flush IO Operation statistics.
  *
  * Caller must provide new live- and dead-tuples estimates, as well as a
  * flag indicating whether to reset the changes_since_analyze counter.
@@ -340,6 +348,9 @@ pgstat_report_analyze(Relation rel,
 	}
 
 	pgstat_unlock_entry(entry_ref);
+
+	/* see pgstat_report_vacuum() */
+	pgstat_flush_io_ops(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 89060ef29a..2acfeb3192 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -202,6 +202,10 @@ StatsShmemInit(void)
 		LWLockInitialize(&ctl->checkpointer.lock, LWTRANCHE_PGSTATS_DATA);
 		LWLockInitialize(&ctl->slru.lock, LWTRANCHE_PGSTATS_DATA);
 		LWLockInitialize(&ctl->wal.lock, LWTRANCHE_PGSTATS_DATA);
+
+		for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+			LWLockInitialize(&ctl->io_ops.stats[i].lock,
+							 LWTRANCHE_PGSTATS_DATA);
 	}
 	else
 	{
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 5a878bd115..9cac407b42 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -34,7 +34,7 @@ static WalUsage prevWalUsage;
 
 /*
  * Calculate how much WAL usage counters have increased and update
- * shared statistics.
+ * shared WAL and IO Operation statistics.
  *
  * Must be called by processes that generate WAL, that do not call
  * pgstat_report_stat(), like walwriter.
@@ -43,6 +43,8 @@ void
 pgstat_report_wal(bool force)
 {
 	pgstat_flush_wal(force);
+
+	pgstat_flush_io_ops(force);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index d9e2a79382..cda4447e53 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2092,6 +2092,8 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		pgstat_reset_of_kind(PGSTAT_KIND_BGWRITER);
 		pgstat_reset_of_kind(PGSTAT_KIND_CHECKPOINTER);
 	}
+	else if (strcmp(target, "io") == 0)
+		pgstat_reset_of_kind(PGSTAT_KIND_IOOPS);
 	else if (strcmp(target, "recovery_prefetch") == 0)
 		XLogPrefetchResetStats();
 	else if (strcmp(target, "wal") == 0)
@@ -2100,7 +2102,7 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\", \"io\", \"recovery_prefetch\", or \"wal\".")));
 
 	PG_RETURN_VOID();
 }
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 7c41b27994..f65e9635a3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -331,6 +331,8 @@ typedef enum BackendType
 	B_WAL_WRITER,
 } BackendType;
 
+#define BACKEND_NUM_TYPES (B_WAL_WRITER + 1)
+
 extern PGDLLIMPORT BackendType MyBackendType;
 
 extern const char *GetBackendTypeDesc(BackendType backendType);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ac28f813b4..83b416c59d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -48,6 +48,7 @@ typedef enum PgStat_Kind
 	PGSTAT_KIND_ARCHIVER,
 	PGSTAT_KIND_BGWRITER,
 	PGSTAT_KIND_CHECKPOINTER,
+	PGSTAT_KIND_IOOPS,
 	PGSTAT_KIND_SLRU,
 	PGSTAT_KIND_WAL,
 } PgStat_Kind;
@@ -242,7 +243,7 @@ typedef struct PgStat_TableXactStatus
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BCA7
+#define PGSTAT_FILE_FORMAT_ID	0x01A5BCA8
 
 typedef struct PgStat_ArchiverStats
 {
@@ -276,6 +277,50 @@ typedef struct PgStat_CheckpointerStats
 	PgStat_Counter buf_fsync_backend;
 } PgStat_CheckpointerStats;
 
+/*
+ * Types related to counting IO Operations for various IO Contexts
+ */
+
+typedef enum IOOp
+{
+	IOOP_ALLOC,
+	IOOP_EXTEND,
+	IOOP_FSYNC,
+	IOOP_READ,
+	IOOP_WRITE,
+}			IOOp;
+
+#define IOOP_NUM_TYPES (IOOP_WRITE + 1)
+
+typedef enum IOContext
+{
+	IOCONTEXT_LOCAL,
+	IOCONTEXT_SHARED,
+	IOCONTEXT_STRATEGY,
+}			IOContext;
+
+#define IOCONTEXT_NUM_TYPES (IOCONTEXT_STRATEGY + 1)
+
+typedef struct PgStat_IOOpCounters
+{
+	PgStat_Counter allocs;
+	PgStat_Counter extends;
+	PgStat_Counter fsyncs;
+	PgStat_Counter reads;
+	PgStat_Counter writes;
+}			PgStat_IOOpCounters;
+
+typedef struct PgStat_IOContextOps
+{
+	PgStat_IOOpCounters data[IOCONTEXT_NUM_TYPES];
+}			PgStat_IOContextOps;
+
+typedef struct PgStat_BackendIOContextOps
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_IOContextOps stats[BACKEND_NUM_TYPES];
+}			PgStat_BackendIOContextOps;
+
 typedef struct PgStat_StatDBEntry
 {
 	PgStat_Counter n_xact_commit;
@@ -453,6 +498,62 @@ extern void pgstat_report_checkpointer(void);
 extern PgStat_CheckpointerStats *pgstat_fetch_stat_checkpointer(void);
 
 
+/*
+ * Functions in pgstat_io_ops.c
+ */
+
+extern void pgstat_count_io_op(IOOp io_op, IOContext io_context);
+extern PgStat_BackendIOContextOps * pgstat_fetch_backend_io_context_ops(void);
+extern bool pgstat_flush_io_ops(bool nowait);
+extern const char *pgstat_io_context_desc(IOContext io_context);
+extern const char *pgstat_io_op_desc(IOOp io_op);
+
+/* Validation functions in pgstat_io_ops.c */
+extern bool pgstat_io_op_stats_collected(BackendType bktype);
+extern bool pgstat_bktype_io_context_valid(BackendType bktype, IOContext io_context);
+extern bool pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op);
+extern bool pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op);
+extern bool pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op);
+
+/*
+ * Functions to assert that invalid IO Operation counters are zero. Used with
+ * the validation functions in pgstat_io_ops.c
+ */
+static inline void
+pgstat_io_context_ops_assert_zero(PgStat_IOOpCounters * counters)
+{
+	Assert(counters->allocs == 0 && counters->extends == 0 &&
+		   counters->fsyncs == 0 && counters->reads == 0 &&
+		   counters->writes == 0);
+}
+
+static inline void
+pgstat_io_op_assert_zero(PgStat_IOOpCounters * counters, IOOp io_op)
+{
+	switch (io_op)
+	{
+		case IOOP_ALLOC:
+			Assert(counters->allocs == 0);
+			return;
+		case IOOP_EXTEND:
+			Assert(counters->extends == 0);
+			return;
+		case IOOP_FSYNC:
+			Assert(counters->fsyncs == 0);
+			return;
+		case IOOP_READ:
+			Assert(counters->reads == 0);
+			return;
+		case IOOP_WRITE:
+			Assert(counters->writes == 0);
+			return;
+	}
+
+	elog(ERROR, "unrecognized IOOp value: %d", io_op);
+}
+
+
+
 /*
  * Functions in pgstat_database.c
  */
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 72466551d7..aa064173ee 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -346,7 +346,7 @@ extern void ScheduleBufferTagForWriteback(WritebackContext *context, BufferTag *
 
 /* freelist.c */
 extern BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
-									 uint32 *buf_state);
+									 uint32 *buf_state, bool *from_ring);
 extern void StrategyFreeBuffer(BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
 								 BufferDesc *buf);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 9303d05427..26e8ec2331 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -329,6 +329,24 @@ typedef struct PgStatShared_Checkpointer
 	PgStat_CheckpointerStats reset_offset;
 } PgStatShared_Checkpointer;
 
+typedef struct PgStatShared_IOContextOps
+{
+	/*
+	 * lock protects ->data; the lock of PgStatShared_BackendIOContextOps->
+	 * stats[0] additionally protects ->stat_reset_timestamp.
+	 */
+	LWLock		lock;
+	PgStat_IOOpCounters data[IOCONTEXT_NUM_TYPES];
+}			PgStatShared_IOContextOps;
+
+typedef struct PgStatShared_BackendIOContextOps
+{
+	/* ->stat_reset_timestamp is protected by ->stats[0].lock */
+	TimestampTz stat_reset_timestamp;
+	PgStatShared_IOContextOps stats[BACKEND_NUM_TYPES];
+}			PgStatShared_BackendIOContextOps;
+
+
 typedef struct PgStatShared_SLRU
 {
 	/* lock protects ->stats */
@@ -419,6 +437,7 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Archiver archiver;
 	PgStatShared_BgWriter bgwriter;
 	PgStatShared_Checkpointer checkpointer;
+	PgStatShared_BackendIOContextOps io_ops;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
 } PgStat_ShmemControl;
@@ -442,6 +461,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_CheckpointerStats checkpointer;
 
+	PgStat_BackendIOContextOps io_ops;
+
 	PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS];
 
 	PgStat_WalStats wal;
@@ -549,6 +570,14 @@ extern void pgstat_database_reset_timestamp_cb(PgStatShared_Common *header, Time
 extern bool pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 
 
+/*
+ * Functions in pgstat_io_ops.c
+ */
+
+extern void pgstat_io_ops_reset_all_cb(TimestampTz ts);
+extern void pgstat_io_ops_snapshot_cb(void);
+
+
 /*
  * Functions in pgstat_relation.c
  */
-- 
2.34.1

From 17036e78a92a75da83ea7811fd738d490fc6c65e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 11 Aug 2022 18:28:50 -0400
Subject: [PATCH v27 4/4] Add system view tracking IO ops per backend type

Add pg_stat_io, a system view which tracks the number of IOOps (allocs,
extends, fsyncs, reads, and writes) done through each IOContext (shared
buffers, local buffers, strategy buffers) by each type of backend (e.g.
client backend, checkpointer).

Some BackendTypes do not accumulate IO operation statistics and will
not be included in the view.

Some IOContexts are not used by some BackendTypes and will not be in the
view. For example, checkpointer does not use a BufferAccessStrategy
(currently), so there will be no row for the "strategy" IOContext for
checkpointer.

Some IOOps are invalid in combination with certain IOContexts. Those
cells will be NULL in the view to distinguish between 0 observed IOOps
of that type and an invalid combination. For example, local buffers are
not fsync'd, so cells for all BackendTypes for IOCONTEXT_LOCAL and
IOOP_FSYNC will be NULL.

Some BackendTypes never perform certain IOOps. Those cells will also be
NULL in the view. For example, bgwriter should not perform reads.

View stats are fetched from statistics incremented when a backend
performs an IO Operation and maintained by the cumulative statistics
subsystem.

Each row of the view is stats for a particular BackendType for a
particular IOContext (e.g. shared buffer accesses by checkpointer) and
each column in the view is the total number of IO Operations done (e.g.
writes).
So a cell in the view would be, for example, the number of shared
buffers written by checkpointer since the last stats reset.
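
For illustration only (not part of the patch), such a cell could be
read with a query like the following, assuming the 'checkpointer'
backend_type and 'Shared' io_context descriptions used by the
regression tests below:

```sql
-- Hypothetical example: shared-buffer writes by the checkpointer
-- since the last 'io' stats reset.
SELECT backend_type, io_context, write, stats_reset
FROM pg_stat_io
WHERE backend_type = 'checkpointer'
  AND io_context = 'Shared';
```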

Note that some of the cells in the view are redundant with fields in
pg_stat_bgwriter (e.g. buffers_backend), however these have been kept in
pg_stat_bgwriter for backwards compatibility. Deriving the redundant
pg_stat_bgwriter stats from the IO operations stats structures was also
problematic due to the separate reset targets for 'bgwriter' and
'io'.

Suggested by Andres Freund

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Justin Pryzby <pry...@telsasoft.com>, Kyotaro Horiguchi <horikyota....@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml         | 115 ++++++++++++++-
 src/backend/catalog/system_views.sql |  12 ++
 src/backend/utils/adt/pgstatfuncs.c  | 102 ++++++++++++++
 src/include/catalog/pg_proc.dat      |   9 ++
 src/test/regress/expected/rules.out  |   9 ++
 src/test/regress/expected/stats.out  | 201 +++++++++++++++++++++++++++
 src/test/regress/sql/stats.sql       | 103 ++++++++++++++
 7 files changed, 550 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 14d97ec92c..98750121c5 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -448,6 +448,15 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
+      <entry>A row for each IO Context for each backend type showing
+      statistics about backend IO operations. See
+       <link linkend="monitoring-pg-stat-io-view">
+       <structname>pg_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
       <entry>One row only, showing statistics about WAL activity. See
@@ -3600,7 +3609,111 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
       </para>
       <para>
-       Time at which these statistics were last reset
+       Time at which these statistics were last reset.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+ </sect2>
+
+ <sect2 id="monitoring-pg-stat-io-view">
+  <title><structname>pg_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_io</structname> view has one row for each
+   combination of backend type and IO Context, showing cluster-wide
+   statistics for that backend type and IO Context.
+  </para>
+
+  <table id="pg-stat-io-view" xreflabel="pg_stat_io">
+   <title><structname>pg_stat_io</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>backend_type</structfield> <type>text</type>
+      </para>
+      <para>
+       Type of backend (e.g. background worker, autovacuum worker).
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>io_context</structfield> <type>text</type>
+      </para>
+      <para>
+       IO Context used (e.g. shared buffers, local buffers, strategy).
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>alloc</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of buffers allocated.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>extend</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of blocks extended.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>fsync</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of fsync calls.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>read</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of blocks read.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>write</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of blocks written.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset.
       </para></entry>
      </row>
     </tbody>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f369b1fc14..5fab964219 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1115,6 +1115,18 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_io AS
+SELECT
+       b.backend_type,
+       b.io_context,
+       b.alloc,
+       b.extend,
+       b.fsync,
+       b.read,
+       b.write,
+       b.stats_reset
+FROM pg_stat_get_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cda4447e53..821216d01e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1733,6 +1733,108 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
 }
 
+Datum
+pg_stat_get_io(PG_FUNCTION_ARGS)
+{
+	PgStat_BackendIOContextOps *backends_io_stats;
+	ReturnSetInfo *rsinfo;
+	Datum		reset_time;
+
+	/*
+	 * When adding a new column to the pg_stat_io view, add a new enum value
+	 * here above IO_NUM_COLUMNS.
+	 */
+	enum
+	{
+		IO_COLUMN_BACKEND_TYPE,
+		IO_COLUMN_IO_CONTEXT,
+		IO_COLUMN_ALLOCS,
+		IO_COLUMN_EXTENDS,
+		IO_COLUMN_FSYNCS,
+		IO_COLUMN_READS,
+		IO_COLUMN_WRITES,
+		IO_COLUMN_RESET_TIME,
+		IO_NUM_COLUMNS,
+	};
+
+#define IO_COLUMN_IOOP_OFFSET (IO_COLUMN_IO_CONTEXT + 1)
+
+	SetSingleFuncCall(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backends_io_stats = pgstat_fetch_backend_io_context_ops();
+
+	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
+
+	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
+	{
+		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		PgStat_IOContextOps *io_context_ops = &backends_io_stats->stats[bktype];
+
+		/*
+		 * For those BackendTypes without IO Operation stats, skip
+		 * representing them in the view altogether.
+		 */
+		if (!pgstat_io_op_stats_collected(bktype))
+		{
+			for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+				pgstat_io_context_ops_assert_zero(&io_context_ops->data[io_context]);
+			continue;
+		}
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			PgStat_IOOpCounters *counters = &io_context_ops->data[io_context];
+			Datum		values[IO_NUM_COLUMNS];
+			bool		nulls[IO_NUM_COLUMNS];
+
+			/*
+			 * Some combinations of IOCONTEXT and BackendType are not valid
+			 * for any type of IO Operation. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_bktype_io_context_valid(bktype, io_context))
+			{
+				pgstat_io_context_ops_assert_zero(counters);
+				continue;
+			}
+
+			memset(values, 0, sizeof(values));
+			memset(nulls, 0, sizeof(nulls));
+
+			values[IO_COLUMN_BACKEND_TYPE] = bktype_desc;
+			values[IO_COLUMN_IO_CONTEXT] =
+				CStringGetTextDatum(pgstat_io_context_desc(io_context));
+			values[IO_COLUMN_ALLOCS] = Int64GetDatum(counters->allocs);
+			values[IO_COLUMN_EXTENDS] = Int64GetDatum(counters->extends);
+			values[IO_COLUMN_FSYNCS] = Int64GetDatum(counters->fsyncs);
+			values[IO_COLUMN_READS] = Int64GetDatum(counters->reads);
+			values[IO_COLUMN_WRITES] = Int64GetDatum(counters->writes);
+			values[IO_COLUMN_RESET_TIME] = TimestampTzGetDatum(reset_time);
+
+
+			/*
+			 * Some combinations of BackendType and IOOp and of IOContext and
+			 * IOOp are not valid. Set these cells in the view to NULL and assert
+			 * that these stats are zero as expected.
+			 */
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				if (!pgstat_bktype_io_op_valid(bktype, io_op) ||
+					!pgstat_io_context_io_op_valid(io_context, io_op))
+				{
+					pgstat_io_op_assert_zero(counters, io_op);
+					nulls[io_op + IO_COLUMN_IOOP_OFFSET] = true;
+				}
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index be47583122..4aefebc7f8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5646,6 +5646,15 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '8459', descr => 'statistics: per backend type IO statistics',
+  proname => 'pg_stat_get_io', provolatile => 'v',
+  prorows => '14', proretset => 't',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,io_context,alloc,extend,fsync,read,write,stats_reset}',
+  prosrc => 'pg_stat_get_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7ec3d2688f..d122c36556 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1873,6 +1873,15 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
   WHERE (s.client_port IS NOT NULL);
+pg_stat_io| SELECT b.backend_type,
+    b.io_context,
+    b.alloc,
+    b.extend,
+    b.fsync,
+    b.read,
+    b.write,
+    b.stats_reset
+   FROM pg_stat_get_io() b(backend_type, io_context, alloc, extend, fsync, read, write, stats_reset);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6b233ff4c0..a75fc91c57 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -796,4 +796,205 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
+-- Test that allocs, extends, reads, and writes to Shared Buffers and fsyncs
+-- done to ensure durability of Shared Buffers are tracked in pg_stat_io.
+SELECT sum(alloc) AS io_sum_shared_allocs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(extend) AS io_sum_shared_extends_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(fsync) AS io_sum_shared_fsyncs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(read) AS io_sum_shared_reads_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(write) AS io_sum_shared_writes_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+-- Create a regular table and insert some data to generate IOCONTEXT_SHARED allocs and extends.
+CREATE TABLE test_io_shared(a int);
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+-- After a checkpoint, there should be some additional IOCONTEXT_SHARED writes and fsyncs.
+CHECKPOINT;
+SELECT sum(alloc) AS io_sum_shared_allocs_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(extend) AS io_sum_shared_extends_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(write) AS io_sum_shared_writes_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(fsync) AS io_sum_shared_fsyncs_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT :io_sum_shared_allocs_after > :io_sum_shared_allocs_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_shared_extends_after > :io_sum_shared_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off' OR :io_sum_shared_fsyncs_after > :io_sum_shared_fsyncs_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_shared_writes_after > :io_sum_shared_writes_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Change the tablespace so that the table is rewritten directly, then SELECT
+-- from it to cause it to be read back into Shared Buffers.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
+ALTER TABLE test_io_shared SET TABLESPACE test_io_shared_stats_tblspc;
+SELECT COUNT(*) FROM test_io_shared;
+ count 
+-------
+   100
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(read) AS io_sum_shared_reads_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT :io_sum_shared_reads_after > :io_sum_shared_reads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_io_shared;
+DROP TABLESPACE test_io_shared_stats_tblspc;
+-- Test that allocs, extends, reads, and writes of temporary tables are tracked
+-- in pg_stat_io.
+CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
+SELECT sum(alloc) AS io_sum_local_allocs_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(extend) AS io_sum_local_extends_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(read) AS io_sum_local_reads_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(write) AS io_sum_local_writes_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+-- Insert enough values that we need to reuse and write out dirty local
+-- buffers.
+INSERT INTO test_io_local SELECT generate_series(1, 80000) as id,
+'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
+-- Read in evicted buffers.
+SELECT COUNT(*) FROM test_io_local;
+ count 
+-------
+ 80000
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(alloc) AS io_sum_local_allocs_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(extend) AS io_sum_local_extends_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(read) AS io_sum_local_reads_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(write) AS io_sum_local_writes_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT :io_sum_local_allocs_after > :io_sum_local_allocs_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_local_extends_after > :io_sum_local_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_local_reads_after > :io_sum_local_reads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_local_writes_after > :io_sum_local_writes_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test that, when using a Strategy, reusing buffers from the Strategy ring
+-- counts as "Strategy" allocs in pg_stat_io. Also test that Strategy reads are
+-- counted as such.
+-- Set wal_skip_threshold smaller than the expected size of test_io_strategy so
+-- that, even if wal_level is minimal, VACUUM FULL will fsync the newly
+-- rewritten test_io_strategy instead of writing it to WAL. Writing it to WAL
+-- will result in the newly written relation pages being in shared buffers --
+-- preventing us from testing BufferAccessStrategy allocs and reads.
+SET wal_skip_threshold = '1 kB';
+SELECT sum(alloc) AS io_sum_strategy_allocs_before FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT sum(read) AS io_sum_strategy_reads_before FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+CREATE TABLE test_io_strategy(a INT, b INT);
+ALTER TABLE test_io_strategy SET (autovacuum_enabled = 'false');
+INSERT INTO test_io_strategy SELECT i, i from generate_series(1, 8000)i;
+-- Ensure that the next VACUUM will need to perform IO by rewriting the table
+-- first with VACUUM (FULL).
+VACUUM (FULL) test_io_strategy;
+VACUUM (PARALLEL 0) test_io_strategy;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(alloc) AS io_sum_strategy_allocs_after FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT sum(read) AS io_sum_strategy_reads_after FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT :io_sum_strategy_allocs_after > :io_sum_strategy_allocs_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_strategy_reads_after > :io_sum_strategy_reads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_io_strategy;
+-- Hope that the previous value of wal_skip_threshold was the default. We
+-- can't use BEGIN...SET LOCAL since VACUUM can't be run inside a transaction
+-- block.
+RESET wal_skip_threshold;
+-- Test that, when using a Strategy, if creating a relation, Strategy extends
+-- are counted in pg_stat_io.
+-- A CTAS uses a Bulkwrite strategy.
+SELECT sum(extend) AS io_sum_strategy_extends_before FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+CREATE TABLE test_io_strategy_extend AS SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extend) AS io_sum_strategy_extends_after FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT :io_sum_strategy_extends_after > :io_sum_strategy_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_io_strategy_extend;
+-- Test stats reset
+SELECT sum(alloc) + sum(extend) + sum(fsync) + sum(read) + sum(write) AS io_stats_pre_reset FROM pg_stat_io \gset
+SELECT pg_stat_reset_shared('io');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT sum(alloc) + sum(extend) + sum(fsync) + sum(read) + sum(write) AS io_stats_post_reset FROM pg_stat_io \gset
+SELECT :io_stats_post_reset < :io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 096f00ce8b..090cc67296 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -396,4 +396,107 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
+
+-- Test that allocs, extends, reads, and writes to Shared Buffers and fsyncs
+-- done to ensure durability of Shared Buffers are tracked in pg_stat_io.
+SELECT sum(alloc) AS io_sum_shared_allocs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(extend) AS io_sum_shared_extends_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(fsync) AS io_sum_shared_fsyncs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(read) AS io_sum_shared_reads_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(write) AS io_sum_shared_writes_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
+-- Create a regular table and insert some data to generate IOCONTEXT_SHARED allocs and extends.
+CREATE TABLE test_io_shared(a int);
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+-- After a checkpoint, there should be some additional IOCONTEXT_SHARED writes and fsyncs.
+CHECKPOINT;
+SELECT sum(alloc) AS io_sum_shared_allocs_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(extend) AS io_sum_shared_extends_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(write) AS io_sum_shared_writes_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT sum(fsync) AS io_sum_shared_fsyncs_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT :io_sum_shared_allocs_after > :io_sum_shared_allocs_before;
+SELECT :io_sum_shared_extends_after > :io_sum_shared_extends_before;
+SELECT current_setting('fsync') = 'off' OR :io_sum_shared_fsyncs_after > :io_sum_shared_fsyncs_before;
+SELECT :io_sum_shared_writes_after > :io_sum_shared_writes_before;
+-- Change the tablespace so that the table is rewritten directly, then SELECT
+-- from it to cause it to be read back into Shared Buffers.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
+ALTER TABLE test_io_shared SET TABLESPACE test_io_shared_stats_tblspc;
+SELECT COUNT(*) FROM test_io_shared;
+SELECT pg_stat_force_next_flush();
+SELECT sum(read) AS io_sum_shared_reads_after FROM pg_stat_io WHERE io_context = 'Shared' \gset
+SELECT :io_sum_shared_reads_after > :io_sum_shared_reads_before;
+DROP TABLE test_io_shared;
+DROP TABLESPACE test_io_shared_stats_tblspc;
+
+-- Test that allocs, extends, reads, and writes of temporary tables are tracked
+-- in pg_stat_io.
+CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
+SELECT sum(alloc) AS io_sum_local_allocs_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(extend) AS io_sum_local_extends_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(read) AS io_sum_local_reads_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(write) AS io_sum_local_writes_before FROM pg_stat_io WHERE io_context = 'Local' \gset
+-- Insert enough values that we need to reuse and write out dirty local
+-- buffers.
+INSERT INTO test_io_local SELECT generate_series(1, 80000) as id,
+'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
+-- Read in evicted buffers.
+SELECT COUNT(*) FROM test_io_local;
+SELECT pg_stat_force_next_flush();
+SELECT sum(alloc) AS io_sum_local_allocs_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(extend) AS io_sum_local_extends_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(read) AS io_sum_local_reads_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT sum(write) AS io_sum_local_writes_after FROM pg_stat_io WHERE io_context = 'Local' \gset
+SELECT :io_sum_local_allocs_after > :io_sum_local_allocs_before;
+SELECT :io_sum_local_extends_after > :io_sum_local_extends_before;
+SELECT :io_sum_local_reads_after > :io_sum_local_reads_before;
+SELECT :io_sum_local_writes_after > :io_sum_local_writes_before;
+
+-- Test that, when using a Strategy, reusing buffers from the Strategy ring
+-- counts as "Strategy" allocs in pg_stat_io. Also test that Strategy reads
+-- are counted as such.
+
+-- Set wal_skip_threshold smaller than the expected size of test_io_strategy so
+-- that, even if wal_level is minimal, VACUUM FULL will fsync the newly
+-- rewritten test_io_strategy instead of writing it to WAL. Writing it to WAL
+-- would leave the newly written relation pages in shared buffers, preventing
+-- us from testing BufferAccessStrategy allocs and reads.
+SET wal_skip_threshold = '1 kB';
+SELECT sum(alloc) AS io_sum_strategy_allocs_before FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT sum(read) AS io_sum_strategy_reads_before FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+CREATE TABLE test_io_strategy(a INT, b INT);
+ALTER TABLE test_io_strategy SET (autovacuum_enabled = 'false');
+INSERT INTO test_io_strategy SELECT i, i from generate_series(1, 8000)i;
+-- Ensure that the next VACUUM will need to perform IO by rewriting the table
+-- first with VACUUM (FULL).
+VACUUM (FULL) test_io_strategy;
+VACUUM (PARALLEL 0) test_io_strategy;
+SELECT pg_stat_force_next_flush();
+SELECT sum(alloc) AS io_sum_strategy_allocs_after FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT sum(read) AS io_sum_strategy_reads_after FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT :io_sum_strategy_allocs_after > :io_sum_strategy_allocs_before;
+SELECT :io_sum_strategy_reads_after > :io_sum_strategy_reads_before;
+DROP TABLE test_io_strategy;
+-- RESET restores the default value of wal_skip_threshold, which may differ
+-- from the value in effect before this test. We can't use BEGIN ... SET LOCAL
+-- since VACUUM can't be run inside a transaction block.
+RESET wal_skip_threshold;
+
+-- Test that Strategy extends are counted in pg_stat_io when a relation is
+-- created using a Strategy.
+-- A CTAS uses a Bulkwrite strategy.
+SELECT sum(extend) AS io_sum_strategy_extends_before FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+CREATE TABLE test_io_strategy_extend AS SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+SELECT sum(extend) AS io_sum_strategy_extends_after FROM pg_stat_io WHERE io_context = 'Strategy' \gset
+SELECT :io_sum_strategy_extends_after > :io_sum_strategy_extends_before;
+DROP TABLE test_io_strategy_extend;
+
+-- Test stats reset
+SELECT sum(alloc) + sum(extend) + sum(fsync) + sum(read) + sum(write) AS io_stats_pre_reset FROM pg_stat_io \gset
+SELECT pg_stat_reset_shared('io');
+SELECT sum(alloc) + sum(extend) + sum(fsync) + sum(read) + sum(write) AS io_stats_post_reset FROM pg_stat_io \gset
+SELECT :io_stats_post_reset < :io_stats_pre_reset;
+
 -- End of Stats Test
-- 
2.34.1

From 2f67aadbf36ddca626d03474b962346ffeab89fb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 11 Aug 2022 18:28:24 -0400
Subject: [PATCH v27 1/4] Add BackendType for standalone backends

All backends should have a BackendType to enable statistics reporting
per BackendType.

Add a new BackendType for standalone backends, B_STANDALONE_BACKEND (and
alphabetize the BackendTypes). Both the bootstrap backend and single
user mode backends will have BackendType B_STANDALONE_BACKEND.

Author: Melanie Plageman <melanieplage...@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAAKRu_aaq33UnG4TXq3S-OSXGWj1QGf0sU%2BECH4tNwGFNERkZA%40mail.gmail.com
---
 src/backend/utils/init/miscinit.c | 17 +++++++++++------
 src/include/miscadmin.h           |  5 +++--
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index bd973ba613..bf3871a774 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -176,6 +176,8 @@ InitStandaloneProcess(const char *argv0)
 {
 	Assert(!IsPostmasterEnvironment);
 
+	MyBackendType = B_STANDALONE_BACKEND;
+
 	/*
 	 * Start our win32 signal implementation
 	 */
@@ -255,6 +257,9 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_INVALID:
 			backendDesc = "not initialized";
 			break;
+		case B_ARCHIVER:
+			backendDesc = "archiver";
+			break;
 		case B_AUTOVAC_LAUNCHER:
 			backendDesc = "autovacuum launcher";
 			break;
@@ -273,6 +278,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_CHECKPOINTER:
 			backendDesc = "checkpointer";
 			break;
+		case B_LOGGER:
+			backendDesc = "logger";
+			break;
+		case B_STANDALONE_BACKEND:
+			backendDesc = "standalone backend";
+			break;
 		case B_STARTUP:
 			backendDesc = "startup";
 			break;
@@ -285,12 +296,6 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
-		case B_ARCHIVER:
-			backendDesc = "archiver";
-			break;
-		case B_LOGGER:
-			backendDesc = "logger";
-			break;
 	}
 
 	return backendDesc;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 067b729d5a..7c41b27994 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -316,18 +316,19 @@ extern void SwitchBackToLocalLatch(void);
 typedef enum BackendType
 {
 	B_INVALID = 0,
+	B_ARCHIVER,
 	B_AUTOVAC_LAUNCHER,
 	B_AUTOVAC_WORKER,
 	B_BACKEND,
 	B_BG_WORKER,
 	B_BG_WRITER,
 	B_CHECKPOINTER,
+	B_LOGGER,
+	B_STANDALONE_BACKEND,
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
 	B_WAL_WRITER,
-	B_ARCHIVER,
-	B_LOGGER,
 } BackendType;
 
 extern PGDLLIMPORT BackendType MyBackendType;
-- 
2.34.1

From 8035d5ca54b670e366af4a8688b942dad299eb66 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 11 Aug 2022 18:28:40 -0400
Subject: [PATCH v27 2/4] Remove unneeded call to pgstat_report_wal()

pgstat_report_stat() is called before shutdown, so the explicit call to
pgstat_report_wal() is redundant.
---
 src/backend/postmaster/walwriter.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index e926f8c27c..beb46dcb55 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -293,18 +293,7 @@ HandleWalWriterInterrupts(void)
 	}
 
 	if (ShutdownRequestPending)
-	{
-		/*
-		 * Force reporting remaining WAL statistics at process exit.
-		 *
-		 * Since pgstat_report_wal is invoked with 'force' is false in main
-		 * loop to avoid overloading the cumulative stats system, there may
-		 * exist unreported stats counters for the WAL writer.
-		 */
-		pgstat_report_wal(true);
-
 		proc_exit(0);
-	}
 
 	/* Perform logging of memory contexts of this process */
 	if (LogMemoryContextPending)
-- 
2.34.1
