v49 attached

On Tue, Jan 17, 2023 at 2:12 PM Andres Freund <and...@anarazel.de> wrote:
> On 2023-01-17 12:22:14 -0500, Melanie Plageman wrote:
>
> > > > +typedef struct PgStat_BackendIO
> > > > +{
> > > > +     PgStat_Counter 
> > > > data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES];
> > > > +} PgStat_BackendIO;
> > >
> > > Would it bother you if we swapped the order of iocontext and iobject here 
> > > and
> > > related places? It makes more sense to me semantically, and should now be
> > > pretty easy, code wise.
> >
> > So, thinking about this I started noticing inconsistencies in other
> > areas around this order:
> > For example: ordering of objects mentioned in commit messages and comments,
> > ordering of parameters (like in pgstat_count_io_op() [currently in
> > reverse order]).
> >
> > I think we should make a final decision about this ordering and then
> > make everywhere consistent (including ordering in the view).
> >
> > Currently the order is:
> > BackendType
> >   IOContext
> >     IOObject
> >       IOOp
> >
> > You are suggesting this order:
> > BackendType
> >   IOObject
> >     IOContext
> >       IOOp
> >
> > Could you explain what you find more natural about this ordering (as I
> > find the other more natural)?
>
> The object we're performing IO on determines more things than the context. So
> it just seems like the natural hierarchical fit. The context is a sub-category
> of the object. Consider how it'll look like if we also have objects for 'wal',
> 'temp files'. It'll make sense to group by just the object, but it won't make
> sense to group by just the context.
>
> If it were trivial to do I'd use a different IOContext for each IOObject. But
> it'd make it much harder. So there'll just be a bunch of values of IOContext
> that'll only be used for one or a subset of the IOObjects.
>
>
> The reason to put BackendType at the top is pragmatic - one backend is of a
> single type, but can do IO for all kinds of objects/contexts. So any other
> hierarchy would make the locking etc much harder.
>
>
> > This is one possible natural sentence with these objects:
> >
> > During COPY, a client backend may read in data from a permanent
> > relation.
> > This order is:
> > IOContext
> >   BackendType
> >     IOOp
> >       IOObject
> >
> > I think English sentences are often structured subject, verb, object --
> > but in our case, we have an extra thing that doesn't fit neatly
> > (IOContext).
>
> "..., to avoid polluting the buffer cache it uses the bulk (read|write)
> strategy".
>
>
> > Also, IOOp in a sentence would be in the middle (as the
> > verb). I made it last because a) it feels like the smallest unit b) it
> > would make the code a lot more annoying if it wasn't last.
>
> Yea, I think pragmatically that is the right choice.

I have changed the order and updated all the places using
PgStat_BktypeIO, as well as all the locations that should be ordered
consistently (that I could find in the pass I did) -- e.g. the view
definition, function signatures, comments, commit messages, etc.

> > > > +-- Change the tablespace so that the table is rewritten directly, then 
> > > > SELECT
> > > > +-- from it to cause it to be read back into shared buffers.
> > > > +SET allow_in_place_tablespaces = true;
> > > > +CREATE TABLESPACE regress_io_stats_tblspc LOCATION '';
> > >
> > > Perhaps worth doing this in tablespace.sql, to avoid the additional
> > > checkpoints done as part of CREATE/DROP TABLESPACE?
> > >
> > > Or, at least combine this with the CHECKPOINTs above?
> >
> > I see a checkpoint is requested when dropping the tablespace if not all
> > the files in it are deleted. It seems like if the DROP TABLE for the
> > permanent table is before the explicit checkpoints in the test, then the
> > DROP TABLESPACE will not cause an additional checkpoint.
>
> Unfortunately, that's not how it works :(. See the comment above mdunlink():
>
> > * For regular relations, we don't unlink the first segment file of the rel,
> > * but just truncate it to zero length, and record a request to unlink it 
> > after
> > * the next checkpoint.  Additional segments can be unlinked immediately,
> > * however.  Leaving the empty file in place prevents that relfilenumber
> > * from being reused.  The scenario this protects us from is:
> > ...
>
>
> > Is this what you are suggesting? Dropping the temporary table should not
> > have an effect on this.
>
> I was wondering about simply moving that portion of the test to
> tablespace.sql, where we already created a tablespace.
>
>
> An alternative would be to propose splitting tablespace.sql into one portion
> running at the start of parallel_schedule, and one at the end. Historically,
> we needed tablespace.sql to be optional due to causing problems when
> replicating to another instance on the same machine, but now we have
> allow_in_place_tablespaces.

It seems like the best way would be to split up the tablespace test file
as you suggested and drop the tablespace at the end of the regression
test suite; other tests could then use the tablespace as well. What I
wrote does double as tablespace test coverage, but if ALTER TABLE ...
SET TABLESPACE ever stops rewriting the table, we would want to come up
with a new test exercising that code path to count those IO stats, not
simply delete it from the tablespace tests.

> > > SELECT pg_relation_size('test_io_local') / 
> > > current_setting('block_size')::int8 > 100;
> > >
> > > Better toast compression or such could easily make test_io_local smaller 
> > > than
> > > it's today. Seeing that it's too small would make it easier to understand 
> > > the
> > > failure.
> >
> > Good idea. So, I used pg_table_size() because it seems like
> > pg_relation_size() does not include the toast relations. However, I'm
> > not sure this is a good idea, because pg_table_size() includes FSM and
> > visibility map. Should I write a query to get the toast relation name
> > and add pg_relation_size() of that relation and the main relation?
>
> I think it's the right thing to just include the relation size. Your queries
> IIRC won't use the toast table or other forks. So I'd leave it at just
> pg_relation_size().

I did notice that this test wasn't using the toast table for the
toastable column -- but you mentioned better toast compression affecting
the test's stability in the future, so I'm confused.

- Melanie
From 2e29ec2d41fee3fd299c271ade82f8270a16474b Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Tue, 17 Jan 2023 16:10:34 -0500
Subject: [PATCH v49 1/4] pgstat: Infrastructure to track IO operations

Introduce "IOOp", an IO operation done by a backend, "IOObject", the
target object of the IO, and "IOContext", the context or location of the
IO operations on that object. For example, the checkpointer may write a
shared buffer out. This would be considered an IOOP_WRITE IOOp on an
IOOBJECT_RELATION IOObject in the IOCONTEXT_NORMAL IOContext by
BackendType B_CHECKPOINTER.

Each BackendType counts IOOps (evict, extend, fsync, read, reuse, and
write) per IOObject (relation, temp relation) per IOContext (normal,
bulkread, bulkwrite, or vacuum) through a call to pgstat_count_io_op().

Note that this commit introduces the infrastructure to count IO
Operation statistics. A subsequent commit will add calls to
pgstat_count_io_op() in the appropriate locations.

IOObject IOOBJECT_TEMP_RELATION concerns IO Operations on buffers
containing temporary table data, while IOObject IOOBJECT_RELATION
concerns IO Operations on buffers containing permanent relation data.

IOContext IOCONTEXT_NORMAL concerns operations on local and shared
buffers, while IOCONTEXT_BULKREAD, IOCONTEXT_BULKWRITE, and
IOCONTEXT_VACUUM IOContexts concern IO operations on buffers as part of
a BufferAccessStrategy.

Stats on IOOps on all IOObjects in all IOContexts for a given backend
are first counted in a backend's local memory and then flushed to shared
memory and accumulated with those from all other backends, exited and
live.

Some BackendTypes do not flush their pending statistics at regular
intervals and instead explicitly call pgstat_flush_io() during the
course of normal operations to flush their backend-local IO operation
statistics to shared memory in a timely manner.

Because not all BackendType, IOObject, IOContext, IOOp combinations are
valid, the validity of the stats is checked before flushing pending
stats and before reading in the existing stats file to shared memory.

The aggregated stats in shared memory could be extended in the future
with per-backend stats -- useful for per-connection IO statistics and
monitoring.

PGSTAT_FILE_FORMAT_ID should be bumped with this commit.

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Justin Pryzby <pry...@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota....@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml                  |   2 +
 src/backend/utils/activity/Makefile           |   1 +
 src/backend/utils/activity/meson.build        |   1 +
 src/backend/utils/activity/pgstat.c           |  26 ++
 src/backend/utils/activity/pgstat_bgwriter.c  |   7 +-
 .../utils/activity/pgstat_checkpointer.c      |   7 +-
 src/backend/utils/activity/pgstat_io.c        | 386 ++++++++++++++++++
 src/backend/utils/activity/pgstat_relation.c  |  15 +-
 src/backend/utils/activity/pgstat_shmem.c     |   4 +
 src/backend/utils/activity/pgstat_wal.c       |   4 +-
 src/backend/utils/adt/pgstatfuncs.c           |  11 +-
 src/include/miscadmin.h                       |   2 +
 src/include/pgstat.h                          |  68 +++
 src/include/utils/pgstat_internal.h           |  30 ++
 src/tools/pgindent/typedefs.list              |   6 +
 15 files changed, 563 insertions(+), 7 deletions(-)
 create mode 100644 src/backend/utils/activity/pgstat_io.c

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 358d2ff90f..8d51ca3773 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5418,6 +5418,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         the <structname>pg_stat_bgwriter</structname>
         view, <literal>archiver</literal> to reset all the counters shown in
         the <structname>pg_stat_archiver</structname> view,
+        <literal>io</literal> to reset all the counters shown in the
+        <structname>pg_stat_io</structname> view,
         <literal>wal</literal> to reset all the counters shown in the
         <structname>pg_stat_wal</structname> view or
         <literal>recovery_prefetch</literal> to reset all the counters shown
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index a80eda3cf4..7d7482dde0 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
 	pgstat_function.o \
+	pgstat_io.o \
 	pgstat_relation.o \
 	pgstat_replslot.o \
 	pgstat_shmem.o \
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index a2b872c24b..518ee3f798 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -9,6 +9,7 @@ backend_sources += files(
   'pgstat_checkpointer.c',
   'pgstat_database.c',
   'pgstat_function.c',
+  'pgstat_io.c',
   'pgstat_relation.c',
   'pgstat_replslot.c',
   'pgstat_shmem.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 0fa5370bcd..60fc4e761f 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -72,6 +72,7 @@
  * - pgstat_checkpointer.c
  * - pgstat_database.c
  * - pgstat_function.c
+ * - pgstat_io.c
  * - pgstat_relation.c
  * - pgstat_replslot.c
  * - pgstat_slru.c
@@ -359,6 +360,15 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.snapshot_cb = pgstat_checkpointer_snapshot_cb,
 	},
 
+	[PGSTAT_KIND_IO] = {
+		.name = "io",
+
+		.fixed_amount = true,
+
+		.reset_all_cb = pgstat_io_reset_all_cb,
+		.snapshot_cb = pgstat_io_snapshot_cb,
+	},
+
 	[PGSTAT_KIND_SLRU] = {
 		.name = "slru",
 
@@ -582,6 +592,7 @@ pgstat_report_stat(bool force)
 
 	/* Don't expend a clock check if nothing to do */
 	if (dlist_is_empty(&pgStatPending) &&
+		!have_iostats &&
 		!have_slrustats &&
 		!pgstat_have_pending_wal())
 	{
@@ -628,6 +639,9 @@ pgstat_report_stat(bool force)
 	/* flush database / relation / function / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
+	/* flush IO stats */
+	partial_flush |= pgstat_flush_io(nowait);
+
 	/* flush wal stats */
 	partial_flush |= pgstat_flush_wal(nowait);
 
@@ -1322,6 +1336,12 @@ pgstat_write_statsfile(void)
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_CHECKPOINTER);
 	write_chunk_s(fpout, &pgStatLocal.snapshot.checkpointer);
 
+	/*
+	 * Write IO stats struct
+	 */
+	pgstat_build_snapshot_fixed(PGSTAT_KIND_IO);
+	write_chunk_s(fpout, &pgStatLocal.snapshot.io);
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -1496,6 +1516,12 @@ pgstat_read_statsfile(void)
 	if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
 		goto error;
 
+	/*
+	 * Read IO stats struct
+	 */
+	if (!read_chunk_s(fpin, &shmem->io.stats))
+		goto error;
+
 	/*
 	 * Read SLRU stats struct
 	 */
diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c
index 9247f2dda2..92be384b0d 100644
--- a/src/backend/utils/activity/pgstat_bgwriter.c
+++ b/src/backend/utils/activity/pgstat_bgwriter.c
@@ -24,7 +24,7 @@ PgStat_BgWriterStats PendingBgWriterStats = {0};
 
 
 /*
- * Report bgwriter statistics
+ * Report bgwriter and IO statistics
  */
 void
 pgstat_report_bgwriter(void)
@@ -56,6 +56,11 @@ pgstat_report_bgwriter(void)
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&PendingBgWriterStats, 0, sizeof(PendingBgWriterStats));
+
+	/*
+	 * Report IO statistics
+	 */
+	pgstat_flush_io(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index 3e9ab45103..26dec112f6 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -24,7 +24,7 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0};
 
 
 /*
- * Report checkpointer statistics
+ * Report checkpointer and IO statistics
  */
 void
 pgstat_report_checkpointer(void)
@@ -62,6 +62,11 @@ pgstat_report_checkpointer(void)
 	 * Clear out the statistics buffer, so it can be re-used.
 	 */
 	MemSet(&PendingCheckpointerStats, 0, sizeof(PendingCheckpointerStats));
+
+	/*
+	 * Report IO statistics
+	 */
+	pgstat_flush_io(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
new file mode 100644
index 0000000000..b606f23eb8
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -0,0 +1,386 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_io.c
+ *	  Implementation of IO statistics.
+ *
+ * This file contains the implementation of IO statistics. It is kept separate
+ * from pgstat.c to enforce the line between the statistics access / storage
+ * implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2021-2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_io.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+
+static PgStat_BktypeIO PendingIOStats;
+bool		have_iostats = false;
+
+/*
+ * Check that stats have not been counted for any combination of IOObject,
+ * IOContext, and IOOp which are not tracked for the passed-in BackendType. The
+ * passed-in PgStat_BktypeIO must contain stats from the BackendType specified
+ * by the second parameter. Caller is responsible for locking the passed-in
+ * PgStat_BktypeIO, if needed.
+ */
+bool
+pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
+							 BackendType bktype)
+{
+	bool		bktype_tracked = pgstat_tracks_io_bktype(bktype);
+
+	for (IOObject io_object = IOOBJECT_FIRST;
+		 io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (IOContext io_context = IOCONTEXT_FIRST;
+			 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			/*
+			 * Don't bother trying to skip to the next loop iteration if
+			 * pgstat_tracks_io_object() would return false here. We still
+			 * need to validate that each counter is zero anyway.
+			 */
+			for (IOOp io_op = IOOP_FIRST; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				/* No stats, so nothing to validate */
+				if (backend_io->data[io_object][io_context][io_op] == 0)
+					continue;
+
+				/* There are stats and there shouldn't be */
+				if (!bktype_tracked ||
+					!pgstat_tracks_io_op(bktype, io_object, io_context, io_op))
+					return false;
+			}
+		}
+	}
+
+	return true;
+}
+
+void
+pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op)
+{
+	Assert(io_object < IOOBJECT_NUM_TYPES);
+	Assert(io_context < IOCONTEXT_NUM_TYPES);
+	Assert(io_op < IOOP_NUM_TYPES);
+	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
+
+	PendingIOStats.data[io_object][io_context][io_op]++;
+
+	have_iostats = true;
+}
+
+PgStat_IO *
+pgstat_fetch_stat_io(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
+
+	return &pgStatLocal.snapshot.io;
+}
+
+/*
+ * Flush out locally pending IO statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ *
+ * If nowait is true and the lock could not be acquired, this function returns
+ * true without flushing. Otherwise, it flushes the stats and returns false.
+ */
+bool
+pgstat_flush_io(bool nowait)
+{
+	LWLock	   *bktype_lock;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!have_iostats)
+		return false;
+
+	bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType];
+	bktype_shstats =
+		&pgStatLocal.shmem->io.stats.stats[MyBackendType];
+
+	if (!nowait)
+		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+	else if (!LWLockConditionalAcquire(bktype_lock, LW_EXCLUSIVE))
+		return true;
+
+	for (IOObject io_object = IOOBJECT_FIRST;
+		 io_object < IOOBJECT_NUM_TYPES; io_object++)
+		for (IOContext io_context = IOCONTEXT_FIRST;
+			 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+			for (IOOp io_op = IOOP_FIRST;
+				 io_op < IOOP_NUM_TYPES; io_op++)
+				bktype_shstats->data[io_object][io_context][io_op] +=
+					PendingIOStats.data[io_object][io_context][io_op];
+
+	Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
+
+	LWLockRelease(bktype_lock);
+
+	memset(&PendingIOStats, 0, sizeof(PendingIOStats));
+
+	have_iostats = false;
+
+	return false;
+}
+
+const char *
+pgstat_get_io_context_name(IOContext io_context)
+{
+	switch (io_context)
+	{
+		case IOCONTEXT_BULKREAD:
+			return "bulkread";
+		case IOCONTEXT_BULKWRITE:
+			return "bulkwrite";
+		case IOCONTEXT_NORMAL:
+			return "normal";
+		case IOCONTEXT_VACUUM:
+			return "vacuum";
+	}
+
+	elog(ERROR, "unrecognized IOContext value: %d", io_context);
+	pg_unreachable();
+}
+
+const char *
+pgstat_get_io_object_name(IOObject io_object)
+{
+	switch (io_object)
+	{
+		case IOOBJECT_RELATION:
+			return "relation";
+		case IOOBJECT_TEMP_RELATION:
+			return "temp relation";
+	}
+
+	elog(ERROR, "unrecognized IOObject value: %d", io_object);
+	pg_unreachable();
+}
+
+void
+pgstat_io_reset_all_cb(TimestampTz ts)
+{
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+
+		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+
+		/*
+		 * Use the lock in the first BackendType's PgStat_BktypeIO to protect
+		 * the reset timestamp as well.
+		 */
+		if (i == 0)
+			pgStatLocal.shmem->io.stats.stat_reset_timestamp = ts;
+
+		memset(bktype_shstats, 0, sizeof(*bktype_shstats));
+		LWLockRelease(bktype_lock);
+	}
+}
+
+void
+pgstat_io_snapshot_cb(void)
+{
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+		PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io.stats[i];
+
+		LWLockAcquire(bktype_lock, LW_SHARED);
+
+		/*
+		 * Use the lock in the first BackendType's PgStat_BktypeIO to protect
+		 * the reset timestamp as well.
+		 */
+		if (i == 0)
+			pgStatLocal.snapshot.io.stat_reset_timestamp =
+				pgStatLocal.shmem->io.stats.stat_reset_timestamp;
+
+		/* using struct assignment due to better type safety */
+		*bktype_snap = *bktype_shstats;
+		LWLockRelease(bktype_lock);
+	}
+}
+
+/*
+ * IO statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the cumulative stats
+ * subsystem or do not perform IO that we currently track:
+ * - Syslogger because it is not connected to shared memory
+ * - Archiver because most relevant archiving IO is delegated to a
+ *   specialized command or module
+ * - WAL Receiver and WAL Writer IO is not tracked in pg_stat_io for now
+ *
+ * Returns true if the given BackendType participates in the cumulative stats
+ * subsystem for IO and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_io_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+			return false;
+
+		case B_AUTOVAC_LAUNCHER:
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STANDALONE_BACKEND:
+		case B_STARTUP:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Some BackendTypes do not perform IO on certain IOObjects or in certain
+ * IOContexts. Some IOObjects are never operated on in some IOContexts. Check
+ * that the given BackendType is expected to do IO in the given IOContext and
+ * on the given IOObject and that the given IOObject is expected to be operated
+ * on in the given IOContext.
+ */
+bool
+pgstat_tracks_io_object(BackendType bktype, IOObject io_object,
+						IOContext io_context)
+{
+	bool		no_temp_rel;
+
+	/*
+	 * Some BackendTypes should never track IO statistics.
+	 */
+	if (!pgstat_tracks_io_bktype(bktype))
+		return false;
+
+	/*
+	 * Currently, IO on temporary relations can only occur in the
+	 * IOCONTEXT_NORMAL IOContext.
+	 */
+	if (io_context != IOCONTEXT_NORMAL &&
+		io_object == IOOBJECT_TEMP_RELATION)
+		return false;
+
+	/*
+	 * In core Postgres, only regular backends and WAL Sender processes
+	 * executing queries will use local buffers and operate on temporary
+	 * relations. Parallel workers will not use local buffers (see
+	 * InitLocalBuffers()); however, extensions leveraging background workers
+	 * have no such limitation, so track IO on IOOBJECT_TEMP_RELATION for
+	 * BackendType B_BG_WORKER.
+	 */
+	no_temp_rel = bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
+		bktype == B_CHECKPOINTER || bktype == B_AUTOVAC_WORKER ||
+		bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP;
+
+	if (no_temp_rel && io_context == IOCONTEXT_NORMAL &&
+		io_object == IOOBJECT_TEMP_RELATION)
+		return false;
+
+	/*
+	 * Some BackendTypes do not currently perform any IO in certain
+	 * IOContexts, and, while it may not be inherently incorrect for them to
+	 * do so, excluding those rows from the view makes the view easier to use.
+	 */
+	if ((bktype == B_CHECKPOINTER || bktype == B_BG_WRITER) &&
+		(io_context == IOCONTEXT_BULKREAD ||
+		 io_context == IOCONTEXT_BULKWRITE ||
+		 io_context == IOCONTEXT_VACUUM))
+		return false;
+
+	if (bktype == B_AUTOVAC_LAUNCHER && io_context == IOCONTEXT_VACUUM)
+		return false;
+
+	if ((bktype == B_AUTOVAC_WORKER || bktype == B_AUTOVAC_LAUNCHER) &&
+		io_context == IOCONTEXT_BULKWRITE)
+		return false;
+
+	return true;
+}
+
+/*
+ * Some BackendTypes will never do certain IOOps and some IOOps should not
+ * occur in certain IOContexts or on certain IOObjects. Check that the given
+ * IOOp is valid for the given BackendType in the given IOContext and on the
+ * given IOObject. Note that there are currently no cases of an IOOp being
+ * invalid for a particular BackendType only within a certain IOContext and/or
+ * only on a certain IOObject.
+ */
+bool
+pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
+					IOContext io_context, IOOp io_op)
+{
+	bool		strategy_io_context;
+
+	/* if (io_object, io_context) will never collect stats, we're done */
+	if (!pgstat_tracks_io_object(bktype, io_object, io_context))
+		return false;
+
+	/*
+	 * Some BackendTypes will not do certain IOOps.
+	 */
+	if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) &&
+		(io_op == IOOP_READ || io_op == IOOP_EVICT))
+		return false;
+
+	if ((bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
+		 bktype == B_CHECKPOINTER) && io_op == IOOP_EXTEND)
+		return false;
+
+	/*
+	 * Some IOOps are not valid in certain IOContexts and some IOOps are only
+	 * valid in certain contexts.
+	 */
+	if (io_context == IOCONTEXT_BULKREAD && io_op == IOOP_EXTEND)
+		return false;
+
+	strategy_io_context = io_context == IOCONTEXT_BULKREAD ||
+		io_context == IOCONTEXT_BULKWRITE || io_context == IOCONTEXT_VACUUM;
+
+	/*
+	 * IOOP_REUSE is only relevant when a BufferAccessStrategy is in use.
+	 */
+	if (!strategy_io_context && io_op == IOOP_REUSE)
+		return false;
+
+	/*
+	 * IOOP_FSYNC IOOps done by a backend using a BufferAccessStrategy are
+	 * counted in the IOCONTEXT_NORMAL IOContext. See comment in
+	 * register_dirty_segment() for more details.
+	 */
+	if (strategy_io_context && io_op == IOOP_FSYNC)
+		return false;
+
+	/*
+	 * Temporary tables are not logged and thus do not require fsync'ing.
+	 */
+	if (io_context == IOCONTEXT_NORMAL &&
+		io_object == IOOBJECT_TEMP_RELATION && io_op == IOOP_FSYNC)
+		return false;
+
+	return true;
+}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 2e20b93c20..f793ac1516 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -206,7 +206,7 @@ pgstat_drop_relation(Relation rel)
 }
 
 /*
- * Report that the table was just vacuumed.
+ * Report that the table was just vacuumed and flush IO statistics.
  */
 void
 pgstat_report_vacuum(Oid tableoid, bool shared,
@@ -258,10 +258,18 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	}
 
 	pgstat_unlock_entry(entry_ref);
+
+	/*
+	 * Flush IO statistics now. pgstat_report_stat() will flush IO stats,
+	 * however this will not be called until after an entire autovacuum cycle
+	 * is done -- which will likely vacuum many relations -- or until the
+	 * VACUUM command has processed all tables and committed.
+	 */
+	pgstat_flush_io(false);
 }
 
 /*
- * Report that the table was just analyzed.
+ * Report that the table was just analyzed and flush IO statistics.
  *
  * Caller must provide new live- and dead-tuples estimates, as well as a
  * flag indicating whether to reset the mod_since_analyze counter.
@@ -341,6 +349,9 @@ pgstat_report_analyze(Relation rel,
 	}
 
 	pgstat_unlock_entry(entry_ref);
+
+	/* see pgstat_report_vacuum() */
+	pgstat_flush_io(false);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index c1506b53d0..09fffd0e82 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -202,6 +202,10 @@ StatsShmemInit(void)
 		LWLockInitialize(&ctl->checkpointer.lock, LWTRANCHE_PGSTATS_DATA);
 		LWLockInitialize(&ctl->slru.lock, LWTRANCHE_PGSTATS_DATA);
 		LWLockInitialize(&ctl->wal.lock, LWTRANCHE_PGSTATS_DATA);
+
+		for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+			LWLockInitialize(&ctl->io.locks[i],
+							 LWTRANCHE_PGSTATS_DATA);
 	}
 	else
 	{
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index e7a82b5fed..e8598b2f4e 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -34,7 +34,7 @@ static WalUsage prevWalUsage;
 
 /*
  * Calculate how much WAL usage counters have increased and update
- * shared statistics.
+ * shared WAL and IO statistics.
  *
  * Must be called by processes that generate WAL, that do not call
  * pgstat_report_stat(), like walwriter.
@@ -43,6 +43,8 @@ void
 pgstat_report_wal(bool force)
 {
 	pgstat_flush_wal(force);
+
+	pgstat_flush_io(force);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 58bd1360b9..6df9f06a20 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1576,7 +1576,12 @@ pg_stat_reset(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
-/* Reset some shared cluster-wide counters */
+/*
+ * Reset some shared cluster-wide counters
+ *
+ * When adding a new reset target, ideally the name should match that in
+ * pgstat_kind_infos, if relevant.
+ */
 Datum
 pg_stat_reset_shared(PG_FUNCTION_ARGS)
 {
@@ -1593,6 +1598,8 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		pgstat_reset_of_kind(PGSTAT_KIND_BGWRITER);
 		pgstat_reset_of_kind(PGSTAT_KIND_CHECKPOINTER);
 	}
+	else if (strcmp(target, "io") == 0)
+		pgstat_reset_of_kind(PGSTAT_KIND_IO);
 	else if (strcmp(target, "recovery_prefetch") == 0)
 		XLogPrefetchResetStats();
 	else if (strcmp(target, "wal") == 0)
@@ -1601,7 +1608,7 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\", \"io\", \"recovery_prefetch\", or \"wal\".")));
 
 	PG_RETURN_VOID();
 }
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 96b3a1e1a0..c309e0233d 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -332,6 +332,8 @@ typedef enum BackendType
 	B_WAL_WRITER,
 } BackendType;
 
+#define BACKEND_NUM_TYPES (B_WAL_WRITER + 1)
+
 extern PGDLLIMPORT BackendType MyBackendType;
 
 extern const char *GetBackendTypeDesc(BackendType backendType);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5e3326a3b9..9f09caa05f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -48,6 +48,7 @@ typedef enum PgStat_Kind
 	PGSTAT_KIND_ARCHIVER,
 	PGSTAT_KIND_BGWRITER,
 	PGSTAT_KIND_CHECKPOINTER,
+	PGSTAT_KIND_IO,
 	PGSTAT_KIND_SLRU,
 	PGSTAT_KIND_WAL,
 } PgStat_Kind;
@@ -276,6 +277,55 @@ typedef struct PgStat_CheckpointerStats
 	PgStat_Counter buf_fsync_backend;
 } PgStat_CheckpointerStats;
 
+
+/*
+ * Types related to counting IO operations
+ */
+typedef enum IOObject
+{
+	IOOBJECT_RELATION,
+	IOOBJECT_TEMP_RELATION,
+} IOObject;
+
+#define IOOBJECT_FIRST IOOBJECT_RELATION
+#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
+
+typedef enum IOContext
+{
+	IOCONTEXT_BULKREAD,
+	IOCONTEXT_BULKWRITE,
+	IOCONTEXT_NORMAL,
+	IOCONTEXT_VACUUM,
+} IOContext;
+
+#define IOCONTEXT_FIRST IOCONTEXT_BULKREAD
+#define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)
+
+typedef enum IOOp
+{
+	IOOP_EVICT,
+	IOOP_EXTEND,
+	IOOP_FSYNC,
+	IOOP_READ,
+	IOOP_REUSE,
+	IOOP_WRITE,
+} IOOp;
+
+#define IOOP_FIRST IOOP_EVICT
+#define IOOP_NUM_TYPES (IOOP_WRITE + 1)
+
+typedef struct PgStat_BktypeIO
+{
+	PgStat_Counter data[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BktypeIO;
+
+typedef struct PgStat_IO
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
+} PgStat_IO;
+
+
 typedef struct PgStat_StatDBEntry
 {
 	PgStat_Counter xact_commit;
@@ -453,6 +503,24 @@ extern void pgstat_report_checkpointer(void);
 extern PgStat_CheckpointerStats *pgstat_fetch_stat_checkpointer(void);
 
 
+/*
+ * Functions in pgstat_io.c
+ */
+
+extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *context_ops,
+										 BackendType bktype);
+extern void pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op);
+extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern const char *pgstat_get_io_context_name(IOContext io_context);
+extern const char *pgstat_get_io_object_name(IOObject io_object);
+
+extern bool pgstat_tracks_io_bktype(BackendType bktype);
+extern bool pgstat_tracks_io_object(BackendType bktype,
+									IOObject io_object, IOContext io_context);
+extern bool pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
+								IOContext io_context, IOOp io_op);
+
+
 /*
  * Functions in pgstat_database.c
  */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 12fd51f1ae..6badb2fde4 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -329,6 +329,17 @@ typedef struct PgStatShared_Checkpointer
 	PgStat_CheckpointerStats reset_offset;
 } PgStatShared_Checkpointer;
 
+/* Shared-memory ready PgStat_IO */
+typedef struct PgStatShared_IO
+{
+	/*
+	 * locks[i] protects stats.stats[i]. locks[0] also protects
+	 * stats.stat_reset_timestamp.
+	 */
+	LWLock		locks[BACKEND_NUM_TYPES];
+	PgStat_IO	stats;
+} PgStatShared_IO;
+
 typedef struct PgStatShared_SLRU
 {
 	/* lock protects ->stats */
@@ -419,6 +430,7 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Archiver archiver;
 	PgStatShared_BgWriter bgwriter;
 	PgStatShared_Checkpointer checkpointer;
+	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
 } PgStat_ShmemControl;
@@ -442,6 +454,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_CheckpointerStats checkpointer;
 
+	PgStat_IO	io;
+
 	PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS];
 
 	PgStat_WalStats wal;
@@ -549,6 +563,15 @@ extern void pgstat_database_reset_timestamp_cb(PgStatShared_Common *header, Time
 extern bool pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 
 
+/*
+ * Functions in pgstat_io.c
+ */
+
+extern bool pgstat_flush_io(bool nowait);
+extern void pgstat_io_reset_all_cb(TimestampTz ts);
+extern void pgstat_io_snapshot_cb(void);
+
+
 /*
  * Functions in pgstat_relation.c
  */
@@ -643,6 +666,13 @@ extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
 extern PGDLLIMPORT PgStat_LocalState pgStatLocal;
 
 
+/*
+ * Variables in pgstat_io.c
+ */
+
+extern PGDLLIMPORT bool have_iostats;
+
+
 /*
  * Variables in pgstat_slru.c
  */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bafec5f7..1be6e07980 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1106,7 +1106,10 @@ ID
 INFIX
 INT128
 INTERFACE_INFO
+IOContext
 IOFuncSelector
+IOObject
+IOOp
 IPCompareMethod
 ITEM
 IV
@@ -2016,6 +2019,7 @@ PgStatShared_Common
 PgStatShared_Database
 PgStatShared_Function
 PgStatShared_HashEntry
+PgStatShared_IO
 PgStatShared_Relation
 PgStatShared_ReplSlot
 PgStatShared_SLRU
@@ -2025,6 +2029,7 @@ PgStat_ArchiverStats
 PgStat_BackendFunctionEntry
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
+PgStat_BktypeIO
 PgStat_CheckpointerStats
 PgStat_Counter
 PgStat_EntryRef
@@ -2033,6 +2038,7 @@ PgStat_FetchConsistency
 PgStat_FunctionCallUsage
 PgStat_FunctionCounts
 PgStat_HashKey
+PgStat_IO
 PgStat_Kind
 PgStat_KindInfo
 PgStat_LocalState
-- 
2.34.1

From 86be2a8ef4e800061ca57f0ba42ac4ebc0c4ac91 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 17 Jan 2023 16:34:27 -0500
Subject: [PATCH v49 4/4] pg_stat_io documentation

Author: Melanie Plageman <melanieplage...@gmail.com>
Author: Samay Sharma <smilingsa...@gmail.com>
Reviewed-by: Maciek Sakrejda <m.sakre...@gmail.com>
Reviewed-by: Lukas Fittl <lu...@fittl.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Justin Pryzby <pry...@telsasoft.com>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml | 321 +++++++++++++++++++++++++++++++++--
 1 file changed, 307 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8d51ca3773..b875fc3f12 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -469,6 +469,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of backend type, target I/O object, and
+       I/O context, showing cluster-wide I/O statistics.
+       See <link linkend="monitoring-pg-stat-io-view">
+       <structname>pg_stat_io</structname></link> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -665,20 +675,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
   </para>
 
   <para>
-   The <structname>pg_statio_</structname> views are primarily useful to
-   determine the effectiveness of the buffer cache.  When the number
-   of actual disk reads is much smaller than the number of buffer
-   hits, then the cache is satisfying most read requests without
-   invoking a kernel call. However, these statistics do not give the
-   entire story: due to the way in which <productname>PostgreSQL</productname>
-   handles disk I/O, data that is not in the
-   <productname>PostgreSQL</productname> buffer cache might still reside in the
-   kernel's I/O cache, and might therefore still be fetched without
-   requiring a physical read. Users interested in obtaining more
-   detailed information on <productname>PostgreSQL</productname> I/O behavior are
-   advised to use the <productname>PostgreSQL</productname> statistics views
-   in combination with operating system utilities that allow insight
-   into the kernel's handling of I/O.
+   The <structname>pg_stat_io</structname> and
+   <structname>pg_statio_</structname> set of views are useful for determining
+   the effectiveness of the buffer cache. They can be used to calculate a cache
+   hit ratio. Note that while <productname>PostgreSQL</productname>'s I/O
+   statistics capture most instances in which the kernel was invoked in order
+   to perform I/O, they do not differentiate between data which had to be
+   fetched from disk and that which already resided in the kernel page cache.
+   Users are advised to use the <productname>PostgreSQL</productname>
+   statistics views in combination with operating system utilities for a more
+   complete picture of their database's I/O performance.
   </para>
 
  </sect2>
@@ -3643,6 +3649,293 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     <structfield>last_archived_wal</structfield> have also been successfully
     archived.
   </para>
+ </sect2>
+
+ <sect2 id="monitoring-pg-stat-io-view">
+  <title><structname>pg_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_io</structname> view will contain one row for each
+   combination of backend type, target I/O object, and I/O context, showing
+   cluster-wide I/O statistics. Combinations which do not make sense are
+   omitted.
+  </para>
+
+  <para>
+   Currently, I/O on relations (e.g. tables, indexes) is tracked. However,
+   relation I/O which bypasses shared buffers (e.g. when moving a table from
+   one tablespace to another) is not yet tracked.
+  </para>
+
+  <table id="pg-stat-io-view" xreflabel="pg_stat_io">
+   <title><structname>pg_stat_io</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        Column Type
+       </para>
+       <para>
+        Description
+       </para>
+      </entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>backend_type</structfield> <type>text</type>
+       </para>
+       <para>
+        Type of backend (e.g. background worker, autovacuum worker). See <link
+        linkend="monitoring-pg-stat-activity-view">
+        <structname>pg_stat_activity</structname></link> for more information
+        on <varname>backend_type</varname>s. Some
+        <varname>backend_type</varname>s do not accumulate I/O operation
+        statistics and will not be included in the view.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>io_object</structfield> <type>text</type>
+       </para>
+       <para>
+        Target object of an I/O operation. Possible values are:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>relation</literal>: Permanent relations.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>temp relation</literal>: Temporary relations.
+         </para>
+        </listitem>
+       </itemizedlist>
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>io_context</structfield> <type>text</type>
+       </para>
+       <para>
+        The context of an I/O operation. Possible values are:
+       </para>
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>normal</literal>: The default or standard
+          <varname>io_context</varname> for a type of I/O operation. For
+          example, by default, relation data is read into and written out from
+          shared buffers. Thus, reads and writes of relation data to and from
+          shared buffers are tracked in <varname>io_context</varname>
+          <literal>normal</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>vacuum</literal>: I/O operations performed outside of shared
+          buffers while vacuuming and analyzing permanent relations. Temporary
+          table vacuums use the same local buffer pool as other temporary table
+          I/O operations and are tracked in <varname>io_context</varname>
+          <literal>normal</literal>.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkread</literal>: Certain large read I/O operations
+          done outside of shared buffers, for example, a sequential scan of a
+          large table.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>bulkwrite</literal>: Certain large write I/O operations
+          done outside of shared buffers, such as <command>COPY</command>.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>reads</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of read operations, each of the size specified in
+        <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>writes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of write operations, each of the size specified in
+        <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>extends</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of relation extend operations, each of the size specified in
+        <varname>op_bytes</varname>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>op_bytes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        The number of bytes per unit of I/O read, written, or extended.
+       </para>
+       <para>
+        Relation data reads, writes, and extends are done in
+        <varname>block_size</varname> units, derived from the build-time
+        parameter <symbol>BLCKSZ</symbol>, which is <literal>8192</literal> by
+        default.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>evictions</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of times a block has been written out from a shared or local
+        buffer in order to make it available for another use.
+       </para>
+       <para>
+        In <varname>io_context</varname> <literal>normal</literal>, this counts
+        the number of times a block was evicted from a buffer and replaced with
+        another block. In <varname>io_context</varname>s
+        <literal>bulkwrite</literal>, <literal>bulkread</literal>, and
+        <literal>vacuum</literal>, this counts the number of times a block was
+        evicted from shared buffers in order to add the shared buffer to a
+        separate, size-limited ring buffer for use in a bulk I/O operation.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>reuses</structfield> <type>bigint</type>
+       </para>
+       <para>
+        The number of times an existing buffer in a size-limited ring buffer
+        outside of shared buffers was reused as part of an I/O operation in the
+        <literal>bulkread</literal>, <literal>bulkwrite</literal>, or
+        <literal>vacuum</literal> <varname>io_context</varname>s.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>fsyncs</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of <literal>fsync</literal> calls. These are only tracked in
+        <varname>io_context</varname> <literal>normal</literal>.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry">
+       <para role="column_definition">
+        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+       </para>
+       <para>
+        Time at which these statistics were last reset.
+       </para>
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   Some backend types never perform I/O operations on some I/O objects and/or
+   in some I/O contexts. These rows are omitted from the view. For example, the
+   checkpointer does not checkpoint temporary tables, so there will be no rows
+   for <varname>backend_type</varname> <literal>checkpointer</literal> and
+   <varname>io_object</varname> <literal>temp relation</literal>.
+  </para>
+
+  <para>
+   In addition, some I/O operations will never be performed either by certain
+   backend types or on certain I/O objects and/or in certain I/O contexts.
+   These cells will be NULL. For example, temporary tables are not
+   <literal>fsync</literal>ed, so <varname>fsyncs</varname> will be NULL for
+   <varname>io_object</varname> <literal>temp relation</literal>. Also, the
+   background writer does not perform reads, so <varname>reads</varname> will
+   be NULL in rows for <varname>backend_type</varname> <literal>background
+   writer</literal>.
+  </para>
+
+  <para>
+   <structname>pg_stat_io</structname> can be used to inform database tuning.
+   For example:
+   <itemizedlist>
+    <listitem>
+     <para>
+      A high <varname>evictions</varname> count can indicate that shared
+      buffers should be increased.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Client backends rely on the checkpointer to ensure data is persisted to
+      permanent storage. Large numbers of <varname>fsyncs</varname> by
+      <literal>client backend</literal>s could indicate a misconfiguration of
+      shared buffers or of the checkpointer. More information on configuring
+      the checkpointer can be found in <xref linkend="wal-configuration"/>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      Normally, client backends should be able to rely on auxiliary processes
+      like the checkpointer and the background writer to write out dirty data
+      as much as possible. Large numbers of writes by client backends could
+      indicate a misconfiguration of shared buffers or of the checkpointer.
+      More information on configuring the checkpointer can be found in <xref
+      linkend="wal-configuration"/>.
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
 
  </sect2>
 
-- 
2.34.1

From cb2dd852c8435537ed9a9a148c719e81c0dc22ce Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Tue, 17 Jan 2023 16:25:31 -0500
Subject: [PATCH v49 2/4] pgstat: Count IO for relations

Count IOOps done on IOObjects in IOContexts by various BackendTypes
using the IO stats infrastructure introduced by a previous commit.

The primary concern of these statistics is IO operations on data blocks
during the course of normal database operations. IO operations done by,
for example, the archiver or syslogger are not counted in these
statistics. WAL IO, temporary file IO, and IO done directly through smgr*
functions (such as when building an index) are not yet counted but would
be useful future additions.

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 src/backend/storage/buffer/bufmgr.c   | 111 ++++++++++++++++++++++----
 src/backend/storage/buffer/freelist.c |  58 ++++++++++----
 src/backend/storage/buffer/localbuf.c |  13 ++-
 src/backend/storage/smgr/md.c         |  24 ++++++
 src/include/storage/buf_internals.h   |   8 +-
 src/include/storage/bufmgr.h          |   7 +-
 6 files changed, 185 insertions(+), 36 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8075828e8a..ff12bc2ba6 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -481,8 +481,9 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr,
 							   ForkNumber forkNum,
 							   BlockNumber blockNum,
 							   BufferAccessStrategy strategy,
-							   bool *foundPtr);
-static void FlushBuffer(BufferDesc *buf, SMgrRelation reln);
+							   bool *foundPtr, IOContext *io_context);
+static void FlushBuffer(BufferDesc *buf, SMgrRelation reln,
+						IOObject io_object, IOContext io_context);
 static void FindAndDropRelationBuffers(RelFileLocator rlocator,
 									   ForkNumber forkNum,
 									   BlockNumber nForkBlock,
@@ -823,6 +824,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	BufferDesc *bufHdr;
 	Block		bufBlock;
 	bool		found;
+	IOContext	io_context;
+	IOObject	io_object;
 	bool		isExtend;
 	bool		isLocalBuf = SmgrIsTemp(smgr);
 
@@ -855,7 +858,14 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	if (isLocalBuf)
 	{
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
+		/*
+		 * LocalBufferAlloc() will set the io_context to IOCONTEXT_NORMAL. We
+		 * do not use a BufferAccessStrategy for I/O of temporary tables.
+		 * However, in some cases, the "strategy" may not be NULL, so we can't
+		 * rely on IOContextForStrategy() to set the right IOContext for us.
+		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
+		 */
+		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found, &io_context);
 		if (found)
 			pgBufferUsage.local_blks_hit++;
 		else if (isExtend)
@@ -871,7 +881,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * not currently in memory.
 		 */
 		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found);
+							 strategy, &found, &io_context);
 		if (found)
 			pgBufferUsage.shared_blks_hit++;
 		else if (isExtend)
@@ -986,7 +996,16 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 */
 	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	if (isLocalBuf)
+	{
+		bufBlock = LocalBufHdrGetBlock(bufHdr);
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		bufBlock = BufHdrGetBlock(bufHdr);
+		io_object = IOOBJECT_RELATION;
+	}
 
 	if (isExtend)
 	{
@@ -995,6 +1014,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		/* don't set checksum for all-zero page */
 		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
 
+		pgstat_count_io_op(io_object, io_context, IOOP_EXTEND);
+
 		/*
 		 * NB: we're *not* doing a ScheduleBufferTagForWriteback here;
 		 * although we're essentially performing a write. At least on linux
@@ -1020,6 +1041,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
 
+			pgstat_count_io_op(io_object, io_context, IOOP_READ);
+
 			if (track_io_timing)
 			{
 				INSTR_TIME_SET_CURRENT(io_time);
@@ -1113,14 +1136,19 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
  * we keep it for simplicity in ReadBuffer.
  *
+ * io_context is passed as an output parameter to avoid calling
+ * IOContextForStrategy() when there is a shared buffers hit and no IO
+ * statistics need be captured.
+ *
  * No locks are held either at entry or exit.
  */
 static BufferDesc *
 BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			BlockNumber blockNum,
 			BufferAccessStrategy strategy,
-			bool *foundPtr)
+			bool *foundPtr, IOContext *io_context)
 {
+	bool		from_ring;
 	BufferTag	newTag;			/* identity of requested block */
 	uint32		newHash;		/* hash value for newTag */
 	LWLock	   *newPartitionLock;	/* buffer partition lock for it */
@@ -1172,8 +1200,11 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			{
 				/*
 				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
+				 * have failed ... but we shall bravely try again. Set
+				 * io_context since we will in fact need to count an IO
+				 * Operation.
 				 */
+				*io_context = IOContextForStrategy(strategy);
 				*foundPtr = false;
 			}
 		}
@@ -1187,6 +1218,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 */
 	LWLockRelease(newPartitionLock);
 
+	*io_context = IOContextForStrategy(strategy);
+
 	/* Loop here in case we have to try another victim buffer */
 	for (;;)
 	{
@@ -1200,7 +1233,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * Select a victim buffer.  The buffer is returned with its header
 		 * spinlock still held!
 		 */
-		buf = StrategyGetBuffer(strategy, &buf_state);
+		buf = StrategyGetBuffer(strategy, &buf_state, &from_ring);
 
 		Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
 
@@ -1254,7 +1287,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 					UnlockBufHdr(buf, buf_state);
 
 					if (XLogNeedsFlush(lsn) &&
-						StrategyRejectBuffer(strategy, buf))
+						StrategyRejectBuffer(strategy, buf, from_ring))
 					{
 						/* Drop lock/pin and loop around for another buffer */
 						LWLockRelease(BufferDescriptorGetContentLock(buf));
@@ -1269,7 +1302,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 														  smgr->smgr_rlocator.locator.dbOid,
 														  smgr->smgr_rlocator.locator.relNumber);
 
-				FlushBuffer(buf, NULL);
+				FlushBuffer(buf, NULL, IOOBJECT_RELATION, *io_context);
 				LWLockRelease(BufferDescriptorGetContentLock(buf));
 
 				ScheduleBufferTagForWriteback(&BackendWritebackContext,
@@ -1450,6 +1483,28 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	LWLockRelease(newPartitionLock);
 
+	if (oldFlags & BM_VALID)
+	{
+		/*
+		 * When a BufferAccessStrategy is in use, blocks evicted from shared
+		 * buffers are counted as IOOP_EVICT in the corresponding context
+		 * (e.g. IOCONTEXT_BULKWRITE). Shared buffers are evicted by a
+		 * strategy in two cases: 1) while initially claiming buffers for the
+		 * strategy ring 2) to replace an existing strategy ring buffer
+		 * because it is pinned or in use and cannot be reused.
+		 *
+		 * Blocks evicted from buffers already in the strategy ring are
+		 * counted as IOOP_REUSE in the corresponding strategy context.
+		 *
+		 * At this point, we can accurately count evictions and reuses,
+		 * because we have successfully claimed the valid buffer. Previously,
+		 * we may have been forced to release the buffer due to concurrent
+		 * pinners or erroring out.
+		 */
+		pgstat_count_io_op(IOOBJECT_RELATION, *io_context,
+						   from_ring ? IOOP_REUSE : IOOP_EVICT);
+	}
+
 	/*
 	 * Buffer contents are currently invalid.  Try to obtain the right to
 	 * start I/O.  If StartBufferIO returns false, then someone else managed
@@ -2570,7 +2625,7 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
 	PinBuffer_Locked(bufHdr);
 	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
 
 	LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 
@@ -2820,7 +2875,8 @@ BufferGetTag(Buffer buffer, RelFileLocator *rlocator, ForkNumber *forknum,
  * as the second parameter.  If not, pass NULL.
  */
 static void
-FlushBuffer(BufferDesc *buf, SMgrRelation reln)
+FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
+			IOContext io_context)
 {
 	XLogRecPtr	recptr;
 	ErrorContextCallback errcallback;
@@ -2912,6 +2968,26 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 			  bufToWrite,
 			  false);
 
+	/*
+	 * When a strategy is in use, only flushes of dirty buffers already in the
+	 * strategy ring are counted as strategy writes (IOCONTEXT
+	 * [BULKREAD|BULKWRITE|VACUUM] IOOP_WRITE) for the purpose of IO
+	 * statistics tracking.
+	 *
+	 * If a shared buffer initially added to the ring must be flushed before
+	 * being used, this is counted as an IOCONTEXT_NORMAL IOOP_WRITE.
+	 *
+	 * If a shared buffer which was added to the ring later because the
+	 * current strategy buffer is pinned or in use or because all strategy
+	 * buffers were dirty and rejected (for BAS_BULKREAD operations only)
+	 * requires flushing, this is counted as an IOCONTEXT_NORMAL IOOP_WRITE
+	 * (from_ring will be false).
+	 *
+	 * When a strategy is not in use, the write can only be a "regular" write
+	 * of a dirty shared buffer (IOCONTEXT_NORMAL IOOP_WRITE).
+	 */
+	pgstat_count_io_op(IOOBJECT_RELATION, io_context, IOOP_WRITE);
+
 	if (track_io_timing)
 	{
 		INSTR_TIME_SET_CURRENT(io_time);
@@ -3554,6 +3630,8 @@ FlushRelationBuffers(Relation rel)
 				buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 
+				pgstat_count_io_op(IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL, IOOP_WRITE);
+
 				/* Pop the error context stack */
 				error_context_stack = errcallback.previous;
 			}
@@ -3586,7 +3664,7 @@
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, RelationGetSmgr(rel));
+			FlushBuffer(bufHdr, RelationGetSmgr(rel), IOOBJECT_RELATION, IOCONTEXT_NORMAL);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr);
 		}
@@ -3684,7 +3763,7 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, srelent->srel);
+			FlushBuffer(bufHdr, srelent->srel, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr);
 		}
@@ -3894,7 +3973,7 @@ FlushDatabaseBuffers(Oid dbid)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, NULL);
+			FlushBuffer(bufHdr, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr);
 		}
@@ -3921,7 +4000,7 @@ FlushOneBuffer(Buffer buffer)
 
 	Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
 }
 
 /*
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 7dec35801c..c690d5f15f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -15,6 +15,7 @@
  */
 #include "postgres.h"
 
+#include "pgstat.h"
 #include "port/atomics.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
@@ -81,12 +82,6 @@ typedef struct BufferAccessStrategyData
 	 */
 	int			current;
 
-	/*
-	 * True if the buffer just returned by StrategyGetBuffer had been in the
-	 * ring already.
-	 */
-	bool		current_was_in_ring;
-
 	/*
 	 * Array of buffer numbers.  InvalidBuffer (that is, zero) indicates we
 	 * have not yet selected a buffer for this ring slot.  For allocation
@@ -198,13 +193,15 @@ have_free_buffer(void)
  *	return the buffer with the buffer header spinlock still held.
  */
 BufferDesc *
-StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
+StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_ring)
 {
 	BufferDesc *buf;
 	int			bgwprocno;
 	int			trycounter;
 	uint32		local_buf_state;	/* to avoid repeated (de-)referencing */
 
+	*from_ring = false;
+
 	/*
 	 * If given a strategy object, see whether it can select a buffer. We
 	 * assume strategy objects don't need buffer_strategy_lock.
@@ -213,7 +210,10 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
 		if (buf != NULL)
+		{
+			*from_ring = true;
 			return buf;
+		}
 	}
 
 	/*
@@ -602,7 +602,7 @@ FreeAccessStrategy(BufferAccessStrategy strategy)
 
 /*
  * GetBufferFromRing -- returns a buffer from the ring, or NULL if the
- *		ring is empty.
+ *		ring is empty / not usable.
  *
  * The bufhdr spin lock is held on the returned buffer.
  */
@@ -625,10 +625,7 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 	 */
 	bufnum = strategy->buffers[strategy->current];
 	if (bufnum == InvalidBuffer)
-	{
-		strategy->current_was_in_ring = false;
 		return NULL;
-	}
 
 	/*
 	 * If the buffer is pinned we cannot use it under any circumstances.
@@ -644,7 +641,6 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 	if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0
 		&& BUF_STATE_GET_USAGECOUNT(local_buf_state) <= 1)
 	{
-		strategy->current_was_in_ring = true;
 		*buf_state = local_buf_state;
 		return buf;
 	}
@@ -654,7 +650,6 @@ GetBufferFromRing(BufferAccessStrategy strategy, uint32 *buf_state)
 	 * Tell caller to allocate a new buffer with the normal allocation
 	 * strategy.  He'll then replace this ring element via AddBufferToRing.
 	 */
-	strategy->current_was_in_ring = false;
 	return NULL;
 }
 
@@ -670,6 +665,39 @@ AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
 	strategy->buffers[strategy->current] = BufferDescriptorGetBuffer(buf);
 }
 
+/*
+ * Utility function returning the IOContext of a given BufferAccessStrategy's
+ * strategy ring.
+ */
+IOContext
+IOContextForStrategy(BufferAccessStrategy strategy)
+{
+	if (!strategy)
+		return IOCONTEXT_NORMAL;
+
+	switch (strategy->btype)
+	{
+		case BAS_NORMAL:
+
+			/*
+			 * Currently, GetAccessStrategy() returns NULL for
+			 * BufferAccessStrategyType BAS_NORMAL, so this case is
+			 * unreachable.
+			 */
+			pg_unreachable();
+			return IOCONTEXT_NORMAL;
+		case BAS_BULKREAD:
+			return IOCONTEXT_BULKREAD;
+		case BAS_BULKWRITE:
+			return IOCONTEXT_BULKWRITE;
+		case BAS_VACUUM:
+			return IOCONTEXT_VACUUM;
+	}
+
+	elog(ERROR, "unrecognized BufferAccessStrategyType: %d", strategy->btype);
+	pg_unreachable();
+}
+
 /*
  * StrategyRejectBuffer -- consider rejecting a dirty buffer
  *
@@ -682,14 +710,14 @@ AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
  * if this buffer should be written and re-used.
  */
 bool
-StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
+StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
 {
 	/* We only do this in bulkread mode */
 	if (strategy->btype != BAS_BULKREAD)
 		return false;
 
 	/* Don't muck with behavior of normal buffer-replacement strategy */
-	if (!strategy->current_was_in_ring ||
+	if (!from_ring ||
 		strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
 		return false;
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 8372acc383..8e286db5df 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -18,6 +18,7 @@
 #include "access/parallel.h"
 #include "catalog/catalog.h"
 #include "executor/instrument.h"
+#include "pgstat.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 #include "utils/guc_hooks.h"
@@ -107,7 +108,7 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
-				 bool *foundPtr)
+				 bool *foundPtr, IOContext *io_context)
 {
 	BufferTag	newTag;			/* identity of requested block */
 	LocalBufferLookupEnt *hresult;
@@ -127,6 +128,14 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 	hresult = (LocalBufferLookupEnt *)
 		hash_search(LocalBufHash, (void *) &newTag, HASH_FIND, NULL);
 
+	/*
+	 * IO Operations on local buffers are only done in IOCONTEXT_NORMAL. Set
+	 * io_context here (rather than only on the buffer-miss path) for
+	 * convenience, since we needn't worry about the overhead of calling
+	 * IOContextForStrategy().
+	 */
+	*io_context = IOCONTEXT_NORMAL;
+
 	if (hresult)
 	{
 		b = hresult->id;
@@ -230,6 +239,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		buf_state &= ~BM_DIRTY;
 		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 
+		pgstat_count_io_op(IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL, IOOP_WRITE);
 		pgBufferUsage.local_blks_written++;
 	}
 
@@ -256,6 +266,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		ClearBufferTag(&bufHdr->tag);
 		buf_state &= ~(BM_VALID | BM_TAG_VALID);
 		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+		pgstat_count_io_op(IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL, IOOP_EVICT);
 	}
 
 	hresult = (LocalBufferLookupEnt *)
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 60c9905eff..8da813600c 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -983,6 +983,15 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
 	{
 		MdfdVec    *v = &reln->md_seg_fds[forknum][segno - 1];
 
+		/*
+		 * fsyncs done through mdimmedsync() should be tracked in a separate
+		 * IOContext from those done through mdsyncfiletag(), to distinguish
+		 * unavoidable client backend fsyncs (e.g. those done during index
+		 * build) from those which ideally would have been done by the
+		 * checkpointer. Since other IO operations bypassing the buffer
+		 * manager could also be tracked in such an IOContext, wait until
+		 * those are also tracked before counting immediate fsyncs here.
+		 */
 		if (FileSync(v->mdfd_vfd, WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
 					(errcode_for_file_access(),
@@ -1021,6 +1030,19 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 
 	if (!RegisterSyncRequest(&tag, SYNC_REQUEST, false /* retryOnError */ ))
 	{
+		/*
+		 * We have no way of knowing if the current IOContext is
+		 * IOCONTEXT_NORMAL or IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] at this
+		 * point, so count the fsync as being in the IOCONTEXT_NORMAL
+		 * IOContext. This is probably okay, because the number of backend
+		 * fsyncs doesn't say anything about the efficacy of the
+		 * BufferAccessStrategy. And counting both fsyncs done in
+		 * IOCONTEXT_NORMAL and IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] under
+		 * IOCONTEXT_NORMAL is likely clearer when investigating the number of
+		 * backend fsyncs.
+		 */
+		pgstat_count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_FSYNC);
+
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
@@ -1410,6 +1432,8 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 	if (need_to_close)
 		FileClose(file);
 
+	pgstat_count_io_op(IOOBJECT_RELATION, IOCONTEXT_NORMAL, IOOP_FSYNC);
+
 	errno = save_errno;
 	return result;
 }
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ed8aa2519c..0b44814740 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -15,6 +15,7 @@
 #ifndef BUFMGR_INTERNALS_H
 #define BUFMGR_INTERNALS_H
 
+#include "pgstat.h"
 #include "port/atomics.h"
 #include "storage/buf.h"
 #include "storage/bufmgr.h"
@@ -391,11 +392,12 @@ extern void IssuePendingWritebacks(WritebackContext *context);
 extern void ScheduleBufferTagForWriteback(WritebackContext *context, BufferTag *tag);
 
 /* freelist.c */
+extern IOContext IOContextForStrategy(BufferAccessStrategy bas);
 extern BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
-									 uint32 *buf_state);
+									 uint32 *buf_state, bool *from_ring);
 extern void StrategyFreeBuffer(BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
-								 BufferDesc *buf);
+								 BufferDesc *buf, bool from_ring);
 
 extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern void StrategyNotifyBgWriter(int bgwprocno);
@@ -417,7 +419,7 @@ extern PrefetchBufferResult PrefetchLocalBuffer(SMgrRelation smgr,
 												ForkNumber forkNum,
 												BlockNumber blockNum);
 extern BufferDesc *LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum,
-									BlockNumber blockNum, bool *foundPtr);
+									BlockNumber blockNum, bool *foundPtr, IOContext *io_context);
 extern void MarkLocalBufferDirty(Buffer buffer);
 extern void DropRelationLocalBuffers(RelFileLocator rlocator,
 									 ForkNumber forkNum,
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 33eadbc129..b8a18b8081 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -23,7 +23,12 @@
 
 typedef void *Block;
 
-/* Possible arguments for GetAccessStrategy() */
+/*
+ * Possible arguments for GetAccessStrategy().
+ *
+ * If adding a new BufferAccessStrategyType, also add a new IOContext so
+ * IO statistics using this strategy are tracked.
+ */
 typedef enum BufferAccessStrategyType
 {
 	BAS_NORMAL,					/* Normal random access */
-- 
2.34.1

From d40934679b00fd1e157bd6942d7f3faf8be5ea8e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 17 Jan 2023 16:28:27 -0500
Subject: [PATCH v49 3/4] Add system view tracking IO ops per backend type

Add pg_stat_io, a system view which tracks the number of IOOps
(evictions, reuses, reads, writes, extensions, and fsyncs) done on each
IOObject (relation, temp relation) in each IOContext ("normal" and those
using a BufferAccessStrategy) by each type of backend (e.g. client
backend, checkpointer).

Some BackendTypes do not accumulate IO operations statistics and will
not be included in the view.

Some IOObjects are never operated on in some IOContexts or by some
BackendTypes. These rows are omitted from the view. For example,
checkpointer will never operate on IOOBJECT_TEMP_RELATION data, so those
rows are omitted.

Some IOContexts are not used by some BackendTypes and will not be in the
view. For example, checkpointer does not use a BufferAccessStrategy
(currently), so there will be no rows for BufferAccessStrategy
IOContexts for checkpointer.

Some IOOps are invalid in combination with certain IOContexts and
certain IOObjects. Those cells will be NULL in the view to distinguish
between 0 observed IOOps of that type and an invalid combination. For
example, temporary tables are not fsynced so cells for all BackendTypes
for IOOBJECT_TEMP_RELATION and IOOP_FSYNC will be NULL.

Some BackendTypes never perform certain IOOps. Those cells will also be
NULL in the view. For example, bgwriter should not perform reads.

View stats are populated with statistics incremented when a backend
performs an IO Operation and maintained by the cumulative statistics
subsystem.

Each row of the view shows stats for a particular BackendType, IOObject,
IOContext combination (e.g. a client backend's operations on permanent
relations in shared buffers) and each column in the view is the total
number of IO Operations done (e.g. writes). So a cell in the view would
be, for example, the number of blocks of relation data written from
shared buffers by client backends since the last stats reset.

In anticipation of tracking WAL IO and non-block-oriented IO (such as
temporary file IO), the "op_bytes" column specifies the unit of the
"reads", "writes", and "extends" columns for a given row.

Note that some of the cells in the view are redundant with fields in
pg_stat_bgwriter (e.g. buffers_backend), however these have been kept in
pg_stat_bgwriter for backwards compatibility. Deriving the redundant
pg_stat_bgwriter stats from the IO operations stats structures was also
problematic due to the separate reset targets for 'bgwriter' and 'io'.

Suggested by Andres Freund

Catalog version should be bumped.

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Andres Freund <and...@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
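
As an illustrative usage sketch (not part of the patch itself), the new view
can be queried to see, for example, which backend types are performing writes
in BufferAccessStrategy contexts; column names match the view definition added
by this patch:

```sql
-- Illustrative only: strategy-context write activity per backend type.
SELECT backend_type, io_context, writes, extends
  FROM pg_stat_io
 WHERE io_context IN ('bulkread', 'bulkwrite', 'vacuum')
 ORDER BY writes DESC;
```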
---
 contrib/amcheck/expected/check_heap.out |  34 ++++
 contrib/amcheck/sql/check_heap.sql      |  27 +++
 src/backend/catalog/system_views.sql    |  15 ++
 src/backend/utils/adt/pgstatfuncs.c     | 141 ++++++++++++++
 src/include/catalog/pg_proc.dat         |   9 +
 src/test/regress/expected/rules.out     |  12 ++
 src/test/regress/expected/stats.out     | 234 ++++++++++++++++++++++++
 src/test/regress/sql/stats.sql          | 148 +++++++++++++++
 src/tools/pgindent/typedefs.list        |   1 +
 9 files changed, 621 insertions(+)

diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
index c010361025..e4785141a6 100644
--- a/contrib/amcheck/expected/check_heap.out
+++ b/contrib/amcheck/expected/check_heap.out
@@ -66,6 +66,22 @@ SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
 INSERT INTO heaptest (a, b)
 	(SELECT gs, repeat('x', gs)
 		FROM generate_series(1,50) gs);
+-- pg_stat_io test:
+-- verify_heapam always uses a BAS_BULKREAD BufferAccessStrategy, whereas a
+-- sequential scan does so only if the table is large enough when compared to
+-- shared buffers (see initscan()). CREATE DATABASE ... also unconditionally
+-- uses a BAS_BULKREAD strategy, but we have chosen to use a tablespace and
+-- verify_heapam to provide coverage instead of adding another expensive
+-- operation to the main regression test suite.
+--
+-- Create an alternative tablespace and move the heaptest table to it, causing
+-- it to be rewritten and all the blocks to be reliably evicted from shared
+-- buffers -- guaranteeing actual reads when we next select from it.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE regress_test_stats_tblspc LOCATION '';
+SELECT sum(reads) AS stats_bulkreads_before
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+ALTER TABLE heaptest SET TABLESPACE regress_test_stats_tblspc;
 -- Check that valid options are not rejected nor corruption reported
 -- for a non-empty table
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
@@ -88,6 +104,23 @@ SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock :=
 -------+--------+--------+-----
 (0 rows)
 
+-- verify_heapam should have read in the page written out by
+--   ALTER TABLE ... SET TABLESPACE ...
+-- causing an additional bulkread, which should be reflected in pg_stat_io.
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(reads) AS stats_bulkreads_after
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+SELECT :stats_bulkreads_after > :stats_bulkreads_before;
+ ?column? 
+----------
+ t
+(1 row)
+
 CREATE ROLE regress_heaptest_role;
 -- verify permissions are checked (error due to function not callable)
 SET ROLE regress_heaptest_role;
@@ -195,6 +228,7 @@ ERROR:  cannot check relation "test_foreign_table"
 DETAIL:  This operation is not supported for foreign tables.
 -- cleanup
 DROP TABLE heaptest;
+DROP TABLESPACE regress_test_stats_tblspc;
 DROP TABLE test_partition;
 DROP TABLE test_partitioned;
 DROP OWNED BY regress_heaptest_role; -- permissions
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
index 298de6886a..6794ca4eb0 100644
--- a/contrib/amcheck/sql/check_heap.sql
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -20,11 +20,29 @@ SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
 
+
 -- Add some data so subsequent tests are not entirely trivial
 INSERT INTO heaptest (a, b)
 	(SELECT gs, repeat('x', gs)
 		FROM generate_series(1,50) gs);
 
+-- pg_stat_io test:
+-- verify_heapam always uses a BAS_BULKREAD BufferAccessStrategy, whereas a
+-- sequential scan does so only if the table is large enough when compared to
+-- shared buffers (see initscan()). CREATE DATABASE ... also unconditionally
+-- uses a BAS_BULKREAD strategy, but we have chosen to use a tablespace and
+-- verify_heapam to provide coverage instead of adding another expensive
+-- operation to the main regression test suite.
+--
+-- Create an alternative tablespace and move the heaptest table to it, causing
+-- it to be rewritten and all the blocks to reliably evicted from shared
+-- buffers -- guaranteeing actual reads when we next select from it.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE regress_test_stats_tblspc LOCATION '';
+SELECT sum(reads) AS stats_bulkreads_before
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+ALTER TABLE heaptest SET TABLESPACE regress_test_stats_tblspc;
+
 -- Check that valid options are not rejected nor corruption reported
 -- for a non-empty table
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
@@ -32,6 +50,14 @@ SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
 SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
 SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
 
+-- verify_heapam should have read in the page written out by
+--   ALTER TABLE ... SET TABLESPACE ...
+-- causing an additional bulkread, which should be reflected in pg_stat_io.
+SELECT pg_stat_force_next_flush();
+SELECT sum(reads) AS stats_bulkreads_after
+  FROM pg_stat_io WHERE io_context = 'bulkread' \gset
+SELECT :stats_bulkreads_after > :stats_bulkreads_before;
+
 CREATE ROLE regress_heaptest_role;
 
 -- verify permissions are checked (error due to function not callable)
@@ -110,6 +136,7 @@ SELECT * FROM verify_heapam('test_foreign_table',
 
 -- cleanup
 DROP TABLE heaptest;
+DROP TABLESPACE regress_test_stats_tblspc;
 DROP TABLE test_partition;
 DROP TABLE test_partitioned;
 DROP OWNED BY regress_heaptest_role; -- permissions
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index d2a8c82900..70699f4b85 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1116,6 +1116,21 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_io AS
+SELECT
+       b.backend_type,
+       b.io_object,
+       b.io_context,
+       b.reads,
+       b.writes,
+       b.extends,
+       b.op_bytes,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.stats_reset
+FROM pg_stat_get_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6df9f06a20..5b79d703b7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1234,6 +1234,147 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
 }
 
+/*
+ * When adding a new column to the pg_stat_io view, add a new enum value
+ * here above IO_NUM_COLUMNS.
+ */
+typedef enum io_stat_col
+{
+	IO_COL_BACKEND_TYPE,
+	IO_COL_IO_OBJECT,
+	IO_COL_IO_CONTEXT,
+	IO_COL_READS,
+	IO_COL_WRITES,
+	IO_COL_EXTENDS,
+	IO_COL_CONVERSION,
+	IO_COL_EVICTIONS,
+	IO_COL_REUSES,
+	IO_COL_FSYNCS,
+	IO_COL_RESET_TIME,
+	IO_NUM_COLUMNS,
+} io_stat_col;
+
+/*
+ * When adding a new IOOp, add a new io_stat_col and add a case to this
+ * function returning the corresponding io_stat_col.
+ */
+static io_stat_col
+pgstat_get_io_op_index(IOOp io_op)
+{
+	switch (io_op)
+	{
+		case IOOP_EVICT:
+			return IO_COL_EVICTIONS;
+		case IOOP_READ:
+			return IO_COL_READS;
+		case IOOP_REUSE:
+			return IO_COL_REUSES;
+		case IOOP_WRITE:
+			return IO_COL_WRITES;
+		case IOOP_EXTEND:
+			return IO_COL_EXTENDS;
+		case IOOP_FSYNC:
+			return IO_COL_FSYNCS;
+	}
+
+	elog(ERROR, "unrecognized IOOp value: %d", io_op);
+	pg_unreachable();
+}
+
+Datum
+pg_stat_get_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_IO  *backends_io_stats;
+	Datum		reset_time;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backends_io_stats = pgstat_fetch_stat_io();
+
+	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
+
+	for (BackendType bktype = B_INVALID; bktype < BACKEND_NUM_TYPES; bktype++)
+	{
+		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+
+		/*
+		 * In Assert builds, we can afford an extra loop through all of the
+		 * counters checking that only expected stats are non-zero, since it
+		 * keeps the non-Assert code cleaner.
+		 */
+		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+		/*
+		 * For those BackendTypes without IO Operation stats, skip
+		 * representing them in the view altogether.
+		 */
+		if (!pgstat_tracks_io_bktype(bktype))
+			continue;
+
+		for (IOObject io_obj = IOOBJECT_FIRST;
+			 io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+		{
+			const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+			for (IOContext io_context = IOCONTEXT_FIRST;
+				 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+			{
+				const char *context_name = pgstat_get_io_context_name(io_context);
+
+				Datum		values[IO_NUM_COLUMNS] = {0};
+				bool		nulls[IO_NUM_COLUMNS] = {0};
+
+				/*
+				 * Some combinations of BackendType, IOObject, and IOContext
+				 * are not valid for any type of IOOp. In such cases, omit the
+				 * entire row from the view.
+				 */
+				if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+					continue;
+
+				values[IO_COL_BACKEND_TYPE] = bktype_desc;
+				values[IO_COL_IO_CONTEXT] = CStringGetTextDatum(context_name);
+				values[IO_COL_IO_OBJECT] = CStringGetTextDatum(obj_name);
+				values[IO_COL_RESET_TIME] = TimestampTzGetDatum(reset_time);
+
+				/*
+				 * Hard-code this to the value of BLCKSZ for now. Future
+				 * values could include XLOG_BLCKSZ, once WAL IO is tracked,
+				 * and constant multipliers, once non-block-oriented IO (e.g.
+				 * temporary file IO) is tracked.
+				 */
+				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+				for (IOOp io_op = IOOP_FIRST; io_op < IOOP_NUM_TYPES; io_op++)
+				{
+					int			col_idx = pgstat_get_io_op_index(io_op);
+
+					/*
+					 * Some combinations of BackendType and IOOp, of IOContext
+					 * and IOOp, and of IOObject and IOOp are not tracked. Set
+					 * these cells in the view NULL.
+					 */
+					nulls[col_idx] = !pgstat_tracks_io_op(bktype, io_obj, io_context, io_op);
+
+					if (nulls[col_idx])
+						continue;
+
+					values[col_idx] =
+						Int64GetDatum(bktype_stats->data[io_obj][io_context][io_op]);
+				}
+
+				tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+									 values, nulls);
+			}
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3810de7b22..2155d93b44 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5690,6 +5690,15 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '8459', descr => 'statistics: per backend type IO statistics',
+  proname => 'pg_stat_get_io', provolatile => 'v',
+  prorows => '30', proretset => 't',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,text,text,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,io_object,io_context,reads,writes,extends,op_bytes,evictions,reuses,fsyncs,stats_reset}',
+  prosrc => 'pg_stat_get_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a969ae63eb..8a7ed673c2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1876,6 +1876,18 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, query_id)
   WHERE (s.client_port IS NOT NULL);
+pg_stat_io| SELECT b.backend_type,
+    b.io_object,
+    b.io_context,
+    b.reads,
+    b.writes,
+    b.extends,
+    b.op_bytes,
+    b.evictions,
+    b.reuses,
+    b.fsyncs,
+    b.stats_reset
+   FROM pg_stat_get_io() b(backend_type, io_object, io_context, reads, writes, extends, op_bytes, evictions, reuses, fsyncs, stats_reset);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 1d84407a03..46bc79e740 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1126,4 +1126,238 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
+-- Test that the following operations are tracked in pg_stat_io:
+-- - reads of target blocks into shared buffers
+-- - writes of shared buffers to permanent storage
+-- - extends of relations using shared buffers
+-- - fsyncs done to ensure the durability of data dirtying shared buffers
+-- There is no test for blocks evicted from shared buffers, because we cannot
+-- be sure of the state of shared buffers at the point the test is run.
+-- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
+-- extends.
+SELECT sum(extends) AS io_sum_shared_before_extends
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+CREATE TABLE test_io_shared(a int);
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extends) AS io_sum_shared_after_extends
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
+-- and fsyncs.
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_before_
+-- See comment above for rationale for two explicit CHECKPOINTs.
+CHECKPOINT;
+CHECKPOINT;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_after_
+SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Change the tablespace so that the table is rewritten directly, then SELECT
+-- from it to cause it to be read back into shared buffers.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE regress_io_stats_tblspc LOCATION '';
+SELECT sum(reads) AS io_sum_shared_before_reads
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+ALTER TABLE test_io_shared SET TABLESPACE regress_io_stats_tblspc;
+-- SELECT from the table so that the data is read into shared buffers and
+-- io_context 'normal', io_object 'relation' reads are counted.
+SELECT COUNT(*) FROM test_io_shared;
+ count 
+-------
+   100
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(reads) AS io_sum_shared_after_reads
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT :io_sum_shared_after_reads > :io_sum_shared_before_reads;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Drop the table so we can drop the tablespace later.
+DROP TABLE test_io_shared;
+-- Test that the following IOCONTEXT_LOCAL IOOps are tracked in pg_stat_io:
+-- - eviction of local buffers in order to reuse them
+-- - reads of temporary table blocks into local buffers
+-- - writes of local buffers to permanent storage
+-- - extends of temporary tables
+-- Set temp_buffers to its minimum so that we can trigger writes with fewer
+-- inserted tuples. Do so in a new session in case temporary tables have been
+-- accessed by previous tests in this session.
+\c
+SET temp_buffers TO 100;
+CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
+SELECT sum(extends) AS extends, sum(evictions) AS evictions, sum(writes) AS writes
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'temp relation' \gset io_sum_local_before_
+-- Insert tuples into the temporary table, generating extends in the stats.
+-- Insert enough values that we need to reuse and write out dirty local
+-- buffers, generating evictions and writes.
+INSERT INTO test_io_local SELECT generate_series(1, 5000) as id, repeat('a', 200);
+-- Ensure the table is large enough to exceed our temp_buffers setting.
+SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(reads) AS io_sum_local_before_reads
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+-- Read in evicted buffers, generating reads.
+SELECT COUNT(*) FROM test_io_local;
+ count 
+-------
+  5000
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) AS evictions,
+       sum(reads) AS reads,
+       sum(writes) AS writes,
+       sum(extends) AS extends
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset io_sum_local_after_
+SELECT :io_sum_local_after_evictions > :io_sum_local_before_evictions,
+       :io_sum_local_after_reads > :io_sum_local_before_reads,
+       :io_sum_local_after_writes > :io_sum_local_before_writes,
+       :io_sum_local_after_extends > :io_sum_local_before_extends;
+ ?column? | ?column? | ?column? | ?column? 
+----------+----------+----------+----------
+ t        | t        | t        | t
+(1 row)
+
+-- Change the tablespaces so that the temporary table is rewritten to other
+-- local buffers, exercising a different codepath than standard local buffer
+-- writes.
+ALTER TABLE test_io_local SET TABLESPACE regress_io_stats_tblspc;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(writes) AS io_sum_local_new_tblspc_writes
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT :io_sum_local_new_tblspc_writes > :io_sum_local_after_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Drop the table so we can drop the tablespace later.
+DROP TABLE test_io_local;
+RESET temp_buffers;
+DROP TABLESPACE regress_io_stats_tblspc;
+-- Test that reuse of strategy buffers and reads of blocks into these reused
+-- buffers while VACUUMing are tracked in pg_stat_io.
+-- Set wal_skip_threshold smaller than the expected size of
+-- test_io_vac_strategy so that, even if wal_level is minimal, VACUUM FULL will
+-- fsync the newly rewritten test_io_vac_strategy instead of writing it to WAL.
+-- Writing it to WAL will result in the newly written relation pages being in
+-- shared buffers -- preventing us from testing BAS_VACUUM BufferAccessStrategy
+-- reads.
+SET wal_skip_threshold = '1 kB';
+SELECT sum(reuses) AS reuses, sum(reads) AS reads
+  FROM pg_stat_io WHERE io_context = 'vacuum' \gset io_sum_vac_strategy_before_
+CREATE TABLE test_io_vac_strategy(a int, b int) WITH (autovacuum_enabled = 'false');
+INSERT INTO test_io_vac_strategy SELECT i, i from generate_series(1, 8000)i;
+-- Ensure that the next VACUUM will need to perform IO by rewriting the table
+-- first with VACUUM (FULL).
+VACUUM (FULL) test_io_vac_strategy;
+VACUUM (PARALLEL 0) test_io_vac_strategy;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(reuses) AS reuses, sum(reads) AS reads
+  FROM pg_stat_io WHERE io_context = 'vacuum' \gset io_sum_vac_strategy_after_
+SELECT :io_sum_vac_strategy_after_reads > :io_sum_vac_strategy_before_reads,
+       :io_sum_vac_strategy_after_reuses > :io_sum_vac_strategy_before_reuses;
+ ?column? | ?column? 
+----------+----------
+ t        | t
+(1 row)
+
+RESET wal_skip_threshold;
+-- Test that extends done by a CTAS, which uses a BAS_BULKWRITE
+-- BufferAccessStrategy, are tracked in pg_stat_io.
+SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_before
+  FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+CREATE TABLE test_io_bulkwrite_strategy AS SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
+  FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test IO stats reset
+SELECT pg_stat_have_stats('io', 0, 0);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) AS io_stats_pre_reset
+  FROM pg_stat_io \gset
+SELECT pg_stat_reset_shared('io');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) AS io_stats_post_reset
+  FROM pg_stat_io \gset
+SELECT :io_stats_post_reset < :io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index b4d6753c71..4465649211 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -536,4 +536,152 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
+-- Test that the following operations are tracked in pg_stat_io:
+-- - reads of target blocks into shared buffers
+-- - writes of shared buffers to permanent storage
+-- - extends of relations using shared buffers
+-- - fsyncs done to ensure the durability of data dirtying shared buffers
+
+-- There is no test for blocks evicted from shared buffers, because we cannot
+-- be sure of the state of shared buffers at the point the test is run.
+
+-- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
+-- extends.
+SELECT sum(extends) AS io_sum_shared_before_extends
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+CREATE TABLE test_io_shared(a int);
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+SELECT sum(extends) AS io_sum_shared_after_extends
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+
+-- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
+-- and fsyncs.
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_before_
+-- See comment above for rationale for two explicit CHECKPOINTs.
+CHECKPOINT;
+CHECKPOINT;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_after_
+
+SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+
+-- Change the tablespace so that the table is rewritten directly, then SELECT
+-- from it to cause it to be read back into shared buffers.
+SET allow_in_place_tablespaces = true;
+CREATE TABLESPACE regress_io_stats_tblspc LOCATION '';
+SELECT sum(reads) AS io_sum_shared_before_reads
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+ALTER TABLE test_io_shared SET TABLESPACE regress_io_stats_tblspc;
+-- SELECT from the table so that the data is read into shared buffers and
+-- io_context 'normal', io_object 'relation' reads are counted.
+SELECT COUNT(*) FROM test_io_shared;
+SELECT pg_stat_force_next_flush();
+SELECT sum(reads) AS io_sum_shared_after_reads
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation'  \gset
+SELECT :io_sum_shared_after_reads > :io_sum_shared_before_reads;
+-- Drop the table so we can drop the tablespace later.
+DROP TABLE test_io_shared;
+
+-- Test that the following IOCONTEXT_LOCAL IOOps are tracked in pg_stat_io:
+-- - eviction of local buffers in order to reuse them
+-- - reads of temporary table blocks into local buffers
+-- - writes of local buffers to permanent storage
+-- - extends of temporary tables
+
+-- Set temp_buffers to its minimum so that we can trigger writes with fewer
+-- inserted tuples. Do so in a new session in case temporary tables have been
+-- accessed by previous tests in this session.
+\c
+SET temp_buffers TO 100;
+CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
+SELECT sum(extends) AS extends, sum(evictions) AS evictions, sum(writes) AS writes
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'temp relation' \gset io_sum_local_before_
+-- Insert tuples into the temporary table, generating extends in the stats.
+-- Insert enough values that we need to reuse and write out dirty local
+-- buffers, generating evictions and writes.
+INSERT INTO test_io_local SELECT generate_series(1, 5000) as id, repeat('a', 200);
+-- Ensure the table is large enough to exceed our temp_buffers setting.
+SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100;
+
+SELECT sum(reads) AS io_sum_local_before_reads
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation' \gset
+-- Read in evicted buffers, generating reads.
+SELECT COUNT(*) FROM test_io_local;
+SELECT pg_stat_force_next_flush();
+SELECT sum(evictions) AS evictions,
+       sum(reads) AS reads,
+       sum(writes) AS writes,
+       sum(extends) AS extends
+  FROM pg_stat_io
+  WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset io_sum_local_after_
+SELECT :io_sum_local_after_evictions > :io_sum_local_before_evictions,
+       :io_sum_local_after_reads > :io_sum_local_before_reads,
+       :io_sum_local_after_writes > :io_sum_local_before_writes,
+       :io_sum_local_after_extends > :io_sum_local_before_extends;
+
+-- Change the tablespace so that the temporary table is rewritten to other
+-- local buffers, exercising a different codepath than standard local buffer
+-- writes.
+ALTER TABLE test_io_local SET TABLESPACE regress_io_stats_tblspc;
+SELECT pg_stat_force_next_flush();
+SELECT sum(writes) AS io_sum_local_new_tblspc_writes
+  FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'temp relation'  \gset
+SELECT :io_sum_local_new_tblspc_writes > :io_sum_local_after_writes;
+-- Drop the table so we can drop the tablespace later.
+DROP TABLE test_io_local;
+RESET temp_buffers;
+DROP TABLESPACE regress_io_stats_tblspc;
+
+-- Test that reuse of strategy buffers and reads of blocks into these reused
+-- buffers while VACUUMing are tracked in pg_stat_io.
+
+-- Set wal_skip_threshold smaller than the expected size of
+-- test_io_vac_strategy so that, even if wal_level is minimal, VACUUM FULL will
+-- fsync the newly rewritten test_io_vac_strategy instead of writing it to WAL.
+-- Writing it to WAL will result in the newly written relation pages being in
+-- shared buffers -- preventing us from testing BAS_VACUUM BufferAccessStrategy
+-- reads.
+SET wal_skip_threshold = '1 kB';
+SELECT sum(reuses) AS reuses, sum(reads) AS reads
+  FROM pg_stat_io WHERE io_context = 'vacuum' \gset io_sum_vac_strategy_before_
+CREATE TABLE test_io_vac_strategy(a int, b int) WITH (autovacuum_enabled = 'false');
+INSERT INTO test_io_vac_strategy SELECT i, i from generate_series(1, 8000)i;
+-- Ensure that the next VACUUM will need to perform IO by rewriting the table
+-- first with VACUUM (FULL).
+VACUUM (FULL) test_io_vac_strategy;
+VACUUM (PARALLEL 0) test_io_vac_strategy;
+SELECT pg_stat_force_next_flush();
+SELECT sum(reuses) AS reuses, sum(reads) AS reads
+  FROM pg_stat_io WHERE io_context = 'vacuum' \gset io_sum_vac_strategy_after_
+SELECT :io_sum_vac_strategy_after_reads > :io_sum_vac_strategy_before_reads,
+       :io_sum_vac_strategy_after_reuses > :io_sum_vac_strategy_before_reuses;
+RESET wal_skip_threshold;
+
+-- Test that extends done by a CTAS, which uses a BAS_BULKWRITE
+-- BufferAccessStrategy, are tracked in pg_stat_io.
+SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_before
+  FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+CREATE TABLE test_io_bulkwrite_strategy AS SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
+  FROM pg_stat_io WHERE io_context = 'bulkwrite' \gset
+SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
+
+-- Test IO stats reset
+SELECT pg_stat_have_stats('io', 0, 0);
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) AS io_stats_pre_reset
+  FROM pg_stat_io \gset
+SELECT pg_stat_reset_shared('io');
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) AS io_stats_post_reset
+  FROM pg_stat_io \gset
+SELECT :io_stats_post_reset < :io_stats_pre_reset;
+
 -- End of Stats Test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1be6e07980..a399e0a5e4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3377,6 +3377,7 @@ intset_internal_node
 intset_leaf_node
 intset_node
 intvKEY
+io_stat_col
 itemIdCompact
 itemIdCompactData
 iterator
-- 
2.34.1
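
P.S. for reviewers trying the patch out interactively: every test above follows the same psql snapshot-and-compare pattern -- capture a counter with \gset, perform the IO-generating operation, flush this backend's pending stats, capture again, and assert only on the delta so the test is stable no matter how much IO earlier activity performed. A minimal standalone sketch of that pattern (the table name test_io_demo is illustrative, not part of the patch; requires a server with pg_stat_io, i.e. this patch applied):

```sql
-- Take a "before" snapshot of relation extends into shared buffers.
SELECT sum(extends) AS demo_extends_before
  FROM pg_stat_io
  WHERE io_context = 'normal' AND io_object = 'relation' \gset
-- Generate IOCONTEXT_NORMAL extends, mirroring the test: a plain CREATE
-- TABLE + INSERT goes through shared buffers (a CTAS would instead be
-- counted in io_context 'bulkwrite').
CREATE TABLE test_io_demo(a int);
INSERT INTO test_io_demo SELECT g FROM generate_series(1, 1000) g;
-- Flush this backend's pending stats to shared memory so the "after"
-- snapshot can observe them.
SELECT pg_stat_force_next_flush();
SELECT sum(extends) AS demo_extends_after
  FROM pg_stat_io
  WHERE io_context = 'normal' AND io_object = 'relation' \gset
-- Assert on the delta, not the absolute value.
SELECT :demo_extends_after > :demo_extends_before;
DROP TABLE test_io_demo;
```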
