On Thu, Sep 23, 2021 at 5:05 PM Melanie Plageman
<melanieplage...@gmail.com> wrote:
>
> The attached v8 patchset is rewritten to add in an additional dimension
> -- buffer type. Now, a backend keeps track of how many buffers of a
> particular type (e.g. shared, local) it has accessed in a particular way
> (e.g. alloc, write). It also changes the naming of various structures
> and the view members.
>
> Previously, stats reset did not work since it did not consider live
> backends' counters. Now, the reset message includes the current live
> backends' counters to be tracked by the stats collector and used when
> the view is queried.
>
> The reset message is one of the areas in which I still need to do some
> work -- I shoved the array of PgBufferAccesses into the existing reset
> message used for checkpointer, bgwriter, etc. Before making a new type
> of message, I would like feedback from a reviewer about the approach.
>
> There are various TODOs in the code which are actually questions for the
> reviewer. Once I have some feedback, it will be easier to address these
> items.
>
> There are a few other items which may be material for other commits that
> I would also like to do:
> 1) write wrapper functions for smgr* functions which count buffer
> accesses of the appropriate type. I wasn't sure if these should
> literally just take all the parameters that the smgr* functions take +
> buffer type. Once these exist, there will be less possibility for
> regressions in which new code is added using smgr* functions without
> counting this buffer activity. Once I add these, I was going to go
> through and replace existing calls to smgr* functions and thereby start
> counting currently uncounted buffer type accesses (direct, local, etc).
>
> 2) Separate checkpointer and bgwriter into two views and add additional
> stats to the bgwriter view.
>
> 3) Consider adding a helper function to pgstatfuncs.c to help create the
> tuplestore. These functions all have quite a few lines which are exactly
> the same, and I thought it might be nice to do something about that:
>   pg_stat_get_progress_info(PG_FUNCTION_ARGS)
>   pg_stat_get_activity(PG_FUNCTION_ARGS)
>   pg_stat_get_buffers_accesses(PG_FUNCTION_ARGS)
>   pg_stat_get_slru(PG_FUNCTION_ARGS)
> I can imagine a function that takes a Datums array, a nulls array, and a
> ResultSetInfo and then makes the tuplestore -- though I think that will
> use more memory. Perhaps we could make a macro which does the initial
> error checking (checking if caller supports returning a tuplestore)? I'm
> not sure if there is something meaningful here, but I thought I would
> ask.
>
> Finally, I haven't removed the test in pg_stats and haven't done a final
> pass for comment clarity, alphabetization, etc on this version.
>

I have addressed almost all of the issues mentioned above in v9.
The only remaining TODOs are described in the commit messages.
The most critical one is that the reset message doesn't work: it is
now too large (see the TODO list in patch 2/3).
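
To make the reset accounting easier to review, here is a minimal
standalone sketch of the arithmetic the view is meant to perform: for
each buffer type, sum the live backends' counters with the totals saved
from exited backends, then subtract the values saved at the last reset.
All names below (SketchBufferType, SketchAccesses, sketch_net,
sketch_sum_accesses) are hypothetical stand-ins for illustration, not
the structures in the patch, and the sketch ignores atomics, locking,
and the per-backend-type dimension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer types; the patch's real enum may differ. */
typedef enum SketchBufferType
{
	SKETCH_BUF_SHARED,
	SKETCH_BUF_LOCAL,
	SKETCH_BUF_STRATEGY,
	SKETCH_BUF_NUM_TYPES
} SketchBufferType;

/* One counter per kind of buffer access, as in the view's columns. */
typedef struct SketchAccesses
{
	uint64_t	allocs;
	uint64_t	extends;
	uint64_t	fsyncs;
	uint64_t	writes;
} SketchAccesses;

/*
 * Counters only ever grow, so the value saved at reset time can never
 * exceed what live plus exited backends have accumulated since startup.
 */
static uint64_t
sketch_net(uint64_t live, uint64_t exited, uint64_t reset)
{
	assert(live + exited >= reset);
	return live + exited - reset;
}

/* Fill total[] with live + exited - resets for every buffer type. */
static void
sketch_sum_accesses(const SketchAccesses *live,
					const SketchAccesses *exited,
					const SketchAccesses *resets,
					SketchAccesses *total)
{
	for (int i = 0; i < SKETCH_BUF_NUM_TYPES; i++)
	{
		total[i].allocs = sketch_net(live[i].allocs, exited[i].allocs,
									 resets[i].allocs);
		total[i].extends = sketch_net(live[i].extends, exited[i].extends,
									  resets[i].extends);
		total[i].fsyncs = sketch_net(live[i].fsyncs, exited[i].fsyncs,
									 resets[i].fsyncs);
		total[i].writes = sketch_net(live[i].writes, exited[i].writes,
									 resets[i].writes);
	}
}
```

For example, if a live backend had written 5 shared buffers when the
reset was requested, has written 9 by the time the view is queried, and
a backend that exited after the reset contributed 3 more, the view
should report 9 + 3 - 5 = 7 shared buffer writes. Saving the reset
values instead of zeroing the live counters is what avoids writing to
other backends' PgBackendStatus.
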
From 9747484ad0b6f1fe97f98cfb681fa117982dfb2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 2 Sep 2021 11:47:41 -0400
Subject: [PATCH v9 3/3] Remove superfluous bgwriter stats

Remove stats from pg_stat_bgwriter which are now more clearly expressed
in pg_stat_buffers.

TODO:
- make pg_stat_checkpointer view and move relevant stats into it
- add additional stats to pg_stat_bgwriter
---
 doc/src/sgml/monitoring.sgml          | 47 ---------------------------
 src/backend/catalog/system_views.sql  |  6 +---
 src/backend/postmaster/checkpointer.c | 26 ---------------
 src/backend/postmaster/pgstat.c       |  5 ---
 src/backend/storage/buffer/bufmgr.c   |  6 ----
 src/backend/utils/adt/pgstatfuncs.c   | 30 -----------------
 src/include/catalog/pg_proc.dat       | 22 -------------
 src/include/pgstat.h                  | 10 ------
 src/test/regress/expected/rules.out   |  5 ---
 9 files changed, 1 insertion(+), 156 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 60627c692a..08772652ac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3416,24 +3416,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para></entry>
      </row>
 
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>buffers_checkpoint</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of buffers written during checkpoints
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>buffers_clean</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of buffers written by the background writer
-      </para></entry>
-     </row>
-
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>maxwritten_clean</structfield> <type>bigint</type>
@@ -3444,35 +3426,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para></entry>
      </row>
 
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>buffers_backend</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of buffers written directly by a backend
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>buffers_backend_fsync</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of times a backend had to execute its own
-       <function>fsync</function> call (normally the background writer handles those
-       even when the backend does its own write)
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>buffers_alloc</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of buffers allocated
-      </para></entry>
-     </row>
-
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 30280d520b..c45c261f4b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1058,18 +1058,14 @@ CREATE VIEW pg_stat_archiver AS
         s.stats_reset
     FROM pg_stat_get_archiver() s;
 
+-- TODO: make separate pg_stat_checkpointer view
 CREATE VIEW pg_stat_bgwriter AS
     SELECT
         pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
         pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
         pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
         pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
-        pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
-        pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
         pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
-        pg_stat_get_buf_written_backend() AS buffers_backend,
-        pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-        pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
 CREATE VIEW pg_stat_buffers AS
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index c0c4122fd5..829f52cc8f 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -90,17 +90,9 @@
  * requesting backends since the last checkpoint start.  The flags are
  * chosen so that OR'ing is the correct way to combine multiple requests.
  *
- * num_backend_writes is used to count the number of buffer writes performed
- * by user backend processes.  This counter should be wide enough that it
- * can't overflow during a single processing cycle.  num_backend_fsync
- * counts the subset of those writes that also had to do their own fsync,
- * because the checkpointer failed to absorb their request.
- *
  * The requests array holds fsync requests sent by backends and not yet
  * absorbed by the checkpointer.
  *
- * Unlike the checkpoint fields, num_backend_writes, num_backend_fsync, and
- * the requests fields are protected by CheckpointerCommLock.
  *----------
  */
 typedef struct
@@ -124,9 +116,6 @@ typedef struct
 	ConditionVariable start_cv; /* signaled when ckpt_started advances */
 	ConditionVariable done_cv;	/* signaled when ckpt_done advances */
 
-	uint32		num_backend_writes; /* counts user backend buffer writes */
-	uint32		num_backend_fsync;	/* counts user backend fsync calls */
-
 	int			num_requests;	/* current # of requests */
 	int			max_requests;	/* allocated array size */
 	CheckpointerRequest requests[FLEXIBLE_ARRAY_MEMBER];
@@ -1085,10 +1074,6 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 
 	LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
-	/* Count all backend writes regardless of if they fit in the queue */
-	if (!AmBackgroundWriterProcess())
-		CheckpointerShmem->num_backend_writes++;
-
 	/*
 	 * If the checkpointer isn't running or the request queue is full, the
 	 * backend will have to perform its own fsync request.  But before forcing
@@ -1102,8 +1087,6 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 		 * Count the subset of writes where backends have to do their own
 		 * fsync
 		 */
-		if (!AmBackgroundWriterProcess())
-			CheckpointerShmem->num_backend_fsync++;
 		pgstat_increment_buffer_access_type(BA_Fsync, Buf_Shared);
 		LWLockRelease(CheckpointerCommLock);
 		return false;
@@ -1261,15 +1244,6 @@ AbsorbSyncRequests(void)
 
 	LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
 
-	/* Transfer stats counts into pending pgstats message */
-	PendingCheckpointerStats.m_buf_written_backend
-		+= CheckpointerShmem->num_backend_writes;
-	PendingCheckpointerStats.m_buf_fsync_backend
-		+= CheckpointerShmem->num_backend_fsync;
-
-	CheckpointerShmem->num_backend_writes = 0;
-	CheckpointerShmem->num_backend_fsync = 0;
-
 	/*
 	 * We try to avoid holding the lock for a long time by copying the request
 	 * array, and processing the requests after releasing the lock.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 903d4df911..b8c17f8e7f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -5569,9 +5569,7 @@ pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len)
 static void
 pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 {
-	globalStats.bgwriter.buf_written_clean += msg->m_buf_written_clean;
 	globalStats.bgwriter.maxwritten_clean += msg->m_maxwritten_clean;
-	globalStats.bgwriter.buf_alloc += msg->m_buf_alloc;
 }
 
 /* ----------
@@ -5587,9 +5585,6 @@ pgstat_recv_checkpointer(PgStat_MsgCheckpointer *msg, int len)
 	globalStats.checkpointer.requested_checkpoints += msg->m_requested_checkpoints;
 	globalStats.checkpointer.checkpoint_write_time += msg->m_checkpoint_write_time;
 	globalStats.checkpointer.checkpoint_sync_time += msg->m_checkpoint_sync_time;
-	globalStats.checkpointer.buf_written_checkpoints += msg->m_buf_written_checkpoints;
-	globalStats.checkpointer.buf_written_backend += msg->m_buf_written_backend;
-	globalStats.checkpointer.buf_fsync_backend += msg->m_buf_fsync_backend;
 }
 
 static void
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 9832d35b90..997fff9f3f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2165,7 +2165,6 @@ BufferSync(int flags)
 			if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN)
 			{
 				TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
-				PendingCheckpointerStats.m_buf_written_checkpoints++;
 				num_written++;
 			}
 		}
@@ -2274,9 +2273,6 @@ BgBufferSync(WritebackContext *wb_context)
 	 */
 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
 
-	/* Report buffer alloc counts to pgstat */
-	PendingBgWriterStats.m_buf_alloc += recent_alloc;
-
 	/*
 	 * If we're not running the LRU scan, just stop after doing the stats
 	 * stuff.  We mark the saved state invalid so that we can recover sanely
@@ -2473,8 +2469,6 @@ BgBufferSync(WritebackContext *wb_context)
 			reusable_buffers++;
 	}
 
-	PendingBgWriterStats.m_buf_written_clean += num_written;
-
 #ifdef BGW_DEBUG
 	elog(DEBUG1, "bgwriter: recent_alloc=%u smoothed=%.2f delta=%ld ahead=%d density=%.2f reusable_est=%d upcoming_est=%d scanned=%d wrote=%d reusable=%d",
 		 recent_alloc, smoothed_alloc, strategy_delta, bufs_ahead,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 477caf2536..998625d490 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1738,18 +1738,6 @@ pg_stat_get_bgwriter_requested_checkpoints(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_checkpoints);
 }
 
-Datum
-pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->buf_written_checkpoints);
-}
-
-Datum
-pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_written_clean);
-}
-
 Datum
 pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS)
 {
@@ -1778,24 +1766,6 @@ pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
 	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_bgwriter()->stat_reset_timestamp);
 }
 
-Datum
-pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->buf_written_backend);
-}
-
-Datum
-pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->buf_fsync_backend);
-}
-
-Datum
-pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
-}
-
 Datum
 pg_stat_get_buffers_accesses(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 54661e2b5f..02f624c18c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5600,16 +5600,6 @@
   proname => 'pg_stat_get_bgwriter_requested_checkpoints', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => '',
   prosrc => 'pg_stat_get_bgwriter_requested_checkpoints' },
-{ oid => '2771',
-  descr => 'statistics: number of buffers written by the bgwriter during checkpoints',
-  proname => 'pg_stat_get_bgwriter_buf_written_checkpoints', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_bgwriter_buf_written_checkpoints' },
-{ oid => '2772',
-  descr => 'statistics: number of buffers written by the bgwriter for cleaning dirty buffers',
-  proname => 'pg_stat_get_bgwriter_buf_written_clean', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_bgwriter_buf_written_clean' },
 { oid => '2773',
   descr => 'statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning',
   proname => 'pg_stat_get_bgwriter_maxwritten_clean', provolatile => 's',
@@ -5629,18 +5619,6 @@
   proname => 'pg_stat_get_checkpoint_sync_time', provolatile => 's',
   proparallel => 'r', prorettype => 'float8', proargtypes => '',
   prosrc => 'pg_stat_get_checkpoint_sync_time' },
-{ oid => '2775', descr => 'statistics: number of buffers written by backends',
-  proname => 'pg_stat_get_buf_written_backend', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_buf_written_backend' },
-{ oid => '3063',
-  descr => 'statistics: number of backend buffer writes that did their own fsync',
-  proname => 'pg_stat_get_buf_fsync_backend', provolatile => 's',
-  proparallel => 'r', prorettype => 'int8', proargtypes => '',
-  prosrc => 'pg_stat_get_buf_fsync_backend' },
-{ oid => '2859', descr => 'statistics: number of buffer allocations',
-  proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
-  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
 { oid => '8459', descr => 'statistics: counts of various types of accesses of buffers done by each backend type',
   proname => 'pg_stat_get_buffers_accesses', provolatile => 's', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b73265ab13..9a3ffc9ee4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -486,9 +486,7 @@ typedef struct PgStat_MsgBgWriter
 {
 	PgStat_MsgHdr m_hdr;
 
-	PgStat_Counter m_buf_written_clean;
 	PgStat_Counter m_maxwritten_clean;
-	PgStat_Counter m_buf_alloc;
 } PgStat_MsgBgWriter;
 
 /* ----------
@@ -501,9 +499,6 @@ typedef struct PgStat_MsgCheckpointer
 
 	PgStat_Counter m_timed_checkpoints;
 	PgStat_Counter m_requested_checkpoints;
-	PgStat_Counter m_buf_written_checkpoints;
-	PgStat_Counter m_buf_written_backend;
-	PgStat_Counter m_buf_fsync_backend;
 	PgStat_Counter m_checkpoint_write_time; /* times in milliseconds */
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgCheckpointer;
@@ -879,9 +874,7 @@ typedef struct PgStat_ArchiverStats
  */
 typedef struct PgStat_BgWriterStats
 {
-	PgStat_Counter buf_written_clean;
 	PgStat_Counter maxwritten_clean;
-	PgStat_Counter buf_alloc;
 	TimestampTz stat_reset_timestamp;
 } PgStat_BgWriterStats;
 
@@ -895,9 +888,6 @@ typedef struct PgStat_CheckpointerStats
 	PgStat_Counter requested_checkpoints;
 	PgStat_Counter checkpoint_write_time;	/* times in milliseconds */
 	PgStat_Counter checkpoint_sync_time;
-	PgStat_Counter buf_written_checkpoints;
-	PgStat_Counter buf_written_backend;
-	PgStat_Counter buf_fsync_backend;
 } PgStat_CheckpointerStats;
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9172b0fcd2..ac2f7cf61e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1821,12 +1821,7 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
     pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
     pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
     pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
-    pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
-    pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
     pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
-    pg_stat_get_buf_written_backend() AS buffers_backend,
-    pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-    pg_stat_get_buf_alloc() AS buffers_alloc,
     pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 pg_stat_buffers| SELECT b.backend_type,
     b.buffer_type,
-- 
2.27.0

From b0a24e0cd0115f5bfb15a69693ce205a9dca841e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 24 Sep 2021 17:39:12 -0400
Subject: [PATCH v9 1/3] Allow bootstrap process to beinit

---
 src/backend/utils/init/postinit.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 78bc64671e..fba5864172 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -670,8 +670,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	EnablePortalManager();
 
 	/* Initialize status reporting */
-	if (!bootstrap)
-		pgstat_beinit();
+	pgstat_beinit();
 
 	/*
 	 * Load relcache entries for the shared system catalogs.  This must create
-- 
2.27.0

From f924873f296fa4691a41f38b4b3509d08ee3b62d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 2 Sep 2021 11:33:59 -0400
Subject: [PATCH v9 2/3] Add system view tracking accesses to buffers

Add pg_stat_buffers, a system view which tracks the number of buffers of
a particular type (e.g. shared, local) allocated, written, fsync'd, and
extended by each backend type.

Some of these should always be zero. For example, a checkpointer backend
will not use a BufferAccessStrategy (currently), so buffer type
"strategy" for checkpointer will be 0 for all buffer access types
(alloc, write, fsync, and extend).

All backends increment a counter in their PgBackendStatus when
performing a buffer access. On exit, backends send these stats to the
stats collector to be persisted.

When stats are reset, the reset message includes the current values of
all the live backends' buffer access counters. When receiving this
message, the stats collector will 1) save these reset values in an array
of "resets" and 2) zero out the exited backends' saved buffer access
counters. This is required for accurate stats after a reset without
writing to other backends' PgBackendStatus.

When the pg_stat_buffers view is queried, sum live backends' stats with
saved stats from exited backends and subtract saved reset stats,
returning the total.

Each row of the view is for a particular backend type and a particular
buffer type (e.g. shared buffer accesses by checkpointer) and each
column in the view is the total number of buffers of each kind of buffer
access (e.g. written). So a cell in the view would be, for example, the
number of shared buffers written by checkpointer since the last stats
reset.

Note that this commit does not add code to increment buffer accesses for
all types of buffers. It includes all possible combinations in the stats
view but doesn't populate all of them.

TODO:
- pgstat reset message too large -- needs to be fixed
- Wrappers for smgr funcs to protect against regressions and cover other
  buffer types
- Consider helper func in pgstatfuncs.c to refactor out some of the
  redundant tuplestore creation code from pg_stat_get_progress_info,
  pg_stat_get_activity, etc
- Remove pg_stats test I added
- current code TODOs are mostly about adding comments
- When finished, catalog bump
- pgindent
---
 doc/src/sgml/monitoring.sgml                | 116 ++++++++++++++++-
 src/backend/catalog/system_views.sql        |  11 ++
 src/backend/postmaster/checkpointer.c       |   1 +
 src/backend/postmaster/pgstat.c             | 115 ++++++++++++++++-
 src/backend/storage/buffer/bufmgr.c         |  25 +++-
 src/backend/storage/buffer/freelist.c       |  22 +++-
 src/backend/utils/activity/backend_status.c |  49 ++++++-
 src/backend/utils/adt/pgstatfuncs.c         | 136 ++++++++++++++++++++
 src/backend/utils/init/miscinit.c           |  25 ++++
 src/include/catalog/pg_proc.dat             |   9 ++
 src/include/miscadmin.h                     |  23 ++++
 src/include/pgstat.h                        |  35 ++++-
 src/include/storage/buf_internals.h         |   4 +-
 src/include/utils/backend_status.h          |  44 +++++++
 src/test/regress/expected/rules.out         |   8 ++
 src/test/regress/sql/stats.sql              |   4 +
 16 files changed, 610 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 2281ba120f..60627c692a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -444,6 +444,15 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_buffers</structname><indexterm><primary>pg_stat_buffers</primary></indexterm></entry>
+      <entry>A row for each buffer type for each backend type showing
+      statistics about backend buffer activity. See
+       <link linkend="monitoring-pg-stat-buffers-view">
+       <structname>pg_stat_buffers</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
       <entry>One row only, showing statistics about WAL activity. See
@@ -3478,6 +3487,101 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-buffers-view">
+  <title><structname>pg_stat_buffers</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_buffers</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_buffers</structname> view has a row for each buffer
+   type for each backend type, containing global data for the cluster for that
+   backend and buffer type.
+  </para>
+
+  <table id="pg-stat-buffer-actions-view" xreflabel="pg_stat_buffers">
+   <title><structname>pg_stat_buffers</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>backend_type</structfield> <type>text</type>
+      </para>
+      <para>
+       Type of backend (e.g. background worker, autovacuum worker).
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>buffer_type</structfield> <type>text</type>
+      </para>
+      <para>
+       Type of buffer accessed (e.g. shared).
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>alloc</structfield> <type>integer</type>
+      </para>
+      <para>
+       Number of buffers allocated.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>extend</structfield> <type>integer</type>
+      </para>
+      <para>
+       Number of buffers extended.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>fsync</structfield> <type>integer</type>
+      </para>
+      <para>
+       Number of buffers fsynced.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>write</structfield> <type>integer</type>
+      </para>
+      <para>
+       Number of buffers written.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+ </sect2>
+
  <sect2 id="monitoring-pg-stat-wal-view">
    <title><structname>pg_stat_wal</structname></title>
 
@@ -5074,12 +5178,14 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
        </para>
        <para>
         Resets some cluster-wide statistics counters to zero, depending on the
-        argument.  The argument can be <literal>bgwriter</literal> to reset
-        all the counters shown in
-        the <structname>pg_stat_bgwriter</structname>
+        argument.  The argument can be <literal>bgwriter</literal> to reset all
+        the counters shown in the <structname>pg_stat_bgwriter</structname>
         view, <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view or <literal>wal</literal>
-        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
+        the <structname>pg_stat_archiver</structname> view,
+        <literal>wal</literal> to reset all the counters shown in the
+        <structname>pg_stat_wal</structname> view, or
+        <literal>buffers</literal> to reset all the counters shown in the
+        <structname>pg_stat_buffers</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f6e3711d..30280d520b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1072,6 +1072,17 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_buffers AS
+SELECT
+       b.backend_type,
+       b.buffer_type,
+       b.alloc,
+       b.extend,
+       b.fsync,
+       b.write,
+       b.stats_reset
+FROM pg_stat_get_buffers_accesses() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index be7366379d..c0c4122fd5 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -1104,6 +1104,7 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 		 */
 		if (!AmBackgroundWriterProcess())
 			CheckpointerShmem->num_backend_fsync++;
+		pgstat_increment_buffer_access_type(BA_Fsync, Buf_Shared);
 		LWLockRelease(CheckpointerCommLock);
 		return false;
 	}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b7d0fbaefd..903d4df911 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -124,9 +124,12 @@ char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
 
 /*
- * BgWriter and WAL global statistics counters.
- * Stored directly in a stats message structure so they can be sent
- * without needing to copy things around.  We assume these init to zeroes.
+ * BgWriter, Checkpointer, WAL, and I/O global statistics counters. I/O global
+ * statistics on various buffer actions are tracked in PgBackendStatus while a
+ * backend is alive and then sent to stats collector before a backend exits in
+ * a PgStat_MsgBufferTypeAccesses.
+ * All others are stored directly in a stats message structure so they can be
+ * sent without needing to copy things around.  We assume these init to zeroes.
  */
 PgStat_MsgBgWriter PendingBgWriterStats;
 PgStat_MsgCheckpointer PendingCheckpointerStats;
@@ -362,6 +365,7 @@ static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
 static void pgstat_recv_checkpointer(PgStat_MsgCheckpointer *msg, int len);
+static void pgstat_recv_buffer_type_accesses(PgStat_MsgBufferTypeAccesses *msg, int len);
 static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
@@ -974,6 +978,7 @@ pgstat_report_stat(bool disconnect)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+
 	/* Send WAL statistics */
 	pgstat_send_wal(true);
 
@@ -1452,6 +1457,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "buffers") == 0) {
+		memset(msg.resets, 0, sizeof(PgStat_MsgBufferTypeAccesses) * BACKEND_NUM_TYPES);
+
+		msg.m_resettarget = RESET_BUFFERS;
+
+		pgstat_report_live_backend_accesses(msg.resets);
+	}
 	else if (strcmp(target, "wal") == 0)
 		msg.m_resettarget = RESET_WAL;
 	else
@@ -2760,6 +2772,15 @@ pgstat_twophase_postabort(TransactionId xid, uint16 info,
 		rec->tuples_inserted + rec->tuples_updated;
 }
 
+// TODO: add comment?
+PgStat_BackendAccesses *
+pgstat_fetch_exited_backend_buffers(void)
+{
+	backend_read_statsfile();
+
+	return &globalStats.buffers;
+}
+
 
 /* ----------
  * pgstat_fetch_stat_dbentry() -
@@ -2998,6 +3019,13 @@ static void
 pgstat_shutdown_hook(int code, Datum arg)
 {
 	Assert(!pgstat_is_shutdown);
+	/*
+	 * Only need to send stats on buffer accesses when a process exits, as
+	 * pg_stat_get_buffers() will read from live backends' PgBackendStatus and
+	 * then sum this with totals from exited backends persisted by the stats
+	 * collector.
+	 */
+	pgstat_send_buffers();
 
 	/*
 	 * If we got as far as discovering our own database ID, we can report what
@@ -3148,6 +3176,47 @@ pgstat_send_bgwriter(void)
 	MemSet(&PendingBgWriterStats, 0, sizeof(PendingBgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_buffers() -
+ *
+ *		Before exiting, a backend sends its buffer access statistics to the
+ *		collector so that they may be persisted.
+ * ----------
+ */
+void
+pgstat_send_buffers(void)
+{
+	PgStat_MsgBufferTypeAccesses msg;
+	PgBufferAccesses *src_accesses;
+	PgStatBufferAccesses *dest_accesses;
+	int buffer_type;
+
+	PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	MemSet(&msg, 0, sizeof(msg));
+	msg.backend_type = beentry->st_backendType;
+
+	src_accesses = (PgBufferAccesses *) &beentry->buffer_access_stats;
+	dest_accesses = msg.buffer_type_accesses;
+
+	for (buffer_type = 0; buffer_type < BUFFER_NUM_TYPES; buffer_type++)
+	{
+		dest_accesses->allocs += pg_atomic_read_u64(&src_accesses->allocs);
+		dest_accesses->extends += pg_atomic_read_u64(&src_accesses->extends);
+		dest_accesses->fsyncs += pg_atomic_read_u64(&src_accesses->fsyncs);
+		dest_accesses->writes += pg_atomic_read_u64(&src_accesses->writes);
+		dest_accesses++;
+		src_accesses++;
+	}
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_BUFFER_ACTIONS);
+	pgstat_send(&msg, sizeof(msg));
+}
+
+
 /* ----------
  * pgstat_send_checkpointer() -
  *
@@ -3522,6 +3591,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_checkpointer(&msg.msg_checkpointer, len);
 					break;
 
+				case PGSTAT_MTYPE_BUFFER_ACTIONS:
+					pgstat_recv_buffer_type_accesses(&msg.msg_buffer_accesses, len);
+					break;
+
 				case PGSTAT_MTYPE_WAL:
 					pgstat_recv_wal(&msg.msg_wal, len);
 					break;
@@ -5222,9 +5295,16 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 	if (msg->m_resettarget == RESET_BGWRITER)
 	{
 		/* Reset the global, bgwriter and checkpointer statistics for the cluster. */
-		memset(&globalStats, 0, sizeof(globalStats));
+		memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer));
+		memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter));
 		globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_BUFFERS)
+	{
+		memset(&globalStats.buffers.accesses, 0, sizeof(globalStats.buffers.accesses));
+		memcpy(globalStats.buffers.resets, msg->resets, sizeof(msg->resets));
+		globalStats.buffers.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 	else if (msg->m_resettarget == RESET_ARCHIVER)
 	{
 		/* Reset the archiver statistics for the cluster. */
@@ -5512,6 +5592,33 @@ pgstat_recv_checkpointer(PgStat_MsgCheckpointer *msg, int len)
 	globalStats.checkpointer.buf_fsync_backend += msg->m_buf_fsync_backend;
 }
 
+static void
+pgstat_recv_buffer_type_accesses(PgStat_MsgBufferTypeAccesses *msg, int len)
+{
+	int buffer_type;
+	PgStatBufferAccesses *src_buffer_accesses = msg->buffer_type_accesses;
+	PgStatBufferAccesses *dest_buffer_accesses = globalStats.buffers.accesses[msg->backend_type].buffer_type_accesses;
+
+	/*
+	 * Users are unlikely to need PgStat_MsgBufferTypeAccesses->backend_type
+	 * when accessing it from globalStats, since its position in the
+	 * globalStats.buffers.accesses array indicates backend_type. However,
+	 * leaving it undefined seemed like an invitation for unnecessary future
+	 * bugs.
+	 */
+	globalStats.buffers.accesses[msg->backend_type].backend_type = msg->backend_type;
+
+	for (buffer_type = 0; buffer_type < BUFFER_NUM_TYPES; buffer_type++)
+	{
+		PgStatBufferAccesses *src = &src_buffer_accesses[buffer_type];
+		PgStatBufferAccesses *dest = &dest_buffer_accesses[buffer_type];
+		dest->allocs += src->allocs;
+		dest->extends += src->extends;
+		dest->fsyncs += src->fsyncs;
+		dest->writes += src->writes;
+	}
+}
+
 /* ----------
  * pgstat_recv_wal() -
  *
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e88e4e918b..9832d35b90 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -972,6 +972,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	if (isExtend)
 	{
+		pgstat_increment_buffer_access_type(BA_Extend, Buf_Shared);
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
 		/* don't set checksum for all-zero page */
@@ -1172,6 +1173,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	/* Loop here in case we have to try another victim buffer */
 	for (;;)
 	{
+		bool from_ring = false;
 		/*
 		 * Ensure, while the spinlock's not yet held, that there's a free
 		 * refcount entry.
@@ -1182,7 +1184,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * Select a victim buffer.  The buffer is returned with its header
 		 * spinlock still held!
 		 */
-		buf = StrategyGetBuffer(strategy, &buf_state);
+		buf = StrategyGetBuffer(strategy, &buf_state, &from_ring);
 
 		Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
 
@@ -1236,7 +1238,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 					UnlockBufHdr(buf, buf_state);
 
 					if (XLogNeedsFlush(lsn) &&
-						StrategyRejectBuffer(strategy, buf))
+						StrategyRejectBuffer(strategy, buf, &from_ring))
 					{
 						/* Drop lock/pin and loop around for another buffer */
 						LWLockRelease(BufferDescriptorGetContentLock(buf));
@@ -1245,6 +1247,23 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 					}
 				}
 
+				/*
+				 * When a strategy is in use, if the dirty buffer was selected
+				 * from the strategy ring and we did not bother checking the
+				 * freelist or doing a clock sweep to look for a clean shared
+				 * buffer to use, the write will be counted as a strategy
+				 * write. However, if the dirty buffer was obtained from the
+				 * freelist or a clock sweep, it is counted as a regular write.
+				 * When a strategy is not in use, at this point, the write can
+				 * only be a "regular" write of a dirty buffer.
+				 */
+
+				if (from_ring)
+					pgstat_increment_buffer_access_type(BA_Write, Buf_Strategy);
+				else
+					pgstat_increment_buffer_access_type(BA_Write, Buf_Shared);
+
+
 				/* OK, do the I/O */
 				TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum,
 														  smgr->smgr_rnode.node.spcNode,
@@ -2552,6 +2571,8 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
 	 * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
 	 * buffer is clean by the time we've locked it.)
 	 */
+
+	pgstat_increment_buffer_access_type(BA_Write, Buf_Shared);
 	PinBuffer_Locked(bufHdr);
 	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
 
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 6be80476db..866cdd3911 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -19,6 +19,7 @@
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
+#include "utils/backend_status.h"
 
 #define INT_ACCESS_ONCE(var)	((int)(*((volatile int *)&(var))))
 
@@ -198,7 +199,7 @@ have_free_buffer(void)
  *	return the buffer with the buffer header spinlock still held.
  */
 BufferDesc *
-StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
+StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state, bool *from_ring)
 {
 	BufferDesc *buf;
 	int			bgwprocno;
@@ -212,6 +213,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	if (strategy != NULL)
 	{
 		buf = GetBufferFromRing(strategy, buf_state);
+		*from_ring = (buf != NULL);
 		if (buf != NULL)
 			return buf;
 	}
@@ -247,6 +249,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
 	 * the rate of buffer consumption.  Note that buffers recycled by a
 	 * strategy object are intentionally not counted here.
 	 */
+	pgstat_increment_buffer_access_type(BA_Alloc, Buf_Shared);
 	pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
 
 	/*
@@ -683,8 +686,14 @@ AddBufferToRing(BufferAccessStrategy strategy, BufferDesc *buf)
  * if this buffer should be written and re-used.
  */
 bool
-StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
+StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *from_ring)
 {
+	/*
+	 * If we decide to use the dirty buffer selected by StrategyGetBuffer, then
+	 * ensure that we count it as such in the pg_stat_buffers view.
+	 */
+	*from_ring = true;
+
 	/* We only do this in bulkread mode */
 	if (strategy->btype != BAS_BULKREAD)
 		return false;
@@ -700,5 +709,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
 	 */
 	strategy->buffers[strategy->current] = InvalidBuffer;
 
+	/*
+	 * Since we will not be writing out a dirty buffer from the ring, set
+	 * from_ring to false so that the caller does not count this write as a
+	 * "strategy write" and can do proper bookkeeping for
+	 * pg_stat_buffers.
+	 */
+	*from_ring = false;
+
+
 	return true;
 }
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 7229598822..0581dce8b9 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -279,7 +279,7 @@ pgstat_beinit(void)
  * pgstat_bestart() -
  *
  *	Initialize this backend's entry in the PgBackendStatus array.
- *	Called from InitPostgres.
+ *	Called from InitPostgres and AuxiliaryProcessMain.
  *
  *	Apart from auxiliary processes, MyBackendId, MyDatabaseId,
  *	session userid, and application_name must be set for a
@@ -293,6 +293,7 @@ pgstat_bestart(void)
 {
 	volatile PgBackendStatus *vbeentry = MyBEEntry;
 	PgBackendStatus lbeentry;
+	int buffer_type;
 #ifdef USE_SSL
 	PgBackendSSLStatus lsslstatus;
 #endif
@@ -399,6 +400,14 @@ pgstat_bestart(void)
 	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
 	lbeentry.st_progress_command_target = InvalidOid;
 	lbeentry.st_query_id = UINT64CONST(0);
+	for (buffer_type = 0; buffer_type < BUFFER_NUM_TYPES; buffer_type++)
+	{
+		PgBufferAccesses *accesses = &lbeentry.buffer_access_stats[buffer_type];
+		pg_atomic_init_u64(&accesses->allocs, 0);
+		pg_atomic_init_u64(&accesses->extends, 0);
+		pg_atomic_init_u64(&accesses->fsyncs, 0);
+		pg_atomic_init_u64(&accesses->writes, 0);
+	}
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should
@@ -621,6 +630,38 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* TODO: function comment */
+void pgstat_report_live_backend_accesses(PgStat_MsgBufferTypeAccesses *backend_accesses)
+{
+	int i, buffer_type;
+	PgBackendStatus *beentry = BackendStatusArray;
+	/*
+	 * Loop through live backends and capture reset values
+	 */
+	for (i = 0; i < MaxBackends + NUM_AUXPROCTYPES; i++, beentry++)
+	{
+		PgBufferAccesses *live_accesses;
+		PgStatBufferAccesses *buffer_accesses;
+
+		/* Don't count dead backends. They should already be counted */
+		if (beentry->st_procpid == 0)
+			continue;
+
+		live_accesses = (PgBufferAccesses *) beentry->buffer_access_stats;
+		buffer_accesses = backend_accesses[beentry->st_backendType].buffer_type_accesses;
+
+		for (buffer_type = 0; buffer_type < BUFFER_NUM_TYPES; buffer_type++)
+		{
+			buffer_accesses->allocs += pg_atomic_read_u64(&live_accesses->allocs);
+			buffer_accesses->extends += pg_atomic_read_u64(&live_accesses->extends);
+			buffer_accesses->fsyncs += pg_atomic_read_u64(&live_accesses->fsyncs);
+			buffer_accesses->writes += pg_atomic_read_u64(&live_accesses->writes);
+			buffer_accesses++;
+			live_accesses++;
+		}
+	}
+}
+
 /* --------
  * pgstat_report_query_id() -
  *
@@ -1046,6 +1087,12 @@ pgstat_get_my_query_id(void)
 }
 
 
+PgBackendStatus *
+pgstat_fetch_backend_statuses(void)
+{
+	return BackendStatusArray;
+}
+
 /* ----------
  * pgstat_fetch_stat_beentry() -
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ff5aedc99c..477caf2536 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1796,6 +1796,142 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
 }
 
+Datum
+pg_stat_get_buffers_accesses(PG_FUNCTION_ARGS)
+{
+#define NROWS ((BACKEND_NUM_TYPES - 1) * BUFFER_NUM_TYPES)
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	PgStat_BackendAccesses *backend_accesses;
+	int buffer_type;
+	int backend_type;
+	Datum reset_time;
+	int i;
+	PgBackendStatus *beentry;
+
+	enum {
+		COLUMN_BACKEND_TYPE,
+		COLUMN_BUFFER_TYPE,
+		COLUMN_ALLOCS,
+		COLUMN_EXTENDS,
+		COLUMN_FSYNCS,
+		COLUMN_WRITES,
+		COLUMN_RESET_TIME,
+		COLUMN_LENGTH,
+	};
+
+	Datum all_values[NROWS][COLUMN_LENGTH];
+	bool all_nulls[NROWS][COLUMN_LENGTH];
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	memset(all_values, 0, sizeof(all_values));
+	memset(all_nulls, 0, sizeof(all_nulls));
+
+	/*
+	 * Loop through all live backends and count their buffer accesses for each
+	 * buffer type
+	 */
+	beentry = pgstat_fetch_backend_statuses();
+
+	for (i = 0; i < MaxBackends + NUM_AUXPROCTYPES; i++, beentry++)
+	{
+		PgBufferAccesses *buffer_accesses;
+
+		/* Don't count dead backends. They should already be counted */
+		if (beentry->st_procpid == 0)
+			continue;
+
+		buffer_accesses = beentry->buffer_access_stats;
+
+		for (buffer_type = 0; buffer_type < BUFFER_NUM_TYPES; buffer_type++)
+		{
+			int rownum = (beentry->st_backendType - 1) * BUFFER_NUM_TYPES + buffer_type;
+			Datum *values = all_values[rownum];
+
+			/*
+			 * COLUMN_RESET_TIME, COLUMN_BACKEND_TYPE, and COLUMN_BUFFER_TYPE
+			 * will all be set when looping through exited backends array
+			 */
+			values[COLUMN_ALLOCS] += pg_atomic_read_u64(&buffer_accesses->allocs);
+			values[COLUMN_EXTENDS] += pg_atomic_read_u64(&buffer_accesses->extends);
+			values[COLUMN_FSYNCS] += pg_atomic_read_u64(&buffer_accesses->fsyncs);
+			values[COLUMN_WRITES] += pg_atomic_read_u64(&buffer_accesses->writes);
+			buffer_accesses++;
+		}
+	}
+
+	/* Add stats from all exited backends */
+	backend_accesses = pgstat_fetch_exited_backend_buffers();
+
+	reset_time = TimestampTzGetDatum(backend_accesses->stat_reset_timestamp);
+
+	/* 0 is not a valid BackendType */
+	for (backend_type = 1; backend_type < BACKEND_NUM_TYPES; backend_type++)
+	{
+		PgStatBufferAccesses *buffer_accesses = backend_accesses->accesses[backend_type].buffer_type_accesses;
+		PgStatBufferAccesses *resets = backend_accesses->resets[backend_type].buffer_type_accesses;
+
+		Datum backend_type_desc = CStringGetTextDatum(GetBackendTypeDesc(backend_type));
+
+		for (buffer_type = 0; buffer_type < BUFFER_NUM_TYPES; buffer_type++)
+		{
+			/*
+			 * Subtract 1 from backend_type to avoid having rows for B_INVALID
+			 * BackendType
+			 */
+			Datum *values = all_values[(backend_type - 1) * BUFFER_NUM_TYPES + buffer_type];
+
+			values[COLUMN_BACKEND_TYPE] = backend_type_desc;
+			values[COLUMN_BUFFER_TYPE] = CStringGetTextDatum(GetBufferTypeDesc(buffer_type));
+			values[COLUMN_ALLOCS] = values[COLUMN_ALLOCS] + buffer_accesses->allocs - resets->allocs;
+			values[COLUMN_EXTENDS] = values[COLUMN_EXTENDS] + buffer_accesses->extends - resets->extends;
+			values[COLUMN_FSYNCS] = values[COLUMN_FSYNCS] + buffer_accesses->fsyncs - resets->fsyncs;
+			values[COLUMN_WRITES] = values[COLUMN_WRITES] + buffer_accesses->writes - resets->writes;
+			values[COLUMN_RESET_TIME] = reset_time;
+			buffer_accesses++;
+			resets++;
+		}
+	}
+
+	for (i = 0; i < NROWS; i++)
+	{
+		Datum *values = all_values[i];
+		bool *nulls = all_nulls[i];
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	/* clean up and return the tuplestore */
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 88801374b5..50b50d00ce 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -299,6 +299,31 @@ GetBackendTypeDesc(BackendType backendType)
 	return backendDesc;
 }
 
+const char *
+GetBufferTypeDesc(BufferType bufferType)
+{
+	const char *bufferDesc = "unknown buffer type";
+
+	switch (bufferType)
+	{
+		case Buf_Direct:
+			bufferDesc = "direct";
+			break;
+		case Buf_Local:
+			bufferDesc = "local";
+			break;
+		case Buf_Shared:
+			bufferDesc = "shared";
+			break;
+		case Buf_Strategy:
+			bufferDesc = "strategy";
+			break;
+	}
+
+	return bufferDesc;
+}
+
+
 /* ----------------------------------------------------------------
  *				database path / name support stuff
  * ----------------------------------------------------------------
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d068d6532e..54661e2b5f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5642,6 +5642,15 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '8459', descr => 'statistics: counts of various types of accesses of buffers done by each backend type',
+  proname => 'pg_stat_get_buffers_accesses', provolatile => 's', proisstrict => 'f',
+  prorows => '52', proretset => 't',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,text,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,buffer_type,alloc,extend,fsync,write,stats_reset}',
+  prosrc => 'pg_stat_get_buffers_accesses' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 90a3016065..266d32835c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -338,9 +338,32 @@ typedef enum BackendType
 	B_LOGGER,
 } BackendType;
 
+#define BACKEND_NUM_TYPES (B_LOGGER + 1)
+
+typedef enum BufferAccessType
+{
+	BA_Alloc,
+	BA_Extend,
+	BA_Fsync,
+	BA_Write,
+} BufferAccessType;
+
+#define BUFFER_ACCESS_NUM_TYPES (BA_Write + 1)
+
+typedef enum BufferType
+{
+	Buf_Direct,
+	Buf_Local,
+	Buf_Shared,
+	Buf_Strategy,
+} BufferType;
+
+#define BUFFER_NUM_TYPES (Buf_Strategy + 1)
+
 extern BackendType MyBackendType;
 
 extern const char *GetBackendTypeDesc(BackendType backendType);
+extern const char *GetBufferTypeDesc(BufferType bufferType);
 
 extern void SetDatabasePath(const char *path);
 extern void checkDataDir(void);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index bcd3588ea2..b73265ab13 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -72,6 +72,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
 	PGSTAT_MTYPE_CHECKPOINTER,
+	PGSTAT_MTYPE_BUFFER_ACTIONS,
 	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
@@ -138,6 +139,7 @@ typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
 	RESET_BGWRITER,
+	RESET_BUFFERS,
 	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
@@ -224,7 +226,9 @@ typedef struct PgStat_MsgHdr
  * platforms, but we're being conservative here.)
  * ----------
  */
-#define PGSTAT_MAX_MSG_SIZE 1000
+/* TODO: how sketchy is this? What can I do instead? The array of counters */
+/* for the reset message is 2kB, I think. */
+#define PGSTAT_MAX_MSG_SIZE 3000
 #define PGSTAT_MSG_PAYLOAD	(PGSTAT_MAX_MSG_SIZE - sizeof(PgStat_MsgHdr))
 
 
@@ -331,6 +335,30 @@ typedef struct PgStat_MsgDropdb
 } PgStat_MsgDropdb;
 
 
+/* TODO: add comment */
+typedef struct PgStatBufferAccesses
+{
+	PgStat_Counter allocs;
+	PgStat_Counter extends;
+	PgStat_Counter fsyncs;
+	PgStat_Counter writes;
+} PgStatBufferAccesses;
+
+typedef struct PgStat_MsgBufferTypeAccesses
+{
+	PgStat_MsgHdr m_hdr;
+
+	BackendType backend_type;
+	PgStatBufferAccesses buffer_type_accesses[BUFFER_NUM_TYPES];
+} PgStat_MsgBufferTypeAccesses;
+
+typedef struct PgStat_BackendAccesses
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_MsgBufferTypeAccesses accesses[BACKEND_NUM_TYPES];
+	PgStat_MsgBufferTypeAccesses resets[BACKEND_NUM_TYPES];
+} PgStat_BackendAccesses;
+
 /* ----------
  * PgStat_MsgResetcounter		Sent by the backend to tell the collector
  *								to reset counters
@@ -351,6 +379,7 @@ typedef struct PgStat_MsgResetsharedcounter
 {
 	PgStat_MsgHdr m_hdr;
 	PgStat_Shared_Reset_Target m_resettarget;
+	PgStat_MsgBufferTypeAccesses resets[BACKEND_NUM_TYPES];
 } PgStat_MsgResetsharedcounter;
 
 /* ----------
@@ -703,6 +732,7 @@ typedef union PgStat_Msg
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
 	PgStat_MsgCheckpointer msg_checkpointer;
+	PgStat_MsgBufferTypeAccesses msg_buffer_accesses;
 	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
@@ -879,6 +909,7 @@ typedef struct PgStat_GlobalStats
 
 	PgStat_CheckpointerStats checkpointer;
 	PgStat_BgWriterStats bgwriter;
+	PgStat_BackendAccesses buffers;
 } PgStat_GlobalStats;
 
 /*
@@ -1118,6 +1149,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_buffers(void);
 extern void pgstat_send_checkpointer(void);
 extern void pgstat_send_wal(bool force);
 
@@ -1126,6 +1158,7 @@ extern void pgstat_send_wal(bool force);
  * generate the pgstat* views.
  * ----------
  */
+extern PgStat_BackendAccesses *pgstat_fetch_exited_backend_buffers(void);
 extern PgStat_StatDBEntry *pgstat_fetch_stat_dbentry(Oid dbid);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
 extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 33fcaf5c9a..7e385135db 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -310,10 +310,10 @@ extern void ScheduleBufferTagForWriteback(WritebackContext *context, BufferTag *
 
 /* freelist.c */
 extern BufferDesc *StrategyGetBuffer(BufferAccessStrategy strategy,
-									 uint32 *buf_state);
+									 uint32 *buf_state, bool *from_ring);
 extern void StrategyFreeBuffer(BufferDesc *buf);
 extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
-								 BufferDesc *buf);
+								 BufferDesc *buf, bool *from_ring);
 
 extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern void StrategyNotifyBgWriter(int bgwprocno);
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 8042b817df..0364779978 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -13,6 +13,7 @@
 #include "datatype/timestamp.h"
 #include "libpq/pqcomm.h"
 #include "miscadmin.h"			/* for BackendType */
+#include "port/atomics.h"
 #include "utils/backend_progress.h"
 
 
@@ -37,6 +38,15 @@ typedef enum BackendState
  * ----------
  */
 
+/* TODO: add a comment */
+typedef struct PgBufferAccesses
+{
+	pg_atomic_uint64 allocs;
+	pg_atomic_uint64 extends;
+	pg_atomic_uint64 fsyncs;
+	pg_atomic_uint64 writes;
+} PgBufferAccesses;
+
 /*
  * PgBackendSSLStatus
  *
@@ -168,6 +178,7 @@ typedef struct PgBackendStatus
 
 	/* query identifier, optionally computed using post_parse_analyze_hook */
 	uint64		st_query_id;
+	PgBufferAccesses buffer_access_stats[BUFFER_NUM_TYPES];
 } PgBackendStatus;
 
 
@@ -296,7 +307,39 @@ extern void pgstat_bestart(void);
 extern void pgstat_clear_backend_activity_snapshot(void);
 
 /* Activity reporting functions */
+typedef struct PgStat_MsgBufferTypeAccesses PgStat_MsgBufferTypeAccesses;
+
+static inline void
+pgstat_increment_buffer_access_type(BufferAccessType ba_type, BufferType buf_type)
+{
+	PgBufferAccesses *accesses;
+	PgBackendStatus *beentry = MyBEEntry;
+
+	Assert(beentry);
+
+	accesses = &beentry->buffer_access_stats[buf_type];
+	switch (ba_type)
+	{
+		case BA_Alloc:
+			pg_atomic_write_u64(&accesses->allocs,
+					pg_atomic_read_u64(&accesses->allocs) + 1);
+			break;
+		case BA_Extend:
+			pg_atomic_write_u64(&accesses->extends,
+					pg_atomic_read_u64(&accesses->extends) + 1);
+			break;
+		case BA_Fsync:
+			pg_atomic_write_u64(&accesses->fsyncs,
+					pg_atomic_read_u64(&accesses->fsyncs) + 1);
+			break;
+		case BA_Write:
+			pg_atomic_write_u64(&accesses->writes,
+					pg_atomic_read_u64(&accesses->writes) + 1);
+			break;
+	}
+}
 extern void pgstat_report_activity(BackendState state, const char *cmd_str);
+extern void pgstat_report_live_backend_accesses(PgStat_MsgBufferTypeAccesses *backend_accesses);
 extern void pgstat_report_query_id(uint64 query_id, bool force);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
@@ -312,6 +355,7 @@ extern uint64 pgstat_get_my_query_id(void);
  * generate the pgstat* views.
  * ----------
  */
+extern PgBackendStatus *pgstat_fetch_backend_statuses(void);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
 extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2fa00a3c29..9172b0fcd2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1828,6 +1828,14 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
     pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
     pg_stat_get_buf_alloc() AS buffers_alloc,
     pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+pg_stat_buffers| SELECT b.backend_type,
+    b.buffer_type,
+    b.alloc,
+    b.extend,
+    b.fsync,
+    b.write,
+    b.stats_reset
+   FROM pg_stat_get_buffers_accesses() b(backend_type, buffer_type, alloc, extend, fsync, write, stats_reset);
 pg_stat_database| SELECT d.oid AS datid,
     d.datname,
         CASE
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index feaaee6326..4ad672b35a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -176,4 +176,8 @@ FROM prevstats AS pr;
 
 DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
 DROP TABLE prevstats;
+SELECT * FROM pg_stat_buffers;
+SELECT pg_stat_reset_shared('buffers');
+SELECT pg_sleep(2);
+SELECT * FROM pg_stat_buffers;
 -- End of Stats Test
-- 
2.27.0
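
For reviewers, the reset bookkeeping in the patch (the view reports live
counters plus exited-backend counters minus the snapshot taken at reset) can be
pictured with a minimal standalone sketch. This is an illustration only, not
code from the patch: it collapses the per-backend-type, per-buffer-type arrays
into a single counter, and DemoCounters, demo_visible, demo_reset, and
demo_backend_exit are hypothetical names that do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of one pg_stat_buffers counter.
 * None of these names exist in the patch.
 */
typedef struct DemoCounters
{
	uint64_t	live;		/* sum over live backends (PgBackendStatus) */
	uint64_t	exited;		/* total persisted by the stats collector */
	uint64_t	resets;		/* live total captured at the last reset */
} DemoCounters;

/* Value the view would report: live + exited - resets. */
static uint64_t
demo_visible(const DemoCounters *c)
{
	return c->live + c->exited - c->resets;
}

/* Reset: zero the persisted total and remember what live backends hold. */
static void
demo_reset(DemoCounters *c)
{
	c->exited = 0;
	c->resets = c->live;
}

/* Backend exit: its counters move from the live total to the persisted one. */
static void
demo_backend_exit(DemoCounters *c, uint64_t backend_count)
{
	assert(backend_count <= c->live);
	c->live -= backend_count;
	c->exited += backend_count;
}
```

With two live backends holding 10 and 20 accesses, a reset leaves the visible
value at 0. Five further accesses show up as 5, and the value stays at 5 after
the backend holding 20 exits, because its pre-reset accesses are cancelled by
the stored snapshot.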
