On Fri, Oct 8, 2021 at 1:56 PM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2021-10-01 16:05:31 -0400, Melanie Plageman wrote:
> > From 40c809ad1127322f3462e85be080c10534485f0d Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplage...@gmail.com>
> > Date: Fri, 24 Sep 2021 17:39:12 -0400
> > Subject: [PATCH v13 1/4] Allow bootstrap process to beinit
> >
> > ---
> >  src/backend/utils/init/postinit.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> > index 78bc64671e..fba5864172 100644
> > --- a/src/backend/utils/init/postinit.c
> > +++ b/src/backend/utils/init/postinit.c
> > @@ -670,8 +670,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
> >       EnablePortalManager();
> >
> >       /* Initialize status reporting */
> > -     if (!bootstrap)
> > -             pgstat_beinit();
> > +     pgstat_beinit();
> >
> >       /*
> >       * Load relcache entries for the shared system catalogs.  This must create
> > --
> > 2.27.0
> >
>
> I think it's good to remove more and more of these !bootstrap cases - they
> really make it harder to understand the state of the system at various
> points. Optimizing for the rarely executed bootstrap mode at the cost of
> checks in very common codepaths...

What scope do you suggest for this patch set? A single patch which does
this in more locations (remove !bootstrap) or should I remove this patch
from the patchset?

>
>
>
> > From a709ddb30b2b747beb214f0b13cd1e1816094e6b Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplage...@gmail.com>
> > Date: Thu, 30 Sep 2021 16:16:22 -0400
> > Subject: [PATCH v13 2/4] Add utility to make tuplestores for pg stat views
> >
> > Most of the steps to make a tuplestore for those pg_stat views requiring
> > one are the same. Consolidate them into a single helper function for
> > clarity and to avoid bugs.
> > ---
> >  src/backend/utils/adt/pgstatfuncs.c | 129 ++++++++++------------------
> >  1 file changed, 44 insertions(+), 85 deletions(-)
> >
> > diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
> > index ff5aedc99c..513f5aecf6 100644
> > --- a/src/backend/utils/adt/pgstatfuncs.c
> > +++ b/src/backend/utils/adt/pgstatfuncs.c
> > @@ -36,6 +36,42 @@
> >
> >  #define HAS_PGSTAT_PERMISSIONS(role)  (is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role))
> >
> > +/*
> > + * Helper function for views with multiple rows constructed from a tuplestore
> > + */
> > +static Tuplestorestate *
> > +pg_stat_make_tuplestore(FunctionCallInfo fcinfo, TupleDesc *tupdesc)
> > +{
> > +     Tuplestorestate *tupstore;
> > +     ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > +     MemoryContext per_query_ctx;
> > +     MemoryContext oldcontext;
> > +
> > +     /* check to see if caller supports us returning a tuplestore */
> > +     if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
> > +             ereport(ERROR,
> > +                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +                              errmsg("set-valued function called in context that cannot accept a set")));
> > +     if (!(rsinfo->allowedModes & SFRM_Materialize))
> > +             ereport(ERROR,
> > +                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +                              errmsg("materialize mode required, but it is not allowed in this context")));
> > +
> > +     /* Build a tuple descriptor for our result type */
> > +     if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE)
> > +             elog(ERROR, "return type must be a row type");
> > +
> > +     per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
> > +     oldcontext = MemoryContextSwitchTo(per_query_ctx);
> > +
> > +     tupstore = tuplestore_begin_heap(true, false, work_mem);
> > +     rsinfo->returnMode = SFRM_Materialize;
> > +     rsinfo->setResult = tupstore;
> > +     rsinfo->setDesc = *tupdesc;
> > +     MemoryContextSwitchTo(oldcontext);
> > +     return tupstore;
> > +}
>
> Is pgstattuple the best place for this helper? It's not really pgstatfuncs
> specific...
>
> It also looks vaguely familiar - I wonder if we have a helper roughly like
> this somewhere else already...
>

I don't see an existing function which is specifically a utility for making a
tuplestore. Looking at the callers of tuplestore_begin_heap(), I notice code
very similar to the function I added in pg_tablespace_databases() in
utils/adt/misc.c, pg_stop_backup_v2() in xlogfuncs.c,
pg_event_trigger_dropped_objects() and pg_event_trigger_ddl_commands() in
event_trigger.c, pg_available_extensions() in extension.c, etc.

Do you think it makes sense to refactor this code out of all of these
places? If so, where would such a utility function belong?

>
>
> > From e9a5d2a021d429fdbb2daa58ab9d75a069f334d4 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplage...@gmail.com>
> > Date: Wed, 29 Sep 2021 15:39:45 -0400
> > Subject: [PATCH v13 3/4] Add system view tracking IO ops per backend type
> >
>
> > diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> > index be7366379d..0d18e7f71a 100644
> > --- a/src/backend/postmaster/checkpointer.c
> > +++ b/src/backend/postmaster/checkpointer.c
> > @@ -1104,6 +1104,7 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
> >                */
> >               if (!AmBackgroundWriterProcess())
> >                       CheckpointerShmem->num_backend_fsync++;
> > +             pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED);
> >               LWLockRelease(CheckpointerCommLock);
> >               return false;
> >       }
>
> ISTM this doesn't need to happen while holding CheckpointerCommLock?
>

Fixed in the attached updates. Note that I've only attached a diff against my previous version.

>
>
> > @@ -1461,7 +1467,25 @@ pgstat_reset_shared_counters(const char *target)
> >                                errhint("Target must be \"archiver\", \"bgwriter\", or \"wal\".")));
> >
> >       pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
> > -     pgstat_send(&msg, sizeof(msg));
> > +
> > +     if (msg.m_resettarget == RESET_BUFFERS)
> > +     {
> > +             int                     backend_type;
> > +             PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> > +
> > +             memset(ops, 0, sizeof(ops));
> > +             pgstat_report_live_backend_io_path_ops(ops);
> > +
> > +             for (backend_type = 1; backend_type < BACKEND_NUM_TYPES; backend_type++)
> > +             {
> > +                     msg.m_backend_resets.backend_type = backend_type;
> > +                     memcpy(&msg.m_backend_resets.iop, &ops[backend_type], sizeof(msg.m_backend_resets.iop));
> > +                     pgstat_send(&msg, sizeof(msg));
> > +             }
> > +     }
> > +     else
> > +             pgstat_send(&msg, sizeof(msg));
> > +
> >  }
>
> I'd perhaps put this in a small helper function.
>

Done.

>
> >  /* ----------
> >   * pgstat_fetch_stat_dbentry() -
> > @@ -2999,6 +3036,14 @@ pgstat_shutdown_hook(int code, Datum arg)
> >  {
> >       Assert(!pgstat_is_shutdown);
> >
> > +     /*
> > +      * Only need to send stats on IO Ops for IO Paths when a process exits, as
> > +      * pg_stat_get_buffers() will read from live backends' PgBackendStatus and
> > +      * then sum this with totals from exited backends persisted by the stats
> > +      * collector.
> > +      */
> > +     pgstat_send_buffers();
> > +
> >       /*
> >        * If we got as far as discovering our own database ID, we can report what
> >        * we did to the collector.  Otherwise, we'd be sending an invalid
> > @@ -3092,6 +3137,30 @@ pgstat_send(void *msg, int len)
> >  #endif
> >  }
>
> I think it might be nicer to move pgstat_beshutdown_hook() to be a
> before_shmem_exit(), and do this in there.
>

I'm not really sure of the correct way to do this. A cursory attempt failed
because ShutdownXLOG() is also registered as a before_shmem_exit() callback
and ends up being called after pgstat_beshutdown_hook().
pgstat_beshutdown_hook() zeroes out PgBackendStatus, ShutdownXLOG() initiates
a checkpoint, and during a checkpoint the checkpointer increments the IO op
counters for writes in its PgBackendStatus.

>
> > +/*
> > + * Add live IO Op stats for all IO Paths (e.g. shared, local) to those in the
> > + * equivalent stats structure for exited backends. Note that this adds and
> > + * doesn't set, so the destination stats structure should be zeroed out by the
> > + * caller initially. This would commonly be used to transfer all IO Op stats
> > + * for all IO Paths for a particular backend type to the pgstats structure.
> > + */
>
> This seems a bit odd. Why not zero it in here? Perhaps it also should be
> called something like _sum_ instead of _add_?
>

I wanted to be able to use the function both when setting the values and when
adding to existing values (the two current callers do one of each). I have
renamed it from add to sum.

>
> > +void
> > +pgstat_add_io_path_ops(PgStatIOOps *dest, IOOps *src, int io_path_num_types)
> > +{
>
> Why is io_path_num_types a parameter?
>

I imagined that another caller might want to add only some IO Path types while
still using the function, but I think the parameter is more confusing than
useful, so I've removed it.

>
> > +static void
> > +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> > +{
> > +     int                     io_path;
> > +     PgStatIOOps *src_io_path_ops = msg->iop.io_path_ops;
> > +     PgStatIOOps *dest_io_path_ops =
> > +     globalStats.buffers.ops[msg->backend_type].io_path_ops;
> > +
> > +     for (io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> > +     {
> > +             PgStatIOOps *src = &src_io_path_ops[io_path];
> > +             PgStatIOOps *dest = &dest_io_path_ops[io_path];
> > +
> > +             dest->allocs += src->allocs;
> > +             dest->extends += src->extends;
> > +             dest->fsyncs += src->fsyncs;
> > +             dest->writes += src->writes;
> > +     }
> > +}
>
> Could this, with a bit of finessing, use pgstat_add_io_path_ops()?
>

I didn't really see a good way to do this: pgstat_add_io_path_ops() adds IOOps
members to PgStatIOOps members, which requires pg_atomic_read_u64(), while
pgstat_recv_io_path_ops() adds PgStatIOOps members to PgStatIOOps members,
which doesn't. Maybe I could pass a flag which, based on the source type,
either does or doesn't use pg_atomic_read_u64() to access the values, but that
seems worse to me.

>
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
>
> What about writes originating in like FlushRelationBuffers()?
>

Yes, I have made IOPath a parameter to FlushBuffer() so that it can
distinguish strategy buffer writes from shared buffer writes, and I have
pushed pgstat_inc_ioop() down into FlushBuffer().

>
> >  bool
> > -StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
> > +StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *from_ring)
> >  {
> > +     /*
> > +      * If we decide to use the dirty buffer selected by StrategyGetBuffer(),
> > +      * then ensure that we count it as such in pg_stat_buffers view.
> > +      */
> > +     *from_ring = true;
> > +
>
> Absolutely minor nitpick: Somehow it feels off to talk about the view here.

Fixed.

>
>
> > +PgBackendStatus *
> > +pgstat_fetch_backend_statuses(void)
> > +{
> > +     return BackendStatusArray;
> > +}
>
> Hm, not sure this adds much?

Is there a better way to access the whole BackendStatusArray from within
pgstatfuncs.c?

>
>
> > +                     /*
> > +                      * Subtract 1 from backend_type to avoid having rows for B_INVALID
> > +                      * BackendType
> > +                      */
> > +                     int                     rownum = (beentry->st_backendType - 1) * IOPATH_NUM_TYPES + io_path;
>
>
> Perhaps worth wrapping this in a macro or inline function? It's repeated and 
> nontrivial.
>

Done.

>
> > +     /* Add stats from all exited backends */
> > +     backend_io_path_ops = pgstat_fetch_exited_backend_buffers();
>
> It's probably *not* worth it, but I do wonder if we should do the addition on 
> the SQL
> level, and actually have two functions, one returning data for exited
> backends, and one for currently connected ones.
>

It would be easy enough to implement; I would defer to others on whether or
not it would be useful. My use case for pg_stat_buffers is to see what IO
backends did during a benchmark or test workload. For that, I reset the stats
beforehand and query pg_stat_buffers after running the benchmark, so I don't
know that I would use the exited and live stats individually. In a real
workload, I could see querying the live and exited stats separately to check
whether a workload that causes lots of backends to do their own writes is
still ongoing, though a given workload may be composed of lots of different
queries, with backends exiting throughout.

>
> > +static inline void
> > +pgstat_inc_ioop(IOOp io_op, IOPath io_path)
> > +{
> > +     IOOps      *io_ops;
> > +     PgBackendStatus *beentry = MyBEEntry;
> > +
> > +     Assert(beentry);
> > +
> > +     io_ops = &beentry->io_path_stats[io_path];
> > +     switch (io_op)
> > +     {
> > +             case IOOP_ALLOC:
> > +                     pg_atomic_write_u64(&io_ops->allocs,
> > +                                         pg_atomic_read_u64(&io_ops->allocs) + 1);
> > +                     break;
> > +             case IOOP_EXTEND:
> > +                     pg_atomic_write_u64(&io_ops->extends,
> > +                                         pg_atomic_read_u64(&io_ops->extends) + 1);
> > +                     break;
> > +             case IOOP_FSYNC:
> > +                     pg_atomic_write_u64(&io_ops->fsyncs,
> > +                                         pg_atomic_read_u64(&io_ops->fsyncs) + 1);
> > +                     break;
> > +             case IOOP_WRITE:
> > +                     pg_atomic_write_u64(&io_ops->writes,
> > +                                         pg_atomic_read_u64(&io_ops->writes) + 1);
> > +                     break;
> > +     }
> > +}
>
> IIRC Thomas Munro had a patch adding a nonatomic_add or such
> somewhere. Perhaps in the recovery readahead thread? Might be worth using
> instead?
>

I've added Thomas' function in a separate commit. I looked for a better
place to add it (I was thinking somewhere in src/backend/utils/misc) but
couldn't find anywhere that made sense.

I also added a call to pgstat_inc_ioop() in ProcessSyncRequests() to capture
when the checkpointer does fsyncs.

I also added pgstat_inc_ioop() calls to callers of smgrwrite() flushing local
buffers. I don't know if that is desirable or not in this patch. They could be
removed if wrappers for smgrwrite() go in and pgstat_inc_ioop() can be called
from within those wrappers.

- Melanie
From 977cd26ee8b489772b99de46330c9492a1839b6d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 11 Oct 2021 13:43:50 -0400
Subject: [PATCH 2/2] updates

---
 src/backend/postmaster/checkpointer.c       |   2 +-
 src/backend/postmaster/pgstat.c             | 104 +++++++++++---------
 src/backend/storage/buffer/bufmgr.c         |  25 +++--
 src/backend/storage/buffer/freelist.c       |   6 +-
 src/backend/storage/buffer/localbuf.c       |   3 +
 src/backend/storage/sync/sync.c             |   1 +
 src/backend/utils/activity/backend_status.c |   5 +-
 src/backend/utils/adt/pgstatfuncs.c         |  61 ++++++------
 src/include/pgstat.h                        |   3 +-
 src/include/utils/backend_status.h          |  12 +--
 10 files changed, 119 insertions(+), 103 deletions(-)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 8f2ef63ee5..dec325e40e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -1083,12 +1083,12 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
 		(CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
 		 !CompactCheckpointerRequestQueue()))
 	{
+		LWLockRelease(CheckpointerCommLock);
 		/*
 		 * Count the subset of writes where backends have to do their own
 		 * fsync
 		 */
 		pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED);
-		LWLockRelease(CheckpointerCommLock);
 		return false;
 	}
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index fbec722a1f..e0762444af 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1435,6 +1435,28 @@ pgstat_reset_counters(void)
 	pgstat_send(&msg, sizeof(msg));
 }
 
+/*
+ * Helper function to collect and send live backends' current IO operations
+ * stats counters when a stats reset is initiated so that they may be deducted
+ * from future totals.
+ */
+static void
+pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg)
+{
+		int			backend_type;
+		PgStatIOPathOps ops[BACKEND_NUM_TYPES];
+
+		memset(ops, 0, sizeof(ops));
+		pgstat_report_live_backend_io_path_ops(ops);
+
+		for (backend_type = 1; backend_type < BACKEND_NUM_TYPES; backend_type++)
+		{
+			msg->m_backend_resets.backend_type = backend_type;
+			memcpy(&msg->m_backend_resets.iop, &ops[backend_type], sizeof(msg->m_backend_resets.iop));
+			pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter));
+		}
+}
+
 /* ----------
  * pgstat_reset_shared_counters() -
  *
@@ -1452,12 +1474,17 @@ pgstat_reset_shared_counters(const char *target)
 	if (pgStatSock == PGINVALID_SOCKET)
 		return;
 
-	if (strcmp(target, "archiver") == 0)
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
+	if (strcmp(target, "buffers") == 0)
+	{
+		msg.m_resettarget = RESET_BUFFERS;
+		pgstat_send_buffers_reset(&msg);
+		return;
+	}
+	else if (strcmp(target, "archiver") == 0)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
-	else if (strcmp(target, "buffers") == 0)
-		msg.m_resettarget = RESET_BUFFERS;
 	else if (strcmp(target, "wal") == 0)
 		msg.m_resettarget = RESET_WAL;
 	else
@@ -1466,25 +1493,8 @@ pgstat_reset_shared_counters(const char *target)
 				 errmsg("unrecognized reset target: \"%s\"", target),
 				 errhint("Target must be \"archiver\", \"bgwriter\", or \"wal\".")));
 
-	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
-
-	if (msg.m_resettarget == RESET_BUFFERS)
-	{
-		int			backend_type;
-		PgStatIOPathOps ops[BACKEND_NUM_TYPES];
 
-		memset(ops, 0, sizeof(ops));
-		pgstat_report_live_backend_io_path_ops(ops);
-
-		for (backend_type = 1; backend_type < BACKEND_NUM_TYPES; backend_type++)
-		{
-			msg.m_backend_resets.backend_type = backend_type;
-			memcpy(&msg.m_backend_resets.iop, &ops[backend_type], sizeof(msg.m_backend_resets.iop));
-			pgstat_send(&msg, sizeof(msg));
-		}
-	}
-	else
-		pgstat_send(&msg, sizeof(msg));
+	pgstat_send(&msg, sizeof(msg));
 
 }
 
@@ -3137,30 +3147,6 @@ pgstat_send(void *msg, int len)
 #endif
 }
 
-/*
- * Add live IO Op stats for all IO Paths (e.g. shared, local) to those in the
- * equivalent stats structure for exited backends. Note that this adds and
- * doesn't set, so the destination stats structure should be zeroed out by the
- * caller initially. This would commonly be used to transfer all IO Op stats
- * for all IO Paths for a particular backend type to the pgstats structure.
- */
-void
-pgstat_add_io_path_ops(PgStatIOOps *dest, IOOps *src, int io_path_num_types)
-{
-	int			io_path;
-
-	for (io_path = 0; io_path < io_path_num_types; io_path++)
-	{
-		dest->allocs += pg_atomic_read_u64(&src->allocs);
-		dest->extends += pg_atomic_read_u64(&src->extends);
-		dest->fsyncs += pg_atomic_read_u64(&src->fsyncs);
-		dest->writes += pg_atomic_read_u64(&src->writes);
-		dest++;
-		src++;
-	}
-
-}
-
 /* ----------
  * pgstat_send_archiver() -
  *
@@ -3234,9 +3220,8 @@ pgstat_send_buffers(void)
 	memset(&msg, 0, sizeof(msg));
 	msg.backend_type = beentry->st_backendType;
 
-	pgstat_add_io_path_ops(msg.iop.io_path_ops,
-						   (IOOps *) &beentry->io_path_stats,
-						   IOPATH_NUM_TYPES);
+	pgstat_sum_io_path_ops(msg.iop.io_path_ops,
+						   (IOOps *) &beentry->io_path_stats);
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS);
 	pgstat_send(&msg, sizeof(msg));
@@ -3407,6 +3392,29 @@ pgstat_send_slru(void)
 	}
 }
 
+/*
+ * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared,
+ * local) to those in the equivalent stats structure for exited backends. Note
+ * that this adds and doesn't set, so the destination stats structure should be
+ * zeroed out by the caller initially. This would commonly be used to transfer
+ * all IO Op stats for all IO Paths for a particular backend type to the
+ * pgstats structure.
+ */
+void
+pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src)
+{
+	int			io_path;
+	for (io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
+	{
+		dest->allocs += pg_atomic_read_u64(&src->allocs);
+		dest->extends += pg_atomic_read_u64(&src->extends);
+		dest->fsyncs += pg_atomic_read_u64(&src->fsyncs);
+		dest->writes += pg_atomic_read_u64(&src->writes);
+		dest++;
+		src++;
+	}
+
+}
 
 /* ----------
  * PgstatCollectorMain() -
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index b911dd9ce5..537c8dcadc 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -480,7 +480,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr,
 							   BlockNumber blockNum,
 							   BufferAccessStrategy strategy,
 							   bool *foundPtr);
-static void FlushBuffer(BufferDesc *buf, SMgrRelation reln);
+static void FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOPath iopath);
 static void FindAndDropRelFileNodeBuffers(RelFileNode rnode,
 										  ForkNumber forkNum,
 										  BlockNumber nForkBlock,
@@ -1262,7 +1262,6 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 				 */
 
 				iopath = from_ring ? IOPATH_STRATEGY : IOPATH_SHARED;
-				pgstat_inc_ioop(IOOP_WRITE, iopath);
 
 				/* OK, do the I/O */
 				TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum,
@@ -1270,7 +1269,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 														  smgr->smgr_rnode.node.dbNode,
 														  smgr->smgr_rnode.node.relNode);
 
-				FlushBuffer(buf, NULL);
+				FlushBuffer(buf, NULL, iopath);
 				LWLockRelease(BufferDescriptorGetContentLock(buf));
 
 				ScheduleBufferTagForWriteback(&BackendWritebackContext,
@@ -2566,11 +2565,10 @@ SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
 	 * buffer is clean by the time we've locked it.)
 	 */
 
-	pgstat_inc_ioop(IOOP_WRITE, IOPATH_SHARED);
 	PinBuffer_Locked(bufHdr);
 	LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOPATH_SHARED);
 
 	LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 
@@ -2818,9 +2816,12 @@ BufferGetTag(Buffer buffer, RelFileNode *rnode, ForkNumber *forknum,
  *
  * If the caller has an smgr reference for the buffer's relation, pass it
  * as the second parameter.  If not, pass NULL.
+ *
+ * IOPath will always be IOPATH_SHARED except when a buffer access strategy is
+ * used and the buffer being flushed is a buffer from the strategy ring.
  */
 static void
-FlushBuffer(BufferDesc *buf, SMgrRelation reln)
+FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOPath iopath)
 {
 	XLogRecPtr	recptr;
 	ErrorContextCallback errcallback;
@@ -2912,6 +2913,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 			  bufToWrite,
 			  false);
 
+	pgstat_inc_ioop(IOOP_WRITE, iopath);
+
 	if (track_io_timing)
 	{
 		INSTR_TIME_SET_CURRENT(io_time);
@@ -3559,6 +3562,8 @@ FlushRelationBuffers(Relation rel)
 						  localpage,
 						  false);
 
+				pgstat_inc_ioop(IOOP_WRITE, IOPATH_LOCAL);
+
 				buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
 
@@ -3594,7 +3599,7 @@ FlushRelationBuffers(Relation rel)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, RelationGetSmgr(rel));
+			FlushBuffer(bufHdr, RelationGetSmgr(rel), IOPATH_SHARED);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr, true);
 		}
@@ -3690,7 +3695,7 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, srelent->srel);
+			FlushBuffer(bufHdr, srelent->srel, IOPATH_SHARED);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr, true);
 		}
@@ -3746,7 +3751,7 @@ FlushDatabaseBuffers(Oid dbid)
 		{
 			PinBuffer_Locked(bufHdr);
 			LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
-			FlushBuffer(bufHdr, NULL);
+			FlushBuffer(bufHdr, NULL, IOPATH_SHARED);
 			LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
 			UnpinBuffer(bufHdr, true);
 		}
@@ -3773,7 +3778,7 @@ FlushOneBuffer(Buffer buffer)
 
 	Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
 
-	FlushBuffer(bufHdr, NULL);
+	FlushBuffer(bufHdr, NULL, IOPATH_SHARED);
 }
 
 /*
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index e2e1c3bf56..c7ca8d75aa 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -689,8 +689,8 @@ bool
 StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *from_ring)
 {
 	/*
-	 * If we decide to use the dirty buffer selected by StrategyGetBuffer(),
-	 * then ensure that we count it as such in pg_stat_buffers view.
+	 * Start by assuming that we will use the dirty buffer selected by
+	 * StrategyGetBuffer().
 	 */
 	*from_ring = true;
 
@@ -712,7 +712,7 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool *from_
 	/*
 	 * Since we will not be writing out a dirty buffer from the ring, set
 	 * from_ring to false so that the caller does not count this write as a
-	 * "strategy write" and can do proper bookkeeping for pg_stat_buffers.
+	 * "strategy write" and can do proper bookkeeping.
 	 */
 	*from_ring = false;
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 04b3558ea3..f396a2b68d 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -20,6 +20,7 @@
 #include "executor/instrument.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
+#include "utils/backend_status.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner_private.h"
@@ -226,6 +227,8 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 				  localpage,
 				  false);
 
+		pgstat_inc_ioop(IOOP_WRITE, IOPATH_LOCAL);
+
 		/* Mark not-dirty now in case we error out below */
 		buf_state &= ~BM_DIRTY;
 		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index 4a2ed414b0..8e5be66998 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -396,6 +396,7 @@ ProcessSyncRequests(void)
 					total_elapsed += elapsed;
 					processed++;
 
+					pgstat_inc_ioop(IOOP_FSYNC, IOPATH_SHARED);
 					if (log_checkpoints)
 						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f ms",
 							 processed,
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index f326297517..f853ee6c1c 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -670,9 +670,8 @@ pgstat_report_live_backend_io_path_ops(PgStatIOPathOps *backend_io_path_ops)
 		if (beentry->st_procpid == 0)
 			continue;
 
-		pgstat_add_io_path_ops(backend_io_path_ops[beentry->st_backendType].io_path_ops,
-							   (IOOps *) beentry->io_path_stats,
-							   IOPATH_NUM_TYPES);
+		pgstat_sum_io_path_ops(backend_io_path_ops[beentry->st_backendType].io_path_ops,
+							   (IOOps *) beentry->io_path_stats);
 
 	}
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 557b2673c0..d6ac325d63 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1751,10 +1751,40 @@ pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
 	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_bgwriter()->stat_reset_timestamp);
 }
 
+/*
+* When adding a new column to the pg_stat_buffers view, add a new enum
+* value here above COLUMN_LENGTH.
+*/
+enum
+{
+	COLUMN_BACKEND_TYPE,
+	COLUMN_IO_PATH,
+	COLUMN_ALLOCS,
+	COLUMN_EXTENDS,
+	COLUMN_FSYNCS,
+	COLUMN_WRITES,
+	COLUMN_RESET_TIME,
+	COLUMN_LENGTH,
+};
+
+#define NROWS ((BACKEND_NUM_TYPES - 1) * IOPATH_NUM_TYPES)
+/*
+ * Helper function to get the correct row in the pg_stat_buffers view.
+ */
+static inline Datum *
+get_pg_stat_buffers_row(Datum all_values[NROWS][COLUMN_LENGTH], BackendType backend_type, IOPath io_path)
+{
+
+	/*
+	 * Subtract 1 from backend_type to avoid having rows for B_INVALID
+	 * BackendType
+	 */
+	return all_values[(backend_type - 1) * IOPATH_NUM_TYPES + io_path];
+}
+
 Datum
 pg_stat_get_buffers(PG_FUNCTION_ARGS)
 {
-#define NROWS ((BACKEND_NUM_TYPES - 1) * IOPATH_NUM_TYPES)
 	PgStat_BackendIOPathOps *backend_io_path_ops;
 	int			i;
 	int			io_path,
@@ -1765,22 +1795,6 @@ pg_stat_get_buffers(PG_FUNCTION_ARGS)
 
 	Tuplestorestate *tupstore = pg_stat_make_tuplestore(fcinfo, &tupdesc);
 
-	/*
-	 * When adding a new column to the pg_stat_buffers view, add a new enum
-	 * value here above COLUMN_LENGTH.
-	 */
-	enum
-	{
-		COLUMN_BACKEND_TYPE,
-		COLUMN_IO_PATH,
-		COLUMN_ALLOCS,
-		COLUMN_EXTENDS,
-		COLUMN_FSYNCS,
-		COLUMN_WRITES,
-		COLUMN_RESET_TIME,
-		COLUMN_LENGTH,
-	};
-
 	Datum		all_values[NROWS][COLUMN_LENGTH];
 	bool		all_nulls[NROWS][COLUMN_LENGTH];
 
@@ -1805,12 +1819,7 @@ pg_stat_get_buffers(PG_FUNCTION_ARGS)
 
 		for (io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
 		{
-			/*
-			 * Subtract 1 from backend_type to avoid having rows for B_INVALID
-			 * BackendType
-			 */
-			int			rownum = (beentry->st_backendType - 1) * IOPATH_NUM_TYPES + io_path;
-			Datum	   *values = all_values[rownum];
+			Datum *values = get_pg_stat_buffers_row(all_values, beentry->st_backendType, io_path);
 
 			/*
 			 * COLUMN_RESET_TIME, COLUMN_BACKEND_TYPE, and COLUMN_IO_PATH will
@@ -1839,11 +1848,7 @@ pg_stat_get_buffers(PG_FUNCTION_ARGS)
 
 		for (io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
 		{
-			/*
-			 * Subtract 1 from backend_type to avoid having rows for B_INVALID
-			 * BackendType
-			 */
-			Datum	   *values = all_values[(backend_type - 1) * IOPATH_NUM_TYPES + io_path];
+			Datum *values = get_pg_stat_buffers_row(all_values, backend_type, io_path);
 
 			values[COLUMN_BACKEND_TYPE] = backend_type_desc;
 			values[COLUMN_IO_PATH] = CStringGetTextDatum(GetIOPathDesc(io_path));
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8ff87a3f54..2d72933e90 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1156,13 +1156,12 @@ extern void pgstat_twophase_postcommit(TransactionId xid, uint16 info,
 extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
-extern void pgstat_add_io_path_ops(PgStatIOOps *dest,
-								   IOOps *src, int io_path_num_types);
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
 extern void pgstat_send_buffers(void);
 extern void pgstat_send_checkpointer(void);
 extern void pgstat_send_wal(bool force);
+extern void pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src);
 
 /* ----------
  * Support functions for the SQL-callable functions to
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index c0149ce0de..f0392a07dc 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -371,20 +371,16 @@ pgstat_inc_ioop(IOOp io_op, IOPath io_path)
 	switch (io_op)
 	{
 		case IOOP_ALLOC:
-			pg_atomic_write_u64(&io_ops->allocs,
-								pg_atomic_read_u64(&io_ops->allocs) + 1);
+			inc_counter(&io_ops->allocs);
 			break;
 		case IOOP_EXTEND:
-			pg_atomic_write_u64(&io_ops->extends,
-								pg_atomic_read_u64(&io_ops->extends) + 1);
+			inc_counter(&io_ops->extends);
 			break;
 		case IOOP_FSYNC:
-			pg_atomic_write_u64(&io_ops->fsyncs,
-								pg_atomic_read_u64(&io_ops->fsyncs) + 1);
+			inc_counter(&io_ops->fsyncs);
 			break;
 		case IOOP_WRITE:
-			pg_atomic_write_u64(&io_ops->writes,
-								pg_atomic_read_u64(&io_ops->writes) + 1);
+			inc_counter(&io_ops->writes);
 			break;
 	}
 }
-- 
2.27.0

From 1f7a8a274aebf62e84cf7cbfdb95097ed55e7c14 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 11 Oct 2021 16:15:06 -0400
Subject: [PATCH 1/2] Read-only atomic's backend write function

For counters in shared memory which can be read by any backend but only
written to by one backend, an atomic is still needed to protect against
torn values, however, pg_atomic_fetch_add_u64() is overkill for
incrementing the counter. inc_counter() is a helper function which can
be used to increment these values safely but without unnecessary
overhead.

Author: Thomas Munro
---
 src/include/utils/backend_status.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 419de72591..c0149ce0de 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -349,6 +349,16 @@ extern void pgstat_clear_backend_activity_snapshot(void);
 /* Activity reporting functions */
 typedef struct PgStatIOPathOps PgStatIOPathOps;
 
+/*
+ * On modern systems this is really just *counter++.  On some older systems
+ * there might be more to it, due to inability to read and write 64 bit values
+ * atomically.
+ */
+static inline void inc_counter(pg_atomic_uint64 *counter)
+{
+	pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
+}
+
 static inline void
 pgstat_inc_ioop(IOOp io_op, IOPath io_path)
 {
-- 
2.27.0
