Hi hackers,

As Greg Stark noted elsewhere [0], it is presently very difficult to
identify the PID of the session using a temporary schema, which is
particularly unfortunate when a temporary table is putting a cluster in
danger of transaction ID wraparound.  I noted [1] that the following query
can be used to identify the PID for a given backend ID:

        SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM 
pg_stat_get_backend_idset() bid;

But on closer inspection, this is just plain wrong.  The backend IDs
returned by pg_stat_get_backend_idset() might initially bear some
resemblance to the backend IDs stored in PGPROC, so my suggested query
might work some of the time, but the pg_stat_get_backend_* backend IDs
typically diverge from the PGPROC backend IDs as sessions connect and
disconnect.

I think it would be nice to have a reliable way to discover the PID for a
given temporary schema via SQL.  The other thread [2] introduces a helpful
log message that indicates the PID for temporary tables that are in danger
of causing transaction ID wraparound, and I intend for this proposal to be
complementary to that work.

At first, I thought about adding a new function for retrieving the PGPROC
backend IDs, but I am worried that having two sets of backend IDs would be
even more confusing than having one set that can't reliably be used for
temporary schemas.  Instead, I tried adjusting the pg_stat_get_backend_*()
suite of functions to use the PGPROC backend IDs.  This ended up being
simpler than anticipated.  I added a backend_id field to the
LocalPgBackendStatus struct (which is populated within
pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to
bsearch() for the entry with the given PGPROC backend ID.

This does result in a small behavior change.  Currently,
pg_stat_get_backend_idset() simply returns a range of numbers (1 to the
number of active backends).  With the attached patch, this function will
still return a set of numbers, but there might be gaps between the IDs, and
the maximum backend ID will usually be greater than the number of active
backends.  I suppose this might break some existing uses, but I'm not sure
how much we should worry about that.  IMO uniting the backend IDs is a net
improvement.

Thoughts?

[0] 
https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com
[1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com
[2] https://commitfest.postgresql.org/39/3358/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 2bbdc6d99c2d84bd8016bb6df370e34100b5f0bc Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Fri, 12 Aug 2022 23:07:37 -0700
Subject: [PATCH v1 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC
 backend IDs.

Presently, these functions use the index of the backend's entry in
localBackendStatusTable as the backend ID.  While this might bear
some resemblance to the backend ID of the backend's PGPROC struct,
it can quickly diverge as sessions connect and disconnect.  This
change modifies the pg_stat_get_backend_*() suite of functions to
use the PGPROC backend IDs instead.  While we could instead
introduce a new function for retrieving PGPROC backend IDs,
presenting two sets of backend IDs to users seems likely to cause
even more confusion than what already exists.

This is primarily useful for discovering the session that owns a
resource named with its PGPROC backend ID.  For example, temporary
schema names include the PGPROC backend ID, and it might be
necessary to identify the session that owns a temporary table that
is putting the cluster in danger of transaction ID wraparound.

Author: Nathan Bossart
---
 doc/src/sgml/monitoring.sgml                |  8 ++---
 src/backend/utils/activity/backend_status.c | 40 +++++++++++++++++++--
 src/backend/utils/adt/pgstatfuncs.c         |  7 ++--
 src/include/utils/backend_status.h          |  8 +++++
 4 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..ecd0410229 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5488,10 +5488,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    information.  In such cases, an older set of per-backend statistics
    access functions can be used; these are shown in <xref
    linkend="monitoring-stats-backend-funcs-table"/>.
-   These access functions use a backend ID number, which ranges from one
-   to the number of currently active backends.
+   These access functions use the process's backend ID number.
    The function <function>pg_stat_get_backend_idset</function> provides a
-   convenient way to generate one row for each active backend for
+   convenient way to list all the active backends' ID numbers for
    invoking these functions.  For example, to show the <acronym>PID</acronym>s and
    current queries of all backends:
 
@@ -5526,8 +5525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
         <returnvalue>setof integer</returnvalue>
        </para>
        <para>
-        Returns the set of currently active backend ID numbers (from 1 to the
-        number of active backends).
+        Returns the set of currently active backend ID numbers.
        </para></entry>
       </row>
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..159d022070 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -849,6 +849,7 @@ pgstat_read_current_status(void)
 			BackendIdGetTransactionIds(i,
 									   &localentry->backend_xid,
 									   &localentry->backend_xmin);
+			localentry->backend_id = i;
 
 			localentry++;
 			localappname += NAMEDATALEN;
@@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void)
 	return MyBEEntry->st_query_id;
 }
 
+/* ----------
+ * cmp_lbestatus
+ *
+ *	Comparison function for bsearch() on an array of LocalPgBackendStatus.  The
+ *	backend_id field is used to compare the arguments.
+ *
+ * ----------
+ */
+static int
+cmp_lbestatus(const void *a, const void *b)
+{
+	LocalPgBackendStatus *lbestatus1 = (LocalPgBackendStatus *) a;
+	LocalPgBackendStatus *lbestatus2 = (LocalPgBackendStatus *) b;
+
+	return lbestatus1->backend_id - lbestatus2->backend_id;
+}
 
 /* ----------
  * pgstat_fetch_stat_beentry() -
@@ -1052,6 +1069,10 @@ pgstat_get_my_query_id(void)
  *	Support function for the SQL-callable pgstat* functions. Returns
  *	our local copy of the current-activity entry for one backend.
  *
+ *	Unlike pgstat_fetch_stat_local_beentry(), the beid argument refers to the
+ *	backend ID stored in the backend's PGPROC struct instead of its index in
+ *	the localBackendStatusTable.
+ *
  *	NB: caller is responsible for a check if the user is permitted to see
  *	this info (especially the querystring).
  * ----------
@@ -1059,12 +1080,23 @@ pgstat_get_my_query_id(void)
 PgBackendStatus *
 pgstat_fetch_stat_beentry(int beid)
 {
+	LocalPgBackendStatus key;
+	LocalPgBackendStatus *ret;
+
 	pgstat_read_current_status();
 
-	if (beid < 1 || beid > localNumBackends)
+	if (beid < 1 || beid > NumBackendStatSlots)
 		return NULL;
 
-	return &localBackendStatusTable[beid - 1].backendStatus;
+	key.backend_id = beid;
+	ret = (LocalPgBackendStatus *) bsearch(&key, localBackendStatusTable,
+										   localNumBackends,
+										   sizeof(LocalPgBackendStatus),
+										   cmp_lbestatus);
+	if (ret)
+		return &ret->backendStatus;
+
+	return NULL;
 }
 
 
@@ -1074,6 +1106,10 @@ pgstat_fetch_stat_beentry(int beid)
  *	Like pgstat_fetch_stat_beentry() but with locally computed additions (like
  *	xid and xmin values of the backend)
  *
+ *	Unlike pgstat_fetch_stat_beentry(), the beid argument refers to the index
+ *	in the localBackendStatusTable instead of the backend ID stored in the
+ *	backend's PGPROC struct.
+ *
  *	NB: caller is responsible for a check if the user is permitted to see
  *	this info (especially the querystring).
  * ----------
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index d9e2a79382..7f06cba958 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -440,8 +440,10 @@ pg_stat_get_backend_idset(PG_FUNCTION_ARGS)
 
 	if (result <= fctx[1])
 	{
+		LocalPgBackendStatus *localStatus = pgstat_fetch_stat_local_beentry(fctx[0]);
+
 		/* do when there is more left to send */
-		SRF_RETURN_NEXT(funcctx, Int32GetDatum(result));
+		SRF_RETURN_NEXT(funcctx, Int32GetDatum(localStatus->backend_id));
 	}
 	else
 	{
@@ -1187,7 +1189,8 @@ pg_stat_get_db_numbackends(PG_FUNCTION_ARGS)
 	result = 0;
 	for (beid = 1; beid <= tot_backends; beid++)
 	{
-		PgBackendStatus *beentry = pgstat_fetch_stat_beentry(beid);
+		LocalPgBackendStatus *localStatus = pgstat_fetch_stat_local_beentry(beid);
+		PgBackendStatus *beentry = &localStatus->backendStatus;
 
 		if (beentry && beentry->st_databaseid == dbid)
 			result++;
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 7403bca25e..f7c7c6e671 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -13,6 +13,7 @@
 #include "datatype/timestamp.h"
 #include "libpq/pqcomm.h"
 #include "miscadmin.h"			/* for BackendType */
+#include "storage/backendid.h"
 #include "utils/backend_progress.h"
 
 
@@ -258,6 +259,13 @@ typedef struct LocalPgBackendStatus
 	 * not.
 	 */
 	TransactionId backend_xmin;
+
+	/*
+	 * The backend ID.  For auxiliary processes, this will be set to a value
+	 * greater than MaxBackends (since auxiliary processes do not have proper
+	 * backend IDs).
+	 */
+	BackendId backend_id;
 } LocalPgBackendStatus;
 
 
-- 
2.25.1

Reply via email to