Re: identifying the backend that owns a temporary schema

2022-09-29 Thread Nathan Bossart
On Thu, Sep 29, 2022 at 10:47:06AM -0400, Tom Lane wrote:
> OK.  Will push shortly.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: identifying the backend that owns a temporary schema

2022-09-29 Thread Tom Lane
Nathan Bossart  writes:
> On Wed, Sep 28, 2022 at 06:56:20PM -0400, Tom Lane wrote:
>> A point that still bothers me a bit about pg_stat_get_backend_idset is
>> that it could miss or duplicate some backend IDs if the user calls
>> pg_stat_clear_snapshot() partway through the SRF's run, and we reload
>> a different set of backend entries than we had before.  I added a comment
>> about that, with an argument why it's not worth working harder, but
>> is the argument convincing?  If not, what should we do?

> Isn't this an existing problem?  Granted, it'd manifest differently with
> this patch, but ISTM we could already end up with too many or too few
> backend IDs if there's a refresh partway through.

Right.  I'd been thinking the current code wouldn't generate duplicate IDs
--- but it might produce duplicate query output anyway, in case a given
backend's entry is later in the array than it was before.  So really
there's not much guarantees here in any case.

>> -if (beid < 1 || beid > localNumBackends)
>> -return NULL;

> The reason I'd kept this in was because I was worried about overflow in the
> comparator function.  Upon further inspection, I don't think there's
> actually any way that will produce incorrect results.  And I'm not sure we
> should worry about such cases too much, anyway.

Ah, I see: if the user passes in a "backend ID" that is close to INT_MIN,
then the comparator's subtraction could overflow.  We could fix that by
writing out the comparator code longhand ("if (a < b) return -1;" etc),
but I don't really think it's necessary.  bsearch is guaranteed to
correctly report that such a key is not present, even if it takes a
strange search path through the array due to inconsistent comparator
results.  So the test quoted above just serves to fail a bit more quickly,
but we certainly shouldn't be optimizing for the case of a bad ID.

> Overall, LGTM.

OK.  Will push shortly.

regards, tom lane




Re: identifying the backend that owns a temporary schema

2022-09-28 Thread Nathan Bossart
On Wed, Sep 28, 2022 at 06:56:20PM -0400, Tom Lane wrote:
> I reviewed this and made some changes, some cosmetic some less so.

Thanks for the detailed review.

> A point that still bothers me a bit about pg_stat_get_backend_idset is
> that it could miss or duplicate some backend IDs if the user calls
> pg_stat_clear_snapshot() partway through the SRF's run, and we reload
> a different set of backend entries than we had before.  I added a comment
> about that, with an argument why it's not worth working harder, but
> is the argument convincing?  If not, what should we do?

Isn't this an existing problem?  Granted, it'd manifest differently with
this patch, but ISTM we could already end up with too many or too few
backend IDs if there's a refresh partway through.  I don't know if there's
an easy way to avoid mismatches altogether besides perhaps ERROR-ing if
there's a concurrent refresh.

> - if (beid < 1 || beid > localNumBackends)
> - return NULL;

The reason I'd kept this in was because I was worried about overflow in the
comparator function.  Upon further inspection, I don't think there's
actually any way that will produce incorrect results.  And I'm not sure we
should worry about such cases too much, anyway.

Overall, LGTM.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: identifying the backend that owns a temporary schema

2022-09-28 Thread Tom Lane
Nathan Bossart  writes:
> Thanks for the suggestion.  I used it in v4 of the patch.

I reviewed this and made some changes, some cosmetic some less so.

Notably, I was bemused that of the four calls of
pgstat_fetch_stat_local_beentry, three tested for a NULL result even
though they cannot get one, while the fourth (pg_stat_get_backend_idset)
*is* at hazard of a NULL result but lacked a check.  I changed
pg_stat_get_backend_idset so that it too cannot get a NULL, and deleted
the dead code from the other callers.

A point that still bothers me a bit about pg_stat_get_backend_idset is
that it could miss or duplicate some backend IDs if the user calls
pg_stat_clear_snapshot() partway through the SRF's run, and we reload
a different set of backend entries than we had before.  I added a comment
about that, with an argument why it's not worth working harder, but
is the argument convincing?  If not, what should we do?

Also, I realized that the functions we're changing here are mostly
not exercised in the current regression tests :-(.  So I added a
small test case.

I think this is probably committable if you agree with my changes.

regards, tom lane

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..342b20ebeb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5485,20 +5485,23 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
the pg_stat_activity view, returns a set of records
containing all the available information about each backend process.
Sometimes it may be more convenient to obtain just a subset of this
-   information.  In such cases, an older set of per-backend statistics
+   information.  In such cases, another set of per-backend statistics
access functions can be used; these are shown in .
-   These access functions use a backend ID number, which ranges from one
-   to the number of currently active backends.
+   These access functions use the session's backend ID number, which is a
+   small positive integer that is distinct from the backend ID of any
+   concurrent session, although a session's ID can be recycled as soon as
+   it exits.  The backend ID is used, among other things, to identify the
+   session's temporary schema if it has one.
The function pg_stat_get_backend_idset provides a
-   convenient way to generate one row for each active backend for
+   convenient way to list all the active backends' ID numbers for
invoking these functions.  For example, to show the PIDs and
current queries of all backends:
 
 
-SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
-   pg_stat_get_backend_activity(s.backendid) AS query
-FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
+SELECT pg_stat_get_backend_pid(backendid) AS pid,
+   pg_stat_get_backend_activity(backendid) AS query
+FROM pg_stat_get_backend_idset() AS backendid;
 
   
 
@@ -5526,8 +5529,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 setof integer


-Returns the set of currently active backend ID numbers (from 1 to the
-number of active backends).
+Returns the set of currently active backend ID numbers.

   
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..22c79e156b 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -846,6 +846,12 @@ pgstat_read_current_status(void)
 		/* Only valid entries get included into the local array */
 		if (localentry->backendStatus.st_procpid > 0)
 		{
+			/*
+			 * The array index is exactly the BackendId of the source backend.
+			 * Note that this means the localBackendStatusTable is in order by
+			 * backend_id.  pgstat_fetch_stat_beentry() depends on that.
+			 */
+			localentry->backend_id = i;
 			BackendIdGetTransactionIds(i,
 	   >backend_xid,
 	   >backend_xmin);
@@ -1045,26 +1051,57 @@ pgstat_get_my_query_id(void)
 	return MyBEEntry->st_query_id;
 }
 
+/* --
+ * cmp_lbestatus
+ *
+ *	Comparison function for bsearch() on an array of LocalPgBackendStatus.
+ *	The backend_id field is used to compare the arguments.
+ * --
+ */
+static int
+cmp_lbestatus(const void *a, const void *b)
+{
+	const LocalPgBackendStatus *lbestatus1 = (const LocalPgBackendStatus *) a;
+	const LocalPgBackendStatus *lbestatus2 = (const LocalPgBackendStatus *) b;
+
+	return lbestatus1->backend_id - lbestatus2->backend_id;
+}
 
 /* --
  * pgstat_fetch_stat_beentry() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	our local copy of the current-activity entry for one backend.
+ *	our local copy of the current-activity entry for one backend,
+ *	or NULL if the given beid doesn't identify any known session.
+ *
+ *	The beid argument is the BackendId of the desired session
+ *	(note that 

Re: identifying the backend that owns a temporary schema

2022-09-26 Thread Nathan Bossart
On Mon, Sep 26, 2022 at 03:50:09PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote:
>>> One thing I don't like about it documentation-wise is that it leaves
>>> the concept of backend ID pretty much completely undefined.
> 
>> How specific do you think this definition ought to be?
> 
> Fairly specific, I think, so that people can reason about how it behaves.
> Notably, it seems absolutely critical to be clear that the IDs recycle
> over short time frames.  Maybe like
> 
> These access functions use the session's backend ID number, which is
> a small integer that is distinct from the backend ID of any concurrent
> session, although an ID can be recycled as soon as the session exits.
> The backend ID is used, among other things, to identify the session's
> temporary schema if it has one.
> 
> I'd prefer to use the terminology "session" than "backend" in the
> definition.  I suppose we can't get away with actually calling it
> a "session ID" given that "backend ID" is used in so many places;
> but I think people have a clearer handle on what a session is.

Thanks for the suggestion.  I used it in v4 of the patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d8d32b20e79654732b3d5c99d39a2c463271207c Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 12 Aug 2022 23:07:37 -0700
Subject: [PATCH v4 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC
 backend IDs.

Presently, these functions use the index of the backend's entry in
localBackendStatusTable as the backend ID.  While this might bear
some resemblance to the backend ID of the backend's PGPROC struct,
it can quickly diverge as sessions connect and disconnect.  This
change modifies the pg_stat_get_backend_*() suite of functions to
use the PGPROC backend IDs instead.  While we could instead
introduce a new function for retrieving PGPROC backend IDs,
presenting two sets of backend IDs to users seems likely to cause
even more confusion than what already exists.

This is primarily useful for discovering the session that owns a
resource named with its PGPROC backend ID.  For example, temporary
schema names include the PGPROC backend ID, and it might be
necessary to identify the session that owns a temporary table that
is putting the cluster in danger of transaction ID wraparound.
---
 doc/src/sgml/monitoring.sgml| 12 ---
 src/backend/utils/activity/backend_status.c | 40 +++--
 src/backend/utils/adt/pgstatfuncs.c | 11 +++---
 src/include/utils/backend_status.h  |  8 +
 4 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..4ca17e9f6f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5488,10 +5488,13 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
information.  In such cases, an older set of per-backend statistics
access functions can be used; these are shown in .
-   These access functions use a backend ID number, which ranges from one
-   to the number of currently active backends.
+   These access functions use the session's backend ID number, which is a small
+   integer that is distinct from the backend ID of any concurrent session,
+   although an ID can be recycled as soon as the session exits.  The backend ID
+   is used, among other things, to identify the session's temporary schema if
+   it has one.
The function pg_stat_get_backend_idset provides a
-   convenient way to generate one row for each active backend for
+   convenient way to list all the active backends' ID numbers for
invoking these functions.  For example, to show the PIDs and
current queries of all backends:
 
@@ -5526,8 +5529,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 setof integer


-Returns the set of currently active backend ID numbers (from 1 to the
-number of active backends).
+Returns the set of currently active backend ID numbers.

   
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..159d022070 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -849,6 +849,7 @@ pgstat_read_current_status(void)
 			BackendIdGetTransactionIds(i,
 	   >backend_xid,
 	   >backend_xmin);
+			localentry->backend_id = i;
 
 			localentry++;
 			localappname += NAMEDATALEN;
@@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void)
 	return MyBEEntry->st_query_id;
 }
 
+/* --
+ * cmp_lbestatus
+ *
+ *	Comparison function for bsearch() on an array of LocalPgBackendStatus.  The
+ *	backend_id field is used to compare the arguments.
+ *
+ * --
+ */
+static int
+cmp_lbestatus(const void *a, const void *b)
+{
+	LocalPgBackendStatus 

Re: identifying the backend that owns a temporary schema

2022-09-26 Thread Tom Lane
Nathan Bossart  writes:
> On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote:
>> One thing I don't like about it documentation-wise is that it leaves
>> the concept of backend ID pretty much completely undefined.

> How specific do you think this definition ought to be?

Fairly specific, I think, so that people can reason about how it behaves.
Notably, it seems absolutely critical to be clear that the IDs recycle
over short time frames.  Maybe like

These access functions use the session's backend ID number, which is
a small integer that is distinct from the backend ID of any concurrent
session, although an ID can be recycled as soon as the session exits.
The backend ID is used, among other things, to identify the session's
temporary schema if it has one.

I'd prefer to use the terminology "session" than "backend" in the
definition.  I suppose we can't get away with actually calling it
a "session ID" given that "backend ID" is used in so many places;
but I think people have a clearer handle on what a session is.

regards, tom lane




Re: identifying the backend that owns a temporary schema

2022-09-26 Thread Nathan Bossart
On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote:
> One thing I don't like about it documentation-wise is that it leaves
> the concept of backend ID pretty much completely undefined.

How specific do you think this definition ought to be?  All I've come up
with so far is "internal identifier for the backend that is independent
from its PID," which is what I used in the attached patch.  Do we want to
mention its uses in more detail (e.g., temp schema name), or should we keep
it vague?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 4fd862d40ba3c703e00973d9743a0d4897fe4197 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 12 Aug 2022 23:07:37 -0700
Subject: [PATCH v3 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC
 backend IDs.

Presently, these functions use the index of the backend's entry in
localBackendStatusTable as the backend ID.  While this might bear
some resemblance to the backend ID of the backend's PGPROC struct,
it can quickly diverge as sessions connect and disconnect.  This
change modifies the pg_stat_get_backend_*() suite of functions to
use the PGPROC backend IDs instead.  While we could instead
introduce a new function for retrieving PGPROC backend IDs,
presenting two sets of backend IDs to users seems likely to cause
even more confusion than what already exists.

This is primarily useful for discovering the session that owns a
resource named with its PGPROC backend ID.  For example, temporary
schema names include the PGPROC backend ID, and it might be
necessary to identify the session that owns a temporary table that
is putting the cluster in danger of transaction ID wraparound.

Author: Nathan Bossart
---
 doc/src/sgml/monitoring.sgml| 10 +++---
 src/backend/utils/activity/backend_status.c | 40 +++--
 src/backend/utils/adt/pgstatfuncs.c | 11 +++---
 src/include/utils/backend_status.h  |  8 +
 4 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..47f2883576 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5488,10 +5488,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
information.  In such cases, an older set of per-backend statistics
access functions can be used; these are shown in .
-   These access functions use a backend ID number, which ranges from one
-   to the number of currently active backends.
+   These access functions use the process's backend ID number, which is an
+   internal identifier for the backend that is independent from its
+   PID.
The function pg_stat_get_backend_idset provides a
-   convenient way to generate one row for each active backend for
+   convenient way to list all the active backends' ID numbers for
invoking these functions.  For example, to show the PIDs and
current queries of all backends:
 
@@ -5526,8 +5527,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 setof integer


-Returns the set of currently active backend ID numbers (from 1 to the
-number of active backends).
+Returns the set of currently active backend ID numbers.

   
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..159d022070 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -849,6 +849,7 @@ pgstat_read_current_status(void)
 			BackendIdGetTransactionIds(i,
 	   >backend_xid,
 	   >backend_xmin);
+			localentry->backend_id = i;
 
 			localentry++;
 			localappname += NAMEDATALEN;
@@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void)
 	return MyBEEntry->st_query_id;
 }
 
+/* --
+ * cmp_lbestatus
+ *
+ *	Comparison function for bsearch() on an array of LocalPgBackendStatus.  The
+ *	backend_id field is used to compare the arguments.
+ *
+ * --
+ */
+static int
+cmp_lbestatus(const void *a, const void *b)
+{
+	LocalPgBackendStatus *lbestatus1 = (LocalPgBackendStatus *) a;
+	LocalPgBackendStatus *lbestatus2 = (LocalPgBackendStatus *) b;
+
+	return lbestatus1->backend_id - lbestatus2->backend_id;
+}
 
 /* --
  * pgstat_fetch_stat_beentry() -
@@ -1052,6 +1069,10 @@ pgstat_get_my_query_id(void)
  *	Support function for the SQL-callable pgstat* functions. Returns
  *	our local copy of the current-activity entry for one backend.
  *
+ *	Unlike pgstat_fetch_stat_local_beentry(), the beid argument refers to the
+ *	backend ID stored in the backend's PGPROC struct instead of its index in
+ *	the localBackendStatusTable.
+ *
  *	NB: caller is responsible for a check if the user is permitted to see
  *	this info (especially the querystring).
  * --
@@ -1059,12 +1080,23 @@ pgstat_get_my_query_id(void)
 PgBackendStatus *
 pgstat_fetch_stat_beentry(int beid)
 {
+	LocalPgBackendStatus 

Re: identifying the backend that owns a temporary schema

2022-09-24 Thread Tom Lane
Nathan Bossart  writes:
> On Tue, Aug 23, 2022 at 10:29:05AM +0100, Greg Stark wrote:
>> Alternately should pg_stat_activity show the actual temp schema name
>> instead of the id? I don't recall if it's visible outside the backend
>> but if it is, could pg_stat_activity show whether the temp schema is
>> actually attached or not?

> I'm open to adding the backend ID or the temp schema name to
> pg_stat_activity, but I wouldn't be surprised to learn that others aren't.

FWIW, I'd vote against adding the temp schema per se.  We can see from
outside whether the corresponding temp schema exists, but we can't readily
tell whether the session has decided to use it, so attributing it to the
session is a bit dangerous.  Maybe there is an argument for having
sessions report it to pgstats when they do adopt a temp schema, but I
think there'd be race conditions, rollback after error, and other issues
to contend with there.

The proposed patch seems like an independent first step in any case.

One thing I don't like about it documentation-wise is that it leaves
the concept of backend ID pretty much completely undefined.

regards, tom lane




Re: identifying the backend that owns a temporary schema

2022-08-29 Thread Nathan Bossart
On Tue, Aug 23, 2022 at 10:29:05AM +0100, Greg Stark wrote:
> Having this function would be great (I admit I never responded because
> I never figured out if your suggestion was right or not:). But should
> it also be added to the pg_stat_activity view? Perhaps even just in
> the SQL view using the function.
> 
> Alternately should pg_stat_activity show the actual temp schema name
> instead of the id? I don't recall if it's visible outside the backend
> but if it is, could pg_stat_activity show whether the temp schema is
> actually attached or not?

I'm open to adding the backend ID or the temp schema name to
pg_stat_activity, but I wouldn't be surprised to learn that others aren't.
It'd be great to hear a few more opinions on the idea before I spend too
much time on the patches.  IMO we should still adjust the
pg_stat_get_backend_*() functions even if we do end up adjusting
pg_stat_activity.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: identifying the backend that owns a temporary schema

2022-08-23 Thread Isaac Morland
On Tue, 23 Aug 2022 at 05:29, Greg Stark  wrote:

> Having this function would be great (I admit I never responded because
> I never figured out if your suggestion was right or not:). But should
> it also be added to the pg_stat_activity view? Perhaps even just in
> the SQL view using the function.
>
> Alternately should pg_stat_activity show the actual temp schema name
> instead of the id? I don't recall if it's visible outside the backend
> but if it is, could pg_stat_activity show whether the temp schema is
> actually attached or not?
>

Would it work to cast the schema oid to type regnamespace? Then the actual
data (numeric oid) would be present in the view, but it would display as
text.


Re: identifying the backend that owns a temporary schema

2022-08-23 Thread Greg Stark
Having this function would be great (I admit I never responded because
I never figured out if your suggestion was right or not:). But should
it also be added to the pg_stat_activity view? Perhaps even just in
the SQL view using the function.

Alternately should pg_stat_activity show the actual temp schema name
instead of the id? I don't recall if it's visible outside the backend
but if it is, could pg_stat_activity show whether the temp schema is
actually attached or not?




Re: identifying the backend that owns a temporary schema

2022-08-23 Thread Kyotaro Horiguchi
At Tue, 16 Aug 2022 16:04:23 -0700, Nathan Bossart  
wrote in 
> On Mon, Aug 15, 2022 at 02:47:25PM -0700, Jeremy Schneider wrote:
> > I'll take a look at the patch if I can... and I'm hopeful that we're
> > able to move this idea forward and get this little gap in PG filled once
> > and for all!
> 
> Thanks!
> 
> I noticed that the "result" variable in pg_stat_get_backend_idset() is kind
> of pointless after my patch is applied, so here is a v2 with it removed.

It seems to be a sensible way to expose the PGPROC backend ids to SQL
interface.  It inserts bsearch into relatively frequently-called
function but (I believe) that doesn't seem to matter much (comparing
to, for example, the size of id->position translation table).

I don't think pgstat_fetch_stat_beentry needs to check for
out-of-range ids anymore. That case is a kind of rare and bsearch
properly handles it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: identifying the backend that owns a temporary schema

2022-08-16 Thread Nathan Bossart
On Mon, Aug 15, 2022 at 02:47:25PM -0700, Jeremy Schneider wrote:
> I'll take a look at the patch if I can... and I'm hopeful that we're
> able to move this idea forward and get this little gap in PG filled once
> and for all!

Thanks!

I noticed that the "result" variable in pg_stat_get_backend_idset() is kind
of pointless after my patch is applied, so here is a v2 with it removed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 0f2af75e8b1be3dc75979f4902cc4634636a3fe2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 12 Aug 2022 23:07:37 -0700
Subject: [PATCH v2 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC
 backend IDs.

Presently, these functions use the index of the backend's entry in
localBackendStatusTable as the backend ID.  While this might bear
some resemblance to the backend ID of the backend's PGPROC struct,
it can quickly diverge as sessions connect and disconnect.  This
change modifies the pg_stat_get_backend_*() suite of functions to
use the PGPROC backend IDs instead.  While we could instead
introduce a new function for retrieving PGPROC backend IDs,
presenting two sets of backend IDs to users seems likely to cause
even more confusion than what already exists.

This is primarily useful for discovering the session that owns a
resource named with its PGPROC backend ID.  For example, temporary
schema names include the PGPROC backend ID, and it might be
necessary to identify the session that owns a temporary table that
is putting the cluster in danger of transaction ID wraparound.

Author: Nathan Bossart
---
 doc/src/sgml/monitoring.sgml|  8 ++---
 src/backend/utils/activity/backend_status.c | 40 +++--
 src/backend/utils/adt/pgstatfuncs.c | 11 +++---
 src/include/utils/backend_status.h  |  8 +
 4 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..ecd0410229 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5488,10 +5488,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
information.  In such cases, an older set of per-backend statistics
access functions can be used; these are shown in .
-   These access functions use a backend ID number, which ranges from one
-   to the number of currently active backends.
+   These access functions use the process's backend ID number.
The function pg_stat_get_backend_idset provides a
-   convenient way to generate one row for each active backend for
+   convenient way to list all the active backends' ID numbers for
invoking these functions.  For example, to show the PIDs and
current queries of all backends:
 
@@ -5526,8 +5525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 setof integer


-Returns the set of currently active backend ID numbers (from 1 to the
-number of active backends).
+Returns the set of currently active backend ID numbers.

   
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..159d022070 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -849,6 +849,7 @@ pgstat_read_current_status(void)
 			BackendIdGetTransactionIds(i,
 	   >backend_xid,
 	   >backend_xmin);
+			localentry->backend_id = i;
 
 			localentry++;
 			localappname += NAMEDATALEN;
@@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void)
 	return MyBEEntry->st_query_id;
 }
 
+/* --
+ * cmp_lbestatus
+ *
+ *	Comparison function for bsearch() on an array of LocalPgBackendStatus.  The
+ *	backend_id field is used to compare the arguments.
+ *
+ * --
+ */
+static int
+cmp_lbestatus(const void *a, const void *b)
+{
+	LocalPgBackendStatus *lbestatus1 = (LocalPgBackendStatus *) a;
+	LocalPgBackendStatus *lbestatus2 = (LocalPgBackendStatus *) b;
+
+	return lbestatus1->backend_id - lbestatus2->backend_id;
+}
 
 /* --
  * pgstat_fetch_stat_beentry() -
@@ -1052,6 +1069,10 @@ pgstat_get_my_query_id(void)
  *	Support function for the SQL-callable pgstat* functions. Returns
  *	our local copy of the current-activity entry for one backend.
  *
+ *	Unlike pgstat_fetch_stat_local_beentry(), the beid argument refers to the
+ *	backend ID stored in the backend's PGPROC struct instead of its index in
+ *	the localBackendStatusTable.
+ *
  *	NB: caller is responsible for a check if the user is permitted to see
  *	this info (especially the querystring).
  * --
@@ -1059,12 +1080,23 @@ pgstat_get_my_query_id(void)
 PgBackendStatus *
 pgstat_fetch_stat_beentry(int beid)
 {
+	LocalPgBackendStatus key;
+	LocalPgBackendStatus *ret;
+
 	pgstat_read_current_status();
 
-	if (beid < 1 || beid > localNumBackends)
+	if (beid < 1 || beid > NumBackendStatSlots)
 		return NULL;
 
-	return [beid - 

identifying the backend that owns a temporary schema

2022-08-15 Thread Nathan Bossart
Hi hackers,

As Greg Stark noted elsewhere [0], it is presently very difficult to
identify the PID of the session using a temporary schema, which is
particularly unfortunate when a temporary table is putting a cluster in
danger of transaction ID wraparound.  I noted [1] that the following query
can be used to identify the PID for a given backend ID:

SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM 
pg_stat_get_backend_idset() bid;

But on closer inspection, this is just plain wrong.  The backend IDs
returned by pg_stat_get_backend_idset() might initially bear some
resemblance to the backend IDs stored in PGPROC, so my suggested query
might work some of the time, but the pg_stat_get_backend_* backend IDs
typically diverge from the PGPROC backend IDs as sessions connect and
disconnect.

I think it would be nice to have a reliable way to discover the PID for a
given temporary schema via SQL.  The other thread [2] introduces a helpful
log message that indicates the PID for temporary tables that are in danger
of causing transaction ID wraparound, and I intend for this proposal to be
complementary to that work.

At first, I thought about adding a new function for retrieving the PGPROC
backend IDs, but I am worried that having two sets of backend IDs would be
even more confusing than having one set that can't reliably be used for
temporary schemas.  Instead, I tried adjusting the pg_stat_get_backend_*()
suite of functions to use the PGPROC backend IDs.  This ended up being
simpler than anticipated.  I added a backend_id field to the
LocalPgBackendStatus struct (which is populated within
pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to
bsearch() for the entry with the given PGPROC backend ID.

This does result in a small behavior change.  Currently,
pg_stat_get_backend_idset() simply returns a range of numbers (1 to the
number of active backends).  With the attached patch, this function will
still return a set of numbers, but there might be gaps between the IDs, and
the maximum backend ID will usually be greater than the number of active
backends.  I suppose this might break some existing uses, but I'm not sure
how much we should worry about that.  IMO uniting the backend IDs is a net
improvement.

Thoughts?

[0] 
https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com
[1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com
[2] https://commitfest.postgresql.org/39/3358/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 2bbdc6d99c2d84bd8016bb6df370e34100b5f0bc Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 12 Aug 2022 23:07:37 -0700
Subject: [PATCH v1 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC
 backend IDs.

Presently, these functions use the index of the backend's entry in
localBackendStatusTable as the backend ID.  While this might bear
some resemblance to the backend ID of the backend's PGPROC struct,
it can quickly diverge as sessions connect and disconnect.  This
change modifies the pg_stat_get_backend_*() suite of functions to
use the PGPROC backend IDs instead.  While we could instead
introduce a new function for retrieving PGPROC backend IDs,
presenting two sets of backend IDs to users seems likely to cause
even more confusion than what already exists.

This is primarily useful for discovering the session that owns a
resource named with its PGPROC backend ID.  For example, temporary
schema names include the PGPROC backend ID, and it might be
necessary to identify the session that owns a temporary table that
is putting the cluster in danger of transaction ID wraparound.

Author: Nathan Bossart
---
 doc/src/sgml/monitoring.sgml|  8 ++---
 src/backend/utils/activity/backend_status.c | 40 +++--
 src/backend/utils/adt/pgstatfuncs.c |  7 ++--
 src/include/utils/backend_status.h  |  8 +
 4 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..ecd0410229 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5488,10 +5488,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
information.  In such cases, an older set of per-backend statistics
access functions can be used; these are shown in .
-   These access functions use a backend ID number, which ranges from one
-   to the number of currently active backends.
+   These access functions use the process's backend ID number.
The function pg_stat_get_backend_idset provides a
-   convenient way to generate one row for each active backend for
+   convenient way to list all the active backends' ID numbers for
invoking these functions.  For example, to show the PIDs and
current queries of all backends:
 
@@ -5526,8 +5525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 setof integer


-