Re: identifying the backend that owns a temporary schema
On Thu, Sep 29, 2022 at 10:47:06AM -0400, Tom Lane wrote: > OK. Will push shortly. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: identifying the backend that owns a temporary schema
Nathan Bossart writes: > On Wed, Sep 28, 2022 at 06:56:20PM -0400, Tom Lane wrote: >> A point that still bothers me a bit about pg_stat_get_backend_idset is >> that it could miss or duplicate some backend IDs if the user calls >> pg_stat_clear_snapshot() partway through the SRF's run, and we reload >> a different set of backend entries than we had before. I added a comment >> about that, with an argument why it's not worth working harder, but >> is the argument convincing? If not, what should we do? > Isn't this an existing problem? Granted, it'd manifest differently with > this patch, but ISTM we could already end up with too many or too few > backend IDs if there's a refresh partway through. Right. I'd been thinking the current code wouldn't generate duplicate IDs --- but it might produce duplicate query output anyway, in case a given backend's entry is later in the array than it was before. So really there's not much guarantees here in any case. >> -if (beid < 1 || beid > localNumBackends) >> -return NULL; > The reason I'd kept this in was because I was worried about overflow in the > comparator function. Upon further inspection, I don't think there's > actually any way that will produce incorrect results. And I'm not sure we > should worry about such cases too much, anyway. Ah, I see: if the user passes in a "backend ID" that is close to INT_MIN, then the comparator's subtraction could overflow. We could fix that by writing out the comparator code longhand ("if (a < b) return -1;" etc), but I don't really think it's necessary. bsearch is guaranteed to correctly report that such a key is not present, even if it takes a strange search path through the array due to inconsistent comparator results. So the test quoted above just serves to fail a bit more quickly, but we certainly shouldn't be optimizing for the case of a bad ID. > Overall, LGTM. OK. Will push shortly. regards, tom lane
Re: identifying the backend that owns a temporary schema
On Wed, Sep 28, 2022 at 06:56:20PM -0400, Tom Lane wrote: > I reviewed this and made some changes, some cosmetic some less so. Thanks for the detailed review. > A point that still bothers me a bit about pg_stat_get_backend_idset is > that it could miss or duplicate some backend IDs if the user calls > pg_stat_clear_snapshot() partway through the SRF's run, and we reload > a different set of backend entries than we had before. I added a comment > about that, with an argument why it's not worth working harder, but > is the argument convincing? If not, what should we do? Isn't this an existing problem? Granted, it'd manifest differently with this patch, but ISTM we could already end up with too many or too few backend IDs if there's a refresh partway through. I don't know if there's an easy way to avoid mismatches altogether besides perhaps ERROR-ing if there's a concurrent refresh. > - if (beid < 1 || beid > localNumBackends) > - return NULL; The reason I'd kept this in was because I was worried about overflow in the comparator function. Upon further inspection, I don't think there's actually any way that will produce incorrect results. And I'm not sure we should worry about such cases too much, anyway. Overall, LGTM. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: identifying the backend that owns a temporary schema
Nathan Bossart writes: > Thanks for the suggestion. I used it in v4 of the patch. I reviewed this and made some changes, some cosmetic some less so. Notably, I was bemused that of the four calls of pgstat_fetch_stat_local_beentry, three tested for a NULL result even though they cannot get one, while the fourth (pg_stat_get_backend_idset) *is* at hazard of a NULL result but lacked a check. I changed pg_stat_get_backend_idset so that it too cannot get a NULL, and deleted the dead code from the other callers. A point that still bothers me a bit about pg_stat_get_backend_idset is that it could miss or duplicate some backend IDs if the user calls pg_stat_clear_snapshot() partway through the SRF's run, and we reload a different set of backend entries than we had before. I added a comment about that, with an argument why it's not worth working harder, but is the argument convincing? If not, what should we do? Also, I realized that the functions we're changing here are mostly not exercised in the current regression tests :-(. So I added a small test case. I think this is probably committable if you agree with my changes. regards, tom lane diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 1d9509a2f6..342b20ebeb 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5485,20 +5485,23 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i the pg_stat_activity view, returns a set of records containing all the available information about each backend process. Sometimes it may be more convenient to obtain just a subset of this - information. In such cases, an older set of per-backend statistics + information. In such cases, another set of per-backend statistics access functions can be used; these are shown in . - These access functions use a backend ID number, which ranges from one - to the number of currently active backends. + These access functions use the session's backend ID number, which is a + small positive integer that is distinct from the backend ID of any + concurrent session, although a session's ID can be recycled as soon as + it exits. The backend ID is used, among other things, to identify the + session's temporary schema if it has one. The function pg_stat_get_backend_idset provides a - convenient way to generate one row for each active backend for + convenient way to list all the active backends' ID numbers for invoking these functions. For example, to show the PIDs and current queries of all backends: -SELECT pg_stat_get_backend_pid(s.backendid) AS pid, - pg_stat_get_backend_activity(s.backendid) AS query -FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s; +SELECT pg_stat_get_backend_pid(backendid) AS pid, + pg_stat_get_backend_activity(backendid) AS query +FROM pg_stat_get_backend_idset() AS backendid; @@ -5526,8 +5529,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, setof integer -Returns the set of currently active backend ID numbers (from 1 to the -number of active backends). +Returns the set of currently active backend ID numbers. diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index c7ed1e6d7a..22c79e156b 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -846,6 +846,12 @@ pgstat_read_current_status(void) /* Only valid entries get included into the local array */ if (localentry->backendStatus.st_procpid > 0) { + /* + * The array index is exactly the BackendId of the source backend. + * Note that this means the localBackendStatusTable is in order by + * backend_id. pgstat_fetch_stat_beentry() depends on that. + */ + localentry->backend_id = i; BackendIdGetTransactionIds(i, >backend_xid, >backend_xmin); @@ -1045,26 +1051,57 @@ pgstat_get_my_query_id(void) return MyBEEntry->st_query_id; } +/* -- + * cmp_lbestatus + * + * Comparison function for bsearch() on an array of LocalPgBackendStatus. + * The backend_id field is used to compare the arguments. + * -- + */ +static int +cmp_lbestatus(const void *a, const void *b) +{ + const LocalPgBackendStatus *lbestatus1 = (const LocalPgBackendStatus *) a; + const LocalPgBackendStatus *lbestatus2 = (const LocalPgBackendStatus *) b; + + return lbestatus1->backend_id - lbestatus2->backend_id; +} /* -- * pgstat_fetch_stat_beentry() - * * Support function for the SQL-callable pgstat* functions. Returns - * our local copy of the current-activity entry for one backend. + * our local copy of the current-activity entry for one backend, + * or NULL if the given beid doesn't identify any known session. + * + * The beid argument is the BackendId of the desired session + * (note that
Re: identifying the backend that owns a temporary schema
On Mon, Sep 26, 2022 at 03:50:09PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote: >>> One thing I don't like about it documentation-wise is that it leaves >>> the concept of backend ID pretty much completely undefined. > >> How specific do you think this definition ought to be? > > Fairly specific, I think, so that people can reason about how it behaves. > Notably, it seems absolutely critical to be clear that the IDs recycle > over short time frames. Maybe like > > These access functions use the session's backend ID number, which is > a small integer that is distinct from the backend ID of any concurrent > session, although an ID can be recycled as soon as the session exits. > The backend ID is used, among other things, to identify the session's > temporary schema if it has one. > > I'd prefer to use the terminology "session" than "backend" in the > definition. I suppose we can't get away with actually calling it > a "session ID" given that "backend ID" is used in so many places; > but I think people have a clearer handle on what a session is. Thanks for the suggestion. I used it in v4 of the patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From d8d32b20e79654732b3d5c99d39a2c463271207c Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 12 Aug 2022 23:07:37 -0700 Subject: [PATCH v4 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC backend IDs. Presently, these functions use the index of the backend's entry in localBackendStatusTable as the backend ID. While this might bear some resemblance to the backend ID of the backend's PGPROC struct, it can quickly diverge as sessions connect and disconnect. This change modifies the pg_stat_get_backend_*() suite of functions to use the PGPROC backend IDs instead. While we could instead introduce a new function for retrieving PGPROC backend IDs, presenting two sets of backend IDs to users seems likely to cause even more confusion than what already exists. This is primarily useful for discovering the session that owns a resource named with its PGPROC backend ID. For example, temporary schema names include the PGPROC backend ID, and it might be necessary to identify the session that owns a temporary table that is putting the cluster in danger of transaction ID wraparound. --- doc/src/sgml/monitoring.sgml| 12 --- src/backend/utils/activity/backend_status.c | 40 +++-- src/backend/utils/adt/pgstatfuncs.c | 11 +++--- src/include/utils/backend_status.h | 8 + 4 files changed, 59 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 1d9509a2f6..4ca17e9f6f 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5488,10 +5488,13 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i information. In such cases, an older set of per-backend statistics access functions can be used; these are shown in . - These access functions use a backend ID number, which ranges from one - to the number of currently active backends. + These access functions use the session's backend ID number, which is a small + integer that is distinct from the backend ID of any concurrent session, + although an ID can be recycled as soon as the session exits. The backend ID + is used, among other things, to identify the session's temporary schema if + it has one. The function pg_stat_get_backend_idset provides a - convenient way to generate one row for each active backend for + convenient way to list all the active backends' ID numbers for invoking these functions. For example, to show the PIDs and current queries of all backends: @@ -5526,8 +5529,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, setof integer -Returns the set of currently active backend ID numbers (from 1 to the -number of active backends). +Returns the set of currently active backend ID numbers. diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index c7ed1e6d7a..159d022070 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -849,6 +849,7 @@ pgstat_read_current_status(void) BackendIdGetTransactionIds(i, >backend_xid, >backend_xmin); + localentry->backend_id = i; localentry++; localappname += NAMEDATALEN; @@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void) return MyBEEntry->st_query_id; } +/* -- + * cmp_lbestatus + * + * Comparison function for bsearch() on an array of LocalPgBackendStatus. The + * backend_id field is used to compare the arguments. + * + * -- + */ +static int +cmp_lbestatus(const void *a, const void *b) +{ + LocalPgBackendStatus
Re: identifying the backend that owns a temporary schema
Nathan Bossart writes: > On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote: >> One thing I don't like about it documentation-wise is that it leaves >> the concept of backend ID pretty much completely undefined. > How specific do you think this definition ought to be? Fairly specific, I think, so that people can reason about how it behaves. Notably, it seems absolutely critical to be clear that the IDs recycle over short time frames. Maybe like These access functions use the session's backend ID number, which is a small integer that is distinct from the backend ID of any concurrent session, although an ID can be recycled as soon as the session exits. The backend ID is used, among other things, to identify the session's temporary schema if it has one. I'd prefer to use the terminology "session" than "backend" in the definition. I suppose we can't get away with actually calling it a "session ID" given that "backend ID" is used in so many places; but I think people have a clearer handle on what a session is. regards, tom lane
Re: identifying the backend that owns a temporary schema
On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote: > One thing I don't like about it documentation-wise is that it leaves > the concept of backend ID pretty much completely undefined. How specific do you think this definition ought to be? All I've come up with so far is "internal identifier for the backend that is independent from its PID," which is what I used in the attached patch. Do we want to mention its uses in more detail (e.g., temp schema name), or should we keep it vague? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 4fd862d40ba3c703e00973d9743a0d4897fe4197 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 12 Aug 2022 23:07:37 -0700 Subject: [PATCH v3 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC backend IDs. Presently, these functions use the index of the backend's entry in localBackendStatusTable as the backend ID. While this might bear some resemblance to the backend ID of the backend's PGPROC struct, it can quickly diverge as sessions connect and disconnect. This change modifies the pg_stat_get_backend_*() suite of functions to use the PGPROC backend IDs instead. While we could instead introduce a new function for retrieving PGPROC backend IDs, presenting two sets of backend IDs to users seems likely to cause even more confusion than what already exists. This is primarily useful for discovering the session that owns a resource named with its PGPROC backend ID. For example, temporary schema names include the PGPROC backend ID, and it might be necessary to identify the session that owns a temporary table that is putting the cluster in danger of transaction ID wraparound. Author: Nathan Bossart --- doc/src/sgml/monitoring.sgml| 10 +++--- src/backend/utils/activity/backend_status.c | 40 +++-- src/backend/utils/adt/pgstatfuncs.c | 11 +++--- src/include/utils/backend_status.h | 8 + 4 files changed, 57 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 1d9509a2f6..47f2883576 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5488,10 +5488,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i information. In such cases, an older set of per-backend statistics access functions can be used; these are shown in . - These access functions use a backend ID number, which ranges from one - to the number of currently active backends. + These access functions use the process's backend ID number, which is an + internal identifier for the backend that is independent from its + PID. The function pg_stat_get_backend_idset provides a - convenient way to generate one row for each active backend for + convenient way to list all the active backends' ID numbers for invoking these functions. For example, to show the PIDs and current queries of all backends: @@ -5526,8 +5527,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, setof integer -Returns the set of currently active backend ID numbers (from 1 to the -number of active backends). +Returns the set of currently active backend ID numbers. diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index c7ed1e6d7a..159d022070 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -849,6 +849,7 @@ pgstat_read_current_status(void) BackendIdGetTransactionIds(i, >backend_xid, >backend_xmin); + localentry->backend_id = i; localentry++; localappname += NAMEDATALEN; @@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void) return MyBEEntry->st_query_id; } +/* -- + * cmp_lbestatus + * + * Comparison function for bsearch() on an array of LocalPgBackendStatus. The + * backend_id field is used to compare the arguments. + * + * -- + */ +static int +cmp_lbestatus(const void *a, const void *b) +{ + LocalPgBackendStatus *lbestatus1 = (LocalPgBackendStatus *) a; + LocalPgBackendStatus *lbestatus2 = (LocalPgBackendStatus *) b; + + return lbestatus1->backend_id - lbestatus2->backend_id; +} /* -- * pgstat_fetch_stat_beentry() - @@ -1052,6 +1069,10 @@ pgstat_get_my_query_id(void) * Support function for the SQL-callable pgstat* functions. Returns * our local copy of the current-activity entry for one backend. * + * Unlike pgstat_fetch_stat_local_beentry(), the beid argument refers to the + * backend ID stored in the backend's PGPROC struct instead of its index in + * the localBackendStatusTable. + * * NB: caller is responsible for a check if the user is permitted to see * this info (especially the querystring). * -- @@ -1059,12 +1080,23 @@ pgstat_get_my_query_id(void) PgBackendStatus * pgstat_fetch_stat_beentry(int beid) { + LocalPgBackendStatus
Re: identifying the backend that owns a temporary schema
Nathan Bossart writes: > On Tue, Aug 23, 2022 at 10:29:05AM +0100, Greg Stark wrote: >> Alternately should pg_stat_activity show the actual temp schema name >> instead of the id? I don't recall if it's visible outside the backend >> but if it is, could pg_stat_activity show whether the temp schema is >> actually attached or not? > I'm open to adding the backend ID or the temp schema name to > pg_stat_activity, but I wouldn't be surprised to learn that others aren't. FWIW, I'd vote against adding the temp schema per se. We can see from outside whether the corresponding temp schema exists, but we can't readily tell whether the session has decided to use it, so attributing it to the session is a bit dangerous. Maybe there is an argument for having sessions report it to pgstats when they do adopt a temp schema, but I think there'd be race conditions, rollback after error, and other issues to contend with there. The proposed patch seems like an independent first step in any case. One thing I don't like about it documentation-wise is that it leaves the concept of backend ID pretty much completely undefined. regards, tom lane
Re: identifying the backend that owns a temporary schema
On Tue, Aug 23, 2022 at 10:29:05AM +0100, Greg Stark wrote: > Having this function would be great (I admit I never responded because > I never figured out if your suggestion was right or not:). But should > it also be added to the pg_stat_activity view? Perhaps even just in > the SQL view using the function. > > Alternately should pg_stat_activity show the actual temp schema name > instead of the id? I don't recall if it's visible outside the backend > but if it is, could pg_stat_activity show whether the temp schema is > actually attached or not? I'm open to adding the backend ID or the temp schema name to pg_stat_activity, but I wouldn't be surprised to learn that others aren't. It'd be great to hear a few more opinions on the idea before I spend too much time on the patches. IMO we should still adjust the pg_stat_get_backend_*() functions even if we do end up adjusting pg_stat_activity. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: identifying the backend that owns a temporary schema
On Tue, 23 Aug 2022 at 05:29, Greg Stark wrote: > Having this function would be great (I admit I never responded because > I never figured out if your suggestion was right or not:). But should > it also be added to the pg_stat_activity view? Perhaps even just in > the SQL view using the function. > > Alternately should pg_stat_activity show the actual temp schema name > instead of the id? I don't recall if it's visible outside the backend > but if it is, could pg_stat_activity show whether the temp schema is > actually attached or not? > Would it work to cast the schema oid to type regnamespace? Then the actual data (numeric oid) would be present in the view, but it would display as text.
Re: identifying the backend that owns a temporary schema
Having this function would be great (I admit I never responded because I never figured out if your suggestion was right or not:). But should it also be added to the pg_stat_activity view? Perhaps even just in the SQL view using the function. Alternately should pg_stat_activity show the actual temp schema name instead of the id? I don't recall if it's visible outside the backend but if it is, could pg_stat_activity show whether the temp schema is actually attached or not?
Re: identifying the backend that owns a temporary schema
At Tue, 16 Aug 2022 16:04:23 -0700, Nathan Bossart wrote in > On Mon, Aug 15, 2022 at 02:47:25PM -0700, Jeremy Schneider wrote: > > I'll take a look at the patch if I can... and I'm hopeful that we're > > able to move this idea forward and get this little gap in PG filled once > > and for all! > > Thanks! > > I noticed that the "result" variable in pg_stat_get_backend_idset() is kind > of pointless after my patch is applied, so here is a v2 with it removed. It seems to be a sensible way to expose the PGPROC backend ids to SQL interface. It inserts bsearch into relatively frequently-called function but (I believe) that doesn't seem to matter much (comparing to, for example, the size of id->position translation table). I don't think pgstat_fetch_stat_beentry needs to check for out-of-range ids anymore. That case is a kind of rare and bsearch properly handles it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: identifying the backend that owns a temporary schema
On Mon, Aug 15, 2022 at 02:47:25PM -0700, Jeremy Schneider wrote: > I'll take a look at the patch if I can... and I'm hopeful that we're > able to move this idea forward and get this little gap in PG filled once > and for all! Thanks! I noticed that the "result" variable in pg_stat_get_backend_idset() is kind of pointless after my patch is applied, so here is a v2 with it removed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 0f2af75e8b1be3dc75979f4902cc4634636a3fe2 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 12 Aug 2022 23:07:37 -0700 Subject: [PATCH v2 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC backend IDs. Presently, these functions use the index of the backend's entry in localBackendStatusTable as the backend ID. While this might bear some resemblance to the backend ID of the backend's PGPROC struct, it can quickly diverge as sessions connect and disconnect. This change modifies the pg_stat_get_backend_*() suite of functions to use the PGPROC backend IDs instead. While we could instead introduce a new function for retrieving PGPROC backend IDs, presenting two sets of backend IDs to users seems likely to cause even more confusion than what already exists. This is primarily useful for discovering the session that owns a resource named with its PGPROC backend ID. For example, temporary schema names include the PGPROC backend ID, and it might be necessary to identify the session that owns a temporary table that is putting the cluster in danger of transaction ID wraparound. Author: Nathan Bossart --- doc/src/sgml/monitoring.sgml| 8 ++--- src/backend/utils/activity/backend_status.c | 40 +++-- src/backend/utils/adt/pgstatfuncs.c | 11 +++--- src/include/utils/backend_status.h | 8 + 4 files changed, 55 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 1d9509a2f6..ecd0410229 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5488,10 +5488,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i information. In such cases, an older set of per-backend statistics access functions can be used; these are shown in . - These access functions use a backend ID number, which ranges from one - to the number of currently active backends. + These access functions use the process's backend ID number. The function pg_stat_get_backend_idset provides a - convenient way to generate one row for each active backend for + convenient way to list all the active backends' ID numbers for invoking these functions. For example, to show the PIDs and current queries of all backends: @@ -5526,8 +5525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, setof integer -Returns the set of currently active backend ID numbers (from 1 to the -number of active backends). +Returns the set of currently active backend ID numbers. diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index c7ed1e6d7a..159d022070 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -849,6 +849,7 @@ pgstat_read_current_status(void) BackendIdGetTransactionIds(i, >backend_xid, >backend_xmin); + localentry->backend_id = i; localentry++; localappname += NAMEDATALEN; @@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void) return MyBEEntry->st_query_id; } +/* -- + * cmp_lbestatus + * + * Comparison function for bsearch() on an array of LocalPgBackendStatus. The + * backend_id field is used to compare the arguments. + * + * -- + */ +static int +cmp_lbestatus(const void *a, const void *b) +{ + LocalPgBackendStatus *lbestatus1 = (LocalPgBackendStatus *) a; + LocalPgBackendStatus *lbestatus2 = (LocalPgBackendStatus *) b; + + return lbestatus1->backend_id - lbestatus2->backend_id; +} /* -- * pgstat_fetch_stat_beentry() - @@ -1052,6 +1069,10 @@ pgstat_get_my_query_id(void) * Support function for the SQL-callable pgstat* functions. Returns * our local copy of the current-activity entry for one backend. * + * Unlike pgstat_fetch_stat_local_beentry(), the beid argument refers to the + * backend ID stored in the backend's PGPROC struct instead of its index in + * the localBackendStatusTable. + * * NB: caller is responsible for a check if the user is permitted to see * this info (especially the querystring). * -- @@ -1059,12 +1080,23 @@ pgstat_get_my_query_id(void) PgBackendStatus * pgstat_fetch_stat_beentry(int beid) { + LocalPgBackendStatus key; + LocalPgBackendStatus *ret; + pgstat_read_current_status(); - if (beid < 1 || beid > localNumBackends) + if (beid < 1 || beid > NumBackendStatSlots) return NULL; - return [beid -
identifying the backend that owns a temporary schema
Hi hackers, As Greg Stark noted elsewhere [0], it is presently very difficult to identify the PID of the session using a temporary schema, which is particularly unfortunate when a temporary table is putting a cluster in danger of transaction ID wraparound. I noted [1] that the following query can be used to identify the PID for a given backend ID: SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM pg_stat_get_backend_idset() bid; But on closer inspection, this is just plain wrong. The backend IDs returned by pg_stat_get_backend_idset() might initially bear some resemblance to the backend IDs stored in PGPROC, so my suggested query might work some of the time, but the pg_stat_get_backend_* backend IDs typically diverge from the PGPROC backend IDs as sessions connect and disconnect. I think it would be nice to have a reliable way to discover the PID for a given temporary schema via SQL. The other thread [2] introduces a helpful log message that indicates the PID for temporary tables that are in danger of causing transaction ID wraparound, and I intend for this proposal to be complementary to that work. At first, I thought about adding a new function for retrieving the PGPROC backend IDs, but I am worried that having two sets of backend IDs would be even more confusing than having one set that can't reliably be used for temporary schemas. Instead, I tried adjusting the pg_stat_get_backend_*() suite of functions to use the PGPROC backend IDs. This ended up being simpler than anticipated. I added a backend_id field to the LocalPgBackendStatus struct (which is populated within pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to bsearch() for the entry with the given PGPROC backend ID. This does result in a small behavior change. Currently, pg_stat_get_backend_idset() simply returns a range of numbers (1 to the number of active backends). With the attached patch, this function will still return a set of numbers, but there might be gaps between the IDs, and the maximum backend ID will usually be greater than the number of active backends. I suppose this might break some existing uses, but I'm not sure how much we should worry about that. IMO uniting the backend IDs is a net improvement. Thoughts? [0] https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com [1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com [2] https://commitfest.postgresql.org/39/3358/ -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 2bbdc6d99c2d84bd8016bb6df370e34100b5f0bc Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 12 Aug 2022 23:07:37 -0700 Subject: [PATCH v1 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC backend IDs. Presently, these functions use the index of the backend's entry in localBackendStatusTable as the backend ID. While this might bear some resemblance to the backend ID of the backend's PGPROC struct, it can quickly diverge as sessions connect and disconnect. This change modifies the pg_stat_get_backend_*() suite of functions to use the PGPROC backend IDs instead. While we could instead introduce a new function for retrieving PGPROC backend IDs, presenting two sets of backend IDs to users seems likely to cause even more confusion than what already exists. This is primarily useful for discovering the session that owns a resource named with its PGPROC backend ID. For example, temporary schema names include the PGPROC backend ID, and it might be necessary to identify the session that owns a temporary table that is putting the cluster in danger of transaction ID wraparound. Author: Nathan Bossart --- doc/src/sgml/monitoring.sgml| 8 ++--- src/backend/utils/activity/backend_status.c | 40 +++-- src/backend/utils/adt/pgstatfuncs.c | 7 ++-- src/include/utils/backend_status.h | 8 + 4 files changed, 54 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 1d9509a2f6..ecd0410229 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5488,10 +5488,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i information. In such cases, an older set of per-backend statistics access functions can be used; these are shown in . - These access functions use a backend ID number, which ranges from one - to the number of currently active backends. + These access functions use the process's backend ID number. The function pg_stat_get_backend_idset provides a - convenient way to generate one row for each active backend for + convenient way to list all the active backends' ID numbers for invoking these functions. For example, to show the PIDs and current queries of all backends: @@ -5526,8 +5525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, setof integer -