On 03/02/2026 14:46, Sami Imseih wrote:
Another thing I didn't do in this patch yet: I feel we should replace
BackendPidGetProc() with a function like "PGPROC *PidGetPGProc(pid_t)",
that would work for backends and aux processes alike. It's a common
pattern to call BackendPidGetProc() followed by AuxiliaryPidGetProc()
currently. Even for the callers that specifically want to only check
backend processes, I think it would be more natural to call
PidGetPGProc(), and then check the process type.
+1 for such a function, and it could replace 6 different places ( if I counted
correctly ) in code where this pattern is used. At minimum, shouldn't
the fix for pg_stat_get_backend_wait_event() and
pg_stat_get_backend_wait_event_type() follow the same pattern?
"
proc = BackendPidGetProc(pid);
if (proc == NULL)
proc = AuxiliaryPidGetProc(pid);
"
Yeah, that would be the most straightforward fix. But it feels silly to
call BackendPidGetProc(pid), when we already have the ProcNumber at hand.
Come to think of it, why is wait_event_info stored in PGPROC in the
first place, rather than in PgBackendStatus? All the other
pg_stat_get_backend_*() functions just read the local PgBackendStatus copy.
That point was debated when the wait events were introduced [1] [2].
AFAICS the main motivation was that aux processes didn't have
PgBackendStatus entries, and we wanted to expose wait events for aux
processes too. That has changed since then, aux processes do have
PgBackendStatus entries now, so that argument is moot.
Because wait_event_info is fetched from PGPROC, it's not part of the
"activity snapshot". So when you run "select * pg_stat_activity"
repeatedly in the same transaction, the wait_events will change, even
though the other fields are fetched once and frozen for the duration of
the transaction. Tom pointed this out back then [1], but looks like that
point was then forgotten, as we haven't documented that exception either.
There might be a performance argument too, although I haven't done any
benchmarking and it's probably not really significant: PgBackendStatus
is accessed less frequently by other backends than PGPROC, so you might
get less cache line bouncing if wait_event_info is in PgBackendStatus
instead of PGPROC.
So how about moving wait_event_info to PgBackendStatus, per attached?
[1] https://www.postgresql.org/message-id/4067.1439561494%40sss.pgh.pa.us
[2]
https://www.postgresql.org/message-id/CA%2BTgmoZ-8ZpoUM9BGtBUP1u4dUQhC-9EpEDLzyK0dG4pKMDUwQ%40mail.gmail.com
- Heikki
From 929cafe2955c375167f3a17fd22afc885a34a45b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Tue, 3 Feb 2026 21:51:40 +0200
Subject: [PATCH 1/1] Move 'wait_event_info' to PgBackendStatus
---
src/backend/storage/lmgr/proc.c | 9 -------
src/backend/utils/activity/backend_status.c | 28 +++++++++++++++++----
src/backend/utils/adt/pgstatfuncs.c | 14 ++++-------
src/backend/utils/adt/waitfuncs.c | 5 ++--
src/include/storage/proc.h | 2 --
src/include/utils/backend_status.h | 10 ++++++++
6 files changed, 40 insertions(+), 28 deletions(-)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fdeed0f3953..cf4fefd952f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -521,9 +521,6 @@ InitProcess(void)
Assert(MyProc->lockGroupLeader == NULL);
Assert(dlist_is_empty(&MyProc->lockGroupMembers));
- /* Initialize wait event information. */
- MyProc->wait_event_info = 0;
-
/* Initialize fields for group transaction status update. */
MyProc->clogGroupMember = false;
MyProc->clogGroupMemberXid = InvalidTransactionId;
@@ -540,9 +537,6 @@ InitProcess(void)
OwnLatch(&MyProc->procLatch);
SwitchToSharedLatch();
- /* now that we have a proc, report wait events to shared memory */
- pgstat_set_wait_event_storage(&MyProc->wait_event_info);
-
/*
* We might be reusing a semaphore that belonged to a failed process. So
* be careful and reinitialize its value here. (This is not strictly
@@ -710,9 +704,6 @@ InitAuxiliaryProcess(void)
OwnLatch(&MyProc->procLatch);
SwitchToSharedLatch();
- /* now that we have a proc, report wait events to shared memory */
- pgstat_set_wait_event_storage(&MyProc->wait_event_info);
-
/* Check that group locking fields are in a proper initial state. */
Assert(MyProc->lockGroupLeader == NULL);
Assert(dlist_is_empty(&MyProc->lockGroupMembers));
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c84e6536580..7c4f2c209c8 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -322,6 +322,7 @@ pgstat_bestart_initial(void)
lbeentry.st_progress_command_target = InvalidOid;
lbeentry.st_query_id = INT64CONST(0);
lbeentry.st_plan_id = INT64CONST(0);
+ lbeentry.st_wait_event_info = 0;
/*
* we don't zero st_progress_param here to save cycles; nobody should
@@ -369,6 +370,9 @@ pgstat_bestart_initial(void)
#endif
PGSTAT_END_WRITE_ACTIVITY(vbeentry);
+
+ /* now that we have a backend entry, report wait events to shared memory */
+ pgstat_set_wait_event_storage(unvolatize(uint32 *, &vbeentry->st_wait_event_info));
}
/* ----------
@@ -585,8 +589,6 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
{
if (beentry->st_state != STATE_DISABLED)
{
- volatile PGPROC *proc = MyProc;
-
/*
* track_activities is disabled, but we last reported a
* non-disabled state. As our final update, change the state and
@@ -597,11 +599,11 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
beentry->st_state_start_timestamp = 0;
beentry->st_activity_raw[0] = '\0';
beentry->st_activity_start_timestamp = 0;
- /* st_xact_start_timestamp and wait_event_info are also disabled */
+ /* st_xact_start_timestamp and st_wait_event_info are also disabled */
beentry->st_xact_start_timestamp = 0;
beentry->st_query_id = INT64CONST(0);
beentry->st_plan_id = INT64CONST(0);
- proc->wait_event_info = 0;
+ beentry->st_wait_event_info = 0;
PGSTAT_END_WRITE_ACTIVITY(beentry);
}
return;
@@ -895,6 +897,13 @@ pgstat_read_current_status(void)
if (localentry->backendStatus.st_procpid > 0)
{
memcpy(&localentry->backendStatus, unvolatize(PgBackendStatus *, beentry), sizeof(PgBackendStatus));
+ /*
+ * st_wait_event_info is not protected by the changecount, so
+ * make sure we read it atomically. (Any half-way decent
+ * memcpy() implementation will read it as one word, but let's
+ * not assume it)
+ */
+ localentry->backendStatus.st_wait_event_info = beentry->st_wait_event_info;
/*
* For each PgBackendStatus field that is a pointer, copy the
@@ -1189,6 +1198,15 @@ pgstat_get_backend_type_by_proc_number(ProcNumber procNumber)
return status->st_backendType;
}
+uint32
+pgstat_get_wait_event_info_by_proc_number(ProcNumber procNumber)
+{
+ volatile PgBackendStatus *status = &BackendStatusArray[procNumber];
+
+ /* st_wait_event_info doesn't use the changecount mechanism */
+ return status->st_wait_event_info;
+}
+
/* ----------
* cmp_lbestatus
*
@@ -1210,7 +1228,7 @@ cmp_lbestatus(const void *a, const void *b)
*
* Support function for the SQL-callable pgstat* functions. Returns
* our local copy of the current-activity entry for one backend,
- * or NULL if the given beid doesn't identify any known session.
+ * or NULL if the given procNumber doesn't identify any known session.
*
* The argument is the ProcNumber of the desired session
* (note that this is unlike pgstat_get_local_beentry_by_index()).
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 73ca0bb0b7f..21dd32acf29 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -32,8 +32,6 @@
#include "utils/builtins.h"
#include "utils/timestamp.h"
-#define UINT32_ACCESS_ONCE(var) ((uint32)(*((volatile uint32 *)&(var))))
-
#define HAS_PGSTAT_PERMISSIONS(role) (has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role))
#define PG_STAT_GET_RELENTRY_INT64(stat) \
@@ -469,7 +467,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
uint32 raw_wait_event;
PGPROC *leader;
- raw_wait_event = UINT32_ACCESS_ONCE(proc->wait_event_info);
+ raw_wait_event = beentry->st_wait_event_info;
wait_event_type = pgstat_get_wait_event_type(raw_wait_event);
wait_event = pgstat_get_wait_event(raw_wait_event);
@@ -817,15 +815,14 @@ pg_stat_get_backend_wait_event_type(PG_FUNCTION_ARGS)
{
int32 procNumber = PG_GETARG_INT32(0);
PgBackendStatus *beentry;
- PGPROC *proc;
const char *wait_event_type = NULL;
if ((beentry = pgstat_get_beentry_by_proc_number(procNumber)) == NULL)
wait_event_type = "<backend information not available>";
else if (!HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
wait_event_type = "<insufficient privilege>";
- else if ((proc = BackendPidGetProc(beentry->st_procpid)) != NULL)
- wait_event_type = pgstat_get_wait_event_type(proc->wait_event_info);
+ else
+ wait_event_type = pgstat_get_wait_event_type(beentry->st_wait_event_info);
if (!wait_event_type)
PG_RETURN_NULL();
@@ -838,15 +835,14 @@ pg_stat_get_backend_wait_event(PG_FUNCTION_ARGS)
{
int32 procNumber = PG_GETARG_INT32(0);
PgBackendStatus *beentry;
- PGPROC *proc;
const char *wait_event = NULL;
if ((beentry = pgstat_get_beentry_by_proc_number(procNumber)) == NULL)
wait_event = "<backend information not available>";
else if (!HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
wait_event = "<insufficient privilege>";
- else if ((proc = BackendPidGetProc(beentry->st_procpid)) != NULL)
- wait_event = pgstat_get_wait_event(proc->wait_event_info);
+ else
+ wait_event = pgstat_get_wait_event(beentry->st_wait_event_info);
if (!wait_event)
PG_RETURN_NULL();
diff --git a/src/backend/utils/adt/waitfuncs.c b/src/backend/utils/adt/waitfuncs.c
index 135e7ba8a7a..862ad8733f8 100644
--- a/src/backend/utils/adt/waitfuncs.c
+++ b/src/backend/utils/adt/waitfuncs.c
@@ -17,11 +17,10 @@
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/array.h"
+#include "utils/backend_status.h"
#include "utils/fmgrprotos.h"
#include "utils/wait_event.h"
-#define UINT32_ACCESS_ONCE(var) ((uint32)(*((volatile uint32 *)&(var))))
-
/*
* pg_isolation_test_session_is_blocked - support function for isolationtester
@@ -56,7 +55,7 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
if (proc == NULL)
PG_RETURN_BOOL(false); /* session gone: definitely unblocked */
wait_event_type =
- pgstat_get_wait_event_type(UINT32_ACCESS_ONCE(proc->wait_event_info));
+ pgstat_get_wait_event_type(pgstat_get_wait_event_info_by_proc_number(GetNumberFromPGProc(proc)));
if (wait_event_type && strcmp("InjectionPoint", wait_event_type) == 0)
PG_RETURN_BOOL(true);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 81f1960a635..99efcbebb64 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -298,8 +298,6 @@ struct PGPROC
*/
TransactionId procArrayGroupMemberXid;
- uint32 wait_event_info; /* proc's wait information */
-
/* Support for group transaction status update. */
bool clogGroupMember; /* true, if member of clog group */
pg_atomic_uint32 clogGroupNext; /* next clog group member */
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 781e48c0c10..616d1047954 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -174,6 +174,15 @@ typedef struct PgBackendStatus
/* plan identifier, optionally computed using planner_hook */
int64 st_plan_id;
+
+ /*
+ * Proc's wait information. This *not* protected by the changecount
+ * mechanism, because reading and writing an uint32 is assumed to atomic.
+ * This is updated very frequently, so we want to keep the overhead as
+ * small as possible.
+ */
+ uint32 st_wait_event_info;
+
} PgBackendStatus;
@@ -332,6 +341,7 @@ extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer,
extern int64 pgstat_get_my_query_id(void);
extern int64 pgstat_get_my_plan_id(void);
extern BackendType pgstat_get_backend_type_by_proc_number(ProcNumber procNumber);
+extern uint32 pgstat_get_wait_event_info_by_proc_number(ProcNumber procNumber);
/* ----------
--
2.47.3