On 05/02/2026 22:43, Sami Imseih wrote:
There is also a discussion [0] about wait event/activity field
inconsistency in pg_stat_activity, with a repro in [1].
The repro I was referring to in [1] is actually
https://www.postgresql.org/message-id/ab1c0a7d-e789-5ef5-1180-42708ac6fe2d%40postgrespro.ru
That is inherent: the wait event is updated in an unsynchronized fashion,
as noted in that thread.
Making it synchronized (via st_changecount) would make the wait event
overhead vastly higher.
Correct, and I don't think we should pay the overhead of ensuring
strict synchronization.
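For illustration only, a strictly synchronized report would have to go
through the existing changecount protocol (the PGSTAT_BEGIN_WRITE_ACTIVITY /
PGSTAT_END_WRITE_ACTIVITY macros in backend_status.h, if I'm reading them
right). The function below is a hypothetical sketch, not something anyone
is proposing; it just shows the per-report cost that bypassing
st_changecount avoids:
```
/*
 * Hypothetical sketch (not part of any patch): what reporting a wait
 * event through the changecount protocol would look like.  Every report
 * would enter a critical section, bump st_changecount twice and issue
 * write barriers, on a path that is hit for every lock wait, I/O wait,
 * client read, etc.
 */
static inline void
pgstat_report_wait_start_synced(volatile PgBackendStatus *beentry,
								uint32 wait_event_info)
{
	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
	beentry->st_wait_event_info = wait_event_info;
	PGSTAT_END_WRITE_ACTIVITY(beentry);
}
```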
The patch that Heikki is proposing [0] will update wait_event_info,
bypassing st_changecount.
```
+++ b/src/include/utils/backend_status.h
@@ -174,6 +174,15 @@ typedef struct PgBackendStatus
/* plan identifier, optionally computed using planner_hook */
int64 st_plan_id;
+
+ /*
+ * Proc's wait information. This is *not* protected by the changecount
+ * mechanism, because reading and writing a uint32 is assumed to be
+ * atomic.
+ * This is updated very frequently, so we want to keep the overhead as
+ * small as possible.
+ */
+ uint32 st_wait_event_info;
+
```
That is the same assumption we are already making:
```
static inline void
pgstat_report_wait_start(uint32 wait_event_info)
{
/*
* Since this is a four-byte field which is always read and written as
* four-bytes, updates are atomic.
*/
*(volatile uint32 *) my_wait_event_info = wait_event_info;
}
```
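For what it's worth, the reason a single unsynchronized read of that field
is still self-consistent is that the event class and the specific event are
packed into the one uint32, so decoding both from the same snapshot cannot
mix, say, an LWLock class with a ClientRead event (this is what point 1
below relies on). A rough sketch of the decoding, with the masks as I
understand them from pgstat_get_wait_event_type() / pgstat_get_wait_event()
in wait_event.c; decode_wait_event is just an illustrative name:
```
/*
 * Illustrative sketch: both the class and the event id live in the same
 * uint32, so one atomic read yields a matching type/event pair.
 */
static void
decode_wait_event(uint32 wait_event_info, uint32 *classId, uint32 *eventId)
{
	*classId = wait_event_info & 0xFF000000;	/* e.g. PG_WAIT_LWLOCK */
	*eventId = wait_event_info & 0x0000FFFF;	/* event within that class */
}
```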
So without st_changecount, wait events could still be out of sync with
other backend status fields (e.g. state), but at least my testing shows
better results with [0] applied. Better results here means no samples with
"state = active, wait_event = ClientRead and backend_type = client backend".
Right, there's some inherent fuzziness if st_changecount is not updated.
My patch nevertheless improves things:
1. If you call pg_stat_get_backend_wait_event_type(1234), and then
pg_stat_get_backend_wait_event(1234), the results will be consistent.
Currently, it's possible to get a nonsensical combination like "LWLock"
and "ClientRead".
2. Because all other status updates bump st_changecount, and the
function that *reads* the fields checks st_changecount, you do actually
get some consistency between wait_event_info and the other fields (a
sketch of that reader loop is below). I haven't fully thought through
what the guarantee is, but I think it's now impossible to get a
combination like "state=active, wait_event=ClientRead", as you saw in
your testing. This consistency breaks down if we add more fields that
don't update st_changecount, i.e. you might get inconsistent reads
between such fields, but it's still an improvement.
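To spell out the reader side I'm referring to, here is a sketch of the
retry loop, assuming the pgstat_begin_read_activity /
pgstat_end_read_activity / pgstat_read_activity_complete macros in
backend_status.h; read_backend_status_snapshot is just an illustrative
name, not a real function:
```
/*
 * Sketch of the reader side.  The loop rereads the entry until the
 * changecount was stable and even, so all fields written under the
 * changecount protocol come from one consistent snapshot.  Reading
 * st_wait_event_info inside the same loop at least ties it to that
 * window, even though the field itself is not covered by the protocol.
 */
static void
read_backend_status_snapshot(volatile PgBackendStatus *beentry,
							 PgBackendStatus *localentry,
							 uint32 *wait_event_info)
{
	for (;;)
	{
		int			before_changecount;
		int			after_changecount;

		pgstat_begin_read_activity(beentry, before_changecount);

		memcpy(localentry, (const PgBackendStatus *) beentry,
			   sizeof(PgBackendStatus));
		*wait_event_info = beentry->st_wait_event_info;

		pgstat_end_read_activity(beentry, after_changecount);

		if (pgstat_read_activity_complete(before_changecount,
										  after_changecount))
			break;

		/* a write was in progress or intervened; retry */
		CHECK_FOR_INTERRUPTS();
	}
}
```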
I'm not sure how much we want to document and guarantee these things.
I'd love to keep some flexibility to relax them in the future for
performance reasons. But I think this patch is an improvement, and
doesn't promise too much.
Using the attached repro.sh:
Nice, thanks for the testing!
- Heikki