On Fri, Jun 14, 2024 at 02:31:37PM +0900, Michael Paquier wrote:
> I don't think that this is going to fly far except if we introduce a
> concept of "generation" or "age" in the stats entries.  The idea is
> simple: when a stats entry is reinitialized because of a drop&create,
> increment a counter to tell that this is a new generation, and keep
> track of it in *both* PgStat_EntryRef (local backend reference to the
> shmem stats entry) *and* PgStatShared_HashEntry (the central one).
> When releasing an entry, if we know that the shared entry we are
> pointing at is not of the same generation as the local reference, it
> means that the entry has been reused for something else with the same
> hash key, so give up.  It should not be that invasive, still it means
> ABI breakage in the two pgstats internal structures I am mentioning,
> which is OK for a backpatch as this stuff is internal.  On top of
> that, this problem means that we can silently and randomly lose stats,
> which is not cool.
> 
> I'll try to give it a go on Monday.

Here you go, the patch introduces what I've named an "age" counter
attached to the shared entry references, and copied over to the local
references.  The countner is initialized at 0 and incremented each
time an entry is reused, then when attempting to drop an entry we
cross-check the version hold locally with the shared one.

While looking at the whole, this is close to a concept patch sent
previously, where a counter is used in the shared entry with a
cross-check done with the local reference, that was posted here
(noticed that today):
https://www.postgresql.org/message-id/20230603.063418.767871221863527769.horikyota....@gmail.com

The logic is different though, as we don't need to care about the
contents of the local cache when cross-checking the "age" count when
retrieving the contents, just the case where a backend would attempt
to drop an entry it thinks is OK to operate on, that got reused
because of the effect of other backends doing creates and drops with
the same hash key.

This idea needs more eyes, so I am adding that to the next CF for now.
I've played with it for a few hours and concurrent replication slot
drops/creates, without breaking it.  I have not implemented an
isolation test for this case, as it depends on where we are going with
their integration with injection points.
--
Michael
From 3d9a8b6913c0ce61d800d3b31f392230580e99fc Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Mon, 17 Jun 2024 13:22:33 +0900
Subject: [PATCH] Introduce "agecount" for shared pgstats

This fixes a set of race conditions with cumulative statistics where a
shared stats entry could be dropped while it should still be valid
when it gets reused.  This can happen for the following case, causing
the same hash key to be used:
- replication slots that compute internally an index number.
- other object types, with a wraparound involved.

The age counter is tracked in the shared entries, initialized at 0 when
an entry is created and incremented when the same entry is reused, to
avoid concurrent turbulences on drop because of other backends still
holding a reference to it.
---
 src/include/utils/pgstat_internal.h       | 14 ++++++++
 src/backend/utils/activity/pgstat_shmem.c | 40 ++++++++++++++++++++---
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index dbbca31602..2ea29b7638 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -93,6 +93,14 @@ typedef struct PgStatShared_HashEntry
 	 */
 	pg_atomic_uint32 refcount;
 
+	/*
+	 * Counter tracking the number of times the entry has been reused.
+	 *
+	 * Set to 0 when the entry is created, and incremented by one each time
+	 * the shared entry is reinitialized with pgstat_reinit_entry().
+	 */
+	pg_atomic_uint32 agecount;
+
 	/*
 	 * Pointer to shared stats. The stats entry always starts with
 	 * PgStatShared_Common, embedded in a larger struct containing the
@@ -132,6 +140,12 @@ typedef struct PgStat_EntryRef
 	 */
 	PgStatShared_Common *shared_stats;
 
+	/*
+	 * Copy of PgStatShared_HashEntry->agecount, keeping locally track of the
+	 * shared stats entry "age" retrieved (number of times reused).
+	 */
+	uint32		agecount;
+
 	/*
 	 * Pending statistics data that will need to be flushed to shared memory
 	 * stats eventually. Each stats kind utilizing pending data defines what
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index fb79c5e771..a9de1985a4 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -270,6 +270,11 @@ pgstat_init_entry(PgStat_Kind kind,
 	 * further if a longer lived reference is needed.
 	 */
 	pg_atomic_init_u32(&shhashent->refcount, 1);
+
+	/*
+	 * Initialize agecount to 0, as freshly initialized.
+	 */
+	pg_atomic_init_u32(&shhashent->agecount, 0);
 	shhashent->dropped = false;
 
 	chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
@@ -293,6 +298,12 @@ pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent)
 
 	/* mark as not dropped anymore */
 	pg_atomic_fetch_add_u32(&shhashent->refcount, 1);
+
+	/*
+	 * Increment agecount, to let any backend with local references know that
+	 * what they point to is outdated.
+	 */
+	pg_atomic_fetch_add_u32(&shhashent->agecount, 1);
 	shhashent->dropped = false;
 
 	/* reinitialize content */
@@ -333,6 +344,7 @@ pgstat_acquire_entry_ref(PgStat_EntryRef *entry_ref,
 
 	entry_ref->shared_stats = shheader;
 	entry_ref->shared_entry = shhashent;
+	entry_ref->agecount = pg_atomic_read_u32(&shhashent->agecount);
 }
 
 /*
@@ -491,7 +503,8 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, bool create,
 			 * case are replication slot stats, where a new slot can be
 			 * created with the same index just after dropping. But oid
 			 * wraparound can lead to other cases as well. We just reset the
-			 * stats to their plain state.
+			 * stats to their plain state, while incrementing its "age" in the
+			 * shared entry for any remaining local references.
 			 */
 			shheader = pgstat_reinit_entry(kind, shhashent);
 			pgstat_acquire_entry_ref(entry_ref, shhashent, shheader);
@@ -558,10 +571,27 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref,
 			if (!shent)
 				elog(ERROR, "could not find just referenced shared stats entry");
 
-			Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
-			Assert(entry_ref->shared_entry == shent);
-
-			pgstat_free_entry(shent, NULL);
+			/*
+			 * This entry may have been reinitialized while trying to release
+			 * it, so double-check that it has not been reused while holding a
+			 * lock on its shared entry.
+			 */
+			if (pg_atomic_read_u32(&entry_ref->shared_entry->agecount) ==
+				entry_ref->agecount)
+			{
+				/* Same generation, so we're OK with the removal */
+				Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
+				Assert(entry_ref->shared_entry == shent);
+				pgstat_free_entry(shent, NULL);
+			}
+			else
+			{
+				/*
+				 * Shared stats entry has been reinitialized, so do not drop
+				 * its shared entry, only release its lock.
+				 */
+				dshash_release_lock(pgStatLocal.shared_hash, shent);
+			}
 		}
 	}
 
-- 
2.45.1

Attachment: signature.asc
Description: PGP signature

Reply via email to