Hi,

SMgrRelationData objects don't currently have a defined lifetime, so
it's hard to know when the result of smgropen() might become a
dangling pointer.  This has caused a few bugs in the past, and the
usual fix is to just call smgropen() more often and not hold onto
pointers.  If you're doing that frequently enough, the hash table
lookups can show up in profiles.  I'm interested in this topic for
more than just micro-optimisations, though: in order to be able to
batch/merge smgr operations, I'd like to be able to collect them in
data structures that survive more than just a few lines of code.
(Examples to follow in later emails).

The simplest idea seems to be to tie object lifetime to transactions
using the existing AtEOXact_SMgr() mechanism.  In recovery, the
obvious corresponding time would be the commit/abort record that
destroys the storage.

This could be achieved by extending smgrrelease().  That was a
solution to the same problem in a narrower context: we didn't want
CFIs to randomly free SMgrRelations, but we needed to be able to
force-close fds in other backends, to fix various edge cases.

The new idea is to overload smgrrelease(it) so that it also clears the
owner, which means that AtEOXact_SMgr() will eventually smgrclose(it),
unless it is re-owned by a relation before then.  That choice stems
from the complete lack of information available via sinval in the case
of an overflow.  We must (1) close all descriptors because any file
might have been unlinked, (2) keep all pointers valid and yet (3) not
leak dropped smgr objects forever.  In this patch, smgrreleaseall()
achieves those goals.

Proof-of-concept patch attached.  Are there holes in this scheme?
Better ideas welcome.  In terms of spelling, another option would be
to change the behaviour of smgrclose() to work as described, ie it
would call smgrrelease() and then also disown, so we don't have to
change most of those callers, and then add a new function
smgrdestroy() for the few places that truly need it.  Or something
like that.

Other completely different ideas I've bounced around with various
hackers and decided against: references counts, "holder" objects that
can be an "owner" (like Relation, but when you don't have a Relation)
but can re-open on demand.  Seemed needlessly complicated.

While studying this I noticed a minor thinko in smgrrelease() in
15+16, so here's a fix for that also.  I haven't figured out a
sequence that makes anything bad happen, but we should really
invalidate smgr_targblock when a relfilenode is reused, since it might
point past the end.  This becomes more obvious once smgrrelease() is
used for truncation, as proposed here.
From 844e56e9ea9a56e04813886a6f7ded19a3af7f90 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 14 Aug 2023 11:01:28 +1200
Subject: [PATCH 1/2] Invalidate smgr_targblock on release.

In rare circumstances involving relfilenode reuse, it may be possible
for smgr_targblock to finish up pointing past the end.

Oversight in b74e94dc.  Back-patch to 15.
---
 src/backend/storage/smgr/smgr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index f76c4605db..65e7436306 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -295,6 +295,7 @@ smgrrelease(SMgrRelation reln)
 	{
 		smgrsw[reln->smgr_which].smgr_close(reln, forknum);
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
+		reln->smgr_targblock = InvalidBlockNumber;
 	}
 }
 
-- 
2.39.2

From b18fea5f263c46f87b84e242ff228cf1ab97b3e8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 10 Aug 2023 18:16:31 +1200
Subject: [PATCH 2/2] Give SMgrRelation pointers a well-defined lifetime.

After calling smgropen(), it was not clear how long you could continue
to use the result, because various code paths including cache
invalidation could call smgrclose().

Guarantee that the object won't be freed until the end of the current
transaction, or in recovery, the commit/abort record that destroys the
storage.

This is achieved by making most places that previously called
smgrclose() use smgrrelease() instead, and giving smgrrelease() the
additional task of 'disowning' the relation so that it is closed at
end-of-transaction.  That allows all pointers to remain valid, even in
the case of sinval overflow where we have no information about which
relations have been dropped or truncated.
---
 src/backend/access/heap/heapam_handler.c |  6 +++---
 src/backend/access/heap/visibilitymap.c  |  2 +-
 src/backend/access/transam/xlogutils.c   |  6 +++---
 src/backend/catalog/storage.c            |  2 +-
 src/backend/commands/cluster.c           |  4 ++--
 src/backend/commands/dbcommands.c        |  2 +-
 src/backend/commands/sequence.c          |  2 +-
 src/backend/commands/tablecmds.c         |  4 ++--
 src/backend/storage/buffer/bufmgr.c      |  4 ++--
 src/backend/storage/smgr/smgr.c          | 21 ++++++++++++++------
 src/backend/utils/cache/inval.c          |  2 +-
 src/backend/utils/cache/relcache.c       | 25 ++++++++++++------------
 src/include/storage/smgr.h               |  2 +-
 src/include/utils/rel.h                  | 16 +++++----------
 src/include/utils/relcache.h             |  2 +-
 15 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 5a17112c91..83673da2fb 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -623,7 +623,7 @@ heapam_relation_set_new_filelocator(Relation rel,
 		smgrimmedsync(srel, INIT_FORKNUM);
 	}
 
-	smgrclose(srel);
+	smgrrelease(srel);
 }
 
 static void
@@ -682,9 +682,9 @@ heapam_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
 	}
 
 
-	/* drop old relation, and close new one */
+	/* drop old relation, and release new one */
 	RelationDropStorage(rel);
-	smgrclose(dstrel);
+	smgrrelease(dstrel);
 }
 
 static void
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7d54ec9c0f..06b16bf2f1 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -635,7 +635,7 @@ vm_extend(Relation rel, BlockNumber vm_nblocks)
 							  RBM_ZERO_ON_ERROR);
 
 	/*
-	 * Send a shared-inval message to force other backends to close any smgr
+	 * Send a shared-inval message to force other backends to release any smgr
 	 * references they may have for this rel, which we are about to change.
 	 * This is a useful optimization because it means that backends don't have
 	 * to keep checking for creation or extension of the file, which happens
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index e174a2a891..1bbd65376e 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -654,12 +654,12 @@ void
 XLogDropDatabase(Oid dbid)
 {
 	/*
-	 * This is unnecessarily heavy-handed, as it will close SMgrRelation
+	 * This is unnecessarily heavy-handed, as it will release SMgrRelation
 	 * objects for other databases as well. DROP DATABASE occurs seldom enough
 	 * that it's not worth introducing a variant of smgrclose for just this
-	 * purpose. XXX: Or should we rather leave the smgr entries dangling?
+	 * purpose.
 	 */
-	smgrcloseall();
+	smgrreleaseall();
 
 	forget_invalid_pages_db(dbid);
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 2add053489..7d2697455f 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -226,7 +226,7 @@ RelationDropStorage(Relation rel)
 	 * for now I'll keep the logic simple.
 	 */
 
-	RelationCloseSmgr(rel);
+	RelationReleaseSmgr(rel);
 }
 
 /*
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index a3bef6ac34..6bf0431c62 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1439,8 +1439,8 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 	 * itself, the smgr close on pg_class must happen after all accesses in
 	 * this function.
 	 */
-	RelationCloseSmgrByOid(r1);
-	RelationCloseSmgrByOid(r2);
+	RelationReleaseSmgrByOid(r1);
+	RelationReleaseSmgrByOid(r2);
 }
 
 /*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 307729ab7e..911ef2aa19 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -278,7 +278,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 
 	smgr = smgropen(rlocator, InvalidBackendId);
 	nblocks = smgrnblocks(smgr, MAIN_FORKNUM);
-	smgrclose(smgr);
+	smgrrelease(smgr);
 
 	/* Use a buffer access strategy since this is a bulk read operation. */
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index fc4f77e787..64fd1564ae 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -360,7 +360,7 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
 		log_smgrcreate(&rel->rd_locator, INIT_FORKNUM);
 		fill_seq_fork_with_data(rel, tuple, INIT_FORKNUM);
 		FlushRelationBuffers(rel);
-		smgrclose(srel);
+		smgrrelease(srel);
 	}
 }
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 727f151750..1eee2e743c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14825,9 +14825,9 @@ index_copy_data(Relation rel, RelFileLocator newrlocator)
 		}
 	}
 
-	/* drop old relation, and close new one */
+	/* drop old relation, and release new one */
 	RelationDropStorage(rel);
-	smgrclose(dstrel);
+	smgrrelease(dstrel);
 }
 
 /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index df22aaa1c5..79ffa25cb1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4392,10 +4392,10 @@ CreateAndCopyRelationData(RelFileLocator src_rlocator,
 	rlocator.backend = InvalidBackendId;
 
 	rlocator.locator = src_rlocator;
-	smgrcloserellocator(rlocator);
+	smgrreleaserellocator(rlocator);
 
 	rlocator.locator = dst_rlocator;
-	smgrcloserellocator(rlocator);
+	smgrreleaserellocator(rlocator);
 }
 
 /* ---------------------------------------------------------------------
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 65e7436306..c3728b713f 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -286,7 +286,9 @@ smgrclose(SMgrRelation reln)
 /*
  * smgrrelease() -- Release all resources used by this object.
  *
- * The object remains valid.
+ * The object remains valid, but is moved to the unknown list where it will
+ * be closed by AtEOXact_SMgr().  It may be re-owned if it is accessed by
+ * a relation before then.
  */
 void
 smgrrelease(SMgrRelation reln)
@@ -296,6 +298,13 @@ smgrrelease(SMgrRelation reln)
 		smgrsw[reln->smgr_which].smgr_close(reln, forknum);
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
 		reln->smgr_targblock = InvalidBlockNumber;
+
+		if (reln->smgr_owner)
+		{
+			*reln->smgr_owner = NULL;
+			reln->smgr_owner = NULL;
+			dlist_push_tail(&unowned_relns, &reln->node);
+		}
 	}
 }
 
@@ -340,15 +349,15 @@ smgrcloseall(void)
 }
 
 /*
- * smgrcloserellocator() -- Close SMgrRelation object for given RelFileLocator,
+ * smgrreleaserellocator() -- Close SMgrRelation object for given RelFileLocator,
  *							if one exists.
  *
- * This has the same effects as smgrclose(smgropen(rlocator)), but it avoids
+ * This has the same effects as smgrrelease(smgropen(rlocator)), but it avoids
  * uselessly creating a hashtable entry only to drop it again when no
  * such entry exists already.
  */
 void
-smgrcloserellocator(RelFileLocatorBackend rlocator)
+smgrreleaserellocator(RelFileLocatorBackend rlocator)
 {
 	SMgrRelation reln;
 
@@ -360,7 +369,7 @@ smgrcloserellocator(RelFileLocatorBackend rlocator)
 									  &rlocator,
 									  HASH_FIND, NULL);
 	if (reln != NULL)
-		smgrclose(reln);
+		smgrrelease(reln);
 }
 
 /*
@@ -664,7 +673,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nb
 	DropRelationBuffers(reln, forknum, nforks, nblocks);
 
 	/*
-	 * Send a shared-inval message to force other backends to close any smgr
+	 * Send a shared-inval message to force other backends to release any smgr
 	 * references they may have for this rel.  This is useful because they
 	 * might have open file pointers to segments that got removed, and/or
 	 * smgr_targblock variables pointing past the new rel end.  (The inval
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 0008826f67..b755d2fe9e 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -665,7 +665,7 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
 
 		rlocator.locator = msg->sm.rlocator;
 		rlocator.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
-		smgrcloserellocator(rlocator);
+		smgrreleaserellocator(rlocator);
 	}
 	else if (msg->id == SHAREDINVALRELMAP_ID)
 	{
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 8e08ca1c68..68f001dc80 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2226,8 +2226,8 @@ RelationReloadIndexInfo(Relation relation)
 		   !relation->rd_isvalid &&
 		   relation->rd_droppedSubid == InvalidSubTransactionId);
 
-	/* Ensure it's closed at smgr level */
-	RelationCloseSmgr(relation);
+	/* Ensure it's released at smgr level */
+	RelationReleaseSmgr(relation);
 
 	/* Must free any AM cached data upon relcache flush */
 	if (relation->rd_amcache)
@@ -2409,7 +2409,7 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
 	 * weren't closed already.  (This was probably done by caller, but let's
 	 * just be real sure.)
 	 */
-	RelationCloseSmgr(relation);
+	RelationReleaseSmgr(relation);
 
 	/* break mutual link with stats entry */
 	pgstat_unlink_relation(relation);
@@ -2512,7 +2512,7 @@ RelationClearRelation(Relation relation, bool rebuild)
 	 * that the low-level file access state is updated after, say, a vacuum
 	 * truncation.
 	 */
-	RelationCloseSmgr(relation);
+	RelationReleaseSmgr(relation);
 
 	/* Free AM cached data, if any */
 	if (relation->rd_amcache)
@@ -2953,8 +2953,9 @@ RelationCacheInvalidate(bool debug_discard)
 	{
 		relation = idhentry->reldesc;
 
-		/* Must close all smgr references to avoid leaving dangling ptrs */
-		RelationCloseSmgr(relation);
+		/* Disconnect smgr references to avoid leaving dangling ptrs */
+		if (relation->rd_smgr)
+			smgrclearowner(&relation->rd_smgr, relation->rd_smgr);
 
 		/*
 		 * Ignore new relations; no other backend will manipulate them before
@@ -3007,11 +3008,11 @@ RelationCacheInvalidate(bool debug_discard)
 	}
 
 	/*
-	 * Now zap any remaining smgr cache entries.  This must happen before we
+	 * Now release remaining smgr cache entries.  This must happen before we
 	 * start to rebuild entries, since that may involve catalog fetches which
 	 * will re-open catalog files.
 	 */
-	smgrcloseall();
+	smgrreleaseall();
 
 	/* Phase 2: rebuild the items found to need rebuild in phase 1 */
 	foreach(l, rebuildFirstList)
@@ -3034,13 +3035,13 @@ RelationCacheInvalidate(bool debug_discard)
 }
 
 /*
- * RelationCloseSmgrByOid - close a relcache entry's smgr link
+ * RelationReleaseSmgrByOid - close a relcache entry's smgr link
  *
  * Needed in some cases where we are changing a relation's physical mapping.
  * The link will be automatically reopened on next use.
  */
 void
-RelationCloseSmgrByOid(Oid relationId)
+RelationReleaseSmgrByOid(Oid relationId)
 {
 	Relation	relation;
 
@@ -3049,7 +3050,7 @@ RelationCloseSmgrByOid(Oid relationId)
 	if (!PointerIsValid(relation))
 		return;					/* not in cache, nothing to do */
 
-	RelationCloseSmgr(relation);
+	RelationReleaseSmgr(relation);
 }
 
 static void
@@ -3816,7 +3817,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		SMgrRelation srel;
 
 		srel = RelationCreateStorage(newrlocator, persistence, true);
-		smgrclose(srel);
+		smgrrelease(srel);
 	}
 	else
 	{
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index a9a179aaba..9f4ced6721 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -84,9 +84,9 @@ extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclearowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
-extern void smgrcloserellocator(RelFileLocatorBackend rlocator);
 extern void smgrrelease(SMgrRelation reln);
 extern void smgrreleaseall(void);
+extern void smgrreleaserellocator(RelFileLocatorBackend rlocator);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdosyncall(SMgrRelation *rels, int nrels);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 1426a353cd..210c5aa5b3 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -561,12 +561,6 @@ typedef struct ViewOptions
  *
  * Very little code is authorized to touch rel->rd_smgr directly.  Instead
  * use this function to fetch its value.
- *
- * Note: since a relcache flush can cause the file handle to be closed again,
- * it's unwise to hold onto the pointer returned by this function for any
- * long period.  Recommended practice is to just re-execute RelationGetSmgr
- * each time you need to access the SMgrRelation.  It's quite cheap in
- * comparison to whatever an smgr function is going to do.
  */
 static inline SMgrRelation
 RelationGetSmgr(Relation rel)
@@ -577,16 +571,16 @@ RelationGetSmgr(Relation rel)
 }
 
 /*
- * RelationCloseSmgr
- *		Close the relation at the smgr level, if not already done.
+ * RelationReleaseSmgr
+ *		Release the relation at the smgr level, if not already done.
  */
 static inline void
-RelationCloseSmgr(Relation relation)
+RelationReleaseSmgr(Relation relation)
 {
 	if (relation->rd_smgr != NULL)
-		smgrclose(relation->rd_smgr);
+		smgrrelease(relation->rd_smgr);
 
-	/* smgrclose should unhook from owner pointer */
+	/* smgrrelease should unhook from owner pointer */
 	Assert(relation->rd_smgr == NULL);
 }
 #endif							/* !FRONTEND */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 38524641f4..c02ebab649 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -130,7 +130,7 @@ extern void RelationCacheInvalidateEntry(Oid relationId);
 
 extern void RelationCacheInvalidate(bool debug_discard);
 
-extern void RelationCloseSmgrByOid(Oid relationId);
+extern void RelationReleaseSmgrByOid(Oid relationId);
 
 #ifdef USE_ASSERT_CHECKING
 extern void AssertPendingSyncs_RelationCache(void);
-- 
2.39.2

Reply via email to