Re: Recovering from detoast-related catcache invalidations

2024-01-14 Thread Xiaoran Wang
This is an interesting idea.
Although some catalog tables, such as pg_depend, are not covered by
catcaches, when scanning them, if there is any SharedInvalidationMessage,
the CatalogSnapshot will be invalidated and recreated (see
"RelationInvalidatesSnapshotsOnly" in syscache.c).
Maybe during the systable scan, the backend receives
SharedInvalidationMessages and returns tuples that are out of date;
systable_recheck_tuple is used in dependency.c for such cases.



Tom Lane wrote on Sunday, January 14, 2024 at 03:12:

> I wrote:
> > Xiaoran Wang  writes:
> >> Hmm, how about first checking whether any shared invalidation messages
> >> have been accepted, then rechecking the tuple's visibility?
> >> If no shared invalidation message was accepted during
> >> 'toast_flatten_tuple', there is no need to do the visibility check,
> >> which can save several CPU cycles.
>
> > Meh, I'd just as soon not add the additional dependency/risk of bugs.
> > This is an expensive and seldom-taken code path, so I don't think
> > shaving a few cycles is really important.
>
> It occurred to me that this idea might be more interesting if we
> could encapsulate it right into systable_recheck_tuple: something
> like having systable_beginscan capture the current
> SharedInvalidMessageCounter and save it in the SysScanDesc struct,
> then compare in systable_recheck_tuple to possibly short-circuit
> that work.  This'd eliminate one of the main bug hazards in the
> idea, namely that you might capture SharedInvalidMessageCounter too
> late, after something's already happened.  However, the whole idea
> only works for catalogs that have catcaches, and the other users of
> systable_recheck_tuple are interested in pg_depend which doesn't.
> So that put a damper on my enthusiasm for the idea.
>
> regards, tom lane
>


Re: Recovering from detoast-related catcache invalidations

2024-01-14 Thread Noah Misch
On Fri, Jan 12, 2024 at 03:47:13PM -0500, Tom Lane wrote:
> I wrote:
> > This is uncomfortably much in bed with the tuple table slot code,
> > perhaps, but I don't see a way to do it more cleanly unless we want
> > to add some new provisions to that API.  Andres, do you have any
> > thoughts about that?
> 
> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose.  So v4 attached.

systable_recheck_tuple() is blind to heap_inplace_update(), so it's not a
general proxy for invalidation messages.  The commit for $SUBJECT (ad98fb1)
doesn't create any new malfunctions, but I expect the systable_recheck_tuple()
part will change again before the heap_inplace_update() story is over
(https://postgr.es/m/flat/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bg2mgqecerqr5gg...@mail.gmail.com).




Re: Recovering from detoast-related catcache invalidations

2024-01-13 Thread Tom Lane
I wrote:
> Xiaoran Wang  writes:
>> Hmm, how about first checking whether any shared invalidation messages
>> have been accepted, then rechecking the tuple's visibility?
>> If no shared invalidation message was accepted during
>> 'toast_flatten_tuple', there is no need to do the visibility check,
>> which can save several CPU cycles.

> Meh, I'd just as soon not add the additional dependency/risk of bugs.
> This is an expensive and seldom-taken code path, so I don't think
> shaving a few cycles is really important.

It occurred to me that this idea might be more interesting if we
could encapsulate it right into systable_recheck_tuple: something
like having systable_beginscan capture the current
SharedInvalidMessageCounter and save it in the SysScanDesc struct,
then compare in systable_recheck_tuple to possibly short-circuit
that work.  This'd eliminate one of the main bug hazards in the
idea, namely that you might capture SharedInvalidMessageCounter too
late, after something's already happened.  However, the whole idea
only works for catalogs that have catcaches, and the other users of
systable_recheck_tuple are interested in pg_depend which doesn't.
So that put a damper on my enthusiasm for the idea.
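For concreteness, the encapsulation described above (capture the counter at scan start, compare at recheck time to short-circuit) can be modeled as a standalone toy. All names below (ToyScan, toy_beginscan, toy_recheck_tuple, the plain counter) are invented stand-ins; in PostgreSQL the captured value would live in SysScanDesc and the counter is SharedInvalidMessageCounter:

```c
#include <stdbool.h>

/* Toy model: capture the invalidation counter when the scan begins, and
 * let the recheck short-circuit when no invalidation arrived meanwhile.
 * All names here are hypothetical stand-ins for PostgreSQL internals. */

static unsigned long shared_inval_counter = 0;  /* stand-in for SharedInvalidMessageCounter */

typedef struct { bool deleted; } ToyTuple;
typedef struct { unsigned long inval_count_at_start; } ToyScan;

static void toy_beginscan(ToyScan *scan)
{
    scan->inval_count_at_start = shared_inval_counter;
}

/* Returns true if the fetched tuple can still be trusted. */
static bool toy_recheck_tuple(const ToyScan *scan, const ToyTuple *tup)
{
    if (scan->inval_count_at_start == shared_inval_counter)
        return true;            /* no messages arrived: skip the real recheck */
    return !tup->deleted;       /* otherwise, re-verify visibility */
}

/* Demo: an invalidation (and deletion) arrives mid-scan and is caught. */
static int run_recheck_demo(void)
{
    ToyScan  scan;
    ToyTuple tup = {false};

    toy_beginscan(&scan);
    shared_inval_counter++;     /* e.g. AcceptInvalidationMessages ran */
    tup.deleted = true;         /* the tuple was concurrently deleted */
    return toy_recheck_tuple(&scan, &tup) ? 0 : 1;  /* 1 = staleness caught */
}
```

The point of the short-circuit is that the (possibly expensive) visibility recheck runs only when the counter actually advanced during the scan.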

regards, tom lane




Re: Recovering from detoast-related catcache invalidations

2024-01-13 Thread Tom Lane
Xiaoran Wang  writes:
> Hmm, how about first checking whether any shared invalidation messages
> have been accepted, then rechecking the tuple's visibility?
> If no shared invalidation message was accepted during
> 'toast_flatten_tuple', there is no need to do the visibility check,
> which can save several CPU cycles.

Meh, I'd just as soon not add the additional dependency/risk of bugs.
This is an expensive and seldom-taken code path, so I don't think
shaving a few cycles is really important.

regards, tom lane




Re: Recovering from detoast-related catcache invalidations

2024-01-13 Thread Xiaoran Wang
Hmm, how about first checking whether any shared invalidation messages have
been accepted, then rechecking the tuple's visibility?
If no shared invalidation message was accepted during 'toast_flatten_tuple',
there is no need to do the visibility check, which can save several CPU
cycles.


    if (inval_count != SharedInvalidMessageCounter &&
        !systable_recheck_tuple(scandesc, ntp))
    {
        heap_freetuple(dtp);
        return NULL;
    }



Xiaoran Wang wrote on Saturday, January 13, 2024 at 13:16:

> Great! That's exactly what we need.
>
> The patch LGTM, +1
>
>
Tom Lane wrote on Saturday, January 13, 2024 at 04:47:
>
>> I wrote:
>> > This is uncomfortably much in bed with the tuple table slot code,
>> > perhaps, but I don't see a way to do it more cleanly unless we want
>> > to add some new provisions to that API.  Andres, do you have any
>> > thoughts about that?
>>
>> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
>> which is meant for exactly this purpose.  So v4 attached.
>>
>> regards, tom lane
>>
>>


Re: Recovering from detoast-related catcache invalidations

2024-01-12 Thread Xiaoran Wang
Great! That's exactly what we need.

The patch LGTM, +1


Tom Lane wrote on Saturday, January 13, 2024 at 04:47:

> I wrote:
> > This is uncomfortably much in bed with the tuple table slot code,
> > perhaps, but I don't see a way to do it more cleanly unless we want
> > to add some new provisions to that API.  Andres, do you have any
> > thoughts about that?
>
> Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
> which is meant for exactly this purpose.  So v4 attached.
>
> regards, tom lane
>
>


Re: Recovering from detoast-related catcache invalidations

2024-01-12 Thread Tom Lane
I wrote:
> This is uncomfortably much in bed with the tuple table slot code,
> perhaps, but I don't see a way to do it more cleanly unless we want
> to add some new provisions to that API.  Andres, do you have any
> thoughts about that?

Oh!  After nosing around a bit more I remembered systable_recheck_tuple,
which is meant for exactly this purpose.  So v4 attached.

regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index c0591cffe5..0dcd45d2f0 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
+#include "common/pg_prng.h"
 #include "miscadmin.h"
 #include "port/pg_bitutils.h"
 #ifdef CATCACHE_STATS
@@ -90,10 +91,10 @@ static void CatCachePrintStats(int code, Datum arg);
 static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct);
 static void CatCacheRemoveCList(CatCache *cache, CatCList *cl);
 static void CatalogCacheInitializeCache(CatCache *cache);
-static CatCTup *CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp,
+static CatCTup *CatalogCacheCreateEntry(CatCache *cache,
+		HeapTuple ntp, SysScanDesc scandesc,
 		Datum *arguments,
-		uint32 hashValue, Index hashIndex,
-		bool negative);
+		uint32 hashValue, Index hashIndex);
 
 static void ReleaseCatCacheWithOwner(HeapTuple tuple, ResourceOwner resowner);
 static void ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner);
@@ -1372,6 +1373,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1380,16 +1382,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1404,9 +1396,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
   cache->cc_indexoid,
   IndexScanOK(cache, cur_skey),
@@ -1415,12 +1426,18 @@ SearchCatCacheMiss(CatCache *cache,
   cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
-		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
-	 hashValue, hashIndex,
-	 false);
+		ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
+	 hashValue, hashIndex);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlarge(CurrentResourceOwner);
 		ct->refcount++;
@@ -1429,6 +1446,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1447,9 +1465,11 @@ SearchCatCacheMiss(CatCache *cache,
 		if (IsBootstrapProcessingMode())
 			return NULL;
 
-		ct = CatalogCacheCreateEntry(cache, NULL, arguments,
-	 hashValue, hashIndex,
-	 true);
+		ct = CatalogCacheCreateEntry(cache, NULL, NULL, arguments,
+	 hashValue, hashIndex);
+
+		/* Creating a negative cache entry shouldn't fail */
+		Assert(ct != NULL);
 
 		CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
    cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
@@ -1663,7 +1683,8 @@ SearchCatCacheList(CatCache *cache,
 	 * We have to bump the member refcounts temporarily to ensure they won't
 	 * get dropped from the 

Re: Recovering from detoast-related catcache invalidations

2024-01-12 Thread Tom Lane
Xiaoran Wang  writes:
>> Maybe, but that undocumented hack in SetHintBits seems completely
>> unacceptable.  Isn't there a cleaner way to make this check?

> Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check
> whether the tuple has been deleted.
> Since the tuple's xmin must have been committed, we just need to check
> whether its xmax is committed,

I'm not super thrilled with that.  Something I realized last night is
that your proposal only works if "ntp" is pointing directly into the
catalog's disk buffers.  If something earlier than this code had made
a local-memory copy of the catalog tuple, then it's possible that its
header fields (particularly xmax) are out of date compared to shared
buffers and would fail to tell us that some other process just
invalidated the tuple.  Now in fact, with the current implementation
of syscache_getnext() the result is actually a live tuple and so we
can expect to see any relevant updates.  But I think we'd better add
some Asserts that that's so; and that also provides us with a way to
call HeapTupleSatisfiesVisibility fully legally, because we can get
the buffer reference out of the scan descriptor too.
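The hazard described above can be illustrated with a trivial standalone model (the struct and field names are invented): a local-memory copy made earlier cannot reflect updates applied afterwards to the shared original.

```c
/* Toy illustration: header fields read from a local copy of a tuple do
 * not see a later update (e.g. xmax being set) on the shared original. */
typedef struct { unsigned int xmax; } ToyHeader;

static unsigned int xmax_seen_by_copy(void)
{
    ToyHeader shared = {0};     /* "tuple" sitting in a shared buffer */
    ToyHeader local  = shared;  /* local-memory copy made earlier */

    shared.xmax = 731;          /* another backend marks it deleted */
    return local.xmax;          /* the copy still reads 0: "not deleted" */
}
```

This is why the argument hinges on "ntp" pointing directly into the catalog's disk buffers rather than being a detached copy.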

This is uncomfortably much in bed with the tuple table slot code,
perhaps, but I don't see a way to do it more cleanly unless we want
to add some new provisions to that API.  Andres, do you have any
thoughts about that?

Anyway, this approach gets rid of false positives, which is great
for performance and bad for testing.  Code coverage says that now
we never hit the failure paths during regression tests, which is
unsurprising, but I'm not very comfortable with leaving those paths
unexercised.  I tried to make an isolation test to exercise them,
but there's no good way at the SQL level to get a session to block
during the detoast step.  LOCK TABLE on some catalog's toast table
would do, but we disallow it.  I thought about adding a small C
function to regress.so to take out such a lock, but we have no
infrastructure for referencing regress.so from isolation tests.
What I ended up doing is adding a random failure about 0.1% of
the time in USE_ASSERT_CHECKING builds --- that's intellectually
ugly for sure, but doing better seems like way more work than
it's worth.
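For illustration only, the arithmetic behind such a ~0.1% injection might look like the following standalone stand-in (the committed code draws its random value from PostgreSQL's pg_prng under USE_ASSERT_CHECKING; the function name here is invented, and the caller supplies the random draw so the logic is testable in isolation):

```c
#include <stdbool.h>

/* Stand-in for a ~0.1% failure injection: given a uniformly random
 * draw, report "inject a failure" for about one value in a thousand. */
static bool inject_detoast_failure(unsigned int random_draw)
{
    /* true for one value in a thousand, i.e. roughly 0.1% of draws */
    return random_draw % 1000 == 0;
}
```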

regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index c0591cffe5..b1ce7bf513 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -15,6 +15,7 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/heapam.h"
 #include "access/heaptoast.h"
 #include "access/relscan.h"
 #include "access/sysattr.h"
@@ -24,8 +25,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
+#include "common/pg_prng.h"
 #include "miscadmin.h"
 #include "port/pg_bitutils.h"
+#include "storage/bufmgr.h"
 #ifdef CATCACHE_STATS
 #include "storage/ipc.h"		/* for on_proc_exit */
 #endif
@@ -90,10 +93,10 @@ static void CatCachePrintStats(int code, Datum arg);
 static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct);
 static void CatCacheRemoveCList(CatCache *cache, CatCList *cl);
 static void CatalogCacheInitializeCache(CatCache *cache);
-static CatCTup *CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp,
+static CatCTup *CatalogCacheCreateEntry(CatCache *cache,
+		HeapTuple ntp, SysScanDesc scandesc,
 		Datum *arguments,
-		uint32 hashValue, Index hashIndex,
-		bool negative);
+		uint32 hashValue, Index hashIndex);
 
 static void ReleaseCatCacheWithOwner(HeapTuple tuple, ResourceOwner resowner);
 static void ReleaseCatCacheListWithOwner(CatCList *list, ResourceOwner resowner);
@@ -1372,6 +1375,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1380,16 +1384,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1404,9 +1398,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return 

Re: Recovering from detoast-related catcache invalidations

2024-01-11 Thread Xiaoran Wang
> Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
> than GetCatalogSnapshot is the right thing, because the catcaches
> use the latter.
Yes, you are right; we should use GetCatalogSnapshot here.

> Maybe, but that undocumented hack in SetHintBits seems completely
> unacceptable.  Isn't there a cleaner way to make this check?
Maybe we don't need to call 'HeapTupleSatisfiesVisibility' to check whether
the tuple has been deleted.
Since the tuple's xmin must have been committed, we just need to check
whether its xmax is committed, like below:


@@ -1956,9 +1956,11 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	 */
 	if (HeapTupleHasExternal(ntp))
 	{
+		TransactionId xmax;
 
 		dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
-		if (!HeapTupleSatisfiesVisibility(ntp, GetNonHistoricCatalogSnapshot(cache->cc_reloid), InvalidBuffer))
+		xmax = HeapTupleHeaderGetUpdateXid(ntp->t_data);
+		if (TransactionIdIsValid(xmax) && TransactionIdDidCommit(xmax))
 		{
 			heap_freetuple(dtp);
 			return NULL;


I'm not quite sure the code is correct; I cannot clearly understand
'HeapTupleHeaderGetUpdateXid' and need more time to dive into it.

Any thoughts?


Tom Lane wrote on Friday, January 12, 2024 at 06:21:

> Xiaoran Wang  writes:
> >>> The detection of "get an invalidation" could be refined: what I did
> >>> here is to check for any advance of SharedInvalidMessageCounter,
> >>> which clearly will have a significant number of false positives.
>
> > I have reviewed your patch, and it looks good.  But instead of checking
> > for any advance of SharedInvalidMessageCounter (if the invalidation
> > message is not related to the current tuple, it is a little expensive),
> > I have another idea: we can recheck the visibility of the tuple with the
> > CatalogSnapshot (the CatalogSnapshot must be refreshed if there are any
> > SharedInvalidMessages); if it is not visible, we re-fetch the tuple;
> > otherwise, we can continue to use it as it is not outdated.
>
> Maybe, but that undocumented hack in SetHintBits seems completely
> unacceptable.  Isn't there a cleaner way to make this check?
>
> Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
> than GetCatalogSnapshot is the right thing, because the catcaches
> use the latter.
>
> regards, tom lane
>


Re: Recovering from detoast-related catcache invalidations

2024-01-11 Thread Tom Lane
Xiaoran Wang  writes:
>>> The detection of "get an invalidation" could be refined: what I did
>>> here is to check for any advance of SharedInvalidMessageCounter,
>>> which clearly will have a significant number of false positives.

> I have reviewed your patch, and it looks good.  But instead of checking for
> any advance of SharedInvalidMessageCounter (if the invalidation message is
> not related to the current tuple, it is a little expensive), I have another
> idea: we can recheck the visibility of the tuple with the CatalogSnapshot
> (the CatalogSnapshot must be refreshed if there are any
> SharedInvalidMessages); if it is not visible, we re-fetch the tuple;
> otherwise, we can continue to use it as it is not outdated.

Maybe, but that undocumented hack in SetHintBits seems completely
unacceptable.  Isn't there a cleaner way to make this check?

Also, I'm pretty dubious that GetNonHistoricCatalogSnapshot rather
than GetCatalogSnapshot is the right thing, because the catcaches
use the latter.

regards, tom lane




Re: Recovering from detoast-related catcache invalidations

2024-01-11 Thread Xiaoran Wang
Hi,


>> BTW, while nosing around I found what seems like a very nasty related
>> bug. Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls. What if one of those should have
>> invalidated this tuple? We will not notice, because it's not in
>> the hashtable yet. When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.
I spent some time trying to understand the bug, and I can finally reproduce
it locally with the following steps:

step 1:
Create a function called 'test' with a body long enough that it must be
stored in a toast table, and put it in schema 'yy' with "alter function
test set schema yy";

step 2:
Add a breakpoint at 'toast_flatten_tuple' in session 1, then execute the
following SQL:
--
set search_path='public';
alter function test set schema xx;
--
step 3:
When session 1 stops at the breakpoint, open session 2 and execute
---
set search_path = 'yy';
alter function test set schema public;
---
step 4:
Resume session 1; it reports the error "ERROR:  could not find a
function named "test""

step 5:
Continue to execute "alter function test set schema xx;" in session 1, but
it still fails with the above error, although the function 'test' already
belongs to schema 'public'.

Obviously, in session 1, the "test"  proc tuple in the cache is outdated.

>> The detection of "get an invalidation" could be refined: what I did
>> here is to check for any advance of SharedInvalidMessageCounter,
>> which clearly will have a significant number of false positives.
>> However, the only way I see to make that a lot better is to
>> temporarily create a placeholder catcache entry (probably a negative
>> one) with the same keys, and then see if it got marked dead.
>> This seems a little expensive, plus I'm afraid that it'd be actively
>> wrong in the recursive-lookup cases that the existing comment in
>> SearchCatCacheMiss is talking about (that is, the catcache entry
>> might mislead any recursive lookup that happens).

I have reviewed your patch, and it looks good.  But instead of checking for
any advance of SharedInvalidMessageCounter (if the invalidation message is
not related to the current tuple, it is a little expensive), I have another
idea: we can recheck the visibility of the tuple with the CatalogSnapshot
(the CatalogSnapshot must be refreshed if there are any
SharedInvalidMessages); if it is not visible, we re-fetch the tuple;
otherwise, we can continue to use it as it is not outdated.
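A toy, self-contained model of that proposal follows; every name below is invented, and the real machinery is GetCatalogSnapshot plus the snapshot invalidation logic in PostgreSQL's snapmgr.c:

```c
#include <stdbool.h>

/* Toy model: the catalog snapshot is rebuilt whenever invalidation
 * messages arrive, so a visibility recheck against it detects tuples
 * deleted by a concurrent session. */

static unsigned long shared_inval_counter = 0;

typedef struct { unsigned long built_at; } ToySnapshot;
typedef struct { bool is_deleted; unsigned long deleted_at; } ToyTuple;

static ToySnapshot catalog_snapshot = {0};

static ToySnapshot *get_catalog_snapshot(void)
{
    /* rebuild the snapshot if any message arrived since it was taken */
    if (catalog_snapshot.built_at != shared_inval_counter)
        catalog_snapshot.built_at = shared_inval_counter;
    return &catalog_snapshot;
}

static bool tuple_visible(const ToyTuple *tup, const ToySnapshot *snap)
{
    /* invisible if a deletion is already reflected in the snapshot */
    return !(tup->is_deleted && tup->deleted_at <= snap->built_at);
}

/* Demo: a concurrent deletion makes the tuple invisible, so the
 * caller knows to re-fetch it rather than cache a stale copy. */
static int run_snapshot_demo(void)
{
    ToyTuple tup = {false, 0};

    shared_inval_counter++;               /* concurrent session sends a message */
    tup.is_deleted = true;
    tup.deleted_at = shared_inval_counter;

    return tuple_visible(&tup, get_catalog_snapshot()) ? 0 : 1;
}
```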

I added a commit based on your patch and attached it.


0001-Recheck-the-tuple-visibility.patch
Description: Binary data


Re: Recovering from detoast-related catcache invalidations

2023-11-17 Thread Tom Lane
I wrote:
> In bug #18163 [1], Alexander proved the misgivings I had in [2]
> about catcache detoasting being possibly unsafe:
> ...
> Attached is a POC patch for fixing this.

The cfbot pointed out that this needed a rebase.  No substantive
changes.

regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 2e2e4d9f1f..2d72e8106c 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1372,6 +1372,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1380,16 +1381,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1404,9 +1395,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
   cache->cc_indexoid,
   IndexScanOK(cache, cur_skey),
@@ -1415,12 +1425,19 @@ SearchCatCacheMiss(CatCache *cache,
   cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
 		ct = CatalogCacheCreateEntry(cache, ntp, arguments,
 	 hashValue, hashIndex,
 	 false);
+		/* upon failure, we must start the scan over */
+		if (ct == NULL)
+		{
+			stale = true;
+			break;
+		}
 		/* immediately set the refcount to 1 */
 		ResourceOwnerEnlarge(CurrentResourceOwner);
 		ct->refcount++;
@@ -1429,6 +1446,7 @@ SearchCatCacheMiss(CatCache *cache,
 	}
 
 	systable_endscan(scandesc);
+	} while (stale);
 
 	table_close(relation, AccessShareLock);
 
@@ -1451,6 +1469,9 @@ SearchCatCacheMiss(CatCache *cache,
 	 hashValue, hashIndex,
 	 true);
 
+		/* Creating a negative cache entry shouldn't fail */
+		Assert(ct != NULL);
+
 		CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
    cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
 		CACHE_elog(DEBUG2, "SearchCatCache(%s): put neg entry in bucket %d",
@@ -1663,7 +1684,8 @@ SearchCatCacheList(CatCache *cache,
 	 * We have to bump the member refcounts temporarily to ensure they won't
 	 * get dropped from the cache while loading other members. We use a PG_TRY
 	 * block to ensure we can undo those refcounts if we get an error before
-	 * we finish constructing the CatCList.
+	 * we finish constructing the CatCList.  ctlist must be valid throughout
+	 * the PG_TRY block.
 	 */
 	ctlist = NIL;
 
@@ -1672,19 +1694,23 @@ SearchCatCacheList(CatCache *cache,
 		ScanKeyData cur_skey[CATCACHE_MAXKEYS];
 		Relation	relation;
 		SysScanDesc scandesc;
-
-		/*
-		 * Ok, need to make a lookup in the relation, copy the scankey and
-		 * fill out any per-call fields.
-		 */
-		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-		cur_skey[0].sk_argument = v1;
-		cur_skey[1].sk_argument = v2;
-		cur_skey[2].sk_argument = v3;
-		cur_skey[3].sk_argument = v4;
+		bool		stale;
 
 		relation = table_open(cache->cc_reloid, AccessShareLock);
 
+		do
+		{
+			/*
+			 * Ok, need to make a lookup in the relation, copy the scankey and
+			 * fill out any per-call fields.  (We must re-do this when
+			 * retrying, because systable_beginscan scribbles on the scankey.)
+			 */
+			memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+			cur_skey[0].sk_argument = v1;
+			cur_skey[1].sk_argument = v2;
+			cur_skey[2].sk_argument = v3;
+			cur_skey[3].sk_argument = v4;
+
 		scandesc = 

Recovering from detoast-related catcache invalidations

2023-10-26 Thread Tom Lane
In bug #18163 [1], Alexander proved the misgivings I had in [2]
about catcache detoasting being possibly unsafe:

>> BTW, while nosing around I found what seems like a very nasty related
>> bug.  Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields.  CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls.  What if one of those should have
>> invalidated this tuple?  We will not notice, because it's not in
>> the hashtable yet.  When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.

Attached is a POC patch for fixing this.  The idea is that if we get
an invalidation while trying to detoast a catalog tuple, we should
go around and re-read the tuple a second time to get an up-to-date
version (or, possibly, discover that it no longer exists).  In the
case of SearchCatCacheList, we have to drop and reload the entire
catcache list, but fortunately not a lot of new code is needed.
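In outline the fix is a retry loop, which can be sketched standalone as follows (create_entry, the sentinel -1, and the version number 42 are all invented for illustration; the real code rejects an entry when SharedInvalidMessageCounter advanced during detoasting):

```c
/* Toy model of the retry loop added to SearchCatCacheMiss: if building
 * the entry observed an invalidation mid-detoast, scan the table again. */

static int stale_reads_remaining = 1;   /* first read races with an invalidation */

static int create_entry(int tuple_version)
{
    if (stale_reads_remaining > 0)
    {
        stale_reads_remaining--;
        return -1;              /* invalidation seen while detoasting: reject */
    }
    return tuple_version;       /* entry built from an up-to-date tuple */
}

static int search_miss(void)
{
    int entry;
    int stale;

    do
    {
        stale = 0;
        entry = create_entry(42);   /* 42 stands for the current tuple */
        if (entry < 0)
            stale = 1;              /* upon failure, start the scan over */
    } while (stale);
    return entry;
}
```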

The detection of "get an invalidation" could be refined: what I did
here is to check for any advance of SharedInvalidMessageCounter,
which clearly will have a significant number of false positives.
However, the only way I see to make that a lot better is to
temporarily create a placeholder catcache entry (probably a negative
one) with the same keys, and then see if it got marked dead.
This seems a little expensive, plus I'm afraid that it'd be actively
wrong in the recursive-lookup cases that the existing comment in
SearchCatCacheMiss is talking about (that is, the catcache entry
might mislead any recursive lookup that happens).

Moreover, if we did do something like that then the new code
paths would be essentially untested.  As the patch stands, the
reload path seems to get taken 10 to 20 times during a
"make installcheck-parallel" run of the core regression tests
(out of about 150 times that catcache detoasting is required).
Probably all of those are false-positive cases, but at least
they're exercising the logic.

So I'm inclined to leave it like this, but perhaps somebody
else will have a different opinion.

(BTW, there's a fair amount of existing catcache.c code that
will need to be indented another tab stop, but in the interests
of keeping the patch legible I didn't do that yet.)

Comments?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/18163-859bad19a43edcf6%40postgresql.org
[2] https://www.postgresql.org/message-id/1389919.1697144487%40sss.pgh.pa.us

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 1aacb736c2..8255162a72 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1319,6 +1319,7 @@ SearchCatCacheMiss(CatCache *cache,
 	SysScanDesc scandesc;
 	HeapTuple	ntp;
 	CatCTup    *ct;
+	bool		stale;
 	Datum		arguments[CATCACHE_MAXKEYS];
 
 	/* Initialize local parameter array */
@@ -1327,16 +1328,6 @@ SearchCatCacheMiss(CatCache *cache,
 	arguments[2] = v3;
 	arguments[3] = v4;
 
-	/*
-	 * Ok, need to make a lookup in the relation, copy the scankey and fill
-	 * out any per-call fields.
-	 */
-	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-	cur_skey[0].sk_argument = v1;
-	cur_skey[1].sk_argument = v2;
-	cur_skey[2].sk_argument = v3;
-	cur_skey[3].sk_argument = v4;
-
 	/*
 	 * Tuple was not found in cache, so we have to try to retrieve it directly
 	 * from the relation.  If found, we will add it to the cache; if not
@@ -1351,9 +1342,28 @@ SearchCatCacheMiss(CatCache *cache,
 	 * will eventually age out of the cache, so there's no functional problem.
 	 * This case is rare enough that it's not worth expending extra cycles to
 	 * detect.
+	 *
+	 * Another case, which we *must* handle, is that the tuple could become
+	 * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+	 * AcceptInvalidationMessages can run during TOAST table access).  We do
+	 * not want to return already-stale catcache entries, so we loop around
+	 * and do the table scan again if that happens.
 	 */
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 
+	do
+	{
+		/*
+		 * Ok, need to make a lookup in the relation, copy the scankey and
+		 * fill out any per-call fields.  (We must re-do this when retrying,
+		 * because systable_beginscan scribbles on the scankey.)
+		 */
+		memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+		cur_skey[0].sk_argument = v1;
+		cur_skey[1].sk_argument = v2;
+		cur_skey[2].sk_argument = v3;
+		cur_skey[3].sk_argument = v4;
+
 	scandesc = systable_beginscan(relation,
   cache->cc_indexoid,
   IndexScanOK(cache, cur_skey),
@@ -1362,12 +1372,19 @@ SearchCatCacheMiss(CatCache *cache,
   cur_skey);
 
 	ct = NULL;
+	stale = false;
 
 	while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
 	{
 		ct =