old_snapshot_threshold bottleneck on replica

2023-01-23 Thread Maxim Orlov
Hi!

One of our customers stumbled onto a significant performance degradation
while running multiple OLAP-like queries on a replica.
After some investigation, it became clear that the problem is in accessing
the old_snapshot_threshold parameter.

Access to the old_snapshot_threshold machinery is guarded by the
mutex_threshold spinlock. This is not a problem on the primary server,
since we rarely call GetOldSnapshotThresholdTimestamp:

void
TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
{
	if (RelationAllowsEarlyPruning(relation)
		&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
		ereport(ERROR,
				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
				 errmsg("snapshot too old")));
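
The reader it calls is just a spinlock-guarded load; from memory,
GetOldSnapshotThresholdTimestamp in snapmgr.c looks roughly like this:

TimestampTz
GetOldSnapshotThresholdTimestamp(void)
{
	TimestampTz threshold_timestamp;

	/* every caller takes and releases the spinlock, even for a pure read */
	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
	threshold_timestamp = oldSnapshotControl->threshold_timestamp;
	SpinLockRelease(&oldSnapshotControl->mutex_threshold);

	return threshold_timestamp;
}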

But in the case of a replica, we have to call
GetOldSnapshotThresholdTimestamp much more often, so it becomes a
bottleneck. The customer worked around the issue by setting
old_snapshot_threshold to 0, but I think we can do something about it.

Some more investigation:

-- On primary --
$ ./bin/psql postgres -c "create database benchmark"
CREATE DATABASE
$ ./bin/pgbench -i -Uorlov -s300 benchmark
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
...
creating tables...
generating data (client-side)...
30000000 of 30000000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s).

-- On secondary --
$ touch 1.sql
$ vim 1.sql
$ cat 1.sql
\set bid random(1, 300)
BEGIN;
SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid;
END;
$ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark
pgbench (16devel)
progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
...
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed

$ perf record -F 99 -a -g --call-graph=dwarf sleep 5
$ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file
$ grep s_lock file | wc -l

3486


My proposal is to use atomics for threshold_timestamp and threshold_xid.
PFA the 0001 patch.
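
Roughly, the idea is to make the read side lock-free (a sketch of the
approach, not the patch itself, assuming the field becomes a
pg_atomic_uint64):

TimestampTz
GetOldSnapshotThresholdTimestamp(void)
{
	/* plain atomic load: readers no longer contend on the spinlock */
	return (TimestampTz)
		pg_atomic_read_u64(&oldSnapshotControl->threshold_timestamp);
}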
With patch 0001 we got:

$ grep s_lock file2 | wc -l
8


Maybe we should go further and remove mutex_threshold here entirely? That
would allow threshold_timestamp and threshold_xid to be read inconsistently
with respect to each other, but is this really a problem?

Thoughts?

-- 
Best regards,
Maxim Orlov.


0001-PGPRO-7624-use-atomic-old_snapshot_threshold.patch
Description: Binary data


Re: old_snapshot_threshold bottleneck on replica

2023-01-24 Thread Pavel Borisov
Hi, Maxim!

On Mon, 23 Jan 2023 at 18:40, Maxim Orlov  wrote:
>
> Hi!
>
> One of our customers stumbled onto a significant performance degradation while 
> running multiple OLAP-like queries on a replica.
> After some investigation, it became clear that the problem is in accessing 
> the old_snapshot_threshold parameter.
>
> Access to the old_snapshot_threshold machinery is guarded by the 
> mutex_threshold spinlock. This is not a problem on the primary server, 
> since we rarely call GetOldSnapshotThresholdTimestamp:
>
> void
> TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
> {
> 	if (RelationAllowsEarlyPruning(relation)
> 		&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
> 		ereport(ERROR,
> 				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
> 				 errmsg("snapshot too old")));
>
> But in the case of a replica, we have to call GetOldSnapshotThresholdTimestamp 
> much more often, so it becomes a bottleneck. The customer worked around the 
> issue by setting old_snapshot_threshold to 0, but I think we can do something 
> about it.
>
> Some more investigation:
>
> -- On primary --
> $ ./bin/psql postgres -c "create database benchmark"
> CREATE DATABASE
> $ ./bin/pgbench -i -Uorlov -s300 benchmark
> dropping old tables...
> NOTICE:  table "pgbench_accounts" does not exist, skipping
> ...
> creating tables...
> generating data (client-side)...
> 30000000 of 30000000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s)
> vacuuming...
> creating primary keys...
> done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side 
> generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s).
>
> -- On secondary --
> $ touch 1.sql
> $ vim 1.sql
> $ cat 1.sql
> \set bid random(1, 300)
> BEGIN;
> SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid;
> END;
> $ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark
> pgbench (16devel)
> progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> ...
> progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>
> $ perf record -F 99 -a -g --call-graph=dwarf sleep 5
> $ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file
> $ grep s_lock file | wc -l
>
> 3486
>
>
> My proposal is to use atomics for threshold_timestamp and threshold_xid. 
> PFA the 0001 patch.
> With patch 0001 we got:
>
> $ grep s_lock file2 | wc -l
> 8
>
>
> Maybe we should go further and remove mutex_threshold here entirely? That 
> would allow threshold_timestamp and threshold_xid to be read inconsistently 
> with respect to each other, but is this really a problem?
>
> Thoughts?

I think optimizing locking and switching to atomics wherever it
improves performance is a good direction. If the performance
improvement could be demonstrated in a more direct way, that would be
a good argument for committing it. Personally, I like TPS plots like
the ones in [1].

[1] 
https://www.postgresql.org/message-id/CALT9ZEHSX1Hpz5xjDA62yHAHtpinkA6hg8Zt-odyxqppmKbQFA%40mail.gmail.com

Kind regards,
Pavel Borisov,
Supabase




Re: old_snapshot_threshold bottleneck on replica

2023-01-24 Thread Robert Haas
On Mon, Jan 23, 2023 at 9:40 AM Maxim Orlov  wrote:
> One of our customers stumbled onto a significant performance degradation while 
> running multiple OLAP-like queries on a replica.
> After some investigation, it became clear that the problem is in accessing 
> the old_snapshot_threshold parameter.

It has been suggested that we remove that feature entirely.

> My proposal is to use atomics for threshold_timestamp and threshold_xid. 
> PFA the 0001 patch.

This patch changes threshold_timestamp and threshold_xid to use
atomics, but it does not remove mutex_threshold, which, according to a
quick glance at the comments, protects exactly those two fields. So,
either:

(1) that mutex also protects something else and the existing comment
is wrong, or

(2) the mutex should have been removed but the patch neglected to do so, or

(3) the mutex is still needed for some reason, in which case either
(3a) the patch isn't actually safe or (3b) the patch needs comments to
explain what the new synchronization model is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: old_snapshot_threshold bottleneck on replica

2023-01-25 Thread Maxim Orlov
On Tue, 24 Jan 2023 at 18:46, Robert Haas  wrote:

>
> (1) that mutex also protects something else and the existing comment
> is wrong, or
>
> (2) the mutex should have been removed but the patch neglected to do so, or
>
> (3) the mutex is still needed for some reason, in which case either
> (3a) the patch isn't actually safe or (3b) the patch needs comments to
> explain what the new synchronization model is.
>
Yes, you're absolutely right. My first intention was to remove this
mutex completely. But in TransactionIdLimitedForOldSnapshots these
variables are used together, so I'm not sure it is completely safe to
remove the mutex. Actually, removing the mutex and switching to atomics
was my first choice: I ran all the tests and no problems were found.
But at that time I chose to be more conservative. Anyway, here is the
new variant.

-- 
Best regards,
Maxim Orlov.


v2-0001-Use-atomic-old_snapshot_threshold.patch
Description: Binary data


Re: old_snapshot_threshold bottleneck on replica

2023-01-25 Thread Robert Haas
On Wed, Jan 25, 2023 at 3:52 AM Maxim Orlov  wrote:
> But in TransactionIdLimitedForOldSnapshots these variables are used 
> together, so I'm not sure it is completely safe to remove the mutex.

Well, that's something we - and ideally you, as the patch author -
need to analyze and figure out. We can't just take a shot and hope for
the best.

> Actually, removing the mutex and switching to atomics was my first 
> choice: I ran all the tests and no problems were found.

Unfortunately, that kind of testing is not very likely to find a
subtle synchronization problem. That's why a theoretical analysis is
so important.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: old_snapshot_threshold bottleneck on replica

2023-01-27 Thread Maxim Orlov
On Wed, 25 Jan 2023 at 16:52, Robert Haas  wrote:

> On Wed, Jan 25, 2023 at 3:52 AM Maxim Orlov  wrote:
>
> Well, that's something we - and ideally you, as the patch author -
> need to analyze and figure out. We can't just take a shot and hope for
> the best.
>

Thank you for your advice. I've dived deeper into the problem and I
think the v2 patch is wrong. Accessing threshold_timestamp and
threshold_xid in TransactionIdLimitedForOldSnapshots without the lock
would lead to an improper xlimit calculation.

So, my choice would be (3b). My goal is to optimize access to
threshold_timestamp to avoid multiple spinlock acquisitions on read. At
the same time, simultaneous access to these variables
(threshold_timestamp and threshold_xid) should still be protected by
the spinlock.

I removed the atomic for threshold_xid and added comments on
mutex_threshold. PFA v3.

-- 
Best regards,
Maxim Orlov.


v3-0001-Use-atomic-old_snapshot_threshold.patch
Description: Binary data


Re: old_snapshot_threshold bottleneck on replica

2023-01-27 Thread Robert Haas
On Fri, Jan 27, 2023 at 9:30 AM Maxim Orlov  wrote:
> Thank you for your advice. I've dived deeper into the problem and I think 
> the v2 patch is wrong.

Cool!

> Accessing threshold_timestamp and threshold_xid in 
> TransactionIdLimitedForOldSnapshots
> without the lock would lead to an improper xlimit calculation.

That would be a bummer.

> So, my choice would be (3b). My goal is to optimize access to 
> threshold_timestamp to avoid
> multiple spinlock acquisitions on read. At the same time, simultaneous access 
> to these variables
> (threshold_timestamp and threshold_xid) should still be protected by the 
> spinlock.
>
> I removed the atomic for threshold_xid and added comments on 
> mutex_threshold. PFA v3.

Interesting, but it's still not entirely clear to me from reading the
comments why we should think that this is safe.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: old_snapshot_threshold bottleneck on replica

2023-01-30 Thread Maxim Orlov
On Fri, 27 Jan 2023 at 18:18, Robert Haas  wrote:

>
> Interesting, but it's still not entirely clear to me from reading the
> comments why we should think that this is safe.
>

Overall, I think this is safe, because we do not change the algorithm
here. More specifically, threshold_timestamp is only used in a few cases:
1) To get the latest value by calling GetOldSnapshotThresholdTimestamp.
This keeps working, since we only change the synchronization here from
the spinlock to an atomic.
2) In TransactionIdLimitedForOldSnapshots, but no change in behaviour is
made there; the synchronization model stays the same as before the patch.
3) In SnapshotTooOldMagicForTest, which is a stub to make
old_snapshot_threshold tests appear "working". There is no interaction
with threshold_xid here.

So, we have two use cases for threshold_timestamp:
a) When threshold_timestamp is used in conjunction with threshold_xid,
we must use the spinlock to synchronize.
b) When threshold_timestamp is used on its own, we can use atomic reads,
as sketched below.
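
A minimal sketch of case a), assuming threshold_timestamp has become a
pg_atomic_uint64 (new_timestamp and new_xlimit are placeholder names;
case b) is then just a plain pg_atomic_read_u64 with no lock at all):

	/*
	 * Update both fields under the spinlock, so that code reading the
	 * pair in TransactionIdLimitedForOldSnapshots still sees a
	 * consistent combination of timestamp and xid.
	 */
	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
	pg_atomic_write_u64(&oldSnapshotControl->threshold_timestamp,
						(uint64) new_timestamp);
	oldSnapshotControl->threshold_xid = new_xlimit;
	SpinLockRelease(&oldSnapshotControl->mutex_threshold);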

-- 
Best regards,
Maxim Orlov.


Re: old_snapshot_threshold bottleneck on replica

2023-02-13 Thread Andres Freund
Hi,

On 2023-01-24 10:46:28 -0500, Robert Haas wrote:
> On Mon, Jan 23, 2023 at 9:40 AM Maxim Orlov  wrote:
> > One of our customers stumbled onto a significant performance degradation 
> > while running multiple OLAP-like queries on a replica.
> > After some investigation, it became clear that the problem is in accessing 
> > the old_snapshot_threshold parameter.
>
> It has been suggested that we remove that feature entirely.

Indeed. There's a lot wrong with it. We have reproducers for creating
wrong query results. Nobody has shown interest in fixing the problems
for several years now. It costs performance even for users that *do
not* use the feature (*).

I think we're doing our users a disservice by claiming to have this feature.

I don't think a lot of the existing code would survive if we were to
create a newer, more maintainable / reliable version of the feature.

Greetings,

Andres Freund

(*) E.g. TestForOldSnapshot() is called in a good number of places, and
emits quite a bit of code. That code is not executed when the feature is
off, but it is large enough to lead to worse code being generated.
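
For context, the check in bufmgr.h is a static inline function along
these lines, so every call site carries the whole test:

static inline void
TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
{
	Assert(relation != NULL);

	/* cheap inline tests first; full check in TestForOldSnapshot_impl() */
	if (old_snapshot_threshold >= 0
		&& snapshot != NULL
		&& (snapshot->snapshot_type == SNAPSHOT_MVCC
			|| snapshot->snapshot_type == SNAPSHOT_TOAST)
		&& !XLogRecPtrIsInvalid(snapshot->lsn)
		&& PageGetLSN(page) > snapshot->lsn)
		TestForOldSnapshot_impl(snapshot, relation);
}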




Re: old_snapshot_threshold bottleneck on replica

2023-09-05 Thread Thomas Munro
On Wed, Jun 14, 2023 at 3:56 PM Thomas Munro  wrote:
> On Tue, Feb 14, 2023 at 9:45 AM Andres Freund  wrote:
> > Indeed. There's a lot wrong with it. We have reproducers for creating
> > wrong query results. Nobody has shown interest in fixing the problems
> > for several years now. It costs performance even for users that *do
> > not* use the feature (*).
> >
> > I think we're doing our users a disservice by claiming to have this feature.
> >
> > I don't think a lot of the existing code would survive if we were to
> > create a newer, more maintainable / reliable version of the feature.
>
> I raised this at the recent developer meeting and the assembled
> hackers agreed.  Does anyone think we *shouldn't* drop the feature?  I
> volunteered to write a removal patch for v17, so here's a first run
> through to find all the traces of this feature.  In this first go I
> removed everything I could think of, but we might want to keep some
> vestiges.  I guess we might want to keep the registered error
> class/code?  Should we invent a place where we keep stuff like #define
> TestForOldSnapshot(...) expanding to nothing for some amount of time,
> for extensions?  I dunno, I bet extensions doing stuff that
> sophisticated already have a bunch of version tests anyway.  I suppose
> keeping the GUC wouldn't really be helpful (if you're using it, you
> probably want to know that it isn't available anymore and think about
> the implications for your application).

Done.

I hope we get "snapshot too old" back one day.




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Peter Geoghegan
On Tue, Sep 5, 2023 at 12:58 AM Thomas Munro  wrote:
> I hope we get "snapshot too old" back one day.

Thanks for working on this. Though I wonder why you didn't do
something closer to a straight revert of the feature. Why is nbtree
still passing around snapshots needlessly?

Also, why are there still many comments referencing the feature?
There's the one above should_attempt_truncation(), for example.
Another appears above init_toast_snapshot(). Are these just
oversights, or was it deliberate? You said something about retaining
vestiges.

-- 
Peter Geoghegan




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Thomas Munro
On Fri, Sep 8, 2023 at 1:53 PM Peter Geoghegan  wrote:
> On Tue, Sep 5, 2023 at 12:58 AM Thomas Munro  wrote:
> > I hope we get "snapshot too old" back one day.
>
> Thanks for working on this. Though I wonder why you didn't do
> something closer to a straight revert of the feature. Why is nbtree
> still passing around snapshots needlessly?
>
> Also, why are there still many comments referencing the feature?
> There's the one above should_attempt_truncation(), for example.
> Another appears above init_toast_snapshot(). Are these just
> oversights, or was it deliberate? You said something about retaining
> vestiges.

Oh.  Not intentional.  Looking now...




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Thomas Munro
On Fri, Sep 8, 2023 at 2:00 PM Thomas Munro  wrote:
> On Fri, Sep 8, 2023 at 1:53 PM Peter Geoghegan  wrote:
> > Thanks for working on this. Though I wonder why you didn't do
> > something closer to a straight revert of the feature. Why is nbtree
> > still passing around snapshots needlessly?

The code moved around quite a few times over several commits, and quite
a lot since then, which is why I didn't go for a straight revert, but
clearly the manual approach risked missing things.  I think the
attached removes all unused 'snapshot' arguments from AM-internal
functions.  Checked by compiling with clang's -Wunused-parameter, and
then searching for 'snapshot', and excluding the expected cases.
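
For example, something along these lines (a hypothetical invocation,
assuming a meson build tree):

$ meson setup build -Dc_args='-Wunused-parameter'
$ ninja -C build 2>&1 | grep 'unused parameter'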

> > Also, why are there still many comments referencing the feature?
> > There's the one above should_attempt_truncation(), for example.
> > Another appears above init_toast_snapshot(). Are these just
> > oversights, or was it deliberate? You said something about retaining
> > vestiges.

Stray comments removed.
From 7d57de24c5a4e9ab3dacca9231a9893b909439f6 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 8 Sep 2023 14:29:13 +1200
Subject: [PATCH] Remove some more "snapshot too old" vestiges.

Commit f691f5b8 removed the logic, but left behind some now-useless
Snapshot arguments to various AM-internal functions, and missed a couple
of comments.

Reported-by: Peter Geoghegan 
Discussion: https://postgr.es/m/CAH2-Wznj9qSNXZ1P1uWTUD_FeaTezbUazb416EPwi4Qr_jR_6A%40mail.gmail.com

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 94a9759322..dbb83d80f8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -2694,7 +2694,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup)
 	 */
 	Assert(state->readonly && state->rootdescend);
 	exists = false;
-	stack = _bt_search(state->rel, NULL, key, &lbuf, BT_READ, NULL);
+	stack = _bt_search(state->rel, NULL, key, &lbuf, BT_READ);
 
 	if (BufferIsValid(lbuf))
 	{
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index d4fec654bb..a7538f32c2 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -169,7 +169,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	MemoryContext oldcxt = CurrentMemoryContext;
 	bool		autosummarize = BrinGetAutoSummarize(idxRel);
 
-	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
+	revmap = brinRevmapInitialize(idxRel, &pagesPerRange);
 
 	/*
 	 * origHeapBlk is the block number where the insertion occurred.  heapBlk
@@ -202,7 +202,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 
 			lastPageTuple =
 brinGetTupleForHeapBlock(revmap, lastPageRange, &buf, &off,
-		 NULL, BUFFER_LOCK_SHARE, NULL);
+		 NULL, BUFFER_LOCK_SHARE);
 			if (!lastPageTuple)
 			{
 bool		recorded;
@@ -222,7 +222,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		}
 
 		brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off,
-		 NULL, BUFFER_LOCK_SHARE, NULL);
+		 NULL, BUFFER_LOCK_SHARE);
 
 		/* if range is unsummarized, there's nothing to do */
 		if (!brtup)
@@ -332,8 +332,7 @@ brinbeginscan(Relation r, int nkeys, int norderbys)
 	scan = RelationGetIndexScan(r, nkeys, norderbys);
 
 	opaque = palloc_object(BrinOpaque);
-	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange,
-			   scan->xs_snapshot);
+	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange);
 	opaque->bo_bdesc = brin_build_desc(r);
 	scan->opaque = opaque;
 
@@ -537,8 +536,7 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 		MemoryContextResetAndDeleteChildren(perRangeCxt);
 
 		tup = brinGetTupleForHeapBlock(opaque->bo_rmAccess, heapBlk, &buf,
-	   &off, &size, BUFFER_LOCK_SHARE,
-	   scan->xs_snapshot);
+	   &off, &size, BUFFER_LOCK_SHARE);
 		if (tup)
 		{
 			gottuple = true;
@@ -880,7 +878,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	/*
 	 * Initialize our state, including the deformed tuple state.
 	 */
-	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
+	revmap = brinRevmapInitialize(index, &pagesPerRange);
 	state = initialize_brin_buildstate(index, revmap, pagesPerRange);
 
 	/*
@@ -1458,8 +1456,7 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 		 * the same.)
 		 */
 		phtup = brinGetTupleForHeapBlock(state->bs_rmAccess, heapBlk, &phbuf,
-		 &offset, &phsz, BUFFER_LOCK_SHARE,
-		 NULL);
+		 &offset, &phsz, BUFFER_LOCK_SHARE);
 		/* the placeholder tuple must exist */
 		if (phtup == NULL)
 			elog(ERROR, "missing placeholder tuple");
@@ -1496,7 +1493,7 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 	Buffer		buf;
 	BlockNumber startBlk;
 
-	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
+	revmap = brinRevmapInitialize(index, &pagesPerRange);
 
 	/* determine range of pages to 

Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Peter Geoghegan
On Thu, Sep 7, 2023 at 8:49 PM Thomas Munro  wrote:
> The code moved around quite a few times over several commits and quite
> a lot since then, which is why I didn't go for straight revert, but
> clearly the manual approach risked missing things.

It's not a big deal, obviously.

> I think the
> attached removes all unused 'snapshot' arguments from AM-internal
> functions.  Checked by compiling with clang's -Wunused-parameter, and
> then searching for 'snapshot', and excluding the expected cases.

This looks right to me.

Thanks
-- 
Peter Geoghegan




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Thomas Munro
On Fri, Sep 8, 2023 at 4:22 PM Peter Geoghegan  wrote:
> This looks right to me.

Thanks, pushed.




Re: old_snapshot_threshold bottleneck on replica

2023-06-13 Thread Thomas Munro
On Tue, Feb 14, 2023 at 9:45 AM Andres Freund  wrote:
> On 2023-01-24 10:46:28 -0500, Robert Haas wrote:
> > On Mon, Jan 23, 2023 at 9:40 AM Maxim Orlov  wrote:
> > > One of our customers stumbled onto a significant performance degradation 
> > > while running multiple OLAP-like queries on a replica.
> > > After some investigation, it became clear that the problem is in 
> > > accessing the old_snapshot_threshold parameter.
> >
> > It has been suggested that we remove that feature entirely.
>
> Indeed. There's a lot wrong with it. We have reproducers for creating
> wrong query results. Nobody has shown interest in fixing the problems
> for several years now. It costs performance even for users that *do
> not* use the feature (*).
>
> I think we're doing our users a disservice by claiming to have this feature.
>
> I don't think a lot of the existing code would survive if we were to
> create a newer, more maintainable / reliable version of the feature.

I raised this at the recent developer meeting and the assembled
hackers agreed.  Does anyone think we *shouldn't* drop the feature?  I
volunteered to write a removal patch for v17, so here's a first run
through to find all the traces of this feature.  In this first go I
removed everything I could think of, but we might want to keep some
vestiges.  I guess we might want to keep the registered error
class/code?  Should we invent a place where we keep stuff like #define
TestForOldSnapshot(...) expanding to nothing for some amount of time,
for extensions?  I dunno, I bet extensions doing stuff that
sophisticated already have a bunch of version tests anyway.  I suppose
keeping the GUC wouldn't really be helpful (if you're using it, you
probably want to know that it isn't available anymore and think about
the implications for your application).
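
Spelled out, that compatibility shim would be something like this
(hypothetical, not part of the attached patch):

#define TestForOldSnapshot(snapshot, relation, page) ((void) 0)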
From 3e2b3f3b20ce83737f3421ab3c7794853724266d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 14 Jun 2023 13:41:37 +1200
Subject: [PATCH] Remove old_snapshot_threshold.

Remove the old_snapshot_threshold feature, better known for its error
"snapshot too old".  Unfortunately it had a number of problems, and we
agreed to remove it.  There is no doubt that it was useful, and
someone might propose a new implementation in the future.

XXX draft, may contain nuts

Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com
---
 contrib/Makefile  |   1 -
 contrib/bloom/blscan.c|   1 -
 contrib/meson.build   |   1 -
 contrib/old_snapshot/Makefile |  21 -
 contrib/old_snapshot/meson.build  |  23 -
 contrib/old_snapshot/old_snapshot--1.0.sql|  14 -
 contrib/old_snapshot/old_snapshot.control |   5 -
 contrib/old_snapshot/time_mapping.c   | 142 --
 doc/src/sgml/config.sgml  |  69 ---
 doc/src/sgml/contrib.sgml |   1 -
 doc/src/sgml/filelist.sgml|   1 -
 doc/src/sgml/monitoring.sgml  |   4 -
 doc/src/sgml/oldsnapshot.sgml |  33 --
 src/backend/access/brin/brin_revmap.c |   7 -
 src/backend/access/gin/ginbtree.c |   2 -
 src/backend/access/gin/ginget.c   |   4 -
 src/backend/access/gist/gistget.c |   1 -
 src/backend/access/hash/hashsearch.c  |   6 -
 src/backend/access/heap/heapam.c  |   9 -
 src/backend/access/heap/pruneheap.c   | 120 +
 src/backend/access/heap/vacuumlazy.c  |   5 +-
 src/backend/access/nbtree/nbtsearch.c |   9 -
 src/backend/access/spgist/spgscan.c   |   1 -
 src/backend/catalog/index.c   |  20 +-
 src/backend/commands/vacuum.c |  19 -
 src/backend/storage/buffer/bufmgr.c   |  17 -
 src/backend/storage/ipc/ipci.c|   2 -
 src/backend/storage/ipc/procarray.c   |  36 +-
 src/backend/storage/lmgr/lwlocknames.txt  |   2 +-
 src/backend/utils/errcodes.txt|   4 -
 src/backend/utils/misc/guc_tables.c   |  11 -
 src/backend/utils/misc/postgresql.conf.sample |   2 -
 src/backend/utils/time/snapmgr.c  | 468 --
 src/include/access/heapam.h   |   2 -
 src/include/storage/bufmgr.h  |  36 --
 src/include/utils/old_snapshot.h  |  75 ---
 src/include/utils/snapmgr.h   |  49 --
 src/test/modules/Makefile |   1 -
 src/test/modules/meson.build  |   1 -
 src/test/modules/snapshot_too_old/.gitignore  |   3 -
 src/test/modules/snapshot_too_old/Makefile|  28 --
 .../expected/sto_using_cursor.out |  19 -
 .../expected/sto_using_hash_index.out |  19 -
 .../expected/sto_using_select.out |  18 -
 src/test/modules/snapshot_too_old/meson.build |  19 -
 .../specs/sto_using_cursor.spec   |  38 --
 .../specs/sto_using_has