On 2026-May-08, Amit Kapila wrote:
> On Wed, May 6, 2026 at 1:55 PM Antonin Houska <[email protected]> wrote:
> > One idea occurred to me yet, effectively it's just a cleanup. Part of it
> > was already proposed [1].

Hmm, I think this cleanup makes sense.  If I apply the test patches (0001
and 0002 here), they fail almost immediately; but after applying 0003 all
is well again.  I think these tests are a good thing to have in the tree,
even if we end up reverting db-specific snapshots later.

After some back and forth, I modified the tests slightly so that they
search PG_TEST_EXTRA for the string "stress_concurrently=N".  The N can be
changed so that the tests run for longer; if not given, it's taken as 1,
and the tests run for around 6 seconds (so N=10 means they run for about a
minute).  I think this is a convenient gadget for other tests of this kind
on CONCURRENTLY commands, such as the one proposed for CIC elsewhere.

As written, these tests would run nowhere until we add that string to some
buildfarm animal.  I debated with myself whether to assume N=1 when the
string is not given.  I still think that's a good idea, but perhaps we
should have something to prevent the tests from running by default under
Valgrind or other similarly slow setups.  In normal conditions, the total
runtime is not affected when they are run with N=1 as part of the whole
test suite.

> Some issues/inefficiencies regarding this fix and base code related to
> db-specific snapshots built during decoding:

[...]

Thanks for spending time reviewing this code.  I think none of these
problems are fundamental in nature, but they are obviously worth
addressing for 19.  If we hit some roadblock, we can still revert only
db-specific snapshots.

-- 
Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
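For reference, the way the tests interpret PG_TEST_EXTRA can be sketched as
follows (a Python rendition of the Perl regex used in the attached test
scripts, slightly simplified; the helper name `stress_value` is made up for
illustration):

```python
import re

# Sketch of how the stress tests read PG_TEST_EXTRA; mirrors the Perl
# regex /\bstress_concurrently(?:=(?<stressval>[0-9]*))?\b/ from the
# attached 007/008 scripts.  Illustrative only, not the actual test code.
def stress_value(pg_test_extra):
    m = re.search(r'\bstress_concurrently(?:=([0-9]+))?\b', pg_test_extra)
    if m is None:
        return None                                  # string absent: test skipped
    return int(m.group(1)) if m.group(1) else 1      # bare string means N=1

print(stress_value('kerberos ldap'))             # None (test skipped)
print(stress_value('stress_concurrently'))       # 1  (~6 seconds)
print(stress_value('stress_concurrently=10'))    # 10 (~60 seconds)
print(stress_value('stress_concurrently=0'))     # 0  (explicitly disabled)
```

A typical invocation would then be something like
`PG_TEST_EXTRA="stress_concurrently=10" make -C contrib/amcheck check`.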
From 5c3795e7d2971d91de270284c74c97b961966206 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 13 Dec 2025 18:13:37 +0100
Subject: [PATCH 1/3] stress test for repack concurrently

---
 contrib/amcheck/meson.build       |   1 +
 contrib/amcheck/t/007_repack_1.pl | 121 ++++++++++++++++++++++++++++++
 doc/src/sgml/regress.sgml         |  16 ++++
 3 files changed, 138 insertions(+)
 create mode 100644 contrib/amcheck/t/007_repack_1.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index d5137ef691d..cb4bc32e98a 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -50,6 +50,7 @@ tests += {
       't/004_verify_nbtree_unique.pl',
       't/005_pitr.pl',
       't/006_verify_gin.pl',
+      't/007_repack_1.pl',
     ],
   },
 }
diff --git a/contrib/amcheck/t/007_repack_1.pl b/contrib/amcheck/t/007_repack_1.pl
new file mode 100644
index 00000000000..17f3caa3a38
--- /dev/null
+++ b/contrib/amcheck/t/007_repack_1.pl
@@ -0,0 +1,121 @@
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+# PG_TEST_EXTRA carries a numerical stress value; 0 disables the test,
+# 1 means the test takes about 6 seconds, and the increase should be
+# roughly linear.
+my $matched = ($ENV{PG_TEST_EXTRA} // '') =~ /\bstress_concurrently(?:=(?<stressval>[0-9]*))?\b/;
+my $stressval = $+{stressval} // 1;
+if (!$matched or $stressval == 0)
+{
+	plan skip_all => 'skipping disabled REPACK CONCURRENTLY stress test';
+}
+
+note "stressval is $stressval";
+
+my $node;
+
+# Test set-up
+$node = PostgreSQL::Test::Cluster->new('CIC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+
+my $nrows = 10_000 * $stressval;
+my $duration = 6 * $stressval;
+my $no_hot = int(rand(2));
+
+$node->start;
+$node->safe_psql('postgres', q(CREATE TABLE tbl(id int PRIMARY KEY, val int)));
+
+if ($no_hot)
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(val);));
+}
+else
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(id);));
+}
+
+
+# Load amcheck
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Insert $nrows rows into tbl
+$node->safe_psql('postgres', qq(
+	INSERT INTO tbl SELECT g, g FROM generate_series(1, $nrows) g
+));
+
+my $sum = $node->safe_psql('postgres', q(
+	SELECT SUM(val) AS sum FROM tbl
+));
+
+
+$node->pgbench(
+	"--no-vacuum --client=30 --jobs=4 --exit-on-abort -T $duration",
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REPACK CONCURRENTLY',
+	{
+		'concurrent_ops' => qq(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \\gset
+			\\if :gotlock
+				REPACK (CONCURRENTLY) tbl USING INDEX tbl_pkey;
+				SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+				SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+				\\sleep 10 ms
+
+				REPACK (CONCURRENTLY) tbl USING INDEX test_idx;
+				SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+				SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+				\\sleep 10 ms
+
+				REPACK (CONCURRENTLY) tbl;
+				SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+				SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+				\\sleep 10 ms
+
+				SELECT pg_advisory_unlock(42);
+			\\else
+				\\set num_a random(1, $nrows)
+				\\set num_b random(1, $nrows)
+				\\set diff random(1, 10000)
+				BEGIN;
+				UPDATE tbl SET val = val + :diff WHERE id = :num_a;
+				\\sleep 1 ms
+				UPDATE tbl SET val = val - :diff WHERE id = :num_b;
+				\\sleep 1 ms
+				COMMIT;
+
+				BEGIN
+				--TRANSACTION ISOLATION LEVEL REPEATABLE READ
+				;
+				SELECT 1;
+				\\sleep 1 ms
+				SELECT COALESCE(SUM(val), 0) AS sum FROM tbl \\gset p_
+				\\if :p_sum != $sum
+					COMMIT;
+					SELECT (:p_sum) / 0;
+				\\endif
+
+				COMMIT;
+			\\endif
+	)
+});
+
+$node->stop;
+
+done_testing();
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index c74941bfbf2..c5b1b79a60c 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -386,6 +386,22 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>stress_concurrently</literal><optional>=VALUE</optional></term>
+     <listitem>
+      <para>
+       Run some additional, moderately expensive tests for commands with a
+       <literal>CONCURRENTLY</literal> option.  Optionally, an integer can
+       be given after an equals sign, which determines how long each such
+       test runs for.
+       Each test is calibrated so that, when given a value of 10,
+       it runs for approximately 60 seconds.
+       If no value is given, 1 is assumed.
+       Explicitly giving a value of 0 causes the test to be skipped.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><literal>wal_consistency_checking</literal></term>
      <listitem>
-- 
2.47.3
From 6b00fa00f91af2e6a5bf613242158f19e263cc17 Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Sat, 13 Dec 2025 18:46:46 +0100
Subject: [PATCH 2/3] one more stress test for repack concurrently

---
 contrib/amcheck/meson.build       |   1 +
 contrib/amcheck/t/008_repack_2.pl | 111 ++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)
 create mode 100644 contrib/amcheck/t/008_repack_2.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index cb4bc32e98a..6799df214f9 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -51,6 +51,7 @@ tests += {
       't/005_pitr.pl',
       't/006_verify_gin.pl',
       't/007_repack_1.pl',
+      't/008_repack_2.pl',
     ],
   },
 }
diff --git a/contrib/amcheck/t/008_repack_2.pl b/contrib/amcheck/t/008_repack_2.pl
new file mode 100644
index 00000000000..a287ee2bb71
--- /dev/null
+++ b/contrib/amcheck/t/008_repack_2.pl
@@ -0,0 +1,111 @@
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+# PG_TEST_EXTRA carries a numerical stress value; 0 disables the test,
+# 1 means the test takes about 6 seconds, and the increase should be
+# roughly linear.
+my $matched = ($ENV{PG_TEST_EXTRA} // '') =~ /\bstress_concurrently(?:=(?<stressval>[0-9]*))?\b/;
+my $stressval = $+{stressval} // 1;
+if (!$matched or $stressval == 0)
+{
+	plan skip_all => 'skipping disabled REPACK CONCURRENTLY stress test';
+}
+
+my $node;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('CIC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+	'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+
+my $duration = 6 * $stressval;
+my $no_hot = int(rand(2));
+
+$node->start;
+$node->safe_psql('postgres', q(CREATE TABLE tbl(id SERIAL PRIMARY KEY, val int)));
+if ($no_hot)
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(val);));
+}
+else
+{
+	$node->safe_psql('postgres', q(CREATE INDEX test_idx ON tbl(id);));
+}
+
+# Load amcheck
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+my $sum = $node->safe_psql('postgres', q(
+	SELECT SUM(val) AS sum FROM tbl
+));
+
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE last_j START 1 INCREMENT 1;));
+
+
+$node->pgbench(
+	"--no-vacuum --client=30 --jobs=4 --exit-on-abort -T $duration",
+	0,
+	[qr{actually processed}],
+	[qr{^$}],
+	'concurrent operations with REPACK CONCURRENTLY',
+	{
+		'concurrent_ops' => qq(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \\gset
+			\\if :gotlock
+				REPACK (CONCURRENTLY) tbl USING INDEX tbl_pkey;
+				SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+				SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+				\\sleep 10 ms
+
+				REPACK (CONCURRENTLY) tbl USING INDEX test_idx;
+				SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+				SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+				\\sleep 10 ms
+
+				REPACK (CONCURRENTLY) tbl;
+				SELECT bt_index_parent_check('tbl_pkey', heapallindexed => true);
+				SELECT bt_index_parent_check('test_idx', heapallindexed => true);
+				\\sleep 10 ms
+
+				SELECT pg_advisory_unlock(42);
+			\\else
+				SELECT pg_advisory_lock(43);
+				BEGIN;
+				INSERT INTO tbl(val) VALUES (nextval('last_j')) RETURNING val AS j \\gset p_
+				COMMIT;
+				SELECT pg_advisory_unlock(43);
+				\\sleep 1 ms
+
+				BEGIN
+				--TRANSACTION ISOLATION LEVEL REPEATABLE READ
+				;
+				SELECT 1;
+				\\sleep 1 ms
+				SELECT COUNT(*) AS count FROM tbl WHERE val <= :p_j \\gset p_
+				\\if :p_count != :p_j
+					COMMIT;
+					SELECT (:p_count) / 0;
+				\\endif
+
+				COMMIT;
+			\\endif
+	)
+});
+
+$node->stop;
+done_testing();
-- 
2.47.3
From 0f74853d7134164457977fda41e1776455fdb6a6 Mon Sep 17 00:00:00 2001
From: Antonin Houska <[email protected]>
Date: Wed, 6 May 2026 09:38:11 +0200
Subject: [PATCH 3/3] Distinguish properly when database-specific transaction
 list should be used.

Currently, a decoding session that relies on database-specific
xl_running_xacts WAL records can also use some information from
cluster-wide records, and vice versa. Although this approach might
reduce the total number of xl_running_xacts records, it's simply not
correct.

SnapBuildProcessRunningXacts() now decides at the very beginning
whether a particular record can be used or not.
---
 src/backend/replication/logical/snapbuild.c | 55 +++++++++++++--------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index c8309b96ed4..b9661b7d70a 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1157,6 +1157,39 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
 	ReorderBufferTXN *txn;
 	TransactionId xmin;
 
+	/*
+	 * Each decoding session should use either cluster-wide snapshots or
+	 * database-specific ones, but not both. Since the latter can have a more
+	 * recent value of oldestRunningXid, builder->xmin could go backwards if
+	 * it was followed by a cluster-wide snapshot - see the ->xmin adjustment
+	 * below.
+	 */
+	if (!db_specific && OidIsValid(running->dbid))
+		return;
+
+	if (db_specific)
+	{
+		/*
+		 * Make sure that we have the snapshots needed for startup, as well as
+		 * those for regular cleanup: each time a cluster-wide snapshot is
+		 * created (typically on slot creation or by the checkpointer), the
+		 * database-specific snapshot is requested here if the current session
+		 * needs it.
+		 */
+		if (!OidIsValid(running->dbid))
+		{
+			LogStandbySnapshot(MyDatabaseId);
+
+			return;
+		}
+		/*
+		 * Snapshots issued for other databases do not contain the information
+		 * about transactions in our database.
+		 */
+		else if (running->dbid != MyDatabaseId)
+			return;
+	}
+
 	/*
 	 * If we're not consistent yet, inspect the record to see whether it
 	 * allows to get closer to being consistent. If we are consistent, dump
@@ -1171,17 +1204,7 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
 	 */
 	if (db_specific)
 	{
-		/*
-		 * If we must only keep track of transactions running in the
-		 * current database, we need transaction info from exactly that
-		 * database.
-		 */
-		if (running->dbid != MyDatabaseId)
-		{
-			LogStandbySnapshot(MyDatabaseId);
-
-			return;
-		}
+		Assert(running->dbid == MyDatabaseId);
 
 		/*
 		 * We'd better be able to check during scan if the plugin does not
@@ -1198,16 +1221,6 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
 	else
 		SnapBuildSerialize(builder, lsn);
 
-	/*
-	 * Database specific transaction info may exist to reach CONSISTENT state
-	 * faster, however the code below makes no use of it. Moreover, such
-	 * record might cause problems because the following normal (cluster-wide)
-	 * record can have lower value of oldestRunningXid. In that case, let's
-	 * wait with the cleanup for the next regular cluster-wide record.
-	 */
-	if (OidIsValid(running->dbid))
-		return;
-
 	/*
 	 * Update range of interesting xids based on the running xacts
 	 * information. We don't increase ->xmax using it, because once we are in
-- 
2.47.3
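To summarize what 0003 changes: the record-selection rules it installs at the
top of SnapBuildProcessRunningXacts() amount to the following control flow (a
Python sketch for illustration, not the actual C; InvalidOid is represented
as 0 and the return strings are made up):

```python
INVALID_OID = 0  # stand-in for PostgreSQL's InvalidOid

def classify_running_xacts(db_specific, record_dbid, my_dbid):
    """Sketch of 0003's early-exit logic: decide up front whether a decoding
    session may use a given xl_running_xacts record at all."""
    # A cluster-wide session must ignore db-specific records: their
    # oldestRunningXid can be more recent, so using one could make
    # builder->xmin go backwards at the next cluster-wide record.
    if not db_specific and record_dbid != INVALID_OID:
        return 'skip'
    if db_specific:
        if record_dbid == INVALID_OID:
            # A cluster-wide record was logged (slot creation, checkpointer);
            # ask for a db-specific one via LogStandbySnapshot(MyDatabaseId).
            return 'skip, request db-specific snapshot'
        if record_dbid != my_dbid:
            # Another database's record says nothing about our transactions.
            return 'skip'
    return 'use'

print(classify_running_xacts(False, 0, 5))  # use
print(classify_running_xacts(False, 7, 5))  # skip
print(classify_running_xacts(True, 0, 5))   # skip, request db-specific snapshot
print(classify_running_xacts(True, 5, 5))   # use
```

The point of doing this at the very beginning of the function is that a
session then consumes exclusively one kind of record, instead of mixing
information from both as the base code did.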
