Multixid SLRU truncation bugs at wraparound

Heikki Linnakangas Fri, 07 Nov 2025 07:33:42 -0800

While working on the reported pg_upgrade failure at multixid wraparound[1], I bumped into another bug related to multixid wraparound. If you run vacuum freeze, and it advances oldestMultiXactId, and nextMulti has just wrapped around to 0, you get this in the log:

LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk


Culprit: TruncateMultiXact does this:

    LWLockAcquire(MultiXactGenLock, LW_SHARED);
    nextMulti = MultiXactState->nextMXact;
    nextOffset = MultiXactState->nextOffset;
    oldestMulti = MultiXactState->oldestMultiXactId;
    LWLockRelease(MultiXactGenLock);
    Assert(MultiXactIdIsValid(oldestMulti));

    ...


    /*
     * First, compute the safe truncation point for MultiXactMember. This is
     * the starting offset of the oldest multixact.
     *
     * Hopefully, find_multixact_start will always work here, because we've
     * already checked that it doesn't precede the earliest MultiXact on disk.
     * But if it fails, don't truncate anything, and log a message.
     */
    if (oldestMulti == nextMulti)
    {
        /* there are NO MultiXacts */
        oldestOffset = nextOffset;
    }
    else if (!find_multixact_start(oldestMulti, &oldestOffset))
    {
        ereport(LOG,
                (errmsg("oldest MultiXact %u not found, earliest MultiXact %u, skipping truncation",
                        oldestMulti, earliest)));
        LWLockRelease(MultiXactTruncationLock);
        return;
    }

Scenario 1: In the buggy scenario, oldestMulti is 1 and nextMulti is 0. We should take the "there are NO MultiXacts" codepath in that case, because we skip over 0 when assigning multixids. Instead, we call find_multixact_start with oldestMulti==1, which returns false because multixid 1 hasn't been assigned and the SLRU segment doesn't exist yet. There's a similar bug in SetOffsetVacuumLimit().
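To make the arithmetic concrete, here is a toy, single-process model of the counter. The toy_* names are hypothetical, there is no locking and no SLRU I/O, and it assumes (as "we skip over 0 when assigning multixids" implies) that a stored nextMXact of 0 is read as FirstMultiXactId and then incremented with ordinary uint32 wraparound. Consuming UINT32_MAX leaves the raw counter at 0, and the pre-patch equality test then claims multixacts exist even though oldestMulti (1) is exactly the next multixid to be handed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

#define InvalidMultiXactId ((MultiXactId) 0)
#define FirstMultiXactId   ((MultiXactId) 1)

/* Toy stand-in for MultiXactState->nextMXact (no locking, single process). */
static MultiXactId toy_nextMXact;

/*
 * Hand out one multixid, loosely following GetNewMultiXactId(): a stored
 * value of 0 is treated as FirstMultiXactId, then the counter is incremented
 * with unsigned wraparound, so it can be left at 0 afterwards.
 */
static MultiXactId
toy_assign_multixid(void)
{
    if (toy_nextMXact < FirstMultiXactId)
        toy_nextMXact = FirstMultiXactId;

    MultiXactId result = toy_nextMXact;

    toy_nextMXact++;            /* UINT32_MAX + 1 wraps to 0 */
    assert(result != InvalidMultiXactId);
    return result;
}

/* The pre-patch "there are NO MultiXacts" test: compares the raw counter. */
static bool
toy_no_multixacts_prepatch(MultiXactId oldestMulti, MultiXactId nextMulti)
{
    return oldestMulti == nextMulti;
}
```

With toy_nextMXact starting at UINT32_MAX, one assignment returns UINT32_MAX and leaves the counter at 0; toy_no_multixacts_prepatch(1, 0) is then false, so the code falls through to find_multixact_start(1, ...).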

Scenario 2: In scenario 1 we just fail to truncate the SLRUs and you get the log message. But I think there might be more serious variants of this. If the SLRU segment exists but the offset for multixid 1 hasn't been set yet, find_multixact_start() will succeed and report an offset of 0 instead, and we will proceed with the truncation based on the incorrect oldestOffset==0 value, possibly removing SLRU segments that are still needed.
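The difference between scenarios 1 and 2 comes down to what the lookup returns. The toy model below is loosely patterned on find_multixact_start() and is not the real implementation: it assumes 8 kB pages, 4-byte MultiXactOffset values and 32 pages per segment, tracks a single segment in memory, and uses hypothetical toy_* helpers. A missing segment makes the lookup fail (the "skipping truncation" path); an existing but never-written slot makes it succeed with offset 0, the value scenario 2 would feed into the truncation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;       /* assumed 4-byte offsets */

/* Assumed defaults: 8 kB pages, 32 pages per SLRU segment. */
#define TOY_BLCKSZ            8192
#define TOY_PAGES_PER_SEGMENT 32
#define TOY_OFFSETS_PER_SEGMENT \
    ((uint32_t) (TOY_BLCKSZ / sizeof(MultiXactOffset)) * TOY_PAGES_PER_SEGMENT)

/* One offsets segment; a newly created segment is zero-filled. */
typedef struct
{
    bool            exists;
    uint32_t        segno;
    MultiXactOffset slots[TOY_OFFSETS_PER_SEGMENT];
} ToyOffsetsSegment;

/* Static so the 256 kB slot array is not placed on the stack. */
static ToyOffsetsSegment toy_seg;

static void
toy_create_segment(uint32_t segno)
{
    memset(&toy_seg, 0, sizeof(toy_seg));
    toy_seg.exists = true;
    toy_seg.segno = segno;
}

static void
toy_record_offset(MultiXactId multi, MultiXactOffset offset)
{
    assert(toy_seg.exists && toy_seg.segno == multi / TOY_OFFSETS_PER_SEGMENT);
    toy_seg.slots[multi % TOY_OFFSETS_PER_SEGMENT] = offset;
}

/*
 * Fails only when the segment is missing (scenario 1). A slot that was never
 * written silently reads back as 0 and the lookup still succeeds (scenario 2).
 */
static bool
toy_find_multixact_start(MultiXactId multi, MultiXactOffset *result)
{
    if (!toy_seg.exists || toy_seg.segno != multi / TOY_OFFSETS_PER_SEGMENT)
        return false;
    *result = toy_seg.slots[multi % TOY_OFFSETS_PER_SEGMENT];
    return true;
}
```

The caller cannot tell a zero-filled slot apart from a real offset, which is what makes scenario 2 more dangerous than scenario 1.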


Attached is a fix for scenarios 1 and 2, and a test case for scenario 1.
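The core of the fix is a clamp applied before the "there are NO MultiXacts" comparison, mirroring the two lines the patch adds to SetOffsetVacuumLimit() and TruncateMultiXact(). A standalone sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

#define FirstMultiXactId ((MultiXactId) 1)

/* Same clamp the patch adds: a wrapped-around counter of 0 means 1. */
static MultiXactId
clamp_next_multi(MultiXactId nextMulti)
{
    if (nextMulti < FirstMultiXactId)
        nextMulti = FirstMultiXactId;
    return nextMulti;
}

/* "there are NO MultiXacts" check, evaluated on the clamped value. */
static bool
no_multixacts(MultiXactId oldestMulti, MultiXactId nextMulti)
{
    return oldestMulti == clamp_next_multi(nextMulti);
}
```

With the clamp, oldestMulti == 1 and nextMulti == 0 take the "no multixacts" branch, so find_multixact_start() is never asked about a multixid that hasn't been assigned.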

Scenario 3: I also noticed that the above code isn't prepared for the race condition that the offset corresponding to 'oldestMulti' hasn't been stored in the SLRU yet, even without wraparound. That could theoretically happen if the backend executing MultiXactIdCreateFromMembers() gets stuck for a long time between the calls to GetNewMultiXactId() and RecordNewMultiXact(), but I think we're saved by the fact that we only create new multixids while holding a lock on a heap page, and a system-wide VACUUM FREEZE that would advance oldestMulti would need to lock the heap page too. It's scary though, because it could also lead to truncating away members SLRU segments that are still needed. The attached patch does *not* address this scenario.

[1]https://www.postgresql.org/message-id/cacg%3dezaapsmtjd%3dm2sfn5ucuggd3fg8z8qte8xq9k5-%[email protected]


- Heikki

From 557b22e931233e336704d04defee2e19c7706d1c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Fri, 7 Nov 2025 17:21:26 +0200
Subject: [PATCH 1/2] Add test for vacuuming at multixid wraparound

This currently fails. The next commit fixes the failure.

This isn't fully polished, and I'm not sure if it's worth committing.
---
 src/test/modules/test_misc/meson.build        |   1 +
 .../test_misc/t/010_mxid_wraparound.pl        | 123 ++++++++++++++++++
 2 files changed, 124 insertions(+)
 create mode 100644 src/test/modules/test_misc/t/010_mxid_wraparound.pl

diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build
index f258bf1ccd9..cf57ed21dc6 100644
--- a/src/test/modules/test_misc/meson.build
+++ b/src/test/modules/test_misc/meson.build
@@ -18,6 +18,7 @@ tests += {
       't/007_catcache_inval.pl',
       't/008_replslot_single_user.pl',
       't/009_log_temp_files.pl',
+      't/010_mxid_wraparound.pl',
     ],
   },
 }
diff --git a/src/test/modules/test_misc/t/010_mxid_wraparound.pl b/src/test/modules/test_misc/t/010_mxid_wraparound.pl
new file mode 100644
index 00000000000..487cb71eacc
--- /dev/null
+++ b/src/test/modules/test_misc/t/010_mxid_wraparound.pl
@@ -0,0 +1,123 @@
+#
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub print_controldata_info
+{
+	my $node = shift;
+	my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+
+	foreach (split("\n", $stdout))
+	{
+		if ($_ =~ /^Latest checkpoint's Next\s*(.*)$/mg or
+			$_ =~ /^Latest checkpoint's oldest\s*(.*)$/mg)
+		{
+			print $_."\n";
+		}
+	}
+}
+
+sub create_mxid
+{
+	my $node = shift;
+	my $conn1 = $node->background_psql('postgres');
+	my $conn2 = $node->background_psql('postgres');
+
+	$conn1->query_safe(qq(
+		BEGIN;
+		SELECT * FROM test_table WHERE id = 1 FOR SHARE;
+	));
+	$conn2->query_safe(qq(
+		BEGIN;
+		SELECT * FROM test_table WHERE id = 1 FOR SHARE;
+	));
+
+	$conn1->query_safe(qq(COMMIT;));
+	$conn2->query_safe(qq(COMMIT;));
+
+	$conn1->quit;
+	$conn2->quit;
+}
+
+# 1) Create test cluster
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+
+$node->start;
+
+$node->safe_psql('postgres',
+qq(
+	CREATE TABLE test_table (id integer NOT NULL PRIMARY KEY, val text);
+	INSERT INTO test_table VALUES (1, 'a');
+));
+
+create_mxid($node);
+
+$node->safe_psql('postgres', qq(UPDATE pg_database SET datallowconn = TRUE WHERE datname = 'template0';));
+$node->stop;
+
+# 2) Advance mxid to UINT32_MAX. We do it in three steps, with vacuums in between, to avoid
+# causing a situation where datminmxid has already wrapped around
+
+# Step 1
+command_ok(
+	[ 'pg_resetwal', '-m', '1492123648,1', $node->data_dir ],
+	'approaching the mxid limit');
+$node->start;
+create_mxid($node);
+$node->command_ok([ 'vacuumdb', '-a', '--freeze' ], 'vacuum all databases');
+$node->stop;
+
+print ">>> pg_controldata: \n";
+print_controldata_info($node);
+
+# Step 2
+command_ok(
+	[ 'pg_resetwal', '-m', '2984247296,1492123648', $node->data_dir ],
+	'approaching the mxid limit');
+$node->start;
+create_mxid($node);
+$node->command_ok([ 'vacuumdb', '-a', '--freeze' ], 'vacuum all databases');
+$node->stop;
+
+# Step 3. This finally gets us to UINT32_MAX.
+command_ok(
+	[ 'pg_resetwal', '-m', '4294967295,2984247296', $node->data_dir ],
+	'approaching the mxid limit');
+
+print ">>> pg_controldata: \n";
+print_controldata_info($node);
+
+# The last step advances nextMulti to a value that's not at the beginning of an SLRU segment,
+# and Postgres expects the segment file to already exist. Create it.
+my $offsets_seg = $node->data_dir . '/pg_multixact/offsets/FFFF';
+open my $fh1, '>', $offsets_seg or BAIL_OUT($!);
+binmode $fh1;
+print $fh1 pack("x[262144]");
+close $fh1;
+
+
+$node->start;
+create_mxid($node);
+$node->command_ok([ 'vacuumdb', '-a', '--freeze' ], 'vacuum all databases');
+is($node->safe_psql('postgres', qq(TABLE test_table;)),
+	'1|a',
+	'check table contents');
+$node->stop;
+
+ok( !$node->log_contains("wraparound protections are disabled"),
+	"check that log doesn't contain 'wraparound protections are disabled'");
+
+ok( !$node->log_contains("cannot truncate up to MultiXact"),
+	"check that log doesn't contain 'cannot truncate up to MultiXact'");
+
+ok( !$node->log_contains("skipping truncation"),
+	"check that log doesn't contain 'skipping truncation'");
+
+done_testing();
-- 
2.47.3
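For reference, the FFFF file name and the 262144-byte size used in the test follow from the SLRU layout. This sketch assumes the defaults at the time of this patch (BLCKSZ 8192, 4-byte MultiXactOffset, 32 pages per SLRU segment) and uses hypothetical helper names; with other settings the numbers change. Multixid 4294967295 lands in segment 0xFFFF at entry 65535, the last slot of the segment rather than the first, which is why the test has to create the file before starting the server.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;       /* assumed 4 bytes, as at the time of this patch */

#define ASSUMED_BLCKSZ            8192  /* default block size */
#define ASSUMED_PAGES_PER_SEGMENT 32    /* default pages per SLRU segment */
#define OFFSETS_PER_PAGE    ((uint32_t) (ASSUMED_BLCKSZ / sizeof(MultiXactOffset)))  /* 2048 */
#define OFFSETS_PER_SEGMENT (OFFSETS_PER_PAGE * ASSUMED_PAGES_PER_SEGMENT)         /* 65536 */
#define SEGMENT_BYTES       (ASSUMED_BLCKSZ * ASSUMED_PAGES_PER_SEGMENT)           /* 262144 */

/* Offsets segment that holds the entry for 'multi'. */
static uint32_t
offsets_segno(MultiXactId multi)
{
    uint32_t pageno = multi / OFFSETS_PER_PAGE;

    return pageno / ASSUMED_PAGES_PER_SEGMENT;
}

/* Position of 'multi' within its segment (0 = first entry). */
static uint32_t
offsets_entry_in_segment(MultiXactId multi)
{
    return multi % OFFSETS_PER_SEGMENT;
}

/* Four-hex-digit segment file name, e.g. "FFFF". */
static void
offsets_segment_name(uint32_t segno, char *buf, size_t len)
{
    snprintf(buf, len, "%04X", (unsigned int) segno);
}
```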

From 757483c3446dfd4566da32079d7ed45cf73ee0bc Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Fri, 7 Nov 2025 17:06:07 +0200
Subject: [PATCH 2/2] Fix truncation of multixid SLRUs at wraparound

SetOffsetVacuumLimit() and TruncateMultiXact() have checks for
MultiXactState->nextMXact == MultiXactState->oldestMultiXactId.
However, those checks didn't work as intended at wraparound. When the
last multixid before wraparound (UINT32_MAX) is consumed,
MultiXactState->nextMXact is advanced to 0, but because 0 is not a
valid multixid, all code that reads MultiXactState->nextMXact treats 0
as if the value were 1, except for the checks in SetOffsetVacuumLimit()
and TruncateMultiXact().

As a result, at exactly multixid wraparound, VACUUM would fail to
truncate multixact SLRUs, or worse, it might truncate the offsets SLRU
incorrectly. I think the incorrect truncation is possible if a new
multixid is assigned concurrently just as vacuum reads the offsets
SLRU. The failure to truncate is easier to reproduce, but less
serious.

Discussion: XXX
---
 src/backend/access/transam/multixact.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9d5f130af7e..735486f9df7 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2673,6 +2673,9 @@ SetOffsetVacuumLimit(bool is_startup)
 	Assert(MultiXactState->finishedStartup);
 	LWLockRelease(MultiXactGenLock);

+	if (nextMXact < FirstMultiXactId)
+		nextMXact = FirstMultiXactId;
+
 	/*
 	 * Determine the offset of the oldest multixact.  Normally, we can read
 	 * the offset from the multixact itself, but there's an important special
@@ -3075,6 +3078,9 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	LWLockRelease(MultiXactGenLock);
 	Assert(MultiXactIdIsValid(oldestMulti));

+	if (nextMulti < FirstMultiXactId)
+		nextMulti = FirstMultiXactId;
+
 	/*
 	 * Make sure to only attempt truncation if there's values to truncate
 	 * away. In normal processing values shouldn't go backwards, but there's
-- 
2.47.3
