On Thu, 30 Oct 2025 at 12:10, Heikki Linnakangas <[email protected]> wrote:

>
> Oh I see, the 'base' is not necessarily the base offset of the first
> multixact on the page, it's the base offset of the first multixid that
> is written to the page. And the (short) offsets can be negative. That's
> a frighteningly clever encoding scheme. One upshot of that is that WAL
> redo might construct the page with a different 'base'. I guess that
> works, but it scares me. Could we come up with a more deterministic scheme?

Definitely! The most stable approach is the one we had before, which
used actual 64-bit offsets in the SLRU. To be honest, I'm completely
happy with it. After all, what's most important for me is to have 64-bit
xids in Postgres, and this patch is a step towards that goal.

PFA v20 returns to using actual 64-bit offsets for on-disk SLRU
segments.

Fortunately, now that I've separated reading and writing offsets into
different functions, switching from one implementation to another is
easy to do.

Here's a quick overview of the current state of the patch:
1) Access to offsets now goes through separate calls:
   MXOffsetWrite/MXOffsetRead.
2) I abandoned the byte juggling in pg_upgrade and moved to logic that
   replicates the offset handling in multixact.c.
3) As a result, the upgrade issue comes down to a correct implementation
   of MXOffsetWrite/MXOffsetRead (see the sketch after this list).
4) The only remaining question is the on-disk representation of 64-bit
   offsets in SLRU segments.
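
With plain 64-bit on-disk offsets, these two calls collapse into trivial
array loads and stores. A minimal sketch of the idea (the function names
are the patch's, but the exact signatures here are my assumption; entryno
is the position of a multixid within its offsets page):

    static inline MultiXactOffset
    MXOffsetRead(void *page, int entryno)
    {
        return ((uint64 *) page)[entryno];
    }

    static inline void
    MXOffsetWrite(void *page, int entryno, MultiXactOffset offset)
    {
        ((uint64 *) page)[entryno] = offset;
    }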

My thoughts on point (4).

Using 32-bit offsets + some kind of packing (sketched below):
Pros:
 + Reduces the total disk space used by the segments; ideally it stays
   almost the same as before.
Cons:
 - Reduces reliability (losing one piece, e.g. the shared base, most
   likely means losing the entire page).
 - Complicates the code, especially considering that offsets may be
   written to the page in arbitrary order.
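
For reference, the packed scheme would look roughly like this. This is
only my reconstruction of the idea, with made-up names, not code from
any version of the patch:

    /*
     * One shared 64-bit base per page plus a signed 32-bit delta per
     * entry.  'base' is the offset of the first multixid *written* to
     * the page, not of the page's first multixid, which is why WAL redo
     * may rebuild the page with a different base.
     */
    typedef struct MultiXactOffsetsPagePacked
    {
        uint64      base;
        int32       delta[FLEXIBLE_ARRAY_MEMBER];
    } MultiXactOffsetsPagePacked;

    static inline MultiXactOffset
    MXOffsetReadPacked(MultiXactOffsetsPagePacked *page, int entryno)
    {
        /* deltas are negative for multixids numbered below 'base' */
        return page->base + page->delta[entryno];
    }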

Using 64-bit offsets in SLRU:
Pros:
 + Easy to implement/transparent logic.
Cons:
 - Increases the amount of disk space used.

In terms of speed, I'm not sure which would be faster. On the one hand,
64-bit offsets eliminate the need for extra calculations and branching.
On the other hand, the amount of data read and written increases.
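
For scale, assuming the default 8 kB block size: an offsets page holds
8192 / 4 = 2048 32-bit entries but only 8192 / 8 = 1024 64-bit ones, so
the offsets SLRU roughly doubles in size. The members SLRU, which
normally dominates the multixact disk footprint, is unaffected either
way.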

I am not opposed to either option, as our primary goal is getting 64-bit
offsets. However, I like the approach using full 64-bit offsets in SLRU,
because it is clearer and, shall we say, more robust. Yes, it will
increase the number of segments, but this is not heap data for a table.
Under typical circumstances, there should not be too many such segments.

-- 
Best regards,
Maxim Orlov.
From dd45bfb97be126c03a1c4e41f5794c5726dcd413 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Wed, 29 Oct 2025 14:19:56 +0300
Subject: [PATCH v20 5/5] TEST: Add test for 64-bit mxoff in pg_upgrade

---
 src/bin/pg_upgrade/t/007_mxoff.pl | 461 ++++++++++++++++++++++++++++++
 1 file changed, 461 insertions(+)
 create mode 100644 src/bin/pg_upgrade/t/007_mxoff.pl

diff --git a/src/bin/pg_upgrade/t/007_mxoff.pl b/src/bin/pg_upgrade/t/007_mxoff.pl
new file mode 100644
index 00000000000..26fd6e9c5d0
--- /dev/null
+++ b/src/bin/pg_upgrade/t/007_mxoff.pl
@@ -0,0 +1,461 @@
+
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use Math::BigInt;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::AdjustDump;
+use PostgreSQL::Test::AdjustUpgrade;
+use Test::More;
+
+# This test exercises different multixact states across pg_upgrade,
+# similarly to 002_pg_upgrade.pl.
+
+unless (defined($ENV{oldinstall}))
+{
+       plan skip_all => 'to run this test, set the oldinstall environment variable to a pre-64-bit-mxoff cluster';
+}
+
+# Temp dir for dumps.
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+
+# Can be changed to test the other modes.
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Get NextMultiOffset.
+sub next_mxoff
+{
+       my $node = shift;
+
+       my $pg_controldata_path =
+               defined($node->install_path) ?
+                       $node->install_path . '/bin/pg_controldata' :
+                       'pg_controldata';
+       my ($stdout, $stderr) = run_command([ $pg_controldata_path,
+                                             $node->data_dir ]);
+       my @control_data = split("\n", $stdout);
+       my $next_mxoff = undef;
+
+       foreach (@control_data)
+       {
+               if ($_ =~ /^Latest checkpoint's NextMultiOffset:\s*(.*)$/mg)
+               {
+                       $next_mxoff = $1;
+                       last;
+               }
+       }
+       die "NextMultiOffset not found in control file\n"
+               unless defined($next_mxoff);
+
+       return $next_mxoff;
+}
+
+# Consume around 10k mxoffsets.
+sub mxact_eater
+{
+       my $node = shift;
+       my $tbl = 'FOO';
+
+       my ($mxoff1, $mxoff2);
+
+       $mxoff1 = next_mxoff($node);
+       $node->start;
+       $node->safe_psql('postgres',
+               "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+               "       WITH (AUTOVACUUM_ENABLED=FALSE);" .
+               "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) 
G;");
+
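+       # Overlapping row locks from concurrent sessions are recorded as
+       # multixacts, so the FOR KEY SHARE traffic below consumes mxoffsets.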
+       # consume around 10k mxoff
+       my $nclients = 10;
+       my $update_every = 75;
+       my @connections = ();
+
+       for (0 .. $nclients - 1)
+       {
+               my $conn = $node->background_psql('postgres');
+               $conn->query_safe("BEGIN");
+
+               push(@connections, $conn);
+       }
+
+       for (my $i = 0; $i < 1000; $i++)
+       {
+               my $conn = $connections[$i % $nclients];
+
+               $conn->query_safe("COMMIT;");
+               $conn->query_safe("BEGIN");
+
+               if ($i % $update_every == 0)
+               {
+                       $conn->query_safe(
+                               "UPDATE ${tbl} SET " .
+                               "N_UPDATED = N_UPDATED + 1 " .
+                               "WHERE I = ${i} % 50");
+               }
+               else
+               {
+                       $conn->query_safe(
+                               "SELECT * FROM ${tbl} FOR KEY SHARE");
+               }
+       }
+
+       for my $conn (@connections)
+       {
+               $conn->quit();
+       }
+
+       $node->stop;
+       $mxoff2 = next_mxoff($node);
+
+       return $mxoff1, $mxoff2;
+}
+
+# Consume around 2M mxoffsets.
+sub mxact_huge_eater
+{
+       my $node = shift;
+       my $tbl = 'FOO';
+
+       my ($mxoff1, $mxoff2);
+
+       $mxoff1 = next_mxoff($node);
+       $node->start;
+       $node->safe_psql('postgres',
+               "CREATE TABLE ${tbl} (I INT PRIMARY KEY, N_UPDATED INT) " .
+               "       WITH (AUTOVACUUM_ENABLED=FALSE);" .
+               "INSERT INTO ${tbl} SELECT G, 0 FROM GENERATE_SERIES(1, 50) 
G;");
+
+       # consume around 2M mxoff
+       my $nclients = 10;
+       my $update_every = 95;
+       my @connections = ();
+       my $timeout = 10 * $PostgreSQL::Test::Utils::timeout_default;
+
+       for (0 .. $nclients - 1)
+       {
+               my $conn = $node->background_psql('postgres',
+                                         timeout => $timeout);
+               $conn->query_safe("BEGIN");
+
+               push(@connections, $conn);
+       }
+
+       # This takes a while, so report progress along the way.
+       my $n_steps = 200_000;
+       my $step = int($n_steps / 10);
+
+       diag "\nstarting to consume mxoffsets ...\n";
+       for (my $i = 0; $i < $n_steps; $i++)
+       {
+               my $conn = $connections[$i % $nclients];
+
+               $conn->query_safe("COMMIT;");
+               $conn->query_safe("BEGIN");
+
+               if ($i % $update_every == 0)
+               {
+                       # Perform some non-key UPDATEs too, to exercise
+                       # different multixact member statuses.
+                       $conn->query_safe(
+                               "UPDATE ${tbl} SET " .
+                               "N_UPDATED = N_UPDATED + 1 " .
+                               "WHERE I = ${i} % 50");
+               }
+               else
+               {
+                       $conn->query_safe(
+                               "SELECT * FROM ${tbl} " .
+                               "TABLESAMPLE SYSTEM (85) " .
+                               "FOR KEY SHARE");
+               }
+
+               if ($i % $step == 0)
+               {
+                       my $done = int(($i / $n_steps) * 100);
+                       diag "$done% done...";
+               }
+       }
+
+       for my $conn (@connections)
+       {
+               $conn->quit();
+       }
+
+       $node->stop;
+       $mxoff2 = next_mxoff($node);
+
+       return $mxoff1, $mxoff2;
+}
+
+# Set next multixact offset via pg_resetwal.
+sub reset_mxoff
+{
+       my $node = shift;
+       my $offset = shift;
+
+       my $pg_resetwal_path = $node->install_path . '/bin/pg_resetwal';
+
+       # Get block size
+       my $out = (run_command([ $pg_resetwal_path, '--dry-run',
+                                                        $node->data_dir ]))[0];
+       $out =~ /^Database block size: *(\d+)$/m or die;
+       my $blcksz = $1;
+
+       # Reset to new offset
+       my @cmd = ($pg_resetwal_path, '--pgdata' => $node->data_dir);
+       push @cmd, '--multixact-offset' => $offset;
+       command_ok(\@cmd, 'set next multixact offset');
+
+       # Fill empty pg_multixact/members segment
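+       # Members per segment: 32 pages, each holding int(blcksz / 20)
+       # four-member groups (a group is 4 flag bytes plus four 4-byte xids).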
+       my $mult = 32 * int($blcksz / 20) * 4;
+       my $segname = sprintf "%04X", $offset / $mult;
+
+       my @dd = ('dd');
+       push @dd, "if=/dev/zero";
+       push @dd, "of=" . $node->data_dir . "/pg_multixact/members/" . $segname;
+       push @dd, "bs=$blcksz";
+       push @dd, "count=32";
+       command_ok(\@dd, 'fill empty multixact-members');
+}
+
+sub get_dump_for_comparison
+{
+       my ($node, $db, $file_prefix, $adjust_child_columns) = @_;
+
+       my $dumpfile = $tempdir . '/' . $file_prefix . '.sql';
+       my $dump_adjusted = "${dumpfile}_adjusted";
+
+       open(my $dh, '>', $dump_adjusted)
+         || die "could not open $dump_adjusted for writing $!";
+
+       $node->run_log(
+               [
+                       'pg_dump', '--no-sync',
+                       '--restrict-key' => 'test',
+                       '-d' => $node->connstr($db),
+                       '-f' => $dumpfile
+               ]);
+
+       print $dh adjust_regress_dumpfile(slurp_file($dumpfile),
+               $adjust_child_columns);
+       close($dh);
+
+       return $dump_adjusted;
+}
+
+# Main test workhorse routine.
+# Make pg_upgrade, dump data and compare it.
+sub run_test
+{
+       my $tag = shift;
+       my $oldnode = shift;
+       my $newnode = shift;
+
+       command_ok(
+               [
+                       'pg_upgrade', '--no-sync',
+                       '--old-datadir' => $oldnode->data_dir,
+                       '--new-datadir' => $newnode->data_dir,
+                       '--old-bindir' => $oldnode->config_data('--bindir'),
+                       '--new-bindir' => $newnode->config_data('--bindir'),
+                       '--socketdir' => $newnode->host,
+                       '--old-port' => $oldnode->port,
+                       '--new-port' => $newnode->port,
+                       $mode,
+               ],
+               'run of pg_upgrade for new instance');
+       ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+               "pg_upgrade_output.d/ removed after pg_upgrade success");
+
+       $oldnode->start;
+       my $src_dump =
+               get_dump_for_comparison($oldnode, 'postgres',
+                               "oldnode_${tag}_dump", 0);
+       $oldnode->stop;
+
+       $newnode->start;
+       my $dst_dump =
+               get_dump_for_comparison($newnode, 'postgres',
+                               "newnode_${tag}_dump", 0);
+       $newnode->stop;
+
+       compare_files($src_dump, $dst_dump,
+               'dump outputs from original and restored regression databases match');
+}
+
+sub to_hex
+{
+       my $arg = shift;
+
+       $arg = Math::BigInt->new($arg);
+       $arg = $arg->as_hex();
+
+       return $arg;
+}
+
+# case #1: start old node from defaults
+{
+       my $tag = 1;
+       my $old =
+               PostgreSQL::Test::Cluster->new("oldnode${tag}",
+                                          install_path => $ENV{oldinstall});
+       $old->init(extra => ['-k']);
+
+       my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+       my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+       $new->init;
+
+       run_test($tag, $old, $new);
+
+       $start_mxoff = to_hex($start_mxoff);
+       $finish_mxoff = to_hex($finish_mxoff);
+
+       my $next_mxoff = to_hex(next_mxoff($new));
+
+       note ">>> case #${tag}\n" .
+                " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+                " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #2: start old node from before 32-bit wraparound
+{
+       my $tag = 2;
+       my $old =
+               PostgreSQL::Test::Cluster->new("oldnode${tag}",
+                                          install_path => $ENV{oldinstall});
+
+       $old->init(extra => ['-k']);
+       reset_mxoff($old, 0xFFFF0000);
+
+       my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+       my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+       $new->init;
+
+       run_test($tag, $old, $new);
+
+       $start_mxoff = to_hex($start_mxoff);
+       $finish_mxoff = to_hex($finish_mxoff);
+
+       my $next_mxoff = to_hex(next_mxoff($new));
+
+       note ">>> case #${tag}\n" .
+                " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+                " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #3: start old node near 32-bit wraparound and reach wraparound state.
+{
+       my $tag = 3;
+       my $old =
+               PostgreSQL::Test::Cluster->new("oldnode${tag}",
+                                          install_path => $ENV{oldinstall});
+
+       $old->init(extra => ['-k']);
+
+       reset_mxoff($old, 0xFFFFEC77);
+       my ($start_mxoff, $finish_mxoff) = mxact_eater($old);
+
+       my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+       $new->init;
+
+       run_test($tag, $old, $new);
+
+       $start_mxoff = to_hex($start_mxoff);
+       $finish_mxoff = to_hex($finish_mxoff);
+
+       my $next_mxoff = to_hex(next_mxoff($new));
+
+       note ">>> case #${tag}\n" .
+                " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+                " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #4: start old node from defaults
+{
+       my $tag = 4;
+       my $old =
+               PostgreSQL::Test::Cluster->new("oldnode${tag}",
+                                          install_path => $ENV{oldinstall});
+
+       $old->init(extra => ['-k']);
+
+       diag "test #${tag} for multiple mxoff segments";
+       my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+       my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+       $new->init;
+
+       run_test($tag, $old, $new);
+
+       $start_mxoff = to_hex($start_mxoff);
+       $finish_mxoff = to_hex($finish_mxoff);
+
+       my $next_mxoff = to_hex(next_mxoff($new));
+
+       note ">>> case #${tag}\n" .
+                " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+                " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #5: start old node from before 32-bit wraparound
+{
+       my $tag = 5;
+       my $old =
+                                          install_path => $ENV{oldinstall});
install_path => $ENV{oldinstall});
+
+       $old->init(extra => ['-k']);
+       reset_mxoff($old, 0xFFFF0000);
+
+       diag "test #${tag} for multiple mxoff segments";
+       my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+       my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+       $new->init;
+
+       run_test($tag, $old, $new);
+
+       $start_mxoff = to_hex($start_mxoff);
+       $finish_mxoff = to_hex($finish_mxoff);
+
+       my $next_mxoff = to_hex(next_mxoff($new));
+
+       note ">>> case #${tag}\n" .
+                " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+                " newnode mxoff ${next_mxoff}\n";
+}
+
+# case #6: start old node near 32-bit wraparound and reach wraparound state.
+{
+       my $tag = 6;
+       my $old =
+                                          install_path => $ENV{oldinstall});
install_path => $ENV{oldinstall});
+
+       $old->init(extra => ['-k']);
+
+       reset_mxoff($old, 0xFFFFFFFF - 1_000_000);
+
+       diag "test #${tag} for multiple mxoff segments";
+       my ($start_mxoff, $finish_mxoff) = mxact_huge_eater($old);
+
+       my $new = PostgreSQL::Test::Cluster->new("newnode${tag}");
+       $new->init;
+
+       run_test($tag, $old, $new);
+
+       $start_mxoff = to_hex($start_mxoff);
+       $finish_mxoff = to_hex($finish_mxoff);
+
+       my $next_mxoff = to_hex(next_mxoff($new));
+
+       note ">>> case #${tag}\n" .
+                " oldnode mxoff from ${start_mxoff} to ${finish_mxoff}\n" .
+                " newnode mxoff ${next_mxoff}\n";
+}
+
+done_testing();
-- 
2.43.0

Attachment: v20-0003-Add-test-for-64-bit-mxoff-in-pg_resetwal.patch
Description: Binary data

Attachment: v20-0001-Use-64-bit-multixact-offsets.patch
Description: Binary data

From 9ae48f1a66fcb2f1651a168c481214f1f9b6e201 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <[email protected]>
Date: Fri, 24 Oct 2025 11:47:50 +0300
Subject: [PATCH v20 4/5] TEST: bump catversion

To avoid constant CF-bot complaints, do the catversion bump in a separate
commit.

NOTE: keep it in sync with MULTIXACTOFFSET_FORMATCHANGE_CAT_VER
---
 src/include/catalog/catversion.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 18e95179ab6..6a13fa3cdb0 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*                                                     yyyymmddN */
-#define CATALOG_VERSION_NO     202510281
+#define CATALOG_VERSION_NO     999999999
 
 #endif
-- 
2.43.0

Attachment: v20-0002-Add-pg_upgarde-for-64-bit-multixact-offsets.patch
Description: Binary data
