On 3/13/24 19:15, Michael Paquier wrote:
On Wed, Mar 13, 2024 at 01:12:28PM +1300, David Steele wrote:

Not sure what to look for here. There are no distinct messages for crash
recovery. Perhaps there should be?

The closest thing I can think of here would be "database system was
not properly shut down; automatic recovery in progress" as we don't
have InArchiveRecovery, after checking that the canary is missing.  If
you don't like this suggestion, feel free to say so, of course :)

That works for me. I think I got it confused with "database system was interrupted..." when I was looking at the success vs. fail logs.

Sure, I added a check for the new log message when recovering with a
backup_label.

+ok($node_replica->log_contains('completed backup recovery with redo LSN'),
+   'verify backup recovery performed with backup_label');

Okay for this choice.  I was thinking first about "starting backup
recovery with redo LSN", closer to the area where the backup_label is
read.

I think you are right that the start message is better, since it can only appear once, when the backup_label is found. The completed message could in theory appear again after a restart, though the backup_label must still have been found at some point.
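For what it's worth, the "only once" property is easy to see with a plain substring count over the server log. This is just an illustrative sketch with an invented log excerpt (the LSNs are made up); the real test uses log_contains() against $node_replica's log file:

```shell
# Invented excerpt of a replica's log; only the message text matches reality.
log='LOG:  starting backup recovery with redo LSN 0/2000028
LOG:  completed backup recovery with redo LSN 0/2000028 and end LSN 0/2000168'

# The "starting" message is emitted only when backup_label is read at
# startup, so it can match at most once; count occurrences the same way
# log_contains() would match the pattern.
starts=$(printf '%s\n' "$log" | grep -c 'starting backup recovery with redo LSN')
echo "starts=$starts"
```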

Regards,
-David
From 91f8e8a390713760116990be05d96f8a7e0d1bde Mon Sep 17 00:00:00 2001
From: David Steele <da...@pgmasters.net>
Date: Wed, 13 Mar 2024 20:02:40 +0000
Subject: Add basic tests for the low-level backup method.

There are currently no tests for the low-level backup method. pg_backup_start()
and pg_backup_stop() are called but not exercised in any real fashion.

There is a lot more that can be done, but this at least provides some basic
tests and provides a place for future improvement.
---
 src/test/recovery/t/042_low_level_backup.pl | 114 ++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 src/test/recovery/t/042_low_level_backup.pl

diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl
new file mode 100644
index 0000000000..76bac7e324
--- /dev/null
+++ b/src/test/recovery/t/042_low_level_backup.pl
@@ -0,0 +1,114 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test low-level backup method by using pg_backup_start() and pg_backup_stop()
+# to create backups.
+
+use strict;
+use warnings;
+
+use File::Copy qw(copy);
+use File::Path qw(remove_tree);
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Start primary node with archiving
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(has_archiving => 1, allows_streaming => 1);
+$node_primary->start;
+
+# Start backup
+my $backup_name = 'backup1';
+my $psql = $node_primary->background_psql('postgres');
+
+$psql->query_safe("SET client_min_messages TO WARNING");
+$psql->set_query_timer_restart();
+$psql->query_safe("select pg_backup_start('test label')");
+
+# Copy files
+my $backup_dir = $node_primary->backup_dir() . '/' . $backup_name;
+
+PostgreSQL::Test::RecursiveCopy::copypath(
+       $node_primary->data_dir(), $backup_dir);
+
+# Clean up some files/paths that should not be in the backup. There is no
+# attempt to handle all the exclusions done by pg_basebackup here, in part
+# because they are not required, but also to keep the test simple.
+#
+# Also remove pg_control because it needs to be copied later and it will be
+# obvious if the copy fails.
+unlink("$backup_dir/postmaster.pid")
+       or BAIL_OUT("unable to unlink $backup_dir/postmaster.pid");
+unlink("$backup_dir/postmaster.opts")
+       or BAIL_OUT("unable to unlink $backup_dir/postmaster.opts");
+unlink("$backup_dir/global/pg_control")
+       or BAIL_OUT("unable to unlink $backup_dir/global/pg_control");
+
+remove_tree("$backup_dir/pg_wal", {keep_root => 1})
+       or BAIL_OUT("unable to remove contents of $backup_dir/pg_wal");
+
+# Create a table that will be used to verify that recovery started at the
+# correct location, rather than the location recorded in pg_control.
+$psql->query_safe("create table canary (id int)");
+
+# Advance the checkpoint location in pg_control past the location where the
+# backup started. Switch the WAL to make it really obvious that the location
+# is different and to put the checkpoint in a new WAL segment.
+$psql->query_safe("select pg_switch_wal()");
+$psql->query_safe("checkpoint");
+
+# Copy pg_control last so it contains the new checkpoint
+copy($node_primary->data_dir() . '/global/pg_control',
+               "$backup_dir/global/pg_control")
+       or BAIL_OUT("unable to copy global/pg_control");
+
+# Stop backup and get backup_label
+my $backup_label = $psql->query_safe("select labelfile from pg_backup_stop()");
+
+# Rather than writing out backup_label, try to recover the backup without
+# backup_label to demonstrate that recovery will not work correctly without it,
+# i.e. the canary table will be missing and the cluster will be corrupt. Provide
+# only the WAL segment that recovery will think it needs.
+#
+# The point of this test is to explicitly demonstrate that backup_label is
+# being used in a later test to get the correct recovery info.
+my $node_replica = PostgreSQL::Test::Cluster->new('replica_fail');
+$node_replica->init_from_backup($node_primary, $backup_name);
+my $canary_query = "select count(*) from pg_class where relname = 'canary'";
+
+copy($node_primary->archive_dir() . '/000000010000000000000003',
+               $node_replica->data_dir() . '/pg_wal/000000010000000000000003')
+       or BAIL_OUT("unable to copy 000000010000000000000003");
+
+$node_replica->start;
+
+ok($node_replica->safe_psql('postgres', $canary_query) == 0,
+       'canary is missing');
+
+# Check the log to ensure crash recovery was used because backup_label was
+# missing.
+ok($node_replica->log_contains(
+       'database system was not properly shut down; automatic recovery in progress'),
+       'verify backup recovery performed with crash recovery');
+
+$node_replica->teardown_node();
+$node_replica->clean_node();
+
+# Save backup_label into the backup directory and recover using the primary's
+# archive. This time recovery will succeed and the canary table will be
+# present.
+append_to_file("$backup_dir/backup_label", $backup_label);
+
+$node_replica = PostgreSQL::Test::Cluster->new('replica_success');
+$node_replica->init_from_backup($node_primary, $backup_name,
+       has_restoring => 1);
+$node_replica->start;
+
+ok($node_replica->safe_psql('postgres', $canary_query) == 1,
+       'canary is present');
+
+# Check the log to ensure backup_label was used for recovery.
+ok($node_replica->log_contains('starting backup recovery with redo LSN'),
+       'verify backup recovery performed with backup_label');
+
+done_testing();
-- 
2.34.1
