On Fri, Oct 2, 2015 at 11:10 PM, Amir Rohan wrote:
> On 10/02/2015 03:33 PM, Michael Paquier wrote:
> > Any server instances created during the tests should never use a
> > user-defined port, for portability. Hence using those ports as keys
> > just made sense. We could have, for example, custom names that have
> > port values assigned to them, but that's actually overkill and
> > complicates the whole facility.
>
> Something like:
>
>     global nPortsAssigned = 0;
>     AssignPort() -> return is_ok(nPortsAssigned++)
>
> was what I used.
Why do you need that? Creating a node is, at its most basic, a matter of calling make_master or make_*_standby, where a port number or identifier gets uniquely assigned to a node. The main point of the approach taken by this patch is to make port assignment transparent for the caller.

> >> 4) Port assignment relies on liveness checks on running servers.
> >> If a server is shut down and a new one instantiated, the port will get
> >> reused, data will get trashed, and various confusing things can happen.
> >
> > Right. The safest way to do that is to check in get_free_port if a
> > port number is used by a registered node, and continue to loop if
> > that's the case. So done.
>
> That eliminates the "sweet and gentle" variant of the scenario, but it's
> susceptible to the "ABA problem":
> https://en.wikipedia.org/wiki/ABA_problem
> https://youtu.be/CmxkPChOcvw?t=786

I learned a new thing here. That's basically an existing problem even with the existing Perl test infrastructure relying on TestLib.pm when tests are run in parallel. What we would need here is a global mapping file storing all the port numbers used by all the nodes currently in the tests.

> Granted, you have to try fairly hard to shoot yourself in the leg,
> but since the solution is so simple, why not? If we never reuse ports
> within a single test, this goes away.

Well, you can reuse the same port number in a test: simply tear down the existing node and then recreate a new one. I think that port number assignment to a node should be transparent to the caller, in our case the Perl test script holding a scenario.

> >> 5) Servers are shut down with -m 'immediate', which can lead to races
> >> in the script when archiving is turned on. That may be good for some
> >> tests, but there's no control over it.
> >
> > I hesitated with fast here actually. So changed this way. We would
> > want as well a teardown command to stop the node with immediate and
> > unregister the node from the active list.
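To make the "global mapping file" idea above concrete, here is a minimal sketch of a flock()-protected port registry shared across concurrent test runs, which would close the ABA window. All names here (register_port, the registry path) are invented for illustration and are not part of the patch.

```perl
# Hypothetical sketch of a cross-process port registry: every suite
# records the ports it owns in one flock()-protected file, so two
# parallel "prove" runs cannot both claim the same port (the ABA
# scenario discussed above). register_port() is an invented name.
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

my $registry = "/tmp/pg_test_port_registry";	# assumed shared path

sub register_port
{
	my $port = shift;
	sysopen(my $fh, $registry, O_RDWR | O_CREAT) or die "sysopen: $!";
	flock($fh, LOCK_EX) or die "flock: $!";
	my %ports = map { chomp; ($_ => 1) } <$fh>;
	if ($ports{$port})
	{
		close $fh;
		return 0;	# some other test already owns this port
	}
	$ports{$port} = 1;
	seek($fh, 0, 0);
	print $fh "$_\n" foreach sort keys %ports;
	close $fh;	# releases the lock
	return 1;
}
```

get_free_port would then retry with the next candidate whenever register_port returns 0, instead of only consulting its in-process @active_nodes list.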
> In particular, I was shutting down an archiving node and the archiving
> was truncated. I *think* smart doesn't do this. But again, it's really
> that the test writer can't easily override, not that the default is wrong.

Ah, OK. Then fast is just fine; it shuts down the node correctly. "smart" would wait for all the current connections to finish, but I am wondering if that currently matters here: I don't see a clear use case yet where it would make sense to have a multi-threaded script... If somebody comes back with a clear idea here perhaps we could revisit that, but it does not seem worth it now.

> >> Other issues:
> >> 6. Directory structure, used one directory per thing but more logical
> >> to place all things related to an instance under a single directory,
> >> and name them according to role (57333_backup, and so on).
> >
> > Er, well. The first version of the patch did so, and then I switched
> > to an approach closer to what the existing TAP facility is doing. But
> > well, let's simplify things a bit.
>
> I know, I know, but:
> 1) an instance is a "thing" in your script, so having its associated
> paraphernalia in one place makes more sense (maybe only to me).
> 2) That's often how folks (well, how I) arrange things in deployment,
> at least with archive/backup as symlinks to the NAS.
>
> Alternatively, naming the dirs with a prefix (srv_foo_HASH,
> backup_foo_backupname_HASH, etc') would work as well.

The useful thing about tempdir is that it cleans up after itself automatically should an error happen. It does not seem to me that we want to lose that.

> >> 7. enable_restoring() uses "cp -i" as 'archive_command', not a good fit
> >> for an automated test.
> >
> > This seems like a good default to me, and actually that's easily
> > portable on Windows. One could always append a custom archive_command
> > in a test, for example when testing conflicting archiving with
> > archive_mode = always.
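For what it's worth, the per-OS copy command used by enable_restoring/enable_archiving could be built in one helper, which also makes it obvious that no interactive flag is involved. build_copy_command() is an invented name, not part of the patch; it mirrors the restore_command construction under the same assumptions.

```perl
# Sketch: build the restore_command copy invocation in one place.
# build_copy_command() is an invented helper; note there is
# deliberately no "cp -i", since an interactive prompt would hang an
# automated test.
use strict;
use warnings;

sub build_copy_command
{
	my ($path, $windows_os) = @_;
	# Switch path to use slashes on Windows, as the patch does
	$path =~ tr#\\#/# if $windows_os;
	return $windows_os
		? "copy \"$path\\\\%f\" \"%p\""
		: "cp $path/%f %p";
}
```

A test wanting a different behavior (delays, failures) would simply append its own archive_command to postgresql.conf instead of calling this.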
> Ok, I wasn't sure about this, but specifically activating a switch that
> asks for input from the user during a test? hmm.

Er... The -i switch is a bad idea. I removed it. Honestly I don't recall why it was there to begin with...

> >> 8. No canned way to output a pretty-printed overview of the running
> >> system (paths, ports, for manual checking).
> >
> > Hm. Why not... Are you suggesting something like print_current_conf
> > that goes through all the registered nodes and outputs that? How would
> > you use it?
>
> For one thing, I could open a few terminals and `$(print_env_for_server
> 5437)`, so psql just worked.
> I wish PEG had that as well.

Hm. Isn't that coupled with the case where a failure happens but tempdirs are then cleaned up? I would expect hackers to let those runs go to the end. If a failure happens, it would then be useful to get a dump of what happened. However, it seems to me that we can get the same information by logging all the information when creating a node in the log file. I have added a routine in this sense, which is called each time a node is initialized. It seems helpful either way.

> >> 11. a canned "server is responding to queries" helper would be convenient.
> >
> > Do you mean a wrapper on pg_isready? Do you have use cases in mind for it?
>
> Block until recovery is finished, before testing. Eliminate races, and
> avoid the stupid sleep(3) I used.

TODO

> >> 4b) server shutdown should perhaps be "smart" by default, or segmented
> >> into calmly_bring_to_a_close(), pull_electric_plug() and
> >> drop_down_the_stairs_into_swimming_pool().
> >
> > Nope, not agreeing here. "immediate" is rather violent to stop a node,
> > hence I have switched it to use "fast", and there is now a
> > teardown_node routine that uses immediate, which is more aimed at
> > cleaning up existing things fiercely.
>
> Ok, not as the default, but possible to request a specific kind of
> shutdown. I needed smart in my case.
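On point 4b, making the shutdown mode selectable is cheap if the pg_ctl invocation is built from a parameter. A sketch only: build_stop_command() is an invented name; the patch's stop_node hardcodes 'fast'.

```perl
# Sketch: a stop command with a selectable mode, defaulting to "fast"
# as in the patch but letting a test ask for "smart" or "immediate".
# build_stop_command() is an invented helper that only assembles the
# pg_ctl argument list; the caller would pass it to system().
use strict;
use warnings;

sub build_stop_command
{
	my ($datadir, $mode) = @_;
	$mode = 'fast' unless defined $mode;
	die "unknown shutdown mode: $mode"
		unless $mode =~ /^(?:smart|fast|immediate)$/;
	return ('pg_ctl', '-D', $datadir, '-m', $mode, 'stop');
}
```

stop_node($port, 'smart') could then forward its optional second argument here, keeping the current one-argument callers unchanged.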
> Plus, in a scenario, you might
> expressly be testing behavior for a specific mode; it needs to be
> controllable.

If your test script is running with a single thread, "fast" or "smart" would not really make a difference, no?

> > I have as well moved RecoveryTest.pm to src/test/perl so that all the
> > consumers of prove_check can use it by default, and decoupled
> > start_node from make_master and make_*_standby so that it is possible
> > to add, for example, new parameters to their postgresql.conf and
> > recovery.conf files before starting them.
> >
> > Thanks a lot for the feedback! Attached is an updated patch with all
> > the things mentioned above done. Included as well are the typo fixes
> > you sent upthread.
> > Regards,
>
> Great! I'll look at it and post again if there's more. If any of the
> above extra explanations make it clearer why I suggested some changes
> you didn't like...

I am attaching a new version. I found a small bug in test case 001 when checking if standby 2 has caught up. There is also this dump function, which is helpful. The -i switch in the cp command has been removed as well.
--
Michael
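Regarding the "block until recovery is finished" TODO above: the retry loop inside poll_query_until generalizes to any condition. A sketch with an invented name (poll_until); in the suite the condition would be a psql call running "SELECT NOT pg_is_in_recovery()".

```perl
# Sketch: generic retry loop, the same shape as poll_query_until but
# taking an arbitrary condition. poll_until() is an invented name,
# not part of the attached patch.
use strict;
use warnings;

sub poll_until
{
	my ($check, $max_attempts, $delay) = @_;
	$max_attempts = 30 unless defined $max_attempts;
	$delay = 1 unless defined $delay;

	foreach my $attempt (1 .. $max_attempts)
	{
		return 1 if $check->();
		sleep $delay;
	}
	return 0;	# timed out; caller decides whether to die
}
```

A "standby finished recovery" wait would then read poll_until(sub { psql($connstr, "SELECT NOT pg_is_in_recovery()") eq 't' }), replacing the sleep(3) mentioned upthread.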
diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm index a4c1737..ea219d7 100644 --- a/src/bin/pg_rewind/RewindTest.pm +++ b/src/bin/pg_rewind/RewindTest.pm @@ -125,38 +125,6 @@ sub check_query } } -# Run a query once a second, until it returns 't' (i.e. SQL boolean true). -sub poll_query_until -{ - my ($query, $connstr) = @_; - - my $max_attempts = 30; - my $attempts = 0; - my ($stdout, $stderr); - - while ($attempts < $max_attempts) - { - my $cmd = [ 'psql', '-At', '-c', "$query", '-d', "$connstr" ]; - my $result = run $cmd, '>', \$stdout, '2>', \$stderr; - - chomp($stdout); - $stdout =~ s/\r//g if $Config{osname} eq 'msys'; - if ($stdout eq "t") - { - return 1; - } - - # Wait a second before retrying. - sleep 1; - $attempts++; - } - - # The query result didn't change in 30 seconds. Give up. Print the stderr - # from the last attempt, hopefully that's useful for debugging. - diag $stderr; - return 0; -} - sub append_to_file { my ($filename, $str) = @_; diff --git a/src/test/Makefile b/src/test/Makefile index b713c2c..d6e51eb 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -17,7 +17,7 @@ SUBDIRS = regress isolation modules # We don't build or execute examples/, locale/, or thread/ by default, # but we do want "make clean" etc to recurse into them. Likewise for ssl/, # because the SSL test suite is not secure to run on a multi-user system. -ALWAYS_SUBDIRS = examples locale thread ssl +ALWAYS_SUBDIRS = examples locale thread ssl recovery # We want to recurse to all subdirs for all standard targets, except that # installcheck and install should not recurse into the subdirectory "modules". diff --git a/src/test/perl/RecoveryTest.pm b/src/test/perl/RecoveryTest.pm new file mode 100644 index 0000000..0aade6d --- /dev/null +++ b/src/test/perl/RecoveryTest.pm @@ -0,0 +1,375 @@ +package RecoveryTest; + +# Set of common routines for recovery regression tests for a PostgreSQL +# cluster. 
This includes global variables and methods that can be used +# by the various set of tests present to set up cluster nodes and +# configure them according to the test scenario wanted. +# +# Cluster nodes can be freely created using initdb or using the existing +# base backup of another node, with minimum configuration done when the +# node is created for the first time like having a proper port number. +# It is then up to the test to decide what to do with the newly-created +# node. +# +# Environment configuration of each node is available through a set +# of global variables provided by this package, hashed depending on the +# port number of a node: +# - connstr_nodes connection string to connect to this node +# - datadir_nodes to get the data folder of a given node +# - archive_nodes for the location of the WAL archives of a node +# - backup_nodes for the location of base backups of a node +# - applname_nodes, application_name to use for a standby +# +# Nodes are identified by their port number, which should be unique +# for each node of the cluster as it is run locally. 
+
+use Cwd;
+use TestLib;
+use Test::More;
+
+use Archive::Tar;
+use IPC::Run qw(run start);
+
+use Exporter 'import';
+
+our @EXPORT = qw(
+	%connstr_nodes
+	%datadir_nodes
+	%backup_nodes
+	%archive_nodes
+	%applname_nodes
+
+	append_to_file
+	backup_node
+	disable_node
+	dump_node_info
+	enable_archiving
+	enable_node
+	enable_restoring
+	enable_streaming
+	get_free_port
+	init_node
+	init_node_from_backup
+	make_master
+	make_warm_standby
+	make_hot_standby
+	restart_node
+	start_node
+	stop_node
+	teardown_node
+);
+
+# Global variables for node data
+%datadir_nodes = ();	# PGDATA folders
+%backup_nodes = ();	# Backup base folder
+%archive_nodes = ();	# Archive base folder
+%connstr_nodes = ();	# Connection strings
+%applname_nodes = ();	# application_name used for standbys
+
+# Database used for each connection attempt via psql
+$ENV{PGDATABASE} = "postgres";
+
+# Tracker of active nodes
+my @active_nodes = ();
+
+# Set of handy routines able to set up a node with different characteristics
+# Enable streaming replication
+sub enable_streaming
+{
+	my $port_root = shift;	# Instance to link to
+	my $port_standby = shift;
+
+	append_to_file("$datadir_nodes{ $port_standby }/recovery.conf", qq(
+primary_conninfo='$connstr_nodes{ $port_root } application_name=$applname_nodes{ $port_standby }'
+standby_mode=on
+recovery_target_timeline='latest'
+));
+}
+
+# Enable the use of restore_command from a node
+sub enable_restoring
+{
+	my $port_root = shift;	# Instance to link to
+	my $port_standby = shift;
+	my $path = $archive_nodes{ $port_root };
+
+	# Switch path to use slashes on Windows
+	$path =~ tr#\\#/# if ($windows_os);
+	my $copy_command = $windows_os ?
+		"copy \"$path\\\\%f\" \"%p\"" :
+		"cp $path/%f %p";
+	append_to_file("$datadir_nodes{ $port_standby }/recovery.conf", qq(
+restore_command='$copy_command'
+standby_mode=on
+recovery_target_timeline='latest'
+));
+}
+
+# Enable WAL archiving on a node
+sub enable_archiving
+{
+	my $port = shift;
+	my $path = $archive_nodes{ $port };
+
+	# Switch path to use slashes on Windows
+	$path =~ tr#\\#/# if ($windows_os);
+	my $copy_command = $windows_os ?
+		"copy \"%p\" \"$path\\\\%f\"" :
+		"cp %p $path/%f";
+
+	# Enable archive_mode and archive_command on node
+	append_to_file("$datadir_nodes{ $port }/postgresql.conf", qq(
+archive_mode = on
+archive_command = '$copy_command'
+));
+}
+
+# Master node initialization
+sub make_master
+{
+	my $port_master = get_free_port();
+	print "# Initializing master node with port $port_master\n";
+	init_node($port_master);
+	return $port_master;
+}
+
+# Standby node initializations
+# Node only streaming.
+sub make_hot_standby
+{
+	my $port_master = shift;
+	my $backup_name = shift;
+	my $port_standby = get_free_port();
+
+	print "# Initializing streaming mode for node $port_standby from node $port_master\n";
+	init_node_from_backup($port_standby, $port_master, $backup_name);
+
+	# Start second node, streaming from first one
+	enable_streaming($port_master, $port_standby);
+	return $port_standby;
+}
+
+# Node getting WAL only from archives
+sub make_warm_standby
+{
+	my $port_master = shift;
+	my $backup_name = shift;
+	my $port_standby = get_free_port();
+
+	print "# Initializing archive mode for node $port_standby from node $port_master\n";
+	init_node_from_backup($port_standby, $port_master, $backup_name);
+
+	# Second node restores WAL from the archives of the first one
+	enable_restoring($port_master, $port_standby);
+	return $port_standby;
+}
+
+sub configure_base_node
+{
+	my $port = shift;
+
+	# Make configuration somewhat generic to test recovery
+	append_to_file("$datadir_nodes{ $port }/postgresql.conf", qq(
+port = $port
+wal_level = hot_standby
+max_wal_senders = 5
+wal_keep_segments = 20
+max_wal_size = 128MB
+shared_buffers = 1MB
+wal_log_hints = on
+hot_standby = on
+autovacuum = off
+));
+
+	configure_hba_for_replication($datadir_nodes{ $port });
+}
+
+# Get a port number not currently in use for a new node.
+# As port number retrieval is based on the nodes currently running and
+# their presence in the list of registered ports, be sure that the node
+# that is consuming this port number has already been started and that
+# it is not registered yet.
+sub get_free_port
+{
+	my $found = 0;
+	# XXX: Should this part use PG_VERSION_NUM?
+	my $port = 90600 % 16384 + 49152;
+
+	while ($found == 0)
+	{
+		$port++;
+		print "# Checking for port $port\n";
+		my $devnull = $windows_os ? "nul" : "/dev/null";
+		my $ret = system("psql -X -p $port postgres < $devnull");
+		if ($ret != 0)
+		{
+			# Found a potential candidate, check first that it is
+			# not included in the list of registered nodes.
+			$found = 1 if !grep(/^$port$/, @active_nodes);
+		}
+	}
+
+	print "# Found free port $port\n";
+	# Lock port number found
+	enable_node($port);
+	return $port;
+}
+
+# Low-level routines to initialize a node
+# Initialize a node from scratch
+sub init_node
+{
+	my $port = shift;
+
+	# Save configuration information
+	my $base_dir = TestLib::tempdir;
+	$datadir_nodes{ $port } = "$base_dir/pgdata";
+	print "Base directory $base_dir/pgdata, Port $port\n";
+	$backup_nodes{ $port } = "$base_dir/backup";
+	mkdir $backup_nodes{ $port };
+	$archive_nodes{ $port } = "$base_dir/archives";
+	mkdir $archive_nodes{ $port };
+	$connstr_nodes{ $port } = "port=$port";
+	$applname_nodes{ $port } = "node_$port";
+
+	# Log some information
+	dump_node_info($port);
+
+	standard_initdb($datadir_nodes{ $port });
+	configure_base_node($port);
+}
+
+# Initialize a node from an existing base backup
+sub init_node_from_backup
+{
+	my ($port, $root_port, $backup_name) = @_;
+
+	my $backup_path = "$backup_nodes{ $root_port }/$backup_name";
+	my
$backup_file = "$backup_path/base.tar";
+
+	print "Initializing node $port from backup \"$backup_name\" of node $root_port\n";
+	# Check existence of backup wanted
+	if ( ! -d $backup_path )
+	{
+		die "Backup $backup_path does not exist";
+	}
+
+	# Save configuration information
+	my $base_dir = TestLib::tempdir;
+	$datadir_nodes{ $port } = "$base_dir/pgdata";
+	mkdir $datadir_nodes{ $port }, 0700;
+	$backup_nodes{ $port } = "$base_dir/backup";
+	mkdir $backup_nodes{ $port };
+	$archive_nodes{ $port } = "$base_dir/archives";
+	mkdir $archive_nodes{ $port };
+	$connstr_nodes{ $port } = "port=$port";
+	$applname_nodes{ $port } = "node_$port";
+
+	# Log some information
+	dump_node_info($port);
+
+	# Extract the base backup wanted
+	my $current_dir = cwd();
+
+	# Temporarily move to the place of extraction
+	chdir "$datadir_nodes{ $port }";
+	Archive::Tar->extract_archive($backup_file);
+	chdir "$current_dir";
+	configure_base_node($port);
+}
+
+# Start a node
+sub start_node
+{
+	my $port = shift;
+	system_or_bail('pg_ctl', '-w', '-D', $datadir_nodes{ $port },
+				   '-l', "$log_path/node_$port.log",
+				   'start');
+}
+
+# Stop a node. This routine can be called during a test.
+sub stop_node
+{
+	my $port = shift;
+	system('pg_ctl', '-D', $datadir_nodes{ $port }, '-m',
+		   'fast', 'stop');
+}
+
+sub restart_node
+{
+	my $port = shift;
+	system_or_bail('pg_ctl', '-w', '-D', $datadir_nodes{ $port }, '-m',
+				   'fast', 'restart');
+}
+
+# Mark given node as active, locking the port number used by this node
+# for this test.
+sub enable_node
+{
+	my $port = shift;
+	print "# Port $port has been locked for new node\n";
+	push(@active_nodes, $port);
+}
+
+# Deactivate given node; the node should have been stopped before calling
+# this routine.
+sub disable_node
+{
+	my $port = shift;
+	print "# Port $port has been disabled\n";
+	@active_nodes = grep { $_ ne $port } @active_nodes;
+}
+
+# Remove any traces of given node.
+sub teardown_node
+{
+	my $port = shift;
+	system('pg_ctl', '-D', $datadir_nodes{ $port }, '-m',
+		   'immediate', 'stop');
+	disable_node($port);
+}
+
+# Create a backup on a node already running
+sub backup_node
+{
+	my ($port, $backup_name) = @_;
+
+	my $backup_path = "$backup_nodes{ $port }/$backup_name";
+
+	print "Taking backup from node $port\n";
+	# Back up a node in tar format, it is more portable across platforms
+	system_or_bail("pg_basebackup -D $backup_path -p $port --format=t -x");
+	print "# Backup finished\n";
+}
+
+# Dump information of a node
+sub dump_node_info
+{
+	my $port = shift;
+	print "New node initialized: ${port}\n";
+	print "Data directory: $datadir_nodes{ $port }\n";
+	print "Backup directory: $backup_nodes{ $port }\n";
+	print "Archive directory: $archive_nodes{ $port }\n";
+	print "Connection string: $connstr_nodes{ $port }\n";
+	print "Application name: $applname_nodes{ $port }\n";
+}
+
+# Add a set of parameters to a configuration file
+sub append_to_file
+{
+	my ($filename, $str) = @_;
+
+	open my $fh, ">>", $filename or die "could not open file $filename";
+	print $fh $str;
+	close $fh;
+}
+
+END
+{
+	foreach my $port (@active_nodes)
+	{
+		teardown_node($port);
+	}
+}
+
+1;
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 02533eb..0dc0d6d 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -12,6 +12,7 @@ our @EXPORT = qw(
   configure_hba_for_replication
   start_test_server
   restart_test_server
+  poll_query_until
   psql
   slurp_dir
   slurp_file
@@ -220,6 +221,37 @@ END
 	}
 }
 
+sub poll_query_until
+{
+	my ($query, $connstr) = @_;
+
+	my $max_attempts = 30;
+	my $attempts = 0;
+	my ($stdout, $stderr);
+
+	while ($attempts < $max_attempts)
+	{
+		my $cmd = [ 'psql', '-At', '-c', "$query", '-d', "$connstr" ];
+		my $result = run $cmd, '>', \$stdout, '2>', \$stderr;
+
+		chomp($stdout);
+		$stdout =~ s/\r//g if $Config{osname} eq 'msys';
+		if ($stdout eq "t")
+		{
+			return 1;
+		}
+
+		# Wait a second before retrying.
+ sleep 1; + $attempts++; + } + + # The query result didn't change in 30 seconds. Give up. Print the stderr + # from the last attempt, hopefully that's useful for debugging. + diag $stderr; + return 0; +} + sub psql { my ($dbname, $sql) = @_; diff --git a/src/test/recovery/.gitignore b/src/test/recovery/.gitignore new file mode 100644 index 0000000..499fa7d --- /dev/null +++ b/src/test/recovery/.gitignore @@ -0,0 +1,3 @@ +# Generated by test suite +/regress_log/ +/tmp_check/ diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile new file mode 100644 index 0000000..16c063a --- /dev/null +++ b/src/test/recovery/Makefile @@ -0,0 +1,17 @@ +#------------------------------------------------------------------------- +# +# Makefile for src/test/recovery +# +# Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group +# Portions Copyright (c) 1994, Regents of the University of California +# +# src/test/recovery/Makefile +# +#------------------------------------------------------------------------- + +subdir = src/test/recovery +top_builddir = ../../.. +include $(top_builddir)/src/Makefile.global + +check: + $(prove_check) diff --git a/src/test/recovery/README b/src/test/recovery/README new file mode 100644 index 0000000..20b98e0 --- /dev/null +++ b/src/test/recovery/README @@ -0,0 +1,19 @@ +src/test/recovery/README + +Regression tests for recovery and replication +============================================= + +This directory contains a test suite for recovery and replication, +testing mainly the interactions of recovery.conf with cluster +instances by providing a simple set of routines that can be used +to define a custom cluster for a test, including backup, archiving, +and streaming configuration. + +Running the tests +================= + + make check + +NOTE: This creates a temporary installation, and some tests may +create one or multiple nodes, be they master or standby(s) for the +purpose of the tests. 
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
new file mode 100644
index 0000000..fd4bfbf
--- /dev/null
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -0,0 +1,62 @@
+# Minimal test testing streaming replication
+use strict;
+use warnings;
+use TestLib;
+use Test::More tests => 4;
+
+use RecoveryTest;
+
+# Initialize master node
+my $port_master = make_master();
+start_node($port_master);
+my $backup_name = 'my_backup';
+
+# Take backup
+backup_node($port_master, $backup_name);
+
+# Create streaming standby linking to master
+my $port_standby_1 = make_hot_standby($port_master, $backup_name);
+start_node($port_standby_1);
+
+# Take backup of standby 1 (not mandatory, but useful to check if
+# pg_basebackup works on a standby).
+backup_node($port_standby_1, $backup_name);
+
+# Create second standby node linking to standby 1
+my $port_standby_2 = make_hot_standby($port_standby_1, $backup_name);
+start_node($port_standby_2);
+
+# Create some content on master and check its presence in standby 1 and 2
+psql $connstr_nodes{ $port_master },
+	"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a";
+
+# Wait for standbys to catch up
+my $caughtup_query = "SELECT pg_current_xlog_location() = write_location FROM pg_stat_replication WHERE application_name = '$applname_nodes{ $port_standby_1 }';";
+poll_query_until($caughtup_query, $connstr_nodes{ $port_master })
+	or die "Timed out while waiting for standby 1 to catch up";
+$caughtup_query = "SELECT pg_last_xlog_replay_location() = write_location FROM pg_stat_replication WHERE application_name = '$applname_nodes{ $port_standby_2 }';";
+poll_query_until($caughtup_query, $connstr_nodes{ $port_standby_1 })
+	or die "Timed out while waiting for standby 2 to catch up";
+
+my $result = psql $connstr_nodes{ $port_standby_1 },
+	"SELECT count(*) FROM tab_int";
+print "standby 1: $result\n";
+is($result, qq(1002), 'check streamed content on standby 1');
+
+$result = psql $connstr_nodes{
$port_standby_2 }, + "SELECT count(*) FROM tab_int"; +print "standby 2: $result\n"; +is($result, qq(1002), 'check streamed content on standby 2'); + +# Check that only READ-only queries can run on standbys +command_fails(['psql', '-A', '-t', '--no-psqlrc', + "-d $connstr_nodes{ $port_standby_1 }", '-c', "INSERT INTO tab_int VALUES (1)"], + 'Read-only queries on standby 1'); +command_fails(['psql', '-A', '-t', '--no-psqlrc', + "-d $connstr_nodes{ $port_standby_2 }", '-c', "INSERT INTO tab_int VALUES (1)"], + 'Read-only queries on standby 2'); + +# Cleanup nodes +teardown_node($port_standby_2); +teardown_node($port_standby_1); +teardown_node($port_master); diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl new file mode 100644 index 0000000..2356394 --- /dev/null +++ b/src/test/recovery/t/002_archiving.pl @@ -0,0 +1,52 @@ +# test for archiving with warm standby +use strict; +use warnings; +use TestLib; +use Test::More tests => 1; +use File::Copy; +use RecoveryTest; + +# Initialize master node, doing archives +my $port_master = get_free_port(); +my $backup_name = 'my_backup'; +init_node($port_master); +enable_archiving($port_master); + +# Start it +start_node($port_master); + +# Take backup for slave +backup_node($port_master, 'my_backup'); + +# Initialize standby node from backup, fetching WAL from archives +my $port_standby = make_warm_standby($port_master, $backup_name); +append_to_file("$datadir_nodes{ $port_standby }/postgresql.conf", qq( +wal_retrieve_retry_interval = '100ms' +)); +start_node($port_standby); + +# Create some content on master +psql $connstr_nodes{ $port_master }, + "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a"; +my $current_lsn = psql $connstr_nodes{ $port_master }, + "SELECT pg_current_xlog_location();"; + +# Force archiving of WAL file to make it present on master +psql $connstr_nodes{ $port_master }, "SELECT pg_switch_xlog()"; + +# Add some more content, it should not be present on standby 
+psql $connstr_nodes{ $port_master }, + "INSERT INTO tab_int VALUES (generate_series(1001,2000))"; + +# Wait until necessary replay has been done on standby +my $caughtup_query = "SELECT '$current_lsn'::pg_lsn <= pg_last_xlog_replay_location()"; +poll_query_until($caughtup_query, $connstr_nodes{ $port_standby }) + or die "Timed out while waiting for standby to catch up"; + +my $result = psql $connstr_nodes{ $port_standby }, + "SELECT count(*) FROM tab_int"; +is($result, qq(1000), 'check content from archives'); + +# Cleanup nodes +teardown_node($port_standby); +teardown_node($port_master); diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl new file mode 100644 index 0000000..bfc894f --- /dev/null +++ b/src/test/recovery/t/003_recovery_targets.pl @@ -0,0 +1,139 @@ +# Test for recovery targets: name, timestamp, XID +use strict; +use warnings; +use TestLib; +use Test::More tests => 7; + +use RecoveryTest; + +# Create and test a standby from given backup, with a certain +# recovery target. 
+sub test_recovery_standby
+{
+	my $test_name = shift;
+	my $port_master = shift;
+	my $recovery_params = shift;
+	my $num_rows = shift;
+	my $until_lsn = shift;
+
+	my $port_standby = get_free_port();
+
+	init_node_from_backup($port_standby, $port_master, 'my_backup');
+	enable_restoring($port_master, $port_standby);
+
+	foreach my $param_item (@$recovery_params)
+	{
+		append_to_file("$datadir_nodes{ $port_standby }/recovery.conf",
+					   qq($param_item
+));
+	}
+
+	start_node($port_standby);
+
+	# Wait until standby has replayed enough data
+	my $caughtup_query = "SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
+	poll_query_until($caughtup_query, $connstr_nodes{ $port_standby })
+		or die "Timed out while waiting for standby to catch up";
+
+	# Check the presence of the expected data in the standby
+	my $result = psql $connstr_nodes{ $port_standby },
+		"SELECT count(*) FROM tab_int";
+	is($result, qq($num_rows), "check standby content for $test_name");
+
+	# Stop standby node
+	teardown_node($port_standby);
+}
+
+# Initialize master node
+my $port_master = get_free_port();
+init_node($port_master);
+enable_archiving($port_master);
+
+# Start it
+start_node($port_master);
+
+# Create data before taking the backup, aimed at testing
+# recovery_target = 'immediate'
+psql $connstr_nodes{ $port_master },
+	"CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a";
+my $lsn1 = psql $connstr_nodes{ $port_master },
+	"SELECT pg_current_xlog_location();";
+
+# Take backup from which all operations will be run
+backup_node($port_master, 'my_backup');
+
+# Insert some data to be used as a replay reference, with a recovery
+# target TXID.
+psql $connstr_nodes{ $port_master }, + "INSERT INTO tab_int VALUES (generate_series(1001,2000))"; +my $recovery_txid = psql $connstr_nodes{ $port_master }, + "SELECT txid_current()"; +my $lsn2 = psql $connstr_nodes{ $port_master }, + "SELECT pg_current_xlog_location();"; + +# More data, with recovery target timestamp +psql $connstr_nodes{ $port_master }, + "INSERT INTO tab_int VALUES (generate_series(2001,3000))"; +my $recovery_time = psql $connstr_nodes{ $port_master }, "SELECT now()"; +my $lsn3 = psql $connstr_nodes{ $port_master }, + "SELECT pg_current_xlog_location();"; + +# Even more data, this time with a recovery target name +psql $connstr_nodes{ $port_master }, + "INSERT INTO tab_int VALUES (generate_series(3001,4000))"; +my $recovery_name = "my_target"; +my $lsn4 = psql $connstr_nodes{ $port_master }, + "SELECT pg_current_xlog_location();"; +psql $connstr_nodes{ $port_master }, + "SELECT pg_create_restore_point('$recovery_name')"; + +# Force archiving of WAL file +psql $connstr_nodes{ $port_master }, "SELECT pg_switch_xlog()"; + +# Test recovery targets +my @recovery_params = ( "recovery_target = 'immediate'" ); +test_recovery_standby('immediate target', $port_master, + \@recovery_params, + "1000", $lsn1); +@recovery_params = ( "recovery_target_xid = '$recovery_txid'" ); +test_recovery_standby('XID', $port_master, + \@recovery_params, + "2000", $lsn2); +@recovery_params = ( "recovery_target_time = '$recovery_time'" ); +test_recovery_standby('Time', $port_master, + \@recovery_params, + "3000", $lsn3); +@recovery_params = ( "recovery_target_name = '$recovery_name'" ); +test_recovery_standby('Name', $port_master, + \@recovery_params, + "4000", $lsn4); + +# Multiple targets +# Last entry has priority (note that an array respects the order of items +# not hashes). 
+@recovery_params = (
+	"recovery_target_name = '$recovery_name'",
+	"recovery_target_xid = '$recovery_txid'",
+	"recovery_target_time = '$recovery_time'"
+);
+test_recovery_standby('Name + XID + Time', $port_master,
+					  \@recovery_params,
+					  "3000", $lsn3);
+@recovery_params = (
+	"recovery_target_time = '$recovery_time'",
+	"recovery_target_name = '$recovery_name'",
+	"recovery_target_xid = '$recovery_txid'"
+);
+test_recovery_standby('Time + Name + XID', $port_master,
+					  \@recovery_params,
+					  "2000", $lsn2);
+@recovery_params = (
+	"recovery_target_xid = '$recovery_txid'",
+	"recovery_target_time = '$recovery_time'",
+	"recovery_target_name = '$recovery_name'"
+);
+test_recovery_standby('XID + Time + Name', $port_master,
+					  \@recovery_params,
+					  "4000", $lsn4);
+
+teardown_node($port_master);
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
new file mode 100644
index 0000000..2455ca4
--- /dev/null
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -0,0 +1,68 @@
+# Test for timeline switch
+# Ensure that a standby is able to follow a newly-promoted standby
+# on a new timeline.
+use strict; +use warnings; +use File::Path qw(remove_tree); +use TestLib; +use Test::More tests => 1; + +use RecoveryTest; + +# Initialize master node +my $port_master = make_master(); +start_node($port_master); + +# Take backup +my $backup_name = 'my_backup'; +backup_node($port_master, $backup_name); + +# Create two standbys linking to it +my $port_standby_1 = make_hot_standby($port_master, $backup_name); +start_node($port_standby_1); +my $port_standby_2 = make_hot_standby($port_master, $backup_name); +start_node($port_standby_2); + +# Create some content on master +psql $connstr_nodes { $port_master }, + "CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a"; +my $until_lsn = psql $connstr_nodes{ $port_master }, + "SELECT pg_current_xlog_location();"; + +# Wait until standby has replayed enough data on standby 1 +my $caughtup_query = "SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()"; +poll_query_until($caughtup_query, $connstr_nodes{ $port_standby_1 }) + or die "Timed out while waiting for standby to catch up"; + +# Stop and remove master, and promote standby 1, switching it to a new timeline +teardown_node($port_master); +system_or_bail('pg_ctl', '-w', '-D', $datadir_nodes{ $port_standby_1 }, + 'promote'); +print "# Promoted standby 1\n"; + +# Switch standby 2 to replay from standby 1 +remove_tree("$datadir_nodes{ $port_standby_2 }/recovery.conf"); +append_to_file("$datadir_nodes{ $port_standby_2 }/recovery.conf", qq( +primary_conninfo='$connstr_nodes{ $port_standby_1 }' +standby_mode=on +recovery_target_timeline='latest' +)); +restart_node($port_standby_2); + +# Insert some data in standby 1 and check its presence in standby 2 +# to ensure that the timeline switch has been done. 
+psql $connstr_nodes{ $port_standby_1 },
+	"INSERT INTO tab_int VALUES (generate_series(1001,2000))";
+$until_lsn = psql $connstr_nodes{ $port_standby_1 },
+	"SELECT pg_current_xlog_location();";
+$caughtup_query =
+	"SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
+poll_query_until($caughtup_query, $connstr_nodes{ $port_standby_2 })
+	or die "Timed out while waiting for standby to catch up";
+
+my $result = psql $connstr_nodes{ $port_standby_2 },
+	"SELECT count(*) FROM tab_int";
+is($result, qq(2000), 'check content of standby 2');
+
+# Stop nodes
+teardown_node($port_standby_2);
+teardown_node($port_standby_1);
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
new file mode 100644
index 0000000..76ea60a
--- /dev/null
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -0,0 +1,54 @@
+# Checks for recovery_min_apply_delay
+use strict;
+use warnings;
+use TestLib;
+use Test::More tests => 2;
+
+use RecoveryTest;
+
+# Initialize master node
+my $port_master = make_master();
+start_node($port_master);
+
+# And some content
+psql $connstr_nodes{ $port_master },
+	"CREATE TABLE tab_int AS SELECT generate_series(1,10) AS a";
+
+# Take backup
+backup_node($port_master, 'my_backup');
+
+# Initialize node from backup
+my $port_standby = get_free_port();
+init_node_from_backup($port_standby, $port_master, 'my_backup');
+
+# Start second node, streaming from the first one, with a 2s apply delay
+append_to_file("$datadir_nodes{ $port_standby }/recovery.conf", qq(
+primary_conninfo='$connstr_nodes{ $port_master }'
+recovery_min_apply_delay = '2s'
+standby_mode=on
+recovery_target_timeline='latest'
+));
+start_node($port_standby);
+
+# Make new content on master and check its presence on the standby,
+# taking into account the delay of 2s applied above.
+psql $connstr_nodes{ $port_master },
+	"INSERT INTO tab_int VALUES (generate_series(11,20))";
+sleep 1;
+# After 1s, replay should still be delayed, so only 10 rows are visible
+my $result = psql $connstr_nodes{ $port_standby },
+	"SELECT count(*) FROM tab_int";
+is($result, qq(10), 'check content after 1s, before the delay expires');
+
+# Now wait for replay to complete on standby
+my $until_lsn = psql $connstr_nodes{ $port_master },
+	"SELECT pg_current_xlog_location();";
+my $caughtup_query =
+	"SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
+poll_query_until($caughtup_query, $connstr_nodes{ $port_standby })
+	or die "Timed out while waiting for standby to catch up";
+$result = psql $connstr_nodes{ $port_standby },
+	"SELECT count(*) FROM tab_int";
+is($result, qq(20), 'check content after the 2s delay has elapsed');
+
+# Stop nodes
+teardown_node($port_standby);
+teardown_node($port_master);
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index d3d736b..1355508 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -194,8 +194,9 @@ sub tapcheck
 	foreach my $test_path (@$tap_dirs)
 	{
 		# Like on Unix "make check-world", don't run the SSL test suite
-		# automatically.
+		# or the recovery test suite automatically.
 		next if ($test_path =~ /\/src\/test\/ssl\//);
+		next if ($test_path =~ /\/src\/test\/recovery\//);
 		my $dir = dirname($test_path);
 		chdir $dir;
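All three test files lean on the same wait idiom: poll a replay-location query until the standby reports it has caught up, and die on timeout. As a self-contained sketch of the retry loop behind `poll_query_until` (the real helper lives in the patch's TestLib/RecoveryTest code; the `poll_until` function, the marker file, and the attempt counts here are purely illustrative):

```shell
#!/bin/sh
# poll_until MAX_ATTEMPTS INTERVAL CMD...
# Retry CMD until it succeeds, sleeping INTERVAL seconds between
# attempts; give up (return 1) after MAX_ATTEMPTS failures.
poll_until() {
    max_attempts=$1; interval=$2; shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}

# Example probe: a marker file stands in for "standby has replayed
# past the target LSN".
marker="${TMPDIR:-/tmp}/caughtup.$$"
touch "$marker"
if poll_until 10 1 test -f "$marker"; then
    echo "caught up"
else
    echo "timed out"
fi
rm -f "$marker"
```

The tests' probe is the boolean query `SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()` run through `psql`; everything else is the same loop-sleep-retry structure.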
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers