Hi all,

Alvaro has reported a rather rare buildfarm failure involving
007_sync_rep.pl to which I have responded here:
https://www.postgresql.org/message-id/20190613060123.gc1...@paquier.xyz

The buildfarm failure is here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2019-05-12%2020%3A37%3A11

It involves a race condition related to the way the standbys of the
test are stopped and restarted to ensure that they appear in the
correct order in the WAL sender array of the primary, but feel free to
look at the message above for all the details.

Attached is a patch to improve the stability of the test.  The fix I
am proposing is very simple: in order to make sure that a standby is
added into the WAL sender array of the primary, let's check after
pg_stat_replication after a standby is started.  This can be done
consistently with a small wrapper in the tests.

Any thoughts?
--
Michael
diff --git a/src/test/recovery/t/007_sync_rep.pl b/src/test/recovery/t/007_sync_rep.pl
index bba47da17a..653a6bc842 100644
--- a/src/test/recovery/t/007_sync_rep.pl
+++ b/src/test/recovery/t/007_sync_rep.pl
@@ -27,6 +27,23 @@ sub test_sync_state
 	return;
 }
 
+# Start a standby and check that it is registered within the WAL sender
+# array of the given primary.  This polls the primary's pg_stat_replication
+# until the standby is confirmed as registered.
+sub start_standby_and_wait
+{
+	my ($master, $standby) = @_;
+	my $master_name  = $master->name;
+	my $standby_name = $standby->name;
+	my $query =
+	  "SELECT count(1) = 1 FROM pg_stat_replication WHERE application_name = '$standby_name'";
+
+	$standby->start;
+
+	print "### Waiting for standby \"$standby_name\" on \"$master_name\"\n";
+	$master->poll_query_until('postgres', $query);
+}
+
 # Initialize master node
 my $node_master = get_new_node('master');
 $node_master->init(allows_streaming => 1);
@@ -36,23 +53,27 @@ my $backup_name = 'master_backup';
 # Take backup
 $node_master->backup($backup_name);
 
+# Create all the standbys.  Their status on the primary is checked to
+# ensure the ordering of each one of them in the WAL sender array of the
+# primary.
+
 # Create standby1 linking to master
 my $node_standby_1 = get_new_node('standby1');
 $node_standby_1->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_1->start;
+start_standby_and_wait($node_master, $node_standby_1);
 
 # Create standby2 linking to master
 my $node_standby_2 = get_new_node('standby2');
 $node_standby_2->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_2->start;
+start_standby_and_wait($node_master, $node_standby_2);
 
 # Create standby3 linking to master
 my $node_standby_3 = get_new_node('standby3');
 $node_standby_3->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_3->start;
+start_standby_and_wait($node_master, $node_standby_3);
 
 # Check that sync_state is determined correctly when
 # synchronous_standby_names is specified in old syntax.
@@ -82,8 +103,10 @@ $node_standby_1->stop;
 $node_standby_2->stop;
 $node_standby_3->stop;
 
-$node_standby_2->start;
-$node_standby_3->start;
+# Make sure that each standby reports back to the primary in
+# the wanted order.
+start_standby_and_wait($node_master, $node_standby_2);
+start_standby_and_wait($node_master, $node_standby_3);
 
 # Specify 2 as the number of sync standbys.
 # Check that two standbys are in 'sync' state.
@@ -94,7 +117,7 @@ standby3|3|sync),
 	'2(standby1,standby2,standby3)');
 
 # Start standby1
-$node_standby_1->start;
+start_standby_and_wait($node_master, $node_standby_1);
 
 # Create standby4 linking to master
 my $node_standby_4 = get_new_node('standby4');

Attachment: signature.asc
Description: PGP signature

Reply via email to