On Sun, Oct 08, 2023 at 05:48:55PM -0400, Tom Lane wrote:
> There have been intermittent failures on various buildfarm machines
> since this went in.  After seeing one on my own animal mamba [1],
> I tried to reproduce it manually on that machine, and it does
> indeed fail about one time in two.  The buildfarm script is not
> managing to capture the relevant log files, but what I see in a
> manual run is that 001_worker_spi.pl logs this:

Thanks for the logs, I've noticed the failure but could not make any
sense of it based on the lack of information provided from the
buildfarm.  Serinus has complained once, for instance. 

> Since this only seems to happen on slow machines, I'd call it a timing
> problem or race condition.  Unless you want to argue that the race
> should not happen, probably the fix is to make the test script cope
> with this worker_spi_launch() call failing.  As long as we see the
> expected result from wait_for_log, we can be pretty sure the right
> thing happened.

The trick to reproduce the failure is to slow down worker_spi_launch()
before WaitForBackgroundWorkerStartup() with a worker already
registered so as the worker has the time to start and exit because of
the ALLOW_CONNECTIONS restriction.  (SendPostmasterSignal() in
RegisterDynamicBackgroundWorker() interrupts a hardcoded sleep, so
I've just used an on-disk flag.)

Another thing is that we cannot rely on the PID returned by launch()
as it could fail, so $worker3_pid needs to disappear.  If we do that,
I'd rather just switch to a specific database for the tests with
ALLOWCONN rather than reuse "mydb" that could have other workers.  The
attached fixes the issue for me.
--
Michael
diff --git a/src/test/modules/worker_spi/t/001_worker_spi.pl b/src/test/modules/worker_spi/t/001_worker_spi.pl
index 06bb73816f..044e208812 100644
--- a/src/test/modules/worker_spi/t/001_worker_spi.pl
+++ b/src/test/modules/worker_spi/t/001_worker_spi.pl
@@ -105,30 +105,31 @@ postgres|myrole|WorkerSpiMain]),
 ) or die "Timed out while waiting for dynamic bgworkers to be launched";
 
 # Check BGWORKER_BYPASS_ALLOWCONN.
-$node->safe_psql('postgres', q(ALTER DATABASE mydb ALLOW_CONNECTIONS false;));
+$node->safe_psql('postgres', q(CREATE DATABASE noconndb ALLOW_CONNECTIONS false;));
+my $noconndb_id = $node->safe_psql('mydb',
+	"SELECT oid FROM pg_database where datname = 'noconndb';");
 my $log_offset = -s $node->logfile;
 
-# bgworker cannot be launched with connection restriction.
-my $worker3_pid = $node->safe_psql('postgres',
-	qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id);]);
+# worker_spi_launch() may be able to detect that the worker has been
+# stopped, so do not rely on psql_safe().
+$node->psql('postgres',
+	qq[SELECT worker_spi_launch(12, $noconndb_id, $myrole_id);]);
 $node->wait_for_log(
-	qr/database "mydb" is not currently accepting connections/, $log_offset);
+	qr/database "noconndb" is not currently accepting connections/, $log_offset);
 
 $result = $node->safe_psql('postgres',
-	"SELECT count(*) FROM pg_stat_activity WHERE pid = $worker3_pid;");
+	"SELECT count(*) FROM pg_stat_activity WHERE datname = 'noconndb';");
 is($result, '0', 'dynamic bgworker without BYPASS_ALLOWCONN not started');
 
 # bgworker bypasses the connection check, and can be launched.
 my $worker4_pid = $node->safe_psql('postgres',
-	qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id, '{"ALLOWCONN"}');]);
+	qq[SELECT worker_spi_launch(12, $noconndb_id, $myrole_id, '{"ALLOWCONN"}');]);
 ok( $node->poll_query_until(
 		'postgres',
 		qq[SELECT datname, usename, wait_event FROM pg_stat_activity
             WHERE backend_type = 'worker_spi dynamic' AND
             pid IN ($worker4_pid) ORDER BY datname;],
-		qq[mydb|myrole|WorkerSpiMain]),
+		qq[noconndb|myrole|WorkerSpiMain]),
 	'dynamic bgworker with BYPASS_ALLOWCONN started');
 
-$node->safe_psql('postgres', q(ALTER DATABASE mydb ALLOW_CONNECTIONS true;));
-
 done_testing();

Attachment: signature.asc
Description: PGP signature

Reply via email to