On Mon, Feb 26, 2024 at 02:01:45PM +0000, Bertrand Drouvot wrote:
> Though [1] mentioned up-thread is not pushed yet, I'm sharing the POC patch
> now (see the attached).

I have looked at what you have here.

First, in a build where 818fefd8fd is included, this makes the test
script a lot slower.  Most of the logic is quick, but we're spending
10s or so checking that catalog_xmin has advanced.  Would it be
possible to make that faster?

A second issue is the failure mode when 818fefd8fd is reverted.  The
test gets stuck waiting for the standby to catch up until a timeout
kicks in and fails it, while all the previous tests pass.  Would it
be possible to make that more responsive?  I assume that in the
failure mode we would get an
incorrect conflict_reason for injection_inactiveslot, succeeding in
checking the failure.

+    my $terminated = 0;
+    for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+    {
+        if ($node_standby->log_contains(
+            'terminating process .* to release replication slot \"injection_activeslot\"', $logstart))
+        {
+            $terminated = 1;
+            last;
+        }
+        usleep(100_000);
+    }
+    ok($terminated, 'terminating process holding the active slot is logged with injection point');

The LOG entry exists once we are sure that the startup process is
waiting in the injection point, so this loop could be replaced with
something like:
+   $node_standby->wait_for_event('startup', 'TerminateProcessHoldingSlot');
+   ok($node_standby->log_contains('terminating process .* .. '), 'termin .. ');

Nit: the name of the injection point should be
terminate-process-holding-slot rather than
TerminateProcessHoldingSlot, to be consistent with the other ones. 
--
Michael
