On 2021/05/19 15:25, Kyotaro Horiguchi wrote:
At Wed, 19 May 2021 11:19:13 +0530, Dilip Kumar <dilipbal...@gmail.com> wrote in
On Wed, May 19, 2021 at 10:16 AM Fujii Masao
<masao.fu...@oss.nttdata.com> wrote:

On 2021/05/18 15:46, Michael Paquier wrote:
On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:
Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode.  It does not sound difficult
or expensive to add a small test for that in one of the existing
recovery tests triggering a promotion.  Could you add one based on
pg_get_wal_replay_pause_state()?

Are you thinking of adding a test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It does not seem easy to do test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Even for #2, we cannot be sure whether it will be 'paused' or
'pause requested'.

We often use poll_query_until() to make sure some desired state is
reached.

Yes.
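
The polling idea can be illustrated with a small self-contained sketch. This is not the implementation of poll_query_until(); make_fake_standby() and poll_until() are hypothetical helpers invented here, and the closure merely stands in for running "SELECT pg_get_wal_replay_pause_state()" against a real standby. It only shows why a single check right after the pause request may see 'pause requested', while a retry loop settles on 'paused'.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a standby: each call to the returned closure
# plays the role of running "SELECT pg_get_wal_replay_pause_state()" once.
# The pause request is acknowledged only after a few polls, so the state
# passes through 'pause requested' before it settles on 'paused'.
sub make_fake_standby
{
	my ($polls_until_paused) = @_;
	my $polls = 0;
	return sub {
		$polls++;
		return $polls < $polls_until_paused ? 'pause requested' : 'paused';
	};
}

# Simplified polling loop in the spirit of poll_query_until(): re-run the
# check until it yields the expected value or the attempts run out.
# Returns 1 on success and 0 on timeout.
sub poll_until
{
	my ($check, $expected, $max_attempts) = @_;
	for my $attempt (1 .. $max_attempts)
	{
		return 1 if $check->() eq $expected;
		# A real test would sleep briefly here between attempts.
	}
	return 0;
}

my $standby = make_fake_standby(3);
poll_until($standby, 'paused', 10)
  or die "Timed out while waiting for recovery to be paused";
print "recovery is paused\n";
```

In a real TAP test the check would be a query issued through the node object, and the loop would be bounded by a timeout rather than an attempt count.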

And, as Michael suggested, the function
pg_get_wal_replay_pause_state() still works at the time of
recovery_end_command.  So a bit more detailed steps are:

IMO this idea is tricky and fragile, so I'm inclined to avoid it if possible.
Attached is a POC patch that adds the following tests.

#1. Check that pg_get_wal_replay_pause_state() reports "not paused" at first.
#2. Request to pause archive recovery and wait until it's actually paused.
#3. Request to resume archive recovery and wait until it's actually resumed.
#4. Request to pause archive recovery and wait until it's actually paused.
       Then, check that the paused state ends and promotion continues
       if a promotion is triggered while recovery is paused.

In #4, pg_get_wal_replay_pause_state() is not executed while promotion
is ongoing. #4 checks that pg_is_in_recovery() returns false and
the promotion finishes as expected in that case. Isn't this test enough for now?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index c675c0886c..8db7e47d13 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
 use File::Copy;
 
 # Initialize primary node, doing archives
@@ -75,3 +75,42 @@ ok( !-f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
        "RECOVERYHISTORY removed after promotion");
 ok( !-f "$node_standby2_data/pg_wal/RECOVERYXLOG",
        "RECOVERYXLOG removed after promotion");
+
+# Check that archive recovery can be paused and resumed as expected.
+my $node_standby3 = get_new_node('standby3');
+$node_standby3->init_from_backup($node_primary, $backup_name,
+       has_restoring => 1);
+$node_standby3->start;
+
+# Archive recovery is not yet paused.
+is($node_standby3->safe_psql('postgres',
+       "SELECT pg_get_wal_replay_pause_state()"),
+       'not paused', 'pg_get_wal_replay_pause_state() reports not paused');
+
+# Request to pause archive recovery and wait until it's actually paused.
+$node_standby3->safe_psql('postgres', "SELECT pg_wal_replay_pause()");
+$node_primary->safe_psql('postgres',
+       "INSERT INTO tab_int VALUES (generate_series(2001,2010))");
+$node_standby3->poll_query_until('postgres',
+       "SELECT pg_get_wal_replay_pause_state() = 'paused'")
+       or die "Timed out while waiting for archive recovery to be paused";
+
+# Request to resume archive recovery and wait until it's actually resumed.
+$node_standby3->safe_psql('postgres', "SELECT pg_wal_replay_resume()");
+$node_standby3->poll_query_until('postgres',
+       "SELECT pg_get_wal_replay_pause_state() = 'not paused'")
+       or die "Timed out while waiting for archive recovery to be resumed";
+
+# Check that the paused state ends and promotion continues if a promotion
+# is triggered while recovery is paused.
+$node_standby3->safe_psql('postgres', "SELECT pg_wal_replay_pause()");
+$node_primary->safe_psql('postgres',
+       "INSERT INTO tab_int VALUES (generate_series(2011,2020))");
+$node_standby3->poll_query_until('postgres',
+       "SELECT pg_get_wal_replay_pause_state() = 'paused'")
+  or die "Timed out while waiting for archive recovery to be paused";
+
+$node_standby3->promote;
+$node_standby3->poll_query_until('postgres',
+       "SELECT NOT pg_is_in_recovery()")
+  or die "Timed out while waiting for promotion to finish";
