Hi,

On 4/25/23 6:23 AM, Amit Kapila wrote:
> On Mon, Apr 24, 2023 at 3:36 PM Drouvot, Bertrand
> <bertranddrouvot...@gmail.com> wrote:
>>
>> Without the second "pg_log_standby_snapshot()" then
>> wait_for_subscription_sync() would be waiting
>> some time on the poll for "SELECT count(1) = 0 FROM pg_subscription_rel
>> WHERE srsubstate NOT IN ('r', 's');"
>>
>> Adding a comment in V3 to explain the need for the second
>> pg_log_standby_snapshot().
>>
>
> Won't this still be unpredictable because it is possible that the
> tablesync worker may take more time to get launched or create a
> replication slot? If that happens after your second
> pg_log_standby_snapshot() then wait_for_subscription_sync() will be
> hanging.

Oh right, that looks like a possible scenario.

> Wouldn't it be better to create a subscription with
> (copy_data = false) to make it predictable and then we won't need
> pg_log_standby_snapshot() to be performed twice?
>
> If you agree with the above suggestion then you probably need to move
> wait_for_subscription_sync() before Insert.


I like that idea, thanks! Done in V4 attached.

Not related to the above corner case, but while re-reading the patch I also
added:

"
$node_primary->wait_for_replay_catchup($node_standby);
"

between the publication creation on the primary and the subscription to the
standby (to ensure the publication gets replicated before we request the
subscription creation).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From ed92ea9bf3385d2fe9e4ef0d8a04b87b1c7e6b3d Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Tue, 25 Apr 2023 06:02:17 +0000
Subject: [PATCH v4 2/2] Add retained WAL test in
 035_standby_logical_decoding.pl

Adding one test, to verify that invalidated logical slots do not lead to
retaining WAL.
---
 .../t/035_standby_logical_decoding.pl         | 78 ++++++++++++++++++-
 1 file changed, 76 insertions(+), 2 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 03346f44f2..6ae4fc1e02 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -9,6 +9,7 @@ use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
+use Time::HiRes qw(usleep);
 
 my ($stdin,             $stdout,            $stderr,
        $cascading_stdout,  $cascading_stderr,  $subscriber_stdin,
@@ -495,9 +496,82 @@ $node_standby->restart;
 check_slots_conflicting_status(1);
 
 ##################################################
-# Verify that invalidated logical slots do not lead to retaining WAL
+# Verify that invalidated logical slots do not lead to retaining WAL.
 ##################################################
-# XXXXX TODO
+
+# Get the restart_lsn from an invalidated slot
+my $restart_lsn = $node_standby->safe_psql('postgres',
+       "SELECT restart_lsn from pg_replication_slots WHERE slot_name = 'vacuum_full_activeslot' and conflicting is true;"
+);
+
+chomp($restart_lsn);
+
+# Get the WAL file name associated with this LSN on the primary
+my $walfile_name = $node_primary->safe_psql('postgres',
+       "SELECT pg_walfile_name('$restart_lsn')");
+
+chomp($walfile_name);
+
+# Check the WAL file is still on the primary
+ok(-f $node_primary->data_dir . '/pg_wal/' . $walfile_name,
+       "WAL file still on the primary");
+
+# Get the number of WAL files on the standby
+my $nb_standby_files = $node_standby->safe_psql('postgres',
+       "SELECT COUNT(*) FROM pg_ls_dir('pg_wal')");
+
+chomp($nb_standby_files);
+
+# Switch WAL files on the primary
+my @c = (1 .. $nb_standby_files);
+
+$node_primary->safe_psql('postgres', "create table retain_test(a int)");
+
+for (@c)
+{
+       $node_primary->safe_psql(
+               'postgres', "SELECT pg_switch_wal();
+                                          insert into retain_test values("
+                 . $_ . ");");
+}
+
+# Ask for a checkpoint
+$node_primary->safe_psql('postgres', 'checkpoint;');
+
+# Check that the WAL file has not been retained on the primary
+ok(!-f $node_primary->data_dir . '/pg_wal/' . $walfile_name,
+       "WAL file not on the primary anymore");
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Generate another WAL switch, more activity and a checkpoint
+$node_primary->safe_psql(
+       'postgres', "SELECT pg_switch_wal();
+                                         insert into retain_test values(1);");
+$node_primary->safe_psql('postgres', 'checkpoint;');
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Verify that the WAL file has not been retained on the standby
+my $standby_walfile = $node_standby->data_dir . '/pg_wal/' . $walfile_name;
+
+# We cannot test if the WAL file still exists immediately.
+# We need to give the standby some time to actually "remove" it.
+my $i = 0;
+while (1)
+{
+       last if !-f $standby_walfile;
+       if ($i++ == 10 * $default_timeout)
+       {
+               die
+                 "could not determine if WAL file has been retained or not, can't continue";
+       }
+       usleep(100_000);
+}
+
+ok(1, "invalidated logical slots do not lead to retaining WAL");
 
 ##################################################
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
-- 
2.34.1

From 53507b322cd0af79b15cb62eff8e39422a8f98c6 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Mon, 24 Apr 2023 05:13:23 +0000
Subject: [PATCH v4 1/2] Add subscription to the standby test in
 035_standby_logical_decoding.pl

Adding one test, to verify that subscription to the standby is possible.
---
 src/test/perl/PostgreSQL/Test/Cluster.pm      | 11 ++-
 .../t/035_standby_logical_decoding.pl         | 91 ++++++++++++++++++-
 2 files changed, 99 insertions(+), 3 deletions(-)
   7.5% src/test/perl/PostgreSQL/Test/
  92.4% src/test/recovery/t/

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 6f7f4e5de4..819667d42a 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2644,7 +2644,16 @@ sub wait_for_catchup
        }
        if (!defined($target_lsn))
        {
-               $target_lsn = $self->lsn('write');
+               my $isrecovery = $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+               chomp($isrecovery);
+               if ($isrecovery eq 't')
+               {
+                       $target_lsn = $self->lsn('replay');
+               }
+               else
+               {
+                       $target_lsn = $self->lsn('write');
+               }
        }
        print "Waiting for replication conn "
          . $standby_name . "'s "
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index b8f5311fe9..03346f44f2 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,12 +10,17 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, $handle, $slot);
+my ($stdin,             $stdout,            $stderr,
+       $cascading_stdout,  $cascading_stderr,  $subscriber_stdin,
+       $subscriber_stdout, $subscriber_stderr, $ret,
+       $handle,            $slot);
 
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 my $node_standby = PostgreSQL::Test::Cluster->new('standby');
 my $node_cascading_standby = PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout    = IPC::Run::timer(2 * $default_timeout);
 my $res;
 
 # Name for the physical slot on primary
@@ -267,7 +272,8 @@ $node_standby->init_from_backup(
        has_streaming => 1,
        has_restoring => 1);
 $node_standby->append_conf('postgresql.conf',
-       qq[primary_slot_name = '$primary_slotname']);
+       qq[primary_slot_name = '$primary_slotname'
+       max_replication_slots = 5]);
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
 $node_standby->safe_psql('testdb', qq[SELECT * FROM pg_create_physical_replication_slot('$standby_physical_slotname');]);
@@ -285,6 +291,26 @@ $node_cascading_standby->append_conf('postgresql.conf',
 $node_cascading_standby->start;
 $node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
 
+#######################
+# Initialize subscriber node
+#######################
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+my %psql_subscriber = (
+       'subscriber_stdin'  => '',
+       'subscriber_stdout' => '',
+       'subscriber_stderr' => '');
+$psql_subscriber{run} = IPC::Run::start(
+       [ 'psql', '-XA', '-f', '-', '-d', $node_subscriber->connstr('postgres') ],
+       '<',
+       \$psql_subscriber{subscriber_stdin},
+       '>',
+       \$psql_subscriber{subscriber_stdout},
+       '2>',
+       \$psql_subscriber{subscriber_stderr},
+       $psql_timeout);
+
 ##################################################
 # Test that logical decoding on the standby
 # behaves correctly.
@@ -365,6 +391,67 @@ is( $node_primary->psql(
     3,
     'replaying logical slot from another database fails');
 
+##################################################
+# Test that we can subscribe on the standby with the publication
+# created on the primary.
+##################################################
+
+# Create a table on the primary
+$node_primary->safe_psql('postgres',
+       "CREATE TABLE tab_rep (a int primary key)");
+
+# Create a table (same structure) on the subscriber node
+$node_subscriber->safe_psql('postgres',
+       "CREATE TABLE tab_rep (a int primary key)");
+
+# Create a publication on the primary
+$node_primary->safe_psql('postgres',
+       "CREATE PUBLICATION tap_pub for table tab_rep");
+
+$node_primary->wait_for_replay_catchup($node_standby);
+
+# Subscribe on the standby
+my $standby_connstr = $node_standby->connstr . ' dbname=postgres';
+
+# Not using safe_psql() here as it would wait for activity on the primary
+# and we wouldn't be able to launch pg_log_standby_snapshot() on the primary
+# while waiting.
+# Going through %psql_subscriber lets us avoid waiting synchronously.
+$psql_subscriber{subscriber_stdin} .=
+  qq[CREATE SUBSCRIPTION tap_sub
+     CONNECTION '$standby_connstr'
+     PUBLICATION tap_pub
+     WITH (copy_data = off);];
+$psql_subscriber{subscriber_stdin} .= "\n";
+
+$psql_subscriber{run}->pump_nb();
+
+# Speed up the subscription creation
+$node_primary->safe_psql('postgres', "SELECT pg_log_standby_snapshot()");
+
+# Explicitly shut down the psql instance gracefully, to avoid hangs
+# or worse on Windows
+$psql_subscriber{subscriber_stdin} .= "\\q\n";
+$psql_subscriber{run}->finish;
+
+$node_subscriber->wait_for_subscription_sync($node_standby, 'tap_sub');
+
+# Insert some rows on the primary
+$node_primary->safe_psql('postgres',
+       qq[INSERT INTO tab_rep select generate_series(1,10);]);
+
+$node_primary->wait_for_replay_catchup($node_standby);
+$node_standby->wait_for_catchup('tap_sub');
+
+# Check that the subscriber can see the rows inserted in the primary
+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_rep");
+is($result, qq(10), 'check replicated inserts after subscription on standby');
+
+# We do not need the subscription and the subscriber anymore
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$node_subscriber->stop;
+
 ##################################################
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
 # Scenario 1: hot_standby_feedback off and vacuum FULL
-- 
2.34.1
