Re: pg_receivewal starting position

2021-10-31 Thread Michael Paquier
On Fri, Oct 29, 2021 at 10:13:44AM +0200, Ronan Dunklau wrote:
> We could use a single query on the primary (using the primary's checkpoint
> LSN instead), but it feels a bit convoluted just to avoid a query on the
> standby.

Cheating with pg_walfile_name() running on the primary is fine by me.
One thing that stood out while reading this patch again is that we
can use $standby->slot($archive_slot) to grab the slot's restart_lsn.
I have changed that, and applied it.  So we are done with this
thread.
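For reference, the simplification amounts to something like this minimal sketch (assuming the $standby node and $archive_slot name from the patch; the Cluster slot() helper returns the matching pg_replication_slots row as a hash reference):

```perl
# Grab the slot's restart_lsn through the Cluster helper instead of a
# hand-written psql query ($standby and $archive_slot come from the
# surrounding test).
my $restart_lsn = $standby->slot($archive_slot)->{'restart_lsn'};
```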
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-29 Thread Ronan Dunklau
Le vendredi 29 octobre 2021, 04:27:51 CEST Michael Paquier a écrit :
> On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:
> > Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s
> > on average on my machine.
> > I think if you reduce the size of the generate_series batches, this should
> > probably be reduced everywhere. With what we do though, inserting a single
> > line should work just as well; I wonder why we insist on inserting a
> > hundred lines? I updated your patch with that small modification; it
> > also makes the code less verbose.
> 
> Thanks for the extra numbers.  I have added your suggestions,
> switching the dummy table to use a primary key with different values,
> while on it, as there is an argument that it makes debugging easier,
> and applied the speedup patch.

Thanks!

> 
> >> +$standby->psql('',
> >> +  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> >> +  replication => 1);
> >> Here as well we could use a restart point to reduce the number of
> >> segments archived.
> > 
> > The restart point should be very close, as we don't generate any activity
> > on the primary between the backup and the slot's creation. I'm not sure
> > adding the complexity of triggering a checkpoint on the primary and
> > waiting for the standby to catch up on it would be that useful.
> 
> Yes, you are right here.  The base backup taken from the primary
> at this point ensures a fresh point.
> 
> +# This test is split in two, using the same standby: one test check the
> +# resume-from-folder case, the other the resume-from-slot one.
> This comment needs a refresh, as the resume-from-folder case is no
> more.
> 

Done.

> +$standby->psql(
> +  'postgres',
> +  "SELECT pg_promote(wait_seconds => 300)");
> This could be $standby->promote.
> 

Oh, didn't know about that. 

> +# Switch wal to make sure it is not a partial file but a complete segment.
> +$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
> This INSERT needs a slight change to adapt to the primary key of the
> table.  This one is on me :p

Done. 

> 
> Anyway, is this first segment switch really necessary?  From the data
> archived by pg_receivewal in the command testing the TLI jump, we
> finish with the following contents (contents generated after fixing
> the three INSERTs):
> 0001000B
> 0001000C
> 0002000D
> 0002000E.partial
> 0002.history
> 
> So, even if we don't do the first switch, we'd still have one
> completed segment on the previous timeline, before switching to the
> new timeline and the next segment (pg_receivewal is a bit inconsistent
> with the backend here, by the way, as the first segment on the new
> timeline would map with the last segment of the old timeline, but here
> we have a clean switch as of stop_streaming in pg_receivewal.c).

The first completed segment on the previous timeline comes from the fact that 
we stream from the restart point. I removed the switch and used the WAL file 
name of the replication slot's restart point instead. This means querying both 
the standby (to get the replication slot's restart_lsn) and the primary (to 
have access to pg_walfile_name). 

We could use a single query on the primary (using the primary's checkpoint LSN 
instead), but it feels a bit convoluted just to avoid a query on the standby.
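As a sketch, the two queries look roughly like this (variable names follow the patch; pg_walfile_name() has to run on the primary because it is not allowed while the standby is in recovery):

```perl
# Ask the standby where the slot will restart streaming from...
my $restart_lsn = $standby->safe_psql('postgres',
	"SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = '$archive_slot'");
chomp($restart_lsn);

# ...and let the primary translate that LSN into a WAL file name, as
# pg_walfile_name() cannot be executed during recovery.
my $walfile = $primary->safe_psql('postgres',
	"SELECT pg_walfile_name('$restart_lsn')");
```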


-- 
Ronan Dunklau

From eb65a8b5883d903dda78bb66c2e1795b48cafc24 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v17] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywhere. This test case verifies that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off at.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 57 +++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 2da200396e..1fe32758a0 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 35;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -208,3 +208,58 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch using a replication slot
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = 

Re: pg_receivewal starting position

2021-10-28 Thread Michael Paquier
On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:
> Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s on 
> average on my machine. 
> I think if you reduce the size of the generate_series batches, this should 
> probably be reduced everywhere. With what we do though, inserting a single 
> line should work just as well; I wonder why we insist on inserting a hundred 
> lines? I updated your patch with that small modification; it also makes the 
> code less verbose.

Thanks for the extra numbers.  I have added your suggestions,
switching the dummy table to use a primary key with different values,
while on it, as there is an argument that it makes debugging easier,
and applied the speedup patch.

>> +$standby->psql('',
>> +  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
>> +  replication => 1);
>> Here as well we could use a restart point to reduce the number of
>> segments archived.
> 
> The restart point should be very close, as we don't generate any activity on 
> the primary between the backup and the slot's creation. I'm not sure adding 
> the complexity of triggering a checkpoint on the primary and waiting for the 
> standby to catch up on it would be that useful. 

Yes, you are right here.  The base backup taken from the primary
at this point ensures a fresh point.

+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
This comment needs a refresh, as the resume-from-folder case is no
more.

+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
This could be $standby->promote.

+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
This INSERT needs a slight change to adapt to the primary key of the
table.  This one is on me :p

Anyway, is this first segment switch really necessary?  From the data
archived by pg_receivewal in the command testing the TLI jump, we
finish with the following contents (contents generated after fixing
the three INSERTs):
0001000B
0001000C
0002000D
0002000E.partial
0002.history

So, even if we don't do the first switch, we'd still have one
completed segment on the previous timeline, before switching to the
new timeline and the next segment (pg_receivewal is a bit inconsistent
with the backend here, by the way, as the first segment on the new
timeline would map with the last segment of the old timeline, but here
we have a clean switch as of stop_streaming in pg_receivewal.c).

+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
No arguments against the second segment switch to ensure the presence
of a full segment on the new TLI, of course.
--
Michael




Re: pg_receivewal starting position

2021-10-28 Thread Ronan Dunklau
Le jeudi 28 octobre 2021, 14:31:36 CEST Michael Paquier a écrit :
> On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:
> > Sorry I sent an intermediary version of the patch, here is the correct
> > one.
> 
> While looking at this patch, I have figured out a simple way to make
> the tests faster without impacting the coverage.  The size of the WAL
> segments archived is a bit of a bottleneck as they need to be zeroed
> by pg_receivewal at creation.  This finishes by being a waste of time,
> even if we don't flush the data.  So I'd like to switch the test so
> that it uses a segment size of 1MB, first.
> 
> A second thing is that we copy more segments than really needed
> when using the slot's restart_lsn as starting point as RESERVE_WAL
> would use the current redo location, so it seems to me that a
> checkpoint is in order before the slot creation.  A third thing is
> that generating some extra data after the end LSN we want to use makes
> the test much faster at the end.
> 
> With those three methods combined, the test goes down from 11s to 9s
> here.  Attached is a patch I'd like to apply to make the test
> cheaper.

Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s on 
average on my machine. 
I think if you reduce the size of the generate_series batches, this should 
probably be reduced everywhere. With what we do though, inserting a single 
line should work just as well; I wonder why we insist on inserting a hundred 
lines? I updated your patch with that small modification; it also makes the 
code less verbose.


> 
> I also had a look at your patch.  Here are some comments.
> 
> +# Cleanup the previous stream directories to reuse them
> +unlink glob "'${stream_dir}/*'";
> +unlink glob "'${slot_dir}/*'";
> I think that we'd better choose a different location for the
> archives.  Keeping the segments of the previous tests is useful for
> debugging if a previous step of the test failed.

Ok.
> 
> +$standby->psql('',
> +  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> +  replication => 1);
> Here as well we could use a restart point to reduce the number of
> segments archived.

The restart point should be very close, as we don't generate any activity on 
the primary between the backup and the slot's creation. I'm not sure adding 
the complexity of triggering a checkpoint on the primary and waiting for the 
standby to catch up on it would be that useful. 
> 
> +# Now, try to resume after the promotion, from the folder.
> +$standby->command_ok(
> +   [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
> +'--slot', $folder_slot, '--no-sync'],
> +   "Stream some wal after promoting, resuming from the folder's position");
> What is called the "resume-from-folder" test in the patch is IMO
> too costly and brings little extra value, requiring two commands of
> pg_receivewal (one to populate the folder and one to test the TLI jump
> from the previously-populated point) to test basically the same thing
> as when the starting point is taken from the slot.  Except that
> restarting from the slot is twice cheaper.  The point of the
> resume-from-folder case is to make sure that we restart from the point
> of the archive folder rather than the slot's restart_lsn, but your
> test fails this part, in fact, because the first command populating
> the archive folder also uses "--slot $folder_slot", updating the
> slot's restart_lsn before the second pg_receivewal uses this slot
> again.
> 
> I think, as a whole, that the case where an archive folder
> is populated (without a slot!), followed by a second command where we
> use a slot that has a restart_lsn older than the archive's location,
> is not that interesting.  If we include such a test, there is no
> need to include that within the TLI jump part, in my opinion.  So I
> think that we had better drop this part of the patch, and keep only
> the case where we resume from a slot for the TLI jump.

You're right about the test not being that interesting in that case. I thought 
it would be worthwhile to test both cases, but resume_from_folder doesn't 
actually exercise any code that was not already called before (timeline 
computation from segment name).

> The commands of pg_receivewal included in the test had better use -n
> so that there is no infinite loop on failure.

Ok.

-- 
Ronan Dunklau

From ad09605b6f8b2d1cc91e67a255bf469363cad2b3 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Thu, 28 Oct 2021 15:12:04 +0200
Subject: [PATCH v16 2/2] Speed up pg_receivewal tests.

Author: Michael Paquier
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 21 ++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index e0422e2242..1691054f5c 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -15,7 +15,7

Re: pg_receivewal starting position

2021-10-28 Thread Michael Paquier
On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:
> Sorry I sent an intermediary version of the patch, here is the correct one.

While looking at this patch, I have figured out a simple way to make
the tests faster without impacting the coverage.  The size of the WAL
segments archived is a bit of a bottleneck as they need to be zeroed
by pg_receivewal at creation.  This finishes by being a waste of time,
even if we don't flush the data.  So I'd like to switch the test so
that it uses a segment size of 1MB, first.

A second thing is that we copy more segments than really needed
when using the slot's restart_lsn as starting point as RESERVE_WAL
would use the current redo location, so it seems to me that a
checkpoint is in order before the slot creation.  A third thing is
that generating some extra data after the end LSN we want to use makes
the test much faster at the end.

With those three methods combined, the test goes down from 11s to 9s
here.  Attached is a patch I'd like to apply to make the test
cheaper.

I also had a look at your patch.  Here are some comments.

+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
I think that we'd better choose a different location for the
archives.  Keeping the segments of the previous tests is useful for
debugging if a previous step of the test failed.
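A sketch of what that suggestion amounts to (the directory name is illustrative): give the timeline-switch test its own fresh archive locations instead of emptying the directories used by the earlier tests:

```perl
# Dedicated archive directories for the timeline-switch test, so the
# segments archived by the previous steps survive for debugging
# (names are illustrative, not from the committed test).
my $switch_stream_dir = $primary->basedir . '/timeline_switch_wal';
my $switch_slot_dir   = $primary->basedir . '/timeline_switch_slot_wal';
mkdir($switch_stream_dir);
mkdir($switch_slot_dir);
```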

+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
Here as well we could use a restart point to reduce the number of
segments archived.

+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+   [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+'--slot', $folder_slot, '--no-sync'],
+   "Stream some wal after promoting, resuming from the folder's position");
What is called the "resume-from-folder" test in the patch is IMO
too costly and brings little extra value, requiring two commands of
pg_receivewal (one to populate the folder and one to test the TLI jump
from the previously-populated point) to test basically the same thing
as when the starting point is taken from the slot.  Except that
restarting from the slot is twice cheaper.  The point of the
resume-from-folder case is to make sure that we restart from the point
of the archive folder rather than the slot's restart_lsn, but your
test fails this part, in fact, because the first command populating
the archive folder also uses "--slot $folder_slot", updating the
slot's restart_lsn before the second pg_receivewal uses this slot
again.

I think, as a whole, that the case where an archive folder
is populated (without a slot!), followed by a second command where we
use a slot that has a restart_lsn older than the archive's location,
is not that interesting.  If we include such a test, there is no
need to include that within the TLI jump part, in my opinion.  So I
think that we had better drop this part of the patch, and keep only
the case where we resume from a slot for the TLI jump.

The commands of pg_receivewal included in the test had better use -n
so that there is no infinite loop on failure.
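A hedged sketch of the suggested invocation, adding -n (--no-loop) so pg_receivewal exits on error instead of retrying forever (the surrounding variables come from the test in the patch):

```perl
# -n / --no-loop makes pg_receivewal give up after a connection error
# instead of looping, so a test failure cannot hang the suite; --endpos
# bounds the run so it terminates once the target LSN is streamed.
$standby->command_ok(
	[ 'pg_receivewal', '-D', $stream_dir, '-n', '--verbose',
	  '--endpos', $nextlsn, '--slot', $archive_slot, '--no-sync' ],
	'pg_receivewal resumes from the slot after promotion');
```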
--
Michael
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..87edcb6200 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -15,7 +15,7 @@ program_options_handling_ok('pg_receivewal');
 umask(0077);
 
 my $primary = PostgreSQL::Test::Cluster->new('primary');
-$primary->init(allows_streaming => 1);
+$primary->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
 $primary->start;
 
 my $stream_dir = $primary->basedir . '/archive_wal';
@@ -167,7 +167,9 @@ mkdir($slot_dir);
 $slot_name = 'archive_slot';
 
 # Setup the slot, reserving WAL at creation (corresponding to the
-# last redo LSN here, actually).
+# last redo LSN here, actually, so use a checkpoint to reduce the
+# number of segments archived).
+$primary->psql('postgres', 'checkpoint;');
 $primary->psql('postgres',
 	"SELECT pg_create_physical_replication_slot('$slot_name', true);");
 
@@ -182,12 +184,16 @@ my $walfile_streamed = $primary->safe_psql(
 # Switch to a new segment, to make sure that the segment retained by the
 # slot is still streamed.  This may not be necessary, but play it safe.
 $primary->psql('postgres',
-	'INSERT INTO test_table VALUES (generate_series(1,100));');
+	'INSERT INTO test_table VALUES (generate_series(1,10));');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 $nextlsn =
   $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
 chomp($nextlsn);
 
+# Add a bit more data to accelerate the end of the next commands.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,10));');
+
 # Check case where the slot does not exist.
 $primary->command_fails_like(
 	[



Re: pg_receivewal starting position

2021-10-27 Thread Ronan Dunklau
Le mercredi 27 octobre 2021, 10:00:40 CEST Ronan Dunklau a écrit :
> Le mercredi 27 octobre 2021, 04:17:28 CEST Kyotaro Horiguchi a écrit :
> > +my @walfiles = glob "$slot_dir/*";
> > 
> > This is not used.
> 
> Sorry, fixed in attached version.
> 
> > Each pg_receivewal run stalls for about 10 or more seconds before
> > finishing, which is not great from the standpoint of recently
> > increasing test run time.
> > 
> > Maybe we want to advance LSN a bit, after taking $nextlsn then pass
> > "-s 1" to pg_receivewal.
> 
> I incorrectly assumed it was due to the promotion time without looking into
> it. In fact, you're right: the LSN was not incremented after we fetched the
> end lsn, and thus we would wait for quite a while. I fixed that too.
> 
> Thank you for the review!

Sorry I sent an intermediary version of the patch, here is the correct one.

-- 
Ronan Dunklau

From 8fa22687e50f02539119568f98b6a11c077f1774 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v15] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywhere. This test case verifies that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off at.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 88 +++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..f561aca8f8 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 39;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -206,3 +206,89 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch.
+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
+
+# Create two replication slots.
+# First one is to make sure we keep the wal for the resume-from-folder case.
+my $folder_slot = "folder_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Second one is for testing the resume-from-slot case
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $walfile_before_promotion = $primary->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+# Populate the stream_dir for the resume-from-folder case.
+$nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+
+$standby->run_log(
+[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+  '--slot', $folder_slot, '--no-sync'],
+"Stream some wal before promoting");
+
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+
+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+'--slot', $folder_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the folder's position");
+
+ok(-e "$stream_dir/$walfile_before_promoti

Re: pg_receivewal starting position

2021-10-27 Thread Ronan Dunklau
Le mercredi 27 octobre 2021, 04:17:28 CEST Kyotaro Horiguchi a écrit :
> +my @walfiles = glob "$slot_dir/*";
> 
> This is not used.
> 

Sorry, fixed in attached version.

> Each pg_receivewal run stalls for about 10 or more seconds before
> finishing, which is not great from the standpoint of recently
> increasing test run time.

> Maybe we want to advance LSN a bit, after taking $nextlsn then pass
> "-s 1" to pg_receivewal.

I incorrectly assumed it was due to the promotion time without looking into 
it. In fact, you're right: the LSN was not incremented after we fetched the end 
lsn, and thus we would wait for quite a while. I fixed that too.

Thank you for the review!

-- 
Ronan Dunklau

From 04747199b7be896a62021e4f064b2342234427d5 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v14] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywhere. This test case verifies that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off at.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 86 +++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..28e4e9b3a4 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 39;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -206,3 +206,87 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch.
+# This test is split in two, using the same standby: one test check the
+# resume-from-folder case, the other the resume-from-slot one.
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
+
+# Create two replication slots.
+# First one is to make sure we keep the wal for the resume-from-folder case.
+my $folder_slot = "folder_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Second one is for testing the resume-from-slot case
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $walfile_before_promotion = $primary->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+
+# Populate the stream_dir for the resume-from-folder case.
+$nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->run_log(
+[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+  '--slot', $folder_slot, '--no-sync'],
+"Stream some wal before promoting");
+
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+'--slot', $folder_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the folder's position");
+
+ok(-e "$stream_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/$walfile_after_promotion",
+  "WAL from the new timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/0002.history",
+  "Timeline history file 

Re: pg_receivewal starting position

2021-10-26 Thread Kyotaro Horiguchi
At Wed, 27 Oct 2021 11:17:28 +0900 (JST), Kyotaro Horiguchi wrote in 
> Each pg_receivewal run stalls for about 10 or more seconds before
> finishing, which is not great from the standpoint of recently
> increasing test run time.
> 
> Maybe we want to advance LSN a bit, after taking $nextlsn then pass
> "-s 1" to pg_receivewal.

Hmm. Sorry, my fingers slipped.  We don't need '-s 1' for this purpose.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_receivewal starting position

2021-10-26 Thread Kyotaro Horiguchi
At Tue, 26 Oct 2021 11:01:46 +0200, Ronan Dunklau wrote in 
> Le mardi 26 octobre 2021, 08:27:47 CEST Ronan Dunklau a écrit :
> > Yes, I will try to simplify the logic of the patch I sent last week. I'll
> > keep you posted here soon.
> 
> I was able to simplify it quite a bit, by using only one standby for both
> test scenarios.
> 
> This test case verifies that after a timeline switch, if we resume from a 
> previous state we will archive: 
>  - segments from the old timeline
>  - segments from the new timeline
>  - the timeline history file itself.
> 
> I chose to check against a full segment from the previous timeline, but it
> would have been possible to check that the latest timeline segment was
> partial. I chose not to, in the unlikely event we promote at an exact
> segment boundary. I don't think it matters much, since partial wal files
> are already covered by other tests.

+my @walfiles = glob "$slot_dir/*";

This is not used.

Each pg_receivewal run stalls for about 10 or more seconds before
finishing, which is not great from the standpoint of recently
increasing test run time.

Maybe we want to advance LSN a bit, after taking $nextlsn then pass
"-s 1" to pg_receivewal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_receivewal starting position

2021-10-26 Thread Ronan Dunklau
Le mardi 26 octobre 2021, 08:27:47 CEST Ronan Dunklau a écrit :
> Yes, I will try to simplify the logic of the patch I sent last week. I'll
> keep you posted here soon.

I was able to simplify it quite a bit, by using only one standby for both
test scenarios.

This test case verifies that after a timeline switch, if we resume from a 
previous state we will archive: 
 - segments from the old timeline
 - segments from the new timeline
 - the timeline history file itself.

I chose to check against a full segment from the previous timeline, but it 
would have been possible to check that the latest timeline segment was 
partial. I chose not to, in the unlikely event we promote at an exact segment 
boundary. I don't think it matters much, since partial wal files are already 
covered by other tests.

-- 
Ronan Dunklau

From 0daf03e7e69aa11accb18df86d686004002115ff Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v13] Add a test for pg_receivewal following timeline switch.

pg_receivewal is able to follow a timeline switch, but this was not
tested anywhere. This test case verifies that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off at.
---
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 87 +++-
 1 file changed, 86 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..f285e9b3f3 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 39;
 
 program_help_ok('pg_receivewal');
 program_version_ok('pg_receivewal');
@@ -206,3 +206,88 @@ $primary->command_ok(
 	"WAL streamed from the slot's restart_lsn");
 ok(-e "$slot_dir/$walfile_streamed",
 	"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch.
+# This test is split in two, using the same standby: one test checks the
+# resume-from-folder case, the other the resume-from-slot one.
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Clean up the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
+
+# Create two replication slots.
+# First one is to make sure we keep the wal for the resume-from-folder case.
+my $folder_slot = "folder_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Second one is for testing the resume-from-slot case
+my $archive_slot = "archive_slot";
+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $walfile_before_promotion = $primary->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+# Switch wal to make sure it is not a partial file but a complete segment.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+
+# Populate the stream_dir for the resume-from-folder case.
+$nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->run_log(
+[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+  '--slot', $folder_slot, '--no-sync'],
+"Stream some wal before promoting");
+
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->psql(
+  'postgres',
+  "SELECT pg_promote(wait_seconds => 300)");
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+  "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+$standby->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,100));');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+
+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+	[ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+'--slot', $folder_slot, '--no-sync'],
+	"Stream some wal after promoting, resuming from the folder's position");
+
+ok(-e "$stream_dir/$walfile_before_promotion",
+  "WAL from the old timeline has been archived resuming from the folder");
+ok(-e "$stream_dir/$walfile_after_promotion",
+  "WAL from the 

Re: pg_receivewal starting position

2021-10-25 Thread Ronan Dunklau
Le mardi 26 octobre 2021, 03:15:40 CEST Michael Paquier a écrit :
> I have changed the patch per Ronan's suggestion to have the version
> check out of GetSlotInformation(), addressed what you have reported,
> and the result looked good.  So I have applied this part.

Thanks !
> 
> What remains on this thread is the addition of new tests to make sure
> that pg_receivewal is able to follow a timeline switch.  Now that we
> can restart from a slot that should be a bit easier to implement as
> a test by creating a slot on a standby.  Ronan, are you planning to
> send a new patch for this part?

Yes, I will try to simplify the logic of the patch I sent last week. I'll keep 
you posted here soon.


-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 05:46:57PM +0530, Bharath Rupireddy wrote:
> StreamLog() isn't reached for create and drop slot cases, see [1]. I
> suggest removing replication_slot != NULL and having Assert(slot_name)
> in GetSlotInformation():
> /*
>  * Try to get the starting point from the slot.  This is supported in
>  * PostgreSQL 15 and up.
>  */
> if (PQserverVersion(conn) >= 15)
> {
> if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
> &stream.timeline))
> {
> /* Error is logged by GetSlotInformation() */
> return;
> }
> }

Please note that it is possible to use pg_receivewal without a slot,
which is the default case, so we cannot do what you are suggesting
here.  An assertion on slot_name in GetSlotInformation() is not that
helpful either in my opinion, as we would just crash a couple of lines
down the road.

I have changed the patch per Ronan's suggestion to have the version
check out of GetSlotInformation(), addressed what you have reported,
and the result looked good.  So I have applied this part.

What remains on this thread is the addition of new tests to make sure
that pg_receivewal is able to follow a timeline switch.  Now that we
can restart from a slot that should be a bit easier to implement as
a test by creating a slot on a standby.  Ronan, are you planning to
send a new patch for this part?
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-25 Thread Bharath Rupireddy
On Mon, Oct 25, 2021 at 4:19 PM Michael Paquier  wrote:
> > 6) Why do we need these two assignments?
> > I think we can just get rid of lsn_loc and tli_loc, initialize
> > *restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
> > the function and directly assign the requested values to *restart_lsn
> > and *restart_tli, also see comment (8).
>
> FWIW, I find the style of the patch easier to follow.

Then, please change if (*restart_lsn) and if (*restart_tli) to if
(restart_lsn) and if (restart_tli), just to be consistent with the
other parts of the patch and the existing code of RunIdentifySystem():
if (*restart_lsn)
*restart_lsn = lsn_loc;
if (restart_tli != NULL)
*restart_tli = tli_loc;

> > 11) Will replication_slot ever be NULL? If it ever be null, then we
> > don't reach this far right? We see the pg_log_error("%s needs a slot
> > to be specified using --slot". Please revmove below if condition:
> > + * server may not support this option.
>
> Did you notice that this applies when creating or dropping a slot, for
> code paths entirely different than what we are dealing with here?

StreamLog() isn't reached for create and drop slot cases, see [1]. I
suggest removing replication_slot != NULL and having Assert(slot_name)
in GetSlotInformation():
/*
 * Try to get the starting point from the slot.  This is supported in
 * PostgreSQL 15 and up.
 */
if (PQserverVersion(conn) >= 15)
{
if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
&stream.timeline))
{
/* Error is logged by GetSlotInformation() */
return;
}
}

Here is another comment on the patch:
Remove the extra new line above the GetSlotInformation() definition:
  return true;
 }

+-> REMOVE THIS
+/*
+ * Run READ_REPLICATION_SLOT through a given connection and give back to

Apart from the above, the v12 patch LGTM.

[1]
/*
 * Drop a replication slot.
 */
if (do_drop_slot)
{
if (verbose)
pg_log_info("dropping replication slot \"%s\"", replication_slot);

if (!DropReplicationSlot(conn, replication_slot))
exit(1);
exit(0);
}

/* Create a replication slot */
if (do_create_slot)
{
if (verbose)
pg_log_info("creating replication slot \"%s\"", replication_slot);

if (!CreateReplicationSlot(conn, replication_slot, NULL,
false, true, false,
   slot_exists_ok, false))
exit(1);
exit(0);
}

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 02:40:05PM +0530, Bharath Rupireddy wrote:
> 2) I think we should tweak the below error message
> to
> pg_log_error("could not read replication slot \"%s\": %s",
>  "IDENTIFY_SYSTEM", PQerrorMessage(conn));
> Having slot name in the error message helps to isolate the error
> message from tons of server logs that get generated.

Yes, this suggestion makes sense.

> 3) Also for the same reasons stated as above, change the below error message
> pg_log_error("could not read replication slot: got %d rows and %d
> fields, expected %d rows and %d or more fields",
> to
> pg_log_error("could not read replication slot \"%s\": got %d rows and
> %d fields, expected %d rows and %d or more fields", slot_name,

We can even get rid of "or more" to match the condition used.

> 4) Also for the same reasons, change below
> + pg_log_error("could not parse slot's restart_lsn \"%s\"",
> to
> pg_log_error("could not parse replication slot \"%s\" restart_lsn \"%s\"",
>  slot_name, PQgetvalue(res, 0, 1));

Appending the slot name makes sense.

> 5) I think we should also have an assertion for the timeline ID:
> Assert(stream.startpos != InvalidXLogRecPtr);
> Assert(stream.timeline != 0);

Okay.

> 6) Why do we need these two assignments?
> I think we can just get rid of lsn_loc and tli_loc, initialize
> *restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
> the function and directly assign the requested values to *restart_lsn
> and *restart_tli, also see comment (8).

FWIW, I find the style of the patch easier to follow.

> 9) 80char limit crossed:
> +GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
> *restart_lsn, TimeLineID *restart_tli)

pgindent says nothing.

> 10) Missing word "command", and use "issued to the server", so change the 
> below:
> +  READ_REPLICATION_SLOT is issued to retrieve the
> to
> +  READ_REPLICATION_SLOT command is issued to
> the server to retrieve the

Okay.

> 11) Will replication_slot ever be NULL? If it is ever null, then we
> don't reach this far, right? We see the pg_log_error("%s needs a slot
> to be specified using --slot". Please remove the below if condition:
> + * server may not support this option.

Did you notice that this applies when creating or dropping a slot, for
code paths entirely different than what we are dealing with here?
--
Michael
From bc38918b44e3e41836bb075e7f915dcc40fddcb9 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 25 Oct 2021 19:48:11 +0900
Subject: [PATCH v12] Allow pg_receivewal to stream from a slot's restart LSN

Prior to this patch, when running pg_receivewal, the streaming start
point would be the current location of the archives if anything is
found, and pg_receivewal would fall back to the current WAL flush
location if there are no archives, as of the result of an
IDENTIFY_SYSTEM command.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.  So this makes
pg_receivewal use the following sequence of methods to determine the
starting streaming LSN:
- Scan the local archives.
- Use the slot's restart_lsn, if supported.
- Fallback to the current flush LSN.

To keep compatibility with prior server versions, we only attempt to use
READ_REPLICATION_SLOT if the backend version is at least 15, and
fallback to the older behavior of streaming from the current flush
LSN if the command is not supported.

Some new TAP tests are added to cover this feature.
---
 src/bin/pg_basebackup/pg_receivewal.c| 31 ++-
 src/bin/pg_basebackup/streamutil.c   | 98 
 src/bin/pg_basebackup/streamutil.h   |  3 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 53 ++-
 doc/src/sgml/ref/pg_receivewal.sgml  | 11 +++
 5 files changed, 192 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..04ba20b197 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -404,15 +404,40 @@ StreamLog(void)
 		exit(1);
 
 	/*
-	 * Figure out where to start streaming.
+	 * Figure out where to start streaming.  First scan the local directory.
 	 */
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/*
+		 * Try to get the starting point from the slot if any.  This is
+		 * supported in PostgreSQL 15 and newer.
+		 */
+		if (replication_slot != NULL &&
+			PQserverVersion(conn) >= 15)
+		{
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
+	&stream.timeline))
+			{
+/* Error is logged by GetSlotInformation() */
+return;
+			}
+		}
+
+		/*
+		 * If the starting point is still not known

Re: pg_receivewal starting position

2021-10-25 Thread Bharath Rupireddy
On Mon, Oct 25, 2021 at 12:21 PM Michael Paquier  wrote:
> Attached is the updated patch I am finishing with, which is rather
> clean now.  I have tweaked a couple of things while on it, and
> documented better the new GetSlotInformation() in streamutil.c.

Thanks for the v11 patch, here are some comments:

1) Remove the extra whitespace in between "t" and ":"
+ pg_log_error("could not read replication slot : %s",

2) I think we should tweak the below error message
+ pg_log_error("could not read replication slot \"%s\": %s",
slot_name, PQerrorMessage(conn));
to
pg_log_error("could not read replication slot \"%s\": %s",
 "IDENTIFY_SYSTEM", PQerrorMessage(conn));
Having slot name in the error message helps to isolate the error
message from tons of server logs that get generated.

3) Also for the same reasons stated as above, change the below error message
pg_log_error("could not read replication slot: got %d rows and %d
fields, expected %d rows and %d or more fields",
to
pg_log_error("could not read replication slot \"%s\": got %d rows and
%d fields, expected %d rows and %d or more fields", slot_name,

4) Also for the same reasons, change below
+ pg_log_error("could not parse slot's restart_lsn \"%s\"",
to
pg_log_error("could not parse replication slot \"%s\" restart_lsn \"%s\"",
 slot_name, PQgetvalue(res, 0, 1));

5) I think we should also have an assertion for the timeline ID:
Assert(stream.startpos != InvalidXLogRecPtr);
Assert(stream.timeline != 0);

6) Why do we need these two assignments?
+ if (*restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;

+ /* Assign results if requested */
+ if (restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli)
+ *restart_tli = tli_loc;

I think we can just get rid of lsn_loc and tli_loc, initialize
*restart_lsn = InvalidXLogRecPtr and *restart_tli = 0 at the start of
the function and directly assign the requested values to *restart_lsn
and *restart_tli, also see comment (8).

7) Let's be consistent, change the following
+
+ if (*restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;
to
+
+ if (restart_lsn)
+ *restart_lsn = lsn_loc;
+ if (restart_tli != NULL)
+ *restart_tli = tli_loc;

8) Let's extract the values asked by the caller, change:
+ /* restart LSN */
+ if (!PQgetisnull(res, 0, 1))

+ /* current TLI */
+ if (!PQgetisnull(res, 0, 2))
+ tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));

to

+ /* restart LSN */
+ if (restart_lsn && !PQgetisnull(res, 0, 1))

+ /* current TLI */
+ if (restart_tli && !PQgetisnull(res, 0, 2))

9) 80char limit crossed:
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID *restart_tli)

10) Missing word "command", and use "issued to the server", so change the below:
+  READ_REPLICATION_SLOT is issued to retrieve the
to
+  READ_REPLICATION_SLOT command is issued to
the server to retrieve the

11) Will replication_slot ever be NULL? If it is ever null, then we
don't reach this far, right? We see the pg_log_error("%s needs a slot
to be specified using --slot". Please remove the below if condition:
+ * server may not support this option.
+ */
+ if (replication_slot != NULL)

We can just add Assert(slot_name); in GetSlotInformation().

12) How about the following:
"If a starting point cannot be calculated with the previous method,
READ_REPLICATION_SLOT command with the provided
slot is issued to the server for retrieving the slot's restart_lsn and
timeline ID"
instead of
+  and if a replication slot is used, an extra
+  READ_REPLICATION_SLOT is issued to retrieve the
+  slot's restart_lsn to use as starting point.

The IDENTIFY_SYSTEM description starts with "If a starting point
cannot be calculated":

 
  If a starting point cannot be calculated with the previous method,
  the latest WAL flush location is used as reported by the server from
  a IDENTIFY_SYSTEM command.

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 10:24:46AM +0200, Ronan Dunklau wrote:
> However, if we want to support the case of "just check if the slot exists", 
> we 
> need to make sure the command is actually executed, and check the version 
> before calling the function, which would make the check executed twice.
> 
> What I'm proposing is just that we let the responsibility of checking 
> PQServerVersion() to the caller, and remove it from GetSlotInformation, ie:
> 
> -   if (replication_slot != NULL)
> +   if (replication_slot != NULL && PQserverVersion(conn) >= 
> 15)
> {
> if (!GetSlotInformation(conn, replication_slot, 
> &stream.startpos,
> 
> &stream.timeline))
> 
> That way, if we introduce a caller wanting to use this function as an API to 
> check a slot exists, the usage of checking the server version beforehand will 
> be consistent.

Ah, good point.  My apologies for not following.  Indeed, the patch
makes this part of the routine a bit blurry.  It is fine by me to do
as you suggest, and let the caller do the version check as you
propose, while making the routine fail if directly called for an older
server.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-25 Thread Ronan Dunklau
Le lundi 25 octobre 2021, 10:10:13 CEST Michael Paquier a écrit :
> On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:
> > Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :
> >> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> >>> Does it make sense though? The NULL slot_name case handling is pretty
> >>> straightforward, as it will be handled by string formatting, but in the
> >>> case of a null restart_lsn, we have no way of knowing if the command was
> >>> issued at all.
> >> 
> >> If I am following your point, I don't think that it matters much here,
> >> and it seems useful to me to be able to pass NULL for both of them, so
> >> as one can check if the slot exists or not with an API designed this
> >> way.
> > 
> > You're right, but I'm afraid we would have to check the server version
> > twice in any case different from the basic pg_receivewal one (once in the
> > function itself, and once before calling it if we want a meaningful
> > result). Maybe we should move the version check outside the
> > GetSlotInformation function to avoid this, and let it fail with a syntax
> > error when the server doesn't support it ?
> With the approach taken by the patch, we silently fall back to the
> previous behavior if we connect to a server <= 14, and rely on the new
> behavior with a server >= 15, ensuring compatibility.  Why would you
> want to make sure that the command is executed when we should just
> enforce that the old behavior is what happens when there is a slot
> defined and a backend <= 14?

Sorry I haven't been clear. For the use case of this patch, the current 
approach is perfect (and we supply the restart_lsn).

However, if we want to support the case of "just check if the slot exists", we 
need to make sure the command is actually executed, and check the version 
before calling the function, which would make the check executed twice.

What I'm proposing is just that we let the responsibility of checking 
PQServerVersion() to the caller, and remove it from GetSlotInformation, ie:

-   if (replication_slot != NULL)
+   if (replication_slot != NULL && PQserverVersion(conn) >= 
15)
{
if (!GetSlotInformation(conn, replication_slot, 
&stream.startpos,

&stream.timeline))

That way, if we introduce a caller wanting to use this function as an API to 
check a slot exists, the usage of checking the server version beforehand will 
be consistent.



-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 09:50:01AM +0200, Ronan Dunklau wrote:
> Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :
>> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
>>> Does it make sense though? The NULL slot_name case handling is pretty
>>> straightforward, as it will be handled by string formatting, but in the
>>> case of a null restart_lsn, we have no way of knowing if the command was
>>> issued at all.
>> 
>> If I am following your point, I don't think that it matters much here,
>> and it seems useful to me to be able to pass NULL for both of them, so
>> as one can check if the slot exists or not with an API designed this
>> way.
> 
> You're right, but I'm afraid we would have to check the server version twice 
> in any case different from the basic pg_receivewal one (once in the function 
> itself, and once before calling it if we want a meaningful result). Maybe we 
> should move the version check outside the GetSlotInformation function to 
> avoid 
> this, and let it fail with a syntax error when the server doesn't support it ?

With the approach taken by the patch, we silently fall back to the
previous behavior if we connect to a server <= 14, and rely on the new
behavior with a server >= 15, ensuring compatibility.  Why would you
want to make sure that the command is executed when we should just
enforce that the old behavior is what happens when there is a slot
defined and a backend <= 14?
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-25 Thread Ronan Dunklau
Le lundi 25 octobre 2021, 09:40:10 CEST Michael Paquier a écrit :
> On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> > Does it make sense though? The NULL slot_name case handling is pretty
> > straightforward, as it will be handled by string formatting, but in the
> > case of a null restart_lsn, we have no way of knowing if the command was
> > issued at all.
> 
> If I am following your point, I don't think that it matters much here,
> and it seems useful to me to be able to pass NULL for both of them, so
> as one can check if the slot exists or not with an API designed this
> way.

You're right, but I'm afraid we would have to check the server version twice 
in any case different from the basic pg_receivewal one (once in the function 
itself, and once before calling it if we want a meaningful result). Maybe we 
should move the version check outside the GetSlotInformation function to avoid 
this, and let it fail with a syntax error when the server doesn't support it ?

-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-10-25 Thread Michael Paquier
On Mon, Oct 25, 2021 at 09:15:32AM +0200, Ronan Dunklau wrote:
> Does it make sense though? The NULL slot_name case handling is pretty 
> straightforward, as it will be handled by string formatting, but in the case 
> of a null restart_lsn, we have no way of knowing if the command was issued at 
> all.

If I am following your point, I don't think that it matters much here,
and it seems useful to me to be able to pass NULL for both of them, so
as one can check if the slot exists or not with an API designed this
way.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-25 Thread Ronan Dunklau
Le lundi 25 octobre 2021, 00:43:11 CEST Michael Paquier a écrit :
> On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote:
> > Thanks. I've no further comments on the v10 patch.
> 
> Okay, thanks.  I have applied this part, then.
> --
> Michael

Thank you all for your work on this patch. 

-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-10-25 Thread Ronan Dunklau
Le lundi 25 octobre 2021, 08:51:23 CEST Michael Paquier a écrit :
> On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> > Done. I haven't touched the timeline switch test patch for now, but I
> > still
> > include it here for completeness.
> 
> As 0001 and 0002 have been applied, I have put my hands on 0003, and
> found a couple of issues upon review.
> 
> +   Assert(slot_name != NULL);
> +   Assert(restart_lsn != NULL);
> There is no need for those asserts, as we should support the case
> where the caller gives NULL for those variables.

Does it make sense though? The NULL slot_name case handling is pretty 
straightforward, as it will be handled by string formatting, but in the case 
of a null restart_lsn, we have no way of knowing if the command was issued at 
all. 

> 
> +   if (PQserverVersion(conn) < 15)
> +   return false;
> Returning false is incorrect for older server versions as we won't
> fallback to the old method when streaming from older server.  What
> this needs to do is return true and set restart_lsn to
> InvalidXLogRecPtr, so as pg_receivewal would just stream from the
> current flush location.  "false" should just be returned on error,
> with pg_log_error().

Thank you, this was an oversight when moving from the more complicated error 
handling code. 

> 
> +$primary->psql('postgres',
> +   'INSERT INTO test_table VALUES (generate_series(1,100));');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$nextlsn =
> +  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
> +chomp($nextlsn);
> There is no need to switch twice to a new WAL segment as we just need
> to be sure that the WAL segment of the restart_lsn is the one
> archived.  Note that RESERVE_WAL uses the last redo point, so it is
> better to use a checkpoint and reduce the number of logs we stream
> into the new location.
> 
> Better to add some --no-sync to the new commands of pg_receivewal, to
> not stress the I/O more than necessary.  I have added some extra -n
> while on it to avoid loops on failure.
> 
> Attached is the updated patch I am finishing with, which is rather
> clean now.  I have tweaked a couple of things while on it, and
> documented better the new GetSlotInformation() in streamutil.c.
> --
> Michael


-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-10-24 Thread Michael Paquier
On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> Done. I haven't touched the timeline switch test patch for now, but I still 
> include it here for completeness.

As 0001 and 0002 have been applied, I have put my hands on 0003, and
found a couple of issues upon review.

+   Assert(slot_name != NULL);
+   Assert(restart_lsn != NULL);
There is no need for those asserts, as we should support the case
where the caller gives NULL for those variables.

+   if (PQserverVersion(conn) < 15)
+   return false;
Returning false is incorrect for older server versions as we won't
fallback to the old method when streaming from older server.  What
this needs to do is return true and set restart_lsn to
InvalidXLogRecPtr, so as pg_receivewal would just stream from the
current flush location.  "false" should just be returned on error,
with pg_log_error().

+$primary->psql('postgres',
+   'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
There is no need to switch twice to a new WAL segment as we just need
to be sure that the WAL segment of the restart_lsn is the one
archived.  Note that RESERVE_WAL uses the last redo point, so it is
better to use a checkpoint and reduce the number of logs we stream
into the new location.

Better to add some --no-sync to the new commands of pg_receivewal, to
not stress the I/O more than necessary.  I have added some extra -n
while on it to avoid loops on failure.

Attached is the updated patch I am finishing with, which is rather
clean now.  I have tweaked a couple of things while on it, and
documented better the new GetSlotInformation() in streamutil.c.
--
Michael
From 3b9c5336866881361ecaa428ae8e57f5ba9aee90 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 25 Oct 2021 15:34:46 +0900
Subject: [PATCH v11] Allow pg_receivewal to stream from a slot's restart LSN

Prior to this patch, when running pg_receivewal, the streaming start
point would be the current location of the archives if anything is
found, and pg_receivewal would fall back to the current WAL flush
location if there are no archives, as of the result of an
IDENTIFY_SYSTEM command.

If for some reason the WAL files from pg_receivewal were moved, we want
to restart where we left at, which is the replication slot's restart_lsn
instead of skipping right to the current flush location.  So this makes
pg_receivewal use the following sequence of methods to determine the
starting streaming LSN:
- Scan the local archives.
- Use the slot's restart_lsn, if supported.
- Fallback to the current flush LSN.

To keep compatibility with prior server versions, we only attempt to use
READ_REPLICATION_SLOT if the backend version is at least 15, and
fallback to the older behavior of streaming from the current flush
LSN if the command is not supported.

Some new TAP tests are added to cover this feature.
---
 src/bin/pg_basebackup/pg_receivewal.c|  29 +-
 src/bin/pg_basebackup/streamutil.c   | 104 +++
 src/bin/pg_basebackup/streamutil.h   |   1 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  53 +-
 doc/src/sgml/ref/pg_receivewal.sgml  |  11 ++
 5 files changed, 194 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index d5140a79fe..85f2ce53a6 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -404,15 +404,38 @@ StreamLog(void)
 		exit(1);
 
 	/*
-	 * Figure out where to start streaming.
+	 * Figure out where to start streaming.  First scan the local directory.
 	 */
 	stream.startpos = FindStreamingStart(&stream.timeline);
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/*
+		 * Try to get the starting point from the slot if any, the upstream
+		 * server may not support this option.
+		 */
+		if (replication_slot != NULL)
+		{
+			if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
+	&stream.timeline))
+			{
+/* Error is logged by GetSlotInformation() */
+return;
+			}
+		}
+
+		/*
+		 * If the starting point is still not known, use the current WAL
+		 * flush value as last resort.
+		 */
+		if (stream.startpos == InvalidXLogRecPtr)
+		{
+			stream.startpos = serverpos;
+			stream.timeline = servertli;
+		}
 	}
 
+	Assert(stream.startpos != InvalidXLogRecPtr);
+
 	/*
 	 * Always start streaming at the beginning of a segment
 	 */
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index a9bc1ce214..3d3250a3f5 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -479,6 +479,110 @@ RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *starttli,
 	return true;
 }
 
+
+/*
+ * Run REA

Re: pg_receivewal starting position

2021-10-24 Thread Michael Paquier
On Sun, Oct 24, 2021 at 07:20:57PM +0530, Bharath Rupireddy wrote:
> Thanks. I've no further comments on the v10 patch.

Okay, thanks.  I have applied this part, then.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-24 Thread Bharath Rupireddy
On Sun, Oct 24, 2021 at 12:21 PM Michael Paquier  wrote:
>
> On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote:
> > pg_get_replication_slots holds the ReplicationSlotControlLock until
> > the end of the function so it can be assured that *slot contents will
> > not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
> > released immediately after taking *slot pointer into slot_contents.
> > Isn't it better if we hold the lock until the end of the function so
> > that we can avoid the slot contents becoming stale problems?
>
> The reason is different in the case of pg_get_replication_slots().  We
> have to hold ReplicationSlotControlLock for the whole duration of the
> shared memory scan to return back to the user a consistent set of
> information to the user, for all the slots.

Thanks. I've no further comments on the v10 patch.

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-10-23 Thread Michael Paquier
On Sun, Oct 24, 2021 at 09:08:01AM +0530, Bharath Rupireddy wrote:
> pg_get_replication_slots holds the ReplicationSlotControlLock until
> the end of the function so it can be assured that *slot contents will
> not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
> released immediately after taking *slot pointer into slot_contents.
> Isn't it better if we hold the lock until the end of the function so
> that we can avoid the slot contents becoming stale problems?

The reason is different in the case of pg_get_replication_slots().  We
have to hold ReplicationSlotControlLock for the whole duration of the
shared memory scan to return back to the user a consistent set of
information to the user, for all the slots.
--
Michael




Re: pg_receivewal starting position

2021-10-23 Thread Bharath Rupireddy
On Sun, Oct 24, 2021 at 4:40 AM Michael Paquier  wrote:
> On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:
> > 2) As I previously mentioned, we are not copying the slot contents
> > while holding the spinlock, it's just we are taking the memory address
> > and releasing the lock, so there is a chance that the memory we are
> > looking at can become unavailable or stale while we access
> > slot_contents. So, I suggest we do the memcpy of the *slot to
> > slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
> > contents will be costlier, so let's just take the info that we need
> > (data.database, data.restart_lsn) into local variables while we hold
> > the spin lock
>
> The style of the patch is consistent with what we do in other areas
> (see pg_get_replication_slots as one example).
>
> > + /* Copy slot contents while holding spinlock */
> > + SpinLockAcquire(&slot->mutex);
> > + slot_contents = *slot;
>
> And what this does is to copy the contents of the slot into a local
> area (note that we use a NameData pointing to an area with
> NAMEDATALEN).  Aka if the contents of *slot are changed by whatever
> reason (this cannot change as of the LWLock acquired), what we have
> saved is unchanged as of this command's context.

pg_get_replication_slots holds the ReplicationSlotControlLock until
the end of the function so it can be assured that *slot contents will
not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
released immediately after taking *slot pointer into slot_contents.
Isn't it better if we hold the lock until the end of the function so
that we can avoid the slot contents becoming stale problems?

Having said that, the ReplicationSlotCreate,
SearchNamedReplicationSlot (when need_lock is true) etc. release the
lock immediately. I'm not sure if I should ignore this staleness
problem in ReadReplicationSlot.

> > 3) The data that the new command returns to the client can actually
> > become stale while it is captured and in transit to the client as we
> > release the spinlock and other backends can drop or alter the info.
> > So, it's better we talk about this in the documentation of the new
> > command and also in the comments saying "clients will have to deal
> > with it."
>
> The same can be said with IDENTIFY_SYSTEM when the flushed location
> becomes irrelevant.  I am not sure that this has any need to apply
> here.  We could add that this is useful to get a streaming start
> position though.

I think that's okay; let's not make any changes or add any comments
regarding the above. The client is basically bound to get a snapshot
of the data as of the time it issues the request.

> > 4) How about we be more descriptive about the error added? This will
> > help identify for which replication slot the command has failed from
> > tons of server logs which really help in debugging and analysis.
> > I suggest we have this:
> > errmsg("cannot use \"%s\" command with logical replication slot
> > \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
> > instead of just a plain, non-informative, generic message:
> > errmsg("cannot use \"%s\" with logical replication slots",
>
> Yeah.  I don't mind adding the slot name in this string.

Thanks.

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-10-23 Thread Michael Paquier
On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:
> 1) It's better to initialize nulls with false, we can avoid setting
> them to true. The instances where the columns are not nulls is going
> to be more than the columns with null values, so we could avoid some
> of nulls[i] = false; instructions.

I don't think that this is an improvement in this case as in the
default case we'd return a tuple full of NULL values if the slot does
not exist, so the existing code is simpler when we don't look at the
slot contents.

> 2) As I previously mentioned, we are not copying the slot contents
> while holding the spinlock, it's just we are taking the memory address
> and releasing the lock, so there is a chance that the memory we are
> looking at can become unavailable or stale while we access
> slot_contents. So, I suggest we do the memcpy of the *slot to
> slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
> contents will be costlier, so let's just take the info that we need
> (data.database, data.restart_lsn) into local variables while we hold
> the spin lock

The style of the patch is consistent with what we do in other areas
(see pg_get_replication_slots as one example).

> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(&slot->mutex);
> + slot_contents = *slot;

And what this does is to copy the contents of the slot into a local
area (note that we use a NameData pointing to an area with
NAMEDATALEN).  Aka if the contents of *slot are changed by whatever
reason (this cannot change as of the LWLock acquired), what we have
saved is unchanged as of this command's context.

> 3) The data that the new command returns to the client can actually
> become stale while it is captured and in transit to the client as we
> release the spinlock and other backends can drop or alter the info.
> So, it's better we talk about this in the documentation of the new
> command and also in the comments saying "clients will have to deal
> with it."

The same can be said with IDENTIFY_SYSTEM when the flushed location
becomes irrelevant.  I am not sure that this has any need to apply
here.  We could add that this is useful to get a streaming start
position though.

> 4) How about we be more descriptive about the error added? This will
> help identify for which replication slot the command has failed from
> tons of server logs which really help in debugging and analysis.
> I suggest we have this:
> errmsg("cannot use \"%s\" command with logical replication slot
> \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
> instead of just a plain, non-informative, generic message:
> errmsg("cannot use \"%s\" with logical replication slots",

Yeah.  I don't mind adding the slot name in this string.
--
Michael




Re: pg_receivewal starting position

2021-10-23 Thread Bharath Rupireddy
On Sat, Oct 23, 2021 at 1:14 PM Michael Paquier  wrote:
>
> On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> > Done. I haven't touched the timeline switch test patch for now, but I still
> > include it here for completeness.
>
> Thanks.  I have applied and back-patched 0001, then looked again at
> 0002 that adds READ_REPLICATION_SLOT:
> - Change the TLI to use int8 rather than int4, so as we will always be
> right with TimelineID which is unsigned (this was discussed upthread
> but I got back on it after more thoughts, to avoid any future
> issues).
> - Added an extra initialization for the set of Datum values, just as
> an extra safety net.
> - There was a bug with the timeline returned when executing the
> command while in recovery as ThisTimeLineID is 0 in the context of a
> standby, but we need to support the case of physical slots even when
> streaming archives from a standby.  The fix is similar to what we do
> for IDENTIFY_SYSTEM, where we need to use the timeline currently
> replayed from GetXLogReplayRecPtr(), before looking at the past
> timeline history using restart_lsn and the replayed TLI.
>
> With that in place, I think that we are good now for this part.

Thanks for the updated patch. I have following comments on v10:

1) It's better to initialize nulls with false, we can avoid setting
them to true. The instances where the columns are not nulls is going
to be more than the columns with null values, so we could avoid some
of nulls[i] = false; instructions.
+ MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
I suggest we do the following. The number of instances of hitting the
"else" parts will be less.
MemSet(nulls, false, READ_REPLICATION_SLOT_COLS * sizeof(bool));

if (slot == NULL || !slot->in_use)
{
MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
LWLockRelease(ReplicationSlotControlLock);
}

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

2) As I previously mentioned, we are not copying the slot contents
while holding the spinlock, it's just we are taking the memory address
and releasing the lock, so there is a chance that the memory we are
looking at can become unavailable or stale while we access
slot_contents. So, I suggest we do the memcpy of the *slot to
slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
contents will be costlier, so let's just take the info that we need
(data.database, data.restart_lsn) into local variables while we hold
the spin lock
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

The code will look like following:
Oiddatabase;
XLogRecPtr restart_lsn;

/* Take required information from slot contents while holding
spinlock */
SpinLockAcquire(&slot->mutex);
database= slot->data.database;
restart_lsn= slot->data.restart_lsn;
SpinLockRelease(&slot->mutex);
LWLockRelease(ReplicationSlotControlLock);

3) The data that the new command returns to the client can actually
become stale while it is captured and in transit to the client as we
release the spinlock and other backends can drop or alter the info.
So, it's better we talk about this in the documentation of the new
command and also in the comments saying "clients will have to deal
with it."

4) How about we be more descriptive about the error added? This will
help identify for which replication slot the command has failed from
tons of server logs which really help in debugging and analysis.
I suggest we have this:
errmsg("cannot use \"%s\" command with logical replication slot
\"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
instead of just a plain, non-informative, generic message:
errmsg("cannot use \"%s\" with logical replication slots",

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-10-23 Thread Michael Paquier
On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> Done. I haven't touched the timeline switch test patch for now, but I still 
> include it here for completeness.

Thanks.  I have applied and back-patched 0001, then looked again at
0002 that adds READ_REPLICATION_SLOT:
- Change the TLI to use int8 rather than int4, so as we will always be
right with TimelineID which is unsigned (this was discussed upthread
but I got back on it after more thoughts, to avoid any future
issues).
- Added an extra initialization for the set of Datum values, just as
an extra safety net.
- There was a bug with the timeline returned when executing the
command while in recovery as ThisTimeLineID is 0 in the context of a
standby, but we need to support the case of physical slots even when
streaming archives from a standby.  The fix is similar to what we do
for IDENTIFY_SYSTEM, where we need to use the timeline currently
replayed from GetXLogReplayRecPtr(), before looking at the past
timeline history using restart_lsn and the replayed TLI.

With that in place, I think that we are good now for this part.
--
Michael
From 7e20a294d18c280b8031ee6cc862ba6a661cf40f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 23 Oct 2021 16:28:10 +0900
Subject: [PATCH v10] Add replication command READ_REPLICATION_SLOT

The command is supported for physical slots for now, and returns the
type of slot, its restart_lsn and its restart_tli.

This will be useful for an upcoming patch related to pg_receivewal, to
allow the tool to be able to stream from the position of a slot, rather
than the last WAL position flushed by the backend (as reported by
IDENTIFY_SYSTEM), if the archive directory is found as empty, which
would be an advantage in the case of switching to a different archive
location with the same slot used to avoid holes in what gets backed up.

Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
---
 src/include/nodes/nodes.h   |   1 +
 src/include/nodes/replnodes.h   |  11 ++
 src/backend/replication/repl_gram.y |  16 ++-
 src/backend/replication/repl_scanner.l  |   1 +
 src/backend/replication/walsender.c | 105 
 src/test/recovery/t/001_stream_rep.pl   |  32 +-
 src/test/recovery/t/006_logical_decoding.pl |  11 +-
 doc/src/sgml/protocol.sgml  |  48 +
 src/tools/pgindent/typedefs.list|   1 +
 9 files changed, 223 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..541e9861ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -496,6 +496,7 @@ typedef enum NodeTag
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
+	T_ReadReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..a746fafc12 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -87,6 +87,17 @@ typedef struct StartReplicationCmd
 } StartReplicationCmd;
 
 
+/* --
+ *		READ_REPLICATION_SLOT command
+ * --
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
+
 /* --
  *		TIMELINE_HISTORY command
  * --
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..dcb1108579 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type 	command
 %type 	base_backup start_replication start_logical_replication
 create_replication_slot drop_replication_slot identify_system
-timeline_history show sql_cmd
+read_replication_slot timeline_history show sql_cmd
 %type 	base_backup_legacy_opt_list generic_option_list
 %type 	base_backup_legacy_opt generic_option
 %type 	opt_timeline
@@ -125,6 +126,7 @@ command:
 			| start_logical_replication
 			| create_replication_slot
 			| drop_replication_slot
+			| read_replication_slot
 			| timeline_history
 			| show
 			| sql_cmd
@@ -140,6 +142,18 @@ identify_system:
 }
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+{
+	ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+	n->slotname = $2;
+	$$ = (Node *) n;
+}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 1006

Re: pg_receivewal starting position

2021-10-21 Thread Ronan Dunklau
On Thursday, October 21, 2021 at 09:21:44 CEST, Michael Paquier wrote:
> On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote:
> > Ok, do you want me to propose a different patch for previous versions ?
> 
> That's not necessary.  Thanks for proposing.
> 
> > Do you mean restart_lsn as the pointer argument to the function, or
> > restart_lsn as the field returned by the command ? If it's the first, I'll
> > change it but if it's the latter it is expected that we sometime run this
> > on a slot where WAL has never been reserved yet.
> 
> restart_lsn as the pointer of the function.

Done. I haven't touched the timeline switch test patch for now, but I still 
include it here for completeness.





-- 
Ronan Dunklau
>From b3703868d230f773378742a0deb4360f6cc7343a Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 21 Oct 2021 13:21:43 +0900
Subject: [PATCH v10 1/4] doc: Describe calculation method of streaming start
 for pg_receivewal

The documentation was imprecise about the fact that the current WAL
flush location is used if nothing can be found in the local archive
directory, independently of the compression used by each
segment (ZLIB or uncompressed).
---
 doc/src/sgml/ref/pg_receivewal.sgml | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..9adbddb657 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -75,6 +75,29 @@ PostgreSQL documentation
one session available for the stream.
   
 
+  
+   The starting point of the write-ahead log streaming is calculated when
+   pg_receivewal starts:
+   
+
+ 
+  First, scan the directory where the WAL segment files are written and
+  find the newest completed segment, using as starting point the beginning
+  of the next WAL segment file. This is calculated independently of the
+  compression method used to compress each segment.
+ 
+
+
+
+ 
+  If a starting point cannot be calculated with the previous method,
+  the latest WAL flush location is used as reported by the server from
+  an IDENTIFY_SYSTEM command.
+ 
+
+   
+  
+
   
If the connection is lost, or if it cannot be initially established,
with a non-fatal error, pg_receivewal will
-- 
2.33.0

>From 067353a88803e96b295f425167c1195ccf984352 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 21 Oct 2021 14:23:41 +0900
Subject: [PATCH v10 2/4] Add READ_REPLICATION_SLOT

---
 doc/src/sgml/protocol.sgml  | 48 +++
 src/backend/replication/repl_gram.y | 16 +++-
 src/backend/replication/repl_scanner.l  |  1 +
 src/backend/replication/walsender.c | 93 +
 src/include/nodes/nodes.h   |  1 +
 src/include/nodes/replnodes.h   | 11 +++
 src/test/recovery/t/001_stream_rep.pl   | 32 ++-
 src/test/recovery/t/006_logical_decoding.pl | 11 ++-
 src/tools/pgindent/typedefs.list|  1 +
 9 files changed, 211 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..37c26ec8ae 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,54 @@ The commands accepted in replication mode are:
 
   
 
+  
+READ_REPLICATION_SLOT slot_name
+  READ_REPLICATION_SLOT
+
+
+ 
+  Read some information associated with a replication slot. Returns a tuple
+  with NULL values if the replication slot does not
+  exist. This command is currently only supported for physical replication
+  slots.
+ 
+ 
+  In response to this command, the server will return a one-row result set,
+  containing the following fields:
+  
+   
+slot_type (text)
+
+ 
+  The replication slot's type, either physical or
+  NULL.
+ 
+
+   
+
+   
+restart_lsn (text)
+
+ 
+  The replication slot's restart_lsn.
+ 
+
+   
+
+   
+restart_tli (int4)
+
+ 
+  The timeline ID associated with restart_lsn,
+  following the current timeline history.
+ 
+
+   
+  
+ 
+
+  
+
   
 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
  START_REPLICATION
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..dcb1108579 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type 	command
 %type 	base_bac

Re: pg_receivewal starting position

2021-10-21 Thread Michael Paquier
On Thu, Oct 21, 2021 at 08:29:54AM +0200, Ronan Dunklau wrote:
> Ok, do you want me to propose a different patch for previous versions ?

That's not necessary.  Thanks for proposing.

> Do you mean restart_lsn as the pointer argument to the function, or 
> restart_lsn as the field returned by the command ? If it's the first, I'll 
> change it but if it's the latter it is expected that we sometime run this on 
> a 
> slot where WAL has never been reserved yet.

restart_lsn as the pointer of the function.
--
Michael




Re: pg_receivewal starting position

2021-10-20 Thread Ronan Dunklau
On Thursday, October 21, 2021 at 07:35:08 CEST, Michael Paquier wrote:
> On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote:
> > After sending the previous patch suite, I figured it would be worthwhile
> > to
> > also have tests covering timeline switches, which was not covered before.
> 
> That seems independent to me.  I'll take a look.
> 
> > So please find attached a new version with an additional patch for those
> > tests, covering both  "resume from last know archive" and "resume from
> > the replication slots position" cases.
> 
> So, taking things in order, I have looked at 0003 and 0001, and
> attached are refined versions for both of them.
> 
> 0003 is an existing hole in the docs, which I think we had better
> address first and backpatch, taking into account that the starting
> point calculation considers compressed segments when looking for
> completed segments.

Ok, do you want me to propose a different patch for previous versions?

> 
> Regarding 0001, I have found the last test to check for NULL values
> returned by READ_REPLICATION_SLOT after dropping the slot overlaps
> with the first test, so I have removed that.  I have expanded a bit
> the use of like(), and there were some confusion with
> PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT
> and CREATE_REPLICATION_SLOT, and no need for return values in the
> CREATE case either).  Some comments, docs and code have been slightly
> tweaked.

Thank you for this. 


> 
> Here are some comments about 0002.
> 
> +   /* The commpand should always return precisely one tuple */
> s/commpand/command/
> 
> +   pg_log_error("could not fetch replication slot: got %d rows and %d
> fields, expected %d rows and %d or more fields", +   
> PQntuples(res), PQnfields(res), 1, 3);
> Should this be "could not read" instead?
> 
> +   if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
> +   {
> +   pg_log_error("could not parse slot's restart_lsn \"%s\"",
> +PQgetvalue(res, 0, 1));
> +   PQclear(res);
> +   return false;
> +   }
> Wouldn't it be saner to initialize *restart_lsn and *restart_tli to
> some default values at the top of GetSlotInformation() instead, if
> they are specified by the caller?  

Ok.

> And I think that we should still
> complain even if restart_lsn is NULL.

Do you mean restart_lsn as the pointer argument to the function, or
restart_lsn as the field returned by the command? If it's the first, I'll
change it, but if it's the latter, it is expected that we sometimes run this on a
slot where WAL has never been reserved yet.

> 
> On a quick read of 0004, I find the split of the logic with
> change_timeline() a bit hard to understand.  It looks like we should
> be able to make a cleaner split, but I am not sure how that would
> look, though.

Thanks. At least if the proposal to test this seems sensible, I can move
forward. I wanted to avoid a lot of code duplication since the test
setup is a bit more complicated.
My first approach was to split it into two functions, setup_standby and
change_timeline, but then I realized that what happens between the two
invocations would basically be the same for the two test cases, so I ended up
with that patch. I'll try to see if I can find a better way of organizing that
code.

-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-10-20 Thread Michael Paquier
On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote:
> After sending the previous patch suite, I figured it would be worthwhile to 
> also have tests covering timeline switches, which was not covered before.

That seems independent to me.  I'll take a look.

> So please find attached a new version with an additional patch for those 
> tests, 
> covering both  "resume from last know archive" and "resume from the 
> replication slots position" cases.

So, taking things in order, I have looked at 0003 and 0001, and
attached are refined versions for both of them.

0003 is an existing hole in the docs, which I think we had better
address first and backpatch, taking into account that the starting
point calculation considers compressed segments when looking for
completed segments.

Regarding 0001, I have found the last test to check for NULL values
returned by READ_REPLICATION_SLOT after dropping the slot overlaps
with the first test, so I have removed that.  I have expanded a bit
the use of like(), and there were some confusion with
PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT
and CREATE_REPLICATION_SLOT, and no need for return values in the 
CREATE case either).  Some comments, docs and code have been slightly
tweaked.

Here are some comments about 0002.

+   /* The commpand should always return precisely one tuple */
s/commpand/command/

+   pg_log_error("could not fetch replication slot: got %d rows and %d 
fields, expected %d rows and %d or more fields",
+PQntuples(res), PQnfields(res), 1, 3);
Should this be "could not read" instead?

+   if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2)
+   {
+   pg_log_error("could not parse slot's restart_lsn \"%s\"",
+PQgetvalue(res, 0, 1));
+   PQclear(res);
+   return false;
+   }
Wouldn't it be saner to initialize *restart_lsn and *restart_tli to
some default values at the top of GetSlotInformation() instead, if
they are specified by the caller?  And I think that we should still
complain even if restart_lsn is NULL.

On a quick read of 0004, I find the split of the logic with
change_timeline() a bit hard to understand.  It looks like we should
be able to make a cleaner split, but I am not sure how that would
look, though.
--
Michael
From afca5e895e07279049f272396e40bdb78ae61d0e Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 21 Oct 2021 13:21:43 +0900
Subject: [PATCH v9 1/2] doc: Describe calculation method of streaming start
 for pg_receivewal

The documentation was imprecise about the fact that the current WAL
flush location is used if nothing can be found in the local archive
directory, independently of the compression used by each
segment (ZLIB or uncompressed).
---
 doc/src/sgml/ref/pg_receivewal.sgml | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..9adbddb657 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -75,6 +75,29 @@ PostgreSQL documentation
one session available for the stream.
   
 
+  
+   The starting point of the write-ahead log streaming is calculated when
+   pg_receivewal starts:
+   
+
+ 
+  First, scan the directory where the WAL segment files are written and
+  find the newest completed segment, using as starting point the beginning
+  of the next WAL segment file. This is calculated independently of the
+  compression method used to compress each segment.
+ 
+
+
+
+ 
+  If a starting point cannot be calculated with the previous method,
+  the latest WAL flush location is used as reported by the server from
+  an IDENTIFY_SYSTEM command.
+ 
+
+   
+  
+
   
If the connection is lost, or if it cannot be initially established,
with a non-fatal error, pg_receivewal will
-- 
2.33.0

From 3ef08cc14e4365eaec320294275bee9f19c7dbef Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 21 Oct 2021 14:23:41 +0900
Subject: [PATCH v9 2/2] Add READ_REPLICATION_SLOT

---
 src/include/nodes/nodes.h   |  1 +
 src/include/nodes/replnodes.h   | 11 +++
 src/backend/replication/repl_gram.y | 16 +++-
 src/backend/replication/repl_scanner.l  |  1 +
 src/backend/replication/walsender.c | 93 +
 src/test/recovery/t/001_stream_rep.pl   | 32 ++-
 src/test/recovery/t/006_logical_decoding.pl | 11 ++-
 doc/src/sgml/protocol.sgml  | 48 +++
 src/tools/pgindent/typedefs.list|  1 +
 9 files changed, 211 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..541e9861ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -496,6 +496,7 @@ typedef enum NodeTag
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotC

Re: pg_receivewal starting position

2021-10-20 Thread Ronan Dunklau
On Wednesday, October 20, 2021 at 11:40:18 CEST, you wrote:
> > +# Setup the slot, and connect to it a first time
> > +$primary->run_log(
> > +   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
> > +   'creating a replication slot');
> > +$primary->psql('postgres',
> > +   'INSERT INTO test_table VALUES (generate_series(1,100));');
> > +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> > +$nextlsn =
> > +  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
> > +chomp($nextlsn);
> > Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
> > here, rather than going through pg_receivewal?  It seems to me that
> > this would be cheaper without really impacting the coverage.
> 
> You're right, we can skip two invocations of pg_receivewal like this (for
> the slot creation + for starting the slot a first time).

After sending the previous patch suite, I figured it would be worthwhile to 
also have tests covering timeline switches, which was not covered before.
So please find attached a new version with an additional patch for those tests, 
covering both "resume from last known archive" and "resume from the 
replication slots position" cases.

-- 
Ronan Dunklau

From 5a47f17a17594cc171f744ce383ba820d44b6446 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v8 1/4] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml  | 47 +++
 src/backend/replication/repl_gram.y | 16 +++-
 src/backend/replication/repl_scanner.l  |  1 +
 src/backend/replication/walsender.c | 89 +
 src/include/nodes/nodes.h   |  1 +
 src/include/nodes/replnodes.h   | 10 +++
 src/test/recovery/t/001_stream_rep.pl   | 47 ++-
 src/test/recovery/t/006_logical_decoding.pl |  9 ++-
 src/tools/pgindent/typedefs.list|  1 +
 9 files changed, 218 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..51a15cc3da 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,53 @@ The commands accepted in replication mode are:
 
   
 
+  
+READ_REPLICATION_SLOT slot_name
+  READ_REPLICATION_SLOT
+
+
+ 
+  Read the information of a replication slot. Returns a tuple with
+  NULL values if the replication slot does not
+  exist. This command is currently only supported for physical slots.
+ 
+ 
+  In response to this command, the server will return a one-row result set,
+  containing the following fields:
+  
+   
+slot_type (text)
+
+ 
+  The replication slot's type, either physical or
+  NULL.
+ 
+
+   
+
+   
+restart_lsn (text)
+
+ 
+  The replication slot's restart_lsn.
+ 
+
+   
+
+   
+restart_tli (int4)
+
+ 
+  The timeline ID for the restart_lsn position,
+  when following the current timeline history.
+ 
+
+   
+  
+ 
+
+  
+
   
 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
  START_REPLICATION
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..913a99da5a 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| read_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 }
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+{
+	ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+	n->slotname = $2;
+	$$ = (Node *) n;
+}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 

Re: pg_receivewal starting position

2021-10-20 Thread Ronan Dunklau
On Wednesday, 20 October 2021 at 07:13:15 CEST, Michael Paquier wrote:
> On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote:
> > Following recommendations, I stripped most of the features from the patch.
> > For now we support only physical replication slots, and only provide the
> > two fields of interest (restart_lsn, restart_tli) in addition to the slot
> > type (fixed at physical) to not paint ourselves in a corner.
> > 
> > I also removed the part about pg_basebackup since other fixes have been
> > proposed for that case.
> 
> Patch 0001 looks rather clean.  I have a couple of comments.

Thank you for the quick review!


> 
> +   if (OidIsValid(slot_contents.data.database))
> +   elog(ERROR, "READ_REPLICATION_SLOT is only supported for
> physical slots");
> 
> elog() can only be used for internal errors.  Errors that can be
> triggered by a user should use ereport() instead.

Ok.
> 
> +ok($stdout eq '||',
> +   "READ_REPLICATION_SLOT returns NULL values if slot does not exist");
> [...]
> +ok($stdout =~ 'physical\|[^|]*\|1',
> +   "READ_REPLICATION_SLOT returns tuple corresponding to the slot");
> Isn't result pattern matching something we usually test with like()?

Ok.
> 
> +($ret, $stdout, $stderr) = $node_primary->psql(
> +   'postgres',
> +   "READ_REPLICATION_SLOT $slotname;",
> +   extra_params => [ '-d', $connstr_rep ]);
> No need for extra_params in this test.  You can just pass down
> "replication => 1" instead, no?

In that test file, every replication connection is obtained by using 
connstr_rep so I thought it would be best to use the same thing.

> 
> --- a/src/test/recovery/t/006_logical_decoding.pl
> +++ b/src/test/recovery/t/006_logical_decoding.pl
> [...]
> +ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots',
> +   'Logical replication slot is not supported');
> This one should use like().

Ok.

> 
> +
> +The slot's restart_lsn can also be used as a starting
> +point if the target directory is empty.
> +
> I am not sure that there is a need for this addition as the same thing
> is said when describing the lookup ordering.

Ok, removed.

> 
> +  If nothing is found and a slot is specified, use the
> +  READ_REPLICATION_SLOT
> +  command.
> It may be clearer to say that the position is retrieved from the
> command.

Ok, done. The doc also uses the active voice here now.

> 
> +bool
> +GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
> *restart_lsn, TimeLineID* restart_tli)
> +{
> Could you extend that so that we still run the command but don't crash
> if the caller specifies NULL for any of the result fields?  This would
> be handy.

Done.

> 
> +   if (PQgetisnull(res, 0, 0))
> +   {
> +   PQclear(res);
> +   pg_log_error("replication slot \"%s\" does not exist",
> slot_name);
> +   return false;
> +   }
> +   if (PQntuples(res) != 1 || PQnfields(res) < 3)
> +   {
> +   pg_log_error("could not fetch replication slot: got %d rows
> and %d fields, expected %d rows and %d or more fields",
> +PQntuples(res), PQnfields(res), 1, 3);
> +   PQclear(res);
> +   return false;
> +   }
> Wouldn't it be better to reverse the order of these two checks?

Yes it is, and the PQntuples condition should be removed from the first error 
test.

> 
> I don't mind the addition of the slot type being part of the result of
> READ_REPLICATION_SLOT even if it is not mandatory (?), but at least
> GetSlotInformation() should check after it.

Ok.

> 
> +# Setup the slot, and connect to it a first time
> +$primary->run_log(
> +   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
> +   'creating a replication slot');
> +$primary->psql('postgres',
> +   'INSERT INTO test_table VALUES (generate_series(1,100));');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$nextlsn =
> +  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
> +chomp($nextlsn);
> Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
> here, rather than going through pg_receivewal?  It seems to me that
> this would be cheaper without really impacting the coverage.

You're right, we can skip two invocations of pg_receivewal like this (for the 
slot creation + for starting the slot a first time).


-- 
Ronan Dunklau

From 845b2fdd33318f0d7528ef0ecbb69e4aecf43c46 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Mon, 6 Sep 2021 11:08:54 +0200
Subject: [PATCH v7 3/3] Add documentation for pg_receivewal

---
 doc/src/sgml/ref/pg_receivewal.sgml | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 45b544cf49..5fb2b61d34 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -37,6 +37,32 @@ PostgreSQL documentation
).
   
 
+  
+When pg_receivewal is launched, it tries to determine the
+starting LS

Re: pg_receivewal starting position

2021-10-19 Thread Michael Paquier
On Tue, Oct 19, 2021 at 05:32:55PM +0200, Ronan Dunklau wrote:
> Following recommendations, I stripped most of the features from the patch. 
> For 
> now we support only physical replication slots, and only provide the two 
> fields 
> of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at 
> physical) to not paint ourselves in a corner.
> 
> I also removed the part about pg_basebackup since other fixes have been 
> proposed for that case. 

Patch 0001 looks rather clean.  I have a couple of comments.

+   if (OidIsValid(slot_contents.data.database))
+   elog(ERROR, "READ_REPLICATION_SLOT is only supported for physical 
slots");

elog() can only be used for internal errors.  Errors that can be
triggered by a user should use ereport() instead.

+ok($stdout eq '||',
+   "READ_REPLICATION_SLOT returns NULL values if slot does not exist");
[...]
+ok($stdout =~ 'physical\|[^|]*\|1',
+   "READ_REPLICATION_SLOT returns tuple corresponding to the slot");
Isn't result pattern matching something we usually test with like()?

+($ret, $stdout, $stderr) = $node_primary->psql(
+   'postgres',
+   "READ_REPLICATION_SLOT $slotname;",
+   extra_params => [ '-d', $connstr_rep ]);
No need for extra_params in this test.  You can just pass down
"replication => 1" instead, no?

--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
[...]
+ok($stderr=~ 'READ_REPLICATION_SLOT is only supported for physical slots',
+   'Logical replication slot is not supported');
This one should use like().

+
+The slot's restart_lsn can also be used as a starting
+point if the target directory is empty.
+
I am not sure that there is a need for this addition as the same thing
is said when describing the lookup ordering.

+  If nothing is found and a slot is specified, use the
+  READ_REPLICATION_SLOT
+  command.
It may be clearer to say that the position is retrieved from the
command.

+bool
+GetSlotInformation(PGconn *conn, const char *slot_name, XLogRecPtr
*restart_lsn, TimeLineID* restart_tli)
+{
Could you extend that so that we still run the command but don't crash
if the caller specifies NULL for any of the result fields?  This would
be handy.

+   if (PQgetisnull(res, 0, 0))
+   {
+   PQclear(res);
+   pg_log_error("replication slot \"%s\" does not exist",
slot_name);
+   return false;
+   }
+   if (PQntuples(res) != 1 || PQnfields(res) < 3)
+   {
+   pg_log_error("could not fetch replication slot: got %d rows
and %d fields, expected %d rows and %d or more fields",
+PQntuples(res), PQnfields(res), 1, 3);
+   PQclear(res);
+   return false;
+   }
Wouldn't it be better to reverse the order of these two checks?
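The point of reversing the checks is to validate the result's shape before reading any individual cell. A minimal illustration of that ordering, as a Python stand-in for the libpq calls quoted above (the `Result` class is a hypothetical mock, not a real API):

```python
# Illustrative only: validate the result's shape (row/field counts) before
# inspecting any cell, so a malformed result is never dereferenced.

class Result:
    def __init__(self, rows):
        self.rows = rows                      # list of tuples; None = SQL NULL

    def ntuples(self):
        return len(self.rows)

    def nfields(self):
        return len(self.rows[0]) if self.rows else 0

    def isnull(self, row, col):
        return self.rows[row][col] is None

def read_slot(res):
    # Shape check first: on an empty result, isnull(0, 0) would fail.
    if res.ntuples() != 1 or res.nfields() < 3:
        return "error: unexpected result shape"
    if res.isnull(0, 0):
        return "error: slot does not exist"
    return res.rows[0]

print(read_slot(Result([(None, None, None)])))  # slot reported as non-existent
print(read_slot(Result([])))                    # caught by the shape check first
```

With the original order, the NULL check would read cell (0, 0) of a result whose shape was never verified; reversing the checks removes that hazard.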

I don't mind the addition of the slot type being part of the result of
READ_REPLICATION_SLOT even if it is not mandatory (?), but at least
GetSlotInformation() should check after it.

+# Setup the slot, and connect to it a first time
+$primary->run_log(
+   [ 'pg_receivewal', '--slot', $slot_name, '--create-slot' ],
+   'creating a replication slot');
+$primary->psql('postgres',
+   'INSERT INTO test_table VALUES (generate_series(1,100));');
+$primary->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+  $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
Wouldn't it be simpler to use CREATE_REPLICATION_SLOT with RESERVE_WAL
here, rather than going through pg_receivewal?  It seems to me that
this would be cheaper without really impacting the coverage.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-19 Thread Ronan Dunklau
Hello,

Following recommendations, I stripped most of the features from the patch. For 
now we support only physical replication slots, and only provide the two fields 
of interest (restart_lsn, restart_tli) in addition to the slot type (fixed at 
physical) to not paint ourselves in a corner.

I also removed the part about pg_basebackup since other fixes have been 
proposed for that case. 

On Friday, 1 October 2021 at 09:05:18 CEST, Michael Paquier wrote:
> On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote:
> > Using READ_REPLICATION_SLOT as the command name is fine, and it could
> > be extended with more fields if necessary, implemented now with only
> > what we think is useful.  Returning errors on cases that are still not
> > supported yet is fine, say for logical slots if we decide to reject
> > the case for now, and restrictions can always be lifted in the
> > future.
> 
> And marked as RwF as this was three weeks ago.  Please feel free to
> register a new entry if this is being worked on.
> --
> Michael


-- 
Ronan Dunklau

From 4147665664164eb597fdcc86637ec9c497c36660 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v6 1/3] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml  | 47 +++
 src/backend/replication/repl_gram.y | 16 +++-
 src/backend/replication/repl_scanner.l  |  1 +
 src/backend/replication/walsender.c | 87 +
 src/include/nodes/nodes.h   |  1 +
 src/include/nodes/replnodes.h   | 10 +++
 src/test/recovery/t/001_stream_rep.pl   | 47 ++-
 src/test/recovery/t/006_logical_decoding.pl |  9 ++-
 src/tools/pgindent/typedefs.list|  1 +
 9 files changed, 216 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b95cc88599..51a15cc3da 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2067,6 +2067,53 @@ The commands accepted in replication mode are:
 
   
 
+  
+READ_REPLICATION_SLOT slot_name
+  READ_REPLICATION_SLOT
+
+
+ 
+  Read the information of a replication slot. Returns a tuple with
+  NULL values if the replication slot does not
+  exist. This command is currently only supported for physical slots.
+ 
+ 
+  In response to this command, the server will return a one-row result set,
+  containing the following fields:
+  
+   
+slot_type (text)
+
+ 
+  The replication slot's type, either physical or
+  NULL.
+ 
+
+   
+
+   
+restart_lsn (text)
+
+ 
+  The replication slot's restart_lsn.
+ 
+
+   
+
+   
+restart_tli (int4)
+
+ 
+  The timeline ID for the restart_lsn position,
+  when following the current timeline history.
+ 
+
+   
+  
+ 
+
+  
+
   
 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
  START_REPLICATION
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..913a99da5a 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	command
 %type <node>	base_backup start_replication start_logical_replication
 				create_replication_slot drop_replication_slot identify_system
-				timeline_history show sql_cmd
+				read_replication_slot timeline_history show sql_cmd
 %type <list>	base_backup_legacy_opt_list generic_option_list
 %type <defelt>	base_backup_legacy_opt generic_option
 %type <uintval>	opt_timeline
@@ -120,6 +121,7 @@ opt_semicolon:	';'
 
 command:
 			identify_system
+			| read_replication_slot
 			| base_backup
 			| start_replication
 			| start_logical_replication
@@ -140,6 +142,18 @@ identify_system:
 }
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+{
+	ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+	n->slotname = $2;
+	$$ = (Node *) n;
+}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -85,6 +85,7 @@ identifier		{ident_start}{ident_cont}*
 BASE_BACKUP			{ return K_BASE_BACKUP; }
 FAST			{ return K_FAST; }
 IDENTIFY_SYSTEM		{ return K_IDENTIFY_SYSTEM; }
+READ_REPLICATION_SLOT	{ return K_READ_REPLICATION_SLOT; }
 SHOW		{ return K_SHOW; }
 LABEL			{ return K_LABEL; }
 NOWAIT	

Re: pg_receivewal starting position

2021-10-01 Thread Michael Paquier
On Mon, Sep 06, 2021 at 04:17:28PM +0900, Michael Paquier wrote:
> Using READ_REPLICATION_SLOT as the command name is fine, and it could
> be extended with more fields if necessary, implemented now with only
> what we think is useful.  Returning errors on cases that are still not
> supported yet is fine, say for logical slots if we decide to reject
> the case for now, and restrictions can always be lifted in the
> future.

And marked as RwF as this was three weeks ago.  Please feel free to
register a new entry if this is being worked on.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-09-06 Thread Michael Paquier
On Mon, Sep 06, 2021 at 12:37:29PM +0530, Bharath Rupireddy wrote:
> -1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What
> happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR"
> or some other? IMO, READ_REPLICATION_SLOT should just return info of all
> slots. The clients know better how to deal with the slot type.
> Although, we don't have a use case for logical slots with the
> READ_REPLICATION_SLOT command, let's not change it.

Using READ_REPLICATION_SLOT as the command name is fine, and it could
be extended with more fields if necessary, implemented now with only
what we think is useful.  Returning errors on cases that are still not
supported yet is fine, say for logical slots if we decide to reject
the case for now, and restrictions can always be lifted in the
future.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-09-06 Thread Ronan Dunklau
On Friday, 3 September 2021 at 17:49:34 CEST, you wrote:
> On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau  wrote:
> > There is still the concern raised by Bharath about being able to select
> > the
> > way to fetch the replication slot information for the user, and what
> > should or should not be included in the documentation. I am in favor of
> > documenting the process of selecting the wal start, and maybe include a
> > --start-lsn option for the user to override it, but that's maybe for
> > another patch.
> 
> Let's hear from others.
> 
> Thanks for the patches. I have some quick comments on the v5 patch-set:
> 
> 0001:
> 1) Do you also want to MemSet values too in ReadReplicationSlot?
> 
> 2) When if clause has single statement we don't generally use "{" and "}"
> + if (slot == NULL || !slot->in_use)
> + {
> + LWLockRelease(ReplicationSlotControlLock);
> + }
> you can just have:
> + if (slot == NULL || !slot->in_use)
> + LWLockRelease(ReplicationSlotControlLock);
> 
> 3) This is still not copying the slot contents, as I said earlier you
> can just take the required info into some local variables instead of
> taking the slot pointer to slot_contents pointer.
> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(&slot->mutex);
> + slot_contents = *slot;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);
> 
> 4) As I said earlier, why can't we have separate tli variables for
> more readability instead of just one slots_position_timeline and
> timeline_history variable? And you are not even resetting those 2
> variables after if
> (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end
> up sending the restart_lsn timelineid for confirmed_flush. So, better
> use two separate variables. In fact you can use block local variables:
> + if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
> + {
> + List *tl_history = NIL;
> + TimeLineID tli;
> + tl_history = readTimeLineHistory(ThisTimeLineID);
> + tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
> +   tl_history);
> + values[i] = Int32GetDatum(tli);
> + nulls[i] = false;
> + }
> similarly for confirmed_flush.
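The timeline lookup being discussed (readTimeLineHistory followed by tliOfPointInHistory) boils down to: walk the timeline history and return the timeline whose range covers the given LSN. A rough sketch of that logic (illustrative Python; the `(tli, switchpoint)` list shape is an invented simplification, not the backend's actual data structure):

```python
def tli_of_point_in_history(lsn, history):
    """history: list of (tli, switchpoint) in ascending timeline order.
    switchpoint is the LSN at which that timeline ended; None marks the
    current, still-open timeline.  Returns the timeline containing lsn."""
    for tli, switchpoint in history:
        # An LSN exactly at a switchpoint belongs to the next timeline.
        if switchpoint is None or lsn < switchpoint:
            return tli
    raise ValueError("LSN beyond known timeline history")

# Timeline 1 ended at 0/3000000, timeline 2 at 0/5000000, timeline 3 is open.
history = [(1, 0x3000000), (2, 0x5000000), (3, None)]
print(tli_of_point_in_history(0x1000000, history))  # timeline 1
print(tli_of_point_in_history(0x4000000, history))  # timeline 2
```

Computing this separately for restart_lsn and confirmed_flush, as suggested, avoids accidentally reusing the timeline of one LSN for the other.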
> 
> 5) I still don't see the need for below test case:
> + "READ_REPLICATION_SLOT does not produce an error with non existent slot");
> when we have
> + "READ_REPLICATION_SLOT does not produce an error with dropped slot");
> Because for a user, dropped or non existent slot is same, it's just
> that for dropped slot we internally don't delete its entry from the
> shared memory.
> 

Thank you for reiterating those concerns. As I said, I haven't touched 
Michael's version of the patch. I was incorporating those changes in my branch 
before he sent this, so I'll probably merge both before sending an updated 
patch.

> 0002:
> 1) Imagine GetSlotInformation always returns
> READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an
> infinite loop there? Instead, why don't you just exit(1); instead of
> return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I
> think you can just do exit(1); no need to retry.
> + case READ_REPLICATION_SLOT_ERROR:
> +
> + /*
> + * Error has been logged by GetSlotInformation, return and
> + * maybe retry
> + */
> + return;

This is the same behaviour we had before: if there is an error with 
pg_receivewal we retry the command. There was no special case for the 
replication slot not existing before, so I don't see why we should change it
now?

Eg:

2021-09-06 09:11:07.774 CEST [95853] ERROR:  replication slot "nonexistent" 
does not exist
2021-09-06 09:11:07.774 CEST [95853] STATEMENT:  START_REPLICATION SLOT 
"nonexistent" 0/100 TIMELINE 1
pg_receivewal: error: could not send replication command "START_REPLICATION": 
ERROR:  replication slot "nonexistent" does not exist
pg_receivewal: disconnected; waiting 5 seconds to try again

Users may rely on it to keep retrying in the background until the slot has 
been created for example.
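The retry behaviour defended here amounts to the loop below (an illustrative sketch; the helper names are invented, and `RECONNECT_SLEEP_TIME` echoes pg_receivewal's 5-second retry interval shown in the log above):

```python
import time

RECONNECT_SLEEP_TIME = 5  # pg_receivewal waits 5 seconds between attempts

def stream_forever(stream_once, sleep_seconds=RECONNECT_SLEEP_TIME,
                   max_attempts=None):
    """Retry stream_once() until it succeeds, sleeping between attempts --
    the behaviour that lets pg_receivewal wait in the background for a
    slot that has not been created yet.  Returns the number of failed
    attempts; sleep_seconds/max_attempts exist only to keep this testable."""
    attempts = 0
    while True:
        if stream_once():
            return attempts
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            return attempts
        time.sleep(sleep_seconds)

# A fake stream_once(): fails twice (e.g. slot missing), then succeeds.
calls = iter([False, False, True])
print(stream_forever(lambda: next(calls), sleep_seconds=0))  # prints 2
```

Exiting with exit(1) instead, as proposed, would drop this wait-for-the-slot use case.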

> 
> 2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
> passed? And why are you having this check after you connect to the
> server and fetch the data?
> + /* If no slotinformation has been passed, we can return immediately */
> + if (slot_info == NULL)
> + {
> + PQclear(res);
> + return READ_REPLICATION_SLOT_OK;
> + }
> Instead you can just have a single assert:
> 
> + Assert(slot_name && slot_info);

At first it was so that we didn't have to fill in all required information if 
we 
don't need to, but it turns out pg_basebackup also has to check for the 
slot's type. I agree we should not support the null slot_info case anymore.

> 
> 3) How about
> pg_log_error("could not read replication slot:
> instead of
> pg_log_error("could not fetch replication slot:

Ok.
> 
> 4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
> + default:
> + if (slot_info.is_logical)
> + {
> + /*
> + * If the slot is not physical we can't expect to
> + * recover from that
> + */
> + p

Re: pg_receivewal starting position

2021-09-06 Thread Bharath Rupireddy
On Mon, Sep 6, 2021 at 12:20 PM Ronan Dunklau  wrote:
>
> On Monday, 6 September 2021 at 06:22:30 CEST, Michael Paquier wrote:
> > On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> > > 0002 for pg_receivewal tried to simplify the logic of information to
> > > return, by using a dedicated struct for this. This accounts for Bharath's
> > > demands to return every possible field.
> > > In particular, an is_logical field simplifies the detection of the type of
> > > slot. In my opinion it makes sense to simplify it like this on the client
> > > side while being more open-minded on the server side if we ever need to
> > > provide a new type of slot. Also, GetSlotInformation now returns an enum
> > > to be able to handle the different modes of failures, which differ
> > > between pg_receivewal and pg_basebackup.
> >
> > +   if (PQserverVersion(conn) < 15)
> > +   return READ_REPLICATION_SLOT_UNSUPPORTED;
> > [...]
> > +typedef enum {
> > +   READ_REPLICATION_SLOT_OK,
> > +   READ_REPLICATION_SLOT_UNSUPPORTED,
> > +   READ_REPLICATION_SLOT_ERROR,
> > +   READ_REPLICATION_SLOT_NONEXISTENT
> > +} ReadReplicationSlotStatus;
> >
> > Do you really need this much complication?  We could treat the
> > unsupported case and the non-existing case the same way: we don't know
> > so just assign InvalidXlogRecPtr or NULL to the fields of the
> > structure, and make GetSlotInformation() return false just on error,
> > with some pg_log_error() where adapted in its internals.
>
> I actually started with the implementation you propose, but changed my mind
> while writing it because I realised it's easier to reason about like this,
> instead of failing silently during READ_REPLICATION_SLOT to fail a bit later
> when actually trying to start the replication slot because it doesn't exist.
> Either the user expects the replication slot to exist, and in this case we
> should retry the whole loop in the hope of getting an interesting LSN, or the
> user doesn't and shouldn't even pass a replication_slot name to begin with.

I don't think we need to retry the whole loop if the
READ_REPLICATION_SLOT command fails as pg_receivewal might enter an
infinite loop there. IMO, we should just exit(1) if
READ_REPLICATION_SLOT fails.

> > > There is still the concern raised by Bharath about being able to select
> > > the
> > > way to fetch the replication slot information for the user, and what
> > > should or should not be included in the documentation. I am in favor of
> > > documenting the process of selecting the wal start, and maybe include a
> > > --start-lsn option for the user to override it, but that's maybe for
> > > another patch.
> >
> > The behavior of pg_receivewal that you are implementing should be
> > documented.  We don't say either how the start location is selected
> > when working on an existing directory, so I would recommend to add a
> > paragraph in the description section to detail all that, as of:
> > - First a scan of the existing archives is done.
> > - If nothing is found, and if there is a slot, request the slot
> > information.
> > - If still nothing (aka slot undefined, or command not supported), use
> > the last flush location.
>
> Sounds good, I will add another patch for the documentation of this.

+1.
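The three-step lookup described in the list above can be sketched as follows (illustrative Python; the function and parameter names are invented for the sketch — pg_receivewal itself implements this in C):

```python
def choose_start_lsn(archive_dir_lsn, slot_restart_lsn, flush_lsn):
    """Return the LSN streaming should start from.

    archive_dir_lsn:  position derived from scanning existing archives
                      (None if the directory is empty)
    slot_restart_lsn: restart_lsn from READ_REPLICATION_SLOT (None if no
                      slot was given, the slot does not exist, or the
                      command is unsupported by the server)
    flush_lsn:        the server's last flush location (always available)
    """
    if archive_dir_lsn is not None:      # 1. prefer what is already archived
        return archive_dir_lsn
    if slot_restart_lsn is not None:     # 2. otherwise ask the slot
        return slot_restart_lsn
    return flush_lsn                     # 3. otherwise, the flush location

# Empty directory, slot available: resume from the slot's restart_lsn.
print(hex(choose_start_lsn(None, 0x1000000, 0x2000000)))
```

Each fallback only kicks in when the previous source yields nothing, which is exactly the ordering the documentation paragraph should spell out.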

> > While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
> > grabbing the data of a logical slot.  We are not going to use that
> > with pg_recvlogical as by default START_STREAMING 0/0 would just use
> > the confirmed LSN.  Do we have use cases where this information would
> > help?  There is the argument of consistency with physical slots and
> > that this can be helpful to do sanity checks for clients, but that's
> > rather thin.
>
> If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT
> and error out on the server if the slot is not actually a physical one to
> spare the client from performing those checks. I still think it's better to
> support both cases as opposed to having two completely different APIs
> (READ_(PHYSICAL)_REPLICATION_SLOT for physical ones on a replication
> connection, pg_replication_slots view for logical ones) as it would enable
> more for third-party clients at a relatively low maintenance cost for us.

-1 for READ_PHYSICAL_REPLICATION_SLOT or failing on the server. What
happens if we have another slot type "PHYSIOLOGICAL" or "FOO" or "BAR"
or some other? IMO, READ_REPLICATION_SLOT should just return info of all
slots. The clients know better how to deal with the slot type.
Although, we don't have a use case for logical slots with the
READ_REPLICATION_SLOT command, let's not change it.

If others are still concerned about the unnecessary slot being
returned, you might consider passing in a parameter to
the READ_REPLICATION_SLOT command, something like below. But this too
looks complex to me. I would vote for what the existing patch does
with READ_REPLICATION_SLOT.
READ_REPLICATION_SLOT 'slot_name' 'physical'; only returns physical
slot
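Whatever final shape the command takes, a client consuming its output receives restart_lsn as text in PostgreSQL's "X/Y" format (two hexadecimal halves of a 64-bit position). A small sketch of the conversion a client such as pg_receivewal performs, written in Python for illustration:

```python
def parse_lsn(text):
    """Parse PostgreSQL's textual LSN format 'X/Y' (two hex halves)
    into a 64-bit integer, as a client consuming the restart_lsn
    column of READ_REPLICATION_SLOT would."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def format_lsn(lsn):
    """Inverse conversion, matching the server's %X/%X formatting."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

print(format_lsn(parse_lsn("1/9000060")))  # round-trips to 1/9000060
```

Once parsed, the position can be compared numerically against LSNs derived from scanning the archive directory.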

Re: pg_receivewal starting position

2021-09-05 Thread Ronan Dunklau
On Monday, 6 September 2021 at 06:22:30 CEST, Michael Paquier wrote:
> On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> > 0002 for pg_receivewal tried to simplify the logic of information to
> > return, by using a dedicated struct for this. This accounts for Bharath's
> > demands to return every possible field.
> > In particular, an is_logical field simplifies the detection of the type of
> > slot. In my opinion it makes sense to simplify it like this on the client
> > side while being more open-minded on the server side if we ever need to
> > provide a new type of slot. Also, GetSlotInformation now returns an enum
> > to be able to handle the different modes of failures, which differ
> > between pg_receivewal and pg_basebackup.
> 
> +   if (PQserverVersion(conn) < 15)
> +   return READ_REPLICATION_SLOT_UNSUPPORTED;
> [...]
> +typedef enum {
> +   READ_REPLICATION_SLOT_OK,
> +   READ_REPLICATION_SLOT_UNSUPPORTED,
> +   READ_REPLICATION_SLOT_ERROR,
> +   READ_REPLICATION_SLOT_NONEXISTENT
> +} ReadReplicationSlotStatus;
> 
> Do you really need this much complication?  We could treat the
> unsupported case and the non-existing case the same way: we don't know
> so just assign InvalidXlogRecPtr or NULL to the fields of the
> structure, and make GetSlotInformation() return false just on error,
> with some pg_log_error() where adapted in its internals.

I actually started with the implementation you propose, but changed my mind 
while writing it because I realised it's easier to reason about like this, 
instead of failing silently during READ_REPLICATION_SLOT to fail a bit later 
when actually trying to start the replication slot because it doesn't exist. 
Either the user expects the replication slot to exist, and in this case we 
should retry the whole loop in the hope of getting an interesting LSN, or the 
user doesn't and shouldn't even pass a replication_slot name to begin with.

> 
> > There is still the concern raised by Bharath about being able to select
> > the
> > way to fetch the replication slot information for the user, and what
> > should or should not be included in the documentation. I am in favor of
> > documenting the process of selecting the wal start, and maybe include a
> > --start-lsn option for the user to override it, but that's maybe for
> > another patch.
> 
> The behavior of pg_receivewal that you are implementing should be
> documented.  We don't say either how the start location is selected
> when working on an existing directory, so I would recommend to add a
> paragraph in the description section to detail all that, as of:
> - First a scan of the existing archives is done.
> - If nothing is found, and if there is a slot, request the slot
> information.
> - If still nothing (aka slot undefined, or command not supported), use
> the last flush location.

Sounds good, I will add another patch for the documentation of this.

> 
> As a whole, I am not really convinced that we need a new option for
> that as long as we rely on a slot with pg_receivewal as these are used
> to make sure that we avoid holes in the WAL archives.
> 
> Regarding pg_basebackup, Daniel has proposed a couple of days ago a
> different solution to trap errors earlier, which would cover the case
> dealt with here:
> https://www.postgresql.org/message-id/0F69E282-97F9-4DB7-8D6D-F927AA6340C8@yesql.se

I will take a look.

> We should not mimic in the frontend errors that are safely trapped
> in the backend with the proper locks, in any case.

I don't understand what you mean by this? My original proposal was for the 
command to actually attach to the replication slot while reading it, thus 
keeping a lock on it to prevent dropping or concurrent access on the server.
> 
> While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
> grabbing the data of a logical slot.  We are not going to use that
> with pg_recvlogical as by default START_STREAMING 0/0 would just use
> the confirmed LSN.  Do we have use cases where this information would
> help?  There is the argument of consistency with physical slots and
> that this can be helpful to do sanity checks for clients, but that's
> rather thin.

If we don't, we should rename the command to READ_PHYSICAL_REPLICATION_SLOT 
and error out on the server if the slot is not actually a physical one, to 
spare the client from performing those checks. I still think it's better to 
support both cases, as opposed to having two completely different APIs 
(READ_(PHYSICAL)_REPLICATION_SLOT for physical slots on a replication 
connection, the pg_replication_slots view for logical ones), as it would enable 
more use cases for third-party clients at a relatively low maintenance cost for us.  

-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-09-05 Thread Michael Paquier
On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> 
> 0002 for pg_receivewal tried to simplify the logic of information to return, 
by using a dedicated struct for this. This accounts for Bharath's demands to 
> return every possible field.
> In particular, an is_logical field simplifies the detection of the type of 
> slot. 
> In my opinion it makes sense to simplify it like this on the client side 
> while 
> being more open-minded on the server side if we ever need to provide a new 
> type of slot. Also, GetSlotInformation now returns an enum to be able to 
> handle the different modes of failures, which differ between pg_receivewal 
> and 
> pg_basebackup. 

+   if (PQserverVersion(conn) < 15)
+   return READ_REPLICATION_SLOT_UNSUPPORTED;
[...]
+typedef enum {
+   READ_REPLICATION_SLOT_OK,
+   READ_REPLICATION_SLOT_UNSUPPORTED,
+   READ_REPLICATION_SLOT_ERROR,
+   READ_REPLICATION_SLOT_NONEXISTENT
+} ReadReplicationSlotStatus;

Do you really need this much complication?  We could treat the
unsupported case and the non-existing case the same way: we don't know
so just assign InvalidXlogRecPtr or NULL to the fields of the
structure, and make GetSlotInformation() return false just on error,
with some pg_log_error() where adapted in its internals.
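A minimal sketch of the simplified contract suggested here: a boolean result, with "unknown" conveyed through invalid field values rather than a dedicated status enum. All names below are illustrative stand-ins under that assumption, not the actual patch code:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Illustrative stand-in for the slot data a client might keep. */
typedef struct SlotInformation
{
	bool		is_logical;
	XLogRecPtr	restart_lsn;
} SlotInformation;

/*
 * Hypothetical GetSlotInformation shape: return false only on a real
 * error; an unsupported server or a missing slot just leaves the
 * fields invalid so the caller falls through to its next start-point
 * source.
 */
static bool
get_slot_information_stub(int server_version, bool slot_exists,
						  XLogRecPtr slot_restart_lsn,
						  SlotInformation *slot_info)
{
	slot_info->is_logical = false;
	slot_info->restart_lsn = InvalidXLogRecPtr;

	/* READ_REPLICATION_SLOT only exists on v15+ servers. */
	if (server_version < 150000 || !slot_exists)
		return true;			/* no data, but not an error */

	slot_info->restart_lsn = slot_restart_lsn;
	return true;
}

/* Helper for exercising the stub. */
static XLogRecPtr
stub_restart_lsn(int server_version, bool slot_exists)
{
	SlotInformation si;

	if (!get_slot_information_stub(server_version, slot_exists,
								   (XLogRecPtr) 42, &si))
		return InvalidXLogRecPtr;
	return si.restart_lsn;
}
```

The caller then only needs one error branch, while both "old server" and "no such slot" naturally degrade to the fallback start position.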

> There is still the concern raised by Bharath about being able to select the 
> way to fetch the replication slot information for the user, and what should 
> or 
> should not be included in the documentation. I am in favor of documenting the 
> process of selecting the wal start, and maybe include a --start-lsn option 
> for 
> the user to override it, but that's maybe for another patch.

The behavior of pg_receivewal that you are implementing should be
documented.  We don't say either how the start location is selected
when working on an existing directory, so I would recommend to add a
paragraph in the description section to detail all that, as of:
- First a scan of the existing archives is done.
- If nothing is found, and if there is a slot, request the slot
information.
- If still nothing (aka slot undefined, or command not supported), use
the last flush location.
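The three-step fallback above can be sketched as a pure selection function. This is an illustration of the ordering only, with stand-in names and InvalidXLogRecPtr assumed to be zero; it is not the actual pg_receivewal implementation:

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Pick a start position following the order described above: existing
 * archives first, then the slot's restart_lsn as reported by
 * READ_REPLICATION_SLOT, then the server's last flush location.
 */
static XLogRecPtr
choose_start_position(XLogRecPtr archive_lsn,
					  XLogRecPtr slot_restart_lsn,
					  XLogRecPtr server_flush_lsn)
{
	if (archive_lsn != InvalidXLogRecPtr)
		return archive_lsn;			/* resume after existing segments */
	if (slot_restart_lsn != InvalidXLogRecPtr)
		return slot_restart_lsn;	/* slot knows what was streamed */
	return server_flush_lsn;		/* nothing else to go on */
}
```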

As a whole, I am not really convinced that we need a new option for
that as long as we rely on a slot with pg_receivewal as these are used
to make sure that we avoid holes in the WAL archives.

Regarding pg_basebackup, Daniel has proposed a couple of days ago a
different solution to trap errors earlier, which would cover the case
dealt with here:
https://www.postgresql.org/message-id/0f69e282-97f9-4db7-8d6d-f927aa634...@yesql.se
We should not mimic in the frontend errors that are safely trapped in
the backend with the proper locks, in any case.

While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
grabbing the data of a logical slot.  We are not going to use that
with pg_recvlogical as by default START_STREAMING 0/0 would just use
the confirmed LSN.  Do we have use cases where this information would
help?  There is the argument of consistency with physical slots and
that this can be helpful to do sanity checks for clients, but that's
rather thin.
--
Michael




Re: pg_receivewal starting position

2021-09-03 Thread Bharath Rupireddy
On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau  wrote:
> There is still the concern raised by Bharath about being able to select the
> way to fetch the replication slot information for the user, and what should or
> should not be included in the documentation. I am in favor of documenting the
> process of selecting the wal start, and maybe include a --start-lsn option for
> the user to override it, but that's maybe for another patch.

Let's hear from others.

Thanks for the patches. I have some quick comments on the v5 patch-set:

0001:
1) Do you also want to MemSet values too in ReadReplicationSlot?

2) When if clause has single statement we don't generally use "{" and "}"
+ if (slot == NULL || !slot->in_use)
+ {
+ LWLockRelease(ReplicationSlotControlLock);
+ }
you can just have:
+ if (slot == NULL || !slot->in_use)
+ LWLockRelease(ReplicationSlotControlLock);

3) This is still not copying the slot contents; as I said earlier, you
can just take the required info into some local variables instead of
copying the whole slot into the slot_contents variable.
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

4) As I said earlier, why can't we have separate tli variables for
more readability, instead of just one slots_position_timeline and
timeline_history variable? And you are not even resetting those 2
variables after if
(!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), so you might end
up sending the restart_lsn's timeline ID for confirmed_flush. So, better
use two separate variables. In fact you can use block local variables:
+ if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+ {
+ List *tl_history = NIL;
+ TimeLineID tli;
+ tl_history = readTimeLineHistory(ThisTimeLineID);
+ tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
+   tl_history);
+ values[i] = Int32GetDatum(tli);
+ nulls[i] = false;
+ }
similarly for confirmed_flush.

5) I still don't see the need for below test case:
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
when we have
+ "READ_REPLICATION_SLOT does not produce an error with dropped slot");
Because for a user, dropped or non existent slot is same, it's just
that for dropped slot we internally don't delete its entry from the
shared memory.

0002:
1) Imagine GetSlotInformation always returns
READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an
infinite loop there? Instead, why don't you just exit(1); instead of
return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I
think, you can just do exit(1), no need to retry.
+ case READ_REPLICATION_SLOT_ERROR:
+
+ /*
+ * Error has been logged by GetSlotInformation, return and
+ * maybe retry
+ */
+ return;

2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
passed? And why are you having this check after you connect to the
server and fetch the data?
+ /* If no slotinformation has been passed, we can return immediately */
+ if (slot_info == NULL)
+ {
+ PQclear(res);
+ return READ_REPLICATION_SLOT_OK;
+ }
Instead you can just have a single assert:

+ Assert(slot_name && slot_info);

3) How about
pg_log_error("could not read replication slot:
instead of
pg_log_error("could not fetch replication slot:

4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
+ default:
+ if (slot_info.is_logical)
+ {
+ /*
+ * If the slot is not physical we can't expect to
+ * recover from that
+ */
+ pg_log_error("cannot use the slot provided, physical slot expected.");
+ exit(1);
+ }
+ stream.startpos = slot_info.restart_lsn;
+ stream.timeline = slot_info.restart_tli;
+ }
You can just have another case statement for READ_REPLICATION_SLOT_OK
and in the default you can throw an error "unknown read replication
slot status" or some other better wording and exit(1);

5) Do you want to initialize slot_info to 0?
+ if (replication_slot)
+ {
+ SlotInformation slot_info;
+ MemSet(slot_info, 0, sizeof(SlotInformation));

6) This isn't readable:
+ slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
How about:
if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0)
   slot_info->is_logical = true;
You don't need to set it false, because you would have
MemSet(slot_info) in the caller.

7) How about
uint32 hi;
uint32 lo;
instead of
+ uint32 hi,
+ lo;

8) Move SlotInformation * slot_info) to the next line as it crosses
the 80 char limit.
+GetSlotInformation(PGconn *conn, const char *slot_name,
SlotInformation * slot_info)

9) Instead of a boolean is_logical, I would rather suggest using an
enum or #define macros for the slot types, because someday we
might introduce new types of slots and somebody might want an
is_physical kind of variable name?
+typedef struct SlotInformation {
+ bool is_logical;
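On item 7, the hi/lo pair is how frontend tools reassemble a textual LSN such as 16/B374D848 into a single 64-bit value. A self-contained sketch of that split, with illustrative names:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Parse an LSN of the form "hi/lo" (both hex) into a 64-bit value.
 * Returns 1 on success, 0 on a malformed string. */
static int
parse_lsn(const char *str, uint64_t *lsn)
{
	uint32_t	hi;
	uint32_t	lo;

	if (sscanf(str, "%" SCNx32 "/%" SCNx32, &hi, &lo) != 2)
		return 0;
	*lsn = ((uint64_t) hi << 32) | lo;
	return 1;
}

/* Convenience wrapper: 0 for anything that does not parse. */
static uint64_t
parse_lsn_or_zero(const char *str)
{
	uint64_t	lsn = 0;

	if (!parse_lsn(str, &lsn))
		return 0;
	return lsn;
}
```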

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-09-03 Thread Ronan Dunklau
On Thursday, September 2, 2021 at 10:37:20 CEST, Michael Paquier wrote:
> On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:
> > I could see the use for sending active_pid for use within pg_basebackup:
> > at
> > least we could fail early if the slot is already in use. But at the same
> > time, maybe it won't be in use anymore once we need it.
> 
> There is no real concurrent protection with this design.  You could
> read that the slot is not active during READ_REPLICATION_SLOT just to
> find out after in the process of pg_basebackup streaming WAL that it
> became in use in-between.  And the backend-side protections would kick
> in at this stage.
> 
> Hmm.  The logic doing the decision-making with pg_receivewal may
> become more tricky when it comes to pg_replication_slots.wal_status,
> max_slot_wal_keep_size and pg_replication_slots.safe_wal_size.  The
> number of cases we'd like to consider impacts directly the amount of
> data send through READ_REPLICATION_SLOT.  That's not really different
> than deciding of a failure, a success or a retry with active_pid at an
> earlier or a later stage of a base backup.  pg_receivewal, on the
> contrary, can just rely on what the backend tells when it begins
> streaming.  So I'd prefer keeping things simple and limit the number
> of fields a minimum for this command.

Ok. Please find attached new patches rebased on master.

0001 is yours without any modification.

0002 for pg_receivewal tried to simplify the logic of information to return, 
by using a dedicated struct for this. This accounts for Bharath's demands to 
return every possible field.
In particular, an is_logical field simplifies the detection of the type of 
slot. 
In my opinion it makes sense to simplify it like this on the client side while 
being more open-minded on the server side if we ever need to provide a new 
type of slot. Also, GetSlotInformation now returns an enum to be able to 
handle the different modes of failures, which differ between pg_receivewal and 
pg_basebackup. 

0003 is the pg_basebackup one, not much changed except for the concerns you 
had about the log message and handling of different failure modes.

There is still the concern raised by Bharath about being able to select the 
way to fetch the replication slot information for the user, and what should or 
should not be included in the documentation. I am in favor of documenting the 
process of selecting the wal start, and maybe include a --start-lsn option for 
the user to override it, but that's maybe for another patch.

-- 
Ronan Dunklau

From 60b8cedb196f5acdd75b489c1d2155c2698084a4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v5 1/3] Add READ_REPLICATION_SLOT command

---
 doc/src/sgml/protocol.sgml  |  66 
 src/backend/replication/repl_gram.y |  16 ++-
 src/backend/replication/repl_scanner.l  |   1 +
 src/backend/replication/walsender.c | 112 
 src/include/nodes/nodes.h   |   1 +
 src/include/nodes/replnodes.h   |  10 ++
 src/test/recovery/t/001_stream_rep.pl   |  47 +++-
 src/test/recovery/t/006_logical_decoding.pl |  15 ++-
 src/tools/pgindent/typedefs.list|   1 +
 9 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..8191f17137 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,72 @@ The commands accepted in replication mode are:
 
   
 
+  
+READ_REPLICATION_SLOT slot_name
+  READ_REPLICATION_SLOT
+
+
+ 
+  Read the information of a replication slot. Returns a tuple with
+  NULL values if the replication slot does not
+  exist.
+ 
+ 
+  In response to this command, the server will return a one-row result set,
+  containing the following fields:
+  
+   
+slot_type (text)
+
+ 
+  The replication slot's type, either physical or
+  logical.
+ 
+
+   
+
+   
+restart_lsn (text)
+
+ 
+  The replication slot's restart_lsn.
+ 
+
+   
+
+   
+confirmed_flush_lsn (text)
+
+ 
+  The replication slot's confirmed_flush_lsn.
+ 
+
+   
+
+   
+restart_tli (int4)
+
+ 
+  The timeline ID for the restart_lsn position,
+  when following the current timeline history.
+ 
+
+   
+
+   
+confirmed_flush_tli (int4)
+
+ 
+  The timeline ID for the confirmed_flush_lsn
+  position, when following the current timeline history.
+ 
+
+   
+  
+ 
+
+  
+
   
 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
  START_REPLICATION
diff --git a/src/backend/re

Re: pg_receivewal starting position

2021-09-02 Thread Michael Paquier
On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:
> I could see the use for sending active_pid for use within pg_basebackup: at 
> least we could fail early if the slot is already in use. But at the same 
> time, 
> maybe it won't be in use anymore once we need it.

There is no real concurrent protection with this design.  You could
read that the slot is not active during READ_REPLICATION_SLOT just to
find out after in the process of pg_basebackup streaming WAL that it
became in use in-between.  And the backend-side protections would kick
in at this stage.

Hmm.  The logic doing the decision-making with pg_receivewal may
become more tricky when it comes to pg_replication_slots.wal_status,
max_slot_wal_keep_size and pg_replication_slots.safe_wal_size.  The
number of cases we'd like to consider impacts directly the amount of
data send through READ_REPLICATION_SLOT.  That's not really different
than deciding of a failure, a success or a retry with active_pid at an
earlier or a later stage of a base backup.  pg_receivewal, on the
contrary, can just rely on what the backend tells when it begins
streaming.  So I'd prefer keeping things simple and limit the number
of fields a minimum for this command.
--
Michael




Re: pg_receivewal starting position

2021-09-02 Thread Ronan Dunklau
On Thursday, September 2, 2021 at 09:28:29 CEST, Michael Paquier wrote:
> On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
> >  wrote in If I read the patch
> > correctly the situation above is warned by the following message then
> > continue to the next step giving up to identify start position from slot
> > data.
> 
> Better to fall back to the past behavior if attempting to use a
> pg_receivewal >= 15 with a PG instance older than 14.
> 
> >> "server does not support fetching the slot's position, resuming from the
> >> current server position instead"> 
> > (The message looks a bit too long, though..)
> 
> Agreed.  Falling back to a warning is not the best answer we can have
> here, as there could be various failure types, and for some of them a
> hard failure is adapted;
> - Failure in the backend while running READ_REPLICATION_SLOT.  This
> should imply a hard failure, no?
> - Slot that does not exist.  In this case, we could fall back to the
> current write position of the server.
> 
> by default if the slot information cannot be retrieved.
> Something that's disturbing me in patch 0002 is that we would ignore
> the results of GetSlotInformation() if any error happens, even if
> there is a problem in the backend, like an OOM.  We should be careful
> about the semantics here.

Ok!

> 
> > However, if the operator doesn't know the server is old, pg_receivewal
> > starts streaming from unexpected position, which is a kind of
> > disaster. So I'm inclined to agree to Bharath, but rather I imagine of
> > an option to explicitly specify how to determine the start position.
> > 
> > --start-source=[server,wal,slot]  specify starting-LSN source, default is
> > 
> >  trying all of them in the order of wal, slot, server.
> > 
> > I don't think the option needs to accept multiple values at once.
> 
> What is the difference between "wal" and "server"?  "wal" stands for
> the start position of the set of files stored in the location
> directory, and "server" is the location that we'd receive from the
> server?  I don't think that we need that because, when using a slot,
> we know that we can rely on the LSN that the slot retains for
> pg_receivewal as that should be the same point as what has been
> streamed last.  Could there be an argument instead for changing the
> default and rely on the slot information rather than scanning the
> local WAL archive path for the start point when using --slot?  When
> using pg_receivewal as a service, relying on a scan of the WAL archive
> directory if there is no slot and fall back to an invalid LSN if there
> is nothing is fine by me, but I think that just relying on the slot
> information is saner as the backend makes sure that nothing is
> missing.  That's also more useful when streaming changes from a single
> slot from multiple locations (stream to location 1 with a slot, stop
> pg_receivewal, stream to location 2 that completes 1 with the same
> slot).

One benefit I see from first trying to get it from the local WAL stream is that 
we may end up in a state where it has been flushed to disk but we couldn't 
advance the replication slot. In that case it is better to resume from the 
point on disk. Maybe taking the max(slot_lsn, local_file_lsn) would work best 
for the use case you're describing.
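The max(slot_lsn, local_file_lsn) idea above reduces to a one-line comparison, assuming an invalid LSN is zero and therefore compares lowest. A sketch with illustrative names, not actual pg_receivewal code:

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Resume from whichever position is further ahead: what the slot has
 * retained, or what is already flushed to the local archive. */
static XLogRecPtr
resume_point(XLogRecPtr slot_lsn, XLogRecPtr local_file_lsn)
{
	return (slot_lsn > local_file_lsn) ? slot_lsn : local_file_lsn;
}
```

This covers the case where segments were flushed locally but the slot could not be advanced before pg_receivewal stopped.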

> 
> +pg_log_error("Slot \"%s\" is not a physical replication slot",
> + replication_slot);
> In 0003, the format of this error is not really project-like.
> Something like that perhaps would be more adapted:
> "cannot use the slot provided, physical slot expected."
> 
> I am not really convinced about the need of getting the active state
> and the PID used in the backend when fetching the slot data,
> particularly if that's just for some frontend-side checks.  The
> backend has safeguards already for all that.

I could see the use for sending active_pid for use within pg_basebackup: at 
least we could fail early if the slot is already in use. But at the same time, 
maybe it won't be in use anymore once we need it.

> 
> While looking at that, I have applied de1d4fe to add
> PostgresNode::command_fails_like(), coming from 0003, and put my hands
> on 0001 as per the attached, as the starting point.  That basically
> comes down to all the points raised upthread, plus some tweaks for
> things I bumped into to get the semantics of the command to what looks
> like the right shape.

Thanks, I was about to send a new patchset with basically the same thing. It 
would be nice to know we work on the same thing concurrently in the future to 
avoid duplicate efforts. I'll rebase and send the updated version for patches 
0002 and 0003 of my original proposal once we reach an agreement over the 
behaviour / options of pg_receivewal.

Also considering the number of different fields to be filled by the 
GetSlotInformation function, my local branch groups them into a dedicate

Re: pg_receivewal starting position

2021-09-02 Thread Michael Paquier
On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy 
>  wrote in 
> If I read the patch correctly the situation above is warned by the
> following message then continue to the next step giving up to identify
> start position from slot data.

Better to fall back to the past behavior if attempting to use a
pg_receivewal >= 15 with a PG instance older than 14.

>> "server does not support fetching the slot's position, resuming from the 
>> current server position instead"
> 
> (The message looks a bit too long, though..)

Agreed.  Falling back to a warning is not the best answer we can have
here, as there could be various failure types, and for some of them a
hard failure is adapted;
- Failure in the backend while running READ_REPLICATION_SLOT.  This
should imply a hard failure, no?
- Slot that does not exist.  In this case, we could fall back to the
current write position of the server.  

by default if the slot information cannot be retrieved.
Something that's disturbing me in patch 0002 is that we would ignore
the results of GetSlotInformation() if any error happens, even if
there is a problem in the backend, like an OOM.  We should be careful
about the semantics here.

> However, if the operator doesn't know the server is old, pg_receivewal
> starts streaming from unexpected position, which is a kind of
> disaster. So I'm inclined to agree to Bharath, but rather I imagine of
> an option to explicitly specify how to determine the start position.
> 
> --start-source=[server,wal,slot]  specify starting-LSN source, default is
>  trying all of them in the order of wal, slot, server. 
> 
> I don't think the option needs to accept multiple values at once.

What is the difference between "wal" and "server"?  "wal" stands for
the start position of the set of files stored in the location
directory, and "server" is the location that we'd receive from the
server?  I don't think that we need that because, when using a slot,
we know that we can rely on the LSN that the slot retains for
pg_receivewal as that should be the same point as what has been
streamed last.  Could there be an argument instead for changing the
default and rely on the slot information rather than scanning the
local WAL archive path for the start point when using --slot?  When
using pg_receivewal as a service, relying on a scan of the WAL archive
directory if there is no slot and fall back to an invalid LSN if there
is nothing is fine by me, but I think that just relying on the slot
information is saner as the backend makes sure that nothing is
missing.  That's also more useful when streaming changes from a single
slot from multiple locations (stream to location 1 with a slot, stop
pg_receivewal, stream to location 2 that completes 1 with the same
slot).

+pg_log_error("Slot \"%s\" is not a physical replication slot",
+ replication_slot);
In 0003, the format of this error is not really project-like.
Something like that perhaps would be more adapted:
"cannot use the slot provided, physical slot expected."

I am not really convinced about the need of getting the active state
and the PID used in the backend when fetching the slot data,
particularly if that's just for some frontend-side checks.  The
backend has safeguards already for all that.

While looking at that, I have applied de1d4fe to add 
PostgresNode::command_fails_like(), coming from 0003, and put my hands
on 0001 as per the attached, as the starting point.  That basically
comes down to all the points raised upthread, plus some tweaks for
things I bumped into to get the semantics of the command to what looks
like the right shape.
--
Michael
From adbd72f70ee3592965f2a52500820d1387dcbf85 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v4] Add READ_REPLICATION_SLOT command

---
 src/include/nodes/nodes.h   |   1 +
 src/include/nodes/replnodes.h   |  10 ++
 src/backend/replication/repl_gram.y |  16 ++-
 src/backend/replication/repl_scanner.l  |   1 +
 src/backend/replication/walsender.c | 112 
 src/test/recovery/t/001_stream_rep.pl   |  47 +++-
 src/test/recovery/t/006_logical_decoding.pl |  15 ++-
 doc/src/sgml/protocol.sgml  |  66 
 src/tools/pgindent/typedefs.list|   1 +
 9 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..5f78bdd573 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -495,6 +495,7 @@ typedef enum NodeTag
 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 */
T_IdentifySystemCmd,
+   T_ReadReplicationSlotCmd,
T_BaseBackupCmd,
T_CreateReplicationSlotCmd,
T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/in

Re: pg_receivewal starting position

2021-09-02 Thread Ronan Dunklau
On Thursday, September 2, 2021 at 08:42:22 CEST, Bharath Rupireddy wrote:
> On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
> 
>  wrote:
> > At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
> >  wrote in> 
> > > On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau  
wrote:
> > > > > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an
> > > > > option?
> > > > 
> > > > From my point of view, I already expected it to use something like
> > > > that when using a replication slot. Maybe an option to turn it off
> > > > could be useful ?> > 
> > > IMO, pg_receivewal should have a way to turn off/on using
> > > READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
> > > READ_REPLICATION_SLOT (a lower version) but for some reasons the
> > > pg_receivewal(running separately) is upgraded to the version that uses
> > > READ_REPLICATION_SLOT, knowing that the server doesn't support
> > > READ_REPLICATION_SLOT why should user let pg_receivewal run an
> > > unnecessary code?
> > 
> > If I read the patch correctly the situation above is warned by the
> > following message then continue to the next step giving up to identify
> > start position from slot data.
> > 
> > > "server does not support fetching the slot's position, resuming from the
> > > current server position instead"> 
> > (The message looks a bit too long, though..)
> > 
> > However, if the operator doesn't know the server is old, pg_receivewal
> > starts streaming from unexpected position, which is a kind of
> > disaster. So I'm inclined to agree to Bharath, but rather I imagine of
> > an option to explicitly specify how to determine the start position.
> > 
> > --start-source=[server,wal,slot]  specify starting-LSN source, default is
> > 
> >  trying all of them in the order of wal, slot, server.
> > 
> > I don't think the option needs to accept multiple values at once.
> 
> If --start-source = 'wal' fails, then pg_receivewal should show an
> error saying "cannot find start position from <>
> directory, try with "server" or "slot" for --start-source". We might
> end up having similar errors for other options as well. Isn't this going
> to create unnecessary complexity?
> 
> The existing way the pg_receivewal fetches start pos i.e. first from
> wal directory and then from server start position, isn't known to the
> user at all, no verbose message or anything specified in the docs. Why
> do we need to expose this with the --start-source option? IMO, we can
> keep it that way and we can just have a way to turn off the new
> behaviour that we are proposing here, i.e.fetching the start position
> from the slot's restart_lsn.

Then it should probably be documented. We write in the docs that it is 
strongly recommended to use a replication slot, but do not mention how we 
resume from what has already been processed.

If someone really cares about having control over how the start position is 
defined instead of relying on the auto detection, it would be wiser to add a 
--startpos parameter similar to the endpos one, which would override everything 
else, instead of different flags for different behaviours.

Regards,

-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-09-01 Thread Bharath Rupireddy
On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
 wrote:
>
> At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy 
>  wrote in
> > On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau  
> > wrote:
> > > > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an 
> > > > option?
> > >
> > > From my point of view, I already expected it to use something like that 
> > > when
> > > using a replication slot. Maybe an option to turn it off could be useful ?
> >
> > IMO, pg_receivewal should have a way to turn off/on using
> > READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
> > READ_REPLICATION_SLOT (a lower version) but for some reasons the
> > pg_receivewal(running separately) is upgraded to the version that uses
> > READ_REPLICATION_SLOT, knowing that the server doesn't support
> > READ_REPLICATION_SLOT why should user let pg_receivewal run an
> > unnecessary code?
>
> If I read the patch correctly the situation above is warned by the
> following message then continue to the next step giving up to identify
> start position from slot data.
>
> > "server does not support fetching the slot's position, resuming from the 
> > current server position instead"
>
> (The message looks a bit too long, though..)
>
> However, if the operator doesn't know the server is old, pg_receivewal
> starts streaming from unexpected position, which is a kind of
> disaster. So I'm inclined to agree to Bharath, but rather I imagine of
> an option to explicitly specify how to determine the start position.
>
> --start-source=[server,wal,slot]  specify starting-LSN source, default is
>  trying all of them in the order of wal, slot, server.
>
> I don't think the option needs to accept multiple values at once.

If --start-source = 'wal' fails, then pg_receivewal should show an
error saying "cannot find start position from <>
directory, try with "server" or "slot" for --start-source". We might
end up having similar errors for other options as well. Isn't this going
to create unnecessary complexity?

The existing way the pg_receivewal fetches start pos i.e. first from
wal directory and then from server start position, isn't known to the
user at all, no verbose message or anything specified in the docs. Why
do we need to expose this with the --start-source option? IMO, we can
keep it that way and we can just have a way to turn off the new
behaviour that we are proposing here, i.e. fetching the start position
from the slot's restart_lsn.

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-09-01 Thread Kyotaro Horiguchi
At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy 
 wrote in 
> On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau  wrote:
> > > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
> >
> > From my point of view, I already expected it to use something like that when
> > using a replication slot. Maybe an option to turn it off could be useful ?
> 
> IMO, pg_receivewal should have a way to turn off/on using
> READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
> READ_REPLICATION_SLOT (a lower version) but for some reasons the
> pg_receivewal(running separately) is upgraded to the version that uses
> READ_REPLICATION_SLOT, knowing that the server doesn't support
> READ_REPLICATION_SLOT why should user let pg_receivewal run an
> unnecessary code?

If I read the patch correctly, the situation above is warned about with
the following message, after which pg_receivewal continues to the next
step, giving up on identifying the start position from the slot data.

> "server does not suport fetching the slot's position, resuming from the 
> current server position instead"

(The message looks a bit too long, though..)

However, if the operator doesn't know the server is old, pg_receivewal
starts streaming from an unexpected position, which is a kind of
disaster. So I'm inclined to agree with Bharath, but rather I imagine
an option to explicitly specify how to determine the start position.

--start-source=[server,wal,slot]  specify starting-LSN source, default is
 trying all of them in the order of wal, slot, server. 

I don't think the option needs to accept multiple values at once.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_receivewal starting position

2021-08-31 Thread Bharath Rupireddy
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau  wrote:
> > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?
>
> From my point of view, I already expected it to use something like that when
> using a replication slot. Maybe an option to turn it off could be useful ?

IMO, pg_receivewal should have a way to turn on/off using
READ_REPLICATION_SLOT. Imagine the postgres server doesn't support
READ_REPLICATION_SLOT (a lower version) but for some reason
pg_receivewal (running separately) is upgraded to a version that uses
READ_REPLICATION_SLOT; knowing that the server doesn't support
READ_REPLICATION_SLOT, why should the user let pg_receivewal run
unnecessary code?

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-08-31 Thread Michael Paquier
On Mon, Aug 30, 2021 at 11:55:42AM +0200, Ronan Dunklau wrote:
> On Friday, August 27, 2021 at 05:44:32 CEST, Michael Paquier wrote:
>> +   if (slot == NULL || !slot->in_use)  
>>
>> [...] +
>>   ereport(ERROR,
>> +   (errcode(ERRCODE_UNDEFINED_OBJECT),
>> +errmsg("replication slot \"%s\" does not exist",
>> +   cmd->slotname)));
>> [...]
>> +   if (PQntuples(res) == 0)
>> +   {
>> +   pg_log_error("replication slot %s does not exist",
>> slot_name); +   PQclear(0);
>> +   return false;
>> So, the backend and ReadReplicationSlot() report an ERROR if a slot
>> does not exist but pg_basebackup's GetSlotInformation() does the same
>> if there are no tuples returned.  That's inconsistent.  Wouldn't it be
>> more instinctive to return a NULL tuple instead if the slot does not
>> exist to be able to check after real ERRORs in frontends using this
>> interface?  
> 
> The attached patch returns no tuple at all when the replication slot doesn't 
> exist. I'm not sure if that's what you meant by returning a NULL tuple ? 

Just return a tuple filled only with NULL values.  I would tend to
code things so that we set all the flags of nulls[] to true by default,
remove has_value, and define the number of columns in a #define, as in:
#define READ_REPLICATION_SLOT_COLS 5
[...]
Datum   values[READ_REPLICATION_SLOT_COLS];
boolnulls[READ_REPLICATION_SLOT_COLS];
[...]
MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
Assert(i == READ_REPLICATION_SLOT_COLS);// when filling values.

This would make ReadReplicationSlot() cleaner by removing all the
else{} blocks coded now to handle the NULL values, and that would be
more in-line with the documentation where we state that one tuple is
returned.  Note that this is the same kind of behavior for similar
in-core functions where objects are queried if they don't exist.
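As a standalone illustration of that "NULL by default" pattern (Datum is
stood in by a plain integer typedef here, since this is not backend
code, and the filling logic is reduced to a single column):

```c
#include <stdbool.h>
#include <string.h>

#define READ_REPLICATION_SLOT_COLS 5

/* Stand-in for the backend's Datum type, for illustration only. */
typedef long Datum;

/*
 * Sketch of the suggestion above: mark every column NULL up front,
 * then clear individual null flags as values are filled in, so a
 * missing slot naturally yields an all-NULLs tuple with no else{}
 * blocks.  Returns the number of non-NULL columns filled.
 */
static int
fill_slot_tuple(bool slot_found, Datum *values, bool *nulls)
{
    int i = 0;

    memset(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));

    if (slot_found)
    {
        values[i] = 1;          /* e.g. the slot type column */
        nulls[i] = false;
        i++;
    }
    return i;
}
```

With this shape, the "slot does not exist" case needs no special path:
the tuple is simply emitted with every null flag still set.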

I would also suggest a reword of some of the docs, say:
+
+ 
+  Read the information of a replication slot. Returns a tuple with
+  NULL values if the replication slot does not
+  exist.
+ 

> 
>> A slot in use exists, so the error is a bit confusing here
>> anyway, no?
> 
> From my understanding, a slot  *not* in use doesn't exist anymore, as such I 
> don't really understand this point. Could you clarify ?

Yeah, sorry about that.  I did not recall the exact meaning of
in_use.  Considering the slot as undefined if the flag is false is the
right thing to do.

> I was thinking that maybe instead of walking back the timeline history from 
> where we currently are on the server, we could allow an additional argument 
> for the client to specify which timeline it wants. But I guess a replication 
> slot can not be present for a past, divergent timeline ? I have removed that 
> suggestion.

The parent TLI history is linear, so I'd find that a bit strange in
concept, FWIW.

>> -   'slot0'
>> +   'slot0', '-p',
>> +   "$port"
>> Something we are missing here?
> 
> The thing we're missing here is a wrapper for command_fails_like. I've added 
> this to PostgresNode.pm.

It may be better to apply this bit separately, then.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-08-31 Thread Ronan Dunklau
Le mardi 31 août 2021, 13:17:22 CEST Bharath Rupireddy a écrit :
> On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau  
wrote:
> > Thank you for this review ! Please see the other side of the thread where
> > I
> > answered Michael Paquier with a new patchset, which includes some of your
> > proposed modifications.
> 
> Thanks for the updated patches. Here are some comments on v3-0001
> patch. I will continue to review 0002 and 0003.

Thank you ! I will send a new version shortly, once I address your remarks 
concerning patch 0002 (and hopefully 0003 :-) ) 
> 
> 1) Missing full stops "." at the end.
> +   logical
> +   when following the current timeline history
> +   position, when following the current timeline history
> 

Good catch, I will take care of it for the next version.

> 2) Can we have the "type" column as "slot_type" just to keep in sync
> with the pg_replication_slots view?

You're right, it makes more sense like this.

> 
> 3) Can we mention the link to pg_replication_slots view in the columns
> - "type", "restart_lsn", "confirmed_flush_lsn"?
> Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
> the same as the <link linkend="view-pg-replication-slots">pg_replication_slots</link> view.

Same as above, thanks.

> 
> 4) Can we use "read_replication_slot" instead of
> "identify_replication_slot", just to be in sync with the actual
> command?

That must have been a leftover from an earlier version of the patch, I will fix 
it also.

> 
> 5) Are you actually copying the slot contents into the slot_contents
> variable here? Isn't just taking the pointer to the shared memory?
> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(&slot->mutex);
> + slot_contents = *slot;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);
> 
> You could just do:
> + Oid dbid;
> + XLogRecPtr restart_lsn;
> + XLogRecPtr confirmed_flush;
> 
> + /* Copy the required slot contents */
> + SpinLockAcquire(&slot->mutex);
> + dbid = slot.data.database;
> + restart_lsn = slot.data.restart_lsn;
> + confirmed_flush = slot.data.confirmed_flush;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);

It's probably simpler that way.

> 
> 6) It looks like you are not sending anything as a response to the
> READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
> You are just calling end_tup_output which just calls rShutdown (points
> to donothingCleanup of printsimpleDR)
> if (has_value)
> do_tup_output(tstate, values, nulls);
> end_tup_output(tstate);

> 
> Can you test the use case where the pg_receivewal asks
> READ_REPLICATION_SLOT with a non-existent replication slot and see
> with your v3 patch how it behaves?

> 
> Why don't you remove has_value flag and do this in ReadReplicationSlot:
> Datum values[5];
> bool nulls[5];
> MemSet(values, 0, sizeof(values));
> MemSet(nulls, 0, sizeof(nulls));
> 
> + dest = CreateDestReceiver(DestRemoteSimple);
> + tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
> + do_tup_output(tstate, values, nulls);
> + end_tup_output(tstate);

As for the API, I think it's cleaner to just send an empty result set instead 
of a null tuple in that case but I won't fight over it if there is consensus on 
having an all-nulls tuple value instead. 

There is indeed a bug, but not here, in the second patch: I still test the 
slot type even when we didn't fetch anything. So I will add a test for that 
too. 
> 
> 7) Why don't we use 2 separate variables "restart_tli",
> "confirmed_flush_tli" instead of "slots_position_timeline", just to be
> more informative?

You're right.

> 
> 8) I still have the question like, how can a client (pg_receivewal for
> instance) know that it is the current owner/user of the slot it is
> requesting the info? As I said upthread, why don't we send "active"
> and "active_pid" fields of the pg_replication_slots view?
> Also, it would be good to send the "wal_status" field so that the
> client can check if the "wal_status" is not "lost"?

As for pg_receivewal, it can only check that the slot is not active at that
time, since we only acquire the replication slot once we know the start_lsn. For the 
basebackup case it's the same thing as we only want to check if it exists. 
As such, I didn't add them as I didn't see the need, but if it can be useful 
why not ? I will do that in the next version.

> 
> 9) There are 2 new lines at the end of ReadReplicationSlot. We give
> only one new line after each function definition.
> end_tup_output(tstate);
> }
> <<1stnewline>>
> <<2ndnewline>>
> /*
>  * Handle TIMELINE_HISTORY command.
>  */
> 

Ok !


> 10) Why do we need to have two test cases for "non-existent" slots?
> Isn't the test case after "DROP REPLICATION" enough?
> +($ret, $stdout, $stderr) = $node_primary->psql(
> +  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
> +  extra_params => [ '-d', $connstr_rep ]);
> +ok( $ret == 0,
> +  "READ_REPL

Re: pg_receivewal starting position

2021-08-31 Thread Bharath Rupireddy
On Tue, Aug 31, 2021 at 4:47 PM Bharath Rupireddy
 wrote:
>
> On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau  wrote:
> > Thank you for this review ! Please see the other side of the thread where I
> > answered Michael Paquier with a new patchset, which includes some of your
> > proposed modifications.
>
> Thanks for the updated patches. Here are some comments on v3-0001
> patch. I will continue to review 0002 and 0003.

Continuing my review on the v3 patch set:

0002:
1) I think the following message
"could not fetch replication slot LSN: got %d rows and %d fields,
expected %d rows and %d or more fields"
should be:
"could not read replication slot: got %d rows and %d fields, expected
%d rows and %d or more fields"

2) Why is GetSlotInformation returning InvalidXLogRecPtr? Change it to
return false instead.

3) Alignment issue:
Change below 2 lines:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s",
+   slot_name);
To 1 line, as it will be under 80 char limit:
+ appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);

4) Shouldn't your GetSlotInformation return whatever output columns
READ_REPLICATION_SLOT produces, to be more generic? Callers would pass
non-null/null pointers for the outputs they do or do not require.
Please refer to RunIdentifySystem for how it does that.
GetSlotInformation can just read the tuple fields the callers are interested in.
bool
GetSlotInformation(PGconn *conn, const char *slot_name, char **slot_type,
   XLogRecPtr *restart_lsn, uint32 *restart_lsn_tli,
   XLogRecPtr *confirmed_lsn, XLogRecPtr *confirmed_lsn_tli)
{
if (slot_type)
{
/* get the slot_type value from the received tuple */
}
if (restart_lsn)
{
/* get the restart_lsn value from the received tuple */
}
if (restart_lsn_tli)
{
/* get the restart_lsn_tli value from the received tuple */
}

if (confirmed_lsn)
{
/* get the confirmed_lsn value from the received tuple */
}

if (confirmed_lsn_tli)
{
/* get the confirmed_lsn_tli value from the received tuple */
}
}

5) How about below as GetSlotInformation function comment?
/*
 * Run READ_REPLICATION_SLOT through a given connection and give back to caller
 * following information of the slot if requested:
 * - type
 * - restart lsn
 * - restart lsn timeline
 * - confirmed lsn
 * - confirmed lsn timeline
 */

6) Do you need +#include "pqexpbuffer.h" in pg_receivewal.c?

7) Missing "," after information and it is not required to use "the"
in the messages.
Change below
+ pg_log_warning("could not fetch the replication_slot \"%s\" information "
+"resuming from the current server position instead", replication_slot);
to:
+ pg_log_warning("could not fetch replication_slot \"%s\" information, "
+"resuming from current server position instead", replication_slot);

8) A typo "suport". Ignore this comment if you incorporate review comment #10.
Change below
pg_log_warning("server does not suport fetching the slot's position, "
   "resuming from the current server position instead");
to:
pg_log_warning("server does not support getting start LSN from
replication slot, "
   "resuming from current server position instead");

9) I think you should free the memory allocated for slot_type by
GetSlotInformation:
+ if (strcmp(slot_type, "physical") != 0)
+ {
+ pg_log_error("slot \"%s\" is not a physical replication slot",
+ replication_slot);
+ exit(1);
+ }
+
+ pg_free(slot_type);

10) Isn't it PQclear(res);?
+ PQclear(0);

11) I don't think you need to check for the null value of
replication_slot. In the StreamLog it can't be null, so you can safely
remove the below if condition.
+ if (replication_slot)
+ {

12) How about
/* Try to get start position from server's replication slot information */
insted of
+ /* Try to get it from the slot if any, and the server supports it */

13) Since the server supports the new READ_REPLICATION_SLOT command
only if its version >= 15000, shouldn't the function GetSlotInformation
do the following:
bool
GetSlotInformation(PGconn *conn, ..., bool *is_supported)
{

if (PQserverVersion(conn) < 15000)
{
*is_supported = false;
return false;
}
*is_supported = true;
}
So, the callers will just do:
/* Try to get start position from server's replication slot information */
char*slot_type = NULL;
bool is_supported;

if (!GetSlotInformation(conn, replication_slot, &stream.startpos,
&stream.timeline, &slot_type, &is_supported))
{
if (!is_supported)
pg_log_warning("server does not support getting start LSN from
replication slot, "
   "resuming from current server position instead");
else
pg_log_warning("could not fetch replication_slot \"%s\" information, "
   "resuming from current server position instead",
   replication_slot);
}

if (slot_type && strcmp(slot_type, "physical") != 0)
{
pg_log_error("slot \"%s\" is not a physical replication slot",
replication_slot);
exit(1);
}

pg_free(slot_type);
}

14) Instead of just
+ if (strcmp(slot_type, "physical") != 0)
do
+ if (slot_type && strcmp(slot_type, "physical") != 0)

0003:
1) 

Re: pg_receivewal starting position

2021-08-31 Thread Bharath Rupireddy
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau  wrote:
> Thank you for this review ! Please see the other side of the thread where I
> answered Michael Paquier with a new patchset, which includes some of your
> proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.

1) Missing full stops "." at the end.
+   logical
+   when following the current timeline history
+   position, when following the current timeline history

2) Can we have the "type" column as "slot_type" just to keep in sync
with the pg_replication_slots view?

3) Can we mention the link to pg_replication_slots view in the columns
- "type", "restart_lsn", "confirmed_flush_lsn"?
Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
the same as the <link linkend="view-pg-replication-slots">pg_replication_slots</link>
view.

4) Can we use "read_replication_slot" instead of
"identify_replication_slot", just to be in sync with the actual
command?

5) Are you actually copying the slot contents into the slot_contents
variable here? Isn't just taking the pointer to the shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;

+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot.data.database;
+ restart_lsn = slot.data.restart_lsn;
+ confirmed_flush = slot.data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

6) It looks like you are not sending anything as a response to the
READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
You are just calling end_tup_output which just calls rShutdown (points
to donothingCleanup of printsimpleDR)
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);

Can you test the use case where the pg_receivewal asks
READ_REPLICATION_SLOT with a non-existent replication slot and see
with your v3 patch how it behaves?

Why don't you remove has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));

+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);

7) Why don't we use 2 separate variables "restart_tli",
"confirmed_flush_tli" instead of "slots_position_timeline", just to be
more informative?

8) I still have the question like, how can a client (pg_receivewal for
instance) know that it is the current owner/user of the slot it is
requesting the info? As I said upthread, why don't we send "active"
and "active_pid" fields of the pg_replication_slots view?
Also, it would be good to send the "wal_status" field so that the
client can check if the "wal_status" is not "lost"?

9) There are 2 new lines at the end of ReadReplicationSlot. We give
only one new line after each function definition.
end_tup_output(tstate);
}
<<1stnewline>>
<<2ndnewline>>
/*
 * Handle TIMELINE_HISTORY command.
 */

10) Why do we need to have two test cases for "non-existent" slots?
Isn't the test case after "DROP REPLICATION" enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+"READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

You can just rename the test case name from:
+"READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to
+"READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-08-30 Thread Ronan Dunklau
On Saturday, August 28, 2021 at 14:10:25 CEST, Bharath Rupireddy wrote:
> On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau  
wrote:
> > order to fail early if the replication slot doesn't exist, so please find
> > attached v2 for that.
> 
> Thanks for the patches. Here are some comments:
> 

Thank you for this review ! Please see the other side of the thread where I 
answered Michael Paquier with a new patchset, which includes some of your 
proposed modifications.

> 1) While the intent of these patches looks good, I have following
> concern with new replication command READ_REPLICATION_SLOT: what if
> the pg_receivewal exits (because user issued a SIGINT or for some
> reason) after flushing  the received WAL to disk, before it sends
> sendFeedback to postgres server's walsender so that it doesn't get a
> chance to update the restart_lsn in the replication slot via
> PhysicalConfirmReceivedLocation. If the pg_receivewal is started
> again, isn't it going to get the previous restart_lsn and receive the
> last chunk of flushed WAL again?

I've kept the existing directory as the source of truth if we have any WAL 
there already. If we don't, we fall back to the restart_lsn of the replication 
slot.
So in the event that we start it again from somewhere else where we don't have 
access to those WALs anymore, we could be receiving it again, which in my 
opinion is better than losing everything in between in that case. 

> 
> 2) What is the significance of READ_REPLICATION_SLOT for logical
> replication slots? I read above that somebody suggested to restrict
> the walsender to handle READ_REPLICATION_SLOT for physical replication
> slots so that the callers will see a command failure. But I tend to
> think that it is clean to have this command common for both physical
> and logical replication slots and the callers can have an Assert(type
> == 'physical').

I've updated the patch to make it easy for the caller to check the slot's type 
and added a verification for those cases.

In general, I tried to implement the meaning of the different fields exactly as 
it's done in the pg_replication_slots view.

> 
> 3) Isn't it useful to send active, active_pid info of the replication
> slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
> true && active_pid == getpid()) as an assertion to ensure that it is
> the sole owner of the replication slot? Also, is it good send
> wal_status info

Typically we read the slot before attaching to it, since what we want to do is 
check whether it exists. It may be worthwhile to check that it's not already 
used by another backend, though.

> 
> 4) I think below messages should start with lower case letter and also
> there are some typos:
> + pg_log_warning("Could not fetch the replication_slot \"%s\" information "
> + pg_log_warning("Server does not suport fetching the slot's position, "
> something like:
> + pg_log_warning("could not fetch replication slot \"%s\" information, "
> +"resuming from current server position instead", replication_slot);
> + pg_log_warning("server does not support fetching replication slot
> information, "
> +"resuming from current server position instead");
> 
I've rephrased it a bit in v3, let me know if that's what you had in mind.


> 5) How about emitting the above messages in case of "verbose"?

I think it is useful to warn the user even if not in the verbose case, but if 
the consensus is to move it to verbose only output I can change it.

> 
> 6) How about an assertion like below?
> + if (stream.startpos == InvalidXLogRecPtr)
> + {
> + stream.startpos = serverpos;
> + stream.timeline = servertli;
> + }
> +
+Assert(stream.startpos != InvalidXLogRecPtr)

Good idea.

> 
> 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when 
using a replication slot. Maybe an option to turn it off could be useful ? 

> 
> 8) Just an idea, how about we store pg_receivewal's lastFlushPosition
> in a file before pg_receivewal exits and compare it with the
> restart_lsn that it received from the replication slot, if
> lastFlushPosition == received_restart_lsn well and good, if not, then
> something would have happened and we always start at the
> lastFlushPosition ?

The patch idea originally came from the fact that some utilities use 
pg_receivewal to fetch WALs and move them elsewhere. In that sense I don't 
really see what value this brings compared to the existing (and unmodified) way 
of computing the restart position from the already stored files ?

Best regards,

-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-08-30 Thread Ronan Dunklau
On Friday, August 27, 2021 at 05:44:32 CEST, Michael Paquier wrote:
> On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
> > Following the discussion at [1], I refactored the implementation into
> > streamutil and added a third patch making use of it in pg_basebackup
> > itself in order to fail early if the replication slot doesn't exist, so
> > please find attached v2 for that.
> 
> Thanks for the split.  That helps a lot.
> 

Thank you very much for the review, please find attached an updated patchset.
I've also taken into account some remarks made by Bharath Rupireddy.

> +
> +
>  /*
>   * Run IDENTIFY_SYSTEM through a given connection and give back to caller
> 
> The patch series has some noise diffs here and there, you may want to
> clean up that for clarity.

Ok, sorry about that.

> 
> +   if (slot == NULL || !slot->in_use)
> +   {
> +   LWLockRelease(ReplicationSlotControlLock);
> +
> +   ereport(ERROR,
> +   (errcode(ERRCODE_UNDEFINED_OBJECT),
> LWLocks are released on ERROR, so no need for LWLockRelease() here.
> 

Following your suggestion of not erroring out on a nonexistent slot, this point 
is no longer relevant, but thanks for pointing it out anyway.

> +
> +  
> +  Read information about the named replication slot.  This is
> useful to determine which WAL location we should be asking the server
> to start streaming at.
> 
> A nit.  You may want to be more careful with the indentation of the
> documentation.  Things are usually limited in width for readability.
> More  markups would be nice for the field names used in the
> descriptions.

Ok.

> 
> +   if (slot == NULL || !slot->in_use)  
>
> [...] +
>   ereport(ERROR,
> +   (errcode(ERRCODE_UNDEFINED_OBJECT),
> +errmsg("replication slot \"%s\" does not exist",
> +   cmd->slotname)));
> [...]
> +   if (PQntuples(res) == 0)
> +   {
> +   pg_log_error("replication slot %s does not exist",
> slot_name); +   PQclear(0);
> +   return false;
> So, the backend and ReadReplicationSlot() report an ERROR if a slot
> does not exist but pg_basebackup's GetSlotInformation() does the same
> if there are no tuples returned.  That's inconsistent.  Wouldn't it be
> more instinctive to return a NULL tuple instead if the slot does not
> exist to be able to check after real ERRORs in frontends using this
> interface?  

The attached patch returns no tuple at all when the replication slot doesn't 
exist. I'm not sure if that's what you meant by returning a NULL tuple ? 

> A slot in use exists, so the error is a bit confusing here
> anyway, no?

From my understanding, a slot *not* in use doesn't exist anymore; as such I 
don't really understand this point. Could you clarify ?


> 
> +* XXX: should we allow the caller to specify which target timeline it
> wants +* ?
> +*/
> What are you thinking about here?

I was thinking that maybe instead of walking back the timeline history from 
where we currently are on the server, we could allow an additional argument 
for the client to specify which timeline it wants. But I guess a replication 
slot can not be present for a past, divergent timeline ? I have removed that 
suggestion. 

> 
> -# restarts of pg_receivewal will see this segment as full..
> +# restarts of pg_receivewal will see this segment as full../
> Typo.

Ok.

> 
> +   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4,
> "restart_lsn_timeline", + INT4OID, -1, 0);
> +   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5,
> "confirmed_flush_lsn_timeline", + INT4OID, -1,
> 0);
> I would call these restart_tli and confirmed_flush_tli., without the
> "lsn" part.

Ok.
> 
> The patch for READ_REPLICATION_SLOT could have some tests using a
> connection that has replication=1 in some TAP tests.  We do that in
> 001_stream_rep.pl with SHOW, as one example.

Ok. I added the physical part to 001_stream_rep.pl, using the protocol 
interface directly for creating / dropping the slot, and some tests for 
logical replication slots to 006_logical_decoding.pl.

> 
> -   'slot0'
> +   'slot0', '-p',
> +   "$port"
> Something we are missing here?

The thing we're missing here is a wrapper for command_fails_like. I've added 
this to PostgresNode.pm.

Best regards,


-- 
Ronan Dunklau

From 9fa01789f663975b963c26875b70857055cadb9b Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v3 1/3] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT  command.
This command is used to read information about a replication slot when
using a physical replication connection.

Re: pg_receivewal starting position

2021-08-28 Thread Bharath Rupireddy
On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau  wrote:
> order to fail early if the replication slot doesn't exist, so please find
> attached v2 for that.

Thanks for the patches. Here are some comments:

1) While the intent of these patches looks good, I have following
concern with new replication command READ_REPLICATION_SLOT: what if
the pg_receivewal exits (because user issued a SIGINT or for some
reason) after flushing  the received WAL to disk, before it sends
sendFeedback to postgres server's walsender so that it doesn't get a
chance to update the restart_lsn in the replication slot via
PhysicalConfirmReceivedLocation. If the pg_receivewal is started
again, isn't it going to get the previous restart_lsn and receive the
last chunk of flushed WAL again?

2) What is the significance of READ_REPLICATION_SLOT for logical
replication slots? I read above that somebody suggested to restrict
the walsender to handle READ_REPLICATION_SLOT for physical replication
slots so that the callers will see a command failure. But I tend to
think that it is clean to have this command common for both physical
and logical replication slots and the callers can have an Assert(type
== 'physical').

3) Isn't it useful to send active, active_pid info of the replication
slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
true && active_pid == getpid()) as an assertion to ensure that it is
the sole owner of the replication slot? Also, is it good send
wal_status info

4) I think below messages should start with lower case letter and also
there are some typos:
+ pg_log_warning("Could not fetch the replication_slot \"%s\" information "
+ pg_log_warning("Server does not suport fetching the slot's position, "
something like:
+ pg_log_warning("could not fetch replication slot \"%s\" information, "
+"resuming from current server position instead", replication_slot);
+ pg_log_warning("server does not support fetching replication slot
information, "
+"resuming from current server position instead");

5) How about emitting the above messages in case of "verbose"?

6) How about an assertion like below?
+ if (stream.startpos == InvalidXLogRecPtr)
+ {
+ stream.startpos = serverpos;
+ stream.timeline = servertli;
+ }
+
+Assert(stream.startpos != InvalidXLogRecPtr)

7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

8) Just an idea, how about we store pg_receivewal's lastFlushPosition
in a file before pg_receivewal exits and compare it with the
restart_lsn that it received from the replication slot, if
lastFlushPosition == received_restart_lsn well and good, if not, then
something would have happened and we always start at the
lastFlushPosition ?
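As a rough sketch of that idea, persisting the flush position could look
like the helpers below. The file path and function names are invented
for illustration; pg_receivewal keeps no such status file today:

```c
#include <stdio.h>

/*
 * Toy sketch of the idea above: write the last flushed LSN to a small
 * status file on exit, and read it back on startup so it can be
 * compared with the slot's restart_lsn.  Everything here is
 * hypothetical, not pg_receivewal code.
 */
static int
save_last_flush(const char *path, unsigned long lsn)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    fprintf(f, "%lu\n", lsn);
    fclose(f);
    return 0;
}

static int
load_last_flush(const char *path, unsigned long *lsn)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;              /* no status file: first run */
    if (fscanf(f, "%lu", lsn) != 1)
    {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

On restart, the client would prefer the saved lastFlushPosition whenever
it is ahead of the restart_lsn reported by the slot, which is the
comparison the question above is getting at.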

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-08-26 Thread Michael Paquier
On Thu, Aug 26, 2021 at 02:14:27PM +0200, Ronan Dunklau wrote:
> Following the discussion at [1], I refactored the implementation into 
> streamutil and added a third patch making use of it in pg_basebackup itself 
> in 
> order to fail early if the replication slot doesn't exist, so please find 
> attached v2 for that.

Thanks for the split.  That helps a lot.

+
+
 /*
  * Run IDENTIFY_SYSTEM through a given connection and give back to caller

The patch series has some noise diffs here and there, you may want to
clean up that for clarity.

+   if (slot == NULL || !slot->in_use)
+   {
+   LWLockRelease(ReplicationSlotControlLock);
+
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
LWLocks are released on ERROR, so no need for LWLockRelease() here.

+
+  
+  Read information about the named replication slot.  This is
useful to determine which WAL location we should be asking the server
to start streaming at.

A nit.  You may want to be more careful with the indentation of the
documentation.  Things are usually limited in width for readability.
More markup would be nice for the field names used in the
descriptions.

+   if (slot == NULL || !slot->in_use)  

 [...]
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("replication slot \"%s\" does not exist",
+   cmd->slotname)));
[...]
+   if (PQntuples(res) == 0)
+   {
+   pg_log_error("replication slot %s does not exist", slot_name);
+   PQclear(0);
+   return false;
So, the backend and ReadReplicationSlot() report an ERROR if a slot
does not exist but pg_basebackup's GetSlotInformation() does the same
if there are no tuples returned.  That's inconsistent.  Wouldn't it be
more instinctive to return a NULL tuple instead if the slot does not
exist to be able to check after real ERRORs in frontends using this
interface?  A slot in use exists, so the error is a bit confusing here
anyway, no?

+* XXX: should we allow the caller to specify which target timeline it wants
+* ?
+*/
What are you thinking about here?

-# restarts of pg_receivewal will see this segment as full..
+# restarts of pg_receivewal will see this segment as full../
Typo.

+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "restart_lsn_timeline",
+ INT4OID, -1, 0);
+   TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 5, 
"confirmed_flush_lsn_timeline",
+ INT4OID, -1, 0);
I would call these restart_tli and confirmed_flush_tli, without the
"lsn" part.

The patch for READ_REPLICATION_SLOT could have some tests using a
connection that has replication=1 in some TAP tests.  We do that in
001_stream_rep.pl with SHOW, as one example.

-   'slot0'
+   'slot0', '-p',
+   "$port"
Something we are missing here?
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-08-26 Thread Ronan Dunklau
Le jeudi 29 juillet 2021, 11:09:40 CEST Ronan Dunklau a écrit :
> Patch 0001 adds the new READ_REPLICATION_SLOT command.
> It returns for a given slot the type, restart_lsn, flush_lsn,
> restart_lsn_timeline and flush_lsn_timeline.
> The timelines are determined by reading the current timeline history, and
> finding the timeline where we may find the record. I didn't find explicit
> test for eg IDENTIFY_SYSTEM so didn't write one either for this new
> command, but it is tested indirectly in patch 0002.
> 
> Patch 0002 makes pg_receivewal use that command if we use a replication slot
> and the command is available, and use the restart_lsn and
> restart_lsn_timeline as a starting point. It also adds a small test to
> check that we start back from the previous restart_lsn instead of the
> current flush position when our destination directory does not contain any
> WAL file.
> 
> I also noticed we don't test following a timeline switch. It would probably
> be good to add that, both for the case where we determine the previous
> timeline from the archived segments and when it comes from the new command.
> What do you think ?

Following the discussion at [1], I refactored the implementation into 
streamutil and added a third patch making use of it in pg_basebackup itself in 
order to fail early if the replication slot doesn't exist, so please find 
attached v2 for that.

Best regards,

[1]: https://www.postgresql.org/message-id/flat/
CAD21AoDYmv0yJMQnWtCx_kZGwVZnkQSTQ1re2JNSgM0k37afYQ%40mail.gmail.com

-- 
Ronan Dunklau

From fff8786049326864d3ef8fe4539e1829f933f32f Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v2 1/3] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT  command.
This command is used to read information about a replication slot when
using a physical replication connection.

In this first version it returns the slot type, restart_lsn, flush_lsn and
the timeline of the restart_lsn and flush_lsn, which are obtained by following the
current timeline history.
---
 doc/src/sgml/protocol.sgml |  62 +++
 src/backend/replication/repl_gram.y|  18 -
 src/backend/replication/repl_scanner.l |   1 +
 src/backend/replication/walsender.c| 106 +
 src/include/nodes/nodes.h  |   1 +
 src/include/nodes/replnodes.h  |  10 +++
 6 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a232546b1d..6207171426 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2052,6 +2052,68 @@ The commands accepted in replication mode are:
 
   
 
+  
+READ_REPLICATION_SLOT slot_name
+  READ_REPLICATION_SLOT
+
+
+  
+  Read information about the named replication slot.  This is useful to determine which WAL location we should be asking the server to start streaming at.
+  
+  
+  In response to this command, the server will return a one-row result set, containing the following fields:
+
+  
+type (text)
+
+  
+   The replication slot's type, either physical or logical
+  
+
+  
+
+  
+restart_lsn (text)
+
+  
+   The replication slot's restart_lsn.
+  
+
+  
+
+  
+confirmed_flush_lsn (text)
+
+  
+   The replication slot's confirmed_flush LSN.
+  
+
+  
+
+  
+restart_lsn_timeline (int4)
+
+  
+   The timeline ID for the restart_lsn position, when following the current timeline
+   history
+  
+
+  
+
+  
+confirmed_flush_lsn_timeline (int4)
+
+  
+   The timeline ID for the confirmed_flush_lsn position, when following the current timeline
+   history
+  
+
+  
+
+  
+
+  
+
   
 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
  START_REPLICATION
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index e1e8ec29cc..7298f44008 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type 	command
 %type 	base_backup start_replication start_logical_replication
 create_replication_slot drop_replication_slot identify_system
-timeline_history show sql_cmd
+identify_repl

Re: pg_receivewal starting position

2021-07-29 Thread Ronan Dunklau
Le jeudi 29 juillet 2021, 07:32:37 CEST Kyotaro Horiguchi a écrit :
> I didn't think it through in detail.  But I forgot that ordinary SQL commands
> have been prohibited in physical replication connection.  So we need a
> new replication command but it's not that a big deal.

Thank you for your feedback !


> 
> > I don't see any reason not to make it work for logical replication
> > connections / slots, but it wouldn't be that useful since we can query
> > the database in that case.
> 
> Ordinary SQL queries are usable on a logical replication slot so
> I'm not sure how logical replication connection uses the command.
> However, like you, I wouldn't bother restricting the command to
> physical replication, but perhaps the new command should return the
> slot type.
> 

Ok done in the attached patch.

> 
> I'm not sure it's worth adding complexity for such strictness.
> START_REPLICATION safely fails if someone steals the slot meanwhile.
> In the first place there's no means to protect a slot from others
> while idle.  One possible problem is the case where START_REPLICATION
> successfully acquires the slot after the new command failed.  But that
> case doesn't seem worse than the case where someone advances the slot
> while we are away.  So I think READ_REPLICATION_SLOT is sufficient.
> 

Ok, I implemented it like this. I tried to follow the pg_get_replication_slots
approach with regard to preventing concurrent modification while reading
the slot.

> > From pg_receivewal point of view, this would amount to:
> >  - check if we currently have wal in the target directory.
> >  
> >- if we do, proceed as currently done, by computing the start lsn and
> > 
> > timeline from the last archived wal
> > 
> >   - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
> > 
> > restart_lsn as the start lsn if there is one, and don't provide a timeline
> > 
> >   - if we still don't have a start_lsn, fallback to using the current
> >   server
> > 
> > wal position as is done.
> 
> That's pretty much it.

Great.

> 
> > What do you think ? Which information should we provide about the slot ?
> 
> We need the timeline id to start with when using restart_lsn.  The
> current timeline can be used in most cases but there's a case where
> the LSN is historical.

Ok, see below.

> 
> pg_receivewal doesn't send a replication status report when a segment
> is finished. So after pg_receivewal stops just after a segment is
> finished, the slot stays at the beginning of the last segment.  Thus
> next time it will start from there, creating a duplicate segment.

I'm not sure I see where the problem is here. If we don't keep the segments in
pg_receivewal's target directory, then it would be the responsibility of
whoever moved them to make sure we don't have duplicates, or to handle them
gracefully.

Even if we were forcing a feedback after a segment is finished, there could 
still be a problem if the feedback never made it to the server but the segment 
was here. It might be interesting to send a feedback anyway.

Please find attached two patches implementing what we've been discussing.

Patch 0001 adds the new READ_REPLICATION_SLOT command. 
It returns for a given slot the type, restart_lsn, flush_lsn, 
restart_lsn_timeline and flush_lsn_timeline.
The timelines are determined by reading the current timeline history, and 
finding the timeline where we may find the record. I didn't find an explicit test
for e.g. IDENTIFY_SYSTEM, so I didn't write one for this new command either, but it 
is tested indirectly in patch 0002.
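
The "find the timeline where we may find the record" step can be sketched as follows; this is a pure-Python illustration of the idea, not the C implementation from the patch:

```python
def timeline_of_lsn(history, current_tli, lsn):
    """history: (tli, switch_lsn) pairs read from the timeline history file,
    in ascending tli order; each timeline covers LSNs strictly below its
    switch point.  Return the timeline that contains a record at `lsn`
    when following the current timeline's history."""
    for tli, switch_lsn in history:
        if lsn < switch_lsn:
            return tli
    # Beyond the last switch point: the record is on the current timeline.
    return current_tli
```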

Patch 0002 makes pg_receivewal use that command if we use a replication slot 
and the command is available, and use the restart_lsn and restart_lsn_timeline 
as a starting point. It also adds a small test to check that we start back 
from the previous restart_lsn instead of the current flush position when our 
destination directory does not contain any WAL file. 

I also noticed we don't test following a timeline switch. It would probably be 
good to add that, both for the case where we determine the previous timeline 
from the archived segments and when it comes from the new command. What do you 
think ?


Regards,

-- 
Ronan Dunklau

From 37d9545c05b9e36aafac751f9dc549e75798413c Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v1 1/2] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT  command.
This command is used to read information about a replication slot when
using a physical replication connection.

In this first version it returns the slot type, restart_lsn, flush_lsn and
the timeline of the restart_lsn and flush_lsn, which are obtained by following the
current timeline history.
---
 doc/src/sgml/protocol.sgml |  62 +++
 src/backend/replication/repl_gram.y|  18 -
 src/backend/replication/repl_scanner.l |   1 +
 src/backend/replication/walsender.c| 106 +
 src/

Re: pg_receivewal starting position

2021-07-28 Thread Kyotaro Horiguchi
At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau  
wrote in 
> Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
> > At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau 
> > wrote in
> > > I don't know if it should be the default, toggled by a command line flag,
> > > or if we even should let the user provide a LSN.
> > 
> > *I* think it is completely reasonable (or at least convenient or less
> > astonishing) that pg_receivewal starts from the restart_lsn of the
> > replication slot to use.  The tool already decides the clean-start LSN
> > a bit unusual way. And it seems to me that proposed behavior can be
> > the default when -S is specified.
> > 
> 
> As of now we can't get the replication_slot restart_lsn with a replication 
> connection AFAIK.
> 
> This implies that the patch could require the user to specify a 
> maintenance-db 
> parameter, and we would use that if provided to fetch the replication slot 
> info, or fall back to the previous behaviour. I don't really like this
> approach, as the behaviour changing depending on whether we supply a
> maintenance-db parameter or not is error-prone for the user.
>
> Another option would be to add a new replication command (for example 
> ACQUIRE_REPLICATION_SLOT ) to set the replication slot as the 
> current one, and return some info about it (restart_lsn at least for a 
> physical slot).

I didn't think it through in detail.  But I forgot that ordinary SQL commands
have been prohibited in physical replication connection.  So we need a
new replication command but it's not that a big deal.

> I don't see any reason not to make it work for logical replication 
> connections 
> / slots, but it wouldn't be that useful since we can query the database in 
> that case.

Ordinary SQL queries are usable on a logical replication slot so
I'm not sure how logical replication connection uses the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.

> Acquiring the replication slot instead of just reading it would make sure 
> that 
> no other process could start the replication between the time we read the 
> restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then 
> check if we already have a replication slot, and ensure it is the same one as 
> the one we're trying to use. 

I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle.  One possible problem is the case where START_REPLICATION
successfully acquires the slot after the new command failed.  But that
case doesn't seem worse than the case where someone advances the slot
while we are away.  So I think READ_REPLICATION_SLOT is sufficient.

> From pg_receivewal point of view, this would amount to:
> 
>  - check if we currently have wal in the target directory.
>- if we do, proceed as currently done, by computing the start lsn and 
> timeline from the last archived wal
>   - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the 
> restart_lsn as the start lsn if there is one, and don't provide a timeline
>   - if we still don't have a start_lsn, fallback to using the current server 
> wal position as is done. 

That's pretty much it.

> What do you think ? Which information should we provide about the slot ?

We need the timeline id to start with when using restart_lsn.  The
current timeline can be used in most cases but there's a case where
the LSN is historical.

pg_receivewal doesn't send a replication status report when a segment
is finished. So after pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment.  Thus
next time it will start from there, creating a duplicate segment.
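
As a numeric illustration of that scenario (assuming the default 16MB segment size; the positions are made up):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default wal_segment_size

def segment_no(lsn: int) -> int:
    """Which WAL segment a byte position falls in."""
    return lsn // WAL_SEG_SIZE

# pg_receivewal has fully written segment 2 (flush position sits on the
# next boundary), but its last status report was sent at the segment start.
reported = 2 * WAL_SEG_SIZE   # slot's restart_lsn after pg_receivewal stops
flushed = 3 * WAL_SEG_SIZE    # what is actually complete on disk

# Restarting from the slot would re-stream all of segment 2.
duplicated = segment_no(flushed - 1) - segment_no(reported) + 1
```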

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_receivewal starting position

2021-07-28 Thread Ronan Dunklau
Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
> At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau 
> wrote in
> > Hello,
> > 
> > I've noticed that pg_receivewal's logic for deciding which LSN to start
> > 
> > streaming at consists of:
> >   - looking up the latest WAL file in our destination folder, and resume
> >   from
> > 
> > here
> > 
> >   - if there isn't, use the current flush location instead.
> > 
> > This behaviour surprised me when using it with a replication slot: I was
> > expecting it to start streaming at the last flushed location from the
> > replication slot instead. If you consider a backup tool which will take
> > pg_receivewal's output and transfer it somewhere else, using the
> > replication slot position would be the easiest way to ensure we don't
> > miss WAL files.
> > 
> > Does that make sense ?
> > 
> > I don't know if it should be the default, toggled by a command line flag,
> > or if we even should let the user provide a LSN.
> 
> *I* think it is completely reasonable (or at least convenient or less
> astonishing) that pg_receivewal starts from the restart_lsn of the
> replication slot to use.  The tool already decides the clean-start LSN
> a bit unusual way. And it seems to me that proposed behavior can be
> the default when -S is specified.
> 

As of now we can't get the replication_slot restart_lsn with a replication 
connection AFAIK.

This implies that the patch could require the user to specify a maintenance-db 
parameter, and we would use that if provided to fetch the replication slot 
info, or fall back to the previous behaviour. I don't really like this approach,
as the behaviour changing depending on whether we supply a maintenance-db
parameter or not is error-prone for the user.

Another option would be to add a new replication command (for example 
ACQUIRE_REPLICATION_SLOT ) to set the replication slot as the 
current one, and return some info about it (restart_lsn at least for a 
physical slot).

I don't see any reason not to make it work for logical replication connections 
/ slots, but it wouldn't be that useful since we can query the database in 
that case.

Acquiring the replication slot instead of just reading it would make sure that 
no other process could start the replication between the time we read the 
restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then 
check if we already have a replication slot, and ensure it is the same one as 
the one we're trying to use. 

From pg_receivewal point of view, this would amount to:

 - check if we currently have wal in the target directory.
   - if we do, proceed as currently done, by computing the start lsn and 
timeline from the last archived wal
  - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the 
restart_lsn as the start lsn if there is one, and don't provide a timeline
  - if we still don't have a start_lsn, fallback to using the current server 
wal position as is done. 
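
The three steps above form a priority chain, which can be sketched as (illustrative Python; each argument is an LSN, or None when that source is unavailable):

```python
def decide_start(last_archived_lsn, slot_restart_lsn, server_flush_lsn):
    """Sketch of the proposed pg_receivewal startup decision."""
    if last_archived_lsn is not None:
        return last_archived_lsn, "resume after the last archived WAL file"
    if slot_restart_lsn is not None:
        return slot_restart_lsn, "resume from the slot's restart_lsn"
    return server_flush_lsn, "start at the current server position"
```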

What do you think ? Which information should we provide about the slot ?


-- 
Ronan Dunklau






Re: pg_receivewal starting position

2021-07-27 Thread Kyotaro Horiguchi
At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau  
wrote in 
> Hello,
> 
> I've noticed that pg_receivewal's logic for deciding which LSN to start 
> streaming at consists of:
>   - looking up the latest WAL file in our destination folder, and resume from 
> here
>   - if there isn't, use the current flush location instead.
> 
> This behaviour surprised me when using it with a replication slot: I was 
> expecting it to start streaming at the last flushed location from the 
> replication slot instead. If you consider a backup tool which will take 
> pg_receivewal's output and transfer it somewhere else, using the replication 
> slot position would be the easiest way to ensure we don't miss WAL files.
> 
> Does that make sense ? 
> 
> I don't know if it should be the default, toggled by a command line flag, or 
> if 
> we even should let the user provide a LSN.

*I* think it is completely reasonable (or at least convenient or less
astonishing) that pg_receivewal starts from the restart_lsn of the
replication slot to use.  The tool already decides the clean-start LSN
a bit unusual way. And it seems to me that proposed behavior can be
the default when -S is specified.

> I'd be happy to implement any of that if we agree.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pg_receivewal starting position

2021-07-26 Thread Ronan Dunklau
Hello,

I've noticed that pg_receivewal's logic for deciding which LSN to start 
streaming at consists of:
  - looking up the latest WAL file in our destination folder, and resume from 
here
  - if there isn't, use the current flush location instead.
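
For reference, "resume from the latest WAL file" amounts to decoding the 24-hex-digit segment file name (8 digits each for timeline, high LSN word, and segment number) back into a start position; a sketch assuming the default 16MB segment size:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default wal_segment_size

def segment_start(walfile: str, seg_size: int = WAL_SEG_SIZE):
    """Return (timeline, start LSN) for a name like '000000010000000000000016'."""
    tli = int(walfile[0:8], 16)
    log = int(walfile[8:16], 16)   # high 32 bits of the LSN
    seg = int(walfile[16:24], 16)  # segment number within that 4GB range
    return tli, (log << 32) + seg * seg_size
```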

This behaviour surprised me when using it with a replication slot: I was 
expecting it to start streaming at the last flushed location from the 
replication slot instead. If you consider a backup tool which will take 
pg_receivewal's output and transfer it somewhere else, using the replication 
slot position would be the easiest way to ensure we don't miss WAL files.

Does that make sense ? 

I don't know if it should be the default, toggled by a command-line flag, or if 
we should even let the user provide an LSN.

I'd be happy to implement any of that if we agree.

-- 
Ronan Dunklau