Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-02-13 Thread Drouvot, Bertrand

Hi,

On 2/13/23 11:58 AM, Alvaro Herrera wrote:

On 2023-Jan-26, Drouvot, Bertrand wrote:


Hi,

On 1/26/23 10:42 AM, Alvaro Herrera wrote:

On 2023-Jan-26, Drouvot, Bertrand wrote:


On 1/24/23 7:27 PM, Alvaro Herrera wrote:



1. I don't think wait_for_write_catchup is necessary, because
calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
would already do the same thing.


Having a closer look, it does not seem to be the case. The default mode
in wait_for_catchup() is 'replay' and the default mode for the lsn is 'write'.

But in wait_for_write_catchup() we are making use of 'write' for both.


But that turns
  $node->wait_for_catchup('foobar', 'write')
into
  $node->wait_for_write_catchup('foobar');
so I don't see much value in it.


Agree.


 Also, the patch series from which this
patch spawned in the first place doesn't wait for write AFAICS.



Right, it does wait for replay only.


After adding some more POD docs for it, I pushed the one for replay.



Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-02-13 Thread Alvaro Herrera
On 2023-Jan-26, Drouvot, Bertrand wrote:

> Hi,
> 
> On 1/26/23 10:42 AM, Alvaro Herrera wrote:
> > On 2023-Jan-26, Drouvot, Bertrand wrote:
> > 
> > > On 1/24/23 7:27 PM, Alvaro Herrera wrote:
> > 
> > > > 1. I don't think wait_for_write_catchup is necessary, because
> > > > calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
> > > > would already do the same thing.
> 
> Having a closer look, it does not seem to be the case. The default mode
> in wait_for_catchup() is 'replay' and the default mode for the lsn is 'write'.
> 
> But in wait_for_write_catchup() we are making use of 'write' for both.

But that turns
 $node->wait_for_catchup('foobar', 'write')
into
 $node->wait_for_write_catchup('foobar');
so I don't see much value in it.  Also, the patch series from which this
patch spawned in the first place doesn't wait for write AFAICS.

After adding some more POD docs for it, I pushed the one for replay.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
¡Ay, ay, ay!  Con lo mucho que yo lo quería (bis)
se fue de mi vera ... se fue para siempre, pa toíta ... pa toíta la vida
¡Ay Camarón! ¡Ay Camarón!(Paco de Lucía)




Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-26 Thread Drouvot, Bertrand

Hi,

On 1/26/23 10:42 AM, Alvaro Herrera wrote:

On 2023-Jan-26, Drouvot, Bertrand wrote:


On 1/24/23 7:27 PM, Alvaro Herrera wrote:



1. I don't think wait_for_write_catchup is necessary, because
calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
would already do the same thing. 


Having a closer look, it does not seem to be the case. The default mode
in wait_for_catchup() is 'replay' and the default mode for the lsn is 'write'.

But in wait_for_write_catchup() we are making use of 'write' for both.




2. Because wait_for_replay_catchup is an instance method, passing the
second node as argument is needlessly noisy, because that's already
known as $self.  So we can just say

$primary_node->wait_for_replay_catchup($standby_node);


Yeah, but same here, there is places where the node passed as the second argument is not 
the "$self":

src/bin/pg_rewind/t/007_standby_source.pl:$node_b->wait_for_replay_catchup('node_c',
 $node_a);
src/test/recovery/t/001_stream_rep.pl:$node_standby_1->wait_for_replay_catchup($node_standby_2,
 $node_primary);
src/test/recovery/t/001_stream_rep.pl:$node_standby_1->wait_for_replay_catchup($node_standby_2,
 $node_primary);
src/test/recovery/t/001_stream_rep.pl:  
$node_standby_1->wait_for_replay_catchup($node_standby_2, $node_primary);

So it looks like there is still a need for wait_for_replay_catchup() with 2 
parameters.


Ah, cascading replication.  In that case, let's make the second
parameter optional.  If it's not given, $self is used.



Good point, done in V3 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/bin/pg_rewind/t/007_standby_source.pl 
b/src/bin/pg_rewind/t/007_standby_source.pl
index 52644c2c0d..7236a3b177 100644
--- a/src/bin/pg_rewind/t/007_standby_source.pl
+++ b/src/bin/pg_rewind/t/007_standby_source.pl
@@ -74,9 +74,8 @@ $node_a->safe_psql('postgres',
"INSERT INTO tbl1 values ('in A, before promotion')");
 $node_a->safe_psql('postgres', 'CHECKPOINT');
 
-my $lsn = $node_a->lsn('write');
-$node_a->wait_for_catchup('node_b', 'write', $lsn);
-$node_b->wait_for_catchup('node_c', 'write', $lsn);
+$node_a->wait_for_write_catchup('node_b', $node_a);
+$node_b->wait_for_write_catchup('node_c', $node_a);
 
 # Promote C
 #
@@ -160,7 +159,7 @@ in A, after C was promoted
 $node_a->safe_psql('postgres',
"INSERT INTO tbl1 values ('in A, after rewind')");
 
-$node_b->wait_for_catchup('node_c', 'replay', $node_a->lsn('write'));
+$node_b->wait_for_replay_catchup('node_c', $node_a);
 
 check_query(
'SELECT * FROM tbl1',
diff --git a/src/test/modules/brin/t/02_wal_consistency.pl 
b/src/test/modules/brin/t/02_wal_consistency.pl
index 5983ef208e..8b2b244feb 100644
--- a/src/test/modules/brin/t/02_wal_consistency.pl
+++ b/src/test/modules/brin/t/02_wal_consistency.pl
@@ -70,6 +70,6 @@ my ($ret, $out, $err) = $whiskey->psql(
});
 cmp_ok($out, '>=', 1);
 
-$whiskey->wait_for_catchup($charlie, 'replay', $whiskey->lsn('insert'));
+$whiskey->wait_for_replay_catchup($charlie);
 
 done_testing();
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 04921ca3a3..3ba1545688 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2709,6 +2709,45 @@ sub wait_for_catchup
return;
 }
 
+# Now defining helper functions wait_for_replay_catchup() and
+# wait_for_write_catchup().
+# Please note that wait_for_flush_catchup() and wait_for_sent_catchup() are not
+# defined as: 1) they are not used yet and 2) it lets their author (if any)
+# decide the node->lsn() mode to be used.
+
+=pod
+
+=item $node->wait_for_replay_catchup(standby_name, node)
+
+Helper function for wait_for_catchup when waiting for the replay_lsn column
+to catchup.
+
+=cut
+
+sub wait_for_replay_catchup
+{
+   my ($self, $standby_name, $node) = @_;
+   $node = defined($node) ? $node : $self;
+
+   $self->wait_for_catchup($standby_name, 'replay', $node->lsn('flush'));
+}
+
+=pod
+
+=item $node->wait_for_write_catchup(standby_name, node)
+
+Helper function for wait_for_catchup when waiting for the write_lsn column
+to catchup.
+
+=cut
+
+sub wait_for_write_catchup
+{
+   my ($self, $standby_name, $node) = @_;
+
+   $self->wait_for_catchup($standby_name, 'write', $node->lsn('write'));
+}
+
 =pod
 
 =item $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
diff --git a/src/test/recovery/t/001_stream_rep.pl 
b/src/test/recovery/t/001_stream_rep.pl
index 23a90dd85b..76846905a7 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -47,9 +47,8 @@ $node_primary->safe_psql('postgres',
"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
 
 # Wait for standbys to catch up
-my $primary_lsn = $node_primary->lsn('write');
-$node_primary->wait_for_catchup($node_standby_1, 'replay', 

Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-26 Thread Alvaro Herrera
On 2023-Jan-26, Drouvot, Bertrand wrote:

> On 1/24/23 7:27 PM, Alvaro Herrera wrote:

> > 1. I don't think wait_for_write_catchup is necessary, because
> > calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
> > would already do the same thing.  So what we should do is patch places
> > that currently give those two arguments, so that they don't.
> 
> Agree but there is one place where the node passed as the second argument is 
> not the "$self":
> 
> src/bin/pg_rewind/t/007_standby_source.pl:$node_b->wait_for_write_catchup('node_c',
>  $node_a);
> 
> So it looks like there is still a need for wait_for_write_catchup().

Hmm, I think that one can use the more general wait_for_catchup.


> > 2. Because wait_for_replay_catchup is an instance method, passing the
> > second node as argument is needlessly noisy, because that's already
> > known as $self.  So we can just say
> > 
> >$primary_node->wait_for_replay_catchup($standby_node);
> 
> Yeah, but same here, there is places where the node passed as the second 
> argument is not the "$self":
> 
> src/bin/pg_rewind/t/007_standby_source.pl:$node_b->wait_for_replay_catchup('node_c',
>  $node_a);
> src/test/recovery/t/001_stream_rep.pl:$node_standby_1->wait_for_replay_catchup($node_standby_2,
>  $node_primary);
> src/test/recovery/t/001_stream_rep.pl:$node_standby_1->wait_for_replay_catchup($node_standby_2,
>  $node_primary);
> src/test/recovery/t/001_stream_rep.pl:  
> $node_standby_1->wait_for_replay_catchup($node_standby_2, $node_primary);
> 
> So it looks like there is still a need for wait_for_replay_catchup() with 2 
> parameters.

Ah, cascading replication.  In that case, let's make the second
parameter optional.  If it's not given, $self is used.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-26 Thread Drouvot, Bertrand

Hi,

On 1/24/23 7:27 PM, Alvaro Herrera wrote:

Looking again, I have two thoughts for making things easier:

1. I don't think wait_for_write_catchup is necessary, because
calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
would already do the same thing.  So what we should do is patch places
that currently give those two arguments, so that they don't.



Agree but there is one place where the node passed as the second argument is not the 
"$self":

src/bin/pg_rewind/t/007_standby_source.pl:$node_b->wait_for_write_catchup('node_c',
 $node_a);

So it looks like there is still a need for wait_for_write_catchup().


2. Because wait_for_replay_catchup is an instance method, passing the
second node as argument is needlessly noisy, because that's already
known as $self.  So we can just say

   $primary_node->wait_for_replay_catchup($standby_node);



Yeah, but same here, there is places where the node passed as the second argument is not 
the "$self":

src/bin/pg_rewind/t/007_standby_source.pl:$node_b->wait_for_replay_catchup('node_c',
 $node_a);
src/test/recovery/t/001_stream_rep.pl:$node_standby_1->wait_for_replay_catchup($node_standby_2,
 $node_primary);
src/test/recovery/t/001_stream_rep.pl:$node_standby_1->wait_for_replay_catchup($node_standby_2,
 $node_primary);
src/test/recovery/t/001_stream_rep.pl:  
$node_standby_1->wait_for_replay_catchup($node_standby_2, $node_primary);

So it looks like there is still a need for wait_for_replay_catchup() with 2 
parameters.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-24 Thread Alvaro Herrera
Looking again, I have two thoughts for making things easier:

1. I don't think wait_for_write_catchup is necessary, because
calling wait_for_catchup() and omitting the 'mode' and 'lsn' arguments
would already do the same thing.  So what we should do is patch places
that currently give those two arguments, so that they don't.

2. Because wait_for_replay_catchup is an instance method, passing the
second node as argument is needlessly noisy, because that's already
known as $self.  So we can just say

  $primary_node->wait_for_replay_catchup($standby_node);

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-19 Thread Drouvot, Bertrand

Hi,

On 1/18/23 10:59 AM, Alvaro Herrera wrote:

On 2023-Jan-18, Drouvot, Bertrand wrote:


The current calls are done that way:

wait_for_replay_catchup called:
- 8 times with write LSN as an argument
- 1 time with insert LSN as an argument
- 16 times with flush LSN as an argument



So it looks like that providing a node as a second argument
would not help for the wait_for_replay_catchup() case.


... unless we changed the calls that wait for reply that use write or
insert so that they use flush instead.  


That's a good idea, thanks! Please find attached V2 doing so.


Surely everything should still
work, right?  


Right.


Flushing would still occur, either right after the write
(as the transaction commits) or ~200ms afterwards when WAL writer
catches up to that point.

I suppose this may fail to be true if there is some test that is
specifically testing whether writing WAL without flushing works, which
should rare enough, but if it does exist, 


I don't see this kind of test.

Please note that V2 does not contain wait_for_flush_catchup() and
wait_for_sent_catchup() anymore as: 1) they are not used yet
and 2) it lets to their author (if any) decide the node->lsn() mode to be used.

This is also mentioned as a comment in V2.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/bin/pg_rewind/t/007_standby_source.pl 
b/src/bin/pg_rewind/t/007_standby_source.pl
index 52644c2c0d..7236a3b177 100644
--- a/src/bin/pg_rewind/t/007_standby_source.pl
+++ b/src/bin/pg_rewind/t/007_standby_source.pl
@@ -74,9 +74,8 @@ $node_a->safe_psql('postgres',
"INSERT INTO tbl1 values ('in A, before promotion')");
 $node_a->safe_psql('postgres', 'CHECKPOINT');
 
-my $lsn = $node_a->lsn('write');
-$node_a->wait_for_catchup('node_b', 'write', $lsn);
-$node_b->wait_for_catchup('node_c', 'write', $lsn);
+$node_a->wait_for_write_catchup('node_b', $node_a);
+$node_b->wait_for_write_catchup('node_c', $node_a);
 
 # Promote C
 #
@@ -160,7 +159,7 @@ in A, after C was promoted
 $node_a->safe_psql('postgres',
"INSERT INTO tbl1 values ('in A, after rewind')");
 
-$node_b->wait_for_catchup('node_c', 'replay', $node_a->lsn('write'));
+$node_b->wait_for_replay_catchup('node_c', $node_a);
 
 check_query(
'SELECT * FROM tbl1',
diff --git a/src/test/modules/brin/t/02_wal_consistency.pl 
b/src/test/modules/brin/t/02_wal_consistency.pl
index 5983ef208e..7211ba8d8d 100644
--- a/src/test/modules/brin/t/02_wal_consistency.pl
+++ b/src/test/modules/brin/t/02_wal_consistency.pl
@@ -70,6 +70,6 @@ my ($ret, $out, $err) = $whiskey->psql(
});
 cmp_ok($out, '>=', 1);
 
-$whiskey->wait_for_catchup($charlie, 'replay', $whiskey->lsn('insert'));
+$whiskey->wait_for_replay_catchup($charlie, $whiskey);
 
 done_testing();
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 04921ca3a3..cea9796c0c 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2709,6 +2709,44 @@ sub wait_for_catchup
return;
 }
 
+# Now defining helper functions wait_for_replay_catchup() and
+# wait_for_write_catchup().
+# Please note that wait_for_flush_catchup() and wait_for_sent_catchup() are not
+# defined as: 1) they are not used yet and 2) it lets their author (if any)
+# decide the node->lsn() mode to be used.
+
+=pod
+
+=item $node->wait_for_replay_catchup(standby_name, node)
+
+Helper function for wait_for_catchup when waiting for the replay_lsn column
+to catchup.
+
+=cut
+
+sub wait_for_replay_catchup
+{
+   my ($self, $standby_name, $node) = @_;
+
+   $self->wait_for_catchup($standby_name, 'replay', $node->lsn('flush'));
+}
+
+=pod
+
+=item $node->wait_for_write_catchup(standby_name, node)
+
+Helper function for wait_for_catchup when waiting for the write_lsn column
+to catchup.
+
+=cut
+
+sub wait_for_write_catchup
+{
+   my ($self, $standby_name, $node) = @_;
+
+   $self->wait_for_catchup($standby_name, 'write', $node->lsn('write'));
+}
+
 =pod
 
 =item $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
diff --git a/src/test/recovery/t/001_stream_rep.pl 
b/src/test/recovery/t/001_stream_rep.pl
index 23a90dd85b..5b15a20d54 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -47,9 +47,8 @@ $node_primary->safe_psql('postgres',
"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
 
 # Wait for standbys to catch up
-my $primary_lsn = $node_primary->lsn('write');
-$node_primary->wait_for_catchup($node_standby_1, 'replay', $primary_lsn);
-$node_standby_1->wait_for_catchup($node_standby_2, 'replay', $primary_lsn);
+$node_primary->wait_for_replay_catchup($node_standby_1, $node_primary);
+$node_standby_1->wait_for_replay_catchup($node_standby_2, $node_primary);
 
 my $result =
   $node_standby_1->safe_psql('postgres', "SELECT count(*) FROM tab_int");
@@ -66,9 +65,8 

Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-18 Thread Alvaro Herrera
On 2023-Jan-18, Drouvot, Bertrand wrote:

> The current calls are done that way:
> 
> wait_for_replay_catchup called:
> - 8 times with write LSN as an argument
> - 1 time with insert LSN as an argument
> - 16 times with flush LSN as an argument

> So it looks like that providing a node as a second argument
> would not help for the wait_for_replay_catchup() case.

... unless we changed the calls that wait for reply that use write or
insert so that they use flush instead.  Surely everything should still
work, right?  Flushing would still occur, either right after the write
(as the transaction commits) or ~200ms afterwards when WAL writer
catches up to that point.

I suppose this may fail to be true if there is some test that is
specifically testing whether writing WAL without flushing works, which
should rare enough, but if it does exist, in that one place we can use
the underlying wait_for_catchup().

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-17 Thread Drouvot, Bertrand

Hi,

On 1/17/23 12:23 PM, Alvaro Herrera wrote:

On 2023-Jan-17, Drouvot, Bertrand wrote:


The idea has been raised in [1], where we are adding more calls to
wait_for_catchup() in 'replay' mode.


This seems mostly useless as presented.  Maybe if you're able to reduce
the noise on the second argument it would be worth something -- namely,
if the wrapper function receives a node instead of an LSN: perhaps
wait_for_replay_catchup() would use the flush LSN from the given node,
wait_for_write_catchup() would use the write LSN, and
wait_for_sent_catchup() would use the insert LSN.  (I didn't check in
your patch if there are callsites that do something else).  This would
in several cases let you also remove the line with the assignment of
appropriate LSN to a separate variable.  If you did it that way, maybe
the code would become a tiny bit smaller overall.



Thanks for looking at it!

The current calls are done that way:

wait_for_replay_catchup called:
- 8 times with write LSN as an argument
- 1 time with insert LSN as an argument
- 16 times with flush LSN as an argument

wait_for_write_catchup called:
- 5 times with write LSN as an argument

So it looks like that providing a node as a second argument
would not help for the wait_for_replay_catchup() case.

Worth to use the node as an argument for wait_for_write_catchup()? (though it 
would be
weird to have different types of arguments between wait_for_replay_catchup() 
and wait_for_write_catchup()).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Helper functions for wait_for_catchup() in Cluster.pm

2023-01-17 Thread Alvaro Herrera
On 2023-Jan-17, Drouvot, Bertrand wrote:

> The idea has been raised in [1], where we are adding more calls to
> wait_for_catchup() in 'replay' mode.

This seems mostly useless as presented.  Maybe if you're able to reduce
the noise on the second argument it would be worth something -- namely,
if the wrapper function receives a node instead of an LSN: perhaps
wait_for_replay_catchup() would use the flush LSN from the given node,
wait_for_write_catchup() would use the write LSN, and
wait_for_sent_catchup() would use the insert LSN.  (I didn't check in
your patch if there are callsites that do something else).  This would
in several cases let you also remove the line with the assignment of
appropriate LSN to a separate variable.  If you did it that way, maybe
the code would become a tiny bit smaller overall.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/