RE: speed up a logical replica setup

2024-06-07 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

Thanks for making the follow-up patch! I was looking forward to your updates.
I think this patch set solves the reported buildfarm errors. However,
there are remaining concerns raised by others, and it would be good to share
your thoughts on them. At least:

1) There are some misleading messages [1]. I think the v3-0005 patch can solve
   the issue.
2) pg_createsubscriber may fail if the primary has subscriptions [2]. IIUC the
   possible approaches are A) "keep subscriptions disabled at the end",
   B) "by default drop the pre-existing subscriptions",
   C) "do nothing, just document the risk".


> Before sending this email I realized that I did nothing about physical
> replication slots on the standby. I think we should remove them
> unconditionally too.

I also considered this, but it might be difficult to predict what users expect.
Can we say for sure that such a slot is not intentional? Removing failover
slots is OK because they are meaningful only on the standby, but I am not sure
about other slots. I personally think we can keep the current behavior, but
what do others think?


Below are comments for each patch.

0001
Basically LGTM. I was a bit confused because the default timeout is not set,
but it seems to follow the suggestion by Tomas [3].

0002
If you want to improve the commit message, please add that sync_replication_slots
is disabled during the conversion.

0003
Confirmed that it follows the discussion [4].

0004
Basically LGTM.

Other minor comments are included in the attached diff file. It contains changes
to follow conventions and pgindent/pgperltidy.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1J2fAvsJ2HihbWJ_GxETd6sdqSMrZdCVJEutRZRpm1MEQ%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CANhcyEWvimA1-f6hSrA%3D9qkfR5SonFb56b36M%2B%2BvT%3DLiFj%3D76g%40mail.gmail.com
[3]: 
https://www.postgresql.org/message-id/5d5dd4cd-6359-4109-88e8-c8e13035ae16%40enterprisedb.com
[4]: 
https://www.postgresql.org/message-id/CAA4eK1LZxYxcbeiOn3Q5hjXVtZKhJWj-fQtndAeTCvZrPev8BA%40mail.gmail.com


Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 



minor_fix_by_kuroda.diff
Description: minor_fix_by_kuroda.diff


Re: speed up a logical replica setup

2024-06-06 Thread Euler Taveira
On Wed, May 22, 2024, at 12:16 PM, Euler Taveira wrote:
> I'll summarize all issues as soon as I finish the review of sync slot support.
> I think we should avoid new development if we judge that the item can be
> documented as a limitation for this version. Nevertheless, I will share patches
> so you can give your opinion on whether it is an open item or new development.

Here is a patch series to fix the issues reported in recent discussions. Patches
0001 and 0003 aim to fix the buildfarm issues. Patch 0002 removes synchronized
failover slots on the subscriber since they have no use. I also included an
optional patch 0004 that improves usability by checking both servers even if a
subscriber check has already failed.

As I said in my previous email, I decided to remove the logic that reacts to an
issue on the primary. We can reintroduce such code later if/when we have a
better way to check the recovery progress. The process now relies only on
recovery_timeout, and the test sets recovery_timeout to PG_TEST_TIMEOUT_DEFAULT
to let the buildfarm animals control how long to wait for the recovery.
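To illustrate the intended behavior, here is a minimal Python sketch of such a
timeout-only wait loop (the helper name and the is_in_recovery callable are
placeholders standing in for the pg_is_in_recovery() polling in
pg_createsubscriber.c, not the actual code):

```python
import time

def wait_for_end_recovery(is_in_recovery, recovery_timeout=0, poll_interval=1.0):
    """Poll until recovery finishes; give up after recovery_timeout seconds.

    recovery_timeout == 0 means wait forever, mirroring the tool's default.
    is_in_recovery is a callable standing in for "SELECT pg_is_in_recovery()".
    """
    waited = 0.0
    while is_in_recovery():
        if recovery_timeout > 0 and waited >= recovery_timeout:
            return False          # timed out: caller reports an error
        time.sleep(poll_interval)
        waited += poll_interval
    return True                   # recovery finished
```

With recovery_timeout = 0 the loop never gives up, which matches the documented
default of waiting forever; there is no separate connection-attempt counter.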

Since some animals reported issues in the check_publisher routine that checks
whether primary_slot_name is in use on the primary, this logic was removed too
(patch 0003). We could introduce a way to keep retrying this test, but the
conclusion is that it is not worth it; if primary_slot_name does not exist (due
to a setup error), pg_createsubscriber will log an error message and continue.
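For what it's worth, the "log and continue" behavior for a missing slot could
be sketched like this (conn.query and the helper name are hypothetical
stand-ins, not the real code):

```python
def drop_primary_slot(conn, slot_name, log):
    """Drop slot_name on the primary, but treat a missing slot as non-fatal:
    log a message and continue instead of aborting the whole conversion.

    conn.query is a hypothetical helper returning result rows.
    """
    rows = conn.query(
        "SELECT 1 FROM pg_catalog.pg_replication_slots WHERE slot_name = %s",
        (slot_name,))
    if not rows:
        # Setup error: the slot named by primary_slot_name does not exist.
        log(f'replication slot "{slot_name}" does not exist on the primary; skipping')
        return False
    conn.query("SELECT pg_catalog.pg_drop_replication_slot(%s)", (slot_name,))
    return True
```

The key design point is that the existence check replaces an unconditional drop
that would raise an ERROR for a nonexistent slot.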

Patch 0002 removes any failover slots that remain on the subscriber. Talking
about terminology, I noticed that slotsync.c uses both "logical failover slots"
and "failover logical slots"; I think the latter sounds better, but I'm not a
native speaker. I also don't know whether we should use a shorter term such as
"failover slots", "failover replication slots", or "failover logical
replication slots". IMO we can omit "logical" because "failover" implies a
logical replication slot. I'm not sure about omitting "replication"; that would
not be descriptive enough. I prefer "failover replication slots".

Before sending this email I realized that I did nothing about physical
replication slots on the standby. I think we should remove them unconditionally
too.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 5d8b4781e6e9bcb00564f45c25e575a8abab6ae8 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 21 May 2024 23:04:57 -0300
Subject: [PATCH 1/4] Only the recovery_timeout controls the end of recovery
 process

It used to check whether the target server is connected to the primary
server (sending the required WAL) to react rapidly when the process won't
succeed. This code is not enough to guarantee that the recovery process
will complete. There is a window between walreceiver shutdown and
pg_is_in_recovery() returning false during which the check can exhaust
NUM_CONN_ATTEMPTS attempts and fail.
Instead, rely only on the recovery_timeout option to give up after the
specified number of seconds.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml |  7 -
 src/bin/pg_basebackup/pg_createsubscriber.c   | 29 ++-
 .../t/040_pg_createsubscriber.pl  |  2 ++
 3 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 142b02..a700697f88 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -325,13 +325,6 @@ PostgreSQL documentation
 connections to the target server should fail.

 
-   <para>
-    During the recovery process, if the target server disconnects from the
-    source server, <application>pg_createsubscriber</application> will check a
-    few times if the connection has been reestablished to stream the required
-    WAL.  After a few attempts, it terminates with an error.
-   </para>
-

 Since DDL commands are not replicated by logical replication, avoid
 executing DDL commands that change the database schema while running
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 90cc580811..f62f34b1a7 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -1360,6 +1360,9 @@ stop_standby_server(const char *datadir)
  *
  * If recovery_timeout option is set, terminate abnormally without finishing
  * the recovery process. By default, it waits forever.
+ *
+ * XXX Is the recovery process still in progress? When recovery process has a
+ * better progress reporting mechanism, it should be added here.
  */
 static void
 wait_for_end_recovery(const char *conninfo, const struct CreateSubscriberOptions *opt)
@@ -1367,9 +1370,6 @@ wait_for_end_recovery(const char *conninfo, const struct CreateSubscriberOptions
 	PGconn	   *conn;
 	int			status = POSTMASTER_STILL_STARTING;
 	int			timer = 0;
-	int			count = 0;		/* number of consecutive connection attempts */
-
-#define NUM_CONN_ATTEMPTS	10
 
 	

Re: speed up a logical replica setup

2024-05-24 Thread Shlok Kyal
On Wed, 22 May 2024 at 16:50, Amit Kapila  wrote:
>
> On Mon, May 20, 2024 at 4:30 PM Shlok Kyal  wrote:
> > >
> > > I was trying to test this utility when 'sync_replication_slots' is on
> > > and it gets in an ERROR loop [1] and never finishes. Please find the
> > > postgresql.auto used on the standby attached. I think if the standby
> > > has enabled sync_slots, you need to pass dbname in
> > > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > > there are already synced slots on the standby (either due to
> > > 'sync_replication_slots' or users have used
> > > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > > those would be retained as it is on new subscriber and lead to
> > > unnecessary WAL retention and dead rows.
> > >
> > > [1]
> > > 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> > > 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> > > requires dbname to be specified in primary_conninfo
> >
> > Hi,
> >
> > I tested the scenario posted by Amit in [1], in which retaining synced
> > slots lead to unnecessary WAL retention and ERROR. This is raised as
> > the second open point in [2].
> > The steps to reproduce the issue:
> > (1) Setup physical replication with sync slot feature turned on by
> > setting sync_replication_slots = 'true' or using
> > pg_sync_replication_slots() on the standby node.
> > For physical replication setup, run pg_basebackup with -R  and -d option.
> > (2) Create a logical replication slot on primary node with failover
> > option as true. A corresponding slot is created on standby as part of
> > sync slot feature.
> > (3) Run pg_createsubscriber on standby node.
> > (4) On Checking for the replication slot on standby node, I noticed
> > that the logical slots created in step 2 are retained.
> >  I have attached the script to reproduce the issue.
> >
> > I and Kuroda-san worked to resolve open points. Here are patches to
> > solve the second and third point in [2].
> > Patches proposed by Euler are also attached just in case, but they
> > were not modified.
> >
> > v2-0001: not changed
> >
>
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on the standby.
>

I have made changes as per your suggestion. I have used
pg_last_wal_receive_lsn to get the LSN of the last WAL received on the standby
and to check whether consistent_lsn is already present there.
I have only made changes in v3-0001; v3-0002, v3-0003, v3-0004 and
v3-0005 are not modified.
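For reference, comparing pg_last_wal_receive_lsn() against consistent_lsn boils
down to comparing two LSNs in PostgreSQL's textual 'X/Y' form (high and low 32
bits in hex). A small Python sketch of that comparison (the helper names are
mine, not from the patch):

```python
def parse_lsn(lsn):
    """Convert a PostgreSQL LSN of the form 'X/Y' (hex high/low 32 bits)
    to a single integer so that LSNs can be compared numerically."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def standby_has_consistent_lsn(consistent_lsn, last_receive_lsn):
    """True when pg_last_wal_receive_lsn() on the standby has reached the
    consistent_lsn computed on the publisher, i.e. the required WAL has
    already been received."""
    return parse_lsn(last_receive_lsn) >= parse_lsn(consistent_lsn)
```

This is only the comparison step; the actual patch performs it via SQL on the
standby connection.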

Thanks and Regards,
Shlok Kyal


v3-0005-Avoid-outputing-some-messages-in-dry_run-mode.patch
Description: Binary data


v3-0003-Disable-slot-sync-during-the-convertion.patch
Description: Binary data


v3-0001-Improve-the-code-that-checks-if-the-recovery-is-f.patch
Description: Binary data


v3-0004-Drop-replication-slots-which-had-been-synchronize.patch
Description: Binary data


v3-0002-Improve-the-code-that-checks-if-the-primary-slot-.patch
Description: Binary data


Re: speed up a logical replica setup

2024-05-23 Thread Amit Kapila
On Thu, May 23, 2024 at 8:43 PM Euler Taveira  wrote:
>
> On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
>
>
> Why in the first place do we need to ensure that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention but I don't know how that is related to this tool's
> functionality. If at all, we are bothered about WAL retention on the
> primary that should be the WAL corresponding to consistent_lsn
> computed by setup_publisher() but this check doesn't seem to ensure
> that.
>
> Maybe it is a lot of checks. I'm afraid there isn't a simple way to get and
> make sure the replication slot is used by the physical replication. I mean if
> there is primary_slot_name = 'foo' on standby, there is no guarantee that the
> replication slot 'foo' exists on primary. The idea is to get the exact
> replication slot name used by physical replication to drop it. Once I posted a
> patch it should be clear. (Another idea is to relax this check and rely only on
> primary_slot_name to drop this replication slot on primary. The replication slot
> might not exist and it shouldn't return an error in this case.)
>

I think your other idea is better than what we are doing currently.
Let's ignore the ERROR even if the primary_slot_name doesn't exist on
the primary.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-05-23 Thread Euler Taveira
On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
> On Wed, May 22, 2024 at 8:46 PM Euler Taveira  wrote:
> >
> > Following the same line that simplifies the code, we can: (a) add a loop in
> > check_subscriber() that waits until walreceiver is available on subscriber or
> > (b) use a timeout. The main advantage of (a) is that the primary slot is
> > already available but I'm afraid we need an escape mechanism for the loop
> > (timeout?).
> >
> 
> Sorry, it is not clear to me why we need any additional loop in
> check_subscriber(), aren't we speaking about the problem in
> check_publisher() function?

The idea is to use check_subscriber() to check pg_stat_wal_receiver. Once this
view returns a row and primary_slot_name is set on the standby, the referenced
replication slot should be active on the primary. Hence, the query in
check_publisher() makes sure that this replication slot is in use on the
primary.

> Why in the first place do we need to ensure that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention but I don't know how that is related to this tool's
> functionality. If at all, we are bothered about WAL retention on the
> primary that should be the WAL corresponding to consistent_lsn
> computed by setup_publisher() but this check doesn't seem to ensure
> that.

Maybe it is a lot of checks. I'm afraid there isn't a simple way to make sure
the replication slot is used by the physical replication. I mean, if there is
primary_slot_name = 'foo' on the standby, there is no guarantee that the
replication slot 'foo' exists on the primary. The idea is to get the exact
replication slot name used by physical replication in order to drop it. Once I
post a patch it should be clear. (Another idea is to relax this check and rely
only on primary_slot_name to drop this replication slot on the primary. The
replication slot might not exist and it shouldn't return an error in this case.)


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-05-23 Thread Amit Kapila
On Wed, May 22, 2024 at 8:46 PM Euler Taveira  wrote:
>
> On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
>
> > v2-0002: not changed
> >
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.
>
>
> Following the same line that simplifies the code, we can: (a) add a loop in
> check_subscriber() that waits until walreceiver is available on subscriber or
> (b) use a timeout. The main advantage of (a) is that the primary slot is
> already available but I'm afraid we need an escape mechanism for the loop
> (timeout?).
>

Sorry, it is not clear to me why we need any additional loop in
check_subscriber(), aren't we speaking about the problem in
check_publisher() function?

Why in the first place do we need to ensure that primary_slot_name is
active on the primary? You mentioned something related to WAL
retention but I don't know how that is related to this tool's
functionality. If at all, we are bothered about WAL retention on the
primary that should be the WAL corresponding to consistent_lsn
computed by setup_publisher() but this check doesn't seem to ensure
that.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-05-22 Thread Euler Taveira
On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> >
> > v2-0001: not changed
> >
> 
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on the standby.

That's exactly what I want to propose; Tomas convinced me off-list that less is
better when we don't have a useful recovery progress reporting mechanism to
make sure recovery is still progressing and that we should keep waiting.

> > v2-0002: not changed
> >
> 
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.

Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until walreceiver is available on subscriber or
(b) use a timeout. The main advantage of (a) is that the primary slot is already
available but I'm afraid we need an escape mechanism for the loop (timeout?).

I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-05-22 Thread Amit Kapila
On Mon, May 20, 2024 at 4:30 PM Shlok Kyal  wrote:
> >
> > I was trying to test this utility when 'sync_replication_slots' is on
> > and it gets in an ERROR loop [1] and never finishes. Please find the
> > postgresql.auto used on the standby attached. I think if the standby
> > has enabled sync_slots, you need to pass dbname in
> > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > there are already synced slots on the standby (either due to
> > 'sync_replication_slots' or users have used
> > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > those would be retained as it is on new subscriber and lead to
> > unnecessary WAL retention and dead rows.
> >
> > [1]
> > 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> > 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> > requires dbname to be specified in primary_conninfo
>
> Hi,
>
> I tested the scenario posted by Amit in [1], in which retaining synced
> slots lead to unnecessary WAL retention and ERROR. This is raised as
> the second open point in [2].
> The steps to reproduce the issue:
> (1) Setup physical replication with sync slot feature turned on by
> setting sync_replication_slots = 'true' or using
> pg_sync_replication_slots() on the standby node.
> For physical replication setup, run pg_basebackup with -R  and -d option.
> (2) Create a logical replication slot on primary node with failover
> option as true. A corresponding slot is created on standby as part of
> sync slot feature.
> (3) Run pg_createsubscriber on standby node.
> (4) On Checking for the replication slot on standby node, I noticed
> that the logical slots created in step 2 are retained.
>  I have attached the script to reproduce the issue.
>
> I and Kuroda-san worked to resolve open points. Here are patches to
> solve the second and third point in [2].
> Patches proposed by Euler are also attached just in case, but they
> were not modified.
>
> v2-0001: not changed
>

Shouldn't we modify it as per the suggestion given in the email [1]? I
am wondering if we can entirely get rid of checking the primary
business and simply rely on recovery_timeout and keep checking
server_is_in_recovery(). If so, we can modify the test to use
non-default recovery_timeout (say 180s or something similar if we have
used it at any other place). As an additional check we can ensure that
consistent_lsn is present on the standby.

> v2-0002: not changed
>

We have added more tries to see if the primary_slot_name becomes
active but I think it is still fragile because it is possible on slow
machines that the required slot didn't become active even after more
retries. I have raised the same comment previously [2] and asked an
additional question but didn't get any response.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JJq_ER6Kq_H%3DjKHR75QPRd8y9_D%3DRtYw%3DaPYKMfqLi9A%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1LT3Z13Dg6p4Z%2B4adO_EY-ow5CmWfikEmBfL%3DeVrm8CPw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-05-20 Thread Shlok Kyal
Hi,
>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>
> [1]
> 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> requires dbname to be specified in primary_conninfo

Hi,

I tested the scenario posted by Amit in [1], in which retaining synced
slots lead to unnecessary WAL retention and ERROR. This is raised as
the second open point in [2].
The steps to reproduce the issue:
(1) Setup physical replication with sync slot feature turned on by
setting sync_replication_slots = 'true' or using
pg_sync_replication_slots() on the standby node.
For physical replication setup, run pg_basebackup with -R  and -d option.
(2) Create a logical replication slot on primary node with failover
option as true. A corresponding slot is created on standby as part of
sync slot feature.
(3) Run pg_createsubscriber on standby node.
(4) On checking the replication slots on the standby node, I noticed
that the logical slots created in step 2 are retained.
 I have attached the script to reproduce the issue.

I and Kuroda-san worked to resolve open points. Here are patches to
solve the second and third point in [2].
Patches proposed by Euler are also attached just in case, but they
were not modified.

v2-0001: not changed
v2-0002: not changed
v2-0003: ensures the slot sync is disabled during the conversion. This
resolves the second point.
v2-0004: drops sync slots which may be retained after running. This
resolves the second point.
v2-0005: removes misleading output messages in dry-run. This resolves
the third point.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1KdCb%2B5sjYu6qCMXXdCX1y_ihr8kFzMozq0%3DP%3DauYxgog%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CAA4eK1J22UEfrqx222h5j9DQ7nxGrTbAa_BC%2B%3DmQXdXs-RCsew%40mail.gmail.com

Thanks and Regards,
Shlok Kyal
# cleanup
rm -rf ../primary ../standby primary.log standby.log

# setup primary node
./initdb -D ../primary
 
cat << EOF >> ../primary/postgresql.conf
wal_level = logical
EOF

./pg_ctl -D ../primary -l primary.log start
./psql -d postgres -c "CREATE table t1 (c1 int);"
./psql -d postgres -c "Insert into t1 values(1);"
./psql -d postgres -c "Insert into t1 values(2);"
./psql -d postgres -c "INSERT into t1 values(3);"
./psql -d postgres -c "INSERT into t1 values(4);"

# setup standby node with sync slot feature
./pg_basebackup -h localhost -X stream -v -W -R -S 'sb1' -C -D ../standby/ -d 'dbname=postgres' 

cat << EOF >> ../standby/postgresql.conf
sync_replication_slots = 'true'
hot_standby_feedback = 'on'
port = 9000
EOF

./pg_ctl -D ../standby -l standby.log start

# create a replication slot on primary with failover option
./psql -d postgres -c "SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot_2', 'test_decoding', false, false, true);"
sleep 2

# check if slots are synced on standby
./psql -d postgres -p 9000 -c "select * from pg_replication_slots;" 
./pg_ctl -D ../standby -l standby.log stop

# run pg_createsubscriber on standby
./pg_createsubscriber -D ../standby/ -p 9000 -P "host=localhost port=5432 dbname=postgres" -d postgres -v

# check if replication slots are retained after running pg_createsubscriber
./pg_ctl -D ../standby -l standby.log start
./psql -d postgres -p 9000 -c "select * from pg_replication_slots;" 

./pg_ctl -D ../standby -l standby.log stop
./pg_ctl -D ../primary -l primary.log stop


v2-0001-Improve-the-code-that-checks-if-the-recovery-is-f.patch
Description: Binary data


v2-0002-Improve-the-code-that-checks-if-the-primary-slot-.patch
Description: Binary data


v2-0003-Disable-slot-sync-during-the-convertion.patch
Description: Binary data


v2-0004-Drop-replication-slots-which-had-been-synchronize.patch
Description: Binary data


v2-0005-Avoid-outputing-some-messages-in-dry_run-mode.patch
Description: Binary data


Re: speed up a logical replica setup

2024-05-20 Thread Amit Kapila
On Sun, May 19, 2024 at 4:25 AM Thomas Munro  wrote:
>
> 040_pg_createsubscriber.pl seems to be failing occasionally on
> culicidae near "--dry-run on node S".  I couldn't immediately see why.
> That animal is using EXEC_BACKEND and I also saw a one-off failure a
> bit like that on my own local Linux + EXEC_BACKEND test run
> (sorry I didn't keep the details around).  Coincidence?
>

AFAICS, this is the same as one of the two BF failures being discussed
in this thread.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-05-18 Thread Thomas Munro
040_pg_createsubscriber.pl seems to be failing occasionally on
culicidae near "--dry-run on node S".  I couldn't immediately see why.
That animal is using EXEC_BACKEND and I also saw a one-off failure a
bit like that on my own local Linux + EXEC_BACKEND test run
(sorry I didn't keep the details around).  Coincidence?




Re: speed up a logical replica setup

2024-04-30 Thread Amit Kapila
On Tue, Apr 30, 2024 at 12:04 PM Amit Kapila  wrote:
>
> On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira  wrote:
> >
>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig().

The other possibility is that when we start the standby from
pg_createsubscriber, we specifically set 'sync_replication_slots' to
false.

>
> I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>

This still needs some handling.

BTW, I don't see the use of the following messages in --dry-run mode;
rather, they could be misleading:
pg_createsubscriber: hint: If pg_createsubscriber fails after this
point, you must recreate the physical replica before continuing.
...
...
pg_createsubscriber: setting the replication progress (node name
"pg_0" ; LSN 0/0) on database "postgres"

Similarly, we should consider whether the messages below are useful in --dry-run mode:
pg_createsubscriber: dropping publication
"pg_createsubscriber_5_887f7991" on database "postgres"
pg_createsubscriber: creating subscription
"pg_createsubscriber_5_887f7991" on database "postgres"
...
pg_createsubscriber: enabling subscription
"pg_createsubscriber_5_887f7991" on database "postgres"

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-04-30 Thread Amit Kapila
On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila  wrote:
>
> On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira  wrote:
>

I was trying to test this utility when 'sync_replication_slots' is on
and it gets in an ERROR loop [1] and never finishes. Please find the
postgresql.auto used on the standby attached. I think if the standby
has enabled sync_slots, you need to pass dbname in
GenerateRecoveryConfig(). I couldn't test it further but I wonder if
there are already synced slots on the standby (either due to
'sync_replication_slots' or users have used
pg_sync_replication_slots() before invoking pg_createsubscriber),
those would be retained as it is on new subscriber and lead to
unnecessary WAL retention and dead rows.

[1]
2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
requires dbname to be specified in primary_conninfo

-- 
With Regards,
Amit Kapila.


postgresql.auto.standby.conf
Description: Binary data


Re: speed up a logical replica setup

2024-04-29 Thread Amit Kapila
On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira  wrote:
>
> On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
>
> On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira  wrote:
> >
> > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> >
> > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> > Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> > waited long enough?
> >
> >
> > It was an attempt to decouple a connection failure (that keeps streaming the
> > WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the
> > primary is gone during the standby recovery process, there is a way to bail out.
> >
>
> I think we don't need to check primary if the WAL corresponding to
> consistent_lsn is already present on the standby. Shouldn't we first
> check that? Once we ensure that the required WAL is copied, just
> checking server_is_in_recovery() should be sufficient. I feel that
> will be a direct way of ensuring what is required rather than
> indirectly verifying the same (by checking pg_stat_wal_receiver) as we
> are doing currently.
>
>
> How would you check it? WAL file? During recovery, you are not allowed to use
> pg_current_wal_lsn.
>

How about pg_last_wal_receive_lsn()?

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-04-29 Thread Euler Taveira
On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
> On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira  wrote:
> >
> > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> >
> > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> > Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> > waited long enough?
> >
> >
> > It was an attempt to decouple a connection failure (that keeps streaming the
> > WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the
> > primary is gone during the standby recovery process, there is a way to bail out.
> >
> 
> I think we don't need to check primary if the WAL corresponding to
> consistent_lsn is already present on the standby. Shouldn't we first
> check that? Once we ensure that the required WAL is copied, just
> checking server_is_in_recovery() should be sufficient. I feel that
> will be a direct way of ensuring what is required rather than
> indirectly verifying the same (by checking pg_stat_wal_receiver) as we
> are doing currently.

How would you check it? WAL file? During recovery, you are not allowed to use
pg_current_wal_lsn.

Tomas suggested to me off-list that we should adopt a simple solution in
wait_for_end_recovery: wait for recovery_timeout without additional checks
(which means remove the pg_stat_wal_receiver logic).  When we have additional
information that we can reliably use in this function, we can add it. This also
makes it easy to adjust PG_TEST_TIMEOUT_DEFAULT to get stable tests.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-04-29 Thread Amit Kapila
On Tue, Mar 26, 2024 at 8:24 AM Euler Taveira  wrote:
>
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>
> The first patch implements a combination of (1) and (2).
>
> ## Analysis for failure 2
>
> According to [2], the physical replication slot which is specified as
> primary_slot_name
> was not used by any walsender process. At that time, the walsender did not
> exist yet.
>
> ```
> ...
> pg_createsubscriber: publisher: current wal senders: 0
> pg_createsubscriber: command is: SELECT 1 FROM 
> pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot'
> pg_createsubscriber: error: could not obtain replication slot information: 
> got 0 rows, expected 1 row
> ...
> ```
>
> Currently the standby must be stopped before the command and the current code
> does not block the flow to ensure that replication has started. So there is a
> possibility that the check runs before the walsender is launched.
>
> One possible approach is to wait until replication starts. An alternative is
> to ease the condition.
>
>
> That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
> NUM_ATTEMPTS in the first patch). See second patch.
>

How can we guarantee that the slot becomes active after this many
attempts? It could still be possible that on some slow machines it
doesn't get activated even after NUM_ATTEMPTS attempts. BTW, in the first
place, why do we need to ensure that the 'primary_slot_name' is active on
the primary?

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-04-29 Thread Amit Kapila
On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira  wrote:
>
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?
>
>
> It was an attempt to decouple a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the 
> primary
> is gone during the standby recovery process, there is a way to bail out.
>

I think we don't need to check primary if the WAL corresponding to
consistent_lsn is already present on the standby. Shouldn't we first
check that? Once we ensure that the required WAL is copied, just
checking server_is_in_recovery() should be sufficient. I feel that
will be a direct way of ensuring what is required rather than
indirectly verifying the same (by checking pg_stat_wal_receiver) as we
are doing currently.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-26 Thread Tomas Vondra
On 3/26/24 21:17, Euler Taveira wrote:
> On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
>> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
>> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
>> waited long enough?
> 
> It was an attempt to decouple a connection failure (that keeps streaming the
> WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the 
> primary
> is gone during the standby recovery process, there is a way to bail out. The
> recovery-timeout is 0 (infinite) by default so you have an infinite wait 
> without
> this check. The idea behind this implementation is to avoid exiting in this
> critical code path. If it times out here you might have to rebuild the standby
> and start again.

- This seems like something that should definitely be documented in the
comment before wait_for_end_recovery(). At the moment it only talks
about timeout, and nothing about NUM_CONN_ATTEMPTS.

- The NUM_CONN_ATTEMPTS name seems rather misleading, considering it
does not really count connection attempts, but the number of times we have
not seen a row in pg_catalog.pg_stat_wal_receiver.

- Not sure I follow the logic - it tries to avoid exiting by setting
infinite timeout, but it still exits based on NUM_CONN_ATTEMPTS. Isn't
that somewhat contradictory?

- Isn't the NUM_CONN_ATTEMPTS actually making it more fragile, i.e. more
likely to exit? For example, what if there's a short networking hiccup,
so that the standby can't connect to the primary?

- It seems a bit strange that even with the recovery timeout set, having
the limit of 10 "connection attempts" effectively establishes a separate
hard-coded limit of 10 seconds. Seems a bit surprising if I set recovery
limit to 1 minute, and it just dies after 10 seconds.

> Amit suggested [1] that we use a nonzero default for recovery-timeout, but
> how high is a good value? I've already seen some long recovery processes using
> the pglogical equivalent that timed out after hundreds of minutes. Maybe I'm too
> worried about a small percentage of cases and we should use 1h as default, for
> example. It would reduce the complexity since the recovery process lacks some
> progress indicators (LSN is not sufficient in this case and there isn't a
> function to provide the current state -- stop applying WAL, reach target, new
> timeline, etc).
> 
> If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery
> by default; otherwise we will get reports saying the tool is hanging when
> in reality the primary is gone and WAL can no longer be streamed.
> 

I don't think there's a default timeout value that would work for
everyone. Either it's going to be too short for some cases, or it'll
take too long for some other cases.

I think there are two obvious default values for the timeout - infinity,
and 60 seconds, which is the default we use for other CLI tools (like
pg_ctl and so on). Considering the negative impact of exiting, I'd say
it's better to default to infinity. It's always possible to Ctrl-C or
terminate the process in some other way, if needed.

As for people complaining about infinite recovery - perhaps it'd be
sufficient to mention this in the messages printed by the tool, to make
it clearer. Or maybe even print something in the loop, because right now
it's entirely silent so it's easy to believe it's stuck. Perhaps not on
every loop, but at least in verbose mode it should print something.

>> IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
>> pg_createsubscriber, and that should do the trick.
> 
> That's a good idea. Tests are not exercising the recovery-timeout option.
> 
>> Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
>> things like ubsan/valgrind already use to deal with exactly this kind of
>> timeout problem.
>>
>> Or is there a deeper problem with deciding if the system is in recovery?
> 
> As I said, with some recovery progress indicators it would be easier to make
> decisions such as waiting a few more seconds because the WAL has already been
> applied and the server is creating a new timeline. The recovery timeout
> decision is a shot in the dark because we might be aborting pg_createsubscriber
> when the target server is about to set RECOVERY_STATE_DONE.
> 

Isn't it enough to check data in pg_stat_replication on the primary?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: speed up a logical replica setup

2024-03-26 Thread Euler Taveira
On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?

It was an attempt to decouple a connection failure (that keeps streaming the
WAL) from the recovery timeout. The NUM_CONN_ATTEMPTS check guarantees that if
the primary is gone during the standby recovery process, there is a way to bail
out. The recovery-timeout is 0 (infinite) by default, so you have an infinite
wait without this check. The idea behind this implementation is to avoid
exiting in this critical code path; if it times out here you might have to
rebuild the standby and start again. Amit suggested [1] that we use a nonzero
default for recovery-timeout, but how high is a good value? I've already seen
some long recovery processes using the pglogical equivalent that timed out
after hundreds of minutes. Maybe I'm too worried about a small percentage of
cases and we should use 1h as the default, for example. It would reduce the
complexity, since the recovery process lacks progress indicators (the LSN is
not sufficient in this case and there isn't a function to provide the current
state -- stop applying WAL, reach target, new timeline, etc).

If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery
by default; otherwise we will get reports saying the tool is hanging when in
reality the primary is gone and WAL can no longer be streamed.

> IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
> pg_createsubscriber, and that should do the trick.

That's a good idea. Tests are not exercising the recovery-timeout option.

> Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
> things like ubsan/valgrind already use to deal with exactly this kind of
> timeout problem.
> 
> Or is there a deeper problem with deciding if the system is in recovery?

As I said, with some recovery progress indicators it would be easier to make
decisions such as waiting a few more seconds because the WAL has already been
applied and the server is creating a new timeline. The recovery timeout decision
is a shot in the dark because we might be aborting pg_createsubscriber when the
target server is about to set RECOVERY_STATE_DONE.


[1] 
https://www.postgresql.org/message-id/CAA4eK1JRgnhv_ySzuFjN7UaX9qxz5Hqcwew7Fk%3D%2BAbJbu0Kd9w%40mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-26 Thread Tomas Vondra
On 3/26/24 03:53, Euler Taveira wrote:
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>> ## Analysis for failure 1
>>
>> The failure is caused by a time lag between when the walreceiver finishes
>> and when pg_is_in_recovery() starts returning false.
>>
>> According to the output [1], it seems that the tool failed at
>> wait_for_end_recovery() with the message "standby server disconnected from
>> the primary". Also, lines "redo done at..." and "terminating walreceiver
>> process due to administrator command" meant that walreceiver was requested
>> to shut down by XLogShutdownWalRcv().
>>
>> According to the source, we confirm that walreceiver is shut down in
>> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also,
>> SharedRecoveryState is changed to RECOVERY_STATE_DONE (which means
>> pg_is_in_recovery() returns false) in the latter part of StartupXLOG().
>>
>> So, if there is a delay between FinishWalRecovery() and changing the state,
>> the check in wait_for_end_recovery() fails during that window. Since we
>> allow the walreceiver to be missing 10 times and it is checked once per
>> second, the failure occurs if the time lag is longer than 10 seconds.
>>
>> I do not have a good way to fix it. One approach is to make
>> NUM_CONN_ATTEMPTS larger, but it's not a fundamental solution.
> 
> I was expecting that slow hosts might have issues in wait_for_end_recovery().
> As you said it took a lot of steps between FinishWalRecovery() (where
> walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set 
> to
> RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS *
> WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision
> because it already finished the promotion and it is just doing the final
> preparation for the host to become a primary.
> 
> /*   
>  * If it is still in recovery, make sure the target server is
>  * connected to the primary so it can receive the required WAL to
>  * finish the recovery process. If it is disconnected try
>  * NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
>  */
> res = PQexec(conn,
>  "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");
> if (PQntuples(res) == 0)
> {
> if (++count > NUM_CONN_ATTEMPTS)
> {
> stop_standby_server(subscriber_dir);
> pg_log_error("standby server disconnected from the primary");
> break;
> }
> }
> else
> count = 0;  /* reset counter if it connects again */
> 
> This code was added to defend against the death/crash of the target server.
> There
> are at least 3 options:
> 
> (1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this 
> constant
> and I decided to use 10 seconds because even in some slow hosts, this time
> wasn't reached during my tests. It seems I forgot to test the combination of 
> slow
> host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60
> seconds as default wait. Maybe that's a reasonable value. I checked the
> 004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s
> (culicidae), 18.31s (calliphoridae) and 27.52s (olingo).
> 
> (2) check if the primary is not running when walreceiver is not available on
> the target server. Increase the connection attempts only if the primary is not
> running. Hence, the described case doesn't cause an increment of the count
> variable.
> 
> (3) set recovery_timeout default to != 0 and remove pg_stat_wal_receiver check
> protection against the death/crash target server. I explained in a previous
> message that timeout may occur in cases that WAL replay to reach consistent
> state takes more than recovery-timeout seconds.
> 
> Option (1) is the easiest fix, however, we can have the same issue again if a
> slow host decides to be even slower, hence, we have to adjust this value 
> again.
> Option (2) interprets the walreceiver absence as a recovery end and if the
> primary server is running it can indicate that the target server is in the
> imminence of the recovery end. Option (3) is not as resilient as the other
> options.
> 
> The first patch implements a combination of (1) and (2).
> 
>> ## Analysis for failure 2
>>
>> According to [2], the physical replication slot which is specified as
>> primary_slot_name
>> was not used by any walsender process. At that time, the walsender did not
>> exist yet.
>>
>> ```
>> ...
>> pg_createsubscriber: publisher: current wal senders: 0
>> pg_createsubscriber: command is: SELECT 1 FROM 
>> pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot'
>> pg_createsubscriber: error: could not obtain replication slot information: 
>> got 0 rows, expected 1 row
>> ...
>> ```
>>
>> Currently the standby must be stopped before the command and the current
>> code does not block the flow to ensure that replication has started.

RE: speed up a logical replica setup

2024-03-25 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Euler,

> 
> This only drops the publications created by this tool, not the
> pre-existing ones that we discussed in the link provided.

Another concern here is the case where the primary subscribes to changes from
other nodes. After the conversion, the new subscriber also tries to connect to
the other publisher, which may lead to conflicts. This happens because both the
launcher and the workers start after recovery finishes. So, based on Ashutosh's
point, should we remove such replication objects?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: speed up a logical replica setup

2024-03-25 Thread Amit Kapila
On Tue, Mar 26, 2024 at 8:27 AM Euler Taveira  wrote:
>
> On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote:
>
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut  wrote:
> >
> > I have committed your version v33.  I did another pass over the
> > identifier and literal quoting.  I added quoting for replication slot
> > names, for example, even though they can only contain a restricted set
> > of characters, but it felt better to be defensive there.
> >
> > I'm happy to entertain follow-up patches on some of the details like
> > option naming that were still being discussed.  I just wanted to get the
> > main functionality in in good time.  We can fine-tune the rest over the
> > next few weeks.
> >
>
> I was looking at prior discussions on this topic to see if there are
> any other open design points apart from this and noticed that the
> points raised/discussed in the email [1] are also not addressed. IIRC,
> the key point we discussed was that after promotion, the existing
> replication objects should be removed (either optionally or always),
> otherwise, it can lead to a new subscriber not being able to restart
> or getting some unwarranted data.
>
>
> See setup_subscriber.
>
> /*
>  * Since the publication was created before the consistent LSN, it is
>  * available on the subscriber when the physical replica is promoted.
>  * Remove publications from the subscriber because it has no use.
>  */
> drop_publication(conn, [I]);
>

This only drops the publications created by this tool, not the
pre-existing ones that we discussed in the link provided.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-25 Thread Euler Taveira
On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote:
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut  wrote:
> >
> > I have committed your version v33.  I did another pass over the
> > identifier and literal quoting.  I added quoting for replication slot
> > names, for example, even though they can only contain a restricted set
> > of characters, but it felt better to be defensive there.
> >
> > I'm happy to entertain follow-up patches on some of the details like
> > option naming that were still being discussed.  I just wanted to get the
> > main functionality in in good time.  We can fine-tune the rest over the
> > next few weeks.
> >
> 
> I was looking at prior discussions on this topic to see if there are
> any other open design points apart from this and noticed that the
> points raised/discussed in the email [1] are also not addressed. IIRC,
> the key point we discussed was that after promotion, the existing
> replication objects should be removed (either optionally or always),
> otherwise, it can lead to a new subscriber not being able to restart
> or getting some unwarranted data.

See setup_subscriber.

/*
 * Since the publication was created before the consistent LSN, it is
 * available on the subscriber when the physical replica is promoted.
 * Remove publications from the subscriber because it has no use.
 */
drop_publication(conn, [i]);


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-25 Thread Euler Taveira
On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
> ## Analysis for failure 1
> 
> The failure is caused by a time lag between when the walreceiver finishes and
> when pg_is_in_recovery() starts returning false.
>
> According to the output [1], it seems that the tool failed at
> wait_for_end_recovery() with the message "standby server disconnected from
> the primary". Also, lines "redo done at..." and "terminating walreceiver
> process due to administrator command" meant that walreceiver was requested to
> shut down by XLogShutdownWalRcv().
>
> According to the source, we confirm that walreceiver is shut down in
> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also,
> SharedRecoveryState is changed to RECOVERY_STATE_DONE (which means
> pg_is_in_recovery() returns false) in the latter part of StartupXLOG().
>
> So, if there is a delay between FinishWalRecovery() and changing the state,
> the check in wait_for_end_recovery() fails during that window. Since we allow
> the walreceiver to be missing 10 times and it is checked once per second, the
> failure occurs if the time lag is longer than 10 seconds.
>
> I do not have a good way to fix it. One approach is to make NUM_CONN_ATTEMPTS
> larger, but it's not a fundamental solution.

I was expecting that slow hosts might have issues in wait_for_end_recovery().
As you said it took a lot of steps between FinishWalRecovery() (where
walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set to
RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS *
WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision
because it already finished the promotion and it is just doing the final
preparation for the host to become a primary.

/*   
 * If it is still in recovery, make sure the target server is
 * connected to the primary so it can receive the required WAL to
 * finish the recovery process. If it is disconnected try
 * NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
 */
res = PQexec(conn,
 "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");
if (PQntuples(res) == 0)
{
if (++count > NUM_CONN_ATTEMPTS)
{
stop_standby_server(subscriber_dir);
pg_log_error("standby server disconnected from the primary");
break;
}
}
else
count = 0;  /* reset counter if it connects again */

This code was added to defend against the death/crash of the target server. There
are at least 3 options:

(1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this 
constant
and I decided to use 10 seconds because even in some slow hosts, this time
wasn't reached during my tests. It seems I forgot to test the combination of 
slow
host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60
seconds as default wait. Maybe that's a reasonable value. I checked the
004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s
(culicidae), 18.31s (calliphoridae) and 27.52s (olingo).

(2) check if the primary is not running when walreceiver is not available on the
target server. Increase the connection attempts only if the primary is not
running. Hence, the described case doesn't cause an increment of the count
variable.

(3) set recovery_timeout default to != 0 and remove pg_stat_wal_receiver check
protection against the death/crash target server. I explained in a previous
message that timeout may occur in cases that WAL replay to reach consistent
state takes more than recovery-timeout seconds.

Option (1) is the easiest fix, however, we can have the same issue again if a
slow host decides to be even slower, hence, we have to adjust this value again.
Option (2) interprets the walreceiver absence as a recovery end and if the
primary server is running it can indicate that the target server is in the
imminence of the recovery end. Option (3) is not as resilient as the other
options.

The first patch implements a combination of (1) and (2).

> ## Analysis for failure 2
> 
> According to [2], the physical replication slot which is specified as
> primary_slot_name
> was not used by any walsender process. At that time, the walsender did not
> exist yet.
> 
> ```
> ...
> pg_createsubscriber: publisher: current wal senders: 0
> pg_createsubscriber: command is: SELECT 1 FROM 
> pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot'
> pg_createsubscriber: error: could not obtain replication slot information: 
> got 0 rows, expected 1 row
> ...
> ```
> 
> Currently the standby must be stopped before the command and the current code
> does not block the flow to ensure that replication has started. So there is a
> possibility that the check runs before the walsender is launched.
> 
> One possible approach is to wait until replication starts. An alternative is
> to ease the condition.


Re: speed up a logical replica setup

2024-03-25 Thread Amit Kapila
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut  wrote:
>
> I have committed your version v33.  I did another pass over the
> identifier and literal quoting.  I added quoting for replication slot
> names, for example, even though they can only contain a restricted set
> of characters, but it felt better to be defensive there.
>
> I'm happy to entertain follow-up patches on some of the details like
> option naming that were still being discussed.  I just wanted to get the
> main functionality in in good time.  We can fine-tune the rest over the
> next few weeks.
>

I was looking at prior discussions on this topic to see if there are
any other open design points apart from this and noticed that the
points raised/discussed in the email [1] are also not addressed. IIRC,
the key point we discussed was that after promotion, the existing
replication objects should be removed (either optionally or always),
otherwise, it can lead to a new subscriber not being able to restart
or getting some unwarranted data.

[1] - 
https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-25 Thread vignesh C
On Mon, 25 Mar 2024 at 21:36, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Bharath, Peter,
>
> > Looks like BF animals aren't happy, please check -
> > > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
> >
> > Looks like sanitizer failures.  There were a few messages about that
> > recently, but those were all just about freeing memory after use, which
> > we don't necessarily require for client programs.  So maybe something else.
>
> It seems that there are several types of failures, [1] and [2].
>
> ## Analysis for failure 1
>
> The failure is caused by a time lag between when the walreceiver finishes and
> when pg_is_in_recovery() starts returning false.
>
> According to the output [1], it seems that the tool failed at
> wait_for_end_recovery() with the message "standby server disconnected from
> the primary". Also, lines "redo done at..." and "terminating walreceiver
> process due to administrator command" meant that walreceiver was requested to
> shut down by XLogShutdownWalRcv().
>
> According to the source, we confirm that walreceiver is shut down in
> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also,
> SharedRecoveryState is changed to RECOVERY_STATE_DONE (which means
> pg_is_in_recovery() returns false) in the latter part of StartupXLOG().
>
> So, if there is a delay between FinishWalRecovery() and changing the state,
> the check in wait_for_end_recovery() fails during that window. Since we allow
> the walreceiver to be missing 10 times and it is checked once per second, the
> failure occurs if the time lag is longer than 10 seconds.
>
> I do not have a good way to fix it. One approach is to make NUM_CONN_ATTEMPTS
> larger, but it's not a fundamental solution.

I agree with your analysis; another way to fix it could be to remove the
following check, as increasing the count might still leave the race
condition:
/*
* If it is still in recovery, make sure the target server is
* connected to the primary so it can receive the required WAL to
* finish the recovery process. If it is disconnected try
* NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
*/
res = PQexec(conn,
"SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");

I'm not sure whether we should worry about the condition where
recovery is not done and the walreceiver has exited, as we have the
following sanity check in check_subscriber before we wait for recovery
to finish:
/* The target server must be a standby */
if (!server_is_in_recovery(conn))
{
pg_log_error("target server must be a standby");
disconnect_database(conn, true);
}

Regards,
Vignesh




RE: speed up a logical replica setup

2024-03-25 Thread Hayato Kuroda (Fujitsu)
Dear Bharath, Peter,

> Looks like BF animals aren't happy, please check -
> > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
> 
> Looks like sanitizer failures.  There were a few messages about that
> recently, but those were all just about freeing memory after use, which
> we don't necessarily require for client programs.  So maybe something else.

It seems that there are several types of failures, [1] and [2].

## Analysis for failure 1

The failure is caused by a time lag between when the walreceiver finishes and
when pg_is_in_recovery() starts returning false.

According to the output [1], it seems that the tool failed at
wait_for_end_recovery() with the message "standby server disconnected from the
primary". Also, lines "redo done at..." and "terminating walreceiver process
due to administrator command" meant that walreceiver was requested to shut down
by XLogShutdownWalRcv().

According to the source, we confirm that walreceiver is shut down in
StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also,
SharedRecoveryState is changed to RECOVERY_STATE_DONE (which means
pg_is_in_recovery() returns false) in the latter part of StartupXLOG().

So, if there is a delay between FinishWalRecovery() and changing the state, the
check in wait_for_end_recovery() fails during that window. Since we allow the
walreceiver to be missing 10 times and it is checked once per second, the
failure occurs if the time lag is longer than 10 seconds.

I do not have a good way to fix it. One approach is to make NUM_CONN_ATTEMPTS
larger, but it's not a fundamental solution.

## Analysis for failure 2

According to [2], the physical replication slot which is specified as
primary_slot_name was not used by any walsender process. At that time, the
walsender did not exist yet.

```
...
pg_createsubscriber: publisher: current wal senders: 0
pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots 
WHERE active AND slot_name = 'physical_slot'
pg_createsubscriber: error: could not obtain replication slot information: got 
0 rows, expected 1 row
...
```

Currently the standby must be stopped before the command and the current code
does not block the flow to ensure that replication has started. So there is a
possibility that the check runs before the walsender is launched.

One possible approach is to wait until replication starts. An alternative is
to ease the condition.

What do you think?

[1]: 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2024-03-25%2013%3A03%3A07
[2]: 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2024-03-25%2013%3A53%3A58

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: speed up a logical replica setup

2024-03-25 Thread Euler Taveira
On Mon, Mar 25, 2024, at 8:55 AM, Peter Eisentraut wrote:
> On 22.03.24 04:31, Euler Taveira wrote:
> > On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
> >> There is a compilation error while building postgres with the patch
> >> due to a recent commit. I have attached a top-up patch v32-0003 to
> >> resolve this compilation error.
> >> I have not updated the version of the patch as I have not made any
> >> change in v32-0001 and v32-0002 patch.
> > 
> > I'm attaching a new version (v33) to incorporate this fix (v32-0003) 
> > into the
> > main patch (v32-0001). This version also includes 2 new tests:
> > 
> > - refuse to run if the standby server is running
> > - refuse to run if the standby was promoted e.g. it is not in recovery
> > 
> > The first one exercises a recent change (standby should be stopped) and the
> > second one covers an important requirement.
> 
> I have committed your version v33.  I did another pass over the 
> identifier and literal quoting.  I added quoting for replication slot 
> names, for example, even though they can only contain a restricted set 
> of characters, but it felt better to be defensive there.
Thanks.

> I'm happy to entertain follow-up patches on some of the details like 
> option naming that were still being discussed.  I just wanted to get the 
> main functionality in in good time.  We can fine-tune the rest over the 
> next few weeks.
Agree. Let's continue the discussion about the details.

> > Based on the discussion [1] about the check functions, Vignesh suggested 
> > that it
> > should check both server before exiting. v33-0003 implements it. I don't 
> > have a
> > strong preference; feel free to apply it.
> 
> I haven't done anything about this.
... including this one.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-25 Thread Peter Eisentraut

On 25.03.24 13:36, Bharath Rupireddy wrote:
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut  wrote:
>>
>> I have committed your version v33.
>
> Looks like BF animals aren't happy, please check -
> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.

Looks like sanitizer failures.  There were a few messages about that
recently, but those were all just about freeing memory after use, which
we don't necessarily require for client programs.  So maybe something else.






Re: speed up a logical replica setup

2024-03-25 Thread Bharath Rupireddy
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut  wrote:
>
> I have committed your version v33.

Looks like BF animals aren't happy, please check -
https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: speed up a logical replica setup

2024-03-25 Thread Peter Eisentraut

On 22.03.24 04:31, Euler Taveira wrote:
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>> There is a compilation error while building postgres with the patch
>> due to a recent commit. I have attached a top-up patch v32-0003 to
>> resolve this compilation error.
>> I have not updated the version of the patch as I have not made any
>> change in v32-0001 and v32-0002 patch.
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

I have committed your version v33.  I did another pass over the
identifier and literal quoting.  I added quoting for replication slot
names, for example, even though they can only contain a restricted set
of characters, but it felt better to be defensive there.

I'm happy to entertain follow-up patches on some of the details like
option naming that were still being discussed.  I just wanted to get the
main functionality in in good time.  We can fine-tune the rest over the
next few weeks.

> Based on the discussion [1] about the check functions, Vignesh suggested that it
> should check both servers before exiting. v33-0003 implements it. I don't have a
> strong preference; feel free to apply it.

I haven't done anything about this.





RE: speed up a logical replica setup

2024-03-25 Thread Hayato Kuroda (Fujitsu)
> IIUC, added options were inspired by Tomas. And he said the limitation
> (pub/sub/slot name cannot be specified) was acceptable for the first version.
> I agree with him.
> (To be honest, I feel that options should be fewer for the first release)

Just to confirm: I don't think they are completely unnecessary. If we can agree
on a specification at some point, it's OK for me to add them. If you ask me,
though, I still prefer Tomas's approach.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



RE: speed up a logical replica setup

2024-03-25 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Euler,

I also want to share my opinion, just in case.

> On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira  wrote:
> >
> > On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
> >
> > If we commit this we might not be able to change the way the option
> > behaves once the customers starts using it. How about removing these
> > options in the first version and adding it in the next version after
> > more discussion.
> >
> >
> > We don't need to redesign this one if we want to add a format string in a 
> > next
> > version. A long time ago, pg_dump started to accept pattern for tables 
> > without
> > breaking or deprecating the -t option. If you have 100 databases and you 
> > don't
> > want to specify the options or use a script to generate it for you, you also
> > have the option to let pg_createsubscriber generate the object names for 
> > you.
> > Per my experience, it will be a rare case.
> >
> 
> But, why go with some UI in the first place which we don't think is a
> good one, or at least don't have a broader agreement that it is a good
> one? The problem with self-generated names for users could be that
> they won't be able to make much out of it. If one has to always use
> those internally then probably that would be acceptable. I would
> prefer what Tomas proposed a few emails ago as compared to this one as
> that has fewer options to be provided by users but still, they can
> later identify objects. But surely, we should discuss if you or others
> have better alternatives.

IIUC, the added options were inspired by Tomas, and he said the limitation
(that pub/sub/slot names cannot be specified) was acceptable for the first
version. I agree with him. (To be honest, I feel there should be fewer options
in the first release.)

> The user choosing to convert a physical standby to a subscriber would
> in some cases be interested in converting it for all the databases
> (say for the case of upgrade [1]) and giving options for each database
> would be cumbersome for her.

+1 for the primary use case.

> > Currently dry-run will do the check and might fail on identifying a
> > few failures like after checking subscriber configurations. Then the
> > user will have to correct the configuration and re-run then fix the
> > next set of failures. Whereas the suggest-config will display all the
> > optimal configuration for both the primary and the standby in a single
> > shot. This is not a must in the first version, it can be done as a
> > subsequent enhancement.
> >
> > Do you meant publisher, right? Per order, check_subscriber is done before
> > check_publisher and it checks all settings on the subscriber before exiting. In
> > v30, I changed the way it provides the required settings. In a previous version,
> > it fails when it found a wrong setting; the current version, check all settings
> > from that server before providing a suitable error.
> >
> > pg_createsubscriber: checking settings on publisher
> > pg_createsubscriber: primary has replication slot "physical_slot"
> > pg_createsubscriber: error: publisher requires wal_level >= logical
> > pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
> > pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
> > pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
> > pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
> >
> > If you have such an error, you will fix them all and rerun using dry run mode
> > again to verify everything is ok. I don't have a strong preference about it. It
> > can be changed easily (unifying the check functions or providing a return for
> > each of the check functions).
> >
> 
> We can unify the checks but not sure if it is worth it. I am fine
> either way. It would have been better if we provided a way for a user
> to know the tool's requirement rather than letting him know via errors
> but I think it should be okay to extend it later as well.

Both ways are OK, but I prefer to unify the checks a bit. The number of
working modes in the same executable should be reduced as much as possible.

Also, I feel the current behavior is acceptable. pg_upgrade checks things
one by one and exits early in bad cases: if both the old and new clusters
have issues, the first run reports only the old one's issues; after the DBA
fixes them and runs again, the issues on the new cluster are reported.

[1]: 
https://www.postgresql.org/message-id/8d52c226-7e34-44f7-a919-759bf8d81541%40enterprisedb.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: speed up a logical replica setup

2024-03-22 Thread Shubham Khanna
On Fri, Mar 22, 2024 at 9:02 AM Euler Taveira  wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.
>
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.
>
> Based on the discussion [1] about the check functions, Vignesh suggested that
> it should check both servers before exiting. v33-0003 implements it. I don't
> have a strong preference; feel free to apply it.
>
>
> [1] 
> https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com

I ran valgrind with pg_createsubscriber to see if there were any issues.
Valgrind reported the following:
==651272== LEAK SUMMARY:
==651272==    definitely lost: 1,319 bytes in 18 blocks
==651272==    indirectly lost: 1,280 bytes in 2 blocks
==651272==      possibly lost: 44 bytes in 3 blocks
==651272==    still reachable: 3,066 bytes in 22 blocks
==651272==         suppressed: 0 bytes in 0 blocks
==651272==
==651272== For lists of detected and suppressed errors, rerun with: -s
==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0)
The attached report has the details of the same.

Thanks and Regards:
Shubham Khanna.


valgrind.log
Description: Binary data


Re: speed up a logical replica setup

2024-03-22 Thread vignesh C
On Fri, 22 Mar 2024 at 09:01, Euler Taveira  wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.
>
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.

Few comments:
1) In the error case, PQclear and PQfinish should be called:
+   /* Secure search_path */
+   res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+   if (PQresultStatus(res) != PGRES_TUPLES_OK)
+   {
+   pg_log_error("could not clear search_path: %s",
+PQresultErrorMessage(res));
+   if (exit_on_error)
+   exit(1);
+
+   return NULL;
+   }
+   PQclear(res);

2) Call fflush before invoking the system command to get properly
ordered console output:
a) Call fflush:
+   int rc = system(cmd_str);
+
+   if (rc == 0)
+   pg_log_info("subscriber successfully changed
the system identifier");
+   else
+   pg_fatal("subscriber failed to change system
identifier: exit code: %d", rc);
+   }

b) similarly here:
+   pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd->data);
+   rc = system(pg_ctl_cmd->data);
+   pg_ctl_status(pg_ctl_cmd->data, rc);
+   standby_running = true;

c) similarly here:
+   pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path,
+ datadir);
+   pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd);
+   rc = system(pg_ctl_cmd);
+   pg_ctl_status(pg_ctl_cmd, rc);
+   standby_running = false;

3) Call PQClear in error cases:
a) Call PQClear
+   res = PQexec(conn, "SELECT system_identifier FROM pg_catalog.pg_control_system()");
+   if (PQresultStatus(res) != PGRES_TUPLES_OK)
+   {
+       pg_log_error("could not get system identifier: %s",
+                    PQresultErrorMessage(res));
+       disconnect_database(conn, true);
+   }

b) similarly here
+   if (PQntuples(res) != 1)
+   {
+       pg_log_error("could not get system identifier: got %d rows, expected %d row",
+                    PQntuples(res), 1);
+       disconnect_database(conn, true);
+   }
+

c) similarly here
+   res = PQexec(conn,
+                "SELECT oid FROM pg_catalog.pg_database "
+                "WHERE datname = pg_catalog.current_database()");
+   if (PQresultStatus(res) != PGRES_TUPLES_OK)
+   {
+       pg_log_error("could not obtain database OID: %s",
+                    PQresultErrorMessage(res));
+       disconnect_database(conn, true);
+   }
+

d) There are few more places like this.

4) Since these are global variables, we might be able to remove the
explicit initialization of many of them.
+   /* Default settings */
+   subscriber_dir = NULL;
+   opt.config_file = NULL;
+   opt.pub_conninfo_str = NULL;
+   opt.socket_dir = NULL;
+   opt.sub_port = DEFAULT_SUB_PORT;
+   opt.sub_username = NULL;
+   opt.database_names = (SimpleStringList){0};
+   opt.recovery_timeout = 0;

Regards,
Vignesh




Re: speed up a logical replica setup

2024-03-21 Thread Amit Kapila
On Fri, Mar 22, 2024 at 9:44 AM Fabrízio de Royes Mello
 wrote:
>
> On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila  wrote:
> >
> >
> > The user choosing to convert a physical standby to a subscriber would
> > in some cases be interested in converting it for all the databases
> > (say for the case of upgrade [1]) and giving options for each database
> > would be cumbersome for her.
> >
> > ...
> >
> > [1] - This tool can be used in an upgrade where the user first
> > converts physical standby to subscriber to get incremental changes and
> > then performs an online upgrade.
> >
>
> The first use case Euler and I discussed some time ago for upstreaming this
> tool was exactly what Amit described, so IMHO we should make it easier for
> users to subscribe to all existing user databases.
>

I feel that will be a good use case for this tool, especially now
(with this release) when we allow upgrading subscriptions. In this
regard, if the user doesn't specify any database, the tool should
create subscriptions for all databases, and the user should have a way
to provide publication/slot/subscription names.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-21 Thread Fabrízio de Royes Mello
On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila 
wrote:
>
>
> The user choosing to convert a physical standby to a subscriber would
> in some cases be interested in converting it for all the databases
> (say for the case of upgrade [1]) and giving options for each database
> would be cumbersome for her.
>
> ...
>
> [1] - This tool can be used in an upgrade where the user first
> converts physical standby to subscriber to get incremental changes and
> then performs an online upgrade.
>

The first use case Euler and I discussed some time ago for upstreaming this
tool was exactly what Amit described, so IMHO we should make it easier for
users to subscribe to all existing user databases.

Regards,

--
Fabrízio de Royes Mello


Re: speed up a logical replica setup

2024-03-21 Thread Amit Kapila
On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira  wrote:
>
> On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
>
> If we commit this, we might not be able to change the way the option
> behaves once customers start using it. How about removing these
> options in the first version and adding them in the next version after
> more discussion?
>
>
> We don't need to redesign this one if we want to add a format string in a next
> version. A long time ago, pg_dump started to accept pattern for tables without
> breaking or deprecating the -t option. If you have 100 databases and you don't
> want to specify the options or use a script to generate it for you, you also
> have the option to let pg_createsubscriber generate the object names for you.
> Per my experience, it will be a rare case.
>

But, why go with some UI in the first place which we don't think is a
good one, or at least don't have a broader agreement that it is a good
one? The problem with self-generated names for users could be that
they won't be able to make much out of it. If one has to always use
those internally then probably that would be acceptable. I would
prefer what Tomas proposed a few emails ago as compared to this one as
that has fewer options to be provided by users but still, they can
later identify objects. But surely, we should discuss if you or others
have better alternatives.

The user choosing to convert a physical standby to a subscriber would
in some cases be interested in converting it for all the databases
(say for the case of upgrade [1]) and giving options for each database
would be cumbersome for her.

> Currently dry-run will do the check and might fail on identifying a
> few failures like after checking subscriber configurations. Then the
> user will have to correct the configuration and re-run then fix the
> next set of failures. Whereas the suggest-config will display all the
> optimal configuration for both the primary and the standby in a single
> shot. This is not a must in the first version, it can be done as a
> subsequent enhancement.
>
>
> Do you meant publisher, right? Per order, check_subscriber is done before
> check_publisher and it checks all settings on the subscriber before exiting. In
> v30, I changed the way it provides the required settings. In a previous version,
> it fails when it found a wrong setting; the current version, check all settings
> from that server before providing a suitable error.
>
> pg_createsubscriber: checking settings on publisher
> pg_createsubscriber: primary has replication slot "physical_slot"
> pg_createsubscriber: error: publisher requires wal_level >= logical
> pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
> pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
>
> If you have such an error, you will fix them all and rerun using dry run mode
> again to verify everything is ok. I don't have a strong preference about it. It
> can be changed easily (unifying the check functions or providing a return for
> each of the check functions).
>

We can unify the checks, but I am not sure it is worth it. I am fine
either way. It would have been better if we provided a way for the user
to know the tool's requirements rather than finding out via errors, but
I think it should be okay to extend it later as well.

[1] - This tool can be used in an upgrade where the user first
converts physical standby to subscriber to get incremental changes and
then performs an online upgrade.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-21 Thread Euler Taveira
On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.

I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
main patch (v32-0001). This version also includes 2 new tests:

- refuse to run if the standby server is running
- refuse to run if the standby was promoted e.g. it is not in recovery

The first one exercises a recent change (standby should be stopped) and the
second one covers an important requirement.

Based on the discussion [1] about the check functions, Vignesh suggested that it
should check both servers before exiting. v33-0003 implements it. I don't have a
strong preference; feel free to apply it.


[1] 
https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v33-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


v33-0002-Remove-How-it-Works-section.patch.gz
Description: application/gzip


v33-0003-Check-both-servers-before-exiting.patch.gz
Description: application/gzip


Re: speed up a logical replica setup

2024-03-21 Thread Euler Taveira
On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
> If we commit this, we might not be able to change the way the option
> behaves once customers start using it. How about removing these
> options in the first version and adding them in the next version after
> more discussion?

We don't need to redesign this one if we want to add a format string in a next
version. A long time ago, pg_dump started to accept patterns for tables without
breaking or deprecating the -t option. If you have 100 databases and you don't
want to specify the options or use a script to generate them for you, you also
have the option to let pg_createsubscriber generate the object names for you.
In my experience, it will be a rare case.

> Currently dry-run will do the check and might fail on identifying a
> few failures like after checking subscriber configurations. Then the
> user will have to correct the configuration and re-run then fix the
> next set of failures. Whereas the suggest-config will display all the
> optimal configuration for both the primary and the standby in a single
> shot. This is not a must in the first version, it can be done as a
> subsequent enhancement.

You meant publisher, right? Per the order, check_subscriber is done before
check_publisher, and it checks all settings on the subscriber before exiting. In
v30, I changed the way it reports the required settings. A previous version
failed as soon as it found a wrong setting; the current version checks all
settings from that server before reporting a suitable error.

pg_createsubscriber: checking settings on publisher
pg_createsubscriber: primary has replication slot "physical_slot"
pg_createsubscriber: error: publisher requires wal_level >= logical
pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.

If you hit such errors, you fix them all and rerun in dry-run mode again to
verify everything is OK. I don't have a strong preference about it. It can be
changed easily (unifying the check functions or providing a return value for
each of the check functions).


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-21 Thread vignesh C
On Thu, 21 Mar 2024 at 18:02, Peter Eisentraut  wrote:
>
> On 21.03.24 12:35, vignesh C wrote:
> > Here are a few suggestions:
> > 1) I felt displaying the server log to the console is not a good idea,
> > I prefer this to be logged. There were similar comments from
> > Kuroda-san at [1], Peter at [2]. The number of lines will increase
> > based on the log level set. If you don't want to use pg_upgrade style,
> > how about exposing the log file option and logging it to the specified
> > log file.
>
> Let's leave that for the next version.  We need to wrap things up for
> this release.

OK, this can be done as an improvement.

> > 2) Currently for publication, replication-slot and subscription, we
> > will have to specify these options based on the number of databases.
> > Here if we have 100 databases we will have to specify these options
> > 100 times, it might not be user friendly. How about something like
> > what Tomas had proposed at [3] and  Amit proposed at [4]. It will be
> > better if the user has to just specify publication, replication slot
> > and subscription options only one time.
>
> Same.  Designing, implementing, discussing, and testing this cannot be
> done in the time remaining.

If we commit this, we might not be able to change the way the option
behaves once customers start using it. How about removing these
options in the first version and adding them in the next version after
more discussion?

> > 3) Can we have an option similar to dry-run which will display the
> > configurations required in the primary and standby node something
> > like:
> > pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p 
> > -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2
> > --suggest-config
> > Suggested optimal configurations in the primary:
> > --
> > wal_level = logical
> > max_replication_slots = 3
> > max_wal_senders = 3
> > ...
> > Suggested optimal configurations in the standby:
> > --
> > max_replication_slots = 3
> > max_wal_senders = 3
> > ...
>
> How would this be different from what --dry-run does now?

Currently, dry-run does the checks but can stop after identifying only a
few failures, e.g., after checking the subscriber configuration. The user
then has to correct the configuration, re-run, and fix the next set of
failures. The suggest-config option, in contrast, would display all the
optimal configuration for both the primary and the standby in a single
shot. This is not a must for the first version; it can be done as a
subsequent enhancement.

Regards,
Vignesh




Re: speed up a logical replica setup

2024-03-21 Thread Peter Eisentraut

On 21.03.24 12:35, vignesh C wrote:
> Here are a few suggestions:
> 1) I felt displaying the server log to the console is not a good idea,
> I prefer this to be logged. There were similar comments from
> Kuroda-san at [1], Peter at [2]. The number of lines will increase
> based on the log level set. If you don't want to use pg_upgrade style,
> how about exposing the log file option and logging it to the specified
> log file.

Let's leave that for the next version.  We need to wrap things up for
this release.

> 2) Currently for publication, replication-slot and subscription, we
> will have to specify these options based on the number of databases.
> Here if we have 100 databases we will have to specify these options
> 100 times, it might not be user friendly. How about something like
> what Tomas had proposed at [3] and Amit proposed at [4]. It will be
> better if the user has to just specify publication, replication slot
> and subscription options only one time.

Same.  Designing, implementing, discussing, and testing this cannot be
done in the time remaining.

> +   /* Number of object names must match number of databases */
> +   if (num_pubs > 0 && num_pubs != num_dbs)
> +   {
> +       pg_log_error("wrong number of publication names");
> +       pg_log_error_hint("Number of publication names (%d) must match number of database names (%d).",
> +                         num_pubs, num_dbs);
> +       exit(1);
> +   }
>
> 3) Can we have an option similar to dry-run which will display the
> configurations required in the primary and standby node something
> like:
> pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres"  -p  -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2 --suggest-config
> Suggested optimal configurations in the primary:
> --
> wal_level = logical
> max_replication_slots = 3
> max_wal_senders = 3
> ...
> Suggested optimal configurations in the standby:
> --
> max_replication_slots = 3
> max_wal_senders = 3
> ...

How would this be different from what --dry-run does now?





Re: speed up a logical replica setup

2024-03-21 Thread vignesh C
On Thu, 21 Mar 2024 at 09:50, Euler Taveira  wrote:
>
> On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
>
> On 16.03.24 16:42, Euler Taveira wrote:
> > I'm attaching a new version (v30) that adds:
>
> I have some review comments and attached a patch with some smaller
> fixups (mainly message wording and avoid fixed-size string buffers).
>
>
> Thanks for your review. I'm attaching a new version (v32) that includes your
> fixups, merges the v30-0002 into the main patch [1], addresses Hayato's
> review [2], your reviews [3][4], and fixes the query for set_replication_progress() [5].
>
> * doc/src/sgml/ref/pg_createsubscriber.sgml
>
> I would remove the "How It Works" section.  This is not relevant to
> users, and it is very detailed and will require updating whenever the
> implementation changes.  It could be a source code comment instead.
>
>
> It uses the same structure as pg_rewind that also describes how it works
> internally. I included a separate patch that completely removes the section.
>
> * src/bin/pg_basebackup/pg_createsubscriber.c
>
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.
>
>
> get_base_conninfo() uses PQconninfoParse to parse the connection string. I
> expect PQconnectdb to provide a suitable error message in this case. Even if it
> builds keywords and values arrays, it is also susceptible to the same issue, no?
>
> Most SQL commands need to be amended for proper identifier or string
> literal quoting and/or escaping.
>
>
> I completely forgot about this detail when I added the new options in v30.
> It is fixed now. I also changed the tests to exercise it.
>
> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.
>
>
> The main goal of the check_* functions is to minimize errors during execution.
> I removed the permission checks. The GUC checks were kept.
>
> In main(): The first check if the standby is running is problematic.
> I think it would be better to require that the standby is initially
> shut down.  Consider, the standby might be running under systemd.
> This tool will try to stop it, systemd will try to restart it.  Let's
> avoid these kinds of battles.  It's also safer if we don't try to
> touch running servers.
>
>
> That's a good point. I hadn't found an excuse to simplify this, but you
> provided one. :) There was a worry about ignoring some command-line options
> that change GUCs if the server was started. There was also an ugly case for
> dry run mode that has to start the server (if it was running) at the end. Both
> cases are no longer issues. The current code provides a suitable error if the
> target server is running.
>
> The -p option (--subscriber-port) doesn't seem to do anything.  In my
> testing, it always uses the compiled-in default port.
>
>
> It works for me. See this snippet from the regression tests. The port (50945)
> is used by pg_ctl.
>
> # Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2
> pg_createsubscriber: validating connection string on publisher
> .
> .
> pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'"
> 2024-03-20 18:15:24.517 -03 [105195] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
> 2024-03-20 18:15:24.517 -03 [105195] LOG:  listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945"
>
> Printing all the server log lines to the terminal doesn't seem very
> user-friendly.  Not sure what to do about that, short of keeping a
> pg_upgrade-style directory of log files.  But it's ugly.
>
>
> I removed the previous implementation that creates a new directory and stores
> the log file there. I don't like the pg_upgrade-style directory because (a) it
> stores part of the server log files in another place and (b) it is another
> 

Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
On Wed, Mar 20, 2024, at 7:16 AM, Shubham Khanna wrote:
> On Tue, Mar 19, 2024 at 8:54 PM vignesh C  wrote:
> >
> > If you are not planning to have the checks for name length, this could
> > alternatively be fixed by including database id also while querying
> > pg_subscription like below in set_replication_progress function:
> > appendPQExpBuffer(str,
> > "SELECT oid FROM pg_catalog.pg_subscription \n"
> > "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> > pg_catalog.pg_database WHERE datname = '%s')",
> > dbinfo->subname,
> > dbinfo->dbname);
> 
> The attached patch has the changes to handle the same.

I included a different query that does the same. See v32.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
On Tue, Mar 19, 2024, at 8:57 AM, Peter Eisentraut wrote:
> On 19.03.24 12:26, Shlok Kyal wrote:
> > 
> > I have added a top-up patch v30-0003. The issue in [1] still exists in
> > the v30 patch. I was not able to come up with an approach to handle it
> > in the code, so I have added it to the documentation in the warning
> > section. Thoughts?
> 
> Seems acceptable to me.  pg_createsubscriber will probably always have 
> some restrictions and unsupported edge cases like that.  We can't 
> support everything, so documenting is ok.

Shlok, I'm not sure we should add a sentence about a pilot error. I added a
comment in check_subscriber that describes this situation. I think the comment
is sufficient to understand the limitation and, if it is possible in the future,
a check might be added for it. I didn't include v31-0004.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
On Mon, Mar 18, 2024, at 2:43 AM, Hayato Kuroda (Fujitsu) wrote:
> Thanks for updating the patch. Here are my comments.
> I used Grammarly to proofread sentences.
> (The tool strongly recommends using active voice, but I can ignore it for now)

Thanks for another review. I posted a new patch (v32) that hopefully addresses
these points.

> 01.
> 
> "After a successful run, the state of the target server is analagous to a 
> fresh
> logical replication setup."
> s/analagous/analogous

Fixed.

> 02.
> 
> "The main difference between the logical replication setup and 
> pg_createsubscriber
> is the initial data copy."
> 
> Grammarly suggests:
> "The initial data copy is the main difference between the logical replication
> setup and pg_createsubscriber."

Not fixed.

> 03.
> 
> "Only the synchronization phase is done, which ensures each table is brought 
> up
> to a synchronized state."
> 
> This sentence is not very clear to me. How about:
> "pg_createsubscriber does only the synchronization phase, ensuring each 
> table's
> replication state is ready."

I avoided pg_createsubscriber at the beginning because it is already
used in the previous sentence. I kept the last part of the sentence
because it is similar to the one in the logical replication docs [1].

> 04.
> 
> "The pg_createsubscriber targets large database systems because most of the
> execution time is spent making the initial data copy."
> 
> Hmm, but initial data sync by logical replication also spends most of its time
> making the initial data copy. IIUC the bottlenecks are a) this application must
> stop and start the server several times, and b) only serial copy works. Can you
> clarify them?

Reading the sentence again, it is not clear. When I said "most of
execution time" I was referring to the actual logical replication setup.

> 05.
> 
> It is better to say the internal difference between pg_createsubscriber and 
> the
> initial sync by logical replication. For example:
> pg_createsubscriber uses a physical replication mechanism to ensure the
> standby catches up until a certain point. Then, it converts the standby to
> the subscriber by promoting it and creating subscriptions.

Isn't it better to leave these details to "How It Works"?

> 06.
> 
> "If these are not met an error will be reported."
> 
> Grammarly suggests:
> "If these are not met, an error will be reported."

Fixed.

> 07.
> 
> "The given target data directory must have the same system identifier than the
> source data directory."
> 
> Grammarly suggests:
> "The given target data directory must have the same system identifier as the
> source data directory."

Fixed.

> 08.
> 
> "If a standby server is running on the target data directory or it is a base
> backup from the source data directory, system identifiers are the same."
> 
> This line is not needed if bullet-style is not used. The line is just a
> supplement, not a prerequisite.

Fixed.

> 09.
> 
> "The source server must accept connections from the target server. The source 
> server must not be in recovery."
> 
> Grammarly suggests:
> "The source server must accept connections from the target server and not be 
> in recovery."

Not fixed.

> 10.
> 
> "Publications cannot be created in a read-only cluster."
> 
> Same as 08, this line is not needed if bullet-style is not used.

Fixed.

> 11.
> 
> "pg_createsubscriber usually starts the target server with different 
> connection
> settings during the transformation steps. Hence, connections to target server
> might fail."
>  
> Grammarly suggests:
> "pg_createsubscriber usually starts the target server with different 
> connection
> settings during transformation. Hence, connections to the target server might 
> fail."

Fixed.

> 12.
> 
> "During the recovery process,"
> 
> Grammarly suggests:
> "During recovery,"

Not fixed. Our documentation uses "recovery process".

> 13.
> 
> "replicated so an error would occur."
> 
> Grammarly suggests:
> "replicated, so an error would occur."

I didn't find this one. Maybe you checked a previous version.

> 14.
> 
> "It would avoid situations in which WAL files from the source server might be
> used by the target server."
> 
> Grammarly suggests:
> "It would avoid situations in which the target server might use WAL files from
> the source server."

Fixed.

> 15.
> 
> "a LSN"
> 
> s/a/an

Fixed.

> 16.
> 
> "of write-ahead"
> 
> s/of/of the/

Fixed.

> 17.
> 
> "specifies promote"
> 
> We can do double-quote for the word promote.

Why? It is referring to recovery_target_action. If you check this GUC,
you will notice that it also uses a literal tag.

> 18.
> 
> "are also added so it avoids"
> 
> Grammarly suggests:
> "are added to avoid"

Fixed.

> 19.
> 
> "is accepting read-write transactions"
> 
> Grammarly suggests:
> "accepts read-write transactions"

Not fixed.

> 20.
> 
> New options must be documented as well. This helps not only users but
> also reviewers.
> (Sometimes we cannot identify whether the implementation is intentional or 

Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
> On 16.03.24 16:42, Euler Taveira wrote:
> > I'm attaching a new version (v30) that adds:
> 
> I have some review comments and attached a patch with some smaller 
> fixups (mainly message wording and avoid fixed-size string buffers).

Thanks for your review. I'm attaching a new version (v32) that includes your
fixups, merges the v30-0002 into the main patch [1], addresses Hayato's
review [2], your reviews [3][4], and fixes the query for
set_replication_progress() [5].

> * doc/src/sgml/ref/pg_createsubscriber.sgml
> 
> I would remove the "How It Works" section.  This is not relevant to
> users, and it is very detailed and will require updating whenever the
> implementation changes.  It could be a source code comment instead.

It uses the same structure as pg_rewind that also describes how it works
internally. I included a separate patch that completely removes the section.

> * src/bin/pg_basebackup/pg_createsubscriber.c
> 
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.

get_base_conninfo() uses PQconninfoParse to parse the connection string. I
expect PQconnectdb to provide a suitable error message in this case. Even if it
builds keywords and values arrays, it is also susceptible to the same issue, no?

> Most SQL commands need to be amended for proper identifier or string
> literal quoting and/or escaping.

I completely forgot about this detail when I added the new options in v30. It is
fixed now. I also changed the tests to exercise it.

> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.

The main goal of the check_* functions is to minimize errors during execution.
I removed the permission checks. The GUC checks were kept.

> In main(): The first check if the standby is running is problematic.
> I think it would be better to require that the standby is initially
> shut down.  Consider, the standby might be running under systemd.
> This tool will try to stop it, systemd will try to restart it.  Let's
> avoid these kinds of battles.  It's also safer if we don't try to
> touch running servers.

That's a good point. I hadn't found an excuse to simplify this but you provided
one. :) There was a worry about ignoring some command-line options that change
GUCs if the server was started. There was also an ugly case for dry run mode
that has to start the server (if it was running) at the end. Both cases are no
longer issues. The current code provides a suitable error if the target server
is running.

> The -p option (--subscriber-port) doesn't seem to do anything.  In my
> testing, it always uses the compiled-in default port.

It works for me. See this snippet from the regression tests. The port (50945) is
used by pg_ctl.

# Running: pg_createsubscriber --verbose --verbose --pgdata 
/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata
 --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' 
--socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 
--database pg2
pg_createsubscriber: validating connection string on publisher
.
.
pg_createsubscriber: pg_ctl command is: 
"/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D 
"/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata"
 -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c 
unix_socket_directories='/tmp/qpngb0bPKo'"
2024-03-20 18:15:24.517 -03 [105195] LOG:  starting PostgreSQL 17devel on 
x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-03-20 18:15:24.517 -03 [105195] LOG:  listening on Unix socket 
"/tmp/qpngb0bPKo/.s.PGSQL.50945"

> Printing all the server log lines to the terminal doesn't seem very
> user-friendly.  Not sure what to do about that, short of keeping a 
> pg_upgrade-style directory of log files.  But it's ugly.

I removed the previous implementation that creates a new directory and stores
the log file there. I don't like the pg_upgrade-style directory because (a) it
stores part of the server log files in another place and (b) it is another
directory to ignore if your tool handles the data directory (like a backup
tool). My last test said it prints 35 server log lines. I expect that the user
redirects the output to a file so 

Re: speed up a logical replica setup

2024-03-20 Thread Shlok Kyal
> Hi,
>
> > I'm attaching a new version (v30) that adds:
> >
> > * 3 new options (--publication, --subscription, --replication-slot) to 
> > assign
> >   names to the objects. The --database option used to ignore duplicate 
> > names,
> >   however, since these new options rely on the number of database options to
> >   match the number of object name options, it is forbidden from now on. The
> >   duplication is also forbidden for the object names to avoid errors 
> > earlier.
> > * rewrite the paragraph related to unusable target server after
> >   pg_createsubscriber fails.
> > * Vignesh reported an issue [1] related to reaching the recovery stop point
> >   before the consistent state is reached. I proposed a simple patch that 
> > fixes
> >   the issue.
> >
> > [1] 
> > https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
> >
>
> I have added a top-up patch v30-0003. The issue in [1] still exists in
> the v30 patch. I was not able to come up with an approach to handle it
> in the code, so I have added it to the documentation in the warning
> section. Thoughts?
> I am not changing the version as I have not made any changes in
> v30-0001 and v30-0002.
>
> [1]: 
> https://www.postgresql.org/message-id/cahv8rj+5mzk9jt+7ecogjzfm5czvdccd5jo1_rcx0bteypb...@mail.gmail.com

There were some whitespace errors in the v30-0003 patch. Updated the patch.



Thanks and regards,
Shlok Kyal
From d6717bcaf94302c0d41eabd448802b1fae6167d9 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v31 1/3] pg_createsubscriber: creates a new logical replica
 from a standby server

It must be run on the target server and should be able to connect to the
source server (publisher) and the target server (subscriber). All tables
in the specified database(s) are included in the logical replication
setup. A pair of publication and subscription objects are created for
each database.

The main advantage of pg_createsubscriber over the common logical
replication setup is the initial data copy. It also reduces the catchup
phase.

Some prerequisites must be met to successfully run it. They are
basically the logical replication requirements. It starts by creating a
publication using FOR ALL TABLES and a replication slot for each
specified database. It then writes recovery parameters into the target
data directory and starts the target server, specifying the LSN of the
last replication slot (replication start point) up to which the recovery
will proceed. Once the target server is promoted, it creates one
subscription per specified database (using the publication and
replication slot created in a previous step) on the target server, sets
the replication progress to the replication start point for each
subscription, enables the subscription for each specified database on
the target server, and finally changes the system identifier on the
target server.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml |  525 
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|   11 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_createsubscriber.c   | 2131 +
 .../t/040_pg_createsubscriber.pl  |  326 +++
 8 files changed, 3012 insertions(+), 3 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_createsubscriber.sgml
 create mode 100644 src/bin/pg_basebackup/pg_createsubscriber.c
 create mode 100644 src/bin/pg_basebackup/t/040_pg_createsubscriber.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..f5be638867 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -205,6 +205,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
new file mode 100644
index 00..642213b5a4
--- /dev/null
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -0,0 +1,525 @@
+
+
+
+ 
+  pg_createsubscriber
+ 
+
+ 
+  pg_createsubscriber
+  1
+  Application
+ 
+
+ 
+  pg_createsubscriber
+  convert a physical replica into a new logical replica
+ 
+
+ 
+  
+   pg_createsubscriber
+   option
+   
+
+ -d
+ --database
+
+dbname
+
+ -D 
+ --pgdata
+
+datadir
+
+ -P
+ --publisher-server
+
+connstr
+   
+  
+ 
+
+ 
+  Description
+  
+pg_createsubscriber creates a new logical
+replica from a physical standby server. All tables in the specified
+database are included in the logical replication setup. A pair of
+publication and subscription objects are created for each database. It
+must be run at the target server.
+  
+
+  
+After a successful run, the state of the target server is analagous to a
+  

Re: speed up a logical replica setup

2024-03-19 Thread Peter Eisentraut

On 19.03.24 16:24, vignesh C wrote:

The problem with this failure is that the standby has been promoted
already and we will have to re-create the physical replica again.

If you are not planning to have the checks for name length, this could
alternatively be fixed by including database id also while querying
pg_subscription like below in set_replication_progress function:
appendPQExpBuffer(str,
"SELECT oid FROM pg_catalog.pg_subscription \n"
"WHERE subname = '%s' AND subdbid = (SELECT oid FROM
pg_catalog.pg_database WHERE datname = '%s')",
dbinfo->subname,
dbinfo->dbname);


Yes, this is more correct anyway, because subscription names are 
per-database, not global.  So you should be able to make 
pg_createsubscriber use the same subscription name for each database.







Re: speed up a logical replica setup

2024-03-19 Thread vignesh C
On Mon, 18 Mar 2024 at 16:36, Peter Eisentraut  wrote:
>
> On 18.03.24 08:18, vignesh C wrote:
> > 1) Maximum size of the object name is 64, we can have a check so that
> > we don't specify more than the maximum allowed length:
> > + case 3:
> > + if (!simple_string_list_member(_names, optarg))
> > + {
> > + simple_string_list_append(_names, optarg);
> > + num_replslots++;
> > + }
> > + else
> > + {
> > + pg_log_error("duplicate replication slot \"%s\"", optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > If we allow something like this:
> >   ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > db2 -d db3 
> > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
> > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
> > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
> > In this case creation of replication slot will fail:
> > pg_createsubscriber: error: could not create replication slot
> > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
> > database "db2": ERROR:  replication slot
> > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
> > already exists
>
> I think this is fine.  The server can check whether the names it is
> given are of the right size.  We don't need to check it again in the client.
>
> > 2) Similarly here too:
> > + case 4:
> > + if (!simple_string_list_member(_names, optarg))
> > + {
> > + simple_string_list_append(_names, optarg);
> > + num_subs++;
> > + }
> > + else
> > + {
> > + pg_log_error("duplicate subscription \"%s\"", optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > If we allow something like this:
> > ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
> > user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
> > db2 -d db3 
> > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
> > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
> > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3
> >
> > Subscriptions will be created with the same name and later there will
> > be a problem when setting replication progress as there will be
> > multiple subscriptions with the same name.
>
> Could you clarify this problem?

In this case the subscription name specified is longer than the allowed
length, so the subscription name will be truncated and the subscriptions
for db1 and db2 will have the same name, like below:
db2=# select subname, subdbid from pg_subscription;
 subname | subdbid
-+-
 testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16384
 testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes |   16385

Now we try to set the replication origin of the subscriptions to a
consistent lsn from the following:
+set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo,
const char *lsn)
+{
+   PQExpBuffer str = createPQExpBuffer();
+   PGresult   *res;
+   Oid suboid;
+   charoriginname[NAMEDATALEN];
+   charlsnstr[17 + 1]; /* MAXPG_LSNLEN = 17 */
+
+   Assert(conn != NULL);
+
+   appendPQExpBuffer(str,
+ "SELECT oid FROM
pg_catalog.pg_subscription "
+ "WHERE subname = '%s'",
+ dbinfo->subname);
+
+   res = PQexec(conn, str->data);
+   if (PQresultStatus(res) != PGRES_TUPLES_OK)
+   {
+   pg_log_error("could not obtain subscription OID: %s",
+PQresultErrorMessage(res));
+   disconnect_database(conn, true);
+   }
+
+   if (PQntuples(res) != 1 && !dry_run)
+   {
+   pg_log_error("could not obtain subscription OID: got
%d rows, expected %d rows",
+PQntuples(res), 1);
+   disconnect_database(conn, true);
+   }

Since the subscription name is truncated, we will have multiple
records returned for the above query which results in failure with:
pg_createsubscriber: error: could not obtain subscription OID: got 2
rows, expected 1 rows
pg_createsubscriber: warning: pg_createsubscriber failed after the end
of recovery
pg_createsubscriber: hint: The target server cannot be used as a
physical replica anymore.
pg_createsubscriber: hint: You must recreate the physical replica
before continuing.

The problem with this failure is that the standby has been promoted
already and we will have to re-create the physical replica again.

If you are not planning to have the checks for name length, this could
alternatively be fixed by including database id also while querying

Re: speed up a logical replica setup

2024-03-19 Thread Peter Eisentraut

On 19.03.24 12:26, Shlok Kyal wrote:

I'm attaching a new version (v30) that adds:

* 3 new options (--publication, --subscription, --replication-slot) to assign
   names to the objects. The --database option used to ignore duplicate names,
   however, since these new options rely on the number of database options to
   match the number of object name options, it is forbidden from now on. The
   duplication is also forbidden for the object names to avoid errors earlier.
* rewrite the paragraph related to unusable target server after
   pg_createsubscriber fails.
* Vignesh reported an issue [1] related to reaching the recovery stop point
   before the consistent state is reached. I proposed a simple patch that fixes
   the issue.

[1] 
https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com



I have added a top-up patch v30-0003. The issue in [1] still exists in
the v30 patch. I was not able to come up with an approach to handle it
in the code, so I have added it to the documentation in the warning
section. Thoughts?


Seems acceptable to me.  pg_createsubscriber will probably always have 
some restrictions and unsupported edge cases like that.  We can't 
support everything, so documenting is ok.






Re: speed up a logical replica setup

2024-03-19 Thread Peter Eisentraut

On 19.03.24 08:05, Amit Kapila wrote:

On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut  wrote:


In check_subscriber(): All these permissions checks seem problematic
to me.  We shouldn't reimplement our own copy of the server's
permission checks.  The server can check the permissions.  And if the
permission checking in the server ever changes, then we have
inconsistencies to take care of.  Also, the error messages "permission
denied" are inappropriate, because we are not doing the actual thing.
Maybe we want to do a dry-run for the benefit of the user, but then we
should do the actual thing, like try to create a replication slot, or
whatever.  But I would rather just remove all this, it seems too
problematic.



If we remove all the checks then there is a possibility that we can
fail later while creating the actual subscription. For example, if
there are not sufficient max_replication_slots, then it is bound to
fail in the later steps which would be a costlier affair because by
that time the standby would have been promoted and the user won't have
any way to move forward but to re-create standby and then use this
tool again. I think here the patch tries to mimic pg_upgrade style
checks where we do some pre-checks.


I think checking for required parameter settings is fine.  My concern is 
with the code before that, that does 
pg_has_role/has_database_privilege/has_function_privilege.






Re: speed up a logical replica setup

2024-03-19 Thread Shlok Kyal
Hi,

> I'm attaching a new version (v30) that adds:
>
> * 3 new options (--publication, --subscription, --replication-slot) to assign
>   names to the objects. The --database option used to ignore duplicate names,
>   however, since these new options rely on the number of database options to
>   match the number of object name options, it is forbidden from now on. The
>   duplication is also forbidden for the object names to avoid errors earlier.
> * rewrite the paragraph related to unusable target server after
>   pg_createsubscriber fails.
> * Vignesh reported an issue [1] related to reaching the recovery stop point
>   before the consistent state is reached. I proposed a simple patch that fixes
>   the issue.
>
> [1] 
> https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com
>

I have added a top-up patch v30-0003. The issue in [1] still exists in
the v30 patch. I was not able to come up with an approach to handle it
in the code, so I have added it to the documentation in the warning
section. Thoughts?
I am not changing the version as I have not made any changes in
v30-0001 and v30-0002.

[1]: 
https://www.postgresql.org/message-id/cahv8rj+5mzk9jt+7ecogjzfm5czvdccd5jo1_rcx0bteypb...@mail.gmail.com



Thanks and regards,
Shlok Kyal


v30-0001-pg_createsubscriber-creates-a-new-logical-replic.patch
Description: Binary data


v30-0002-Stop-the-target-server-earlier.patch
Description: Binary data


v30-0003-Document-a-limitation-of-pg_createsubscriber.patch
Description: Binary data


Re: speed up a logical replica setup

2024-03-19 Thread Amit Kapila
On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut  wrote:
>
> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.
>

If we remove all the checks then there is a possibility that we can
fail later while creating the actual subscription. For example, if
there are not sufficient max_replication_slots, then it is bound to
fail in the later steps which would be a costlier affair because by
that time the standby would have been promoted and the user won't have
any way to move forward but to re-create standby and then use this
tool again. I think here the patch tries to mimic pg_upgrade style
checks where we do some pre-checks.

This raises a question in my mind: how are we expecting users to know
all these required values and configure them properly before using this
tool?  IIUC, we are expecting that the user should figure out the
appropriate values for max_replication_slots,
max_logical_replication_workers, etc. by querying the number of
databases in the primary. Then either stop the standby to change these
parameters or use ALTER SYSTEM depending on the required parameters.
Similarly, there are some config requirements (like max_wal_senders,
max_replication_slots) for the primary which would be difficult for
users to know as these are the tool's internal requirements.

The two possibilities that come to my mind are (a) pg_createsubscriber
should have some special mode/option using which the user can find out
all the required config settings, or (b) document how a user can find
the required settings. There are good chances of mistakes with option
(b).
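As a rough illustration of what option (b) would have to convey, a sketch of the kind of settings involved (the exact values and the +1 slack are assumptions for illustration, not the tool's documented requirements):

```
# subscriber (target), assuming N databases are specified:
max_replication_slots = N            # one subscription per database
max_logical_replication_workers = N  # one apply worker per subscription

# publisher (source):
max_replication_slots = N + 1        # one slot per database, plus slack
max_wal_senders = N + 1              # one walsender per slot, plus slack
```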

-- 
With Regards,
Amit Kapila.




RE: speed up a logical replica setup

2024-03-18 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for giving comments. I want to reply some of them.

> > 17.
> >
> > "specifies promote"
> >
> > We can do double-quote for the word promote.
> 
> The v30 patch has promote, which I think is adequate.

Oops. Actually I was looking at the v29 patch during my first review. Sorry for the noise.

> 
> > 20.
> >
> > New options must be documented as well. This helps not only users but
> > also reviewers.
> > (Sometimes we cannot identify whether the implementation is intentional or not.)
> 
> Which ones are missing?

In v29, documentation for the newly added options
(publication/subscription/replication-slot) was not added. Since it has been
added, please ignore.

> > 21.
> >
> > Also, not sure the specification is good. I preferred to specify them by 
> > format
> > string. Because it can reduce the number of arguments and I cannot find use
> cases
> > which user want to control the name of objects.
> >
> > However, your approach has a benefit which users can easily identify the
> generated
> > objects by pg_createsubscriber. How do other think?
> 
> I think listing them explicitly is better for the first version.  It's
> simpler to implement and more flexible.

OK.

> > 22.
> >
> > ```
> > #define BASE_OUTPUT_DIR
>   "pg_createsubscriber_output.d"
> > ```
> >
> > No one refers the define.
> 
> This is gone in v30.

I wrote that for the reason above. Please ignore...

> 
> > 31.
> >
> > ```
> > /* Create replication slot on publisher */
> > if (lsn)
> > pg_free(lsn);
> > ```
> >
> > I think allocating/freeing memory is not so efficient.
> > Can we add a flag to create_logical_replication_slot() for controlling the
> > returning value (NULL or duplicated string)? We can use the condition (i ==
> num_dbs-1)
> > as flag.
> 
> Nothing is even using the return value of
> create_logical_replication_slot().  I think this can be removed altogether.

> > 37.
> >
> > ```
> > /* Register a function to clean up objects in case of failure */
> > atexit(cleanup_objects_atexit);
> > ```
> >
> > Sorry if we have already discussed. I think the registration can be moved 
> > just
> > before the boot of the standby. Before that, the callback will be no-op.
> 
> But it can also stay where it is.  What is the advantage of moving it later?

I thought we could reduce the risk of bugs. Previously, some bugs were reported
because the registration was too early. However, this is not a strong opinion.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: speed up a logical replica setup

2024-03-18 Thread Alvaro Herrera
On 2024-Mar-18, Peter Eisentraut wrote:

> * src/bin/pg_basebackup/pg_createsubscriber.c
> 
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.

Maybe it would be easier to deal with this by passing around a struct
with keyword/value pairs that can be given to PQconnectdbParams (and
keeping dbname as a param that's passed separately, so that it can be
added when one is needed), instead of messing with the string conninfos;
then you don't have to worry about quoting.
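The keyword/value idea can be sketched roughly like this. This is a minimal, self-contained illustration, not code from the patch: the struct and helper names are hypothetical, and the actual PQconnectdbParams() call is left as a comment so the sketch stands alone without libpq. The point is that values such as a database name with spaces need no quoting at all when they are never serialized into a conninfo string.

```c
#include <assert.h>
#include <string.h>

#define MAX_PARAMS 8

/* Hypothetical container for connection parameters. */
struct ConnParams
{
	const char *keywords[MAX_PARAMS + 1];	/* NULL-terminated */
	const char *values[MAX_PARAMS + 1];
	int			n;
};

/* Append one keyword/value pair; no quoting or escaping required. */
static void
conn_params_add(struct ConnParams *cp, const char *kw, const char *val)
{
	assert(cp->n < MAX_PARAMS);
	cp->keywords[cp->n] = kw;
	cp->values[cp->n] = val;
	cp->n++;
	cp->keywords[cp->n] = NULL;	/* keep both arrays NULL-terminated */
	cp->values[cp->n] = NULL;
}

/*
 * With libpq, the arrays would be consumed as:
 *
 *     conn = PQconnectdbParams(cp.keywords, cp.values, 0);
 *
 * and dbname could be appended separately right before connecting.
 */
```
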

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."   (New York Times, about Microsoft PowerPoint)




Re: speed up a logical replica setup

2024-03-18 Thread Peter Eisentraut

On 16.03.24 16:42, Euler Taveira wrote:

I'm attaching a new version (v30) that adds:


I have some review comments and attached a patch with some smaller
fixups (mainly message wording and avoiding fixed-size string buffers).


* doc/src/sgml/ref/pg_createsubscriber.sgml

I would remove the "How It Works" section.  This is not relevant to
users, and it is very detailed and will require updating whenever the
implementation changes.  It could be a source code comment instead.

* src/bin/pg_basebackup/pg_createsubscriber.c

I think the connection string handling is not robust against funny
characters, like spaces, in database names etc.

Most SQL commands need to be amended for proper identifier or string
literal quoting and/or escaping.
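In pg_createsubscriber itself, libpq's PQescapeIdentifier() and PQescapeLiteral() are the right tools, since they also handle encoding considerations. As a self-contained illustration of the SQL identifier rule those functions implement — wrap the name in double quotes and double any embedded double quote — here is a hypothetical helper (a sketch, not the library's implementation):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified sketch of SQL identifier quoting: wrap in double quotes,
 * doubling any embedded double quote.  Returns a malloc'd string.
 */
static char *
quote_identifier_sketch(const char *name)
{
	size_t		len = strlen(name);
	char	   *out = malloc(2 * len + 3);	/* worst case: every char a quote */
	char	   *p = out;

	*p++ = '"';
	for (const char *s = name; *s; s++)
	{
		if (*s == '"')
			*p++ = '"';			/* double the embedded quote */
		*p++ = *s;
	}
	*p++ = '"';
	*p = '\0';
	return out;
}
```
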

In check_subscriber(): All these permissions checks seem problematic
to me.  We shouldn't reimplement our own copy of the server's
permission checks.  The server can check the permissions.  And if the
permission checking in the server ever changes, then we have
inconsistencies to take care of.  Also, the error messages "permission
denied" are inappropriate, because we are not doing the actual thing.
Maybe we want to do a dry-run for the benefit of the user, but then we
should do the actual thing, like try to create a replication slot, or
whatever.  But I would rather just remove all this, it seems too
problematic.

In main(): The first check if the standby is running is problematic.
I think it would be better to require that the standby is initially
shut down.  Consider, the standby might be running under systemd.
This tool will try to stop it, systemd will try to restart it.  Let's
avoid these kinds of battles.  It's also safer if we don't try to
touch running servers.

The -p option (--subscriber-port) doesn't seem to do anything.  In my
testing, it always uses the compiled-in default port.

Printing all the server log lines to the terminal doesn't seem very
user-friendly.  Not sure what to do about that, short of keeping a 
pg_upgrade-style directory of log files.  But it's ugly.


From ec8e6ed6c3325a6f9fde2d1632346e212ade9c9f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 18 Mar 2024 14:48:55 +0100
Subject: [PATCH] Various improvements

---
 src/bin/pg_basebackup/Makefile  |  2 +-
 src/bin/pg_basebackup/nls.mk|  1 +
 src/bin/pg_basebackup/pg_createsubscriber.c | 49 +
 3 files changed, 23 insertions(+), 29 deletions(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index e9a920dbcda..26c53e473f5 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -49,7 +49,7 @@ all: pg_basebackup pg_createsubscriber pg_receivewal 
pg_recvlogical
 pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport 
submake-libpgfeutils
$(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o 
$@$(X)
 
-pg_createsubscriber: $(WIN32RES) pg_createsubscriber.o | submake-libpq 
submake-libpgport submake-libpgfeutils
+pg_createsubscriber: pg_createsubscriber.o $(WIN32RES) | submake-libpq 
submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport 
submake-libpgfeutils
diff --git a/src/bin/pg_basebackup/nls.mk b/src/bin/pg_basebackup/nls.mk
index fc475003e8e..7870cea71ce 100644
--- a/src/bin/pg_basebackup/nls.mk
+++ b/src/bin/pg_basebackup/nls.mk
@@ -8,6 +8,7 @@ GETTEXT_FILES= $(FRONTEND_COMMON_GETTEXT_FILES) \
bbstreamer_tar.c \
bbstreamer_zstd.c \
pg_basebackup.c \
+   pg_createsubscriber.c \
pg_receivewal.c \
pg_recvlogical.c \
receivelog.c \
diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c 
b/src/bin/pg_basebackup/pg_createsubscriber.c
index e24ed7ef506..91c3a2f0036 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -43,7 +43,7 @@ struct CreateSubscriberOptions
SimpleStringList sub_names; /* list of subscription names */
SimpleStringList replslot_names;/* list of replication slot 
names */
int recovery_timeout;   /* stop recovery after 
this time */
-}  CreateSubscriberOptions;
+};
 
 struct LogicalRepInfo
 {
@@ -57,7 +57,7 @@ struct LogicalRepInfo
 
boolmade_replslot;  /* replication slot was created */
boolmade_publication;   /* publication was created */
-}  LogicalRepInfo;
+};
 
 static void cleanup_objects_atexit(void);
 static void usage();
@@ -155,9 +155,9 @@ cleanup_objects_atexit(void)
 */
if (recovery_ended)
{
-   pg_log_warning("pg_createsubscriber failed after the end of 
recovery");
-   pg_log_warning_hint("The target server 

Re: speed up a logical replica setup

2024-03-18 Thread Peter Eisentraut

On 18.03.24 06:43, Hayato Kuroda (Fujitsu) wrote:

02.

"The main difference between the logical replication setup and 
pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."


I think that change is worse.


09.

"The source server must accept connections from the target server. The source server 
must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in 
recovery."


I think the previous version is better.


17.

"specifies promote"

We can do double-quote for the word promote.


The v30 patch has promote, which I think is adequate.


19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"


I like the first one better.


20.

New options must also be documented. This helps not only users but also
reviewers.
(Sometimes we cannot tell whether an implementation detail is intentional or not.)


Which ones are missing?


21.

Also, I am not sure the specification is good. I would prefer to specify them
by a format string, because it reduces the number of arguments, and I cannot
find use cases in which users want to control the names of the objects.

However, your approach has the benefit that users can easily identify the
objects generated by pg_createsubscriber. What do others think?


I think listing them explicitly is better for the first version.  It's 
simpler to implement and more flexible.



22.

```
#define BASE_OUTPUT_DIR "pg_createsubscriber_output.d"
```

Nothing references this define.


This is gone in v30.


23.
  
```

}   CreateSubscriberOptions;
...
}   LogicalRepInfo;
```

The declarator names after the closing "}" are not needed, because we do not typedef these structs.


Yeah, this is actually wrong, because as it is written now, it defines 
global variables.
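As a tiny illustration (not from the patch) of why that trailing name matters: in C, a name after the closing brace defines an object of the struct type at file scope, in addition to declaring the type.

```c
#include <assert.h>

/* This form defines the type AND a file-scope variable named oops: */
struct BadExample
{
	int			field;
}			oops;

/* This form defines only the type, which is what was intended: */
struct GoodExample
{
	int			field;
};
```
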



22.

While looking at the function definitions, I found that some pointers are declared
as const, but others are not. E.g., "char *lsn" in setup_recovery() won't be
changed but is not declared const. Is the qualifier just missing, or is there another rule?


Yes, more could be done here.  I have attached a patch for this.  (This 
also requires the just-committed 48018f1d8c.)



24.

```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```

It is a bit strange that only subscriber_dir is a global variable. The callers require
the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
main(). So, how about making `CreateSubscriberOptions opt` a global one?


Fewer global variables seem preferable.  Only make global as needed.


30.

```
if (num_replslots == 0)
dbinfo[i].replslotname = pg_strdup(genname);
```

I think the straightforward way is to use the name of the subscription if no name
is specified. This follows the rule for CREATE SUBSCRIPTION.


right


31.

```
/* Create replication slot on publisher */
if (lsn)
pg_free(lsn);
```

I think repeatedly allocating and freeing this memory is not very efficient.
Can we add a flag to create_logical_replication_slot() to control the
return value (NULL or a duplicated string)? We could use the condition
(i == num_dbs - 1) as the flag.


Nothing is even using the return value of 
create_logical_replication_slot().  I think this can be removed altogether.



34.

```
{"config-file", required_argument, NULL, 1},
{"publication", required_argument, NULL, 2},
{"replication-slot", required_argument, NULL, 3},
{"subscription", required_argument, NULL, 4},
```

The ordering looks strange to me. Per pg_upgrade and pg_basebackup, options
that have no short form are listed after the others.

35.

```
opt.sub_port = palloc(16);
```

Per other lines, pg_malloc() should be used.


Even better psprintf().
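psprintf() formats into a freshly allocated buffer of exactly the right size, so no fixed-size guess like palloc(16) is needed. A simplified, self-contained sketch of the idea follows (the real implementation lives in src/common/psprintf.c and uses the frontend/backend allocators; this standalone version uses malloc):

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Simplified sketch of what psprintf() does: measure the formatted
 * length, allocate exactly that much, then format into the buffer.
 */
static char *
psprintf_sketch(const char *fmt, ...)
{
	va_list		args;
	int			len;
	char	   *buf;

	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);	/* measure only */
	va_end(args);

	buf = malloc(len + 1);
	va_start(args, fmt);
	vsnprintf(buf, len + 1, fmt, args);		/* actually format */
	va_end(args);
	return buf;
}
```

So the fixed-size buffer (and the guess about how large a port string can be) disappears entirely.
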


37.

```
/* Register a function to clean up objects in case of failure */
atexit(cleanup_objects_atexit);
```

Sorry if we have already discussed this. I think the registration can be moved to just
before the boot of the standby. Before that, the callback will be a no-op.


But it can also stay where it is.  What is the advantage of moving it later?


40.

```
 * XXX this code was extracted from BootStrapXLOG().
```

So, can we extract the common part somewhere? Since the system identifier is
related to the control data file, I think it could live in controldata_utils.c.


Let's leave it as is for this PG release.
From d951a9f186b2162aa241f7554908b236c718154f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 18 Mar 2024 11:54:03 +0100
Subject: [PATCH] fixup! pg_createsubscriber: creates a new logical replica
 from a standby server

Add const decorations.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 54 ++---
 1 file changed, 27 insertions(+), 27 

Re: speed up a logical replica setup

2024-03-18 Thread Peter Eisentraut

On 18.03.24 08:18, vignesh C wrote:

1) The maximum size of an object name is 64; we can add a check so that
we don't specify more than the maximum allowed length:
+ case 3:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_replslots++;
+ }
+ else
+ {
+ pg_log_error("duplicate replication slot \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
  ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
In this case creation of replication slot will fail:
pg_createsubscriber: error: could not create replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
database "db2": ERROR:  replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
already exists


I think this is fine.  The server can check whether the names it is 
given are of the right size.  We don't need to check it again in the client.



2) Similarly here too:
+ case 4:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_subs++;
+ }
+ else
+ {
+ pg_log_error("duplicate subscription \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3

Subscriptions will be created with the same truncated name, and later there
will be a problem when setting the replication progress, as there will be
multiple subscriptions with the same name.


Could you clarify this problem?





Re: speed up a logical replica setup

2024-03-18 Thread vignesh C
On Sat, 16 Mar 2024 at 21:13, Euler Taveira  wrote:
>
> On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
>
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.
>
>
> Yes. I was wondering to implement after v1 is pushed. I started to write a 
> code
> for it but I wasn't sure about the UI. The best approach I came up with was
> multiple options in the same order. (I don't provide short options to avoid
> possible portability issues with the order.) It means if I have 3 databases 
> and
> the following command-line:
>
> pg_createsubscriber ... --database pg1 --database pg2 --database pg3
> --publication pubx --publication puby --publication pubz
>
> pubx, puby and pubz are created in the database pg1, pg2, and pg3 
> respectively.
>
> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?
>
>
> If it happens after creating a publication and before promotion, the cleanup
> routine will drop the publications on primary and it will eventually be 
> applied
> to the standby via replication later.
>
> I see the below note in the patch:
> +If pg_createsubscriber fails while processing,
> +then the data directory is likely not in a state that can be recovered. 
> It
> +is true if the target server was promoted. In such a case, creating a new
> +standby server is recommended.
>
> By reading this it is not completely clear whether the standby is not
> recoverable in case of any error or only an error after the target
> server is promoted. If others agree with this behavior then we should
> write the detailed reason for this somewhere in the comments as well
> unless it is already explained.
>
>
> I rewrote the sentence to make it clear that only if the server is promoted,
> the target server will be in a state that cannot be reused. It provides a
> message saying it too.
>
> pg_createsubscriber: target server reached the consistent state
> pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
> must recreate the physical replica before continuing.
>
> I'm attaching a new version (v30) that adds:
>
> * 3 new options (--publication, --subscription, --replication-slot) to assign
>   names to the objects. The --database option used to ignore duplicate names,
>   however, since these new options rely on the number of database options to
>   match the number of object name options, it is forbidden from now on. The
>   duplication is also forbidden for the object names to avoid errors earlier.
> * rewrite the paragraph related to the unusable target server after
>   pg_createsubscriber fails.

1) The maximum size of an object name is 64; we can add a check so that
we don't specify more than the maximum allowed length:
+ case 3:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_replslots++;
+ }
+ else
+ {
+ pg_log_error("duplicate replication slot \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
 ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
In this case creation of replication slot will fail:
pg_createsubscriber: error: could not create replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
database "db2": ERROR:  replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
already exists

2) Similarly here too:
+ case 4:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_subs++;
+ }
+ else
+ {
+ pg_log_error("duplicate subscription \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2

RE: speed up a logical replica setup

2024-03-17 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

Thanks for updating the patch. Here are my comments.
I used Grammarly to proofread sentences.
(The tool strongly recommends to use active voice, but I can ignore for now)

01.

"After a successful run, the state of the target server is analagous to a fresh
logical replication setup."
s/analagous/analogous/

02.

"The main difference between the logical replication setup and 
pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."

03.

"Only the synchronization phase is done, which ensures each table is brought up
to a synchronized state."

This sentence is not very clear to me. How about:
"pg_createsubscriber does only the synchronization phase, ensuring each table's
replication state is ready."

04.

"The pg_createsubscriber targets large database systems because most of the
execution time is spent making the initial data copy."

Hmm, but the initial data sync by logical replication also spends most of its
time making the initial data copy. IIUC the bottlenecks are a) this application
must stop and start the server several times, and b) only serial copy works.
Can you clarify?

05.

It would be better to state the internal difference between pg_createsubscriber
and the initial sync done by logical replication. For example:
pg_createsubscriber uses the physical replication mechanism to ensure the
standby catches up to a certain point. Then, it converts the standby to a
subscriber by promoting it and creating subscriptions.
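The two phases above could be sketched, very roughly, as the following manual equivalent (a hedged illustration only — not the tool's exact commands; the LSN, host, and object names are placeholders):

```sql
-- Catch-up phase, via recovery settings on the standby:
--   recovery_target_lsn    = '<LSN of the last slot created on the primary>'
--   recovery_target_action = 'promote'

-- After promotion, on the new subscriber, for each database: attach to the
-- slot already created on the primary, without copying data again.
CREATE SUBSCRIPTION sub1
  CONNECTION 'host=primary dbname=db1'
  PUBLICATION pub1
  WITH (create_slot = false, copy_data = false, enabled = false);
```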

06.

"If these are not met an error will be reported."

Grammarly suggests:
"If these are not met, an error will be reported."

07.

"The given target data directory must have the same system identifier than the
source data directory."

Grammarly suggests:
"The given target data directory must have the same system identifier as the
source data directory."

08.

"If a standby server is running on the target data directory or it is a base
backup from the source data directory, system identifiers are the same."

This line is not needed if bullet style is not used. The line is just a
supplement, not a prerequisite.

09.

"The source server must accept connections from the target server. The source 
server must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in 
recovery."

10.

"Publications cannot be created in a read-only cluster."

Same as 08, this line is not needed if bullet-style is not used.

11.

"pg_createsubscriber usually starts the target server with different connection
settings during the transformation steps. Hence, connections to target server
might fail."
 
Grammarly suggests:
"pg_createsubscriber usually starts the target server with different connection
settings during transformation. Hence, connections to the target server might 
fail."

12.

"During the recovery process,"

Grammarly suggests:
"During recovery,"

13.

"replicated so an error would occur."

Grammarly suggests:
"replicated, so an error would occur."

14.

"It would avoid situations in which WAL files from the source server might be
used by the target server."

Grammarly suggests:
"It would avoid situations in which the target server might use WAL files from
the source server."

15.

"a LSN"

s/a/an/

16.

"of write-ahead"

s/of/of the/

17.

"specifies promote"

We can do double-quote for the word promote.

18.

"are also added so it avoids"

Grammarly suggests:
"are added to avoid"

19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"

20.

New options must also be documented. This helps not only users but also
reviewers.
(Sometimes we cannot tell whether an implementation detail is intentional or not.)

21.

Also, I am not sure the specification is good. I would prefer to specify them
by a format string, because it reduces the number of arguments, and I cannot
find use cases in which users want to control the names of the objects.

However, your approach has the benefit that users can easily identify the
objects generated by pg_createsubscriber. What do others think?

22.

```
#define BASE_OUTPUT_DIR "pg_createsubscriber_output.d"
```

Nothing references this define.

23.
 
```
}   CreateSubscriberOptions;
...
}   LogicalRepInfo;
```

The declarator names after the closing "}" are not needed, because we do not typedef these structs.

22.

While looking at the function definitions, I found that some pointers are declared
as const, but others are not. E.g., "char *lsn" in setup_recovery() won't be
changed but is not declared const. Is the qualifier just missing, or is there another rule?

23.

```
static int  num_dbs = 0;
static int  num_pubs = 0;
static int  num_subs = 0;
static int  num_replslots = 0;
```

I think the names are a bit confusing. The number of generated
publications/subscriptions/replication slots is always the same as the number
of databases. They just indicate the
Re: speed up a logical replica setup

2024-03-17 Thread Amit Kapila
On Mon, Mar 18, 2024 at 9:37 AM Amit Kapila  wrote:
>
> On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira  wrote:
> >
> > On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
> >
> > Did you consider adding options for publication/subscription/slot
> > names as mentioned in my previous email? As discussed in a few emails
> > above, it would be quite confusing for users to identify the logical
> > replication objects once the standby is converted to subscriber.
> >
> >
> > Yes. I was wondering to implement after v1 is pushed. I started to write a 
> > code
> > for it but I wasn't sure about the UI. The best approach I came up with was
> > multiple options in the same order. (I don't provide short options to avoid
> > possible portability issues with the order.) It means if I have 3 databases 
> > and
> > the following command-line:
> >
> > pg_createsubscriber ... --database pg1 --database pg2 --database pg3
> > --publication pubx --publication puby --publication pubz
> >
> > pubx, puby and pubz are created in the database pg1, pg2, and pg3 
> > respectively.
> >
>
> With this syntax, you want to give the user the option to specify
> publication/subscription/slot name for each database? If so, won't it
> be too much verbose?
>
> > It seems we care only for publications created on the primary. Isn't
> > it possible that some of the publications have been replicated to
> > standby by that time, for example, in case failure happens after
> > creating a few publications? IIUC, we don't care for standby cleanup
> > after failure because it can't be used for streaming replication
> > anymore. So, the only choice the user has is to recreate the standby
> > and start the pg_createsubscriber again. This sounds questionable to
> > me as to whether users would like this behavior. Does anyone else have
> > an opinion on this point?
> >
> >
> > If it happens after creating a publication and before promotion, the cleanup
> > routine will drop the publications on primary and it will eventually be 
> > applied
> > to the standby via replication later.
> >
>
> Do you mean to say that the next time if user uses
> pg_createsubscriber, it will be ensured that all the prior WAL will be
> replicated? I think we need to ensure that after the restart of this
> tool and before it attempts to create the publications again, the WAL
> of the previous drop has to be replayed.
>

On further thinking, the WAL for such dropped publications should get
replayed eventually before the WAL for new publications (the
publications which will be created after restart) unless the required
WAL is removed on primary due to some reason.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-17 Thread Amit Kapila
On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira  wrote:
>
> On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
>
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.
>
>
> Yes. I was wondering to implement after v1 is pushed. I started to write a 
> code
> for it but I wasn't sure about the UI. The best approach I came up with was
> multiple options in the same order. (I don't provide short options to avoid
> possible portability issues with the order.) It means if I have 3 databases 
> and
> the following command-line:
>
> pg_createsubscriber ... --database pg1 --database pg2 --database pg3
> --publication pubx --publication puby --publication pubz
>
> pubx, puby and pubz are created in the database pg1, pg2, and pg3 
> respectively.
>

With this syntax, you want to give the user the option to specify
publication/subscription/slot name for each database? If so, won't it
be too much verbose?

> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?
>
>
> If it happens after creating a publication and before promotion, the cleanup
> routine will drop the publications on primary and it will eventually be 
> applied
> to the standby via replication later.
>

Do you mean to say that the next time if user uses
pg_createsubscriber, it will be ensured that all the prior WAL will be
replicated? I think we need to ensure that after the restart of this
tool and before it attempts to create the publications again, the WAL
of the previous drop has to be replayed.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-16 Thread vignesh C
On Sat, 16 Mar 2024 at 21:16, Euler Taveira  wrote:
>
> On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote:
>
> I was able to reproduce this random failure and found the following reason:
> the minimum recovery ending location (0/500) was greater than the specified
> recovery_target_lsn ("0/4001198"). In a few random cases the standby applies
> a few more WAL records after the replication slot is created; this leads to
> the minimum recovery ending location being greater than the
> recovery_target_lsn, because of which the server will fail with:
> FATAL:  requested recovery stop point is before consistent recovery point
>
>
> Thanks for checking. I proposed an alternative patch for it [1]. Can you check
> it?

This approach looks good to me.

Regards,
Vignesh




Re: speed up a logical replica setup

2024-03-16 Thread Euler Taveira
On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote:
> I was able to reproduce this random failure and found the following reason:
> the minimum recovery ending location (0/500) was greater than the specified
> recovery_target_lsn ("0/4001198"). In a few random cases the standby applies
> a few more WAL records after the replication slot is created; this leads to
> the minimum recovery ending location being greater than the
> recovery_target_lsn, because of which the server will fail with:
> FATAL:  requested recovery stop point is before consistent recovery point

Thanks for checking. I proposed an alternative patch for it [1]. Can you check
it?

[1] 
https://www.postgresql.org/message-id/34637e7f-0330-420d-8f45-1d022962d2fe%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-16 Thread Euler Taveira
On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.

Yes. I was planning to implement it after v1 is pushed. I started to write code
for it, but I wasn't sure about the UI. The best approach I came up with was
multiple options in the same order. (I don't provide short options, to avoid
possible portability issues with the order.) It means if I have 3 databases and
the following command line:

pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication
pubx --publication puby --publication pubz

pubx, puby and pubz are created in databases pg1, pg2, and pg3, respectively.

> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?

If it happens after creating a publication and before promotion, the cleanup
routine will drop the publications on primary and it will eventually be applied
to the standby via replication later.

> I see the below note in the patch:
> +If pg_createsubscriber fails while processing,
> +then the data directory is likely not in a state that can be recovered. 
> It
> +is true if the target server was promoted. In such a case, creating a new
> +standby server is recommended.
> 
> By reading this it is not completely clear whether the standby is not
> recoverable in case of any error or only an error after the target
> server is promoted. If others agree with this behavior then we should
> write the detailed reason for this somewhere in the comments as well
> unless it is already explained.

I rewrote the sentence to make it clear that the target server will be in a
state that cannot be reused only if the server is promoted. The tool prints a
message saying so, too.

pg_createsubscriber: target server reached the consistent state
pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
must recreate the physical replica before continuing.

I'm attaching a new version (v30) that adds:

* 3 new options (--publication, --subscription, --replication-slot) to assign
  names to the objects. The --database option used to ignore duplicate names;
  however, since these new options rely on the number of database options
  matching the number of object-name options, duplicates are now forbidden.
  Duplication is also forbidden for the object names, to catch errors earlier.
* rewrite the paragraph related to unusuable target server after
  pg_createsubscriber fails.
* Vignesh reported an issue [1] related to reaching the recovery stop point
  before the consistent state is reached. I proposed a simple patch that fixes
  the issue.
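For illustration, here is a sketch of how the new options might be combined.
The option names come from the list above, while the paths, host, and ports
are hypothetical; the pairing with --database is positional, which is why
duplicate names are now rejected:

```shell
# Each --publication/--subscription/--replication-slot value pairs up,
# in order, with the corresponding --database value.
pg_createsubscriber \
    --pgdata /var/lib/postgres/standby \
    --publisher-server 'host=primary.example.com port=5432' \
    --socket-directory /tmp --subscriber-port 5433 \
    --database db1 --publication pub1 --subscription sub1 --replication-slot slot1 \
    --database db2 --publication pub2 --subscription sub2 --replication-slot slot2
```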

[1] 
https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v30-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


v30-0002-Stop-the-target-server-earlier.patch.gz
Description: application/gzip


Re: speed up a logical replica setup

2024-03-16 Thread vignesh C
On Mon, 11 Mar 2024 at 10:33, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> Thanks for updating the patch, but cfbot still got angry [1].
> Note that two containers (autoconf and meson) failed at different places,
> so I do not think it is a random failure. It seems that there may be a bug
> related to the 32-bit build.
>
> We should investigate and fix it as soon as possible.

I was able to reproduce this random failure and found the following cause:
the minimum recovery ending location (0/500) was greater than the specified
recovery_target_lsn ("0/4001198"). In a few random cases the standby applies
a few more WAL records after the replication slot is created; this leads to
the minimum recovery ending location being greater than the
recovery_target_lsn, because of which the server fails with:
FATAL:  requested recovery stop point is before consistent recovery point

I have fixed it by pausing replay on the standby server before the
replication slots are created. The attached
v29-0002-Keep-standby-server-s-minimum-recovery-point-les.patch contains
these changes.
Thoughts?
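In SQL terms, the fix amounts to running something like the following on the
standby before the replication slots are created. This is only a sketch
wrapped in shell; the connection string is hypothetical, and the actual patch
additionally checks EXECUTE privileges on both functions:

```shell
STANDBY="host=/tmp port=5433 dbname=pg1"   # hypothetical connection settings

# Pause WAL replay so the standby's minimum recovery point cannot advance
# past the recovery_target_lsn that will later be set for promotion.
psql "$STANDBY" -c "SELECT pg_catalog.pg_wal_replay_pause()"

# The pause only takes effect once the startup process notices it, so poll
# pg_get_wal_replay_pause_state() until it reports 'paused'.
until [ "$(psql -At "$STANDBY" -c \
    "SELECT pg_catalog.pg_get_wal_replay_pause_state()")" = paused ]; do
    sleep 1
done
```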

Regards,
Vignesh
From e98e1ba0661e3d658de8d46ff9082f6cd4040b41 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sat, 16 Mar 2024 18:47:35 +0530
Subject: [PATCH v29 2/2] Keep standby server's minimum recovery point less
 than the consistent_lsn.

Standby server should not get ahead of the replication slot's lsn. If
this happens we will not be able to promote the standby server as it
will not be able to reach the consistent point because the standby
server's Minimum recovery ending location will be greater than the
consistent_lsn. Fixed this by pausing the replay before the replication
slots are created.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 93 -
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index c565f8524a..18e8757403 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -95,6 +95,8 @@ static void drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo);
 static void create_subscription(PGconn *conn, struct LogicalRepInfo *dbinfo);
 static void set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo,
 	 const char *lsn);
+static void pause_replay_stantby_server(struct LogicalRepInfo *dbinfo,
+		struct CreateSubscriberOptions *opt);
 static void enable_subscription(PGconn *conn, struct LogicalRepInfo *dbinfo);
 
 #define	USEC_PER_SEC	1000000
@@ -917,7 +919,9 @@ check_subscriber(struct LogicalRepInfo *dbinfo)
 	appendPQExpBuffer(str,
 	  "SELECT pg_catalog.pg_has_role(current_user, %u, 'MEMBER'), "
 	  "pg_catalog.has_database_privilege(current_user, '%s', 'CREATE'), "
-	  "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_replication_origin_advance(text, pg_lsn)', 'EXECUTE')",
+	  "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_replication_origin_advance(text, pg_lsn)', 'EXECUTE'), "
+	  "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_wal_replay_pause()', 'EXECUTE'), "
+	  "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_get_wal_replay_pause_state()', 'EXECUTE')",
 	  ROLE_PG_CREATE_SUBSCRIPTION, dbinfo[0].dbname);
 
 	pg_log_debug("command is: %s", str->data);
@@ -949,6 +953,18 @@ check_subscriber(struct LogicalRepInfo *dbinfo)
 	 "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
 		failed = true;
 	}
+	if (strcmp(PQgetvalue(res, 0, 3), "t") != 0)
+	{
+		pg_log_error("permission denied for function \"%s\"",
+	 "pg_catalog.pg_wal_replay_pause()");
+		failed = true;
+	}
+	if (strcmp(PQgetvalue(res, 0, 4), "t") != 0)
+	{
+		pg_log_error("permission denied for function \"%s\"",
+	 "pg_catalog.pg_get_wal_replay_pause_state()");
+		failed = true;
+	}
 
 	destroyPQExpBuffer(str);
 	PQclear(res);
@@ -1026,6 +1042,72 @@ check_subscriber(struct LogicalRepInfo *dbinfo)
 		exit(1);
 }
 
+/*
+ * Pause replaying at the standby server.
+ */
+static void
+pause_replay_stantby_server(struct LogicalRepInfo *dbinfo,
+			struct CreateSubscriberOptions *opt)
+{
+	PGconn	   *conn;
+	PGresult   *res;
+
+	/* Connect to subscriber. */
+	conn = connect_database(dbinfo[0].subconninfo, true);
+
+	pg_log_info("Pausing the replay in standby server");
+	pg_log_debug("command is: SELECT pg_catalog.pg_wal_replay_pause()");
+
+	if (!dry_run)
+	{
+		int			timer = 0;
+
+		res = PQexec(conn, "SELECT pg_catalog.pg_wal_replay_pause()");
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		{
+			pg_log_error("could not pause replay in standby server: %s",
+		 PQresultErrorMessage(res));
+			disconnect_database(conn, true);
+		}
+		PQclear(res);
+
+		/* Wait till replay has paused */
+		for(;;)
+		{
+			pg_log_debug("command is: SELECT pg_catalog.pg_get_wal_replay_pause_state()");
+			res = PQexec(conn, "SELECT 

Re: speed up a logical replica setup

2024-03-15 Thread Amit Kapila
On Fri, Mar 15, 2024 at 9:23 AM Euler Taveira  wrote:
>

Did you consider adding options for publication/subscription/slot
names as mentioned in my previous email? As discussed in a few emails
above, it would be quite confusing for users to identify the logical
replication objects once the standby is converted to subscriber.

*
+static void
+cleanup_objects_atexit(void)
{
...
conn = connect_database(dbinfo[i].pubconninfo, false);
+ if (conn != NULL)
+ {
+ if (dbinfo[i].made_publication)
+ drop_publication(conn, &dbinfo[i]);
+ if (dbinfo[i].made_replslot)
+ drop_replication_slot(conn, &dbinfo[i], dbinfo[i].subname);
+ disconnect_database(conn, false);
+ }
+ else
+ {
+ /*
+ * If a connection could not be established, inform the user
+ * that some objects were left on primary and should be
+ * removed before trying again.
+ */
+ if (dbinfo[i].made_publication)
+ {
+ pg_log_warning("There might be a publication \"%s\" in database
\"%s\" on primary",
+dbinfo[i].pubname, dbinfo[i].dbname);
+ pg_log_warning_hint("Consider dropping this publication before
trying again.");
+ }

It seems we care only for publications created on the primary. Isn't
it possible that some of the publications have been replicated to
standby by that time, for example, in case failure happens after
creating a few publications? IIUC, we don't care for standby cleanup
after failure because it can't be used for streaming replication
anymore. So, the only choice the user has is to recreate the standby
and start the pg_createsubscriber again. This sounds questionable to
me as to whether users would like this behavior. Does anyone else have
an opinion on this point?

I see the below note in the patch:
+If pg_createsubscriber fails while processing,
+then the data directory is likely not in a state that can be recovered. It
+is true if the target server was promoted. In such a case, creating a new
+standby server is recommended.

By reading this it is not completely clear whether the standby is not
recoverable in case of any error or only an error after the target
server is promoted. If others agree with this behavior then we should
write the detailed reason for this somewhere in the comments as well
unless it is already explained.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-14 Thread Euler Taveira
On Wed, Mar 13, 2024, at 10:09 AM, Shlok Kyal wrote:
> Added a top-up patch v28-0005 to fix this issue.
> I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks for your review!

I'm posting a new patch (v29) that merges the previous patches (v28-0002 and
v28-0003). I applied the fix provided by Hayato [1]. It was an oversight during
a rebase. I also included the patch proposed by Shlok [2] that stops the target
server on error if it is running.

Tomas suggested in [3] that the PID should perhaps be replaced with something
that has more entropy. Instead of the PID, the tool now uses a random number
for the replication slot and subscription names. There was also a concern
about converting multiple standbys that would end up with the same publication
name, so the same random number is appended to the publication name to avoid
failing because the publication already exists. The documentation was changed
based on Tomas's feedback.

The user name was always included in the subscriber connection string. Let's
have libpq choose it instead. While at it, a new routine (get_sub_conninfo)
now contains the code to build the subscriber connection string.

As I said in [4], there wasn't a way to specify a different configuration
file. If your cluster has a postgresql.conf outside PGDATA, pg_createsubscriber
fails when it starts the server. The new --config-file option lets you specify
the postgresql.conf location so the server starts just fine.

I also made some changes in the start_standby_server routine: I replaced
strcat and snprintf with appendPQExpBuffer, which is now used to build the
pg_ctl command.


[1] 
https://www.postgresql.org/message-id/TYCPR01MB12077FD21BB186C5A685C0BF3F52A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2] 
https://www.postgresql.org/message-id/CANhcyEW6-dH28gLbFc5XpDTJ6JPizU%2Bt5g-aKUWJBf5W_Zriqw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[4] 
https://www.postgresql.org/message-id/d898faad-f6d7-4b0d-b816-b9dcdf490685%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v29-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


Re: speed up a logical replica setup

2024-03-13 Thread Euler Taveira
On Fri, Mar 8, 2024, at 6:44 AM, Hayato Kuroda (Fujitsu) wrote:
> Hmm, right. I considered the below improvements. Tomas and Euler, what do
> you think?

I'm posting a new patchset v28.

I changed the way the check function works. From a usability perspective, it
is better to test all conditions and report all errors (if any) at once. This
avoids multiple executions in dry run mode just to figure out all of the
issues in the initial phase. I also included tests for it using Shlok's idea
[1], although I didn't use v27-0004.

Shlok [1] reported that it was failing on Windows since the socket-directory
option was added. I added a fix for it.

Tomas pointed out that the documentation [2] does not provide a good
high-level explanation of pg_createsubscriber. I expanded the Description
section and moved the prerequisites to the Notes section. The prerequisites
were grouped into target and source conditions, each in its own paragraph,
instead of using a list. This seems more in line with the style of some
other applications.

As I said in a previous email [3], I removed the retain option.


[1] 
https://www.postgresql.org/message-id/canhcyeu4q3dwh9ax9bpojcm4ebbhyfenegoaz8xfgyjmcpz...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[3] 
https://www.postgresql.org/message-id/e390e35e-508e-4eb8-92e4-e6b066407a41%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v28-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


v28-0002-Use-last-replication-slot-position-as-replicatio.patch.gz
Description: application/gzip


v28-0003-port-replace-int-with-string.patch.gz
Description: application/gzip


Re: speed up a logical replica setup

2024-03-12 Thread Amit Kapila
On Mon, Mar 11, 2024 at 9:42 AM vignesh C  wrote:
>
> On Sat, 9 Mar 2024 at 00:56, Tomas Vondra  
> wrote:
> >
> > >> 5) slot / publication / subscription name
> > >>
> > >> I find it somewhat annoying it's not possible to specify names for
> > >> objects created by the tool - replication slots, publication and
> > >> subscriptions. If this is meant to be a replica running for a while,
> > >> after a while I'll have no idea what pg_createsubscriber_569853 or
> > >> pg_createsubscriber_459548_2348239 was meant for.
> > >>
> > >> This is particularly annoying because renaming these objects later is
> > >> either not supported at all (e.g. for replication slots), or may be
> > >> quite difficult (e.g. publications).
> > >>
> > >> I do realize there are challenges with custom names (say, if there are
> > >> multiple databases to replicate), but can't we support some simple
> > >> formatting with basic placeholders? So we could specify
> > >>
> > >> --slot-name "myslot_%d_%p"
> > >>
> > >> or something like that?
> > >
> > > Not sure we can do in the first version, but looks nice. One concern is 
> > > that I
> > > cannot find applications which accepts escape strings like 
> > > log_line_prefix.
> > > (It may just because we do not have use-case.) Do you know examples?
> > >
> >
> > I can't think of a tool already doing that, but I think that's simply
> > because it was not needed. Why should we be concerned about this?
> >
>
> +1 to handle this.
> Currently, a) publication name = pg_createsubscriber_%u, where %u is the
> database oid; b) replication slot name = pg_createsubscriber_%u_%d, where
> %u is the database oid and %d is the pid; and c) subscription name =
> pg_createsubscriber_%u_%d, where %u is the database oid and %d is the pid.
> How about we have a non-mandatory option like
> --prefix_object_name=mysetup1, which will create a) publication name =
> mysetup1_pg_createsubscriber_%u; b) replication slot name =
> mysetup1_pg_createsubscriber_%u (here the pid is also removed); and c)
> subscription name = mysetup1_pg_createsubscriber_%u (here the pid is also
> removed).
>
> In the default case where the user does not specify
> --prefix_object_name the object names will be created without any
> prefix names.
>

Tomas's idea is better in terms of usability. So, we should instead
have three switches --slot-name, --publication-name, and
--subscriber-name, with some provision for appending dbids and a
unique identifier for the standby. The unique identifier can help in
creating multiple subscribers from different standbys.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-12 Thread Amit Kapila
On Thu, Mar 7, 2024 at 10:44 PM Tomas Vondra
 wrote:
>
> 4) Is FOR ALL TABLES a good idea?
>
> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> sure it won't work for a number of use cases. I know in large databases
> it's common to create "work tables" (not necessarily temporary) as part
> of a batch job, but there's no need to replicate those tables.
>
> AFAIK that'd break this FOR ALL TABLES publication, because the tables
> will qualify for replication, but won't be present on the subscriber. Or
> did I miss something?
>

As the subscriber is created from a standby, all the tables should be
present, at least initially, during and after creating the subscriber.
Users are later free to modify the publications/subscriptions.
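For context, the publication the tool creates is equivalent to the statement
below, and since a FOR ALL TABLES publication cannot be narrowed in place,
users who want to exclude work tables would replace it afterwards. Names and
tables here are illustrative; the real publication name follows the
pg_createsubscriber_<dboid> pattern mentioned elsewhere in the thread:

```shell
# Equivalent of what the tool creates on the source database:
psql -d db1 -c "CREATE PUBLICATION pg_createsubscriber_16384 FOR ALL TABLES"

# ALTER PUBLICATION ... DROP TABLE is not allowed for FOR ALL TABLES
# publications, so narrowing means replacing it with an explicit list
# (and pointing the subscription at the new publication afterwards):
psql -d db1 -c "DROP PUBLICATION pg_createsubscriber_16384"
psql -d db1 -c "CREATE PUBLICATION app_pub FOR TABLE t1, t2"
```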

> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> it may be an acceptable limitation, but maybe it'd be good to also
> support restricting which tables should be replicated (e.g. blacklist or
> whitelist based on table/schema name?).
>

This would be useful, but OTOH could also be enhanced in a later
version unless we think it is a must for the first version.

-- 
With Regards,
Amit Kapila.




RE: speed up a logical replica setup

2024-03-10 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for updating the patch, but cfbot still got angry [1].
Note that two containers (autoconf and meson) failed at different places,
so I do not think it is a random failure. It seems that there may be a bug
related to the 32-bit build.

We should investigate and fix it as soon as possible.

[1]: http://cfbot.cputube.org/highlights/all.html#4637

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: speed up a logical replica setup

2024-03-10 Thread vignesh C
On Sat, 9 Mar 2024 at 00:56, Tomas Vondra  wrote:
>
>
>
> On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> > Dear Tomas, Euler,
> >
> > Thanks for starting to read the thread! Since I'm not an original author,
> > I want to reply partially.
> >
> >> I decided to take a quick look on this patch today, to see how it works
> >> and do some simple tests. I've only started to get familiar with it, so
> >> I have only some comments / questions regarding usage, not on the code.
> >> It's quite possible I didn't understand some finer points, or maybe it
> >> was already discussed earlier in this very long thread, so please feel
> >> free to push back or point me to the past discussion.
> >>
> >> Also, some of this is rather opinionated, but considering I didn't see
> >> this patch before, my opinions may easily be wrong ...
> >
> > I felt your comments were quite valuable.
> >
> >> 1) SGML docs
> >>
> >> It seems the SGML docs are more about explaining how this works on the
> >> inside, rather than how to use the tool. Maybe that's intentional, but
> >> as someone who didn't work with pg_createsubscriber before I found it
> >> confusing and not very helpful.
> >>
> >> For example, the first half of the page is prerequisites+warning, and
> >> sure those are useful details, but prerequisites are checked by the
> >> tool (so I can't really miss this) and warnings go into a lot of details
> >> about different places where things may go wrong. Sure, worth knowing
> >> and including in the docs, but maybe not right at the beginning, before
> >> I learn how to even run the tool?
> >
> > Hmm, right. I considered the below improvements. Tomas and Euler, what do
> > you think?
> >
> > * Adds more descriptions in the "Description" section.
> > * Moves prerequisites+warning to the "Notes" section.
> > * Adds a "Usage" section which describes setup starting from a single node.
> >
> >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> >> sure it won't work for a number of use cases. I know in large databases
> >> it's common to create "work tables" (not necessarily temporary) as part
> >> of a batch job, but there's no need to replicate those tables.
> >
> > Indeed, the documentation does not describe that all tables in the database
> > would be included in the publication.
> >
> >> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> >> it may be an acceptable limitation, but maybe it'd be good to also
> >> support restricting which tables should be replicated (e.g. blacklist or
> >> whitelist based on table/schema name?).
> >
> > It may not be directly related, but we considered accepting options as a
> > next step [1].
> >
> >> Note: I now realize this might fall under the warning about DDL, which
> >> says this:
> >>
> >> Executing DDL commands on the source server while running
> >> pg_createsubscriber is not recommended. If the target server has
> >> already been converted to logical replica, the DDL commands must
> >> not be replicated so an error would occur.
> >
> > Yeah, they would not be replicated, but that does not lead to an ERROR.
> > So should we say something like "Creating tables on the source server..."?
> >
>
> Perhaps. Clarifying the docs would help, but it depends on the wording.
> For example, I doubt this should talk about "creating tables" because
> there are other DDL that (probably) could cause issues (like adding a
> column to the table, or something like that).
>
> >> 5) slot / publication / subscription name
> >>
> >> I find it somewhat annoying it's not possible to specify names for
> >> objects created by the tool - replication slots, publication and
> >> subscriptions. If this is meant to be a replica running for a while,
> >> after a while I'll have no idea what pg_createsubscriber_569853 or
> >> pg_createsubscriber_459548_2348239 was meant for.
> >>
> >> This is particularly annoying because renaming these objects later is
> >> either not supported at all (e.g. for replication slots), or may be
> >> quite difficult (e.g. publications).
> >>
> >> I do realize there are challenges with custom names (say, if there are
> >> multiple databases to replicate), but can't we support some simple
> >> formatting with basic placeholders? So we could specify
> >>
> >> --slot-name "myslot_%d_%p"
> >>
> >> or something like that?
> >
> > Not sure we can do it in the first version, but it looks nice. One concern
> > is that I cannot find applications which accept escape strings like
> > log_line_prefix. (It may just be because we do not have a use case.) Do
> > you know examples?
> >
>
> I can't think of a tool already doing that, but I think that's simply
> because it was not needed. Why should we be concerned about this?
>
> >> BTW what will happen if we convert multiple standbys? Can't they all get
> >> the same slot name (they all have the same database OID, and I'm not
> >> sure how much entropy the PID has)?
> >
> > I tested it and the second try did not work. The primary reason was the name of 
> > 

Re: speed up a logical replica setup

2024-03-08 Thread Tomas Vondra



On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> Dear Tomas, Euler,
> 
> Thanks for starting to read the thread! Since I'm not an original author,
> I want to reply partially.
> 
>> I decided to take a quick look on this patch today, to see how it works
>> and do some simple tests. I've only started to get familiar with it, so
>> I have only some comments / questions regarding usage, not on the code.
>> It's quite possible I didn't understand some finer points, or maybe it
>> was already discussed earlier in this very long thread, so please feel
>> free to push back or point me to the past discussion.
>>
>> Also, some of this is rather opinionated, but considering I didn't see
>> this patch before, my opinions may easily be wrong ...
> 
> I felt your comments were quite valuable.
> 
>> 1) SGML docs
>>
>> It seems the SGML docs are more about explaining how this works on the
>> inside, rather than how to use the tool. Maybe that's intentional, but
>> as someone who didn't work with pg_createsubscriber before I found it
>> confusing and not very helpful.
>>
>> For example, the first half of the page is prerequisites+warning, and
>> sure those are useful details, but prerequisites are checked by the
>> tool (so I can't really miss this) and warnings go into a lot of details
>> about different places where things may go wrong. Sure, worth knowing
>> and including in the docs, but maybe not right at the beginning, before
>> I learn how to even run the tool?
> 
> Hmm, right. I considered the below improvements. Tomas and Euler, what do
> you think?
> 
> * Adds more descriptions in the "Description" section.
> * Moves prerequisites+warning to the "Notes" section.
> * Adds a "Usage" section which describes setup starting from a single node.
> 
>> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
>> sure it won't work for a number of use cases. I know in large databases
>> it's common to create "work tables" (not necessarily temporary) as part
>> of a batch job, but there's no need to replicate those tables.
> 
> Indeed, the documentation does not describe that all tables in the database
> would be included in the publication.
> 
>> I do understand that FOR ALL TABLES is the simplest approach, and for v1
>> it may be an acceptable limitation, but maybe it'd be good to also
>> support restricting which tables should be replicated (e.g. blacklist or
>> whitelist based on table/schema name?).
> 
> It may not be directly related, but we considered accepting options as a
> next step [1].
> 
>> Note: I now realize this might fall under the warning about DDL, which
>> says this:
>>
>> Executing DDL commands on the source server while running
>> pg_createsubscriber is not recommended. If the target server has
>> already been converted to logical replica, the DDL commands must
>> not be replicated so an error would occur.
> 
> Yeah, they would not be replicated, but that does not lead to an ERROR.
> So should we say something like "Creating tables on the source server..."?
> 

Perhaps. Clarifying the docs would help, but it depends on the wording.
For example, I doubt this should talk about "creating tables" because
there are other DDL that (probably) could cause issues (like adding a
column to the table, or something like that).

>> 5) slot / publication / subscription name
>>
>> I find it somewhat annoying it's not possible to specify names for
>> objects created by the tool - replication slots, publication and
>> subscriptions. If this is meant to be a replica running for a while,
>> after a while I'll have no idea what pg_createsubscriber_569853 or
>> pg_createsubscriber_459548_2348239 was meant for.
>>
>> This is particularly annoying because renaming these objects later is
>> either not supported at all (e.g. for replication slots), or may be
>> quite difficult (e.g. publications).
>>
>> I do realize there are challenges with custom names (say, if there are
>> multiple databases to replicate), but can't we support some simple
>> formatting with basic placeholders? So we could specify
>>
>> --slot-name "myslot_%d_%p"
>>
>> or something like that?
> 
> Not sure we can do it in the first version, but it looks nice. One concern is
> that I cannot find applications which accept escape strings like
> log_line_prefix. (It may just be because we do not have a use case.) Do you
> know examples?
> 

I can't think of a tool already doing that, but I think that's simply
because it was not needed. Why should we be concerned about this?

>> BTW what will happen if we convert multiple standbys? Can't they all get
>> the same slot name (they all have the same database OID, and I'm not
>> sure how much entropy the PID has)?
> 
> I tested it and the second try did not work. The primary reason was the name
> of the publication - pg_createsubscriber_%u (oid).
> FYI - previously we could reuse the same publications, but based on my
> comment [2] the feature was removed. It might have been too optimistic.
> 

OK. I could be convinced the other limitations are reasonable for v1 and

Re: speed up a logical replica setup

2024-03-08 Thread Andrey M. Borodin



> On 8 Mar 2024, at 12:03, Shlok Kyal  wrote:
> 
> 

I haven't dug into the thread, but the recent version fails some of CFbot's tests.

http://commitfest.cputube.org/euler-taveira.html
https://cirrus-ci.com/task/4833499115421696
==29928==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a01458 
at pc 0x7f3b29fdedce bp 0x7ffe68fcf1c0 sp 0x7ffe68fcf1b8

Thanks!


Best regards, Andrey Borodin.



RE: speed up a logical replica setup

2024-03-08 Thread Hayato Kuroda (Fujitsu)
Dear Tomas, Euler,

Thanks for starting to read the thread! Since I'm not an original author,
I want to reply partially.

> I decided to take a quick look on this patch today, to see how it works
> and do some simple tests. I've only started to get familiar with it, so
> I have only some comments / questions regarding usage, not on the code.
> It's quite possible I didn't understand some finer points, or maybe it
> was already discussed earlier in this very long thread, so please feel
> free to push back or point me to the past discussion.
> 
> Also, some of this is rather opinionated, but considering I didn't see
> this patch before, my opinions may easily be wrong ...

I felt your comments were quite valuable.

> 1) SGML docs
> 
> It seems the SGML docs are more about explaining how this works on the
> inside, rather than how to use the tool. Maybe that's intentional, but
> as someone who didn't work with pg_createsubscriber before I found it
> confusing and not very helpful.
>
> For example, the first half of the page is prerequisites+warning, and
> sure those are useful details, but prerequisites are checked by the
> tool (so I can't really miss this) and warnings go into a lot of details
> about different places where things may go wrong. Sure, worth knowing
> and including in the docs, but maybe not right at the beginning, before
> I learn how to even run the tool?

Hmm, right. I considered the below improvements. Tomas and Euler, what do you
think?

* Adds more descriptions in the "Description" section.
* Moves prerequisites+warning to the "Notes" section.
* Adds a "Usage" section which describes setup starting from a single node.

> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> sure it won't work for a number of use cases. I know in large databases
> it's common to create "work tables" (not necessarily temporary) as part
> of a batch job, but there's no need to replicate those tables.

Indeed, the documentation does not describe that all tables in the database
would be included in the publication.

> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> it may be an acceptable limitation, but maybe it'd be good to also
> support restricting which tables should be replicated (e.g. blacklist or
> whitelist based on table/schema name?).

It may not be directly related, but we considered accepting options as a
next step [1].

> Note: I now realize this might fall under the warning about DDL, which
> says this:
> 
> Executing DDL commands on the source server while running
> pg_createsubscriber is not recommended. If the target server has
> already been converted to logical replica, the DDL commands must
> not be replicated so an error would occur.

Yeah, they would not be replicated, but that does not lead to an ERROR.
So should we say something like "Creating tables on the source server..."?

> 5) slot / publication / subscription name
> 
> I find it somewhat annoying it's not possible to specify names for
> objects created by the tool - replication slots, publication and
> subscriptions. If this is meant to be a replica running for a while,
> after a while I'll have no idea what pg_createsubscriber_569853 or
> pg_createsubscriber_459548_2348239 was meant for.
> 
> This is particularly annoying because renaming these objects later is
> either not supported at all (e.g. for replication slots), or may be
> quite difficult (e.g. publications).
> 
> I do realize there are challenges with custom names (say, if there are
> multiple databases to replicate), but can't we support some simple
> formatting with basic placeholders? So we could specify
> 
> --slot-name "myslot_%d_%p"
> 
> or something like that?

Not sure we can do it in the first version, but it looks nice. One concern is
that I cannot find applications which accept escape strings like
log_line_prefix. (It may just be because we do not have a use case.) Do you
know examples?

> BTW what will happen if we convert multiple standbys? Can't they all get
> the same slot name (they all have the same database OID, and I'm not
> sure how much entropy the PID has)?

I tested it and the second try did not work. The primary reason was the name
of the publication - pg_createsubscriber_%u (oid).
FYI - previously we could reuse the same publications, but based on my
comment [2] the feature was removed. It might have been too optimistic.

[1]: 
https://www.postgresql.org/message-id/TY3PR01MB9889CCBD4D9DAF8BD2F18541F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]: 
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: speed up a logical replica setup

2024-03-08 Thread vignesh C
On Thu, 7 Mar 2024 at 18:31, Shlok Kyal  wrote:
>
> Hi,
>
> > Thanks for the feedback. I'm attaching v26 that addresses most of your 
> > comments
> > and some issues pointed by Vignesh [1].
>
> I have created a top-up patch v27-0004. It contains additional test
> cases for pg_createsubscriber.
>
> Currently, two testcases (in v27-0004 patch) are failing. These
> failures are related to running pg_createsubscriber on nodes in
> cascade physical replication and are already reported in [1] and [2].
> I think these cases should be fixed. Thoughts?

We can fix these issues; for any of them that we are not planning to fix, we
can add documentation instead.

> The idea of this patch is to keep track of testcases, so that any
> future patch does not break any scenario which has already been worked
> on. These testcases can be used to test in our development process,
> but which test should actually be committed, can be discussed later.
> Thought?

Few comments for v27-0004-Add-additional-testcases.patch:
1) We could use command_fails_like to verify the reason of the error:
+# set max_replication_slots
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3');
+$node_p->restart;
+command_fails(
+   [
+   'pg_createsubscriber', '--verbose',
+   '--dry-run', '--pgdata',
+   $node_s->data_dir, '--publisher-server',
+   $node_p->connstr('pg1'), '--socket-directory',
+   $node_s->host, '--subscriber-port',
+   $node_s->port, '--database',
+   'pg1', '--database',
+   'pg2',
+   ],
+   'max_replication_slots are less in number in publisher');

2) Add a comment saying what is being verified
+# set max_replication_slots
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3');
+$node_p->restart;
+command_fails(
+   [
+   'pg_createsubscriber', '--verbose',
+   '--dry-run', '--pgdata',
+   $node_s->data_dir, '--publisher-server',
+   $node_p->connstr('pg1'), '--socket-directory',
+   $node_s->host, '--subscriber-port',
+   $node_s->port, '--database',
+   'pg1', '--database',
+   'pg2',
+   ],
+   'max_replication_slots are less in number in publisher');

3) We could rename this file something like
pg_create_subscriber_failure_cases or something better:
 src/bin/pg_basebackup/t/041_tests.pl | 285 +++
 1 file changed, 285 insertions(+)
 create mode 100644 src/bin/pg_basebackup/t/041_tests.pl

diff --git a/src/bin/pg_basebackup/t/041_tests.pl
b/src/bin/pg_basebackup/t/041_tests.pl
new file mode 100644
index 00..2889d60d54
--- /dev/null
+++ b/src/bin/pg_basebackup/t/041_tests.pl
@@ -0,0 +1,285 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group


4) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
+$node_p->append_conf('postgresql.conf', 'max_replication_slots = 4');
+$node_p->restart;
+command_ok(
+   [
+   'pg_createsubscriber', '--verbose',
+   '--dry-run', '--pgdata',
+   $node_s->data_dir, '--publisher-server',
+   $node_p->connstr('pg1'), '--socket-directory',
+   $node_s->host, '--subscriber-port',
+   $node_s->port, '--database',
+   'pg1', '--database',
+   'pg2',
+   ],
+   'max_replication_slots are accurate on publisher');

5)  We could use command_fails_like to verify the reason of the error:
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 1');
$node_s->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

6) Add a comment saying what is being verified
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 1');
$node_s->restart;
command_fails(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

7) This success case is not required as this would have already been
covered in 040_pg_createsubscriber.pl:
$node_s->append_conf('postgresql.conf', 'max_replication_slots = 2');
$node_s->restart;
command_ok(
[
'pg_createsubscriber', '--verbose',
'--dry-run', '--pgdata',
$node_s->data_dir, '--publisher-server',
$node_p->connstr('pg1'), '--socket-directory',
$node_s->host, '--subscriber-port',
$node_s->port, '--database',
'pg1', '--database',
'pg2',
],
'max_replication_slots are less in number in subscriber');

8) We could use 

Re: speed up a logical replica setup

2024-03-07 Thread Tomas Vondra
Hi,

I decided to take a quick look on this patch today, to see how it works
and do some simple tests. I've only started to get familiar with it, so
I have only some comments / questions regarding usage, not on the code.
It's quite possible I didn't understand some finer points, or maybe it
was already discussed earlier in this very long thread, so please feel
free to push back or point me to the past discussion.

Also, some of this is rather opinionated, but considering I didn't see
this patch before, my opinions may easily be wrong ...


1) SGML docs

It seems the SGML docs are more about explaining how this works on the
inside, rather than how to use the tool. Maybe that's intentional, but
as someone who didn't work with pg_createsubscriber before I found it
confusing and not very helpful.

For example, the first half of the page is prerequisites+warning, and
sure, those are useful details, but prerequisites are checked by the
tool (so I can't really miss them) and the warnings go into a lot of detail
about different places where things may go wrong. Sure, worth knowing
and including in the docs, but maybe not right at the beginning, before
I learn how to even run the tool?

Maybe that's just me, though. Also, I'm sure it's not the only part of
our docs like this. Perhaps it'd be good to reorganize the content a bit
to make the "how to use" stuff more prominent?


2) this is a bit vague

... pg_createsubscriber will check a few times if the connection has
been reestablished to stream the required WAL. After a few attempts, it
terminates with an error.

What does "a few times" mean, and how many is "a few attempts"? Seems
worth knowing when using this tool in environments where disconnections
can happen. Maybe this should be configurable?


3) publication info

For a while I was quite confused about which tables get replicated,
until I realized the publication is FOR ALL TABLES. But I learned that
from this thread, the docs say nothing about this. Surely that's an
important detail that should be mentioned?


4) Is FOR ALL TABLES a good idea?

I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
sure it won't work for a number of use cases. I know that in large databases
it's common to create "work tables" (not necessarily temporary) as part
of a batch job, but there's no need to replicate those tables.

AFAIK that'd break this FOR ALL TABLES publication, because the tables
will qualify for replication, but won't be present on the subscriber. Or
did I miss something?

I do understand that FOR ALL TABLES is the simplest approach, and for v1
it may be an acceptable limitation, but maybe it'd be good to also
support restricting which tables should be replicated (e.g. blacklist or
whitelist based on table/schema name?).

BTW if I'm right and creating a table breaks the subscriber creation,
maybe it'd be good to explicitly mention that in the docs.

Note: I now realize this might fall under the warning about DDL, which
says this:

Executing DDL commands on the source server while running
pg_createsubscriber is not recommended. If the target server has
already been converted to logical replica, the DDL commands must
not be replicated so an error would occur.

But I find this confusing. Surely there are many DDL commands that have
absolutely no impact on logical replication (like creating an index or
view, various ALTER TABLE flavors, and so on). And running such DDL
certainly does not trigger error, right?


5) slot / publication / subscription name

I find it somewhat annoying it's not possible to specify names for
objects created by the tool - replication slots, publication and
subscriptions. If this is meant to be a replica running for a while,
after a while I'll have no idea what pg_createsubscriber_569853 or
pg_createsubscriber_459548_2348239 was meant for.

This is particularly annoying because renaming these objects later is
either not supported at all (e.g. for replication slots), or may be
quite difficult (e.g. publications).

I do realize there are challenges with custom names (say, if there are
multiple databases to replicate), but can't we support some simple
formatting with basic placeholders? So we could specify

--slot-name "myslot_%d_%p"

or something like that?

BTW what will happen if we convert multiple standbys? Can't they all get
the same slot name (they all have the same database OID, and I'm not
sure how much entropy the PID has)?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: speed up a logical replica setup

2024-03-07 Thread vignesh C
On Thu, 7 Mar 2024 at 10:05, Euler Taveira  wrote:
>
> On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote:
>
> Thanks for updating the patch!
>
>
> Thanks for the feedback. I'm attaching v26 that addresses most of your 
> comments
> and some issues pointed by Vignesh [1].

Few comments:
1) We are disconnecting database again in error case too, it will lead
to a double free in this scenario,
+   PQclear(res);
+
+   disconnect_database(conn, false);
+
+   if (max_repslots < num_dbs)
+   {
+   pg_log_error("subscriber requires %d replication
slots, but only %d remain",
+num_dbs, max_repslots);
+   pg_log_error_hint("Consider increasing
max_replication_slots to at least %d.",
+ num_dbs);
+   disconnect_database(conn, true);
+   }
+
+   if (max_lrworkers < num_dbs)
+   {
+   pg_log_error("subscriber requires %d logical
replication workers, but only %d remain",
+num_dbs, max_lrworkers);
+   pg_log_error_hint("Consider increasing
max_logical_replication_workers to at least %d.",
+ num_dbs);
+   disconnect_database(conn, true);
+   }

pg_createsubscriber: error: subscriber requires 5 logical replication
workers, but only 4 remain
pg_createsubscriber: hint: Consider increasing
max_logical_replication_workers to at least 5.
free(): double free detected in tcache 2
Aborted

2) We can also check that the primary is not using
synchronous_standby_names, else all the transactions in the primary
will wait infinitely once the standby server is stopped, this could be
added in the documentation:
+/*
+ * Is the primary server ready for logical replication?
+ */
+static void
+check_publisher(struct LogicalRepInfo *dbinfo)
+{
+   PGconn *conn;
+   PGresult   *res;
+
+   char   *wal_level;
+   int max_repslots;
+   int cur_repslots;
+   int max_walsenders;
+   int cur_walsenders;
+
+   pg_log_info("checking settings on publisher");
+
+   conn = connect_database(dbinfo[0].pubconninfo, true);
+
+   /*
+* If the primary server is in recovery (i.e. cascading replication),
+* objects (publication) cannot be created because it is read only.
+*/
+   if (server_is_in_recovery(conn))
+   {
+   pg_log_error("primary server cannot be in recovery");
+   disconnect_database(conn, true);
+   }

3) This check is present only for publication, we do not have this in
case of creating a subscription. We can keep both of them similar,
i.e. have the check in both or don't have the check for both
publication and subscription:
+   /* Check if the publication already exists */
+   appendPQExpBuffer(str,
+ "SELECT 1 FROM
pg_catalog.pg_publication "
+ "WHERE pubname = '%s'",
+ dbinfo->pubname);
+   res = PQexec(conn, str->data);
+   if (PQresultStatus(res) != PGRES_TUPLES_OK)
+   {
+   pg_log_error("could not obtain publication information: %s",
+PQresultErrorMessage(res));
+   disconnect_database(conn, true);
+   }
+

4) Few options are missing:
+ 
+  
+   pg_createsubscriber
+   option
+   
+
+ -d
+ --database
+
+dbname
+
+ -D 
+ --pgdata
+
+datadir
+
+ -P
+ --publisher-server
+
+connstr
+   
+  
+ 

4a) -n, --dry-run
4b) -p, --subscriber-port
4c) -r, --retain
4d) -s, --socket-directory
4e) -t, --recovery-timeout
4f) -U, --subscriber-username

5) typo connnections should be connections
+   
+The port number on which the target server is listening for
connections.
+Defaults to running the target server on port 50432 to avoid unintended
+client connnections.

6) repliation should be replication
+ * Create the subscriptions, adjust the initial location for logical
+ * replication and enable the subscriptions. That's the last step for logical
+ * repliation setup.

7) I did not notice these changes in the latest patch:
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d808aad8b0..08de2bf4e6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -517,6 +517,7 @@ CreateSeqStmt
 CreateStatsStmt
 CreateStmt
 CreateStmtContext
+CreateSubscriberOptions
 CreateSubscriptionStmt
 CreateTableAsStmt
 CreateTableSpaceStmt
@@ -1505,6 +1506,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalRepInfo
 LogicalRepMsgType
 LogicalRepPartMapEntry
 LogicalRepPreparedTxnData

Regards,

Re: speed up a logical replica setup

2024-03-06 Thread Euler Taveira
On Wed, Mar 6, 2024, at 8:24 AM, vignesh C wrote:
> Few comments:

Thanks for your review. Some changes are included in v26.

> 1)   Can we use strdup here instead of atoi, as we do similarly in
> case of pg_dump too, else we will do double conversion, convert using
> atoi and again to string while forming the connection string:
> +   case 'p':
> +   if ((opt.sub_port = atoi(optarg)) <= 0)
> +   pg_fatal("invalid subscriber
> port number");
> +   break;

I don't have a strong preference but decided to provide a patch for it. See
v26-0003.

> 2) We can have some valid range for this, else we will end up in some
> unexpected values when a higher number is specified:
> +   case 't':
> +   opt.recovery_timeout = atoi(optarg);
> +   break;

I wouldn't like to pick an arbitrary upper bound. Suggestions?

> 3) Now that we have addressed most of the items, can we handle this TODO:
> +   /*
> +* TODO use primary_conninfo (if available) from subscriber 
> and
> +* extract publisher connection string. Assume that there are
> +* identical entries for physical and logical
> replication. If there is
> +* not, we would fail anyway.
> +*/
> +   pg_log_error("no publisher connection string specified");
> +   pg_log_error_hint("Try \"%s --help\" for more
> information.", progname);
> +   exit(1);

It is not in my top priority at the moment.

> 4)  By default the log level as info here, I was not sure how to set
> it to debug level to get these error messages:
> +   pg_log_debug("publisher(%d): connection string: %s",
> i, dbinfo[i].pubconninfo);
> +   pg_log_debug("subscriber(%d): connection string: %s",
> i, dbinfo[i].subconninfo);

  -v
  --verbose
  
   
Enables verbose mode. This will cause
pg_createsubscriber to output progress 
messages
and detailed information about each step to standard error.
Repeating the option causes additional debug-level messages to appear on
standard error.
   

> 5) Currently in non verbose mode there are no messages printed on
> console, we could have a few of them printed irrespective of verbose
> or not like the following:
> a) creating publication
> b) creating replication slot
> c) waiting for the target server to reach the consistent state
> d) If pg_createsubscriber fails after this point, you must recreate
> the physical replica before continuing.
> e) creating subscription

That's the idea. Quiet mode by default.

> 6) The message should be "waiting for the target server to reach the
> consistent state":
> +#define NUM_CONN_ATTEMPTS  5
> +
> +   pg_log_info("waiting the target server to reach the consistent 
> state");
> +
> +   conn = connect_database(conninfo, true);

Fixed.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-06 Thread Euler Taveira
On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote:
> Thanks for updating the patch!

Thanks for the feedback. I'm attaching v26 that addresses most of your comments
and some issues pointed by Vignesh [1].

> >* I also removed the check for standby is running. If the standby was 
> >stopped a
> >  long time ago, it will take some time to reach the start point.
> >* Dry run mode has to start / stop the service to work correctly. Is it an
> >  issue?
> 
> One concern (see the comment below) is that the -l option would not be passed even
> if the standby had been logging before running pg_createsubscriber. Also, some
> settings passed by pg_ctl start -o would not be restored.

That's a good point. We should state in the documentation that GUCs specified in
the command-line options are ignored during the execution.

> >However, I decided to include --retain option, I'm thinking about to remove 
> >it.
> >If the logging is enabled, the information during the pg_createsubscriber 
> >will
> >be available. The client log can be redirected to a file for future 
> >inspection.
> 
> Just to confirm - you meant the following, right?
> * the client output would be redirected, and
> * the -r option would be removed.

Yes. The logging_collector is usually enabled or syslog is collecting the
log entries. On reflection, another log directory to store entries for a
short period of time doesn't seem like a good idea. It divides the information and
it also costs development time. The questions that made me think about it were:
Should I remove the pg_createsubscriber_output.d directory if it runs
successfully? What if there is an old file there? Is it another directory to
exclude while taking a backup? I also don't like the long directory name.

> Here are my initial comments for v25-0001. I read the new doc and it looks very good.
> I may review v25-0001 further, but feel free to revise.
> 
> 01. cleanup_objects_atexit
> ```
> PGconn*conn;
> int i;
> ```
> The declaration *conn can be in the for-loop. Also, the declaration of the 
> indicator can be in the bracket.

Changed.

> 02. cleanup_objects_atexit
> ```
> 
> /*
> * If a connection could not be established, inform the user
> * that some objects were left on primary and should be
> * removed before trying again.
> */
> if (dbinfo[i].made_publication)
> {
> pg_log_warning("There might be a publication \"%s\" in database \"%s\" on 
> primary",
>dbinfo[i].pubname, dbinfo[i].dbname);
> pg_log_warning_hint("Consider dropping this publication before trying 
> again.");
> }
> if (dbinfo[i].made_replslot)
> {
> pg_log_warning("There might be a replication slot \"%s\" in database \"%s\" 
> on primary",
>dbinfo[i].subname, dbinfo[i].dbname);
> pg_log_warning_hint("Drop this replication slot soon to avoid retention of 
> WAL files.");
> }
> ```
> 
> Not sure which is better, but we may be able to write the list out to a file,
> as pg_upgrade does.
> (I thought this had already been discussed, but could not find it in the
> archive. Sorry if this is a duplicated comment.)

Do you mean the replication slot file? I think the replication slot and the
server (primary) is sufficient for checking and fixing if required.

> 03. main
> ```
> while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
> long_options, _index)) != -1)
> ```
> 
> Missing update for __shortopts.

Fixed.

> 04. main
> ```
> case 'D':
> opt.subscriber_dir = pg_strdup(optarg);
> canonicalize_path(opt.subscriber_dir);
> break;
> ...
> case 'P':
> opt.pub_conninfo_str = pg_strdup(optarg);
> break;
> ...
> case 's':
> opt.socket_dir = pg_strdup(optarg);
> break;
> ...
> case 'U':
> opt.sub_username = pg_strdup(optarg);
> break;
> ```
> 
> Should we consider the case where these options are specified twice?
> I.e., should we call pg_free() before the substitution? 

It isn't a concern for the other client tools. I think the reason is that it
doesn't continue to leak memory during the execution. I wouldn't bother with it.

> 05. main
> 
> Missing canonicalize_path() to the socket_dir.

Fixed.

> 06. main
> ```
> /*
> * If socket directory is not provided, use the current directory.
> */
> ```
> 
> One-line comment can be used. Period can be also removed at that time.

Fixed.

> 07. main
> ```
> /*
> *
> * If subscriber username is not provided, check if the environment
> * variable sets it. If not, obtain the operating system name of the user
> * running it.
> */
> ```
> Unnecessary blank.

Fixed.

> 08. main
> ```
> char*errstr = NULL;
> ```
>  
> This declaration can be at else-part.

Fixed.

> 09. main.
> 
> Also, in the first place, do we have to get the username if it is not specified?
> I feel libpq can handle that case if we skip passing the info.

Are you suggesting that the username should be optional?

> 10. main
> ```
> appendPQExpBuffer(sub_conninfo_str, "host=%s port=%u user=%s 
> fallback_application_name=%s",
>   opt.socket_dir, opt.sub_port, opt.sub_username, progname);
> sub_base_conninfo = 

Re: speed up a logical replica setup

2024-03-06 Thread vignesh C
On Sat, 2 Mar 2024 at 02:19, Euler Taveira  wrote:
>
> On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
>
> Based on the idea from Euler, I roughly implemented it. Thoughts?
>
> 0001-0013 were not changed from the previous version.
>
> V24-0014: addressed your comment in the replied e-mail.
> V24-0015: Add disconnect_database() again, per [3]
> V24-0016: addressed your comment in [4].
> V24-0017: addressed your comment in [5].
> V24-0018: addressed your comment in [6].
>
>
> Thanks for your review. I'm attaching v25 that hopefully addresses all pending
> points.
>
> Regarding your comments [1] on v21, I included changes for almost all items
> except 2, 20, 23, 24, and 25. I still think item 2 is not required because
> pg_ctl will provide a suitable message. I decided not to rearrange the block 
> of
> SQL commands (item 20) mainly because it would avoid these objects on node_f.
> Do we really need command_checks_all? Depending on the output it uses
> additional cycles than command_ok.
>
> In summary:
>
> v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, 
> I
> checked what you wrote and fix some gaps in what I've been written.
> v24-0003: as I said I don't think we need to add it, however, I won't fight
> against it if people want to add this check.
> v24-0004: I spent some time on it. This patch is not completed. I cleaned it 
> up
> and include the start_standby_server code. It starts the server using the
> specified socket directory, port and username, hence, preventing external 
> client
> connections during the execution.
> v24-0005: partially applied
> v24-0006: applied with cosmetic change
> v24-0007: applied with cosmetic change
> v24-0008: applied
> v24-0009: applied with cosmetic change
> v24-0010: not applied. Based on #15, I refactored this code a bit. pg_fatal is
> not used when there is a database connection open. Instead, pg_log_error()
> followed by disconnect_database() is used. In cases where it should exit immediately,
> disconnect_database() has a new parameter (exit_on_error) that controls whether it
> needs to call exit(1). I went ahead and did the same for connect_database().
> v24-0011: partially applied. I included some of the suggestions (18, 19, and 
> 21).
> v24-0012: not applied. Under reflection, after working on v24-0004, the target
> server will start with new parameters (that only accepts local connections),
> hence, during the execution is not possible anymore to detect if the target
> server is a primary for another server. I added a sentence for it in the
> documentation (Warning section).
> v24-0013: good catch. Applied.
> v24-0014: partially applied. After some experiments I decided to use a small
> number of attempts. The current code didn't reset the counter if the 
> connection
> is reestablished. I included the documentation suggestion. I didn't include 
> the
> IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
> I refactored the drop_publication() to not try again if the DROP PUBLICATION
> failed.
> v24-0015: not applied. I refactored the exit code to do the right thing: print
> error message, disconnect database (if applicable) and exit.
> v24-0016: not applied. But checked if the information was presented in the
> documentation; it is.
> v24-0017: good catch. Applied.
> v24-0018: not applied.
>
> I removed almost all boolean return and include the error logic inside the
> function. It reads better. I also changed the connect|disconnect_database
> functions to include the error logic inside it. There is a new parameter
> on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
> think someone suggested it -- and the error message was moved to outside the
> function. I improved the cleanup routine. It provides useful information if it
> cannot remove the object (publication or replication slot) from the primary.
>
> Since I applied v24-0004, I realized that an extra start / stop of the service is
> required. It means pg_createsubscriber doesn't start the transformation with
> the
> current standby settings. Instead, it stops the standby if it is running and
> start it with the provided command-line options (socket, port,
> listen_addresses). It has a few drawbacks:
> * See v34-0012. It cannot detect if the target server is a primary for another
>   server. It is documented.
> * I also removed the check for standby is running. If the standby was stopped 
> a
>   long time ago, it will take some time to reach the start point.
> * Dry run mode has to start / stop the service to work correctly. Is it an
>   issue?
>
> However, I decided to include --retain option, I'm thinking about to remove 
> it.
> If the logging is enabled, the information during the pg_createsubscriber will
> be available. The client log can be redirected to a file for future 
> inspection.
>
> Comments?

Few comments:
1)   Can we use strdup here instead of atoi, as we do similarly in
case of pg_dump too, else we will do double 

RE: speed up a logical replica setup

2024-03-06 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

Thanks for updating the patch!

>v24-0003: as I said I don't think we need to add it, however, I won't fight
>against it if people want to add this check.

OK, let's wait for comments from senior members.

>Since I applied v24-0004, I realized that an extra start / stop of the service is
>required. It means pg_createsubscriber doesn't start the transformation with the
>current standby settings. Instead, it stops the standby if it is running and
>start it with the provided command-line options (socket, port,
>listen_addresses). It has a few drawbacks:
>* See v34-0012. It cannot detect if the target server is a primary for another
>  server. It is documented.

Yeah, it is collateral damage.

>* I also removed the check for standby is running. If the standby was stopped a
>  long time ago, it will take some time to reach the start point.
>* Dry run mode has to start / stop the service to work correctly. Is it an
>  issue?

One concern (see the comment below) is that the -l option would not be passed even
if the standby had been logging before running pg_createsubscriber. Also, some
settings passed by pg_ctl start -o would not be restored.

>However, I decided to include --retain option, I'm thinking about to remove it.
>If the logging is enabled, the information during the pg_createsubscriber will
>be available. The client log can be redirected to a file for future inspection.

Just to confirm - you meant the following, right?
* the client output would be redirected, and
* the -r option would be removed.

Here are my initial comments for v25-0001. I read the new doc and it looks very good.
I may review v25-0001 further, but feel free to revise.

01. cleanup_objects_atexit
```
PGconn *conn;
int i;
```
The declaration *conn can be in the for-loop. Also, the declaration of the 
indicator can be in the bracket.

02. cleanup_objects_atexit
```

/*
 * If a connection could not be established, 
inform the user
 * that some objects were left on primary and 
should be
 * removed before trying again.
 */
if (dbinfo[i].made_publication)
{
pg_log_warning("There might be a 
publication \"%s\" in database \"%s\" on primary",
   
dbinfo[i].pubname, dbinfo[i].dbname);
pg_log_warning_hint("Consider dropping 
this publication before trying again.");
}
if (dbinfo[i].made_replslot)
{
pg_log_warning("There might be a 
replication slot \"%s\" in database \"%s\" on primary",
   
dbinfo[i].subname, dbinfo[i].dbname);
pg_log_warning_hint("Drop this 
replication slot soon to avoid retention of WAL files.");
}
```

Not sure which is better, but we may be able to write the list out to a file,
as pg_upgrade does.
(I thought this had already been discussed, but could not find it in the
archive. Sorry if this is a duplicated comment.)

03. main
```
while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
long_options, 
_index)) != -1)
```

Missing update for __shortopts.

04. main
```
case 'D':
opt.subscriber_dir = pg_strdup(optarg);
canonicalize_path(opt.subscriber_dir);
break;
...
case 'P':
opt.pub_conninfo_str = pg_strdup(optarg);
break;
...
case 's':
opt.socket_dir = pg_strdup(optarg);
break;
...
case 'U':
opt.sub_username = pg_strdup(optarg);
break;
```

Should we consider the case where these options are specified twice?
I.e., should we call pg_free() before the substitution?
 
05. main

Missing canonicalize_path() to the socket_dir.

06. main
```
/*
 * If socket directory is not provided, use the current directory.
 */
```

One-line comment can be used. Period can be also removed at that time.

07. main
```
/*
 *
 * If subscriber username is not provided, check if the environment
 * variable sets it. If not, obtain the operating system name of the 
user
 * running it.
 */
```
Unnecessary blank.

08. main
```
char   *errstr = NULL;
```
 
This 

Re: speed up a logical replica setup

2024-03-05 Thread Euler Taveira
On Tue, Mar 5, 2024, at 12:48 AM, Shubham Khanna wrote:
> I applied the v25 patch and did RUN the 'pg_createsubscriber' command.
> I was unable to execute it and experienced the following error:
> 
> $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
> dbname=postgres"  -d postgres -d db1 -p 9000 -r
> ./pg_createsubscriber: invalid option -- 'p'
> pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
> information.

Oops. Good catch! I will post an updated patch soon.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-04 Thread Shlok Kyal
Hi,
> I applied the v25 patch and did RUN the 'pg_createsubscriber' command.
> I was unable to execute it and experienced the following error:
>
> $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
> dbname=postgres"  -d postgres -d db1 -p 9000 -r
> ./pg_createsubscriber: invalid option -- 'p'
> pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
> information.
>
> Here, the --p is not accepting the desired port number. Thoughts?

I investigated it and found that:

+ while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
+ long_options, _index)) != -1)
+ {

Here the 'p', 's' and 'U' options are missing, so we get the error.
Also, I think the 'S' option should be removed from here.

I also see that specifying long options like --subscriber-port,
--subscriber-username, --socket-directory works fine.
Thoughts?

Thanks and regards,
Shlok Kyal




Re: speed up a logical replica setup

2024-03-04 Thread Shubham Khanna
On Sat, Mar 2, 2024 at 2:19 AM Euler Taveira  wrote:
>
> On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
>
> Based on the idea from Euler, I roughly implemented it. Thoughts?
>
> 0001-0013 were not changed from the previous version.
>
> V24-0014: addressed your comment in the replied e-mail.
> V24-0015: Add disconnect_database() again, per [3]
> V24-0016: addressed your comment in [4].
> V24-0017: addressed your comment in [5].
> V24-0018: addressed your comment in [6].
>
>
> Thanks for your review. I'm attaching v25 that hopefully addresses all pending
> points.
>
> Regarding your comments [1] on v21, I included changes for almost all items
> except 2, 20, 23, 24, and 25. I still think item 2 is not required because
> pg_ctl will provide a suitable message. I decided not to rearrange the block
> of SQL commands (item 20) mainly because it would avoid these objects on node_f.
> Do we really need command_checks_all? Depending on the output it uses
> additional cycles compared to command_ok.
>
> In summary:
>
> v24-0002: documentation is updated. I didn't apply this patch as-is. Instead,
> I checked what you wrote and fixed some gaps in what I had written.
> v24-0003: as I said I don't think we need to add it, however, I won't fight
> against it if people want to add this check.
> v24-0004: I spent some time on it. This patch is not complete. I cleaned it up
> and included the start_standby_server code. It starts the server using the
> specified socket directory, port and username, hence, preventing external 
> client
> connections during the execution.
> v24-0005: partially applied
> v24-0006: applied with cosmetic change
> v24-0007: applied with cosmetic change
> v24-0008: applied
> v24-0009: applied with cosmetic change
> v24-0010: not applied. Based on #15, I refactored this code a bit. pg_fatal is
> not used when there is a database connection open. Instead, pg_log_error()
> followed by disconnect_database(). In cases that it should exit immediately,
> disconnect_database() has a new parameter (exit_on_error) that controls if it
> needs to call exit(1). I go ahead and did the same for connect_database().
> v24-0011: partially applied. I included some of the suggestions (18, 19, and 
> 21).
> v24-0012: not applied. On reflection, after working on v24-0004, the target
> server will start with new parameters (that only accept local connections);
> hence, during the execution it is no longer possible to detect if the target
> server is a primary for another server. I added a sentence about it in the
> documentation (Warning section).
> v24-0013: good catch. Applied.
> v24-0014: partially applied. After some experiments I decided to use a small
> number of attempts. The current code didn't reset the counter if the 
> connection
> is reestablished. I included the documentation suggestion. I didn't include 
> the
> IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead,
> I refactored the drop_publication() to not try again if the DROP PUBLICATION
> failed.
> v24-0015: not applied. I refactored the exit code to do the right thing: print
> error message, disconnect database (if applicable) and exit.
> v24-0016: not applied. But checked if the information was presented in the
> documentation; it is.
> v24-0017: good catch. Applied.
> v24-0018: not applied.
>
> I removed almost all boolean returns and included the error logic inside the
> functions. It reads better. I also changed the connect|disconnect_database
> functions to include the error logic inside it. There is a new parameter
> on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I
> think someone suggested it -- and the error message was moved to outside the
> function. I improved the cleanup routine. It provides useful information if it
> cannot remove the object (publication or replication slot) from the primary.
>
> Since I applied v24-0004, I realized that extra start / stop services are
> required. It means pg_createsubscriber doesn't start the transformation with
> the current standby settings. Instead, it stops the standby if it is running
> and starts it with the provided command-line options (socket, port,
> listen_addresses). It has a few drawbacks:
> * See v24-0012. It cannot detect if the target server is a primary for another
>   server. It is documented.
> * I also removed the check for standby is running. If the standby was stopped 
> a
>   long time ago, it will take some time to reach the start point.
> * Dry run mode has to start / stop the service to work correctly. Is it an
>   issue?
>
> However, although I decided to include the --retain option, I'm thinking about
> removing it. If logging is enabled, the information produced during the
> pg_createsubscriber run will be available. The client log can be redirected to
> a file for future 
>
> Comments?

I applied the v25 patch and ran the 'pg_createsubscriber' command.
I was unable to execute it and experienced the error shown above.

Re: speed up a logical replica setup

2024-03-01 Thread Euler Taveira
pg_createsubscriber: creates a new logical replica from a
 standby server

It must be run on the target server and should be able to connect to the
source server (publisher) and the target server (subscriber).

Some prerequisites must be met to run it successfully. These are basically
the logical replication requirements. It starts by creating a publication
for each specified database. After that, it stops the target server. One
temporary replication slot is created to get the replication start
point. It is used as (a) a stopping point for the recovery process and
(b) a starting point for the subscriptions. Write recovery parameters
into the target data directory and start the target server. Wait until
the target server is promoted. Create one subscription per specified
database (using replication slot and publication created in a previous
step) on the target server. Set the replication progress to the
replication start point for each subscription. Enable the subscription
for each specified database on the target server. And finally, change
the system identifier on the target server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (the WAL backlog should be
available until all table data is synchronized). The initial data copy
and the replication progress tend to be faster on a physical replica.
The purpose of this tool is to speed up a logical replica setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml |  523 +
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|   11 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_createsubscriber.c   | 2038 +
 .../t/040_pg_createsubscriber.pl  |  214 ++
 8 files changed, 2805 insertions(+), 3 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_createsubscriber.sgml
 create mode 100644 src/bin/pg_basebackup/pg_createsubscriber.c
 create mode 100644 src/bin/pg_basebackup/t/040_pg_createsubscriber.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..f5be638867 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -205,6 +205,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
new file mode 100644
index 00..44705b2c52
--- /dev/null
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -0,0 +1,523 @@
+
+
+
+ 
+  pg_createsubscriber
+ 
+
+ 
+  pg_createsubscriber
+  1
+  Application
+ 
+
+ 
+  pg_createsubscriber
+  convert a physical replica into a new logical replica
+ 
+
+ 
+  
+   pg_createsubscriber
+   option
+   
+
+ -d
+ --database
+
+dbname
+
+ -D 
+ --pgdata
+
+datadir
+
+ -P
+ --publisher-server
+
+connstr
+   
+  
+ 
+
+ 
+  Description
+  
+pg_createsubscriber creates a new logical
+replica from a physical standby server. It must be run on the target server.
+  
+
+  
+   The pg_createsubscriber targets large database
+   systems because it speeds up the logical replication setup. For smaller
+   databases, initial data synchronization
+   is recommended.
+  
+
+  
+   There are some prerequisites for pg_createsubscriber
+   to convert the target server into a logical replica. If these are not met,
+   an error will be reported.
+  
+
+  
+   
+
+ The source and target servers must have the same major version as the
+ pg_createsubscriber.
+
+   
+
+   
+
+ The given target data directory must have the same system identifier as
+ the source data directory. If a standby server is running on the target
+ data directory or it is a base backup from the source data directory, the
+ system identifiers are the same.
+
+   
+
+   
+
+ The target server must be used as a physical standby.
+
+   
+
+   
+
+ The given database user for the target data directory must have privileges
+ for creating subscriptions
+ and using pg_replication_origin_advance().
+
+   
+
+   
+
+ The target server must have
+ max_replication_slots
+ and
+ max_logical_replication_workers
+ configured to a value greater than or equal to the number of specified databases.
+
+   
+
+   
+
+ The target server must have
+ max_worker_processes
+ configured to a value greater than the number of specified databases.
+
+   
+
+   
+
+ The target server must accept local connections.
+
+   
+
+   
+
+ The source server must accept connections from the target server.
+
+   
+
+   
+
+ The source server must not be in recovery. Publications cannot be created
+

Re: speed up a logical replica setup

2024-02-28 Thread Andrew Dunstan



On 2024-02-27 Tu 05:02, Hayato Kuroda (Fujitsu) wrote:

Dear Euler,


Sorry, which pg_rewind option did you mention? I cannot find it.
IIUC, -l is the only option which can accept a path, but it is not related
to us.

Sorry, I found [1]. I was confused with pg_resetwal, but my opinion has not
really changed. The reason why --config-file exists is that pg_rewind requires
that the target must be stopped, which is different from the current
pg_createsubscriber. So I still do not want to add options.

[1]: 
https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename
[2]:
```
The target server must be shut down cleanly before running pg_rewind
```



Even though that is a difference I'd still rather we did more or less 
the same thing more or less the same way across utilities, so I agree 
with Euler's suggestion.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





RE: speed up a logical replica setup

2024-02-27 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

> Sorry, which pg_rewind option did you mention? I cannot find it.
> IIUC, -l is the only option which can accept a path, but it is not related
> to us.

Sorry, I found [1]. I was confused with pg_resetwal, but my opinion has not
really changed. The reason why --config-file exists is that pg_rewind requires
that the target must be stopped, which is different from the current
pg_createsubscriber. So I still do not want to add options.

[1]: 
https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename
[2]:
```
The target server must be shut down cleanly before running pg_rewind
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



RE: speed up a logical replica setup

2024-02-26 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

>> The possible solution would be
>> 1) allow to run pg_createsubscriber if standby is initially stopped .
>> I observed that pg_logical_createsubscriber also uses this approach.
>> 2) read GUCs via SHOW command and restore them when server restarts
>>
>3. add a config-file option. That's similar to what pg_rewind does.

Sorry, which pg_rewind option did you mention? I cannot find it.
IIUC, -l is the only option which can accept a path, but it is not related
to us.

Also, I'm not sure of the benefit of adding new options. Basically there should
be fewer of them. Are there benefits compared to reading via SHOW? Even if I
assume that pg_resetwal has such an option, the reason is that the target
postmaster for pg_resetwal must be stopped.

>I expect
>that Debian-based installations will have this issue.

I'm not familiar with the Debian-env, so can you explain the reason?

>It was not a good idea if you want to keep the postgresql.conf outside PGDATA.
>I mean you need extra steps that can be error prone (different settings between
>files).

Yeah, if we use my approach, users who specify such GUCs may not be happy.
So, based on the above discussion, we should choose one of the items below.
Thoughts?

a)
enforce that the standby must be *stopped*; options like config_file can be
specified via command-line options.
b)
enforce that the standby must be *running*; options like config_file would be
read via the SHOW command.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 



RE: speed up a logical replica setup

2024-02-25 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

I forgot to reply to one of the suggestions.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> +   /* Check standby server version */
> +   if ((ver_fd = fopen(versionfile, "r")) == NULL)
> +   pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> +   /* Version number has to be the first line read */
> +   if (!fgets(rawline, sizeof(rawline), ver_fd))
> +   {
> +   if (!ferror(ver_fd))
> +   pg_fatal("unexpected empty file \"%s\"", versionfile);
> +   else
> +   pg_fatal("could not read file \"%s\": %m", 
> versionfile);
> +   }
> +
> +   /* Strip trailing newline and carriage return */
> +   (void) pg_strip_crlf(rawline);
> +
> +   if (strcmp(rawline, PG_MAJORVERSION) != 0)
> +   {
> +   pg_log_error("standby server is of wrong version");
> +   pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> +   versionfile,
> rawline, PG_MAJORVERSION);
> +   exit(1);
> +   }
> +
> +   fclose(ver_fd);


This suggestion has been rejected because I was not sure about the location of
the declaration and the implementation. Functions that can be called from
client programs must be declared in the src/include/{common|fe_utils|utils}
directory. I looked at the files located there but could not find an
appropriate location for CheckDataVersion. Also, I did not think a new file
should be created only for this function.
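For reference, the inline version check being discussed can be sketched as a standalone program. This is a hedged approximation, not the committed code: pg_strip_crlf() from src/common is replaced by a local helper, and the PG_MAJORVERSION value and file path are assumptions for the demo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PG_MAJORVERSION "17"	/* assumption: major version being built */

/* Local stand-in for src/common's pg_strip_crlf(): drop trailing \r\n */
static void
strip_crlf(char *s)
{
	size_t		n = strlen(s);

	while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
		s[--n] = '\0';
}

/* Return 1 if the PG_VERSION file at 'path' matches PG_MAJORVERSION */
static int
check_data_version(const char *path)
{
	char		rawline[64];
	FILE	   *fp = fopen(path, "r");

	if (fp == NULL)
		return 0;				/* real code: pg_fatal("could not open...") */
	/* the version number has to be the first line read */
	if (!fgets(rawline, sizeof(rawline), fp))
	{
		fclose(fp);
		return 0;				/* real code: empty file / read error fatals */
	}
	fclose(fp);
	strip_crlf(rawline);
	return strcmp(rawline, PG_MAJORVERSION) == 0;
}

int
main(void)
{
	/* write a fake PG_VERSION to a temporary file for the demo */
	const char *path = "/tmp/pg_version_demo";
	FILE	   *fp = fopen(path, "w");

	fputs("17\n", fp);
	fclose(fp);

	assert(check_data_version(path));
	printf("version check passed\n");
	remove(path);
	return 0;
}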

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: speed up a logical replica setup

2024-02-22 Thread Amit Kapila
On Fri, Feb 23, 2024 at 8:16 AM Euler Taveira  wrote:
>
> On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
>
> I found some issues and fixed those issues with top up patches
> v23-0012 and v23-0013
> 1.
> Suppose there is a cascade physical replication node1->node2->node3.
> Now if we run pg_createsubscriber with node1 as primary and node2 as
> standby, pg_createsubscriber will be successful but the connection
> between node2 and node3 will not be retained and the log of node3 will
> give error:
> 2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
> identifier differs between the primary and standby
> 2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
> is 7337575856950914038, the standby's identifier is
> 7337575783125171076.
> 2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
> available at 0/3000F10
>
> To fix this I am avoiding pg_createsubscriber to run if the standby
> node is primary to any other server.
> Made the change in v23-0012 patch
>
>
> IIRC we already discussed the cascading replication scenario. Of course,
> breaking a node is not good that's why you proposed v23-0012. However,
> preventing pg_createsubscriber from running if there are standbys attached to
> it is also annoying. If you don't have access to these hosts you need to
> (a) kill the walsender (very fragile / unstable), (b) start with
> max_wal_senders = 0, or (c) add a firewall rule so that these hosts cannot
> establish a connection to the target server. I wouldn't like to include the
> patch as-is. IMO we need
> at least one message explaining the situation to the user, I mean, add a hint
> message.
>

Yeah, it could be a bit tricky for users to ensure that no follow-on
standby is present but I think it is still better to give an error and
prohibit running pg_createsubscriber than breaking the existing
replication. The possible solution, in this case, is to allow running
pg_basebackup via this tool or otherwise and then let the user use it
to convert to a subscriber. It would be good to keep things simple for
the first version; then we can add options like --force in
subsequent versions.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-02-22 Thread Euler Taveira
On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
> I found some issues and fixed those issues with top up patches
> v23-0012 and v23-0013
> 1.
> Suppose there is a cascade physical replication node1->node2->node3.
> Now if we run pg_createsubscriber with node1 as primary and node2 as
> standby, pg_createsubscriber will be successful but the connection
> between node2 and node3 will not be retained and the log of node3 will
> give error:
> 2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
> identifier differs between the primary and standby
> 2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
> is 7337575856950914038, the standby's identifier is
> 7337575783125171076.
> 2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
> available at 0/3000F10
> 
> To fix this I am avoiding pg_createsubscriber to run if the standby
> node is primary to any other server.
> Made the change in v23-0012 patch

IIRC we already discussed the cascading replication scenario. Of course,
breaking a node is not good that's why you proposed v23-0012. However,
preventing pg_createsubscriber from running if there are standbys attached to
it is also annoying. If you don't have access to these hosts you need to
(a) kill the walsender (very fragile / unstable), (b) start with
max_wal_senders = 0, or (c) add a firewall rule so that these hosts cannot
establish a connection to the target server. I wouldn't like to include the
patch as-is. IMO we need
at least one message explaining the situation to the user, I mean, add a hint
message.  I'm resistant to a new option but probably a --force option is an
answer. There is no test coverage for it. I adjusted this patch (didn't include
the --force option) and added a test case.

> 2.
> While checking 'max_replication_slots' in 'check_publisher' function,
> we are not considering the temporary slot in the check:
> +   if (max_repslots - cur_repslots < num_dbs)
> +   {
> +   pg_log_error("publisher requires %d replication slots, but
> only %d remain",
> +num_dbs, max_repslots - cur_repslots);
> +   pg_log_error_hint("Consider increasing max_replication_slots
> to at least %d.",
> + cur_repslots + num_dbs);
> +   return false;
> +   }
> Fixed this in v23-0013

Good catch!

Both are included in the next patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-22 Thread Euler Taveira
On Thu, Feb 22, 2024, at 9:43 AM, Hayato Kuroda (Fujitsu) wrote:
> > The possible solution would be
> > 1) allow to run pg_createsubscriber if standby is initially stopped .
> > I observed that pg_logical_createsubscriber also uses this approach.
> > 2) read GUCs via SHOW command and restore them when server restarts
> >

3. add a config-file option. That's similar to what pg_rewind does. I expect
that Debian-based installations will have this issue.

> I also prefer the first solution. 
> Another reason why the standby should be stopped is for backup purpose.
> Basically, the standby instance should be saved before running 
> pg_createsubscriber.
> An easiest way is hard-copy, and the postmaster should be stopped at that 
> time.
> I felt it is better that users can run the command immediately later the 
> copying.
> Thought?

It was not a good idea if you want to keep the postgresql.conf outside PGDATA.
I mean you need extra steps that can be error-prone (different settings between
files).

Shlok, I didn't read your previous email carefully. :-/


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-22 Thread 'Alvaro Herrera'
Hello,

On 2024-Feb-22, Hayato Kuroda (Fujitsu) wrote:

> Dear Alvaro,

> > Hmm, but doesn't this mean that the server will log an ugly message
> > that "client closed connection unexpectedly"?  I think it's nicer to
> > close the connection before terminating the process (especially
> > since the code for that is already written).
> 
> OK. So we should disconnect properly even if the process exits. I
> added the function call again. Note that PQclear() was not added
> because it is only related with the application.

Sounds about right, but I didn't verify the patches in detail.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)




Re: speed up a logical replica setup

2024-02-22 Thread vignesh C
On Thu, 22 Feb 2024 at 21:17, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> > Few comments on the tests:
> > 1) If the dry run failed because of some issue, the server will be stopped,
> > so we can check with "pg_ctl status" whether the server is
> > running; otherwise the connection will fail in this case. Another way
> > would be to check that it does not have "postmaster was stopped"
> > messages in the stdout.
> > +
> > +# Check if node S is still a standby
> > +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> > +   't', 'standby is in recovery');
>
> Just to confirm - your suggestion is to add `pg_ctl status`, right? Added.

Yes, that is correct.

> > 2) Can we add verification of  "postmaster was stopped" messages in
> > the stdout for dry run without --databases testcase
> > +# pg_createsubscriber can run without --databases option
> > +command_ok(
> > +   [
> > +   'pg_createsubscriber', '--verbose',
> > +   '--dry-run', '--pgdata',
> > +   $node_s->data_dir, '--publisher-server',
> > +   $node_p->connstr('pg1'), '--subscriber-server',
> > +   $node_s->connstr('pg1')
> > +   ],
> > +   'run pg_createsubscriber without --databases');
> > +
>
> Hmm, in case of --dry-run, the server would never be shut down.
> See below part.
>
> ```
> if (!dry_run)
> stop_standby_server(pg_ctl_path, opt.subscriber_dir);
> ```

One way to differentiate whether the server was run in dry_run mode or
not is to check if the server was stopped or not. I mean we can
check that the stdout does not have a "postmaster was stopped" message.
Can we add validation based on this code:
+   if (action)
+   pg_log_info("postmaster was started");

Or another way is to check pg_ctl status to see that the server is not shut down.

> > 3) This message "target server must be running" seems to be wrong,
> > should it be cannot specify cascading replicating standby as standby
> > node(this is for v22-0011 patch :
> > +   'pg_createsubscriber', '--verbose', '--pgdata',
> > $node_c->data_dir,
> > +   '--publisher-server', $node_s->connstr('postgres'),
> > +   '--port', $node_c->port, '--socketdir', $node_c->host,
> > +   '--database', 'postgres'
> > ],
> > -   'primary server is in recovery');
> > +   1,
> > +   [qr//],
> > +   [qr/primary server cannot be in recovery/],
> > +   'target server must be running');
>
> Fixed.
>
> > 4) Should this be "Wait for subscriber to catch up"
> > +# Wait subscriber to catch up
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);
>
> Fixed.
>
> > 5) Should this be 'Subscriptions has been created on all the specified
> > databases'
> > +);
> > +is($result, qq(2),
> > +   'Subscriptions has been created to all the specified databases'
> > +);
>
> Fixed, but "has" should be "have".
>
> > 6) Add test to verify current_user is not a member of
> > ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, has no
> > permissions to execution replication origin advance functions
> >
> > 7) Add tests to verify insufficient max_logical_replication_workers,
> > max_replication_slots and max_worker_processes for the subscription
> > node
> >
> > 8) Add tests to verify invalid configuration of  wal_level,
> > max_replication_slots and max_wal_senders for the publisher node
>
> Hmm, but adding these checks may increase the test time. We should
> measure the time and then decide.

We can check, and if it does not take significantly more time, then
we can have these tests.

Regards,
Vignesh




RE: speed up a logical replica setup

2024-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

> Few comments on the tests:
> 1) If the dry run failed because of some issue, the server will be stopped,
> so we can check with "pg_ctl status" whether the server is
> running; otherwise the connection will fail in this case. Another way
> would be to check that it does not have "postmaster was stopped"
> messages in the stdout.
> +
> +# Check if node S is still a standby
> +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> +   't', 'standby is in recovery');

Just to confirm - your suggestion is to add `pg_ctl status`, right? Added.

> 2) Can we add verification of  "postmaster was stopped" messages in
> the stdout for dry run without --databases testcase
> +# pg_createsubscriber can run without --databases option
> +command_ok(
> +   [
> +   'pg_createsubscriber', '--verbose',
> +   '--dry-run', '--pgdata',
> +   $node_s->data_dir, '--publisher-server',
> +   $node_p->connstr('pg1'), '--subscriber-server',
> +   $node_s->connstr('pg1')
> +   ],
> +   'run pg_createsubscriber without --databases');
> +

Hmm, in case of --dry-run, the server would never be shut down.
See below part.

```
if (!dry_run)
stop_standby_server(pg_ctl_path, opt.subscriber_dir);
```

> 3) This message "target server must be running" seems to be wrong,
> should it be cannot specify cascading replicating standby as standby
> node(this is for v22-0011 patch :
> +   'pg_createsubscriber', '--verbose', '--pgdata',
> $node_c->data_dir,
> +   '--publisher-server', $node_s->connstr('postgres'),
> +   '--port', $node_c->port, '--socketdir', $node_c->host,
> +   '--database', 'postgres'
> ],
> -   'primary server is in recovery');
> +   1,
> +   [qr//],
> +   [qr/primary server cannot be in recovery/],
> +   'target server must be running');

Fixed.

> 4) Should this be "Wait for subscriber to catch up"
> +# Wait subscriber to catch up
> +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);

Fixed.

> 5) Should this be 'Subscriptions has been created on all the specified
> databases'
> +);
> +is($result, qq(2),
> +   'Subscriptions has been created to all the specified databases'
> +);

Fixed, but "has" should be "have".

> 6) Add test to verify current_user is not a member of
> ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, has no
> permissions to execution replication origin advance functions
> 
> 7) Add tests to verify insufficient max_logical_replication_workers,
> max_replication_slots and max_worker_processes for the subscription
> node
> 
> 8) Add tests to verify invalid configuration of  wal_level,
> max_replication_slots and max_wal_senders for the publisher node

Hmm, but adding these checks may increase the test time. We should
measure the time and then decide.

> 9) We can use the same node name in comment and for the variable
> +# Set up node P as primary
> +$node_p = PostgreSQL::Test::Cluster->new('node_p');
> +$node_p->init(allows_streaming => 'logical');
> +$node_p->start;

Fixed.

> 10) Similarly we can use node_f instead of F in the comments.
> +# Set up node F as about-to-fail node
> +# Force it to initialize a new cluster instead of copying a
> +# previously initdb'd cluster.
> +{
> +   local $ENV{'INITDB_TEMPLATE'} = undef;
> +
> +   $node_f = PostgreSQL::Test::Cluster->new('node_f');
> +   $node_f->init(allows_streaming => 'logical');
> +   $node_f->start;
>

Fixed. Also, a recent commit [1] allows running initdb forcibly, so I followed that.

New patch can be available in [2].

[1]: 
https://github.com/postgres/postgres/commit/ff9e1e764fcce9a34467d614611a34d4d2a91b50
[2]: 
https://www.postgresql.org/message-id/TYCPR01MB12077CD76B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



RE: speed up a logical replica setup

2024-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

> Few comments:
> 1) The below code can lead to assertion failure if the publisher is
> stopped while dropping the replication slot:
> +   if (primary_slot_name != NULL)
> +   {
> +   conn = connect_database(dbinfo[0].pubconninfo);
> +   if (conn != NULL)
> +   {
> +   drop_replication_slot(conn, &dbinfo[0],
> primary_slot_name);
> +   }
> +   else
> +   {
> +   pg_log_warning("could not drop replication
> slot \"%s\" on primary",
> +  primary_slot_name);
> +   pg_log_warning_hint("Drop this replication
> slot soon to avoid retention of WAL files.");
> +   }
> +   disconnect_database(conn);
> +   }
> 
> pg_createsubscriber: error: connection to database failed: connection
> to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or
> directory
> Is the server running locally and accepting connections on that socket?
> pg_createsubscriber: warning: could not drop replication slot
> "standby_1" on primary
> pg_createsubscriber: hint: Drop this replication slot soon to avoid
> retention of WAL files.
> pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database:
> Assertion `conn != ((void *)0)' failed.
> Aborted (core dumped)
> 
> This is happening because we are calling disconnect_database in case
> of connection failure case too which has the following assert:
> +static void
> +disconnect_database(PGconn *conn)
> +{
> +   Assert(conn != NULL);
> +
> +   PQfinish(conn);
> +}

Right. disconnect_database() was moved to if (conn != NULL) block.

> 2) There is a CheckDataVersion function which does exactly this, will
> we be able to use this:
> +   /* Check standby server version */
> +   if ((ver_fd = fopen(versionfile, "r")) == NULL)
> +   pg_fatal("could not open file \"%s\" for reading: %m",
> versionfile);
> +
> +   /* Version number has to be the first line read */
> +   if (!fgets(rawline, sizeof(rawline), ver_fd))
> +   {
> +   if (!ferror(ver_fd))
> +   pg_fatal("unexpected empty file \"%s\"", versionfile);
> +   else
> +   pg_fatal("could not read file \"%s\": %m", 
> versionfile);
> +   }
> +
> +   /* Strip trailing newline and carriage return */
> +   (void) pg_strip_crlf(rawline);
> +
> +   if (strcmp(rawline, PG_MAJORVERSION) != 0)
> +   {
> +   pg_log_error("standby server is of wrong version");
> +   pg_log_error_detail("File \"%s\" contains \"%s\",
> which is not compatible with this program's version \"%s\".",
> +   versionfile,
> rawline, PG_MAJORVERSION);
> +   exit(1);
> +   }
> +
> +   fclose(ver_fd);


> 3) Should this be added to typedefs.list:
> +enum WaitPMResult
> +{
> +   POSTMASTER_READY,
> +   POSTMASTER_STILL_STARTING
> +};

But the comment from Peter E. [1] was the opposite, so I did not handle this.

> 4) pgCreateSubscriber should be mentioned after pg_controldata to keep
> the ordering consistency:
> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
> index aa94f6adf6..c5edd244ef 100644
> --- a/doc/src/sgml/reference.sgml
> +++ b/doc/src/sgml/reference.sgml
> @@ -285,6 +285,7 @@
> 
> 
> 
> +   
> 

This has already been pointed out by Peter E. I did not handle this.

> 5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
> +   if (primary_slot_name)
> +   {
> +   appendPQExpBuffer(str,
> + "SELECT 1 FROM
> pg_replication_slots "
> + "WHERE active AND
> slot_name = '%s'",
> + primary_slot_name);

Fixed.

> 6) Here pg_settings should be pg_catalog.pg_settings:
> +    * - max_worker_processes >= 1 + number of dbs to be converted
> +    *
> +    */
> +   res = PQexec(conn,
> +                "SELECT setting FROM pg_settings WHERE name IN ("
> +                "'max_logical_replication_workers', "
> +                "'max_replication_slots', "
> +                "'max_worker_processes', "
> +                "'primary_slot_name') "
> +                "ORDER BY name");

Fixed.
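(Same class of fix: qualify pg_settings as pg_catalog.pg_settings. A minimal sketch in Python, with a hypothetical helper name, of the qualified query string:)

```python
def settings_query() -> str:
    # pg_catalog-qualified so search_path cannot redirect the lookup;
    # the server returns the rows ordered by name, so the caller can
    # read the four settings positionally. (Sketch, not the C code.)
    return (
        "SELECT setting FROM pg_catalog.pg_settings WHERE name IN ("
        "'max_logical_replication_workers', "
        "'max_replication_slots', "
        "'max_worker_processes', "
        "'primary_slot_name') "
        "ORDER BY name"
    )
```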

A new version is available in [2].

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]: https://www.postgresql.org/message-id/TYCPR01MB12077CD76B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



RE: speed up a logical replica setup

2024-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

> Few comments regarding the documentation:
> 1) max_replication_slots information seems to be present a couple of times:
>
> +    <para>
> +     The target instance must have
> +     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
> +     and <link linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
> +     configured to a value greater than or equal to the number of target
> +     databases.
> +    </para>
>
> +   <listitem>
> +    <para>
> +     The target instance must have
> +     <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
> +     configured to a value greater than or equal to the number of target
> +     databases and replication slots.
> +    </para>
> +   </listitem>

Fixed.
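(For the record, the documented arithmetic on the target is simple; a toy sketch in Python, with a hypothetical function name, restating it — the "1 + number of dbs" rule for max_worker_processes comes from the code comment quoted earlier in this thread:)

```python
def minimum_settings(num_target_dbs: int) -> dict:
    # Rough restatement of the documented prerequisites on the target:
    # one replication slot and one logical replication worker per
    # database to be converted, plus one extra worker process.
    # (Illustration only; the real tool reads pg_settings and compares.)
    return {
        "max_replication_slots": num_target_dbs,
        "max_logical_replication_workers": num_target_dbs,
        "max_worker_processes": 1 + num_target_dbs,
    }
```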

> 2) Can we add an id to prerequisites and use it instead of referring
> to r1-app-pg_createsubscriber-1:
> - pg_createsubscriber checks if the given target data
> - directory has the same system identifier than the source data directory.
> - Since it uses the recovery process as one of the steps, it starts the
> - target server as a replica from the source server. If the system
> - identifier is not the same, pg_createsubscriber will
> - terminate with an error.
> + Checks the target can be converted.  In particular, things listed in
> + the above section would be checked.  If these are not met
> + pg_createsubscriber will terminate with an error.
>  

Changed.

> 3) The code also checks the following:
>  Verify if a PostgreSQL binary (progname) is available in the same
> directory as pg_createsubscriber.
> 
> But this is not present in the pre-requisites of documentation.

I think it is trivial enough that I did not add it.
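(The check itself really is tiny; a sketch in Python, with hypothetical names, of what it amounts to — look for the companion PostgreSQL binary next to the running program and make sure it is executable:)

```python
import os

def sibling_binary_ok(bindir: str, progname: str) -> bool:
    # pg_createsubscriber-style sanity check (illustration, not the
    # actual C code): the companion binary, e.g. pg_ctl, must exist in
    # the same directory as this program and be executable.
    candidate = os.path.join(bindir, progname)
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)
```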

> 4) Here we mention that the target server should be stopped, but the
> same is not mentioned in prerequisites:
> +   Here is an example of using pg_createsubscriber.
> +   Before running the command, please make sure target server is stopped.
> +
> +$ pg_ctl -D /usr/local/pgsql/data stop
> +
> +

Oh, it is the opposite: the target server should NOT be stopped. Fixed.

> 5) If there is an error during any of the pg_createsubscriber
> operation like if create subscription fails, it might not be possible
> to rollback to the earlier state which had physical-standby
> replication. I felt we should document this and also add it to the
> console message like how we do in case of pg_upgrade.

Added.

A new version is available in [1].

[1]: https://www.postgresql.org/message-id/TYCPR01MB12077CD76B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


