RE: speed up a logical replica setup
Dear Euler,

Thanks for making the follow-up patch! I was looking forward to your updates.
I think this patch set is the solution for the buildfarm failure that was
found. However, there are remaining concerns raised by others, and you should
reply with your thoughts on them. At least:

1) There are some misleading messages [1]. I think the v3-0005 patch can
   solve the issue.
2) pg_createsubscriber may fail if the primary has subscriptions [2]. IIUC
   the possible approaches are A) "keep subscriptions disabled at the end",
   B) "by default drop the pre-existing subscriptions", or C) "do nothing,
   just document the risk".

> Before sending this email I realized that I did nothing about physical
> replication slots on the standby. I think we should also remove them too
> unconditionally.

I also considered this, but it might be difficult to predict what users
expect. Can we say for sure that such a slot is not intentional? Regarding
the failover slot, removal is OK because it is meaningful only on the
standby, but I am not sure about other slots. I personally think we can keep
the current behavior, but what do others think?

Below are comments for each patch.

0001:
Basically LGTM. I was a bit confused because the default timeout is not set,
but it seems to follow the suggestion by Tomas [3].

0002:
If you want to improve the commit message, please add that
sync_replication_slots is disabled during the conversion.

0003:
Confirmed that it follows the discussion [4].

0004:
Basically LGTM.

Other minor comments are included in the attached diff file. It contains
changes to follow conventions and pgindent/pgperltidy.
[1]: https://www.postgresql.org/message-id/CAA4eK1J2fAvsJ2HihbWJ_GxETd6sdqSMrZdCVJEutRZRpm1MEQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CANhcyEWvimA1-f6hSrA%3D9qkfR5SonFb56b36M%2B%2BvT%3DLiFj%3D76g%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/5d5dd4cd-6359-4109-88e8-c8e13035ae16%40enterprisedb.com
[4]: https://www.postgresql.org/message-id/CAA4eK1LZxYxcbeiOn3Q5hjXVtZKhJWj-fQtndAeTCvZrPev8BA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

minor_fix_by_kuroda.diff
Description: minor_fix_by_kuroda.diff
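For reference, the three approaches for pre-existing subscriptions could be
sketched on the converted subscriber as follows. This is only an illustration
of the options, not what any posted patch does, and "mysub" is a placeholder
subscription name:

```sql
-- A) Keep pre-existing subscriptions, but disabled, so they do not start
--    applying changes from the old publisher:
ALTER SUBSCRIPTION mysub DISABLE;

-- B) Drop them by default; the subscription must be disabled and detached
--    from its remote slot first, so DROP does not try to drop a slot on
--    the old publisher:
ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub SET (slot_name = NONE);
DROP SUBSCRIPTION mysub;
```

Option C) requires no commands at all, only documentation.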
Re: speed up a logical replica setup
On Wed, May 22, 2024, at 12:16 PM, Euler Taveira wrote:
> I'll summarize all issues as soon as I finish the review of sync slot
> support. I think we should avoid new development if we judge that the item
> can be documented as a limitation for this version. Nevertheless, I will
> share patches so you can give your opinion on whether it is an open item or
> new development.

Here is a patch series to fix the issues reported in recent discussions.
Patches 0001 and 0003 aim to fix the buildfarm issues. Patch 0002 removes
synchronized failover slots on the subscriber since they have no use. I also
included an optional patch 0004 that improves usability by checking both
servers even if a subscriber check has already failed.

As I said in a previous email, I decided to remove the logic that reacts to
an issue on the primary. We can reintroduce such code later if/when we have a
better way to check the recovery progress. The code now relies only on the
recovery_timeout option, and the test sets recovery_timeout to
PG_TEST_TIMEOUT_DEFAULT to let the animals control how long to wait for the
recovery.

Since some animals reported issues in the check_publisher routine that checks
whether primary_slot_name is in use on the primary, this logic was removed
too (patch 0003). We could introduce a way to keep retrying this test, but
the conclusion is that it is not worth it; if primary_slot_name does not
exist (due to a setup error), pg_createsubscriber will log an error message
and continue.

Patch 0002 removes any failover slots that remain on the subscriber. Talking
about terminology, I noticed that slotsync.c uses both "logical failover
slots" and "failover logical slots". I think the latter sounds better, but
I'm not a native speaker. I also don't know whether we should use a short
form like "failover slots", "failover replication slots", or "failover
logical replication slots". IMO we can omit "logical" because "failover"
implies it is a logical replication slot.
I'm also not sure about omitting "replication"; it is not descriptive enough.
I prefer "failover replication slots".

Before sending this email I realized that I did nothing about physical
replication slots on the standby. I think we should also remove them too
unconditionally.

--
Euler Taveira
EDB   https://www.enterprisedb.com/

From 5d8b4781e6e9bcb00564f45c25e575a8abab6ae8 Mon Sep 17 00:00:00 2001
From: Euler Taveira
Date: Tue, 21 May 2024 23:04:57 -0300
Subject: [PATCH 1/4] Only the recovery_timeout controls the end of recovery
 process

It used to check if the target server is connected to the primary server
(sends required WAL) to rapidly react when the process won't succeed. This
code is not enough to guarantee that the recovery process will complete.
There is a window between the walreceiver shutdown and pg_is_in_recovery()
returning false that can reach NUM_CONN_ATTEMPTS attempts and fail.
Instead, rely only on the recovery_timeout option to give up the process
after the specified number of seconds.
---
 doc/src/sgml/ref/pg_createsubscriber.sgml   |  7 -
 src/bin/pg_basebackup/pg_createsubscriber.c | 29 ++-
 .../t/040_pg_createsubscriber.pl            |  2 ++
 3 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index 142b02..a700697f88 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -325,13 +325,6 @@ PostgreSQL documentation
     connections to the target server should fail.
-
-    During the recovery process, if the target server disconnects from the
-    source server, pg_createsubscriber will check a
-    few times if the connection has been reestablished to stream the required
-    WAL. After a few attempts, it terminates with an error.
-
-
     Since DDL commands are not replicated by logical replication, avoid
     executing DDL commands that change the database schema while running

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 90cc580811..f62f34b1a7 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -1360,6 +1360,9 @@ stop_standby_server(const char *datadir)
  *
  * If recovery_timeout option is set, terminate abnormally without finishing
  * the recovery process. By default, it waits forever.
+ *
+ * XXX Is the recovery process still in progress? When recovery process has a
+ * better progress reporting mechanism, it should be added here.
  */
 static void
 wait_for_end_recovery(const char *conninfo, const struct CreateSubscriberOptions *opt)
@@ -1367,9 +1370,6 @@ wait_for_end_recovery(const char *conninfo, const struct CreateSubscriberOptions
 	PGconn	   *conn;
 	int			status = POSTMASTER_STILL_STARTING;
 	int			timer = 0;
-	int			count = 0;		/* number of consecutive connection attempts */
-
-#define NUM_CONN_ATTEMPTS 10
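With NUM_CONN_ATTEMPTS gone, the wait loop in wait_for_end_recovery()
conceptually reduces to polling a single query, once per second, until it
returns false or recovery_timeout elapses (this is only a sketch of the
query the tool issues, not the C loop itself):

```sql
-- Polled by wait_for_end_recovery(); recovery is finished when this
-- returns false.
SELECT pg_catalog.pg_is_in_recovery();
```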
Re: speed up a logical replica setup
On Wed, 22 May 2024 at 16:50, Amit Kapila wrote:
>
> On Mon, May 20, 2024 at 4:30 PM Shlok Kyal wrote:
> > >
> > > I was trying to test this utility when 'sync_replication_slots' is on
> > > and it gets in an ERROR loop [1] and never finishes. Please find the
> > > postgresql.auto used on the standby attached. I think if the standby
> > > has enabled sync_slots, you need to pass dbname in
> > > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > > there are already synced slots on the standby (either due to
> > > 'sync_replication_slots' or users have used
> > > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > > those would be retained as it is on new subscriber and lead to
> > > unnecessary WAL retention and dead rows.
> > >
> > > [1]
> > > 2024-04-30 11:50:43.239 IST [12536] LOG: slot sync worker started
> > > 2024-04-30 11:50:43.247 IST [12536] ERROR: slot synchronization
> > > requires dbname to be specified in primary_conninfo
> >
> > Hi,
> >
> > I tested the scenario posted by Amit in [1], in which retaining synced
> > slots lead to unnecessary WAL retention and ERROR. This is raised as
> > the second open point in [2].
> >
> > The steps to reproduce the issue:
> > (1) Setup physical replication with sync slot feature turned on by
> > setting sync_replication_slots = 'true' or using
> > pg_sync_replication_slots() on the standby node.
> > For physical replication setup, run pg_basebackup with -R and -d option.
> > (2) Create a logical replication slot on primary node with failover
> > option as true. A corresponding slot is created on standby as part of
> > sync slot feature.
> > (3) Run pg_createsubscriber on standby node.
> > (4) On checking for the replication slot on standby node, I noticed
> > that the logical slots created in step 2 are retained.
> > I have attached the script to reproduce the issue.
> >
> > I and Kuroda-san worked to resolve open points.
> > Here are patches to solve the second and third point in [2].
> > Patches proposed by Euler are also attached just in case, but they
> > were not modified.
> >
> > v2-0001: not changed
> >
>
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on standby.
>

I have made changes as per your suggestion. I have used
pg_last_wal_receive_lsn() to get the LSN of the last WAL received on the
standby and check whether consistent_lsn is already present there.

I have only made changes in v3-0001. v3-0002, v3-0003, v3-0004 and v3-0005
are not modified.

Thanks and Regards,
Shlok Kyal

v3-0005-Avoid-outputing-some-messages-in-dry_run-mode.patch
Description: Binary data

v3-0003-Disable-slot-sync-during-the-convertion.patch
Description: Binary data

v3-0001-Improve-the-code-that-checks-if-the-recovery-is-f.patch
Description: Binary data

v3-0004-Drop-replication-slots-which-had-been-synchronize.patch
Description: Binary data

v3-0002-Improve-the-code-that-checks-if-the-primary-slot-.patch
Description: Binary data
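The additional check described above can be expressed as a single query on
the standby. The LSN literal below is only a placeholder for the
consistent_lsn computed by setup_publisher():

```sql
-- True once the standby has received all WAL up to consistent_lsn.
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/3000060'::pg_lsn) >= 0
       AS consistent_lsn_received;
```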
Re: speed up a logical replica setup
On Thu, May 23, 2024 at 8:43 PM Euler Taveira wrote:
>
> On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
>
> Why in the first place do we need to ensure that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention but I don't know how that is related to this tool's
> functionality. If at all, we are bothered about WAL retention on the
> primary, that should be the WAL corresponding to consistent_lsn
> computed by setup_publisher(), but this check doesn't seem to ensure
> that.
>
> Maybe it is a lot of checks. I'm afraid there isn't a simple way to get and
> make sure the replication slot is used by the physical replication. I mean,
> if there is primary_slot_name = 'foo' on the standby, there is no guarantee
> that the replication slot 'foo' exists on the primary. The idea is to get
> the exact replication slot name used by physical replication to drop it.
> Once I post a patch it should be clear. (Another idea is to relax this
> check and rely only on primary_slot_name to drop this replication slot on
> the primary. The replication slot might not exist, and it shouldn't return
> an error in this case.)
>

I think your other idea is better than what we are doing currently. Let's
ignore the ERROR even if the primary_slot_name doesn't exist on the primary.

--
With Regards,
Amit Kapila.
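One way to "ignore the ERROR" is to make the drop conditional, so a missing
slot is simply a no-op. This is only a sketch of the idea ('foo' stands in
for the primary_slot_name value, and it assumes the slot is no longer in use
at that point):

```sql
-- Drops the slot only when it actually exists on the primary; returns
-- zero rows (and no error) otherwise.
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_catalog.pg_replication_slots
 WHERE slot_name = 'foo';
```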
Re: speed up a logical replica setup
On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
> On Wed, May 22, 2024 at 8:46 PM Euler Taveira wrote:
> >
> > Following the same line that simplifies the code, we can: (a) add a loop
> > in check_subscriber() that waits until the walreceiver is available on
> > the subscriber or (b) use a timeout. The main advantage of (a) is that
> > the primary slot is already available, but I'm afraid we need an escape
> > mechanism for the loop (a timeout?).
> >
>
> Sorry, it is not clear to me why we need any additional loop in
> check_subscriber(); aren't we speaking about the problem in the
> check_publisher() function?

The idea is to use check_subscriber() to check pg_stat_wal_receiver. Once
this view returns a row and primary_slot_name is set on the standby, the
referred replication slot should be active on the primary. Hence, the query
in check_publisher() makes sure that the referred replication slot is in use
on the primary.

> Why in the first place do we need to ensure that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention but I don't know how that is related to this tool's
> functionality. If at all, we are bothered about WAL retention on the
> primary, that should be the WAL corresponding to consistent_lsn
> computed by setup_publisher(), but this check doesn't seem to ensure
> that.

Maybe it is a lot of checks. I'm afraid there isn't a simple way to get and
make sure the replication slot is used by the physical replication. I mean,
if there is primary_slot_name = 'foo' on the standby, there is no guarantee
that the replication slot 'foo' exists on the primary. The idea is to get
the exact replication slot name used by physical replication to drop it.
Once I post a patch it should be clear. (Another idea is to relax this check
and rely only on primary_slot_name to drop this replication slot on the
primary. The replication slot might not exist, and it shouldn't return an
error in this case.)

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Wed, May 22, 2024 at 8:46 PM Euler Taveira wrote:
>
> On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
>
> > > v2-0002: not changed
> > >
> >
> > We have added more tries to see if the primary_slot_name becomes
> > active but I think it is still fragile because it is possible on slow
> > machines that the required slot didn't become active even after more
> > retries. I have raised the same comment previously [2] and asked an
> > additional question but didn't get any response.
>
> Following the same line that simplifies the code, we can: (a) add a loop
> in check_subscriber() that waits until the walreceiver is available on the
> subscriber or (b) use a timeout. The main advantage of (a) is that the
> primary slot is already available, but I'm afraid we need an escape
> mechanism for the loop (a timeout?).
>

Sorry, it is not clear to me why we need any additional loop in
check_subscriber(); aren't we speaking about the problem in the
check_publisher() function?

Why in the first place do we need to ensure that primary_slot_name is
active on the primary? You mentioned something related to WAL
retention but I don't know how that is related to this tool's
functionality. If at all, we are bothered about WAL retention on the
primary, that should be the WAL corresponding to consistent_lsn
computed by setup_publisher(), but this check doesn't seem to ensure
that.

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> > >
> > > v2-0001: not changed
> > >
>
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on standby.

That's exactly what I want to propose, as Tomas convinced me offlist that
less is better: when we don't have a useful recovery progress reporting
mechanism to make sure recovery is still working, we should just wait.

> > v2-0002: not changed
> >
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.

Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until the walreceiver is available on the
subscriber or (b) use a timeout. The main advantage of (a) is that the
primary slot is already available, but I'm afraid we need an escape
mechanism for the loop (a timeout?).

I'll summarize all issues as soon as I finish the review of sync slot
support. I think we should avoid new development if we judge that the item
can be documented as a limitation for this version. Nevertheless, I will
share patches so you can give your opinion on whether it is an open item or
new development.

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Mon, May 20, 2024 at 4:30 PM Shlok Kyal wrote:
> >
> > I was trying to test this utility when 'sync_replication_slots' is on
> > and it gets in an ERROR loop [1] and never finishes. Please find the
> > postgresql.auto used on the standby attached. I think if the standby
> > has enabled sync_slots, you need to pass dbname in
> > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > there are already synced slots on the standby (either due to
> > 'sync_replication_slots' or users have used
> > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > those would be retained as it is on new subscriber and lead to
> > unnecessary WAL retention and dead rows.
> >
> > [1]
> > 2024-04-30 11:50:43.239 IST [12536] LOG: slot sync worker started
> > 2024-04-30 11:50:43.247 IST [12536] ERROR: slot synchronization
> > requires dbname to be specified in primary_conninfo
>
> Hi,
>
> I tested the scenario posted by Amit in [1], in which retaining synced
> slots lead to unnecessary WAL retention and ERROR. This is raised as
> the second open point in [2].
>
> The steps to reproduce the issue:
> (1) Setup physical replication with sync slot feature turned on by
> setting sync_replication_slots = 'true' or using
> pg_sync_replication_slots() on the standby node.
> For physical replication setup, run pg_basebackup with -R and -d option.
> (2) Create a logical replication slot on primary node with failover
> option as true. A corresponding slot is created on standby as part of
> sync slot feature.
> (3) Run pg_createsubscriber on standby node.
> (4) On checking for the replication slot on standby node, I noticed
> that the logical slots created in step 2 are retained.
> I have attached the script to reproduce the issue.
>
> I and Kuroda-san worked to resolve open points. Here are patches to
> solve the second and third point in [2].
> Patches proposed by Euler are also attached just in case, but they
> were not modified.
> > v2-0001: not changed
>

Shouldn't we modify it as per the suggestion given in the email [1]? I
am wondering if we can entirely get rid of checking the primary
business and simply rely on recovery_timeout and keep checking
server_is_in_recovery(). If so, we can modify the test to use
non-default recovery_timeout (say 180s or something similar if we have
used it at any other place). As an additional check we can ensure that
consistent_lsn is present on standby.

> v2-0002: not changed
>

We have added more tries to see if the primary_slot_name becomes
active but I think it is still fragile because it is possible on slow
machines that the required slot didn't become active even after more
retries. I have raised the same comment previously [2] and asked an
additional question but didn't get any response.

[1] - https://www.postgresql.org/message-id/CAA4eK1JJq_ER6Kq_H%3DjKHR75QPRd8y9_D%3DRtYw%3DaPYKMfqLi9A%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1LT3Z13Dg6p4Z%2B4adO_EY-ow5CmWfikEmBfL%3DeVrm8CPw%40mail.gmail.com

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
Hi,

>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>
> [1]
> 2024-04-30 11:50:43.239 IST [12536] LOG: slot sync worker started
> 2024-04-30 11:50:43.247 IST [12536] ERROR: slot synchronization
> requires dbname to be specified in primary_conninfo

Hi,

I tested the scenario posted by Amit in [1], in which retaining synced
slots leads to unnecessary WAL retention and an ERROR. This is raised as
the second open point in [2].

The steps to reproduce the issue:
(1) Set up physical replication with the sync slot feature turned on, by
setting sync_replication_slots = 'true' or using
pg_sync_replication_slots() on the standby node.
For the physical replication setup, run pg_basebackup with the -R and -d
options.
(2) Create a logical replication slot on the primary node with the failover
option set to true. A corresponding slot is created on the standby as part
of the sync slot feature.
(3) Run pg_createsubscriber on the standby node.
(4) On checking the replication slots on the standby node, I noticed
that the logical slots created in step 2 are retained.
I have attached the script to reproduce the issue.

Kuroda-san and I worked to resolve the open points. Here are patches to
solve the second and third points in [2]. Patches proposed by Euler are
also attached just in case, but they were not modified.

v2-0001: not changed
v2-0002: not changed
v2-0003: ensures the slot sync is disabled during the conversion. This
resolves the second point.
v2-0004: drops sync slots which may be retained after running. This resolves
the second point.
v2-0005: removes misleading output messages in dry-run. This resolves the
third point.

[1]: https://www.postgresql.org/message-id/CAA4eK1KdCb%2B5sjYu6qCMXXdCX1y_ihr8kFzMozq0%3DP%3DauYxgog%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1J22UEfrqx222h5j9DQ7nxGrTbAa_BC%2B%3DmQXdXs-RCsew%40mail.gmail.com

Thanks and Regards,
Shlok Kyal

# cleanup
rm -rf ../primary ../standby primary.log standby.log

# setup primary node
./initdb -D ../primary
cat << EOF >> ../primary/postgresql.conf
wal_level = logical
EOF
./pg_ctl -D ../primary -l primary.log start
./psql -d postgres -c "CREATE table t1 (c1 int);"
./psql -d postgres -c "Insert into t1 values(1);"
./psql -d postgres -c "Insert into t1 values(2);"
./psql -d postgres -c "INSERT into t1 values(3);"
./psql -d postgres -c "INSERT into t1 values(4);"

# setup standby node with sync slot feature
./pg_basebackup -h localhost -X stream -v -W -R -S 'sb1' -C -D ../standby/ -d 'dbname=postgres'
cat << EOF >> ../standby/postgresql.conf
sync_replication_slots = 'true'
hot_standby_feedback = 'on'
port = 9000
EOF
./pg_ctl -D ../standby -l standby.log start

# create a replication slot on primary with failover option
./psql -d postgres -c "SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot_2', 'test_decoding', false, false, true);"

sleep 2

# check if slots are synced on standby
./psql -d postgres -p 9000 -c "select * from pg_replication_slots;"
./pg_ctl -D ../standby -l standby.log stop

# run pg_createsubscriber on standby
./pg_createsubscriber -D ../standby/ -p 9000 -P "host=localhost port=5432 dbname=postgres" -d postgres -v

# check if replication slots are retained after running pg_createsubscriber
./pg_ctl -D ../standby -l standby.log start
./psql -d postgres -p 9000 -c "select * from pg_replication_slots;"
./pg_ctl -D ../standby -l standby.log stop
./pg_ctl -D ../primary -l primary.log stop
v2-0001-Improve-the-code-that-checks-if-the-recovery-is-f.patch
Description: Binary data

v2-0002-Improve-the-code-that-checks-if-the-primary-slot-.patch
Description: Binary data

v2-0003-Disable-slot-sync-during-the-convertion.patch
Description: Binary data

v2-0004-Drop-replication-slots-which-had-been-synchronize.patch
Description: Binary data

v2-0005-Avoid-outputing-some-messages-in-dry_run-mode.patch
Description: Binary data
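A quick way to see the leftover slots from step (4) is to filter on the
synced flag, which pg_replication_slots exposes on recent versions (17+):

```sql
-- Run on the converted standby: any rows here are slots that were
-- synchronized from the primary and survived the conversion.
SELECT slot_name, slot_type, synced, temporary
  FROM pg_replication_slots
 WHERE synced;
```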
Re: speed up a logical replica setup
On Sun, May 19, 2024 at 4:25 AM Thomas Munro wrote:
>
> 040_pg_createsubscriber.pl seems to be failing occasionally on
> culicidae near "--dry-run on node S". I couldn't immediately see why.
> That animal is using EXEC_BACKEND and I also saw a one-off failure a
> bit like that on my own local Linux + EXEC_BACKEND test run
> (sorry I didn't keep the details around). Coincidence?
>

AFAICS, this is the same as one of the two BF failures being discussed in
this thread.

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
040_pg_createsubscriber.pl seems to be failing occasionally on culicidae near "--dry-run on node S". I couldn't immediately see why. That animal is using EXEC_BACKEND and I also saw a one-off failure a bit like that on my own local Linux + EXEC_BACKEND test run (sorry I didn't keep the details around). Coincidence?
Re: speed up a logical replica setup
On Tue, Apr 30, 2024 at 12:04 PM Amit Kapila wrote:
>
> On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila wrote:
> >
> > On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira wrote:
> >
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig().

The other possibility is that when we start the standby from
pg_createsubscriber, we specifically set 'sync_replication_slots' to false.

> I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>

This still needs some handling. BTW, I don't see the use of the following
messages in --dry-run mode, or rather they could be misleading:

pg_createsubscriber: hint: If pg_createsubscriber fails after this point,
you must recreate the physical replica before continuing.
...
pg_createsubscriber: setting the replication progress (node name "pg_0";
LSN 0/0) on database "postgres"

Similarly, we should consider whether the messages below are useful in
--dry-run mode:

pg_createsubscriber: dropping publication
"pg_createsubscriber_5_887f7991" on database "postgres"
pg_createsubscriber: creating subscription
"pg_createsubscriber_5_887f7991" on database "postgres"
...
pg_createsubscriber: enabling subscription
"pg_createsubscriber_5_887f7991" on database "postgres"

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
On Mon, Apr 29, 2024 at 5:28 PM Amit Kapila wrote:
>
> On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira wrote:
>

I was trying to test this utility when 'sync_replication_slots' is on
and it gets in an ERROR loop [1] and never finishes. Please find the
postgresql.auto used on the standby attached. I think if the standby
has enabled sync_slots, you need to pass dbname in
GenerateRecoveryConfig(). I couldn't test it further, but I wonder: if
there are already synced slots on the standby (either due to
'sync_replication_slots' or because users have used
pg_sync_replication_slots() before invoking pg_createsubscriber), those
would be retained as is on the new subscriber and lead to unnecessary WAL
retention and dead rows.

[1]
2024-04-30 11:50:43.239 IST [12536] LOG: slot sync worker started
2024-04-30 11:50:43.247 IST [12536] ERROR: slot synchronization
requires dbname to be specified in primary_conninfo

--
With Regards,
Amit Kapila.

postgresql.auto.standby.conf
Description: Binary data
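For the ERROR above, the slot sync worker requires dbname in
primary_conninfo. A standby configuration that avoids the loop might look
like this (a sketch only; host, port, and user values are illustrative):

```
# postgresql.auto.conf on the standby
primary_conninfo = 'host=localhost port=5432 user=repluser dbname=postgres'
sync_replication_slots = on
hot_standby_feedback = on
```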
Re: speed up a logical replica setup
On Mon, Apr 29, 2024 at 5:23 PM Euler Taveira wrote:
>
> On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
>
> On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira wrote:
> >
> > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> >
> > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> > Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> > waited long enough?
> >
> >
> > It was an attempt to decouple a connection failure (that keeps streaming
> > the WAL) from the recovery timeout. The NUM_CONN_ATTEMPTS guarantees
> > that if the primary is gone during the standby recovery process, there
> > is a way to bail out.
> >
>
> I think we don't need to check the primary if the WAL corresponding to
> consistent_lsn is already present on the standby. Shouldn't we first
> check that? Once we ensure that the required WAL is copied, just
> checking server_is_in_recovery() should be sufficient. I feel that
> will be a direct way of ensuring what is required rather than
> indirectly verifying the same (by checking pg_stat_wal_receiver) as we
> are doing currently.
>
> How would you check it? The WAL file? During recovery, you are not allowed
> to use pg_current_wal_lsn.
>

How about pg_last_wal_receive_lsn()?

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
> On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira wrote:
> >
> > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> >
> > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> > Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> > waited long enough?
> >
> >
> > It was an attempt to decouple a connection failure (that keeps streaming
> > the WAL) from the recovery timeout. The NUM_CONN_ATTEMPTS guarantees
> > that if the primary is gone during the standby recovery process, there
> > is a way to bail out.
> >
>
> I think we don't need to check the primary if the WAL corresponding to
> consistent_lsn is already present on the standby. Shouldn't we first
> check that? Once we ensure that the required WAL is copied, just
> checking server_is_in_recovery() should be sufficient. I feel that
> will be a direct way of ensuring what is required rather than
> indirectly verifying the same (by checking pg_stat_wal_receiver) as we
> are doing currently.

How would you check it? The WAL file? During recovery, you are not allowed
to use pg_current_wal_lsn.

Tomas suggested to me off-list that we should adopt a simple solution in
wait_for_end_recovery(): wait for recovery_timeout without additional checks
(which means removing the pg_stat_wal_receiver logic). When we have
additional information that we can reliably use in this function, we can add
it. This also makes it easy to adjust PG_TEST_TIMEOUT_DEFAULT to have stable
tests.

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Tue, Mar 26, 2024 at 8:24 AM Euler Taveira wrote:
>
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>
> The first patch implements a combination of (1) and (2).
>
> ## Analysis for failure 2
>
> According to [2], the physical replication slot which is specified as
> primary_slot_name was not used by the walsender process. At that time the
> walsender did not exist.
>
> ```
> ...
> pg_createsubscriber: publisher: current wal senders: 0
> pg_createsubscriber: command is: SELECT 1 FROM
> pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot'
> pg_createsubscriber: error: could not obtain replication slot information:
> got 0 rows, expected 1 row
> ...
> ```
>
> Currently the standby must be stopped before the command, and the current
> code does not block the flow to ensure the replication has started. So
> there is a possibility that the check is run before the walsender is
> launched.
>
> One possible approach is to wait until the replication starts. An
> alternative is to ease the condition.
>
> That's my suggestion too. I reused NUM_CONN_ATTEMPTS (which was renamed to
> NUM_ATTEMPTS in the first patch). See the second patch.
>

How can we guarantee that the slot becomes active after these many attempts?
It could still be possible that on some slow machines it doesn't get
activated even after NUM_ATTEMPTS. BTW, in the first place, why do we need
to ensure that the 'primary_slot_name' is active on the primary?

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira wrote: > > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote: > > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed? > Why isn't recovery_timeout enough to decide if wait_for_end_recovery() > waited long enough? > > > It was an attempt to decoupled a connection failure (that keeps streaming the > WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the > primary > is gone during the standby recovery process, there is a way to bail out. > I think we don't need to check primary if the WAL corresponding to consistent_lsn is already present on the standby. Shouldn't we first check that? Once we ensure that the required WAL is copied, just checking server_is_in_recovery() should be sufficient. I feel that will be a direct way of ensuring what is required rather than indirectly verifying the same (by checking pg_stat_wal_receiver) as we are doing currently. -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On 3/26/24 21:17, Euler Taveira wrote: > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote: >> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed? >> Why isn't recovery_timeout enough to decide if wait_for_end_recovery() >> waited long enough? > > It was an attempt to decoupled a connection failure (that keeps streaming the > WAL) from recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the > primary > is gone during the standby recovery process, there is a way to bail out. The > recovery-timeout is 0 (infinite) by default so you have an infinite wait > without > this check. The idea behind this implementation is to avoid exiting in this > critical code path. If it times out here you might have to rebuild the standby > and start again. - This seems like something that should definitely be documented in the comment before wait_for_end_recovery(). At the moment it only talks about timeout, and nothing about NUM_CONN_ATTEMPTS. - The NUM_CONN_ATTEMPTS name seems rather misleading, considering it does not really count connection attempts, but the number of times we have not seen 1 in pg_catalog.pg_stat_wal_receiver. - Not sure I follow the logic - it tries to avoid exiting by setting infinite timeout, but it still exits based on NUM_CONN_ATTEMPTS. Isn't that somewhat contradictory? - Isn't the NUM_CONN_ATTEMPTS actually making it more fragile, i.e. more likely to exit? For example, what if there's a short networking hiccup, so that the standby can't connect to the primary? - It seems a bit strange that even with the recovery timeout set, having the limit of 10 "connection attempts" effectively establishes a separate hard-coded limit of 10 seconds. Seems a bit surprising if I set the recovery limit to 1 minute, and it just dies after 10 seconds. > Amit suggested [1] that we use a value as recovery-timeout but > how high is a good value? 
I've already saw some long recovery process using > pglogical equivalent that timeout out after hundreds of minutes. Maybe I'm too > worried about a small percentage of cases and we should use 1h as default, for > example. It would reduce the complexity since the recovery process lacks some > progress indicators (LSN is not sufficient in this case and there isn't a > function to provide the current state -- stop applying WAL, reach target, new > timeline, etc). > > If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery > by default otherwise we will have some reports saying the tool is hanging when > in reality the primary has gone and WAL should be streamed. > I don't think there's a default timeout value that would work for everyone. Either it's going to be too short for some cases, or it'll take too long for some other cases. I think there are two obvious default values for the timeout - infinity, and 60 seconds, which is the default we use for other CLI tools (like pg_ctl and so on). Considering the negative impact of exiting, I'd say it's better to default to infinity. It's always possible to Ctrl-C or terminate the process in some other way, if needed. As for people complaining about infinite recovery - perhaps it'd be sufficient to mention this in the messages printed by the tool, to make it clearer. Or maybe even print something in the loop, because right now it's entirely silent so it's easy to believe it's stuck. Perhaps not on every loop, but at least in verbose mode it should print something. >> IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling >> pg_createsubscriber, and that should do the trick. > > That's a good idea. Tests are not exercising the recovery-timeout option. > >> Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing >> things like ubsan/valgrind already use to deal with exactly this kind of >> timeout problem. 
>> >> Or is there a deeper problem with deciding if the system is in recovery? > > As I said with some recovery progress indicators it would be easier to make > some > decisions like wait a few seconds because the WAL has already been applied and > it is creating a new timeline. The recovery timeout decision is a shot in the > dark because we might be aborting pg_createsubscriber when the target server > is > about to set RECOVERY_STATE_DONE. > Isn't it enough to check data in pg_stat_replication on the primary? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: speed up a logical replica setup
On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote: > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed? > Why isn't recovery_timeout enough to decide if wait_for_end_recovery() > waited long enough? It was an attempt to decouple a connection failure (that keeps streaming the WAL) from the recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the primary is gone during the standby recovery process, there is a way to bail out. The recovery-timeout is 0 (infinite) by default so you have an infinite wait without this check. The idea behind this implementation is to avoid exiting in this critical code path. If it times out here you might have to rebuild the standby and start again. Amit suggested [1] that we use a value as recovery-timeout but how high is a good value? I've already seen some long recovery processes using a pglogical equivalent that timed out after hundreds of minutes. Maybe I'm too worried about a small percentage of cases and we should use 1h as default, for example. It would reduce the complexity since the recovery process lacks some progress indicators (the LSN is not sufficient in this case and there isn't a function to provide the current state -- stop applying WAL, reach target, new timeline, etc). If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery by default, otherwise we will have some reports saying the tool is hanging when in reality the primary has gone and WAL should be streamed. > IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling > pg_createsubscriber, and that should do the trick. That's a good idea. Tests are not exercising the recovery-timeout option. > Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing > things like ubsan/valgrind already use to deal with exactly this kind of > timeout problem. 
As I said with some recovery progress indicators it would be easier to make some decisions like wait a few seconds because the WAL has already been applied and it is creating a new timeline. The recovery timeout decision is a shot in the dark because we might be aborting pg_createsubscriber when the target server is about to set RECOVERY_STATE_DONE. [1] https://www.postgresql.org/message-id/CAA4eK1JRgnhv_ySzuFjN7UaX9qxz5Hqcwew7Fk%3D%2BAbJbu0Kd9w%40mail.gmail.com -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
On 3/26/24 03:53, Euler Taveira wrote: > On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote: >> ## Analysis for failure 1 >> >> The failure caused by a time lag between walreceiver finishes and >> pg_is_in_recovery() >> returns true. >> >> According to the output [1], it seems that the tool failed at >> wait_for_end_recovery() >> with the message "standby server disconnected from the primary". Also, lines >> "redo done at..." and "terminating walreceiver process due to administrator >> command" >> meant that walreceiver was requested to shut down by XLogShutdownWalRcv(). >> >> According to the source, we confirm that walreceiver is shut down in >> StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, >> SharedRecoveryState >> is changed to RECOVERY_STATE_DONE (this meant the pg_is_in_recovery() return >> true) >> at the latter part of StartupXLOG(). >> >> So, if there is a delay between FinishWalRecovery() and change the state, >> the check >> in wait_for_end_recovery() would be failed during the time. Since we allow >> to miss >> the walreceiver 10 times and it is checked once per second, the failure >> occurs if >> the time lag is longer than 10 seconds. >> >> I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS >> larger, >> but it's not a fundamental solution. > > I was expecting that slow hosts might have issues in wait_for_end_recovery(). > As you said it took a lot of steps between FinishWalRecovery() (where > walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set > to > RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS * > WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision > because it already finished the promotion and it is just doing the final > preparation for the host to become a primary. 
> > /* > * If it is still in recovery, make sure the target server is > * connected to the primary so it can receive the required WAL to > * finish the recovery process. If it is disconnected try > * NUM_CONN_ATTEMPTS in a row and bail out if not succeed. > */ > res = PQexec(conn, > "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver"); > if (PQntuples(res) == 0) > { > if (++count > NUM_CONN_ATTEMPTS) > { > stop_standby_server(subscriber_dir); > pg_log_error("standby server disconnected from the primary"); > break; > } > } > else > count = 0; /* reset counter if it connects again */ > > This code was add to defend against the death/crash of the target server. > There > are at least 3 options: > > (1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this > constant > and I decided to use 10 seconds because even in some slow hosts, this time > wasn't reached during my tests. It seems I forgot to test the combination of > slow > host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60 > seconds as default wait. Maybe that's a reasonable value. I checked the > 004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s > (culicidae), 18.31s (calliphoridae) and 27.52s (olingo). > > (2) check if the primary is not running when walreceiver is not available on the > target server. Increase the connection attempts iif the primary is not > running. > Hence, the described case doesn't cause an increment on the count variable. > > (3) set recovery_timeout default to != 0 and remove pg_stat_wal_receiver check > protection against the death/crash target server. I explained in a previous > message that timeout may occur in cases that WAL replay to reach consistent > state takes more than recovery-timeout seconds. > > Option (1) is the easiest fix, however, we can have the same issue again if a > slow host decides to be even slower, hence, we have to adjust this value > again. 
> Option (2) interprets the walreceiver absence as a recovery end and if the > primary server is running it can indicate that the target server is in the > imminence of the recovery end. Option (3) is not as resilient as the other > options. > > The first patch implements a combination of (1) and (2). > >> ## Analysis for failure 2 >> >> According to [2], the physical replication slot which is specified as >> primary_slot_name >> was not used by the walsender process. At that time walsender has not >> existed. >> >> ``` >> ... >> pg_createsubscriber: publisher: current wal senders: 0 >> pg_createsubscriber: command is: SELECT 1 FROM >> pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot' >> pg_createsubscriber: error: could not obtain replication slot information: >> got 0 rows, expected 1 row >> ... >> ``` >> >> Currently standby must be stopped before the command and current code does >> not >> block the flow to ensure the
RE: speed up a logical replica setup
Dear Amit, Euler, > > This only drops the publications created by this tool, not the > pre-existing ones that we discussed in the link provided. Another concern here is the case in which the primary subscribes to changes from others. After the conversion, the new subscriber also tries to connect to the other publisher - this may lead to conflicts. This happens because the launcher/workers start only after recovery finishes. So, based on Ashutosh's point, should we remove such replication objects? Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: speed up a logical replica setup
On Tue, Mar 26, 2024 at 8:27 AM Euler Taveira wrote: > > On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote: > > On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut wrote: > > > > I have committed your version v33. I did another pass over the > > identifier and literal quoting. I added quoting for replication slot > > names, for example, even though they can only contain a restricted set > > of characters, but it felt better to be defensive there. > > > > I'm happy to entertain follow-up patches on some of the details like > > option naming that were still being discussed. I just wanted to get the > > main functionality in in good time. We can fine-tune the rest over the > > next few weeks. > > > > I was looking at prior discussions on this topic to see if there are > any other open design points apart from this and noticed that the > points raised/discussed in the email [1] are also not addressed. IIRC, > the key point we discussed was that after promotion, the existing > replication objects should be removed (either optionally or always), > otherwise, it can lead to a new subscriber not being able to restart > or getting some unwarranted data. > > > See setup_subscriber. > > /* > * Since the publication was created before the consistent LSN, it is > * available on the subscriber when the physical replica is promoted. > * Remove publications from the subscriber because it has no use. > */ > drop_publication(conn, [I]); > This only drops the publications created by this tool, not the pre-existing ones that we discussed in the link provided. -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut wrote: > > > > I have committed your version v33. I did another pass over the > > identifier and literal quoting. I added quoting for replication slot > > names, for example, even though they can only contain a restricted set > > of characters, but it felt better to be defensive there. > > > > I'm happy to entertain follow-up patches on some of the details like > > option naming that were still being discussed. I just wanted to get the > > main functionality in in good time. We can fine-tune the rest over the > > next few weeks. > > > > I was looking at prior discussions on this topic to see if there are > any other open design points apart from this and noticed that the > points raised/discussed in the email [1] are also not addressed. IIRC, > the key point we discussed was that after promotion, the existing > replication objects should be removed (either optionally or always), > otherwise, it can lead to a new subscriber not being able to restart > or getting some unwarranted data. See setup_subscriber. /* * Since the publication was created before the consistent LSN, it is * available on the subscriber when the physical replica is promoted. * Remove publications from the subscriber because it has no use. */ drop_publication(conn, &dbinfo[i]); -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote: > ## Analysis for failure 1 > > The failure caused by a time lag between walreceiver finishes and > pg_is_in_recovery() > returns true. > > According to the output [1], it seems that the tool failed at > wait_for_end_recovery() > with the message "standby server disconnected from the primary". Also, lines > "redo done at..." and "terminating walreceiver process due to administrator > command" > meant that walreceiver was requested to shut down by XLogShutdownWalRcv(). > > According to the source, we confirm that walreceiver is shut down in > StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, > SharedRecoveryState > is changed to RECOVERY_STATE_DONE (this meant the pg_is_in_recovery() return > true) > at the latter part of StartupXLOG(). > > So, if there is a delay between FinishWalRecovery() and change the state, the > check > in wait_for_end_recovery() would be failed during the time. Since we allow to > miss > the walreceiver 10 times and it is checked once per second, the failure > occurs if > the time lag is longer than 10 seconds. > > I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS > larger, > but it's not a fundamental solution. I was expecting that slow hosts might have issues in wait_for_end_recovery(). As you said it took a lot of steps between FinishWalRecovery() (where walreceiver is shutdown -- XLogShutdownWalRcv) and SharedRecoveryState is set to RECOVERY_STATE_DONE. If this window takes longer than NUM_CONN_ATTEMPTS * WAIT_INTERVAL (10 seconds), it aborts the execution. That's a bad decision because it already finished the promotion and it is just doing the final preparation for the host to become a primary. /* * If it is still in recovery, make sure the target server is * connected to the primary so it can receive the required WAL to * finish the recovery process. If it is disconnected try * NUM_CONN_ATTEMPTS in a row and bail out if not succeed. 
*/ res = PQexec(conn, "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver"); if (PQntuples(res) == 0) { if (++count > NUM_CONN_ATTEMPTS) { stop_standby_server(subscriber_dir); pg_log_error("standby server disconnected from the primary"); break; } } else count = 0; /* reset counter if it connects again */ This code was added to defend against the death/crash of the target server. There are at least 3 options: (1) increase NUM_CONN_ATTEMPTS * WAIT_INTERVAL seconds. We discussed this constant and I decided to use 10 seconds because even on some slow hosts, this time wasn't reached during my tests. It seems I forgot to test the combination of slow host, asserts enabled, and ubsan. I didn't notice that pg_promote() uses 60 seconds as default wait. Maybe that's a reasonable value. I checked the 004_timeline_switch test and the last run took: 39.2s (serinus), 33.1s (culicidae), 18.31s (calliphoridae) and 27.52s (olingo). (2) check if the primary is not running when the walreceiver is not available on the target server. Increase the connection attempts iff the primary is not running. Hence, the described case doesn't cause an increment on the count variable. (3) set the recovery_timeout default to != 0 and remove the pg_stat_wal_receiver check protection against the death/crash of the target server. I explained in a previous message that a timeout may occur in cases where WAL replay to reach a consistent state takes more than recovery-timeout seconds. Option (1) is the easiest fix; however, we can have the same issue again if a slow host decides to be even slower, hence we would have to adjust this value again. Option (2) interprets the walreceiver absence as a recovery end, and if the primary server is running it can indicate that the target server is in the imminence of the recovery end. Option (3) is not as resilient as the other options. The first patch implements a combination of (1) and (2). 
> ## Analysis for failure 2 > > According to [2], the physical replication slot which is specified as > primary_slot_name > was not used by the walsender process. At that time walsender has not existed. > > ``` > ... > pg_createsubscriber: publisher: current wal senders: 0 > pg_createsubscriber: command is: SELECT 1 FROM > pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot' > pg_createsubscriber: error: could not obtain replication slot information: > got 0 rows, expected 1 row > ... > ``` > > Currently standby must be stopped before the command and current code does not > block the flow to ensure the replication is started. So there is a possibility > that the checking is run before walsender is launched. > > One possible approach is to wait until the replication starts. Alternative > one is > to ease the condition.
Re: speed up a logical replica setup
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut wrote: > > I have committed your version v33. I did another pass over the > identifier and literal quoting. I added quoting for replication slot > names, for example, even though they can only contain a restricted set > of characters, but it felt better to be defensive there. > > I'm happy to entertain follow-up patches on some of the details like > option naming that were still being discussed. I just wanted to get the > main functionality in in good time. We can fine-tune the rest over the > next few weeks. > I was looking at prior discussions on this topic to see if there are any other open design points apart from this and noticed that the points raised/discussed in the email [1] are also not addressed. IIRC, the key point we discussed was that after promotion, the existing replication objects should be removed (either optionally or always), otherwise, it can lead to a new subscriber not being able to restart or getting some unwarranted data. [1] - https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Mon, 25 Mar 2024 at 21:36, Hayato Kuroda (Fujitsu) wrote: > > Dear Bharath, Peter, > > > Looks like BF animals aren't happy, please check - > > > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl. > > > > Looks like sanitizer failures. There were a few messages about that > > recently, but those were all just about freeing memory after use, which > > we don't necessarily require for client programs. So maybe something else. > > It seems that there are several time of failures, [1] and [2]. > > ## Analysis for failure 1 > > The failure caused by a time lag between walreceiver finishes and > pg_is_in_recovery() > returns true. > > According to the output [1], it seems that the tool failed at > wait_for_end_recovery() > with the message "standby server disconnected from the primary". Also, lines > "redo done at..." and "terminating walreceiver process due to administrator > command" > meant that walreceiver was requested to shut down by XLogShutdownWalRcv(). > > According to the source, we confirm that walreceiver is shut down in > StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, > SharedRecoveryState > is changed to RECOVERY_STATE_DONE (this meant the pg_is_in_recovery() return > true) > at the latter part of StartupXLOG(). > > So, if there is a delay between FinishWalRecovery() and change the state, the > check > in wait_for_end_recovery() would be failed during the time. Since we allow to > miss > the walreceiver 10 times and it is checked once per second, the failure > occurs if > the time lag is longer than 10 seconds. > > I do not have a good way to fix it. One approach is make NUM_CONN_ATTEMPTS > larger, > but it's not a fundamental solution. 
I agree with your analysis; another way to fix it could be to remove the following check, as increasing the count might still have the race condition issue: /* * If it is still in recovery, make sure the target server is * connected to the primary so it can receive the required WAL to * finish the recovery process. If it is disconnected try * NUM_CONN_ATTEMPTS in a row and bail out if not succeed. */ res = PQexec(conn, "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver"); I'm not sure whether we should worry about the condition where recovery is not done and the walreceiver has exited, as we have the following sanity check in check_subscriber before we wait for recovery to be finished: /* The target server must be a standby */ if (!server_is_in_recovery(conn)) { pg_log_error("target server must be a standby"); disconnect_database(conn, true); } Regards, Vignesh
RE: speed up a logical replica setup
Dear Bharath, Peter, > Looks like BF animals aren't happy, please check - > > https://buildfarm.postgresql.org/cgi-bin/show_failures.pl. > > Looks like sanitizer failures. There were a few messages about that > recently, but those were all just about freeing memory after use, which > we don't necessarily require for client programs. So maybe something else. It seems that there are several types of failures, [1] and [2]. ## Analysis for failure 1 The failure is caused by a time lag between when walreceiver finishes and when pg_is_in_recovery() starts returning false. According to the output [1], it seems that the tool failed at wait_for_end_recovery() with the message "standby server disconnected from the primary". Also, the lines "redo done at..." and "terminating walreceiver process due to administrator command" mean that walreceiver was requested to shut down by XLogShutdownWalRcv(). According to the source, we can confirm that walreceiver is shut down in StartupXLOG()->FinishWalRecovery()->XLogShutdownWalRcv(). Also, SharedRecoveryState is changed to RECOVERY_STATE_DONE (which means pg_is_in_recovery() returns false) in the latter part of StartupXLOG(). So, if there is a delay between FinishWalRecovery() and the state change, the check in wait_for_end_recovery() can fail during that time. Since we allow the walreceiver to be missed 10 times and it is checked once per second, the failure occurs if the time lag is longer than 10 seconds. I do not have a good way to fix it. One approach is to make NUM_CONN_ATTEMPTS larger, but it's not a fundamental solution. ## Analysis for failure 2 According to [2], the physical replication slot which is specified as primary_slot_name was not used by the walsender process. At that time the walsender did not exist yet. ``` ... 
pg_createsubscriber: publisher: current wal senders: 0 pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot' pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row ... ``` Currently the standby must be stopped before the command runs, and the current code does not wait to ensure that replication has started. So there is a possibility that the check runs before the walsender is launched. One possible approach is to wait until the replication starts. An alternative is to relax the condition. What do you think? [1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2024-03-25%2013%3A03%3A07 [2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2024-03-25%2013%3A53%3A58 Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: speed up a logical replica setup
On Mon, Mar 25, 2024, at 8:55 AM, Peter Eisentraut wrote: > On 22.03.24 04:31, Euler Taveira wrote: > > On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote: > >> There is a compilation error while building postgres with the patch > >> due to a recent commit. I have attached a top-up patch v32-0003 to > >> resolve this compilation error. > >> I have not updated the version of the patch as I have not made any > >> change in v32-0001 and v32-0002 patch. > > > > I'm attaching a new version (v33) to incorporate this fix (v32-0003) > > into the > > main patch (v32-0001). This version also includes 2 new tests: > > > > - refuse to run if the standby server is running > > - refuse to run if the standby was promoted e.g. it is not in recovery > > > > The first one exercises a recent change (standby should be stopped) and the > > second one covers an important requirement. > > I have committed your version v33. I did another pass over the > identifier and literal quoting. I added quoting for replication slot > names, for example, even though they can only contain a restricted set > of characters, but it felt better to be defensive there. Thanks. > I'm happy to entertain follow-up patches on some of the details like > option naming that were still being discussed. I just wanted to get the > main functionality in in good time. We can fine-tune the rest over the > next few weeks. Agree. Let's continue the discussion about the details. > > Based on the discussion [1] about the check functions, Vignesh suggested > > that it > > should check both server before exiting. v33-0003 implements it. I don't > > have a > > strong preference; feel free to apply it. > > I haven't done anything about this. ... including this one. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
On 25.03.24 13:36, Bharath Rupireddy wrote: On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut wrote: I have committed your version v33. Looks like BF animals aren't happy, please check - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl. Looks like sanitizer failures. There were a few messages about that recently, but those were all just about freeing memory after use, which we don't necessarily require for client programs. So maybe something else.
Re: speed up a logical replica setup
On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut wrote: > > I have committed your version v33. Looks like BF animals aren't happy, please check - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: speed up a logical replica setup
On 22.03.24 04:31, Euler Taveira wrote: On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote: There is a compilation error while building postgres with the patch due to a recent commit. I have attached a top-up patch v32-0003 to resolve this compilation error. I have not updated the version of the patch as I have not made any change in v32-0001 and v32-0002 patch. I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the main patch (v32-0001). This version also includes 2 new tests: - refuse to run if the standby server is running - refuse to run if the standby was promoted e.g. it is not in recovery The first one exercises a recent change (standby should be stopped) and the second one covers an important requirement. I have committed your version v33. I did another pass over the identifier and literal quoting. I added quoting for replication slot names, for example, even though they can only contain a restricted set of characters, but it felt better to be defensive there. I'm happy to entertain follow-up patches on some of the details like option naming that were still being discussed. I just wanted to get the main functionality in in good time. We can fine-tune the rest over the next few weeks. Based on the discussion [1] about the check functions, Vignesh suggested that it should check both server before exiting. v33-0003 implements it. I don't have a strong preference; feel free to apply it. I haven't done anything about this.
RE: speed up a logical replica setup
> IIUC, added options were inspired by Tomas. And he said the limitation (pub/sub/slot > name cannot be specified) was acceptable for the first version. I agree with > him. > (To be honest, I feel that options should be fewer for the first release) Just to confirm - I don't think they are absolutely needed. If we can agree on a specification at some point, it's OK with me to add them. If you ask me, I still prefer Tomas's approach. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
RE: speed up a logical replica setup
Dear Amit, Euler,

I also want to share my opinion, just in case.

> On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira wrote:
>> On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
>>> If we commit this we might not be able to change the way the option behaves once customers start using it. How about removing these options in the first version and adding them in the next version after more discussion.
>>
>> We don't need to redesign this one if we want to add a format string in a next version. A long time ago, pg_dump started to accept patterns for tables without breaking or deprecating the -t option. If you have 100 databases and you don't want to specify the options or use a script to generate it for you, you also have the option to let pg_createsubscriber generate the object names for you. Per my experience, it will be a rare case.
>
> But, why go with some UI in the first place which we don't think is a good one, or at least don't have a broader agreement that it is a good one? The problem with self-generated names for users could be that they won't be able to make much out of it. If one has to always use those internally then probably that would be acceptable. I would prefer what Tomas proposed a few emails ago as compared to this one as that has fewer options to be provided by users but still, they can later identify objects. But surely, we should discuss if you or others have better alternatives.

IIUC, added options were inspired by Tomas. And he said the limitation (pub/sub/slot name cannot be specified) was acceptable for the first version. I agree with him. (To be honest, I feel that options should be fewer for the first release)

> The user choosing to convert a physical standby to a subscriber would in some cases be interested in converting it for all the databases (say for the case of upgrade [1]) and giving options for each database would be cumbersome for her.
+1 for the primary use case.

>>> Currently dry-run will do the check and might fail on identifying a few failures like after checking subscriber configurations. Then the user will have to correct the configuration and re-run then fix the next set of failures. Whereas the suggest-config will display all the optimal configurations for both the primary and the standby in a single shot. This is not a must in the first version, it can be done as a subsequent enhancement.
>>
>> You meant publisher, right? Per order, check_subscriber is done before check_publisher and it checks all settings on the subscriber before exiting. In v30, I changed the way it provides the required settings. In a previous version, it failed when it found a wrong setting; the current version checks all settings from that server before providing a suitable error.
>>
>> pg_createsubscriber: checking settings on publisher
>> pg_createsubscriber: primary has replication slot "physical_slot"
>> pg_createsubscriber: error: publisher requires wal_level >= logical
>> pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
>> pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
>> pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
>> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
>>
>> If you have such an error, you will fix them all and rerun using dry run mode again to verify everything is ok. I don't have a strong preference about it. It can be changed easily (unifying the check functions or providing a return for each of the check functions).
>
> We can unify the checks but not sure if it is worth it. I am fine either way. It would have been better if we provided a way for a user to know the tool's requirements rather than letting him know via errors but I think it should be okay to extend it later as well.

Both ways are OK, but I prefer to unify checks a bit. The number of working modes in the same executable should be reduced as much as possible.

Also, I feel the current specification is acceptable. pg_upgrade checks one by one and exits soon in bad cases. If both old and new clusters have issues, the first run only reports the old one's issues. After the DBA fixes them and runs again, issues on the new cluster are output.

[1]: https://www.postgresql.org/message-id/8d52c226-7e34-44f7-a919-759bf8d81541%40enterprisedb.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/
Re: speed up a logical replica setup
On Fri, Mar 22, 2024 at 9:02 AM Euler Taveira wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>> There is a compilation error while building postgres with the patch due to a recent commit. I have attached a top-up patch v32-0003 to resolve this compilation error. I have not updated the version of the patch as I have not made any change in v32-0001 and v32-0002 patch.
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted, i.e. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the second one covers an important requirement.
>
> Based on the discussion [1] about the check functions, Vignesh suggested that it should check both servers before exiting. v33-0003 implements it. I don't have a strong preference; feel free to apply it.
>
> [1] https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com

I ran valgrind with pg_createsubscriber to see if there were any issues. Valgrind reported the following:

==651272== LEAK SUMMARY:
==651272==    definitely lost: 1,319 bytes in 18 blocks
==651272==    indirectly lost: 1,280 bytes in 2 blocks
==651272==      possibly lost: 44 bytes in 3 blocks
==651272==    still reachable: 3,066 bytes in 22 blocks
==651272==         suppressed: 0 bytes in 0 blocks
==651272==
==651272== For lists of detected and suppressed errors, rerun with: -s
==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0)

The attached report has the details of the same.

Thanks and Regards,
Shubham Khanna.

valgrind.log
Description: Binary data
Re: speed up a logical replica setup
On Fri, 22 Mar 2024 at 09:01, Euler Taveira wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>> There is a compilation error while building postgres with the patch due to a recent commit. I have attached a top-up patch v32-0003 to resolve this compilation error. I have not updated the version of the patch as I have not made any change in v32-0001 and v32-0002 patch.
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted, i.e. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the second one covers an important requirement.

Few comments:
1) In the error case, PQclear and PQfinish should be called:
+	/* Secure search_path */
+	res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not clear search_path: %s",
+					 PQresultErrorMessage(res));
+		if (exit_on_error)
+			exit(1);
+
+		return NULL;
+	}
+	PQclear(res);

2) Call fflush here before calling the system command to get properly ordered console output:
a) Call fflush:
+		int			rc = system(cmd_str);
+
+		if (rc == 0)
+			pg_log_info("subscriber successfully changed the system identifier");
+		else
+			pg_fatal("subscriber failed to change system identifier: exit code: %d", rc);
+	}

b) similarly here:
+	pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd->data);
+	rc = system(pg_ctl_cmd->data);
+	pg_ctl_status(pg_ctl_cmd->data, rc);
+	standby_running = true;

c) similarly here:
+	pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path,
+						  datadir);
+	pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd);
+	rc = system(pg_ctl_cmd);
+	pg_ctl_status(pg_ctl_cmd, rc);
+	standby_running = false;

3) Call PQclear in error cases:
a) Call PQclear:
+	res = PQexec(conn, "SELECT system_identifier FROM pg_catalog.pg_control_system()");
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not get system identifier: %s",
+					 PQresultErrorMessage(res));
+		disconnect_database(conn, true);
+	}

b) similarly here:
+	if (PQntuples(res) != 1)
+	{
+		pg_log_error("could not get system identifier: got %d rows, expected %d row",
+					 PQntuples(res), 1);
+		disconnect_database(conn, true);
+	}

c) similarly here:
+	res = PQexec(conn,
+				 "SELECT oid FROM pg_catalog.pg_database "
+				 "WHERE datname = pg_catalog.current_database()");
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+	{
+		pg_log_error("could not obtain database OID: %s",
+					 PQresultErrorMessage(res));
+		disconnect_database(conn, true);
+	}

d) There are a few more places like this.

4) Since we are using a global variable, we might be able to remove the initialization of many of them:
+	/* Default settings */
+	subscriber_dir = NULL;
+	opt.config_file = NULL;
+	opt.pub_conninfo_str = NULL;
+	opt.socket_dir = NULL;
+	opt.sub_port = DEFAULT_SUB_PORT;
+	opt.sub_username = NULL;
+	opt.database_names = (SimpleStringList){0};
+	opt.recovery_timeout = 0;

Regards,
Vignesh
Re: speed up a logical replica setup
On Fri, Mar 22, 2024 at 9:44 AM Fabrízio de Royes Mello wrote:
>
> On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila wrote:
>> The user choosing to convert a physical standby to a subscriber would in some cases be interested in converting it for all the databases (say for the case of upgrade [1]) and giving options for each database would be cumbersome for her.
>>
>> ...
>>
>> [1] - This tool can be used in an upgrade where the user first converts physical standby to subscriber to get incremental changes and then performs an online upgrade.
>
> The first use case Euler and I discussed some time ago for upstreaming this tool was exactly what Amit described, so IMHO we should make it easier for users to subscribe to all existing user databases.

I feel that will be a good use case for this tool, especially now (with this release) when we allow upgrade of subscriptions. In this regard, I feel that if the user doesn't specify any database, it should create subscriptions for all databases, and the user should have a way to provide publication/slot/subscription names.

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
On Fri, Mar 22, 2024 at 12:54 AM Amit Kapila wrote:
>
> The user choosing to convert a physical standby to a subscriber would in some cases be interested in converting it for all the databases (say for the case of upgrade [1]) and giving options for each database would be cumbersome for her.
>
> ...
>
> [1] - This tool can be used in an upgrade where the user first converts physical standby to subscriber to get incremental changes and then performs an online upgrade.

The first use case Euler and I discussed some time ago for upstreaming this tool was exactly what Amit described, so IMHO we should make it easier for users to subscribe to all existing user databases.

Regards,
--
Fabrízio de Royes Mello
Re: speed up a logical replica setup
On Thu, Mar 21, 2024 at 8:00 PM Euler Taveira wrote:
>
> On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
>> If we commit this we might not be able to change the way the option behaves once customers start using it. How about removing these options in the first version and adding them in the next version after more discussion.
>
> We don't need to redesign this one if we want to add a format string in a next version. A long time ago, pg_dump started to accept patterns for tables without breaking or deprecating the -t option. If you have 100 databases and you don't want to specify the options or use a script to generate it for you, you also have the option to let pg_createsubscriber generate the object names for you. Per my experience, it will be a rare case.

But, why go with some UI in the first place which we don't think is a good one, or at least don't have a broader agreement that it is a good one? The problem with self-generated names for users could be that they won't be able to make much out of it. If one has to always use those internally then probably that would be acceptable. I would prefer what Tomas proposed a few emails ago as compared to this one as that has fewer options to be provided by users but still, they can later identify objects. But surely, we should discuss if you or others have better alternatives.

The user choosing to convert a physical standby to a subscriber would in some cases be interested in converting it for all the databases (say for the case of upgrade [1]) and giving options for each database would be cumbersome for her.

>> Currently dry-run will do the check and might fail on identifying a few failures like after checking subscriber configurations. Then the user will have to correct the configuration and re-run then fix the next set of failures. Whereas the suggest-config will display all the optimal configurations for both the primary and the standby in a single shot. This is not a must in the first version, it can be done as a subsequent enhancement.
>
> You meant publisher, right? Per order, check_subscriber is done before check_publisher and it checks all settings on the subscriber before exiting. In v30, I changed the way it provides the required settings. In a previous version, it failed when it found a wrong setting; the current version checks all settings from that server before providing a suitable error.
>
> pg_createsubscriber: checking settings on publisher
> pg_createsubscriber: primary has replication slot "physical_slot"
> pg_createsubscriber: error: publisher requires wal_level >= logical
> pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
> pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.
>
> If you have such an error, you will fix them all and rerun using dry run mode again to verify everything is ok. I don't have a strong preference about it. It can be changed easily (unifying the check functions or providing a return for each of the check functions).

We can unify the checks but not sure if it is worth it. I am fine either way. It would have been better if we provided a way for a user to know the tool's requirements rather than letting him know via errors but I think it should be okay to extend it later as well.

[1] - This tool can be used in an upgrade where the user first converts physical standby to subscriber to get incremental changes and then performs an online upgrade.

--
With Regards,
Amit Kapila.
Re: speed up a logical replica setup
On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
> There is a compilation error while building postgres with the patch due to a recent commit. I have attached a top-up patch v32-0003 to resolve this compilation error. I have not updated the version of the patch as I have not made any change in v32-0001 and v32-0002 patch.

I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the main patch (v32-0001). This version also includes 2 new tests:

- refuse to run if the standby server is running
- refuse to run if the standby was promoted, i.e. it is not in recovery

The first one exercises a recent change (standby should be stopped) and the second one covers an important requirement.

Based on the discussion [1] about the check functions, Vignesh suggested that it should check both servers before exiting. v33-0003 implements it. I don't have a strong preference; feel free to apply it.

[1] https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com

--
Euler Taveira
EDB
https://www.enterprisedb.com/

v33-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip
v33-0002-Remove-How-it-Works-section.patch.gz
Description: application/gzip
v33-0003-Check-both-servers-before-exiting.patch.gz
Description: application/gzip
Re: speed up a logical replica setup
On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
> If we commit this we might not be able to change the way the option behaves once customers start using it. How about removing these options in the first version and adding them in the next version after more discussion.

We don't need to redesign this one if we want to add a format string in a next version. A long time ago, pg_dump started to accept patterns for tables without breaking or deprecating the -t option. If you have 100 databases and you don't want to specify the options or use a script to generate it for you, you also have the option to let pg_createsubscriber generate the object names for you. Per my experience, it will be a rare case.

> Currently dry-run will do the check and might fail on identifying a few failures like after checking subscriber configurations. Then the user will have to correct the configuration and re-run then fix the next set of failures. Whereas the suggest-config will display all the optimal configurations for both the primary and the standby in a single shot. This is not a must in the first version, it can be done as a subsequent enhancement.

You meant publisher, right? Per order, check_subscriber is done before check_publisher and it checks all settings on the subscriber before exiting. In v30, I changed the way it provides the required settings. In a previous version, it failed when it found a wrong setting; the current version checks all settings from that server before providing a suitable error.

pg_createsubscriber: checking settings on publisher
pg_createsubscriber: primary has replication slot "physical_slot"
pg_createsubscriber: error: publisher requires wal_level >= logical
pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.

If you have such an error, you will fix them all and rerun using dry run mode again to verify everything is ok. I don't have a strong preference about it. It can be changed easily (unifying the check functions or providing a return for each of the check functions).

--
Euler Taveira
EDB
https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Thu, 21 Mar 2024 at 18:02, Peter Eisentraut wrote:
>
> On 21.03.24 12:35, vignesh C wrote:
>> Here are a few suggestions:
>> 1) I felt displaying the server log to the console is not a good idea, I prefer this to be logged. There were similar comments from Kuroda-san at [1], Peter at [2]. The number of lines will increase based on the log level set. If you don't want to use pg_upgrade style, how about exposing the log file option and logging it to the specified log file.
>
> Let's leave that for the next version. We need to wrap things up for this release.

Ok, this can be done as an improvement.

>> 2) Currently for publication, replication-slot and subscription, we will have to specify these options based on the number of databases. Here if we have 100 databases we will have to specify these options 100 times, it might not be user friendly. How about something like what Tomas had proposed at [3] and Amit proposed at [4]. It will be better if the user has to just specify publication, replication slot and subscription options only one time.
>
> Same. Designing, implementing, discussing, and testing this cannot be done in the time remaining.

If we commit this we might not be able to change the way the option behaves once customers start using it. How about removing these options in the first version and adding them in the next version after more discussion.

>> 3) Can we have an option similar to dry-run which will display the configurations required in the primary and standby node, something like:
>> pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres" -p -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2 --suggest-config
>> Suggested optimal configurations in the primary:
>> --
>> wal_level = logical
>> max_replication_slots = 3
>> max_wal_senders = 3
>> ...
>> Suggested optimal configurations in the standby:
>> --
>> max_replication_slots = 3
>> max_wal_senders = 3
>> ...
>
> How would this be different from what --dry-run does now?

Currently dry-run will do the check and might fail on identifying a few failures, like after checking subscriber configurations. Then the user will have to correct the configuration, re-run, then fix the next set of failures. Whereas the suggest-config will display all the optimal configurations for both the primary and the standby in a single shot. This is not a must in the first version, it can be done as a subsequent enhancement.

Regards,
Vignesh
Re: speed up a logical replica setup
On 21.03.24 12:35, vignesh C wrote:
> Here are a few suggestions:
> 1) I felt displaying the server log to the console is not a good idea, I prefer this to be logged. There were similar comments from Kuroda-san at [1], Peter at [2]. The number of lines will increase based on the log level set. If you don't want to use pg_upgrade style, how about exposing the log file option and logging it to the specified log file.

Let's leave that for the next version. We need to wrap things up for this release.

> 2) Currently for publication, replication-slot and subscription, we will have to specify these options based on the number of databases. Here if we have 100 databases we will have to specify these options 100 times, it might not be user friendly. How about something like what Tomas had proposed at [3] and Amit proposed at [4]. It will be better if the user has to just specify publication, replication slot and subscription options only one time.

Same. Designing, implementing, discussing, and testing this cannot be done in the time remaining.

+	/* Number of object names must match number of databases */
+	if (num_pubs > 0 && num_pubs != num_dbs)
+	{
+		pg_log_error("wrong number of publication names");
+		pg_log_error_hint("Number of publication names (%d) must match number of database names (%d).",
+						  num_pubs, num_dbs);
+		exit(1);
+	}

> 3) Can we have an option similar to dry-run which will display the configurations required in the primary and standby node, something like:
> pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres" -p -s /home/vignesh/postgres/inst/bin/ -U postgres -d db1 -d db2 --suggest-config
> Suggested optimal configurations in the primary:
> --
> wal_level = logical
> max_replication_slots = 3
> max_wal_senders = 3
> ...
> Suggested optimal configurations in the standby:
> --
> max_replication_slots = 3
> max_wal_senders = 3
> ...

How would this be different from what --dry-run does now?
Re: speed up a logical replica setup
On Thu, 21 Mar 2024 at 09:50, Euler Taveira wrote:
>
> On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
>> On 16.03.24 16:42, Euler Taveira wrote:
>>> I'm attaching a new version (v30) that adds:
>>
>> I have some review comments and attached a patch with some smaller fixups (mainly message wording and avoiding fixed-size string buffers).
>
> Thanks for your review. I'm attaching a new version (v32) that includes your fixups, merges the v30-0002 into the main patch [1], addresses Hayato's review [2], your reviews [3][4], and fixes the query for set_replication_progress() [5].
>
>> * doc/src/sgml/ref/pg_createsubscriber.sgml
>>
>> I would remove the "How It Works" section. This is not relevant to users, and it is very detailed and will require updating whenever the implementation changes. It could be a source code comment instead.
>
> It uses the same structure as pg_rewind, which also describes how it works internally. I included a separate patch that completely removes the section.
>
>> * src/bin/pg_basebackup/pg_createsubscriber.c
>>
>> I think the connection string handling is not robust against funny characters, like spaces, in database names etc.
>
> get_base_conninfo() uses PQconninfoParse to parse the connection string. I expect PQconnectdb to provide a suitable error message in this case. Even if it builds keywords and values arrays, it is also susceptible to the same issue, no?
>
>> Most SQL commands need to be amended for proper identifier or string literal quoting and/or escaping.
>
> I completely forgot about this detail when I added the new options in v30. It is fixed now. I also changed the tests to exercise it.
>
>> In check_subscriber(): All these permissions checks seem problematic to me. We shouldn't reimplement our own copy of the server's permission checks. The server can check the permissions. And if the permission checking in the server ever changes, then we have inconsistencies to take care of. Also, the error messages "permission denied" are inappropriate, because we are not doing the actual thing. Maybe we want to do a dry-run for the benefit of the user, but then we should do the actual thing, like try to create a replication slot, or whatever. But I would rather just remove all this, it seems too problematic.
>
> The main goal of the check_* functions is to minimize errors during execution. I removed the permission checks. The GUC checks were kept.
>
>> In main(): The first check if the standby is running is problematic. I think it would be better to require that the standby is initially shut down. Consider, the standby might be running under systemd. This tool will try to stop it, systemd will try to restart it. Let's avoid these kinds of battles. It's also safer if we don't try to touch running servers.
>
> That's a good point. I hadn't found an excuse to simplify this but you provided one. :) There was a worry about ignoring some command-line options that change GUCs if the server was started. There was also an ugly case for dry run mode that had to start the server (if it was running) at the end. Both cases are no longer issues. The current code provides a suitable error if the target server is running.
>
>> The -p option (--subscriber-port) doesn't seem to do anything. In my testing, it always uses the compiled-in default port.
>
> It works for me. See this snippet from the regression tests. The port (50945) is used by pg_ctl.
>
> # Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2
> pg_createsubscriber: validating connection string on publisher
> .
> .
> pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'"
> 2024-03-20 18:15:24.517 -03 [105195] LOG: starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
> 2024-03-20 18:15:24.517 -03 [105195] LOG: listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945"
>
>> Printing all the server log lines to the terminal doesn't seem very user-friendly. Not sure what to do about that, short of keeping a pg_upgrade-style directory of log files. But it's ugly.
>
> I removed the previous implementation that creates a new directory and stores the log file there. I don't like the pg_upgrade-style directory because (a) it stores part of the server log files in another place and (b) it is another
Re: speed up a logical replica setup
On Wed, Mar 20, 2024, at 7:16 AM, Shubham Khanna wrote:
> On Tue, Mar 19, 2024 at 8:54 PM vignesh C wrote:
>>
>> If you are not planning to have the checks for name length, this could alternatively be fixed by including the database id also while querying pg_subscription, like below in the set_replication_progress function:
>> appendPQExpBuffer(str,
>>                   "SELECT oid FROM pg_catalog.pg_subscription \n"
>>                   "WHERE subname = '%s' AND subdbid = (SELECT oid FROM pg_catalog.pg_database WHERE datname = '%s')",
>>                   dbinfo->subname,
>>                   dbinfo->dbname);
>
> The attached patch has the changes to handle the same.

I included a different query that does the same. See v32.

--
Euler Taveira
EDB
https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Tue, Mar 19, 2024, at 8:57 AM, Peter Eisentraut wrote: > On 19.03.24 12:26, Shlok Kyal wrote: > > > > I have added a top-up patch v30-0003. The issue in [1] still exists in > > the v30 patch. I was not able to come up with an approach to handle it > > in the code, so I have added it to the documentation in the warning > > section. Thoughts? > > Seems acceptable to me. pg_createsubscriber will probably always have > some restrictions and unsupported edge cases like that. We can't > support everything, so documenting is ok. Shlok, I'm not sure we should add a sentence about a pilot error. I added a comment in check_subscriber that describes this situation. I think the comment is sufficient to understand the limitation and, if it is possible in the future, a check might be added for it. I didn't include v31-0004. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Mon, Mar 18, 2024, at 2:43 AM, Hayato Kuroda (Fujitsu) wrote:
> Thanks for updating the patch. Here are my comments.
> I used Grammarly to proofread sentences.
> (The tool strongly recommends using active voice, but I can ignore that for now)

Thanks for another review. I posted a new patch (v32) that hopefully addresses these points.

> 01.
>
> "After a successful run, the state of the target server is analagous to a fresh logical replication setup."
> s/analagous/analogous

Fixed.

> 02.
>
> "The main difference between the logical replication setup and pg_createsubscriber is the initial data copy."
>
> Grammarly suggests:
> "The initial data copy is the main difference between the logical replication setup and pg_createsubscriber."

Not fixed.

> 03.
>
> "Only the synchronization phase is done, which ensures each table is brought up to a synchronized state."
>
> This sentence is not very clear to me. How about:
> "pg_createsubscriber does only the synchronization phase, ensuring each table's replication state is ready."

I avoided pg_createsubscriber at the beginning because it is already used in the previous sentence. I kept the last part of the sentence because it is similar to one in the logical replication [1].

> 04.
>
> "The pg_createsubscriber targets large database systems because most of the execution time is spent making the initial data copy."
>
> Hmm, but the initial data sync by logical replication also spends most of its time making the initial data copy. IIUC the bottlenecks are a) this application must stop and start the server several times, and b) only the serial copy works. Can you clarify them?

Reading the sentence again, it is not clear. When I said "most of execution time" I was referring to the actual logical replication setup.

> 05.
>
> It is better to say the internal difference between pg_createsubscriber and the initial sync by logical replication. For example:
> pg_createsubscriber uses a physical replication mechanism to ensure the standby catches up to a certain point. Then, it converts the standby to a subscriber by promoting it and creating subscriptions.

Isn't it better to leave these details to "How It Works"?

> 06.
>
> "If these are not met an error will be reported."
>
> Grammarly suggests:
> "If these are not met, an error will be reported."

Fixed.

> 07.
>
> "The given target data directory must have the same system identifier than the source data directory."
>
> Grammarly suggests:
> "The given target data directory must have the same system identifier as the source data directory."

Fixed.

> 08.
>
> "If a standby server is running on the target data directory or it is a base backup from the source data directory, system identifiers are the same."
>
> This line is not needed if bullet-style is not used. The line is just a supplement, not a prerequisite.

Fixed.

> 09.
>
> "The source server must accept connections from the target server. The source server must not be in recovery."
>
> Grammarly suggests:
> "The source server must accept connections from the target server and not be in recovery."

Not fixed.

> 10.
>
> "Publications cannot be created in a read-only cluster."
>
> Same as 08, this line is not needed if bullet-style is not used.

Fixed.

> 11.
>
> "pg_createsubscriber usually starts the target server with different connection settings during the transformation steps. Hence, connections to target server might fail."
>
> Grammarly suggests:
> "pg_createsubscriber usually starts the target server with different connection settings during transformation. Hence, connections to the target server might fail."

Fixed.

> 12.
>
> "During the recovery process,"
>
> Grammarly suggests:
> "During recovery,"

Not fixed. Our documentation uses "recovery process".

> 13.
>
> "replicated so an error would occur."
>
> Grammarly suggests:
> "replicated, so an error would occur."
I didn't find this one. Maybe you checked a previous version. > 14. > > "It would avoid situations in which WAL files from the source server might be > used by the target server." > > Grammarly suggests: > "It would avoid situations in which the target server might use WAL files from > the source server." Fixed. > 15. > > "a LSN" > > s/a/an Fixed. > 16. > > "of write-ahead" > > s/of/of the/ Fixed. > 17. > > "specifies promote" > > We can do double-quote for the word promote. Why? It is referring to recovery_target_action. If you check this GUC, you will notice that it also uses the literal tag. > 18. > > "are also added so it avoids" > > Grammarly suggests: > "are added to avoid" Fixed. > 19. > > "is accepting read-write transactions" > > Grammarly suggests: > "accepts read-write transactions" Not fixed. > 20. > > New options must also be documented. This helps not only users but > also > reviewers. > (Sometimes we cannot identify whether the implementation is intentional or not.)
Re: speed up a logical replica setup
On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote: > On 16.03.24 16:42, Euler Taveira wrote: > > I'm attaching a new version (v30) that adds: > > I have some review comments and attached a patch with some smaller > fixups (mainly message wording and avoid fixed-size string buffers). Thanks for your review. I'm attaching a new version (v32) that includes your fixups, merges the v30-0002 into the main patch [1], addresses Hayato's review[2], your reviews [3][4], and fixes the query for set_replication_progress() [5]. > * doc/src/sgml/ref/pg_createsubscriber.sgml > > I would remove the "How It Works" section. This is not relevant to > users, and it is very detailed and will require updating whenever the > implementation changes. It could be a source code comment instead. It uses the same structure as pg_rewind that also describes how it works internally. I included a separate patch that completely removes the section. > * src/bin/pg_basebackup/pg_createsubscriber.c > > I think the connection string handling is not robust against funny > characters, like spaces, in database names etc. get_base_conninfo() uses PQconninfoParse to parse the connection string. I expect PQconnectdb to provide a suitable error message in this case. Even if it builds keywords and values arrays, it is also susceptible to the same issue, no? > Most SQL commands need to be amended for proper identifier or string > literal quoting and/or escaping. I completely forgot about this detail when I added the new options in v30. It is fixed now. I also changed the tests to exercise it. > In check_subscriber(): All these permissions checks seem problematic > to me. We shouldn't reimplement our own copy of the server's > permission checks. The server can check the permissions. And if the > permission checking in the server ever changes, then we have > inconsistencies to take care of. Also, the error messages "permission > denied" are inappropriate, because we are not doing the actual thing. 
> Maybe we want to do a dry-run for the benefit of the user, but then we > should do the actual thing, like try to create a replication slot, or > whatever. But I would rather just remove all this, it seems too > problematic. The main goal of the check_* functions is to minimize errors during execution. I removed the permission checks. The GUC checks were kept. > In main(): The first check if the standby is running is problematic. > I think it would be better to require that the standby is initially > shut down. Consider, the standby might be running under systemd. > This tool will try to stop it, systemd will try to restart it. Let's > avoid these kinds of battles. It's also safer if we don't try to > touch running servers. That's a good point. I hadn't found an excuse to simplify this but you provided one. :) There was a worry about ignoring some command-line options that change GUCs if the server was started. There was also an ugly case for dry-run mode that has to start the server (if it was running) at the end. Both cases are no longer issues. The current code provides a suitable error if the target server is running. > The -p option (--subscriber-port) doesn't seem to do anything. In my > testing, it always uses the compiled-in default port. It works for me. See this snippet from the regression tests. The port (50945) is used by pg_ctl. # Running: pg_createsubscriber --verbose --verbose --pgdata /c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' --socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 --database pg2 pg_createsubscriber: validating connection string on publisher . . 
pg_createsubscriber: pg_ctl command is: "/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D "/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata" -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/tmp/qpngb0bPKo'" 2024-03-20 18:15:24.517 -03 [105195] LOG: starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit 2024-03-20 18:15:24.517 -03 [105195] LOG: listening on Unix socket "/tmp/qpngb0bPKo/.s.PGSQL.50945" > Printing all the server log lines to the terminal doesn't seem very > user-friendly. Not sure what to do about that, short of keeping a > pg_upgrade-style directory of log files. But it's ugly. I removed the previous implementation that creates a new directory and stores the log file there. I don't like the pg_upgrade-style directory because (a) it stores part of the server log files in another place and (b) it is another directory to ignore if your tool handles the data directory (like a backup tool). My last test said it prints 35 server log lines. I expect that the user redirects the output to a file so
Re: speed up a logical replica setup
> Hi, > > > I'm attaching a new version (v30) that adds: > > > > * 3 new options (--publication, --subscription, --replication-slot) to > > assign > > names to the objects. The --database option used to ignore duplicate > > names, > > however, since these new options rely on the number of database options to > > match the number of object name options, it is forbidden from now on. The > > duplication is also forbidden for the object names to avoid errors > > earlier. > > * rewrite the paragraph related to unusable target server after > > pg_createsubscriber fails. > > * Vignesh reported an issue [1] related to reaching the recovery stop point > > before the consistent state is reached. I proposed a simple patch that > > fixes > > the issue. > > > > [1] > > https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com > > > > I have added a top-up patch v30-0003. The issue in [1] still exists in > the v30 patch. I was not able to come up with an approach to handle it > in the code, so I have added it to the documentation in the warning > section. Thoughts? > I am not changing the version as I have not made any changes in > v30-0001 and v30-0002. > > [1]: > https://www.postgresql.org/message-id/cahv8rj+5mzk9jt+7ecogjzfm5czvdccd5jo1_rcx0bteypb...@mail.gmail.com There were some whitespace errors in the v30-0003 patch. Updated the patch. Thanks and regards, Shlok Kyal From d6717bcaf94302c0d41eabd448802b1fae6167d9 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Mon, 5 Jun 2023 14:39:40 -0400 Subject: [PATCH v31 1/3] pg_createsubscriber: creates a new logical replica from a standby server It must be run on the target server and should be able to connect to the source server (publisher) and the target server (subscriber). All tables in the specified database(s) are included in the logical replication setup. A pair of publication and subscription objects are created for each database. 
The main advantage of pg_createsubscriber over the common logical replication setup is the initial data copy. It also reduces the catchup phase. Some prerequisites must be met to successfully run it. It is basically the logical replication requirements. It starts creating a publication using FOR ALL TABLES and a replication slot for each specified database. Write recovery parameters into the target data directory and start the target server. It specifies the LSN of the last replication slot (replication start point) up to which the recovery will proceed. Wait until the target server is promoted. Create one subscription per specified database (using publication and replication slot created in a previous step) on the target server. Set the replication progress to the replication start point for each subscription. Enable the subscription for each specified database on the target server. And finally, change the system identifier on the target server. --- doc/src/sgml/ref/allfiles.sgml|1 + doc/src/sgml/ref/pg_createsubscriber.sgml | 525 doc/src/sgml/reference.sgml |1 + src/bin/pg_basebackup/.gitignore |1 + src/bin/pg_basebackup/Makefile| 11 +- src/bin/pg_basebackup/meson.build | 19 + src/bin/pg_basebackup/pg_createsubscriber.c | 2131 + .../t/040_pg_createsubscriber.pl | 326 +++ 8 files changed, 3012 insertions(+), 3 deletions(-) create mode 100644 doc/src/sgml/ref/pg_createsubscriber.sgml create mode 100644 src/bin/pg_basebackup/pg_createsubscriber.c create mode 100644 src/bin/pg_basebackup/t/040_pg_createsubscriber.pl diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index 4a42999b18..f5be638867 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -205,6 +205,7 @@ Complete list of usable sgml source files in this directory. 
+ diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml new file mode 100644 index 00..642213b5a4 --- /dev/null +++ b/doc/src/sgml/ref/pg_createsubscriber.sgml @@ -0,0 +1,525 @@ + + + + + pg_createsubscriber + + + + pg_createsubscriber + 1 + Application + + + + pg_createsubscriber + convert a physical replica into a new logical replica + + + + + pg_createsubscriber + option + + + -d + --database + +dbname + + -D + --pgdata + +datadir + + -P + --publisher-server + +connstr + + + + + + Description + +pg_createsubscriber creates a new logical +replica from a physical standby server. All tables in the specified +database are included in the logical replication setup. A pair of +publication and subscription objects are created for each database. It +must be run at the target server. + + + +After a successful run, the state of the target server is analagous to a +
Re: speed up a logical replica setup
On 19.03.24 16:24, vignesh C wrote: The problem with this failure is that the standby has been promoted already and we will have to re-create the physical replica again. If you are not planning to have the checks for name length, this could alternatively be fixed by including database id also while querying pg_subscription like below in set_replication_progress function: appendPQExpBuffer(str, "SELECT oid FROM pg_catalog.pg_subscription \n" "WHERE subname = '%s' AND subdbid = (SELECT oid FROM pg_catalog.pg_database WHERE datname = '%s')", dbinfo->subname, dbinfo->dbname); Yes, this is more correct anyway, because subscription names are per-database, not global. So you should be able to make pg_createsubscriber use the same subscription name for each database.
Re: speed up a logical replica setup
On Mon, 18 Mar 2024 at 16:36, Peter Eisentraut wrote: > > On 18.03.24 08:18, vignesh C wrote: > > 1) Maximum size of the object name is 64, we can have a check so that > > we don't specify more than the maximum allowed length: > > + case 3: > > + if (!simple_string_list_member(&replslot_names, optarg)) > > + { > > + simple_string_list_append(&replslot_names, optarg); > > + num_replslots++; > > + } > > + else > > + { > > + pg_log_error("duplicate replication slot \"%s\"", optarg); > > + exit(1); > > + } > > + break; > > > > If we allow something like this: > > ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 > > user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d > > db2 -d db3 > > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1" > > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2" > > --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3" > > In this case creation of replication slot will fail: > > pg_createsubscriber: error: could not create replication slot > > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on > > database "db2": ERROR: replication slot > > "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" > > already exists > > I think this is fine. The server can check whether the names it is > given are of the right size. We don't need to check it again in the client. 
> > > 2) Similarly here too: > > + case 4: > > + if (!simple_string_list_member(&sub_names, optarg)) > > + { > > + simple_string_list_append(&sub_names, optarg); > > + num_subs++; > > + } > > + else > > + { > > + pg_log_error("duplicate subscription \"%s\"", optarg); > > + exit(1); > > + } > > + break; > > > > If we allow something like this: > > ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 > > user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d > > db2 -d db3 > > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1 > > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2 > > --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3 > > > > Subscriptions will be created with the same name and later there will > > be a problem when setting replication progress as there will be > > multiple subscriptions with the same name. > > Could you clarify this problem? In this case the subscription name specified is longer than the allowed length; the subscription name will be truncated and the subscriptions for db1 and db2 will have the same name, like below: db2=# select subname, subdbid from pg_subscription; subname | subdbid -+- testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes | 16384 testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes | 16385 Now we try to set the replication origin of the subscriptions to a consistent lsn from the following: +set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo, const char *lsn) +{ + PQExpBuffer str = createPQExpBuffer(); + PGresult *res; + Oid suboid; + char originname[NAMEDATALEN]; + char lsnstr[17 + 1]; /* MAXPG_LSNLEN = 17 */ + + Assert(conn != NULL); + + appendPQExpBuffer(str, + "SELECT oid FROM pg_catalog.pg_subscription " + "WHERE subname = '%s'", + dbinfo->subname); + + res = PQexec(conn, str->data); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not obtain subscription 
OID: %s", +PQresultErrorMessage(res)); + disconnect_database(conn, true); + } + + if (PQntuples(res) != 1 && !dry_run) + { + pg_log_error("could not obtain subscription OID: got %d rows, expected %d rows", +PQntuples(res), 1); + disconnect_database(conn, true); + } Since the subscription name is truncated, we will have multiple records returned for the above query, which results in failure with: pg_createsubscriber: error: could not obtain subscription OID: got 2 rows, expected 1 rows pg_createsubscriber: warning: pg_createsubscriber failed after the end of recovery pg_createsubscriber: hint: The target server cannot be used as a physical replica anymore. pg_createsubscriber: hint: You must recreate the physical replica before continuing. The problem with this failure is that the standby has been promoted already and we will have to re-create the physical replica again. If you are not planning to have the checks for name length, this could alternatively be fixed by including database id also while querying
Re: speed up a logical replica setup
On 19.03.24 12:26, Shlok Kyal wrote: I'm attaching a new version (v30) that adds: * 3 new options (--publication, --subscription, --replication-slot) to assign names to the objects. The --database option used to ignore duplicate names, however, since these new options rely on the number of database options to match the number of object name options, it is forbidden from now on. The duplication is also forbidden for the object names to avoid errors earlier. * rewrite the paragraph related to unusable target server after pg_createsubscriber fails. * Vignesh reported an issue [1] related to reaching the recovery stop point before the consistent state is reached. I proposed a simple patch that fixes the issue. [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com I have added a top-up patch v30-0003. The issue in [1] still exists in the v30 patch. I was not able to come up with an approach to handle it in the code, so I have added it to the documentation in the warning section. Thoughts? Seems acceptable to me. pg_createsubscriber will probably always have some restrictions and unsupported edge cases like that. We can't support everything, so documenting is ok.
Re: speed up a logical replica setup
On 19.03.24 08:05, Amit Kapila wrote: On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut wrote: In check_subscriber(): All these permissions checks seem problematic to me. We shouldn't reimplement our own copy of the server's permission checks. The server can check the permissions. And if the permission checking in the server ever changes, then we have inconsistencies to take care of. Also, the error messages "permission denied" are inappropriate, because we are not doing the actual thing. Maybe we want to do a dry-run for the benefit of the user, but then we should do the actual thing, like try to create a replication slot, or whatever. But I would rather just remove all this, it seems too problematic. If we remove all the checks then there is a possibility that we can fail later while creating the actual subscription. For example, if there are not sufficient max_replication_slots, then it is bound to fail in the later steps which would be a costlier affair because by that time the standby would have been promoted and the user won't have any way to move forward but to re-create standby and then use this tool again. I think here the patch tries to mimic pg_upgrade style checks where we do some pre-checks. I think checking for required parameter settings is fine. My concern is with the code before that, that does pg_has_role/has_database_privilege/has_function_privilege.
Re: speed up a logical replica setup
Hi, > I'm attaching a new version (v30) that adds: > > * 3 new options (--publication, --subscription, --replication-slot) to assign > names to the objects. The --database option used to ignore duplicate names, > however, since these new options rely on the number of database options to > match the number of object name options, it is forbidden from now on. The > duplication is also forbidden for the object names to avoid errors earlier. > * rewrite the paragraph related to unusable target server after > pg_createsubscriber fails. > * Vignesh reported an issue [1] related to reaching the recovery stop point > before the consistent state is reached. I proposed a simple patch that fixes > the issue. > > [1] > https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com > I have added a top-up patch v30-0003. The issue in [1] still exists in the v30 patch. I was not able to come up with an approach to handle it in the code, so I have added it to the documentation in the warning section. Thoughts? I am not changing the version as I have not made any changes in v30-0001 and v30-0002. [1]: https://www.postgresql.org/message-id/cahv8rj+5mzk9jt+7ecogjzfm5czvdccd5jo1_rcx0bteypb...@mail.gmail.com Thanks and regards, Shlok Kyal v30-0001-pg_createsubscriber-creates-a-new-logical-replic.patch Description: Binary data v30-0002-Stop-the-target-server-earlier.patch Description: Binary data v30-0003-Document-a-limitation-of-pg_createsubscriber.patch Description: Binary data
Re: speed up a logical replica setup
On Mon, Mar 18, 2024 at 7:22 PM Peter Eisentraut wrote: > > In check_subscriber(): All these permissions checks seem problematic > to me. We shouldn't reimplement our own copy of the server's > permission checks. The server can check the permissions. And if the > permission checking in the server ever changes, then we have > inconsistencies to take care of. Also, the error messages "permission > denied" are inappropriate, because we are not doing the actual thing. > Maybe we want to do a dry-run for the benefit of the user, but then we > should do the actual thing, like try to create a replication slot, or > whatever. But I would rather just remove all this, it seems too > problematic. > If we remove all the checks then there is a possibility that we can fail later while creating the actual subscription. For example, if there are not sufficient max_replication_slots, then it is bound to fail in the later steps which would be a costlier affair because by that time the standby would have been promoted and the user won't have any way to move forward but to re-create the standby and then use this tool again. I think here the patch tries to mimic pg_upgrade style checks where we do some pre-checks. This raises a question in my mind: how are we expecting users to know all these required values and configure them properly before using this tool? IIUC, we are expecting that the user should figure out the appropriate values for max_replication_slots, max_logical_replication_workers, etc. by querying the number of databases in the primary. Then either stop the standby to change these parameters or use ALTER SYSTEM depending on the required parameters. Similarly, there are some config requirements (like max_wal_senders, max_replication_slots) for the primary which would be difficult for users to know as these are the tool's internal requirements. 
The two possibilities that come to my mind are (a) pg_createsubscriber should have some special mode/option using which the user can find out all the required config settings, or (b) document how a user can find the required settings. There are good chances of mistakes with option (b). -- With Regards, Amit Kapila.
RE: speed up a logical replica setup
Dear Peter, Thanks for the comments. I want to reply to some of them. > > 17. > > > > "specifies promote" > > > > We can do double-quote for the word promote. > > The v30 patch has promote, which I think is adequate. Oops. Actually, I looked at the v29 patch during my first review. Sorry for the noise. > > > 20. > > > > New options must be also documented as well. This helps not only users but > also > > reviewers. > > (Sometimes we cannot identify that the implementation is intentional or not.) > > Which ones are missing? In v29, the newly added options (publication/subscription/replication-slot) were not documented. Since they have been added, please ignore. > > 21. > > > > Also, not sure the specification is good. I preferred to specify them by > > format > > string. Because it can reduce the number of arguments and I cannot find use > cases > > which user want to control the name of objects. > > > > However, your approach has a benefit which users can easily identify the > generated > > objects by pg_createsubscriber. How do other think? > > I think listing them explicitly is better for the first version. It's > simpler to implement and more flexible. OK. > > 22. > > > > ``` > > #define BASE_OUTPUT_DIR > "pg_createsubscriber_output.d" > > ``` > > > > No one refers the define. > > This is gone in v30. I wrote it due to the above reason. Please ignore... > > > 31. > > > > ``` > > /* Create replication slot on publisher */ > > if (lsn) > > pg_free(lsn); > > ``` > > > > I think allocating/freeing memory is not so efficient. > > Can we add a flag to create_logical_replication_slot() for controlling the > > returning value (NULL or duplicated string)? We can use the condition (i == > num_dbs-1) > > as flag. > > Nothing is even using the return value of > create_logical_replication_slot(). I think this can be removed altogether. > > 37. 
> > > > ``` > > /* Register a function to clean up objects in case of failure */ > > atexit(cleanup_objects_atexit); > > ``` > > > > Sorry if we have already discussed. I think the registration can be moved > > just > > before the boot of the standby. Before that, the callback will be no-op. > > But it can also stay where it is. What is the advantage of moving it later? I thought we could reduce the risk of bugs. Previously some bugs were reported because the registration was too early. However, this is not a strong opinion. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: speed up a logical replica setup
On 2024-Mar-18, Peter Eisentraut wrote: > * src/bin/pg_basebackup/pg_createsubscriber.c > > I think the connection string handling is not robust against funny > characters, like spaces, in database names etc. Maybe it would be easier to deal with this by passing around a struct with keyword/value pairs that can be given to PQconnectdbParams (and keeping dbname as a param that's passed separately, so that it can be added when one is needed), instead of messing with the string conninfos; then you don't have to worry about quoting. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "If you have nothing to say, maybe you need just the right tool to help you not say it." (New York Times, about Microsoft PowerPoint)
Re: speed up a logical replica setup
On 16.03.24 16:42, Euler Taveira wrote: I'm attaching a new version (v30) that adds: I have some review comments and attached a patch with some smaller fixups (mainly message wording and avoid fixed-size string buffers). * doc/src/sgml/ref/pg_createsubscriber.sgml I would remove the "How It Works" section. This is not relevant to users, and it is very detailed and will require updating whenever the implementation changes. It could be a source code comment instead. * src/bin/pg_basebackup/pg_createsubscriber.c I think the connection string handling is not robust against funny characters, like spaces, in database names etc. Most SQL commands need to be amended for proper identifier or string literal quoting and/or escaping. In check_subscriber(): All these permissions checks seem problematic to me. We shouldn't reimplement our own copy of the server's permission checks. The server can check the permissions. And if the permission checking in the server ever changes, then we have inconsistencies to take care of. Also, the error messages "permission denied" are inappropriate, because we are not doing the actual thing. Maybe we want to do a dry-run for the benefit of the user, but then we should do the actual thing, like try to create a replication slot, or whatever. But I would rather just remove all this, it seems too problematic. In main(): The first check if the standby is running is problematic. I think it would be better to require that the standby is initially shut down. Consider, the standby might be running under systemd. This tool will try to stop it, systemd will try to restart it. Let's avoid these kinds of battles. It's also safer if we don't try to touch running servers. The -p option (--subscriber-port) doesn't seem to do anything. In my testing, it always uses the compiled-in default port. Printing all the server log lines to the terminal doesn't seem very user-friendly. 
Not sure what to do about that, short of keeping a pg_upgrade-style directory of log files. But it's ugly. From ec8e6ed6c3325a6f9fde2d1632346e212ade9c9f Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 18 Mar 2024 14:48:55 +0100 Subject: [PATCH] Various improvements --- src/bin/pg_basebackup/Makefile | 2 +- src/bin/pg_basebackup/nls.mk| 1 + src/bin/pg_basebackup/pg_createsubscriber.c | 49 + 3 files changed, 23 insertions(+), 29 deletions(-) diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile index e9a920dbcda..26c53e473f5 100644 --- a/src/bin/pg_basebackup/Makefile +++ b/src/bin/pg_basebackup/Makefile @@ -49,7 +49,7 @@ all: pg_basebackup pg_createsubscriber pg_receivewal pg_recvlogical pg_basebackup: $(BBOBJS) $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils $(CC) $(CFLAGS) $(BBOBJS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) -pg_createsubscriber: $(WIN32RES) pg_createsubscriber.o | submake-libpq submake-libpgport submake-libpgfeutils +pg_createsubscriber: pg_createsubscriber.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) pg_receivewal: pg_receivewal.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils diff --git a/src/bin/pg_basebackup/nls.mk b/src/bin/pg_basebackup/nls.mk index fc475003e8e..7870cea71ce 100644 --- a/src/bin/pg_basebackup/nls.mk +++ b/src/bin/pg_basebackup/nls.mk @@ -8,6 +8,7 @@ GETTEXT_FILES= $(FRONTEND_COMMON_GETTEXT_FILES) \ bbstreamer_tar.c \ bbstreamer_zstd.c \ pg_basebackup.c \ + pg_createsubscriber.c \ pg_receivewal.c \ pg_recvlogical.c \ receivelog.c \ diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c index e24ed7ef506..91c3a2f0036 100644 --- a/src/bin/pg_basebackup/pg_createsubscriber.c +++ b/src/bin/pg_basebackup/pg_createsubscriber.c @@ -43,7 +43,7 @@ struct CreateSubscriberOptions SimpleStringList sub_names; /* list of subscription 
names */ SimpleStringList replslot_names;/* list of replication slot names */ int recovery_timeout; /* stop recovery after this time */ -} CreateSubscriberOptions; +}; struct LogicalRepInfo { @@ -57,7 +57,7 @@ struct LogicalRepInfo boolmade_replslot; /* replication slot was created */ boolmade_publication; /* publication was created */ -} LogicalRepInfo; +}; static void cleanup_objects_atexit(void); static void usage(); @@ -155,9 +155,9 @@ cleanup_objects_atexit(void) */ if (recovery_ended) { - pg_log_warning("pg_createsubscriber failed after the end of recovery"); - pg_log_warning_hint("The target server
Re: speed up a logical replica setup
On 18.03.24 06:43, Hayato Kuroda (Fujitsu) wrote:

02. "The main difference between the logical replication setup and pg_createsubscriber is the initial data copy." Grammarly suggests: "The initial data copy is the main difference between the logical replication setup and pg_createsubscriber."

I think that change is worse.

09. "The source server must accept connections from the target server. The source server must not be in recovery." Grammarly suggests: "The source server must accept connections from the target server and not be in recovery."

I think the previous version is better.

17. "specifies promote" We can double-quote the word promote.

The v30 patch has promote, which I think is adequate.

19. "is accepting read-write transactions" Grammarly suggests: "accepts read-write transactions"

I like the first one better.

20. New options must be documented as well. This helps not only users but also reviewers. (Sometimes we cannot identify whether the implementation is intentional or not.)

Which ones are missing?

21. Also, not sure the specification is good. I preferred to specify them by a format string, because it can reduce the number of arguments, and I cannot find use cases in which users want to control the name of objects. However, your approach has the benefit that users can easily identify the objects generated by pg_createsubscriber. What do others think?

I think listing them explicitly is better for the first version. It's simpler to implement and more flexible.

22. ``` #define BASE_OUTPUT_DIR "pg_createsubscriber_output.d" ``` No one refers to this define.

This is gone in v30.

23. ``` } CreateSubscriberOptions; ... } LogicalRepInfo; ``` Declarations after the "{" are not needed, because we do not do typedef.

Yeah, this is actually wrong, because as it is written now, it defines global variables.

22. While seeing the definitions of functions, I found that some pointers are declared as const, but others are not.
E.g., "char *lsn" in setup_recovery() won't be changed but is not declared const. Is it just missing or is there another rule?

Yes, more could be done here. I have attached a patch for this. (This also requires the just-committed 48018f1d8c.)

24. ``` /* standby / subscriber data directory */ static char *subscriber_dir = NULL; ``` It is a bit strange that only subscriber_dir is a global variable. Callers require the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and main. So, how about making `CreateSubscriberOptions opt` a global one?

Fewer global variables seem preferable. Only make global as needed.

30. ``` if (num_replslots == 0) dbinfo[i].replslotname = pg_strdup(genname); ``` I think the straightforward way is to use the name of the subscription if no name is specified. This follows the rule for CREATE SUBSCRIPTION.

right

31. ``` /* Create replication slot on publisher */ if (lsn) pg_free(lsn); ``` I think allocating/freeing memory is not so efficient. Can we add a flag to create_logical_replication_slot() for controlling the return value (NULL or duplicated string)? We can use the condition (i == num_dbs-1) as the flag.

Nothing is even using the return value of create_logical_replication_slot(). I think this can be removed altogether.

34. ``` {"config-file", required_argument, NULL, 1}, {"publication", required_argument, NULL, 2}, {"replication-slot", required_argument, NULL, 3}, {"subscription", required_argument, NULL, 4}, ``` The ordering looks strange to me. According to pg_upgrade and pg_basebackup, options which do not have a short notation are listed behind.

35. ``` opt.sub_port = palloc(16); ``` Per other lines, pg_alloc() should be used. Even better psprintf().

37. ``` /* Register a function to clean up objects in case of failure */ atexit(cleanup_objects_atexit); ``` Sorry if we have already discussed this. I think the registration can be moved to just before the boot of the standby. Before that, the callback will be a no-op.
But it can also stay where it is. What is the advantage of moving it later?

40. ``` * XXX this code was extracted from BootStrapXLOG(). ``` So, can we extract the common part to somewhere? Since the system identifier is related to the controldata file, I think it can be located in controldata_util.c.

Let's leave it as is for this PG release.

From d951a9f186b2162aa241f7554908b236c718154f Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 18 Mar 2024 11:54:03 +0100 Subject: [PATCH] fixup! pg_createsubscriber: creates a new logical replica from a standby server Add const decorations. --- src/bin/pg_basebackup/pg_createsubscriber.c | 54 ++--- 1 file changed, 27 insertions(+), 27
Re: speed up a logical replica setup
On 18.03.24 08:18, vignesh C wrote: 1) Maximum size of the object name is 64, we can have a check so that we don't specify more than the maximum allowed length: + case 3: + if (!simple_string_list_member(_names, optarg)) + { + simple_string_list_append(_names, optarg); + num_replslots++; + } + else + { + pg_log_error("duplicate replication slot \"%s\"", optarg); + exit(1); + } + break; If we allow something like this: ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3" In this case creation of replication slot will fail: pg_createsubscriber: error: could not create replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on database "db2": ERROR: replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" already exists I think this is fine. The server can check whether the names it is given are of the right size. We don't need to check it again in the client. 
2) Similarly here too: + case 4: + if (!simple_string_list_member(_names, optarg)) + { + simple_string_list_append(_names, optarg); + num_subs++; + } + else + { + pg_log_error("duplicate subscription \"%s\"", optarg); + exit(1); + } + break; If we allow something like this: ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3 Subscriptions will be created with the same name and later there will be a problem when setting replication progress as there will be multiple subscriptions with the same name. Could you clarify this problem?
Re: speed up a logical replica setup
On Sat, 16 Mar 2024 at 21:13, Euler Taveira wrote: > > On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote: > > Did you consider adding options for publication/subscription/slot > names as mentioned in my previous email? As discussed in a few emails > above, it would be quite confusing for users to identify the logical > replication objects once the standby is converted to subscriber. > > > Yes. I was wondering to implement after v1 is pushed. I started to write a > code > for it but I wasn't sure about the UI. The best approach I came up with was > multiple options in the same order. (I don't provide short options to avoid > possible portability issues with the order.) It means if I have 3 databases > and > the following command-line: > > pg_createsubscriber ... --database pg1 --database pg2 --database pg3 > --publication > pubx --publication puby --publication pubz > > pubx, puby and pubz are created in the database pg1, pg2, and pg3 > respectively. > > It seems we care only for publications created on the primary. Isn't > it possible that some of the publications have been replicated to > standby by that time, for example, in case failure happens after > creating a few publications? IIUC, we don't care for standby cleanup > after failure because it can't be used for streaming replication > anymore. So, the only choice the user has is to recreate the standby > and start the pg_createsubscriber again. This sounds questionable to > me as to whether users would like this behavior. Does anyone else have > an opinion on this point? > > > If it happens after creating a publication and before promotion, the cleanup > routine will drop the publications on primary and it will eventually be > applied > to the standby via replication later. > > I see the below note in the patch: > +If pg_createsubscriber fails while processing, > +then the data directory is likely not in a state that can be recovered. > It > +is true if the target server was promoted.
In such a case, creating a new > +standby server is recommended. > > By reading this it is not completely clear whether the standby is not > recoverable in case of any error or only an error after the target > server is promoted. If others agree with this behavior then we should > write the detailed reason for this somewhere in the comments as well > unless it is already explained. > > > I rewrote the sentence to make it clear that only if the server is promoted, > the target server will be in a state that cannot be reused. It provides a > message saying it too. > > pg_createsubscriber: target server reached the consistent state > pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you > must recreate the physical replica before continuing. > > I'm attaching a new version (v30) that adds: > > * 3 new options (--publication, --subscription, --replication-slot) to assign > names to the objects. The --database option used to ignore duplicate names, > however, since these new options rely on the number of database options to > match the number of object name options, it is forbidden from now on. The > duplication is also forbidden for the object names to avoid errors earlier. > * rewrite the paragraph related to unusable target server after > pg_createsubscriber fails.
1) Maximum size of the object name is 64, we can have a check so that we don't specify more than the maximum allowed length: + case 3: + if (!simple_string_list_member(_names, optarg)) + { + simple_string_list_append(_names, optarg); + num_replslots++; + } + else + { + pg_log_error("duplicate replication slot \"%s\"", optarg); + exit(1); + } + break; If we allow something like this: ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2" --replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3" In this case creation of replication slot will fail: pg_createsubscriber: error: could not create replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on database "db2": ERROR: replication slot "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" already exists 2) Similarly here too: + case 4: + if (!simple_string_list_member(_names, optarg)) + { + simple_string_list_append(_names, optarg); + num_subs++; + } + else + { + pg_log_error("duplicate subscription \"%s\"", optarg); + exit(1); + } + break; If we allow something like this: ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431 user=postgres" -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d db2 -d db3 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1 --subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
RE: speed up a logical replica setup
Dear Euler,

Thanks for updating the patch. Here are my comments. I used Grammarly to proofread sentences. (The tool strongly recommends using active voice, but I can ignore that for now.)

01. "After a successful run, the state of the target server is analagous to a fresh logical replication setup." a/analagous/analogous

02. "The main difference between the logical replication setup and pg_createsubscriber is the initial data copy." Grammarly suggests: "The initial data copy is the main difference between the logical replication setup and pg_createsubscriber."

03. "Only the synchronization phase is done, which ensures each table is brought up to a synchronized state." This sentence is not very clear to me. How about: "pg_createsubscriber does only the synchronization phase, ensuring each table's replication state is ready."

04. "The pg_createsubscriber targets large database systems because most of the execution time is spent making the initial data copy." Hmm, but the initial data sync by logical replication also spends most of its time making the initial data copy. IIUC the bottlenecks are a) this application must stop and start the server several times, and b) only the serial copy works. Can you clarify them?

05. It is better to state the internal difference between pg_createsubscriber and the initial sync by logical replication. For example: pg_createsubscriber uses a physical replication mechanism to ensure the standby catches up until a certain point. Then, it converts the standby to the subscriber by promoting it and creating subscriptions.

06. "If these are not met an error will be reported." Grammarly suggests: "If these are not met, an error will be reported."

07. "The given target data directory must have the same system identifier than the source data directory." Grammarly suggests: "The given target data directory must have the same system identifier as the source data directory."

08.
"If a standby server is running on the target data directory or it is a base backup from the source data directory, system identifiers are the same." This line is not needed if bullet-style is not used. The line is just a supplement, not prerequisite. 09. "The source server must accept connections from the target server. The source server must not be in recovery." Grammarly suggests: "The source server must accept connections from the target server and not be in recovery." 10. "Publications cannot be created in a read-only cluster." Same as 08, this line is not needed if bullet-style is not used. 11. "pg_createsubscriber usually starts the target server with different connection settings during the transformation steps. Hence, connections to target server might fail." Grammarly suggests: "pg_createsubscriber usually starts the target server with different connection settings during transformation. Hence, connections to the target server might fail." 12. "During the recovery process," Grammarly suggests: "During recovery," 13. "replicated so an error would occur." Grammarly suggests: "replicated, so an error would occur." 14. "It would avoid situations in which WAL files from the source server might be used by the target server." Grammarly suggests: "It would avoid situations in which the target server might use WAL files from the source server." 15. "a LSN" s/a/an 16. "of write-ahead" s/of/of the/ 17. "specifies promote" We can do double-quote for the word promote. 18. "are also added so it avoids" Grammarly suggests: "are added to avoid" 19. "is accepting read-write transactions" Grammarly suggests: "accepts read-write transactions" 20. New options must be also documented as well. This helps not only users but also reviewers. (Sometimes we cannot identify that the implementation is intentinal or not.) 21. Also, not sure the specification is good. I preferred to specify them by format string. 
Because it can reduce the number of arguments, and I cannot find use cases in which users want to control the name of objects. However, your approach has the benefit that users can easily identify the objects generated by pg_createsubscriber. What do others think?

22. ``` #define BASE_OUTPUT_DIR "pg_createsubscriber_output.d" ``` No one refers to this define.

23. ``` } CreateSubscriberOptions; ... } LogicalRepInfo; ``` Declarations after the "{" are not needed, because we do not do typedef.

22. While seeing the definitions of functions, I found that some pointers are declared as const, but others are not. E.g., "char *lsn" in setup_recovery() won't be changed but is not declared const. Is it just missing or is there another rule?

23. ``` static int num_dbs = 0; static int num_pubs = 0; static int num_subs = 0; static int num_replslots = 0; ``` I think the name is a bit confusing. The number of generated publications/subscriptions/replication slots is always the same as the number of databases. They just indicate the
Re: speed up a logical replica setup
On Mon, Mar 18, 2024 at 9:37 AM Amit Kapila wrote: > > On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira wrote: > > > > On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote: > > > > Did you consider adding options for publication/subscription/slot > > names as mentioned in my previous email? As discussed in a few emails > > above, it would be quite confusing for users to identify the logical > > replication objects once the standby is converted to subscriber. > > > > > > Yes. I was wondering to implement after v1 is pushed. I started to write a > > code > > for it but I wasn't sure about the UI. The best approach I came up with was > > multiple options in the same order. (I don't provide short options to avoid > > possible portability issues with the order.) It means if I have 3 databases > > and > > the following command-line: > > > > pg_createsubscriber ... --database pg1 --database pg2 --database3 > > --publication > > pubx --publication puby --publication pubz > > > > pubx, puby and pubz are created in the database pg1, pg2, and pg3 > > respectively. > > > > With this syntax, you want to give the user the option to specify > publication/subscription/slot name for each database? If so, won't it > be too much verbose? > > > It seems we care only for publications created on the primary. Isn't > > it possible that some of the publications have been replicated to > > standby by that time, for example, in case failure happens after > > creating a few publications? IIUC, we don't care for standby cleanup > > after failure because it can't be used for streaming replication > > anymore. So, the only choice the user has is to recreate the standby > > and start the pg_createsubscriber again. This sounds questionable to > > me as to whether users would like this behavior. Does anyone else have > > an opinion on this point? 
> > > > > > If it happens after creating a publication and before promotion, the cleanup > > routine will drop the publications on primary and it will eventually be > > applied > > to the standby via replication later. > > > > Do you mean to say that the next time if user uses > pg_createsubscriber, it will be ensured that all the prior WAL will be > replicated? I think we need to ensure that after the restart of this > tool and before it attempts to create the publications again, the WAL > of the previous drop has to be replayed. > On further thinking, the WAL for such dropped publications should get replayed eventually before the WAL for new publications (the publications which will be created after restart) unless the required WAL is removed on primary due to some reason. -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Sat, Mar 16, 2024 at 9:13 PM Euler Taveira wrote: > > On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote: > > Did you consider adding options for publication/subscription/slot > names as mentioned in my previous email? As discussed in a few emails > above, it would be quite confusing for users to identify the logical > replication objects once the standby is converted to subscriber. > > > Yes. I was wondering to implement after v1 is pushed. I started to write a > code > for it but I wasn't sure about the UI. The best approach I came up with was > multiple options in the same order. (I don't provide short options to avoid > possible portability issues with the order.) It means if I have 3 databases > and > the following command-line: > > pg_createsubscriber ... --database pg1 --database pg2 --database3 > --publication > pubx --publication puby --publication pubz > > pubx, puby and pubz are created in the database pg1, pg2, and pg3 > respectively. > With this syntax, you want to give the user the option to specify publication/subscription/slot name for each database? If so, won't it be too much verbose? > It seems we care only for publications created on the primary. Isn't > it possible that some of the publications have been replicated to > standby by that time, for example, in case failure happens after > creating a few publications? IIUC, we don't care for standby cleanup > after failure because it can't be used for streaming replication > anymore. So, the only choice the user has is to recreate the standby > and start the pg_createsubscriber again. This sounds questionable to > me as to whether users would like this behavior. Does anyone else have > an opinion on this point? > > > If it happens after creating a publication and before promotion, the cleanup > routine will drop the publications on primary and it will eventually be > applied > to the standby via replication later. 
> Do you mean to say that the next time if user uses pg_createsubscriber, it will be ensured that all the prior WAL will be replicated? I think we need to ensure that after the restart of this tool and before it attempts to create the publications again, the WAL of the previous drop has to be replayed. -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Sat, 16 Mar 2024 at 21:16, Euler Taveira wrote: > > On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote: > > I was able to reproduce this random failure and found the following reason: > The Minimum recovery ending location 0/500 was more than the > recovery_target_lsn specified is "0/4001198". In few random cases the > standby applies a few more WAL records after the replication slot is > created; this leads to minimum recovery ending location being greater > than the recovery_target_lsn because of which the server will fail > with: > FATAL: requested recovery stop point is before consistent recovery point > > > Thanks for checking. I proposed an alternative patch for it [1]. Can you check > it? This approach looks good to me. Regards, Vignesh
Re: speed up a logical replica setup
On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote: > I was able to reproduce this random failure and found the following reason: > The Minimum recovery ending location 0/500 was more than the > recovery_target_lsn specified is "0/4001198". In few random cases the > standby applies a few more WAL records after the replication slot is > created; this leads to minimum recovery ending location being greater > than the recovery_target_lsn because of which the server will fail > with: > FATAL: requested recovery stop point is before consistent recovery point Thanks for checking. I proposed an alternative patch for it [1]. Can you check it? [1] https://www.postgresql.org/message-id/34637e7f-0330-420d-8f45-1d022962d2fe%40app.fastmail.com -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote: > Did you consider adding options for publication/subscription/slot > names as mentioned in my previous email? As discussed in a few emails > above, it would be quite confusing for users to identify the logical > replication objects once the standby is converted to subscriber. Yes. I was wondering to implement after v1 is pushed. I started to write a code for it but I wasn't sure about the UI. The best approach I came up with was multiple options in the same order. (I don't provide short options to avoid possible portability issues with the order.) It means if I have 3 databases and the following command-line: pg_createsubscriber ... --database pg1 --database pg2 --database pg3 --publication pubx --publication puby --publication pubz pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively. > It seems we care only for publications created on the primary. Isn't > it possible that some of the publications have been replicated to > standby by that time, for example, in case failure happens after > creating a few publications? IIUC, we don't care for standby cleanup > after failure because it can't be used for streaming replication > anymore. So, the only choice the user has is to recreate the standby > and start the pg_createsubscriber again. This sounds questionable to > me as to whether users would like this behavior. Does anyone else have > an opinion on this point? If it happens after creating a publication and before promotion, the cleanup routine will drop the publications on primary and it will eventually be applied to the standby via replication later. > I see the below note in the patch: > +If pg_createsubscriber fails while processing, > +then the data directory is likely not in a state that can be recovered. > It > +is true if the target server was promoted. In such a case, creating a new > +standby server is recommended.
> > By reading this it is not completely clear whether the standby is not > recoverable in case of any error or only an error after the target > server is promoted. If others agree with this behavior then we should > write the detailed reason for this somewhere in the comments as well > unless it is already explained. I rewrote the sentence to make it clear that only if the server is promoted, the target server will be in a state that cannot be reused. It provides a message saying it too. pg_createsubscriber: target server reached the consistent state pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you must recreate the physical replica before continuing. I'm attaching a new version (v30) that adds: * 3 new options (--publication, --subscription, --replication-slot) to assign names to the objects. The --database option used to ignore duplicate names, however, since these new options rely on the number of database options to match the number of object name options, it is forbidden from now on. The duplication is also forbidden for the object names to avoid errors earlier. * rewrite the paragraph related to unusable target server after pg_createsubscriber fails. * Vignesh reported an issue [1] related to reaching the recovery stop point before the consistent state is reached. I proposed a simple patch that fixes the issue. [1] https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com -- Euler Taveira EDB https://www.enterprisedb.com/ v30-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz Description: application/gzip v30-0002-Stop-the-target-server-earlier.patch.gz Description: application/gzip
Re: speed up a logical replica setup
On Mon, 11 Mar 2024 at 10:33, Hayato Kuroda (Fujitsu) wrote: > > Dear Vignesh, > > Thanks for updating the patch, but cfbot still got angry [1]. > Note that two containers (autoconf and meson) failed at different place, > so I think it is intentional one. It seems that there may be a bug related > with 32-bit build. > > We should see and fix as soon as possible.

I was able to reproduce this random failure and found the following reason: The Minimum recovery ending location 0/500 was more than the specified recovery_target_lsn "0/4001198". In a few random cases the standby applies a few more WAL records after the replication slot is created; this leads to the minimum recovery ending location being greater than the recovery_target_lsn, because of which the server will fail with: FATAL: requested recovery stop point is before consistent recovery point

I have fixed it by pausing the replay in the standby server before the replication slots get created. The attached patch v29-0002-Keep-standby-server-s-minimum-recovery-point-les.patch has the changes for the same. Thoughts?

Regards, Vignesh

From e98e1ba0661e3d658de8d46ff9082f6cd4040b41 Mon Sep 17 00:00:00 2001 From: Vignesh C Date: Sat, 16 Mar 2024 18:47:35 +0530 Subject: [PATCH v29 2/2] Keep standby server's minimum recovery point less than the consistent_lsn. Standby server should not get ahead of the replication slot's lsn. If this happens we will not be able to promote the standby server as it will not be able to reach the consistent point because the standby server's Minimum recovery ending location will be greater than the consistent_lsn. Fixed this by pausing the replay before the replication slots are created.
--- src/bin/pg_basebackup/pg_createsubscriber.c | 93 - 1 file changed, 92 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c index c565f8524a..18e8757403 100644 --- a/src/bin/pg_basebackup/pg_createsubscriber.c +++ b/src/bin/pg_basebackup/pg_createsubscriber.c @@ -95,6 +95,8 @@ static void drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo); static void create_subscription(PGconn *conn, struct LogicalRepInfo *dbinfo); static void set_replication_progress(PGconn *conn, struct LogicalRepInfo *dbinfo, const char *lsn); +static void pause_replay_stantby_server(struct LogicalRepInfo *dbinfo, + struct CreateSubscriberOptions *opt); static void enable_subscription(PGconn *conn, struct LogicalRepInfo *dbinfo); #define USEC_PER_SEC 100 @@ -917,7 +919,9 @@ check_subscriber(struct LogicalRepInfo *dbinfo) appendPQExpBuffer(str, "SELECT pg_catalog.pg_has_role(current_user, %u, 'MEMBER'), " "pg_catalog.has_database_privilege(current_user, '%s', 'CREATE'), " - "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_replication_origin_advance(text, pg_lsn)', 'EXECUTE')", + "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_replication_origin_advance(text, pg_lsn)', 'EXECUTE'), " + "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_wal_replay_pause()', 'EXECUTE'), " + "pg_catalog.has_function_privilege(current_user, 'pg_catalog.pg_get_wal_replay_pause_state()', 'EXECUTE')", ROLE_PG_CREATE_SUBSCRIPTION, dbinfo[0].dbname); pg_log_debug("command is: %s", str->data); @@ -949,6 +953,18 @@ check_subscriber(struct LogicalRepInfo *dbinfo) "pg_catalog.pg_replication_origin_advance(text, pg_lsn)"); failed = true; } + if (strcmp(PQgetvalue(res, 0, 3), "t") != 0) + { + pg_log_error("permission denied for function \"%s\"", + "pg_catalog.pg_wal_replay_pause()"); + failed = true; + } + if (strcmp(PQgetvalue(res, 0, 4), "t") != 0) + { + pg_log_error("permission denied for 
function \"%s\"", + "pg_catalog.pg_get_wal_replay_pause_state()"); + failed = true; + } destroyPQExpBuffer(str); PQclear(res); @@ -1026,6 +1042,72 @@ check_subscriber(struct LogicalRepInfo *dbinfo) exit(1); } +/* + * Pause replaying at the standby server. + */ +static void +pause_replay_stantby_server(struct LogicalRepInfo *dbinfo, + struct CreateSubscriberOptions *opt) +{ + PGconn *conn; + PGresult *res; + + /* Connect to subscriber. */ + conn = connect_database(dbinfo[0].subconninfo, true); + + pg_log_info("Pausing the replay in standby server"); + pg_log_debug("command is: SELECT pg_catalog.pg_wal_replay_pause()"); + + if (!dry_run) + { + int timer = 0; + + res = PQexec(conn, "SELECT pg_catalog.pg_wal_replay_pause()"); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not pause replay in standby server: %s", + PQresultErrorMessage(res)); + disconnect_database(conn, true); + } + PQclear(res); + + /* Wait till replay has paused */ + for(;;) + { + pg_log_debug("command is: SELECT pg_catalog.pg_get_wal_replay_pause_state()"); + res = PQexec(conn, "SELECT
Re: speed up a logical replica setup
On Fri, Mar 15, 2024 at 9:23 AM Euler Taveira wrote: > Did you consider adding options for publication/subscription/slot names as mentioned in my previous email? As discussed in a few emails above, it would be quite confusing for users to identify the logical replication objects once the standby is converted to subscriber. * +static void +cleanup_objects_atexit(void) { ... conn = connect_database(dbinfo[i].pubconninfo, false); + if (conn != NULL) + { + if (dbinfo[i].made_publication) + drop_publication(conn, [i]); + if (dbinfo[i].made_replslot) + drop_replication_slot(conn, [i], dbinfo[i].subname); + disconnect_database(conn, false); + } + else + { + /* + * If a connection could not be established, inform the user + * that some objects were left on primary and should be + * removed before trying again. + */ + if (dbinfo[i].made_publication) + { + pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary", +dbinfo[i].pubname, dbinfo[i].dbname); + pg_log_warning_hint("Consider dropping this publication before trying again."); + } It seems we care only for publications created on the primary. Isn't it possible that some of the publications have been replicated to standby by that time, for example, in case failure happens after creating a few publications? IIUC, we don't care for standby cleanup after failure because it can't be used for streaming replication anymore. So, the only choice the user has is to recreate the standby and start the pg_createsubscriber again. This sounds questionable to me as to whether users would like this behavior. Does anyone else have an opinion on this point? I see the below note in the patch: +If pg_createsubscriber fails while processing, +then the data directory is likely not in a state that can be recovered. It +is true if the target server was promoted. In such a case, creating a new +standby server is recommended. 
Reading this, it is not completely clear whether the standby is unrecoverable after any error, or only after an error that occurs once the target server has been promoted. If others agree with this behavior then we should write the detailed reason for it somewhere in the comments as well, unless it is already explained. -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Wed, Mar 13, 2024, at 10:09 AM, Shlok Kyal wrote:
> Added a top-up patch v28-0005 to fix this issue. > I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks for your review! I'm posting a new patch (v29) that merges the previous patches (v28-0002 and v28-0003). I applied the fix provided by Hayato [1]; it was an oversight during a rebase. I also included the patch proposed by Shlok [2] that stops the target server on error if it is running.

Tomas suggested in [3] that the PID should perhaps be replaced with something that has more entropy. Instead of the PID, a random number is now used for the replication slot and subscription names. There was also a concern about converting multiple standbys, which would all get the same publication name; the same random number is appended to the publication name so the run doesn't fail because the publication already exists.

Documentation was changed based on Tomas's feedback.

The user name was previously always included in the subscriber connection string; let's have libpq choose it instead. While at it, a new routine (get_sub_conninfo) now contains the code that builds the subscriber connection string.

As I said in [4], there was no way to specify a different configuration file. If your cluster has a postgresql.conf outside PGDATA, pg_createsubscriber would fail when it starts the server. The new --config-file option lets you specify the postgresql.conf location so the server starts just fine.

I also made some changes in the start_standby_server routine. I replaced the strcat and snprintf calls with appendPQExpBuffer, which is now used to build the pg_ctl command.
[1] https://www.postgresql.org/message-id/TYCPR01MB12077FD21BB186C5A685C0BF3F52A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com [2] https://www.postgresql.org/message-id/CANhcyEW6-dH28gLbFc5XpDTJ6JPizU%2Bt5g-aKUWJBf5W_Zriqw%40mail.gmail.com [3] https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com [4] https://www.postgresql.org/message-id/d898faad-f6d7-4b0d-b816-b9dcdf490685%40app.fastmail.com -- Euler Taveira EDB https://www.enterprisedb.com/ v29-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz Description: application/gzip
Re: speed up a logical replica setup
On Fri, Mar 8, 2024, at 6:44 AM, Hayato Kuroda (Fujitsu) wrote:
> Hmm, right. I considered below improvements. Tomas and Euler, how do you > think?

I'm posting a new patchset v28. I changed the way the check function works. From the usability perspective, it is better to test all conditions and report all errors (if any) at once. It avoids multiple executions in dry run mode just to figure out all of the issues in the initial phase. I also included tests for it using Shlok's idea [1], although I didn't use v27-0004.

Shlok [1] reported that it was failing on Windows since the socket-directory option was added. I added a fix for it.

Tomas pointed out that the documentation [2] does not provide a good high-level explanation of pg_createsubscriber. I expanded the Description section and moved the prerequisites to the Notes section. The prerequisites were grouped into target and source conditions in their own paragraphs instead of using a list. It seems more in line with the style of some applications.

As I said in a previous email [3], I removed the retain option.

[1] https://www.postgresql.org/message-id/canhcyeu4q3dwh9ax9bpojcm4ebbhyfenegoaz8xfgyjmcpz...@mail.gmail.com [2] https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com [3] https://www.postgresql.org/message-id/e390e35e-508e-4eb8-92e4-e6b066407a41%40app.fastmail.com -- Euler Taveira EDB https://www.enterprisedb.com/ v28-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz Description: application/gzip v28-0002-Use-last-replication-slot-position-as-replicatio.patch.gz Description: application/gzip v28-0003-port-replace-int-with-string.patch.gz Description: application/gzip
Re: speed up a logical replica setup
On Mon, Mar 11, 2024 at 9:42 AM vignesh C wrote: > > On Sat, 9 Mar 2024 at 00:56, Tomas Vondra > wrote: > > > > >> 5) slot / publication / subscription name > > >> > > >> I find it somewhat annoying it's not possible to specify names for > > >> objects created by the tool - replication slots, publication and > > >> subscriptions. If this is meant to be a replica running for a while, > > >> after a while I'll have no idea what pg_createsubscriber_569853 or > > >> pg_createsubscriber_459548_2348239 was meant for. > > >> > > >> This is particularly annoying because renaming these objects later is > > >> either not supported at all (e.g. for replication slots), or may be > > >> quite difficult (e.g. publications). > > >> > > >> I do realize there are challenges with custom names (say, if there are > > >> multiple databases to replicate), but can't we support some simple > > >> formatting with basic placeholders? So we could specify > > >> > > >> --slot-name "myslot_%d_%p" > > >> > > >> or something like that? > > > > > > Not sure we can do in the first version, but looks nice. One concern is > > > that I > > > cannot find applications which accepts escape strings like > > > log_line_prefix. > > > (It may just because we do not have use-case.) Do you know examples? > > > > > > > I can't think of a tool already doing that, but I think that's simply > > because it was not needed. Why should we be concerned about this? > > > > +1 to handle this. 
> Currently, a) Publication name = pg_createsubscriber_%u, where %u is
> database oid, b) Replication slot name = pg_createsubscriber_%u_%d,
> Where %u is database oid and %d is the pid and c) Subscription name =
> pg_createsubscriber_%u_%d, Where %u is database oid and %d is the pid
> How about we have a non mandatory option like
> --prefix_object_name=mysetup1, which will create a) Publication name =
> mysetup1_pg_createsubscriber_%u, and b) Replication slot name =
> mysetup1_pg_createsubscriber_%u (here pid is also removed) c)
> Subscription name = mysetup1_pg_createsubscriber_%u (here pid is also
> removed).
>
> In the default case where the user does not specify
> --prefix_object_name the object names will be created without any
> prefix names.

Tomas's idea is better in terms of usability. So, we should instead have three switches --slot-name, --publication-name, and --subscriber-name, with some provision for appending dbids and some unique identifier for the standby. The unique identifier can help in creating multiple subscribers from different standbys.

-- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Thu, Mar 7, 2024 at 10:44 PM Tomas Vondra wrote: > > 4) Is FOR ALL TABLES a good idea? > > I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm > sure it won't work for a number of use cases. I know large databases > it's common to create "work tables" (not necessarily temporary) as part > of a batch job, but there's no need to replicate those tables. > > AFAIK that'd break this FOR ALL TABLES publication, because the tables > will qualify for replication, but won't be present on the subscriber. Or > did I miss something? > As the subscriber is created from standby, all the tables should be present at least initially during and after creating the subscriber. Users are later free to modify the publications/subscriptions. > I do understand that FOR ALL TABLES is the simplest approach, and for v1 > it may be an acceptable limitation, but maybe it'd be good to also > support restricting which tables should be replicated (e.g. blacklist or > whitelist based on table/schema name?). > This would be useful, but OTOH could also be enhanced in a later version unless we think it is a must for the first version. -- With Regards, Amit Kapila.
RE: speed up a logical replica setup
Dear Vignesh,

Thanks for updating the patch, but cfbot still got angry [1]. Note that the two containers (autoconf and meson) failed at different places, so I do not think these are transient failures. It seems that there may be a bug related to the 32-bit build. We should investigate and fix it as soon as possible.

[1]: http://cfbot.cputube.org/highlights/all.html#4637

Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: speed up a logical replica setup
On Sat, 9 Mar 2024 at 00:56, Tomas Vondra wrote: > > > > On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote: > > Dear Tomas, Euler, > > > > Thanks for starting to read the thread! Since I'm not an original author, > > I want to reply partially. > > > >> I decided to take a quick look on this patch today, to see how it works > >> and do some simple tests. I've only started to get familiar with it, so > >> I have only some comments / questions regarding usage, not on the code. > >> It's quite possible I didn't understand some finer points, or maybe it > >> was already discussed earlier in this very long thread, so please feel > >> free to push back or point me to the past discussion. > >> > >> Also, some of this is rather opinionated, but considering I didn't see > >> this patch before, my opinions may easily be wrong ... > > > > I felt your comments were quit valuable. > > > >> 1) SGML docs > >> > >> It seems the SGML docs are more about explaining how this works on the > >> inside, rather than how to use the tool. Maybe that's intentional, but > >> as someone who didn't work with pg_createsubscriber before I found it > >> confusing and not very helpful. > >> > >> For example, the first half of the page is prerequisities+warning, and > >> sure those are useful details, but prerequisities are checked by the > >> tool (so I can't really miss this) and warnings go into a lot of details > >> about different places where things may go wrong. Sure, worth knowing > >> and including in the docs, but maybe not right at the beginning, before > >> I learn how to even run the tool? > > > > Hmm, right. I considered below improvements. Tomas and Euler, how do you > > think? > > > > * Adds more descriptions in "Description" section. > > * Moves prerequisities+warning to "Notes" section. > > * Adds "Usage" section which describes from a single node. > > > >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm > >> sure it won't work for a number of use cases. 
I know large databases > >> it's common to create "work tables" (not necessarily temporary) as part > >> of a batch job, but there's no need to replicate those tables. > > > > Indeed, the documentation does not describe that all tables in the database > > would be included in the publication. > > > >> I do understand that FOR ALL TABLES is the simplest approach, and for v1 > >> it may be an acceptable limitation, but maybe it'd be good to also > >> support restricting which tables should be replicated (e.g. blacklist or > >> whitelist based on table/schema name?). > > > > May not directly related, but we considered that accepting options was a > > next-step [1]. > > > >> Note: I now realize this might fall under the warning about DDL, which > >> says this: > >> > >> Executing DDL commands on the source server while running > >> pg_createsubscriber is not recommended. If the target server has > >> already been converted to logical replica, the DDL commands must > >> not be replicated so an error would occur. > > > > Yeah, they would not be replicated, but not lead ERROR. > > So should we say like "Creating tables on the source server..."? > > > > Perhaps. Clarifying the docs would help, but it depends on the wording. > For example, I doubt this should talk about "creating tables" because > there are other DDL that (probably) could cause issues (like adding a > column to the table, or something like that). > > >> 5) slot / publication / subscription name > >> > >> I find it somewhat annoying it's not possible to specify names for > >> objects created by the tool - replication slots, publication and > >> subscriptions. If this is meant to be a replica running for a while, > >> after a while I'll have no idea what pg_createsubscriber_569853 or > >> pg_createsubscriber_459548_2348239 was meant for. > >> > >> This is particularly annoying because renaming these objects later is > >> either not supported at all (e.g. 
for replication slots), or may be > >> quite difficult (e.g. publications). > >> > >> I do realize there are challenges with custom names (say, if there are > >> multiple databases to replicate), but can't we support some simple > >> formatting with basic placeholders? So we could specify > >> > >> --slot-name "myslot_%d_%p" > >> > >> or something like that? > > > > Not sure we can do in the first version, but looks nice. One concern is > > that I > > cannot find applications which accepts escape strings like log_line_prefix. > > (It may just because we do not have use-case.) Do you know examples? > > > > I can't think of a tool already doing that, but I think that's simply > because it was not needed. Why should we be concerned about this? > > >> BTW what will happen if we convert multiple standbys? Can't they all get > >> the same slot name (they all have the same database OID, and I'm not > >> sure how much entropy the PID has)? > > > > I tested and the second try did not work. The primal reason was the name of > >
Re: speed up a logical replica setup
On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote: > Dear Tomas, Euler, > > Thanks for starting to read the thread! Since I'm not an original author, > I want to reply partially. > >> I decided to take a quick look on this patch today, to see how it works >> and do some simple tests. I've only started to get familiar with it, so >> I have only some comments / questions regarding usage, not on the code. >> It's quite possible I didn't understand some finer points, or maybe it >> was already discussed earlier in this very long thread, so please feel >> free to push back or point me to the past discussion. >> >> Also, some of this is rather opinionated, but considering I didn't see >> this patch before, my opinions may easily be wrong ... > > I felt your comments were quit valuable. > >> 1) SGML docs >> >> It seems the SGML docs are more about explaining how this works on the >> inside, rather than how to use the tool. Maybe that's intentional, but >> as someone who didn't work with pg_createsubscriber before I found it >> confusing and not very helpful. >> >> For example, the first half of the page is prerequisities+warning, and >> sure those are useful details, but prerequisities are checked by the >> tool (so I can't really miss this) and warnings go into a lot of details >> about different places where things may go wrong. Sure, worth knowing >> and including in the docs, but maybe not right at the beginning, before >> I learn how to even run the tool? > > Hmm, right. I considered below improvements. Tomas and Euler, how do you > think? > > * Adds more descriptions in "Description" section. > * Moves prerequisities+warning to "Notes" section. > * Adds "Usage" section which describes from a single node. > >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm >> sure it won't work for a number of use cases. 
I know large databases >> it's common to create "work tables" (not necessarily temporary) as part >> of a batch job, but there's no need to replicate those tables. > > Indeed, the documentation does not describe that all tables in the database > would be included in the publication. > >> I do understand that FOR ALL TABLES is the simplest approach, and for v1 >> it may be an acceptable limitation, but maybe it'd be good to also >> support restricting which tables should be replicated (e.g. blacklist or >> whitelist based on table/schema name?). > > May not directly related, but we considered that accepting options was a > next-step [1]. > >> Note: I now realize this might fall under the warning about DDL, which >> says this: >> >> Executing DDL commands on the source server while running >> pg_createsubscriber is not recommended. If the target server has >> already been converted to logical replica, the DDL commands must >> not be replicated so an error would occur. > > Yeah, they would not be replicated, but not lead ERROR. > So should we say like "Creating tables on the source server..."? > Perhaps. Clarifying the docs would help, but it depends on the wording. For example, I doubt this should talk about "creating tables" because there are other DDL that (probably) could cause issues (like adding a column to the table, or something like that). >> 5) slot / publication / subscription name >> >> I find it somewhat annoying it's not possible to specify names for >> objects created by the tool - replication slots, publication and >> subscriptions. If this is meant to be a replica running for a while, >> after a while I'll have no idea what pg_createsubscriber_569853 or >> pg_createsubscriber_459548_2348239 was meant for. >> >> This is particularly annoying because renaming these objects later is >> either not supported at all (e.g. for replication slots), or may be >> quite difficult (e.g. publications). 
>> >> I do realize there are challenges with custom names (say, if there are >> multiple databases to replicate), but can't we support some simple >> formatting with basic placeholders? So we could specify >> >> --slot-name "myslot_%d_%p" >> >> or something like that? > > Not sure we can do in the first version, but looks nice. One concern is that I > cannot find applications which accepts escape strings like log_line_prefix. > (It may just because we do not have use-case.) Do you know examples? > I can't think of a tool already doing that, but I think that's simply because it was not needed. Why should we be concerned about this? >> BTW what will happen if we convert multiple standbys? Can't they all get >> the same slot name (they all have the same database OID, and I'm not >> sure how much entropy the PID has)? > > I tested and the second try did not work. The primal reason was the name of > publication > - pg_createsubscriber_%u (oid). > FYI - previously we can reuse same publications, but based on my comment [2] > the > feature was removed. It might be too optimistic. > OK. I could be convinced the other limitations are reasonable for v1 and
Re: speed up a logical replica setup
> On 8 Mar 2024, at 12:03, Shlok Kyal wrote:
>

I haven't dug into the thread, but the recent version fails some of the CFbot tests.

http://commitfest.cputube.org/euler-taveira.html
https://cirrus-ci.com/task/4833499115421696

==29928==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a01458 at pc 0x7f3b29fdedce bp 0x7ffe68fcf1c0 sp 0x7ffe68fcf1b8

Thanks!

Best regards, Andrey Borodin.
RE: speed up a logical replica setup
Dear Tomas, Euler,

Thanks for starting to read the thread! Since I'm not an original author, I want to reply partially.

> I decided to take a quick look on this patch today, to see how it works > and do some simple tests. I've only started to get familiar with it, so > I have only some comments / questions regarding usage, not on the code. > It's quite possible I didn't understand some finer points, or maybe it > was already discussed earlier in this very long thread, so please feel > free to push back or point me to the past discussion. > > Also, some of this is rather opinionated, but considering I didn't see > this patch before, my opinions may easily be wrong ...

I felt your comments were quite valuable.

> 1) SGML docs > > It seems the SGML docs are more about explaining how this works on the > inside, rather than how to use the tool. Maybe that's intentional, but > as someone who didn't work with pg_createsubscriber before I found it > confusing and not very helpful. > > For example, the first half of the page is prerequisities+warning, and > sure those are useful details, but prerequisities are checked by the > tool (so I can't really miss this) and warnings go into a lot of details > about different places where things may go wrong. Sure, worth knowing > and including in the docs, but maybe not right at the beginning, before > I learn how to even run the tool?

Hmm, right. I considered the below improvements. Tomas and Euler, what do you think?

* Adds more descriptions in the "Description" section.
* Moves the prerequisites+warning to the "Notes" section.
* Adds a "Usage" section which describes the steps starting from a single node.

> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm > sure it won't work for a number of use cases. I know large databases > it's common to create "work tables" (not necessarily temporary) as part > of a batch job, but there's no need to replicate those tables.
Indeed, the documentation does not describe that all tables in the database would be included in the publication.

> I do understand that FOR ALL TABLES is the simplest approach, and for v1 > it may be an acceptable limitation, but maybe it'd be good to also > support restricting which tables should be replicated (e.g. blacklist or > whitelist based on table/schema name?).

This may not be directly related, but we considered accepting options as a next step [1].

> Note: I now realize this might fall under the warning about DDL, which > says this: > > Executing DDL commands on the source server while running > pg_createsubscriber is not recommended. If the target server has > already been converted to logical replica, the DDL commands must > not be replicated so an error would occur.

Yeah, they would not be replicated, but would not lead to an ERROR. So should we say something like "Creating tables on the source server..."?

> 5) slot / publication / subscription name > > I find it somewhat annoying it's not possible to specify names for > objects created by the tool - replication slots, publication and > subscriptions. If this is meant to be a replica running for a while, > after a while I'll have no idea what pg_createsubscriber_569853 or > pg_createsubscriber_459548_2348239 was meant for. > > This is particularly annoying because renaming these objects later is > either not supported at all (e.g. for replication slots), or may be > quite difficult (e.g. publications). > > I do realize there are challenges with custom names (say, if there are > multiple databases to replicate), but can't we support some simple > formatting with basic placeholders? So we could specify > > --slot-name "myslot_%d_%p" > > or something like that?

Not sure we can do it in the first version, but it looks nice. One concern is that I cannot find applications which accept escape strings like log_line_prefix. (That may just be because we do not have a use case.) Do you know examples?
> BTW what will happen if we convert multiple standbys? Can't they all get > the same slot name (they all have the same database OID, and I'm not > sure how much entropy the PID has)?

I tested and the second try did not work. The primary reason was the name of the publication - pg_createsubscriber_%u (oid). FYI - previously we could reuse the same publications, but based on my comment [2] the feature was removed. It might have been too optimistic.

[1]: https://www.postgresql.org/message-id/TY3PR01MB9889CCBD4D9DAF8BD2F18541F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com [2]: https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: speed up a logical replica setup
On Thu, 7 Mar 2024 at 18:31, Shlok Kyal wrote: > > Hi, > > > Thanks for the feedback. I'm attaching v26 that addresses most of your > > comments > > and some issues pointed by Vignesh [1]. > > I have created a top-up patch v27-0004. It contains additional test > cases for pg_createsubscriber. > > Currently, two testcases (in v27-0004 patch) are failing. These > failures are related to running pg_createsubscriber on nodes in > cascade physical replication and are already reported in [1] and [2]. > I think these cases should be fixed. Thoughts? We can fix these issues, if we are not planning to fix any of them, we can add documentation for the same. > The idea of this patch is to keep track of testcases, so that any > future patch does not break any scenario which has already been worked > on. These testcases can be used to test in our development process, > but which test should actually be committed, can be discussed later. > Thought? Few comments for v27-0004-Add-additional-testcases.patch: 1) We could use command_fails_like to verify the reason of the error: +# set max_replication_slots +$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3'); +$node_p->restart; +command_fails( + [ + 'pg_createsubscriber', '--verbose', + '--dry-run', '--pgdata', + $node_s->data_dir, '--publisher-server', + $node_p->connstr('pg1'), '--socket-directory', + $node_s->host, '--subscriber-port', + $node_s->port, '--database', + 'pg1', '--database', + 'pg2', + ], + 'max_replication_slots are less in number in publisher'); 2) Add a comment saying what is being verified +# set max_replication_slots +$node_p->append_conf('postgresql.conf', 'max_replication_slots = 3'); +$node_p->restart; +command_fails( + [ + 'pg_createsubscriber', '--verbose', + '--dry-run', '--pgdata', + $node_s->data_dir, '--publisher-server', + $node_p->connstr('pg1'), '--socket-directory', + $node_s->host, '--subscriber-port', + $node_s->port, '--database', + 'pg1', '--database', + 'pg2', + ], + 
'max_replication_slots are less in number in publisher'); 3) We could rename this file something like pg_create_subscriber_failure_cases or something better: src/bin/pg_basebackup/t/041_tests.pl | 285 +++ 1 file changed, 285 insertions(+) create mode 100644 src/bin/pg_basebackup/t/041_tests.pl diff --git a/src/bin/pg_basebackup/t/041_tests.pl b/src/bin/pg_basebackup/t/041_tests.pl new file mode 100644 index 00..2889d60d54 --- /dev/null +++ b/src/bin/pg_basebackup/t/041_tests.pl @@ -0,0 +1,285 @@ +# Copyright (c) 2024, PostgreSQL Global Development Group 4) This success case is not required as this would have already been covered in 040_pg_createsubscriber.pl: +$node_p->append_conf('postgresql.conf', 'max_replication_slots = 4'); +$node_p->restart; +command_ok( + [ + 'pg_createsubscriber', '--verbose', + '--dry-run', '--pgdata', + $node_s->data_dir, '--publisher-server', + $node_p->connstr('pg1'), '--socket-directory', + $node_s->host, '--subscriber-port', + $node_s->port, '--database', + 'pg1', '--database', + 'pg2', + ], + 'max_replication_slots are accurate on publisher'); 5) We could use command_fails_like to verify the reason of the error: $node_s->append_conf('postgresql.conf', 'max_replication_slots = 1'); $node_s->restart; command_fails( [ 'pg_createsubscriber', '--verbose', '--dry-run', '--pgdata', $node_s->data_dir, '--publisher-server', $node_p->connstr('pg1'), '--socket-directory', $node_s->host, '--subscriber-port', $node_s->port, '--database', 'pg1', '--database', 'pg2', ], 'max_replication_slots are less in number in subscriber'); 6) Add a comment saying what is being verified $node_s->append_conf('postgresql.conf', 'max_replication_slots = 1'); $node_s->restart; command_fails( [ 'pg_createsubscriber', '--verbose', '--dry-run', '--pgdata', $node_s->data_dir, '--publisher-server', $node_p->connstr('pg1'), '--socket-directory', $node_s->host, '--subscriber-port', $node_s->port, '--database', 'pg1', '--database', 'pg2', ], 'max_replication_slots are less 
in number in subscriber'); 7) This success case is not required as this would have already been covered in 040_pg_createsubscriber.pl: $node_s->append_conf('postgresql.conf', 'max_replication_slots = 2'); $node_s->restart; command_ok( [ 'pg_createsubscriber', '--verbose', '--dry-run', '--pgdata', $node_s->data_dir, '--publisher-server', $node_p->connstr('pg1'), '--socket-directory', $node_s->host, '--subscriber-port', $node_s->port, '--database', 'pg1', '--database', 'pg2', ], 'max_replication_slots are less in number in subscriber'); 8) We could use
Re: speed up a logical replica setup
Hi,

I decided to take a quick look on this patch today, to see how it works and do some simple tests. I've only started to get familiar with it, so I have only some comments / questions regarding usage, not on the code. It's quite possible I didn't understand some finer points, or maybe it was already discussed earlier in this very long thread, so please feel free to push back or point me to the past discussion.

Also, some of this is rather opinionated, but considering I didn't see this patch before, my opinions may easily be wrong ...

1) SGML docs

It seems the SGML docs are more about explaining how this works on the inside, rather than how to use the tool. Maybe that's intentional, but as someone who didn't work with pg_createsubscriber before I found it confusing and not very helpful.

For example, the first half of the page is prerequisites+warning, and sure those are useful details, but prerequisites are checked by the tool (so I can't really miss this) and warnings go into a lot of details about different places where things may go wrong. Sure, worth knowing and including in the docs, but maybe not right at the beginning, before I learn how to even run the tool?

Maybe that's just me, though. Also, I'm sure it's not the only part of our docs like this. Perhaps it'd be good to reorganize the content a bit to make the "how to use" stuff more prominent?

2) this is a bit vague ...

pg_createsubscriber will check a few times if the connection has been reestablished to stream the required WAL. After a few attempts, it terminates with an error.

What does "a few times" mean, and how many is "a few attempts"? Seems worth knowing when using this tool in environments where disconnections can happen. Maybe this should be configurable?

3) publication info

For a while I was quite confused about which tables get replicated, until I realized the publication is FOR ALL TABLES. But I learned that from this thread; the docs say nothing about this.
Surely that's an important detail that should be mentioned?

4) Is FOR ALL TABLES a good idea?

I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm sure it won't work for a number of use cases. I know in large databases it's common to create "work tables" (not necessarily temporary) as part of a batch job, but there's no need to replicate those tables.

AFAIK that'd break this FOR ALL TABLES publication, because the tables will qualify for replication, but won't be present on the subscriber. Or did I miss something?

I do understand that FOR ALL TABLES is the simplest approach, and for v1 it may be an acceptable limitation, but maybe it'd be good to also support restricting which tables should be replicated (e.g. blacklist or whitelist based on table/schema name?).

BTW if I'm right and creating a table breaks the subscriber creation, maybe it'd be good to explicitly mention that in the docs.

Note: I now realize this might fall under the warning about DDL, which says this:

Executing DDL commands on the source server while running pg_createsubscriber is not recommended. If the target server has already been converted to logical replica, the DDL commands must not be replicated so an error would occur.

But I find this confusing. Surely there are many DDL commands that have absolutely no impact on logical replication (like creating an index or view, various ALTER TABLE flavors, and so on). And running such DDL certainly does not trigger an error, right?
publications). I do realize there are challenges with custom names (say, if there are multiple databases to replicate), but can't we support some simple formatting with basic placeholders? So we could specify --slot-name "myslot_%d_%p" or something like that? BTW what will happen if we convert multiple standbys? Can't they all get the same slot name (they all have the same database OID, and I'm not sure how much entropy the PID has)? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: speed up a logical replica setup
On Thu, 7 Mar 2024 at 10:05, Euler Taveira wrote:
>
> On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote:
> > Thanks for updating the patch!
>
> Thanks for the feedback. I'm attaching v26 that addresses most of your
> comments and some issues pointed by Vignesh [1].

Few comments:

1) We disconnect the database again in the error case too, which will lead to a double free in this scenario:

+ PQclear(res);
+
+ disconnect_database(conn, false);
+
+ if (max_repslots < num_dbs)
+ {
+     pg_log_error("subscriber requires %d replication slots, but only %d remain",
+                  num_dbs, max_repslots);
+     pg_log_error_hint("Consider increasing max_replication_slots to at least %d.",
+                       num_dbs);
+     disconnect_database(conn, true);
+ }
+
+ if (max_lrworkers < num_dbs)
+ {
+     pg_log_error("subscriber requires %d logical replication workers, but only %d remain",
+                  num_dbs, max_lrworkers);
+     pg_log_error_hint("Consider increasing max_logical_replication_workers to at least %d.",
+                       num_dbs);
+     disconnect_database(conn, true);
+ }

pg_createsubscriber: error: subscriber requires 5 logical replication workers, but only 4 remain
pg_createsubscriber: hint: Consider increasing max_logical_replication_workers to at least 5.
free(): double free detected in tcache 2
Aborted

2) We can also check that the primary is not using synchronous_standby_names, else all the transactions on the primary will wait indefinitely once the standby server is stopped; this could be added in the documentation:

+/*
+ * Is the primary server ready for logical replication?
+ */
+static void
+check_publisher(struct LogicalRepInfo *dbinfo)
+{
+    PGconn     *conn;
+    PGresult   *res;
+
+    char       *wal_level;
+    int         max_repslots;
+    int         cur_repslots;
+    int         max_walsenders;
+    int         cur_walsenders;
+
+    pg_log_info("checking settings on publisher");
+
+    conn = connect_database(dbinfo[0].pubconninfo, true);
+
+    /*
+     * If the primary server is in recovery (i.e. cascading replication),
+     * objects (publication) cannot be created because it is read only.
+     */
+    if (server_is_in_recovery(conn))
+    {
+        pg_log_error("primary server cannot be in recovery");
+        disconnect_database(conn, true);
+    }

3) This check is present only for the publication; we do not have it when creating a subscription. We can keep both of them similar, i.e. have the check in both or in neither:

+ /* Check if the publication already exists */
+ appendPQExpBuffer(str,
+                   "SELECT 1 FROM pg_catalog.pg_publication "
+                   "WHERE pubname = '%s'",
+                   dbinfo->pubname);
+ res = PQexec(conn, str->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+     pg_log_error("could not obtain publication information: %s",
+                  PQresultErrorMessage(res));
+     disconnect_database(conn, true);
+ }

4) A few options are missing from the synopsis, which currently lists only -d/--database dbname, -D/--pgdata datadir and -P/--publisher-server connstr:
4a) -n, --dry-run
4b) -p, --subscriber-port
4c) -r, --retain
4d) -s, --socket-directory
4e) -t, --recovery-timeout
4f) -U, --subscriber-username

5) typo: connnections should be connections

+The port number on which the target server is listening for connections.
+Defaults to running the target server on port 50432 to avoid unintended
+client connnections.

6) repliation should be replication

+ * Create the subscriptions, adjust the initial location for logical
+ * replication and enable the subscriptions. That's the last step for logical
+ * repliation setup.
7) I did not notice these changes in the latest patch:

diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d808aad8b0..08de2bf4e6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -517,6 +517,7 @@ CreateSeqStmt
 CreateStatsStmt
 CreateStmt
 CreateStmtContext
+CreateSubscriberOptions
 CreateSubscriptionStmt
 CreateTableAsStmt
 CreateTableSpaceStmt
@@ -1505,6 +1506,7 @@ LogicalRepBeginData
 LogicalRepCommitData
 LogicalRepCommitPreparedTxnData
 LogicalRepCtxStruct
+LogicalRepInfo
 LogicalRepMsgType
 LogicalRepPartMapEntry
 LogicalRepPreparedTxnData

Regards,
Re: speed up a logical replica setup
On Wed, Mar 6, 2024, at 8:24 AM, vignesh C wrote: > Few comments: Thanks for your review. Some changes are included in v26. > 1) Can we use strdup here instead of atoi, as we do similarly in > case of pg_dump too, else we will do double conversion, convert using > atoi and again to string while forming the connection string: > + case 'p': > + if ((opt.sub_port = atoi(optarg)) <= 0) > + pg_fatal("invalid subscriber > port number"); > + break; I don't have a strong preference but decided to provide a patch for it. See v26-0003. > 2) We can have some valid range for this, else we will end up in some > unexpected values when a higher number is specified: > + case 't': > + opt.recovery_timeout = atoi(optarg); > + break; I wouldn't like to add an arbitrary value. Suggestions? > 3) Now that we have addressed most of the items, can we handle this TODO: > + /* > +* TODO use primary_conninfo (if available) from subscriber > and > +* extract publisher connection string. Assume that there are > +* identical entries for physical and logical > replication. If there is > +* not, we would fail anyway. > +*/ > + pg_log_error("no publisher connection string specified"); > + pg_log_error_hint("Try \"%s --help\" for more > information.", progname); > + exit(1); It is not in my top priority at the moment. > 4) By default the log level as info here, I was not sure how to set > it to debug level to get these error messages: > + pg_log_debug("publisher(%d): connection string: %s", > i, dbinfo[i].pubconninfo); > + pg_log_debug("subscriber(%d): connection string: %s", > i, dbinfo[i].subconninfo); -v --verbose Enables verbose mode. This will cause pg_createsubscriber to output progress messages and detailed information about each step to standard error. Repeating the option causes additional debug-level messages to appear on standard error. 
> 5) Currently in non verbose mode there are no messages printed on > console, we could have a few of them printed irrespective of verbose > or not like the following: > a) creating publication > b) creating replication slot > c) waiting for the target server to reach the consistent state > d) If pg_createsubscriber fails after this point, you must recreate > the physical replica before continuing. > e) creating subscription That's the idea. Quiet mode by default. > 6) The message should be "waiting for the target server to reach the > consistent state": > +#define NUM_CONN_ATTEMPTS 5 > + > + pg_log_info("waiting the target server to reach the consistent > state"); > + > + conn = connect_database(conninfo, true); Fixed. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
On Wed, Mar 6, 2024, at 7:02 AM, Hayato Kuroda (Fujitsu) wrote: > Thanks for updating the patch! Thanks for the feedback. I'm attaching v26 that addresses most of your comments and some issues pointed by Vignesh [1]. > >* I also removed the check for standby is running. If the standby was > >stopped a > > long time ago, it will take some time to reach the start point. > >* Dry run mode has to start / stop the service to work correctly. Is it an > > issue? > > One concern (see below comment) is that -l option would not be passed even if > the standby has been logging before running pg_createsubscriber. Also, some > settings > passed by pg_ctl start -o would not be restored. That's a good point. We should state in the documentation that GUCs specified in the command-line options are ignored during the execution. > >However, I decided to include --retain option, I'm thinking about to remove > >it. > >If the logging is enabled, the information during the pg_createsubscriber > >will > >be available. The client log can be redirected to a file for future > >inspection. > > Just to confirm - you meant to say like below, right? > * the client output would be redirected, and > * -r option would be removed. Yes. The logging_collector is usually enabled or the syslog is collecting the log entries. Under reflection, another log directory to store entries for a short period of time doesn't seem a good idea. It divides the information and it also costs development time. The questions that make me think about it were: Should I remove the pg_createsubscriber_output.d directory if it runs successfully? What if there is an old file there? Is it another directory to exclude while taking a backup? I also don't like the long directory name. > Here are my initial comments for v25-0001. I read new doc and looks very good. > I may do reviewing more about v25-0001, but feel free to revise. > > 01. 
cleanup_objects_atexit > ``` > PGconn*conn; > int i; > ``` > The declaration *conn can be in the for-loop. Also, the declaration of the > indicator can be in the bracket. Changed. > 02. cleanup_objects_atexit > ``` > > /* > * If a connection could not be established, inform the user > * that some objects were left on primary and should be > * removed before trying again. > */ > if (dbinfo[i].made_publication) > { > pg_log_warning("There might be a publication \"%s\" in database \"%s\" on > primary", >dbinfo[i].pubname, dbinfo[i].dbname); > pg_log_warning_hint("Consider dropping this publication before trying > again."); > } > if (dbinfo[i].made_replslot) > { > pg_log_warning("There might be a replication slot \"%s\" in database \"%s\" > on primary", >dbinfo[i].subname, dbinfo[i].dbname); > pg_log_warning_hint("Drop this replication slot soon to avoid retention of > WAL files."); > } > ``` > > Not sure which is better, but we may able to the list to the concrete file > like pg_upgrade. > (I thought it had been already discussed, but could not find from the > archive. Sorry if it was a duplicated comment) Do you mean the replication slot file? I think the replication slot and the server (primary) is sufficient for checking and fixing if required. > 03. main > ``` > while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v", > long_options, _index)) != -1) > ``` > > Missing update for __shortopts. Fixed. > 04. main > ``` > case 'D': > opt.subscriber_dir = pg_strdup(optarg); > canonicalize_path(opt.subscriber_dir); > break; > ... > case 'P': > opt.pub_conninfo_str = pg_strdup(optarg); > break; > ... > case 's': > opt.socket_dir = pg_strdup(optarg); > break; > ... > case 'U': > opt.sub_username = pg_strdup(optarg); > break; > ``` > > Should we consider the case these options would be specified twice? > I.e., should we call pg_free() before the substitution? It isn't a concern for the other client tools. 
I think the reason is that it doesn't continue to leak memory during the execution. I wouldn't bother with it. > 05. main > > Missing canonicalize_path() to the socket_dir. Fixed. > 06. main > ``` > /* > * If socket directory is not provided, use the current directory. > */ > ``` > > One-line comment can be used. Period can be also removed at that time. Fixed. > 07. main > ``` > /* > * > * If subscriber username is not provided, check if the environment > * variable sets it. If not, obtain the operating system name of the user > * running it. > */ > ``` > Unnecessary blank. Fixed. > 08. main > ``` > char*errstr = NULL; > ``` > > This declaration can be at else-part. Fixed. > 09. main. > > Also, as the first place, do we have to get username if not specified? > I felt libpq can handle the case if we skip passing the info. Are you suggesting that the username should be optional? > 10. main > ``` > appendPQExpBuffer(sub_conninfo_str, "host=%s port=%u user=%s > fallback_application_name=%s", > opt.socket_dir, opt.sub_port, opt.sub_username, progname); > sub_base_conninfo =
Re: speed up a logical replica setup
On Sat, 2 Mar 2024 at 02:19, Euler Taveira wrote: > > On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote: > > Based on idea from Euler, I roughly implemented. Thought? > > 0001-0013 were not changed from the previous version. > > V24-0014: addressed your comment in the replied e-mail. > V24-0015: Add disconnect_database() again, per [3] > V24-0016: addressed your comment in [4]. > V24-0017: addressed your comment in [5]. > V24-0018: addressed your comment in [6]. > > > Thanks for your review. I'm attaching v25 that hopefully addresses all pending > points. > > Regarding your comments [1] on v21, I included changes for almost all items > except 2, 20, 23, 24, and 25. It still think item 2 is not required because > pg_ctl will provide a suitable message. I decided not to rearrange the block > of > SQL commands (item 20) mainly because it would avoid these objects on node_f. > Do we really need command_checks_all? Depending on the output it uses > additional cycles than command_ok. > > In summary: > > v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, > I > checked what you wrote and fix some gaps in what I've been written. > v24-0003: as I said I don't think we need to add it, however, I won't fight > against it if people want to add this check. > v24-0004: I spent some time on it. This patch is not completed. I cleaned it > up > and include the start_standby_server code. It starts the server using the > specified socket directory, port and username, hence, preventing external > client > connections during the execution. > v24-0005: partially applied > v24-0006: applied with cosmetic change > v24-0007: applied with cosmetic change > v24-0008: applied > v24-0009: applied with cosmetic change > v24-0010: not applied. Base on #15, I refactored this code a bit. pg_fatal is > not used when there is a database connection open. Instead, pg_log_error() > followed by disconnect_database(). 
In cases that it should exit immediately, > disconnect_database() has a new parameter (exit_on_error) that controls if it > needs to call exit(1). I go ahead and did the same for connect_database(). > v24-0011: partially applied. I included some of the suggestions (18, 19, and > 21). > v24-0012: not applied. Under reflection, after working on v24-0004, the target > server will start with new parameters (that only accepts local connections), > hence, during the execution is not possible anymore to detect if the target > server is a primary for another server. I added a sentence for it in the > documentation (Warning section). > v24-0013: good catch. Applied. > v24-0014: partially applied. After some experiments I decided to use a small > number of attempts. The current code didn't reset the counter if the > connection > is reestablished. I included the documentation suggestion. I didn't include > the > IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead, > I refactored the drop_publication() to not try again if the DROP PUBLICATION > failed. > v24-0015: not applied. I refactored the exit code to do the right thing: print > error message, disconnect database (if applicable) and exit. > v24-0016: not applied. But checked if the information was presented in the > documentation; it is. > v24-0017: good catch. Applied. > v24-0018: not applied. > > I removed almost all boolean return and include the error logic inside the > function. It reads better. I also changed the connect|disconnect_database > functions to include the error logic inside it. There is a new parameter > on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I > think someone suggested it -- and the error message was moved to outside the > function. I improved the cleanup routine. It provides useful information if it > cannot remove the object (publication or replication slot) from the primary. 
> > Since I applied v24-0004, I realized that extra start / stop service are > required. It mean pg_createsubscriber doesn't start the transformation with > the > current standby settings. Instead, it stops the standby if it is running and > start it with the provided command-line options (socket, port, > listen_addresses). It has a few drawbacks: > * See v34-0012. It cannot detect if the target server is a primary for another > server. It is documented. > * I also removed the check for standby is running. If the standby was stopped > a > long time ago, it will take some time to reach the start point. > * Dry run mode has to start / stop the service to work correctly. Is it an > issue? > > However, I decided to include --retain option, I'm thinking about to remove > it. > If the logging is enabled, the information during the pg_createsubscriber will > be available. The client log can be redirected to a file for future > inspection. > > Comments? Few comments: 1) Can we use strdup here instead of atoi, as we do similarly in case of pg_dump too, else we will do double
RE: speed up a logical replica setup
Dear Euler,

Thanks for updating the patch!

> v24-0003: as I said I don't think we need to add it, however, I won't fight
> against it if people want to add this check.

OK, let's wait comments from senior members.

> Since I applied v24-0004, I realized that extra start / stop service are
> required. It mean pg_createsubscriber doesn't start the transformation with the
> current standby settings. Instead, it stops the standby if it is running and
> start it with the provided command-line options (socket, port,
> listen_addresses). It has a few drawbacks:
> * See v34-0012. It cannot detect if the target server is a primary for another
>   server. It is documented.

Yeah, it is a collateral damage.

> * I also removed the check for standby is running. If the standby was stopped a
>   long time ago, it will take some time to reach the start point.
> * Dry run mode has to start / stop the service to work correctly. Is it an
>   issue?

One concern (see below comment) is that the -l option would not be passed even if the standby has been logging before running pg_createsubscriber. Also, some settings passed by pg_ctl start -o would not be restored.

> However, I decided to include --retain option, I'm thinking about to remove it.
> If the logging is enabled, the information during the pg_createsubscriber will
> be available. The client log can be redirected to a file for future inspection.

Just to confirm - you meant to say like below, right?
* the client output would be redirected, and
* -r option would be removed.

Here are my initial comments for v25-0001. I read the new doc and it looks very good. I may do more reviewing of v25-0001, but feel free to revise.

01. cleanup_objects_atexit
```
PGconn *conn;
int i;
```
The declaration *conn can be in the for-loop. Also, the declaration of the indicator can be in the bracket.

02. cleanup_objects_atexit
```
/*
 * If a connection could not be established, inform the user
 * that some objects were left on primary and should be
 * removed before trying again.
 */
if (dbinfo[i].made_publication)
{
    pg_log_warning("There might be a publication \"%s\" in database \"%s\" on primary",
                   dbinfo[i].pubname, dbinfo[i].dbname);
    pg_log_warning_hint("Consider dropping this publication before trying again.");
}
if (dbinfo[i].made_replslot)
{
    pg_log_warning("There might be a replication slot \"%s\" in database \"%s\" on primary",
                   dbinfo[i].subname, dbinfo[i].dbname);
    pg_log_warning_hint("Drop this replication slot soon to avoid retention of WAL files.");
}
```

Not sure which is better, but we may able to the list to the concrete file like pg_upgrade. (I thought it had been already discussed, but could not find from the archive. Sorry if it was a duplicated comment)

03. main
```
while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
                        long_options, &option_index)) != -1)
```
Missing update for __shortopts.

04. main
```
case 'D':
    opt.subscriber_dir = pg_strdup(optarg);
    canonicalize_path(opt.subscriber_dir);
    break;
...
case 'P':
    opt.pub_conninfo_str = pg_strdup(optarg);
    break;
...
case 's':
    opt.socket_dir = pg_strdup(optarg);
    break;
...
case 'U':
    opt.sub_username = pg_strdup(optarg);
    break;
```
Should we consider the case these options would be specified twice? I.e., should we call pg_free() before the substitution?

05. main

Missing canonicalize_path() to the socket_dir.

06. main
```
/*
 * If socket directory is not provided, use the current directory.
 */
```
One-line comment can be used. Period can be also removed at that time.

07. main
```
/*
 *
 * If subscriber username is not provided, check if the environment
 * variable sets it. If not, obtain the operating system name of the user
 * running it.
 */
```
Unnecessary blank.

08. main
```
char *errstr = NULL;
```
This
Re: speed up a logical replica setup
On Tue, Mar 5, 2024, at 12:48 AM, Shubham Khanna wrote: > I applied the v25 patch and did RUN the 'pg_createsubscriber' command. > I was unable to execute it and experienced the following error: > > $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432 > dbname=postgres" -d postgres -d db1 -p 9000 -r > ./pg_createsubscriber: invalid option -- 'p' > pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more > information. Oops. Good catch! I will post an updated patch soon. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
Hi,

> I applied the v25 patch and did RUN the 'pg_createsubscriber' command.
> I was unable to execute it and experienced the following error:
>
> $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
> dbname=postgres" -d postgres -d db1 -p 9000 -r
> ./pg_createsubscriber: invalid option -- 'p'
> pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
> information.
>
> Here, the --p is not accepting the desired port number. Thoughts?

I investigated it and found that:

+ while ((c = getopt_long(argc, argv, "d:D:nP:rS:t:v",
+                         long_options, &option_index)) != -1)
+ {

Here the 'p', 's' and 'U' options are missing, so we are getting the error. Also, I think the 'S' option should be removed from here. I also see that specifying the long options --subscriber-port, --subscriber-username and --socket-directory works fine. Thoughts?

Thanks and regards,
Shlok Kyal
Re: speed up a logical replica setup
On Sat, Mar 2, 2024 at 2:19 AM Euler Taveira wrote: > > On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote: > > Based on idea from Euler, I roughly implemented. Thought? > > 0001-0013 were not changed from the previous version. > > V24-0014: addressed your comment in the replied e-mail. > V24-0015: Add disconnect_database() again, per [3] > V24-0016: addressed your comment in [4]. > V24-0017: addressed your comment in [5]. > V24-0018: addressed your comment in [6]. > > > Thanks for your review. I'm attaching v25 that hopefully addresses all pending > points. > > Regarding your comments [1] on v21, I included changes for almost all items > except 2, 20, 23, 24, and 25. It still think item 2 is not required because > pg_ctl will provide a suitable message. I decided not to rearrange the block > of > SQL commands (item 20) mainly because it would avoid these objects on node_f. > Do we really need command_checks_all? Depending on the output it uses > additional cycles than command_ok. > > In summary: > > v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, > I > checked what you wrote and fix some gaps in what I've been written. > v24-0003: as I said I don't think we need to add it, however, I won't fight > against it if people want to add this check. > v24-0004: I spent some time on it. This patch is not completed. I cleaned it > up > and include the start_standby_server code. It starts the server using the > specified socket directory, port and username, hence, preventing external > client > connections during the execution. > v24-0005: partially applied > v24-0006: applied with cosmetic change > v24-0007: applied with cosmetic change > v24-0008: applied > v24-0009: applied with cosmetic change > v24-0010: not applied. Base on #15, I refactored this code a bit. pg_fatal is > not used when there is a database connection open. Instead, pg_log_error() > followed by disconnect_database(). 
In cases that it should exit immediately, > disconnect_database() has a new parameter (exit_on_error) that controls if it > needs to call exit(1). I go ahead and did the same for connect_database(). > v24-0011: partially applied. I included some of the suggestions (18, 19, and > 21). > v24-0012: not applied. Under reflection, after working on v24-0004, the target > server will start with new parameters (that only accepts local connections), > hence, during the execution is not possible anymore to detect if the target > server is a primary for another server. I added a sentence for it in the > documentation (Warning section). > v24-0013: good catch. Applied. > v24-0014: partially applied. After some experiments I decided to use a small > number of attempts. The current code didn't reset the counter if the > connection > is reestablished. I included the documentation suggestion. I didn't include > the > IF EXISTS in the DROP PUBLICATION because it doesn't solve the issue. Instead, > I refactored the drop_publication() to not try again if the DROP PUBLICATION > failed. > v24-0015: not applied. I refactored the exit code to do the right thing: print > error message, disconnect database (if applicable) and exit. > v24-0016: not applied. But checked if the information was presented in the > documentation; it is. > v24-0017: good catch. Applied. > v24-0018: not applied. > > I removed almost all boolean return and include the error logic inside the > function. It reads better. I also changed the connect|disconnect_database > functions to include the error logic inside it. There is a new parameter > on_error_exit for it. I removed the action parameter from pg_ctl_status() -- I > think someone suggested it -- and the error message was moved to outside the > function. I improved the cleanup routine. It provides useful information if it > cannot remove the object (publication or replication slot) from the primary. 
> > Since I applied v24-0004, I realized that extra start / stop service are > required. It mean pg_createsubscriber doesn't start the transformation with > the > current standby settings. Instead, it stops the standby if it is running and > start it with the provided command-line options (socket, port, > listen_addresses). It has a few drawbacks: > * See v34-0012. It cannot detect if the target server is a primary for another > server. It is documented. > * I also removed the check for standby is running. If the standby was stopped > a > long time ago, it will take some time to reach the start point. > * Dry run mode has to start / stop the service to work correctly. Is it an > issue? > > However, I decided to include --retain option, I'm thinking about to remove > it. > If the logging is enabled, the information during the pg_createsubscriber will > be available. The client log can be redirected to a file for future > inspection. > > Comments? I applied the v25 patch and did RUN the 'pg_createsubscriber' command. I was unable to execute it and experienced the following
Re: speed up a logical replica setup
pg_createsubscriber: creates a new logical replica from a standby server

It must be run on the target server and should be able to connect to the source server (publisher) and the target server (subscriber). Some prerequisites must be met to successfully run it; they are basically the logical replication requirements.

It starts by creating a publication for each specified database. After that, it stops the target server. One temporary replication slot is created to get the replication start point. It is used as (a) a stopping point for the recovery process and (b) a starting point for the subscriptions. Write recovery parameters into the target data directory and start the target server. Wait until the target server is promoted. Create one subscription per specified database (using the replication slot and publication created in a previous step) on the target server. Set the replication progress to the replication start point for each subscription. Enable the subscription for each specified database on the target server. And finally, change the system identifier on the target server.

Depending on your workload and database size, creating a logical replica might not be an option due to resource constraints (the WAL backlog must be retained until all table data is synchronized). The initial data copy and the replication catch-up tend to be faster on a physical replica. The purpose of this tool is to speed up a logical replica setup.
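The "write recovery parameters" step described above amounts to appending recovery GUCs to the target's configuration before restarting it. The fragment below is illustrative only (hypothetical host and LSN values; the exact set of parameters the tool writes is defined in the patch), but these are the usual recovery settings for stopping a standby at a known point and promoting it:

```
# illustrative postgresql.auto.conf fragment written to the target data directory
primary_conninfo = 'host=primary.example.com port=5432 dbname=postgres'
recovery_target_lsn = '0/3000110'      # replication start point from the temporary slot
recovery_target_action = 'promote'     # promote once the start point is reached
recovery_target_inclusive = true
```

A standby.signal file in the data directory is also required for the server to start in standby mode and honor these settings.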
---
 doc/src/sgml/ref/allfiles.sgml              |    1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml   |  523 +
 doc/src/sgml/reference.sgml                 |    1 +
 src/bin/pg_basebackup/.gitignore            |    1 +
 src/bin/pg_basebackup/Makefile              |   11 +-
 src/bin/pg_basebackup/meson.build           |   19 +
 src/bin/pg_basebackup/pg_createsubscriber.c | 2038 +
 .../t/040_pg_createsubscriber.pl            |  214 ++
 8 files changed, 2805 insertions(+), 3 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_createsubscriber.sgml
 create mode 100644 src/bin/pg_basebackup/pg_createsubscriber.c
 create mode 100644 src/bin/pg_basebackup/t/040_pg_createsubscriber.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..f5be638867 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -205,6 +205,7 @@ Complete list of usable sgml source files in this directory.
+

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
new file mode 100644
index 00..44705b2c52
--- /dev/null
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -0,0 +1,523 @@
+ pg_createsubscriber
+
+ pg_createsubscriber
+ 1
+ Application
+
+ pg_createsubscriber
+ convert a physical replica into a new logical replica
+
+ pg_createsubscriber [option...]
+   -d --database dbname
+   -D --pgdata datadir
+   -P --publisher-server connstr
+
+ Description
+
+ pg_createsubscriber creates a new logical
+ replica from a physical standby server. It must be run at the target server.
+
+ The pg_createsubscriber targets large database
+ systems because it speeds up the logical replication setup. For smaller
+ databases, initial data synchronization
+ is recommended.
+
+ There are some prerequisites for pg_createsubscriber
+ to convert the target server into a logical replica. If these are not met an
+ error will be reported.
+
+ The source and target servers must have the same major version as the
+ pg_createsubscriber.
+
+ The given target data directory must have the same system identifier than
+ the source data directory. If a standby server is running on the target
+ data directory or it is a base backup from the source data directory,
+ system identifiers are the same.
+
+ The target server must be used as a physical standby.
+
+ The given database user for the target data directory must have privileges
+ for creating subscriptions
+ and using pg_replication_origin_advance().
+
+ The target server must have
+ max_replication_slots
+ and
+ max_logical_replication_workers
+ configured to a value greater than or equal to the number of specified databases.
+
+ The target server must have
+ max_worker_processes
+ configured to a value greater than the number of specified databases.
+
+ The target server must accept local connections.
+
+ The source server must accept connections from the target server.
+
+ The source server must not be in recovery. Publications cannot be created
+
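The GUC-related prerequisites above can be verified up front with a catalog query. This is only a sketch against pg_settings (the literals 2 and 3 assume two specified databases; substitute your own count):

```sql
-- Illustrative prerequisite check, run on the target server for 2 databases.
-- Any row returned names a setting that is too low.
SELECT name, setting
  FROM pg_catalog.pg_settings
 WHERE name IN ('max_replication_slots',
                'max_logical_replication_workers',
                'max_worker_processes')
   AND setting::int < CASE WHEN name = 'max_worker_processes'
                           THEN 3   -- must be greater than the database count
                           ELSE 2   -- must be at least the database count
                      END;
```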
Re: speed up a logical replica setup
On 2024-02-27 Tu 05:02, Hayato Kuroda (Fujitsu) wrote: Dear Euler, Sorry, which pg_rewind option did you mention? I cannot find it. IIUC, -l is the only option that can accept a path, but it is not related to our case. Sorry, I found it [1]. I had confused pg_rewind with pg_resetwal. But my opinion has not changed much. The reason why --config-file exists is that pg_rewind requires the target to be stopped, which is different from the current pg_createsubscriber. So I still do not want to add such options. [1]: https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename [2]: ``` The target server must be shut down cleanly before running pg_rewind ``` Even though that is a difference, I'd still rather we did the same thing in more or less the same way across utilities, so I agree with Euler's suggestion. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
RE: speed up a logical replica setup
Dear Euler, > Sorry, which pg_rewind option did you mention? I cannot find it. > IIUC, -l is the only option that can accept a path, but it is not related to our case. Sorry, I found it [1]. I had confused pg_rewind with pg_resetwal. But my opinion has not changed much. The reason why --config-file exists is that pg_rewind requires the target to be stopped, which is different from the current pg_createsubscriber. So I still do not want to add such options. [1]: https://www.postgresql.org/docs/current/app-pgrewind.html#:~:text=%2D%2Dconfig%2Dfile%3Dfilename [2]: ``` The target server must be shut down cleanly before running pg_rewind ``` Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
RE: speed up a logical replica setup
Dear Euler, >> The possible solution would be >> 1) allow to run pg_createsubscriber if the standby is initially stopped. >> I observed that pg_logical_createsubscriber also uses this approach. >> 2) read GUCs via the SHOW command and restore them when the server restarts >> >3. add a config-file option. That's similar to what pg_rewind does. Sorry, which pg_rewind option did you mention? I cannot find it. IIUC, -l is the only option that can accept a path, but it is not related to our case. Also, I'm not sure about the benefit of adding new options; basically, the number of options should be kept small. Are there benefits over reading the values via SHOW? Even if pg_resetwal has such an option, the reason is that the target postmaster must be stopped for pg_resetwal. >I expect >that Debian-based installations will have this issue. I'm not familiar with the Debian environment, so can you explain the reason? >It was not a good idea if you want to keep the postgresql.conf outside PGDATA. >I mean you need extra steps that can be error prone (different settings between >files). Yeah, if we use my approach, users who specify such GUCs may not be happy. So, based on the above discussion, we should choose one of the items below. Thoughts? a) enforce that the standby must be *stopped*; options like config_file can be specified via a command-line option. b) enforce that the standby must be *running*; options like config_file would be read via the SHOW command. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/global/
RE: speed up a logical replica setup
Dear Vignesh, I forgot to reply to one of the suggestions.

> 2) There is a CheckDataVersion function which does exactly this; will
> we be able to use it?
> + /* Check standby server version */
> + if ((ver_fd = fopen(versionfile, "r")) == NULL)
> + pg_fatal("could not open file \"%s\" for reading: %m", versionfile);
> +
> + /* Version number has to be the first line read */
> + if (!fgets(rawline, sizeof(rawline), ver_fd))
> + {
> + if (!ferror(ver_fd))
> + pg_fatal("unexpected empty file \"%s\"", versionfile);
> + else
> + pg_fatal("could not read file \"%s\": %m", versionfile);
> + }
> +
> + /* Strip trailing newline and carriage return */
> + (void) pg_strip_crlf(rawline);
> +
> + if (strcmp(rawline, PG_MAJORVERSION) != 0)
> + {
> + pg_log_error("standby server is of wrong version");
> + pg_log_error_detail("File \"%s\" contains \"%s\", which is not compatible with this program's version \"%s\".",
> + versionfile, rawline, PG_MAJORVERSION);
> + exit(1);
> + }
> +
> + fclose(ver_fd);

This suggestion was rejected because I was not sure where the declaration and the implementation should be located. Functions that can be called from client programs must be declared in the src/include/{common|fe_utils|utils} directories. I looked at the files located there, but I could not find an appropriate location for CheckDataVersion. Also, I did not think a new file should be created only for this function. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
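For reference, the check quoted above can be exercised stand-alone. The following self-contained C sketch mirrors its logic; the helper name and its `expected` parameter are assumptions for the demo (the real code compares against the PG_MAJORVERSION macro and reports failures via pg_fatal()/pg_log_error()):

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of `versionfile` and compare it with `expected`.
 * Returns 1 on a match, 0 on a mismatch, -1 if the file cannot be read.
 */
static int
check_data_version(const char *versionfile, const char *expected)
{
    char    rawline[64];
    size_t  len;
    FILE   *ver_fd = fopen(versionfile, "r");

    if (ver_fd == NULL)
        return -1;

    /* The version number has to be the first line read. */
    if (!fgets(rawline, sizeof(rawline), ver_fd))
    {
        fclose(ver_fd);
        return -1;
    }
    fclose(ver_fd);

    /* Strip the trailing newline/carriage return, like pg_strip_crlf(). */
    len = strlen(rawline);
    while (len > 0 && (rawline[len - 1] == '\n' || rawline[len - 1] == '\r'))
        rawline[--len] = '\0';

    return strcmp(rawline, expected) == 0 ? 1 : 0;
}
```

In the tool itself the error paths terminate the program instead of returning a status code, as the quoted snippet shows.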
Re: speed up a logical replica setup
On Fri, Feb 23, 2024 at 8:16 AM Euler Taveira wrote: > > On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote: > > I found some issues and fixed those issues with top-up patches > v23-0012 and v23-0013 > 1. > Suppose there is a cascade physical replication node1->node2->node3. > Now if we run pg_createsubscriber with node1 as primary and node2 as > standby, pg_createsubscriber will be successful but the connection > between node2 and node3 will not be retained and the log of node3 will > give an error: > 2024-02-20 12:32:12.340 IST [277664] FATAL: database system > identifier differs between the primary and standby > 2024-02-20 12:32:12.340 IST [277664] DETAIL: The primary's identifier > is 7337575856950914038, the standby's identifier is > 7337575783125171076. > 2024-02-20 12:32:12.341 IST [277491] LOG: waiting for WAL to become > available at 0/3000F10 > > To fix this I am preventing pg_createsubscriber from running if the standby > node is primary to any other server. > Made the change in the v23-0012 patch > > > IIRC we already discussed the cascading replication scenario. Of course, > breaking a node is not good; that's why you proposed v23-0012. However, > preventing pg_createsubscriber from running if there are standbys attached to it is > also annoying. If you don't have access to these hosts you need to (a) kill the > walsender (very fragile / unstable), (b) start with max_wal_senders = 0, or (c) > add a firewall rule to prevent these hosts from establishing a connection > to the target server. I wouldn't like to include the patch as-is. IMO we need > at least one message explaining the situation to the user; I mean, add a hint > message. > Yeah, it could be a bit tricky for users to ensure that no follow-on standby is present, but I think it is still better to give an error and prohibit running pg_createsubscriber than to break the existing replication.
The possible solution, in this case, is to allow running pg_basebackup via this tool or otherwise, and then let the user use it to convert to a subscriber. It would be good to keep things simple for the first version; then we can add options like --force in subsequent versions. -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote: > I found some issues and fixed those issues with top-up patches > v23-0012 and v23-0013 > 1. > Suppose there is a cascade physical replication node1->node2->node3. > Now if we run pg_createsubscriber with node1 as primary and node2 as > standby, pg_createsubscriber will be successful but the connection > between node2 and node3 will not be retained and the log of node3 will > give an error: > 2024-02-20 12:32:12.340 IST [277664] FATAL: database system > identifier differs between the primary and standby > 2024-02-20 12:32:12.340 IST [277664] DETAIL: The primary's identifier > is 7337575856950914038, the standby's identifier is > 7337575783125171076. > 2024-02-20 12:32:12.341 IST [277491] LOG: waiting for WAL to become > available at 0/3000F10 > > To fix this I am preventing pg_createsubscriber from running if the standby > node is primary to any other server. > Made the change in the v23-0012 patch IIRC we already discussed the cascading replication scenario. Of course, breaking a node is not good; that's why you proposed v23-0012. However, preventing pg_createsubscriber from running if there are standbys attached to it is also annoying. If you don't have access to these hosts, you need to (a) kill the walsender (very fragile / unstable), (b) start with max_wal_senders = 0, or (c) add a firewall rule to prevent these hosts from establishing a connection to the target server. I wouldn't like to include the patch as-is. IMO we need at least one message explaining the situation to the user; I mean, add a hint message. I'm resistant to a new option, but a --force option is probably the answer. There is no test coverage for it. I adjusted this patch (didn't include the --force option) and added a test case. > 2.
> While checking 'max_replication_slots' in 'check_publisher' function, > we are not considering the temporary slot in the check: > + if (max_repslots - cur_repslots < num_dbs) > + { > + pg_log_error("publisher requires %d replication slots, but > only %d remain", > +num_dbs, max_repslots - cur_repslots); > + pg_log_error_hint("Consider increasing max_replication_slots > to at least %d.", > + cur_repslots + num_dbs); > + return false; > + } > Fixed this in v23-0013 Good catch! Both are included in the next patch. -- Euler Taveira EDB https://www.enterprisedb.com/
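The corrected capacity check boils down to one comparison. A hedged C sketch of the idea (hypothetical helper name, not the patch's actual code); the point of the fix is that the count of slots already in use must include temporary replication slots, since they also occupy max_replication_slots entries while they exist:

```c
#include <assert.h>

/*
 * Does the publisher have room for one new replication slot per
 * specified database? cur_repslots must count temporary slots too.
 */
static int
publisher_has_enough_slots(int max_repslots, int cur_repslots, int num_dbs)
{
    return max_repslots - cur_repslots >= num_dbs;
}
```

If a temporary slot is missed, cur_repslots is undercounted and the check can pass even though slot creation would later fail.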
Re: speed up a logical replica setup
On Thu, Feb 22, 2024, at 9:43 AM, Hayato Kuroda (Fujitsu) wrote: > > The possible solution would be > > 1) allow to run pg_createsubscriber if the standby is initially stopped. > > I observed that pg_logical_createsubscriber also uses this approach. > > 2) read GUCs via the SHOW command and restore them when the server restarts 3. add a config-file option. That's similar to what pg_rewind does. I expect that Debian-based installations will have this issue. > I also prefer the first solution. > Another reason why the standby should be stopped is for backup purposes. > Basically, the standby instance should be saved before running > pg_createsubscriber. > The easiest way is a hard copy, and the postmaster should be stopped at that > time. > I felt it is better if users can run the command immediately after the > copying. > Thoughts? It was not a good idea if you want to keep the postgresql.conf outside PGDATA. I mean, you need extra steps that can be error-prone (different settings between files). Shlok, I didn't read your previous email carefully. :-/ -- Euler Taveira EDB https://www.enterprisedb.com/
Re: speed up a logical replica setup
Hello, On 2024-Feb-22, Hayato Kuroda (Fujitsu) wrote: > Dear Alvaro, > > Hmm, but doesn't this mean that the server will log an ugly message > > that "client closed connection unexpectedly"? I think it's nicer to > > close the connection before terminating the process (especially > > since the code for that is already written). > > OK. So we should disconnect properly even if the process exits. I > added the function call again. Note that PQclear() was not added > because it is only related to the application. Sounds about right, but I didn't verify the patches in detail. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Hay quien adquiere la mala costumbre de ser infeliz" ("Some people acquire the bad habit of being unhappy") (M. A. Evans)
Re: speed up a logical replica setup
On Thu, 22 Feb 2024 at 21:17, Hayato Kuroda (Fujitsu) wrote:
>
> Dear Vignesh,
>
> > Few comments on the tests:
> > 1) If the dry run was not successful because of some issue then the server
> > will be stopped, so we can check with "pg_ctl status" if the server is
> > running; otherwise the connection will fail in this case. Another way
> > would be to check that it does not have "postmaster was stopped"
> > messages in the stdout.
> > +
> > +# Check if node S is still a standby
> > +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> > + 't', 'standby is in recovery');
>
> Just to confirm - your suggestion is to add `pg_ctl status`, right? Added.

Yes, that is correct.

> > 2) Can we add verification of "postmaster was stopped" messages in
> > the stdout for the dry-run test case without --databases?
> > +# pg_createsubscriber can run without --databases option
> > +command_ok(
> > + [
> > + 'pg_createsubscriber', '--verbose',
> > + '--dry-run', '--pgdata',
> > + $node_s->data_dir, '--publisher-server',
> > + $node_p->connstr('pg1'), '--subscriber-server',
> > + $node_s->connstr('pg1')
> > + ],
> > + 'run pg_createsubscriber without --databases');
> > +
>
> Hmm, in the case of --dry-run, the server would never be shut down.
> See the part below.
>
> ```
> if (!dry_run)
>     stop_standby_server(pg_ctl_path, opt.subscriber_dir);
> ```

One way to differentiate whether the server was run in dry-run mode or not is to check whether the server was stopped. So I mean we can check that the stdout does not contain a "postmaster was stopped" message. Can we add validation based on this code:

+ if (action)
+ pg_log_info("postmaster was started");

Or another way is to check pg_ctl status to see that the server is not shut down.
> > 3) This message "target server must be running" seems to be wrong;
> > should it be "cannot specify cascading replicating standby as standby
> > node"? (this is for the v22-0011 patch):
> > + 'pg_createsubscriber', '--verbose', '--pgdata', $node_c->data_dir,
> > + '--publisher-server', $node_s->connstr('postgres'),
> > + '--port', $node_c->port, '--socketdir', $node_c->host,
> > + '--database', 'postgres'
> > ],
> > - 'primary server is in recovery');
> > + 1,
> > + [qr//],
> > + [qr/primary server cannot be in recovery/],
> > + 'target server must be running');
>
> Fixed.
>
> > 4) Should this be "Wait for subscriber to catch up"?
> > +# Wait subscriber to catch up
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> > +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);
>
> Fixed.
>
> > 5) Should this be 'Subscriptions has been created on all the specified
> > databases'?
> > +);
> > +is($result, qq(2),
> > + 'Subscriptions has been created to all the specified databases'
> > +);
>
> Fixed, but "has" should be "have".
>
> > 6) Add a test to verify that current_user is not a member of
> > ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, and has no
> > permissions to execute replication origin advance functions
> >
> > 7) Add tests to verify insufficient max_logical_replication_workers,
> > max_replication_slots and max_worker_processes for the subscription
> > node
> >
> > 8) Add tests to verify invalid configuration of wal_level,
> > max_replication_slots and max_wal_senders for the publisher node
>
> Hmm, adding these checks may increase the test time. We should
> measure the time and then decide.

We can check; if it does not take significantly more time, then we can have these tests.

Regards, Vignesh
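The dry-run gating discussed above can be sketched with demo types and names (not the tool's actual code, which calls stop_standby_server(pg_ctl_path, opt.subscriber_dir)). Because the standby is stopped only outside --dry-run mode, a test can tell the two modes apart afterwards by checking whether the server is still running (pg_ctl status) or by the absence of a "postmaster was stopped" message:

```c
#include <stdbool.h>

struct demo_options
{
    bool dry_run;
};

/* Stand-in for the real server state. */
static bool standby_running = true;

static void
stop_standby_server_demo(void)
{
    standby_running = false;
}

/* In --dry-run mode the standby is deliberately left running. */
static void
finish_conversion(const struct demo_options *opt)
{
    if (!opt->dry_run)
        stop_standby_server_demo();
}
```

This is why checking pg_ctl status after a dry run is a reliable way to assert that the run really was a dry run.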
RE: speed up a logical replica setup
Dear Vignesh,

> Few comments on the tests:
> 1) If the dry run was not successful because of some issue then the server
> will be stopped, so we can check with "pg_ctl status" if the server is
> running; otherwise the connection will fail in this case. Another way
> would be to check that it does not have "postmaster was stopped"
> messages in the stdout.
> +
> +# Check if node S is still a standby
> +is($node_s->safe_psql('postgres', 'SELECT pg_catalog.pg_is_in_recovery()'),
> + 't', 'standby is in recovery');

Just to confirm - your suggestion is to add `pg_ctl status`, right? Added.

> 2) Can we add verification of "postmaster was stopped" messages in
> the stdout for the dry-run test case without --databases?
> +# pg_createsubscriber can run without --databases option
> +command_ok(
> + [
> + 'pg_createsubscriber', '--verbose',
> + '--dry-run', '--pgdata',
> + $node_s->data_dir, '--publisher-server',
> + $node_p->connstr('pg1'), '--subscriber-server',
> + $node_s->connstr('pg1')
> + ],
> + 'run pg_createsubscriber without --databases');
> +

Hmm, in the case of --dry-run, the server would never be shut down. See the part below.

```
if (!dry_run)
    stop_standby_server(pg_ctl_path, opt.subscriber_dir);
```

> 3) This message "target server must be running" seems to be wrong;
> should it be "cannot specify cascading replicating standby as standby
> node"? (this is for the v22-0011 patch):
> + 'pg_createsubscriber', '--verbose', '--pgdata', $node_c->data_dir,
> + '--publisher-server', $node_s->connstr('postgres'),
> + '--port', $node_c->port, '--socketdir', $node_c->host,
> + '--database', 'postgres'
> ],
> - 'primary server is in recovery');
> + 1,
> + [qr//],
> + [qr/primary server cannot be in recovery/],
> + 'target server must be running');

Fixed.

> 4) Should this be "Wait for subscriber to catch up"?
> +# Wait subscriber to catch up
> +$node_s->wait_for_subscription_sync($node_p, $subnames[0]);
> +$node_s->wait_for_subscription_sync($node_p, $subnames[1]);

Fixed.
> 5) Should this be 'Subscriptions has been created on all the specified
> databases'?
> +);
> +is($result, qq(2),
> + 'Subscriptions has been created to all the specified databases'
> +);

Fixed, but "has" should be "have".

> 6) Add a test to verify that current_user is not a member of
> ROLE_PG_CREATE_SUBSCRIPTION, has no create permissions, and has no
> permissions to execute replication origin advance functions
>
> 7) Add tests to verify insufficient max_logical_replication_workers,
> max_replication_slots and max_worker_processes for the subscription
> node
>
> 8) Add tests to verify invalid configuration of wal_level,
> max_replication_slots and max_wal_senders for the publisher node

Hmm, adding these checks may increase the test time. We should measure the time and then decide.

> 9) We can use the same node name in the comment and for the variable
> +# Set up node P as primary
> +$node_p = PostgreSQL::Test::Cluster->new('node_p');
> +$node_p->init(allows_streaming => 'logical');
> +$node_p->start;

Fixed.

> 10) Similarly we can use node_f instead of F in the comments.
> +# Set up node F as about-to-fail node
> +# Force it to initialize a new cluster instead of copying a
> +# previously initdb'd cluster.
> +{
> + local $ENV{'INITDB_TEMPLATE'} = undef;
> +
> + $node_f = PostgreSQL::Test::Cluster->new('node_f');
> + $node_f->init(allows_streaming => 'logical');
> + $node_f->start;

Fixed. Also, a recent commit [1] allows running initdb forcibly, so I followed that. The new patch is available in [2].

[1]: https://github.com/postgres/postgres/commit/ff9e1e764fcce9a34467d614611a34d4d2a91b50
[2]: https://www.postgresql.org/message-id/TYCPR01MB12077CD76B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
RE: speed up a logical replica setup
Dear Vignesh,

> Few comments:
> 1) The below code can lead to an assertion failure if the publisher is
> stopped while dropping the replication slot:
> + if (primary_slot_name != NULL)
> + {
> + conn = connect_database(dbinfo[0].pubconninfo);
> + if (conn != NULL)
> + {
> + drop_replication_slot(conn, [0], primary_slot_name);
> + }
> + else
> + {
> + pg_log_warning("could not drop replication slot \"%s\" on primary",
> + primary_slot_name);
> + pg_log_warning_hint("Drop this replication slot soon to avoid retention of WAL files.");
> + }
> + disconnect_database(conn);
> + }
>
> pg_createsubscriber: error: connection to database failed: connection
> to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or directory
> Is the server running locally and accepting connections on that socket?
> pg_createsubscriber: warning: could not drop replication slot "standby_1" on primary
> pg_createsubscriber: hint: Drop this replication slot soon to avoid retention of WAL files.
> pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database: Assertion `conn != ((void *)0)' failed.
> Aborted (core dumped)
>
> This is happening because we are calling disconnect_database in the
> connection-failure case too, which has the following assert:
> +static void
> +disconnect_database(PGconn *conn)
> +{
> + Assert(conn != NULL);
> +
> + PQfinish(conn);
> +}

Right. disconnect_database() was moved into the if (conn != NULL) block.
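The fix can be sketched as follows, with a demo connection type instead of libpq's PGconn and hypothetical names: keep the assertion inside disconnect_database() and make the caller skip the call when the connection attempt failed.

```c
#include <assert.h>
#include <stddef.h>

/* Demo stand-in for PGconn. */
struct demo_conn
{
    int open;
};

/* Demo stand-in for PQfinish(). */
static void
demo_finish(struct demo_conn *conn)
{
    conn->open = 0;
}

/* Same contract as the real function: the caller must pass a connection. */
static void
disconnect_database_demo(struct demo_conn *conn)
{
    assert(conn != NULL);
    demo_finish(conn);
}

/*
 * Caller-side shape after the fix: the disconnect happens only when
 * connect_database() actually returned a connection. Returns 1 if the
 * slot-drop path ran, 0 if the connection had failed.
 */
static int
drop_slot_on_primary(struct demo_conn *conn)
{
    if (conn != NULL)
    {
        /* ... drop_replication_slot(conn, ...) would go here ... */
        disconnect_database_demo(conn);
        return 1;
    }
    /* connection failed: warn and skip the disconnect */
    return 0;
}
```

With the original code, passing a failed (NULL) connection straight to disconnect_database() is exactly what tripped the `Assert(conn != NULL)` in the report above.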
> 2) There is a CheckDataVersion function which does exactly this; will
> we be able to use it?
> + /* Check standby server version */
> + if ((ver_fd = fopen(versionfile, "r")) == NULL)
> + pg_fatal("could not open file \"%s\" for reading: %m", versionfile);
> +
> + /* Version number has to be the first line read */
> + if (!fgets(rawline, sizeof(rawline), ver_fd))
> + {
> + if (!ferror(ver_fd))
> + pg_fatal("unexpected empty file \"%s\"", versionfile);
> + else
> + pg_fatal("could not read file \"%s\": %m", versionfile);
> + }
> +
> + /* Strip trailing newline and carriage return */
> + (void) pg_strip_crlf(rawline);
> +
> + if (strcmp(rawline, PG_MAJORVERSION) != 0)
> + {
> + pg_log_error("standby server is of wrong version");
> + pg_log_error_detail("File \"%s\" contains \"%s\", which is not compatible with this program's version \"%s\".",
> + versionfile, rawline, PG_MAJORVERSION);
> + exit(1);
> + }
> +
> + fclose(ver_fd);

> 3) Should this be added to typedefs.list:
> +enum WaitPMResult
> +{
> + POSTMASTER_READY,
> + POSTMASTER_STILL_STARTING
> +};

But the comment from Peter E. [1] was the opposite, so I did not handle this.

> 4) pg_createsubscriber should be mentioned after pg_controldata to keep
> the ordering consistent:
> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
> index aa94f6adf6..c5edd244ef 100644
> --- a/doc/src/sgml/reference.sgml
> +++ b/doc/src/sgml/reference.sgml
> @@ -285,6 +285,7 @@
> +

This has already been pointed out by Peter E., so I did not handle this.

> 5) Here pg_replication_slots should be pg_catalog.pg_replication_slots:
> + if (primary_slot_name)
> + {
> + appendPQExpBuffer(str,
> + "SELECT 1 FROM pg_replication_slots "
> + "WHERE active AND slot_name = '%s'",
> + primary_slot_name);

Fixed.
> 6) Here pg_settings should be pg_catalog.pg_settings:
> +* - max_worker_processes >= 1 + number of dbs to be converted
> +*/
> + res = PQexec(conn,
> + "SELECT setting FROM pg_settings WHERE name IN ("
> + "'max_logical_replication_workers', "
> + "'max_replication_slots', "
> + "'max_worker_processes', "
> + "'primary_slot_name') "
> + "ORDER BY name");

Fixed.

A new version is available in [2].

[1]: https://www.postgresql.org/message-id/3ee79f2c-e8b3-4342-857c-a31b87e1afda%40eisentraut.org
[2]: https://www.postgresql.org/message-id/TYCPR01MB12077CD76B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
RE: speed up a logical replica setup
Dear Vignesh, > Few comments regarding the documentation: > 1) The max_replication_slots information seems to be present a couple of times: > + The target instance must have max_replication_slots and max_logical_replication_workers configured to a value greater than or equal to the number of target databases. > + The target instance must have max_replication_slots configured to a value greater than or equal to the number of target databases and replication slots. Fixed. > 2) Can we add an id to prerequisites and use it instead of referring to r1-app-pg_createsubscriber-1: > - pg_createsubscriber checks if the given target data directory has the same system identifier than the source data directory. > - Since it uses the recovery process as one of the steps, it starts the target server as a replica from the source server. If the system identifier is not the same, pg_createsubscriber will terminate with an error. > + Checks the target can be converted. In particular, things listed in the above section would be checked. If these are not met, pg_createsubscriber will terminate with an error. Changed. > 3) The code also checks the following: Verify if a PostgreSQL binary (progname) is available in the same directory as pg_createsubscriber. But this is not present in the prerequisites of the documentation. I think it is quite trivial, so I did not add it. > 4) Here we mention that the target server should be stopped, but the same is not mentioned in the prerequisites: > + Here is an example of using pg_createsubscriber. > + Before running the command, please make sure the target server is stopped. > + $ pg_ctl -D /usr/local/pgsql/data stop Oh, it is the opposite: it should NOT be stopped. Fixed.
> 5) If there is an error during any of the pg_createsubscriber operations, like if create subscription fails, it might not be possible to roll back to the earlier state which had physical-standby replication. I felt we should document this and also add it to the console message, like how we do in the case of pg_upgrade. Added. A new version is available in [1]. [1]: https://www.postgresql.org/message-id/TYCPR01MB12077CD76B53F9CAE7AC0F5562%40TYCPR01MB12077.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/