Hi,
>
> I was trying to test this utility when 'sync_replication_slots' is on
> and it gets in an ERROR loop [1] and never finishes. Please find the
> postgresql.auto used on the standby attached. I think if the standby
> has enabled sync_slots, you need to pass dbname in
> GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> there are already synced slots on the standby (either due to
> 'sync_replication_slots' or users have used
> pg_sync_replication_slots() before invoking pg_createsubscriber),
> those would be retained as it is on new subscriber and lead to
> unnecessary WAL retention and dead rows.
>
> [1]
> 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> requires dbname to be specified in primary_conninfo

Hi,

I tested the scenario posted by Amit in [1], in which retaining synced
slots leads to unnecessary WAL retention and an ERROR. This is raised as
the second open point in [2].
The steps to reproduce the issue:
(1) Set up physical replication with the slot sync feature turned on by
setting sync_replication_slots = 'true' or by using
pg_sync_replication_slots() on the standby node.
For the physical replication setup, run pg_basebackup with the -R and -d options.
(2) Create a logical replication slot on the primary node with the failover
option set to true. A corresponding slot is created on the standby as part of
the slot sync feature.
(3) Run pg_createsubscriber on the standby node.
(4) On checking the replication slots on the standby node, I noticed
that the logical slots created in step 2 are retained.
I have attached the script to reproduce the issue.
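
The check in step (4) can be narrowed to only the slots that came from
slot synchronization. Below is a minimal sketch, assuming the 'failover'
and 'synced' columns that pg_replication_slots exposes in PostgreSQL 17,
and the standby port 9000 used in the attached script. The function names
are hypothetical helpers and are not part of the script or the patches.

```shell
# Hypothetical helper, not part of the attached script.
# Prints the query that lists only slots created by slot synchronization.
synced_slots_query() {
    printf '%s\n' "SELECT slot_name, plugin, failover, synced FROM pg_replication_slots WHERE synced;"
}

# Runs the query against the node listening on the given port
# (default 9000, the standby port assumed from the attached script).
# -A and -t make psql print unaligned, tuples-only output.
list_synced_slots() {
    port="${1:-9000}"
    ./psql -d postgres -p "$port" -At -c "$(synced_slots_query)"
}
```

Calling list_synced_slots after step (3) should print one line per
retained slot if the issue is present, and nothing otherwise.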

Kuroda-san and I worked to resolve the open points. Here are patches to
solve the second and third points in [2].
Patches proposed by Euler are also attached just in case, but they
were not modified.

v2-0001: not changed
v2-0002: not changed
v2-0003: ensures the slot sync is disabled during the conversion. This
resolves the second point.
v2-0004: drops synced slots that may be retained after the conversion. This
resolves the second point.
v2-0005: removes misleading output messages in dry-run. This resolves
the third point.
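
For anyone who wants to clean up by hand without v2-0004, the following is
a rough manual equivalent, not the patch code itself. It assumes the
'synced' column of pg_replication_slots in PostgreSQL 17, that the node has
already been promoted (synced slots cannot be dropped manually on a hot
standby), and that the slots are not active. The function name is a
hypothetical helper.

```shell
# Hypothetical helper, not taken from the v2-0004 patch.
# Prints a query that drops every slot still marked as synced.
# Assumption: the node is no longer in recovery and the slots are inactive.
drop_synced_slots_query() {
    printf '%s\n' "SELECT slot_name, pg_drop_replication_slot(slot_name) FROM pg_replication_slots WHERE synced;"
}

# Example (port 9000 is the standby port from the attached script):
#   ./psql -d postgres -p 9000 -c "$(drop_synced_slots_query)"
```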

[1]: https://www.postgresql.org/message-id/CAA4eK1KdCb%2B5sjYu6qCMXXdCX1y_ihr8kFzMozq0%3DP%3DauYxgog%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1J22UEfrqx222h5j9DQ7nxGrTbAa_BC%2B%3DmQXdXs-RCsew%40mail.gmail.com

Thanks and Regards,
Shlok Kyal
# cleanup
rm -rf ../primary ../standby primary.log standby.log

# setup primary node
./initdb -D ../primary
 
cat << EOF >> ../primary/postgresql.conf
wal_level = logical
EOF

./pg_ctl -D ../primary -l primary.log start
./psql -d postgres -c "CREATE TABLE t1 (c1 int);"
./psql -d postgres -c "INSERT INTO t1 VALUES (1);"
./psql -d postgres -c "INSERT INTO t1 VALUES (2);"
./psql -d postgres -c "INSERT INTO t1 VALUES (3);"
./psql -d postgres -c "INSERT INTO t1 VALUES (4);"

# setup standby node with sync slot feature
./pg_basebackup -h localhost -X stream -v -W -R -S 'sb1' -C -D ../standby/ -d 'dbname=postgres' 

cat << EOF >> ../standby/postgresql.conf
sync_replication_slots = 'true'
hot_standby_feedback = 'on'
port = 9000
EOF

./pg_ctl -D ../standby -l standby.log start

# create a replication slot on primary with failover option
./psql -d postgres -c "SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot_2', 'test_decoding', false, false, true);"
sleep 2

# check if slots are synced on standby
./psql -d postgres -p 9000 -c "select * from pg_replication_slots;" 
./pg_ctl -D ../standby -l standby.log stop

# run pg_createsubscriber on standby
./pg_createsubscriber -D ../standby/ -p 9000 -P "host=localhost port=5432 dbname=postgres" -d postgres -v

# check if replication slots are retained after running pg_createsubscriber
./pg_ctl -D ../standby -l standby.log start
./psql -d postgres -p 9000 -c "select * from pg_replication_slots;" 

./pg_ctl -D ../standby -l standby.log stop
./pg_ctl -D ../primary -l primary.log stop

Attachment: v2-0001-Improve-the-code-that-checks-if-the-recovery-is-f.patch
Attachment: v2-0002-Improve-the-code-that-checks-if-the-primary-slot-.patch
Attachment: v2-0003-Disable-slot-sync-during-the-convertion.patch
Attachment: v2-0004-Drop-replication-slots-which-had-been-synchronize.patch
Attachment: v2-0005-Avoid-outputing-some-messages-in-dry_run-mode.patch
