Re: Synchronizing slots from primary to standby

Amit Kapila Tue, 20 Feb 2024 05:26:15 -0800

On Tue, Feb 20, 2024 at 6:19 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> Thank you for the explanation. It makes sense to me to move the check.
>
> As for ValidateSlotSyncParams() called by SlotSyncWorkerAllowed(), I
> have two comments:
>
> 1. The error messages are not very descriptive and seem not to match
> other messages the postmaster says. When starting the standby server
> with misconfiguration about the slotsync, I got the following messages
> from the postmaster:
>
> 2024-02-20 17:01:16.356 JST [456741] LOG:  database system is ready to
> accept read-only connections
> 2024-02-20 17:01:16.358 JST [456741] LOG:  bad configuration for slot
> synchronization
> 2024-02-20 17:01:16.358 JST [456741] HINT:  "hot_standby_feedback"
> must be enabled.
>
> It says "bad configuration" but is still working, and does not say
> further information such as whether it skipped to start the slotsync
> worker etc. I think these messages could work for the slotsync worker
> but we might want to have more descriptive messages for the
> postmaster. For example, "skipped starting slot sync worker because
> hot_standby_feedback is disabled".
>


We are planning to change it to something like:"slot synchronization
requires hot_standby_feedback to be enabled". See [1]

> 2. If the wal_level is not logical, the server will need to restart
> anyway to change the wal_level and have the slotsync worker work. Does
> it make sense to have the postmaster exit if the wal_level is not
> logical and sync_replication_slots is enabled? For instance, we have
> similar checks in PostmsaterMain():
>
>     if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL)
>         ereport(ERROR,
>                 (errmsg("WAL cannot be summarized when wal_level is
> \"minimal\"")));
>

+1. I think giving an error in this case makes sense.

Miscellaneous comments:
========================
1.
+void
+ShutDownSlotSync(void)
+{
+ SpinLockAcquire(&SlotSyncCtx->mutex);
+
+ SlotSyncCtx->stopSignaled = true;

This flag is never reset back. I think we should reset this once the
promotion is complete. Though offhand, I don't see any problem with
this but it doesn't look clean and can be a source of bugs in the
future.

2.
+char *
+CheckDbnameInConninfo(void)
 {
  char    *dbname;

Let's name this function as CheckAndGetDbnameFromConninfo().

Apart from the above, I have made cosmetic changes in the attached.

[1] - 
https://www.postgresql.org/message-id/CAJpy0uBWomyAjP0zyFdzhGxn%2BXsAb2OdJA%2BKfNyZRv2nV6PD9g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.

diff --git a/src/backend/replication/logical/slotsync.c 
b/src/backend/replication/logical/slotsync.c
index d139d53173..c133fed6c2 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -1389,7 +1389,7 @@ ShutDownSlotSync(void)
 /*
  * SlotSyncWorkerCanRestart
  *
- * Returns true if enough time has passed (SLOTSYNC_RESTART_INTERVAL_SEC)
+ * Returns true if enough time (SLOTSYNC_RESTART_INTERVAL_SEC) has passed
  * since it was launched last. Otherwise returns false.
  *
  * This is a safety valve to protect against continuous respawn attempts if the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a2269c46f7..7e9bdf9e33 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1242,14 +1242,13 @@ restart:
                 * happening here. The persistent synced slots are thus safe 
but there
                 * is a possibility that the slot sync worker has created a 
temporary
                 * slot (which stays active even on release) and we are trying 
to drop
-                * the same here. In practice, the chances of hitting this 
scenario is
-                * very less as during slot synchronization, the temporary slot 
is
-                * immediately converted to persistent and thus is safe due to 
the
-                * shared lock taken on the database. So for the time being, 
we'll
-                * just bail out in such a scenario.
+                * the here. In practice, the chances of hitting this scenario 
are less
+                * as during slot synchronization, the temporary slot is 
immediately
+                * converted to persistent and thus is safe due to the shared 
lock
+                * taken on the database. So, we'll just bail out in such a 
case.
                 *
-                * XXX: If needed, we can consider shutting down slot sync 
worker
-                * before trying to drop synced temporary slots here.
+                * XXX: We can consider shutting down the slot sync worker 
before
+                * trying to drop synced temporary slots here.
                 */
                if (active_pid)
                        ereport(ERROR,

Re: Synchronizing slots from primary to standby

Reply via email to