On Tue, Feb 20, 2024 at 6:19 PM Masahiko Sawada <[email protected]> wrote:
>
> Thank you for the explanation. It makes sense to me to move the check.
>
> As for ValidateSlotSyncParams() called by SlotSyncWorkerAllowed(), I
> have two comments:
>
> 1. The error messages are not very descriptive and seem not to match
> other messages the postmaster says. When starting the standby server
> with misconfiguration about the slotsync, I got the following messages
> from the postmaster:
>
> 2024-02-20 17:01:16.356 JST [456741] LOG: database system is ready to
> accept read-only connections
> 2024-02-20 17:01:16.358 JST [456741] LOG: bad configuration for slot
> synchronization
> 2024-02-20 17:01:16.358 JST [456741] HINT: "hot_standby_feedback"
> must be enabled.
>
> It says "bad configuration" but is still working, and does not say
> further information such as whether it skipped to start the slotsync
> worker etc. I think these messages could work for the slotsync worker
> but we might want to have more descriptive messages for the
> postmaster. For example, "skipped starting slot sync worker because
> hot_standby_feedback is disabled".
>
We are planning to change it to something like:"slot synchronization
requires hot_standby_feedback to be enabled". See [1]
> 2. If the wal_level is not logical, the server will need to restart
> anyway to change the wal_level and have the slotsync worker work. Does
> it make sense to have the postmaster exit if the wal_level is not
> logical and sync_replication_slots is enabled? For instance, we have
> similar checks in PostmsaterMain():
>
> if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL)
> ereport(ERROR,
> (errmsg("WAL cannot be summarized when wal_level is
> \"minimal\"")));
>
+1. I think giving an error in this case makes sense.
Miscellaneous comments:
========================
1.
+void
+ShutDownSlotSync(void)
+{
+ SpinLockAcquire(&SlotSyncCtx->mutex);
+
+ SlotSyncCtx->stopSignaled = true;
This flag is never reset back. I think we should reset this once the
promotion is complete. Though offhand, I don't see any problem with
this but it doesn't look clean and can be a source of bugs in the
future.
2.
+char *
+CheckDbnameInConninfo(void)
{
char *dbname;
Let's name this function as CheckAndGetDbnameFromConninfo().
Apart from the above, I have made cosmetic changes in the attached.
[1] -
https://www.postgresql.org/message-id/CAJpy0uBWomyAjP0zyFdzhGxn%2BXsAb2OdJA%2BKfNyZRv2nV6PD9g%40mail.gmail.com
--
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/logical/slotsync.c
b/src/backend/replication/logical/slotsync.c
index d139d53173..c133fed6c2 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -1389,7 +1389,7 @@ ShutDownSlotSync(void)
/*
* SlotSyncWorkerCanRestart
*
- * Returns true if enough time has passed (SLOTSYNC_RESTART_INTERVAL_SEC)
+ * Returns true if enough time (SLOTSYNC_RESTART_INTERVAL_SEC) has passed
* since it was launched last. Otherwise returns false.
*
* This is a safety valve to protect against continuous respawn attempts if the
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index a2269c46f7..7e9bdf9e33 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1242,14 +1242,13 @@ restart:
* happening here. The persistent synced slots are thus safe
but there
* is a possibility that the slot sync worker has created a
temporary
* slot (which stays active even on release) and we are trying
to drop
- * the same here. In practice, the chances of hitting this
scenario is
- * very less as during slot synchronization, the temporary slot
is
- * immediately converted to persistent and thus is safe due to
the
- * shared lock taken on the database. So for the time being,
we'll
- * just bail out in such a scenario.
+ * the here. In practice, the chances of hitting this scenario
are less
+ * as during slot synchronization, the temporary slot is
immediately
+ * converted to persistent and thus is safe due to the shared
lock
+ * taken on the database. So, we'll just bail out in such a
case.
*
- * XXX: If needed, we can consider shutting down slot sync
worker
- * before trying to drop synced temporary slots here.
+ * XXX: We can consider shutting down the slot sync worker
before
+ * trying to drop synced temporary slots here.
*/
if (active_pid)
ereport(ERROR,