On Tue, Feb 20, 2024 at 6:19 PM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > Thank you for the explanation. It makes sense to me to move the check. > > As for ValidateSlotSyncParams() called by SlotSyncWorkerAllowed(), I > have two comments: > > 1. The error messages are not very descriptive and seem not to match > other messages the postmaster says. When starting the standby server > with misconfiguration about the slotsync, I got the following messages > from the postmaster: > > 2024-02-20 17:01:16.356 JST [456741] LOG: database system is ready to > accept read-only connections > 2024-02-20 17:01:16.358 JST [456741] LOG: bad configuration for slot > synchronization > 2024-02-20 17:01:16.358 JST [456741] HINT: "hot_standby_feedback" > must be enabled. > > It says "bad configuration" but is still working, and does not say > further information such as whether it skipped to start the slotsync > worker etc. I think these messages could work for the slotsync worker > but we might want to have more descriptive messages for the > postmaster. For example, "skipped starting slot sync worker because > hot_standby_feedback is disabled". >
We are planning to change it to something like:"slot synchronization requires hot_standby_feedback to be enabled". See [1] > 2. If the wal_level is not logical, the server will need to restart > anyway to change the wal_level and have the slotsync worker work. Does > it make sense to have the postmaster exit if the wal_level is not > logical and sync_replication_slots is enabled? For instance, we have > similar checks in PostmsaterMain(): > > if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL) > ereport(ERROR, > (errmsg("WAL cannot be summarized when wal_level is > \"minimal\""))); > +1. I think giving an error in this case makes sense. Miscellaneous comments: ======================== 1. +void +ShutDownSlotSync(void) +{ + SpinLockAcquire(&SlotSyncCtx->mutex); + + SlotSyncCtx->stopSignaled = true; This flag is never reset back. I think we should reset this once the promotion is complete. Though offhand, I don't see any problem with this but it doesn't look clean and can be a source of bugs in the future. 2. +char * +CheckDbnameInConninfo(void) { char *dbname; Let's name this function as CheckAndGetDbnameFromConninfo(). Apart from the above, I have made cosmetic changes in the attached. [1] - https://www.postgresql.org/message-id/CAJpy0uBWomyAjP0zyFdzhGxn%2BXsAb2OdJA%2BKfNyZRv2nV6PD9g%40mail.gmail.com -- With Regards, Amit Kapila.
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c index d139d53173..c133fed6c2 100644 --- a/src/backend/replication/logical/slotsync.c +++ b/src/backend/replication/logical/slotsync.c @@ -1389,7 +1389,7 @@ ShutDownSlotSync(void) /* * SlotSyncWorkerCanRestart * - * Returns true if enough time has passed (SLOTSYNC_RESTART_INTERVAL_SEC) + * Returns true if enough time (SLOTSYNC_RESTART_INTERVAL_SEC) has passed * since it was launched last. Otherwise returns false. * * This is a safety valve to protect against continuous respawn attempts if the diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index a2269c46f7..7e9bdf9e33 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1242,14 +1242,13 @@ restart: * happening here. The persistent synced slots are thus safe but there * is a possibility that the slot sync worker has created a temporary * slot (which stays active even on release) and we are trying to drop - * the same here. In practice, the chances of hitting this scenario is - * very less as during slot synchronization, the temporary slot is - * immediately converted to persistent and thus is safe due to the - * shared lock taken on the database. So for the time being, we'll - * just bail out in such a scenario. + * the here. In practice, the chances of hitting this scenario are less + * as during slot synchronization, the temporary slot is immediately + * converted to persistent and thus is safe due to the shared lock + * taken on the database. So, we'll just bail out in such a case. * - * XXX: If needed, we can consider shutting down slot sync worker - * before trying to drop synced temporary slots here. + * XXX: We can consider shutting down the slot sync worker before + * trying to drop synced temporary slots here. */ if (active_pid) ereport(ERROR,