On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.ma...@gmail.com> wrote: > > Please find v2. Changes are:
Thanks for the patch. Here are some comments. 1. Can we have a clear saying in the shmem variable who's syncing at the moment? Is it a slot sync worker or a backend via SQL function? Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;" typedef enum SlotSyncSource { SLOT_SYNC_NONE, SLOT_SYNC_WORKER, SLOT_SYNC_BACKEND, } SlotSyncSource; Then, the check in ShutDownSlotSync can be: + /* + * Return if neither the slot sync worker is running nor the function + * pg_sync_replication_slots() is executing. + */ + if ((SlotSyncCtx->pid == InvalidPid) && SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND) { 2. SyncReplicationSlots(WalReceiverConn *wrconn) { + /* + * Startup process signaled the slot sync to stop, so if meanwhile user + * has invoked slot sync SQL function, simply return. + */ + SpinLockAcquire(&SlotSyncCtx->mutex); + if (SlotSyncCtx->stopSignaled) + { + ereport(LOG, + errmsg("skipping slot synchronization as slot sync shutdown is signaled during promotion")); + Unless I'm missing something, I think this can't detect if the backend via SQL function is already half-way through syncing in synchronize_one_slot. So, better move this check to (or also have it there) slot sync loop that calls synchronize_one_slot. To avoid spinlock acquisitions, we can perhaps do this check in when we acquire the spinlock for synced flag. /* Search for the named slot */ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) { bool synced; SpinLockAcquire(&slot->mutex); synced = slot->data.synced; << get SlotSyncCtx->stopSignaled here >> SpinLockRelease(&slot->mutex); << do the slot sync skip check here if (stopSignaled) >> 3. Can we have a test or steps at least to check the consequences manually one can get if slot syncing via SQL function is happening during the promotion? IIUC, we need to ensure there is no backend acquiring it and performing sync while the slot sync worker is shutting down/standby promotion is occuring. Otherwise, some of the slots can get resynced and some are not while we are shutting down the slot sync worker as part of the standby promotion which might leave the slots in an inconsistent state. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com