On Fri, Apr 12, 2024 at 4:18 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Fri, Apr 12, 2024 at 7:57 AM shveta malik <shveta.ma...@gmail.com> wrote: > > > > On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > > > On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.ma...@gmail.com> > > > wrote: > > > > > > > > Please find v2. Changes are: > > > > > > Thanks for the patch. Here are some comments. > > > > Thanks for reviewing. > > > > > > 1. Can we have a clear saying in the shmem variable who's syncing at > > > the moment? Is it a slot sync worker or a backend via SQL function? > > > Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;" > > > > > > typedef enum SlotSyncSource > > > { > > > SLOT_SYNC_NONE, > > > SLOT_SYNC_WORKER, > > > SLOT_SYNC_BACKEND, > > > } SlotSyncSource; > > > > > > Then, the check in ShutDownSlotSync can be: > > > > > > + /* > > > + * Return if neither the slot sync worker is running nor the > > > function > > > + * pg_sync_replication_slots() is executing. > > > + */ > > > + if ((SlotSyncCtx->pid == InvalidPid) && > > > SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND) > > > { > > > > > I don't know if this will be help, especially after fixing the race > condition I mentioned. But otherwise, also, at this stage it doesn't > seem helpful to add the source of sync explicitly. >
Agreed. Please find v3 addressing race-condition and one other comment. Up-thread it was suggested that, probably, checking SlotSyncCtx->syncing should be sufficient in ShutDownSlotSync(). On re-thinking, it might not be. Slot sync worker sets and resets 'syncing' with each sync-cycle, and thus we need to rely on worker's pid in ShutDownSlotSync(), as there could be a window where promotion is triggered and 'syncing' is not set for worker, while the worker is still running. This implementation of setting and resetting syncing with each sync-cycle looks better as compared to setting syncing during the entire life-cycle of the worker. So, I did not change it. To fix the race condition, I moved the setting of the 'syncing' flag together with the 'stopSignaled' check under the same spinLock for the SQL function. OTOH, for worker, I feel it is good to check 'stopSignaled' at the beginning itself, while retaining the setting/resetting of 'syncing' at a later stage during the actual sync cycle. This makes handling for SQL function and worker slightly different. And thus to achieve this, I had to take the 'syncing' flag handling out of synchronize_slots() and move it to both worker and SQL function by introducing 2 new functions check_and_set_syncing_flag() and reset_syncing_flag(). I am analyzing if there are better ways to achieve this, any suggestions are welcome. thanks Shveta
v3-0001-Handle-stopSignaled-during-sync-function-call.patch
Description: Binary data