On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot...@gmail.com> wrote: > On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote: > > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand > <bertranddrouvot...@gmail.com> wrote: > >> Yeah good point, agree to just error out in all the case then (if we > >> discard the sync_ reserved wording proposal, which seems to be the > >> case as probably not worth the extra work). > > > > Thanks for the discussion! > > > > Here is the V33 patch set which includes the following changes: > > Thanks for working on it! > > > > > 1) Drop slots with state 'i' in promotion flow after we shut down > > WalReceiver. > > @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr > RecPtr, bool randAccess, > * this only after failure, so when you promote, we still > * finish replaying as much as we can from archive and > * pg_wal before failover. > + * > + * Drop the slots for which sync is initiated but not yet > + * completed i.e. they are still waiting for the primary > + * server to catch up. > */ > if (StandbyMode && CheckForStandbyTrigger()) > { > XLogShutdownWalRcv(); > + slotsync_drop_initiated_slots(); > return XLREAD_FAIL; > } > > I had a closer look and it seems this is not located at the right place. > > Indeed, it's added here: > > switch (currentSource) > { > case XLOG_FROM_ARCHIVE: > case XLOG_FROM_PG_WAL: > > While in our case we are in > > case XLOG_FROM_STREAM: > > So I think we should move slotsync_drop_initiated_slots() in the > XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker? > (the TODO item number 2 you mentioned up-thread)
Thanks for the comment. I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown slotsync worker and drop slots. There could be other reasons(other than promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code there. I thought if the intention is to stop slotsync workers on promotion, maybe FinishWalRecovery() is a better place to do it as it's indicating the end of recovery and XLogShutdownWalRcv is also called in it. And I feel we'd better drop the slots after shutting down the slotsync workers, because otherwise the slotsync workers could create the dropped slot again in rare cases. Best Regards, Hou zj