Dear Sawada-san,

> I thought we could fix this issue by checking the number of in-use
> logical slots while holding ReplicationSlotControlLock and
> LogicalDecodingControlLock, but it seems we need to deal with another
> race condition too between backends and startup processes at the end
> of recovery.
>
> Currently the backend skips controlling logical decoding status if the
> server is in recovery (by checking RecoveryInProgress()), but it's
> possible that a backend process tries to drop a logical slot after the
> startup process calling UpdateLogicalDecodingStatusEndOfRecovery() and
> before accepting writes.
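
To make the window concrete, here is a minimal standalone sketch of the interleaving described above. It is not PostgreSQL code: SimState and every sim_* function are hypothetical stand-ins that only model the decision logic (no locks, no shared memory, no barriers), under the assumption that the backend leaves the status untouched whenever recovery is still reported as in progress.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant state; not PostgreSQL's real structures. */
typedef struct SimState
{
    bool recovery_in_progress;      /* what RecoveryInProgress() would report */
    bool logical_decoding_enabled;  /* the logical decoding status */
    int  n_logical_slots;           /* number of in-use logical slots */
} SimState;

/* Models the startup process deciding the status at the end of recovery. */
static void
sim_update_status_end_of_recovery(SimState *s)
{
    s->logical_decoding_enabled = (s->n_logical_slots > 0);
}

/* Models a backend dropping one logical slot. */
static void
sim_drop_logical_slot(SimState *s)
{
    s->n_logical_slots--;

    /* Assumed behaviour: the backend does not touch the status in recovery. */
    if (s->recovery_in_progress)
        return;

    if (s->n_logical_slots == 0)
        s->logical_decoding_enabled = false;
}

/*
 * Replays the problematic ordering. Returns true if logical decoding is
 * left enabled although no logical slot remains.
 */
static bool
sim_run_race(void)
{
    SimState s = {
        .recovery_in_progress = true,
        .logical_decoding_enabled = true,
        .n_logical_slots = 1
    };

    sim_update_status_end_of_recovery(&s);  /* startup: one slot, stays enabled */
    sim_drop_logical_slot(&s);              /* backend: still in recovery, skips */
    s.recovery_in_progress = false;         /* recovery ends, writes accepted */

    return s.logical_decoding_enabled && s.n_logical_slots == 0;
}
```

In this model, sim_run_race() reports that the status stays enabled with zero slots, which is the state described in the quoted text. Swapping the order of the two sim_* calls in sim_run_race() makes the startup process see zero slots and disable decoding itself.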

Right. I also verified this locally and found that
ReplicationSlotDropAcquired()->DisableLogicalDecodingIfNecessary() sometimes
skips modifying the status because RecoveryInProgress() still returns true.

> In this case, the backend ends up not
> disabling logical decoding and it remains enabled. I think we would
> somehow need to delay the logical decoding status change in this
> period until the recovery completes.

My first, naive idea was to 1) have the startup process keep holding the lock
until the end of recovery and 2) have DisableLogicalDecodingIfNecessary()
acquire the lock before checking the recovery status, but it did not work
well. I am not sure why, but WaitForProcSignalBarrier() got stuck when the
process was holding LogicalDecodingControlLock.

Best regards,
Hayato Kuroda
FUJITSU LIMITED