On Sat, Jul 2, 2022 at 11:18 AM Andres Freund <and...@anarazel.de> wrote:
> On 2022-07-01 13:14:23 -0700, Andres Freund wrote:
> > - the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
> >   completely swamp the target(s) on a busy system. This surely is
> >   exacerbated by the usleep I added to RecoveryConflictInterrupt(), but a
> >   5ms signalling pace does seem like a bad idea.
>
> This one is also implicated in
> https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
> and related issues.
>
> Besides being very short, it also seems like a bad idea to wait when we might
> not need to? Seems we should only wait if we subsequently couldn't get the
> lock?
>
> Misleadingly, WaitExceedsMaxStandbyDelay() also contains a usleep, which at
> least I wouldn't expect given its name.
>
> A minimal fix would be to increase the wait time, similar to how it is done
> with standbyWait_us?
>
> Medium term, it seems we ought to set the startup process's latch when
> handling a conflict, and use a latch wait. But avoiding races probably isn't
> quite trivial.
Yeah, I had the same thought; it's easy to criticise the current
collateral-damage-maximising design, but it's a whole project to come up with
a good race-free, precise design. We should do that, though.

> I guess the reason we first get an ERROR and then a FATAL is that the second
> iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit,
> because we end up there after handling the first error? And that's a FATAL.
>
> I suspect that Thomas' fix will address at least part of this, as the check
> whether we're still waiting for a lock will be made just before the error is
> thrown.

That seems right.

> reporting a FATAL error in process of reporting a FATAL error.

Yeah.

> > I assume this could lead to sending out the same message quite a few
> > times.
>
> This seems like it needs to be fixed in elog.c. ISTM that at the very least
> we ought to HOLD_INTERRUPTS() before the EmitErrorReport() for FATAL.

That seems to make sense.

About my patch... even though it solves a couple of the problems now
identified, I found an architectural problem that I don't have a solution for
yet, which stopped me in my tracks a few weeks back. I need to find a way
forward that is back-patchable.

Recap: The basic concept here is to kick all "real work" out of signal
handlers, because that work is unsafe in that context. So instead of deciding
in the handler whether we need to cancel the current query at the next CFI by
setting QueryCancelPending, we defer the whole decision to the next CFI.
Sometimes the decision is that we don't need to do anything, and the CFI
returns and execution continues normally.

The problem is that there are a couple of parts of our tree that don't use a
standard CFI, but are instead interrupted by looking at QueryCancelPending
directly. syncrep.c is one, but I don't believe you could be blocked there
while recovery is in progress, and regcomp.c is another.
(There was a third case relating to the posix_fallocate() problem report you
mentioned above, but 4518c798 removed that.) The regular expression machinery
is capable of consuming a lot of CPU, and checks CANCEL_REQUESTED(nfa->v->re)
frequently to avoid getting stuck. With the patch as it stands, that would
never be true.