Robert Haas <robertmh...@gmail.com> writes: > On Wed, Sep 9, 2020 at 10:07 PM Tom Lane <t...@sss.pgh.pa.us> wrote: >> bgworker_die (SIGTERM) >> Calls ereport(FATAL). This is surely not any safer than, say, >> quickdie(). No, it's worse, because at least that won't try >> to go out via proc_exit().
> I think bgworker_die() is pretty much a terrible idea. Yeah. It'd likely be better to insist that bgworkers handle SIGTERM the same way backends do, via CHECK_FOR_INTERRUPTS. Not sure how big a change that would be. > I think that the only way this could actually > be safe is if you have a background worker that never uses ereport() > itself, so that the ereport() in the signal handler can't be > interrupting one that's already happening. That doesn't begin to make it safe, because quite aside from anything that happens in elog.c, we're then going to go out via proc_exit() which will invoke who knows what. >> StandbyDeadLockHandler (from SIGALRM) >> StandbyTimeoutHandler (ditto) >> Calls CancelDBBackends, which just for starters tries to acquire >> an LWLock. I think the only reason we've gotten away with this >> for this long is the high probability that by the time either >> timeout fires, we're going to be blocked on a semaphore. > Yeah, I'm not sure these are so bad. In fact, in the deadlock case, I > believe the old coding was designed to make sure we *had to* be > blocked on a semaphore, but I'm not sure whether that's still true. Looking at this closer, the only code that could get interrupted by these timeouts is a call to ProcWaitForSignal, which is void ProcWaitForSignal(uint32 wait_event_info) { (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0, wait_event_info); ResetLatch(MyLatch); CHECK_FOR_INTERRUPTS(); } Interrupting the latch operations might be safe enough, although now I'm casting a side-eye at Munro's recent changes to share WaitEvent state all over the place. I wonder whether blocking on an LWLock inside the signal handler will break an interrupted WaitLatch. Also, man that CHECK_FOR_INTERRUPTS() looks like trouble. Could we take that out? regards, tom lane