Hi,

On 2024-04-11 16:11:40 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2024-04-11 15:24:28 -0400, Robert Haas wrote:
> >> Or, rip out the whole, whole mechanism and just don't PANIC.
>
> > I continue to believe that that'd be a quite bad idea.
>
> I'm warming to it myself.
>
> > My suspicion is that most of the false positives are caused by lots of
> > signals interrupting the pg_usleep()s. Because we measure the number of
> > delays, not the actual time since we've been waiting for the spinlock,
> > signals interrupting pg_usleep() can very significantly shorten the
> > amount of time until we consider a spinlock stuck. We should fix that.
>
> We wouldn't need to fix it, if we simply removed the NUM_DELAYS
> limit. Whatever kicked us off the sleep doesn't matter, we might
> as well go check the spinlock.

I suspect we should fix it regardless of whether we keep NUM_DELAYS. We
shouldn't increase cur_delay faster just because a lot of signals are coming
in. If it were just user triggered signals it'd probably not be worth worrying
about, but we do sometimes send a lot of signals ourselves...

> Also, you propose in your other message replacing spinlocks with lwlocks.
> Whatever the other merits of that, I notice that we have no timeout or
> "stuck lwlock" detection.

True. And that's not great. But at least lwlocks can be identified in
pg_stat_activity, which does help some.

Greetings,

Andres Freund
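
To illustrate the point about not letting signals speed up the backoff, here is a rough sketch of accounting for time actually slept rather than counting delays. It is not the existing s_lock.c code: DelayState, delay_init(), delay_account() and the constant values are all hypothetical, and plain doubling stands in for whatever backoff policy is really used. The caller is assumed to measure how long each sleep really lasted (for example with a monotonic clock) and pass that in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values below are assumptions made for this sketch. */
#define MIN_DELAY_USEC      1000LL
#define MAX_DELAY_USEC      1000000LL
#define STUCK_TIMEOUT_USEC  (60 * 1000000LL)

typedef struct DelayState
{
    int64_t cur_delay_usec;     /* length of the next sleep to request */
    int64_t slept_at_cur_usec;  /* time really slept at the current delay */
    int64_t total_waited_usec;  /* time really slept since the wait began */
} DelayState;

void
delay_init(DelayState *st)
{
    assert(st != NULL);
    st->cur_delay_usec = MIN_DELAY_USEC;
    st->slept_at_cur_usec = 0;
    st->total_waited_usec = 0;
}

/*
 * Account for one sleep that lasted actual_usec, which is shorter than
 * cur_delay_usec if a signal interrupted it. Returns true once the total
 * time actually waited reaches the timeout.
 */
bool
delay_account(DelayState *st, int64_t actual_usec)
{
    assert(st != NULL && actual_usec >= 0);

    st->total_waited_usec += actual_usec;
    st->slept_at_cur_usec += actual_usec;

    /*
     * Back off only after a full delay's worth of real time has passed, so
     * a burst of interrupted sleeps does not inflate cur_delay_usec.
     */
    if (st->slept_at_cur_usec >= st->cur_delay_usec)
    {
        st->slept_at_cur_usec = 0;
        st->cur_delay_usec *= 2;
        if (st->cur_delay_usec > MAX_DELAY_USEC)
            st->cur_delay_usec = MAX_DELAY_USEC;
    }

    return st->total_waited_usec >= STUCK_TIMEOUT_USEC;
}
```

With this kind of accounting, a sleep cut short after a few microseconds counts only for those few microseconds, both towards the next backoff step and towards the "stuck" decision.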