Hi,

On 2024-04-11 16:46:23 -0400, Robert Haas wrote:
> On Thu, Apr 11, 2024 at 3:52 PM Andres Freund <and...@anarazel.de> wrote:
> > My suspicion is that most of the false positives are caused by lots of
> > signals interrupting the pg_usleep()s. Because we measure the number of
> > delays, not the actual time since we've been waiting for the spinlock,
> > signals interrupting pg_usleep() trigger can very significantly shorten
> > the amount of time until we consider a spinlock stuck. We should fix that.
>
> I mean, go nuts. But <dons asbestos underpants, asbestos regular
> pants, 2 pair of asbestos socks, 3 asbestos shirts, 2 asbestos
> jackets, and then hides inside of a flame-proof capsule at the bottom
> of the Pacific ocean> this is just another thing like query hints,
> where everybody says "oh, the right thing to do is fix X or Y or Z and
> then you won't need it". But of course it never actually gets fixed
> well enough that people stop having problems in the real world. And
> eventually we look like a developer community that cares more about
> our own opinion about what is right than what the experience of real
> users actually is.

I don't think that's a particularly apt comparison. If you have spinlocks that
cannot be acquired within tens of seconds, you're in a really bad situation,
regardless of whether you crash-restart or not. Whereas with hints, you might
actually be operating perfectly normally. Never using the wrong plan is also
just a problem that is an order of magnitude harder and fuzzier than ensuring
we don't wait for spinlocks for a long time.

> In all seriousness, I'd really like to understand what experience
> you've had that makes this check seem useful. Because I think all of
> my experiences with it have been bad. If they weren't, the last good
> one was a very long time ago.

By far the majority of the stuck spinlocks I've seen were due to bugs in
out-of-core extensions. Absurdly enough, the next most common cause is probably
people using gdb to make an uninterruptible process break out of some code,
without a crash-restart, accidentally doing so while a spinlock is held.

Greetings,

Andres Freund
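
For anyone following along, the delay-counting problem quoted at the top can
be shown with a toy model. This is only a sketch in Python, not PostgreSQL
code: the function names and all numbers (1000 delays, 1 ms requested sleeps,
sleeps cut to 10 us by signals) are made up for illustration, and the real
loop's backoff behaviour is deliberately left out.

```python
from itertools import repeat


def stuck_after_by_count(actual_sleeps_us, max_delays):
    """Give up after max_delays sleeps, however short each one really was.

    Returns the microseconds actually waited before declaring the lock stuck.
    """
    elapsed = 0
    for n, slept in enumerate(actual_sleeps_us, start=1):
        elapsed += slept
        if n >= max_delays:
            return elapsed
    return None  # the sequence of sleeps ended before we gave up


def stuck_after_by_time(actual_sleeps_us, timeout_us):
    """Give up only once the time actually waited reaches timeout_us."""
    elapsed = 0
    for slept in actual_sleeps_us:
        elapsed += slept
        if elapsed >= timeout_us:
            return elapsed
    return None  # the sequence of sleeps ended before we gave up


# Hypothetical numbers: each sleep asks for 1000 us, and we allow 1000 of them,
# so the intended patience is about one second.
undisturbed = stuck_after_by_count(repeat(1000), max_delays=1000)

# Same limit, but a signal cuts every sleep short after 10 us.
interrupted = stuck_after_by_count(repeat(10), max_delays=1000)

# Measuring elapsed time instead is unaffected by how short the sleeps are.
timed = stuck_after_by_time(repeat(10), timeout_us=1_000_000)

print(undisturbed, interrupted, timed)  # 1000000 10000 1000000
```

With the count-based check, the interrupted case gives up after 10 ms instead
of one second, which is the kind of false positive described in the quoted
text; the time-based check waits the full second either way.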