Hi,

On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote:
> Will do, but I'll wait for another lockup to see how frequent it
> actually is. I'm now at ~90 runs total, and it didn't happen again yet.
> So hitting it after 15 runs might have been a bit of a luck.

Was there a difference in how much load there was on the machine between
"reproduced in 15 runs" and "not reproduced in 90"?  If a lack of barriers is
indeed related to the issue, an increase in context switches could
substantially change the behaviour (in both directions).  More intra-process
context switches can amount to "probabilistic barriers", because each context
switch acts as a barrier.  At the same time it can make it more likely that
the relatively narrow window in WaitEventSetWait() is hit, or lead to larger
delays in processing signals.
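
To make the kind of window I mean concrete, here is a minimal sketch using
C11 atomics of the flag handshake that a latch-style wait depends on. The
names (DemoLatch, demo_init, demo_set, demo_prepare_wait) are made up for
illustration; this is not the actual latch code and only shows where the two
barriers would have to sit, assuming a "set flag, then check the other side's
flag" protocol.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a latch; not the real PostgreSQL struct. */
typedef struct DemoLatch
{
    atomic_bool is_set;          /* written by the waker */
    atomic_bool maybe_sleeping;  /* written by the waiter before it blocks */
} DemoLatch;

static void
demo_init(DemoLatch *l)
{
    atomic_init(&l->is_set, false);
    atomic_init(&l->maybe_sleeping, false);
}

/*
 * Waker side: mark the latch set, then decide whether the waiter needs an
 * explicit wakeup (signal, pipe write, ...).  Returns true if it does.
 */
static bool
demo_set(DemoLatch *l)
{
    atomic_store_explicit(&l->is_set, true, memory_order_relaxed);
    /* Full barrier: the store above must not be reordered after the load. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&l->maybe_sleeping, memory_order_relaxed);
}

/*
 * Waiter side: announce the intent to sleep, then re-check the latch.
 * Returns true if it is safe to block, false if the latch is already set.
 */
static bool
demo_prepare_wait(DemoLatch *l)
{
    atomic_store_explicit(&l->maybe_sleeping, true, memory_order_relaxed);
    /* Full barrier: pairs with the fence in demo_set(). */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&l->is_set, memory_order_relaxed))
    {
        atomic_store_explicit(&l->maybe_sleeping, false, memory_order_relaxed);
        return false;
    }
    return true;
}
```

If either fence is missing, both sides can read a stale value of the other's
flag: the waker sees maybe_sleeping == false and skips the wakeup, while the
waiter sees is_set == false and blocks. Anything else that happens to act as
a barrier in between would hide that, which is why the reproduction rate
could depend on load.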

Greetings,

Andres Freund

