On 2015-05-27 15:39:14 -0400, Robert Haas wrote:
> On Mon, May 25, 2015 at 10:05 PM, Andres Freund <and...@anarazel.de> wrote:
> > Hm. So we have an *occasional* stack size exceeded failure and an
> > occasional spinlock error in test_shm_mq. I'm inclined to think that
> > this is a shm_mq problem, and not a more general locking problem - it
> > seems likely, but not guaranteed, that that'd have materialized
> > elsewhere.
>
> I think the problem might be that the spinlock-based memory barrier is
> not re-entrant. Suppose some kind of barrier operation is in process,
> and we've acquired the dummy spinlock but not yet released it. Just
> then, we receive a signal. Since the shm_mq code sets
> set_latch_on_sigusr1, procsignal_sigusr1_handler will set MyLatch.
> SetLatch now includes barrier operations, so we'll try to acquire and
> release the spinlock despite already holding it. Oops.

Oh wow, that's bad, and could explain a couple of the problems we're
seeing. One possible way to fix it is to replace the sequence with

    if (!TAS(spin)) S_UNLOCK();

But that'd mean TAS() has to be a barrier, even if the lock isn't
free - which e.g. isn't the case for PowerPC's implementation :(
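
To make the idea concrete, here is a minimal sketch of the try-lock
variant. It is an illustration under assumptions, not PostgreSQL code:
tas(), s_unlock(), dummy_spinlock and barrier_trylock() are hypothetical
stand-ins built on C11 atomics, with tas() following the TAS() convention
of returning nonzero when the lock was already held. The point is that
the barrier never spins, so a signal handler that re-enters it while the
interrupted code holds the dummy spinlock cannot deadlock against itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical dummy lock used only to emulate a memory barrier. */
static atomic_flag dummy_spinlock = ATOMIC_FLAG_INIT;

/*
 * Stand-in for TAS(): returns nonzero if the lock was already held,
 * zero if we just acquired it. This is an assumption for illustration,
 * not the per-architecture implementation.
 */
static int
tas(atomic_flag *lock)
{
    return atomic_flag_test_and_set(lock);
}

/* Stand-in for S_UNLOCK(): release the lock. */
static void
s_unlock(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/*
 * Barrier emulation that never waits. If the lock is free we take it
 * and drop it again; if it is already held (for example by the code a
 * signal handler interrupted) we simply return. Returns true when the
 * lock was acquired and released, false when it was found held.
 */
static bool
barrier_trylock(void)
{
    if (!tas(&dummy_spinlock))
    {
        s_unlock(&dummy_spinlock);
        return true;
    }
    return false;
}
```

In this sketch the "lock not free" path still performs an atomic
read-modify-write, and C11's atomic_flag_test_and_set() defaults to
memory_order_seq_cst. That is exactly the property the real TAS() would
need on every platform for this approach to be a valid barrier, and it
is the part that does not hold for the PowerPC implementation mentioned
above.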