Re: [HACKERS] Yes, WaitLatch is vulnerable to weak-memory-ordering bugs

Tom Lane Mon, 08 Aug 2011 09:39:58 -0700

Robert Haas <[email protected]> writes:
> On Sun, Aug 7, 2011 at 1:47 PM, Tom Lane <[email protected]> wrote:
>> Any protocol of that sort has *obviously* got a race condition between
>> changes of the latch state and changes of the separate flag state, if run
>> in a weak-memory-ordering machine.


> On the flip side, I'm not sure that anyone ever really expected that a
> latch could be safely used this way.  Normally, I'd expect the flag to
> be some piece of state protected by an LWLock, and I think that ought
> to be OK provided that the lwlock is released before setting or
> checking the latch.  Maybe we should try to document the contract here
> a little better; I think it's just that there must be a full memory
> barrier (such as LWLockRelease) in both processes involved in the
> interaction.

Heikki argued the same thing in
http://archives.postgresql.org/message-id/[email protected]
but I don't think anyone has actually done the legwork to verify that
current uses are safe.  So [ ... warms up grep ... ]

Currrent callers of WaitLatch(OrSocket):

XLogPageRead waits on XLogCtl->recoveryWakeupLatch

        latch is set by WakeupRecovery, which is called by process'
        own signal handlers (OK) and XLogWalRcvFlush.  The latter is OK
        because there's a spinlock protecting the transmitted data.

pgarch_MainLoop waits on mainloop_latch

        non-issue because latch is process-local

WalSndLoop waits on MyWalSnd->latch

        latch is set by signal handlers and WalSndWakeup.  The latter is
        OK because it's called right after XLogFlush (which certainly
        locks whatever it touches) or a spinlock release.

SyncRepWaitForLSN waits on MyProc->waitLatch

        latch is set by signal handlers and SyncRepWakeQueue.  The
        latter appears unsafe at first glance, because it sets the
        shared variable (thisproc->syncRepState) and immediately
        sets the latch.  However, the receiver does this curious dance:

                syncRepState = MyProc->syncRepState;
                if (syncRepState == SYNC_REP_WAITING)
                {
                        LWLockAcquire(SyncRepLock, LW_SHARED);
                        syncRepState = MyProc->syncRepState;
                        LWLockRelease(SyncRepLock);
                }

        which I believe makes it safe, though rather underdocumented;
        if a race does occur, the receiver will acquire the lock and
        recheck, and the lock acquisition will block until the caller of
        SyncRepWakeQueue has released SyncRepLock, and that release
        will flush any pending writes to shared memory.

Conclusions:

(1) 9.1 does not have a problem of this sort.

(2) Breathing hard on any of this code could break it.

(3) I'm not going to hold still for not inserting a memory barrier
into SetLatch, once we have the code.  Unsubstantiated performance
worries are not a sufficient argument not to --- as a wise man once
said, you can make the code arbitrarily fast if it doesn't have to
give the right answer.

(4) In the meantime, we'd better document that it's caller's
responsibility to ensure that the flag variable is adequately
protected by locking.  I think SyncRepWaitForLSN could use a warning
comment, too.  I will go do that.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Yes, WaitLatch is vulnerable to weak-memory-ordering bugs

Reply via email to