Neil Conway <[EMAIL PROTECTED]> writes:
> - to modify a BufferDesc's meta data (shared refcount, flags, tag,
> etc.), one must hold the buffer's "meta data lock". Also, I
> removed the existing "io_in_progress" lock; instead, we now hold
> the buffer's meta data lock while doing buffer I/O.

The latter is a really bad idea IMHO. The io_in_progress lock can be
held for eons (in CPU terms) and should not be blocking people who
simply want to bump their refcount up and down. In particular, there
is no reason for an in-process write to block people who want
read-only access to the buffer. It's possible that you could combine
the io_in_progress lock with the cntx_lock, but I doubt locking out
access to the metadata during i/o is a good idea.
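To make the problem concrete: under your scheme even a bare pin has to
take the per-buffer meta data lock, and so it queues up behind any
write that happens to be in flight. Roughly this ("meta_lock" is just
shorthand for whatever the patch calls the per-buffer lock):

    /*
     * Illustrative only: a reader that merely wants to bump the shared
     * refcount still waits out the entire smgrwrite() being performed
     * under the same per-buffer lock.
     */
    LWLockAcquire(buf->meta_lock, LW_EXCLUSIVE);
    buf->refcount++;
    LWLockRelease(buf->meta_lock);

With a separate io_in_progress lock, the pin only has to wait for the
brief interval in which the metadata is actually being changed.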
> - if a backend holds the BufMgrLock, it will never try to
> LWLockAcquire() a per-buffer meta data lock, due to the risk of
> deadlock (and the loss in concurrency if we got blocked waiting
> while still holding the BufMgrLock). Instead it does a
> LWLockConditionalAcquire() and handles an acquisition failure
> sanely

Hmm, what's "sanely" exactly? It seems to me that figuring out how to
not need to lock in this direction is a critical part of the redesign,
so you can't just handwave here and expect people to understand.

> - Remove SetBufferCommitInfoNeedsSave(). AFAICS, this is now
> completely equivalent to WriteNoReleaseBuffer(), so I just removed
> the former and replaced all the calls to it with calls to the latter.

The reason I've kept the separation was as a form of documentation as
to the reason for each write. Although they currently do the same
thing, that might not always be true. I'd prefer not to eliminate the
distinction from the source code --- though I'd not object if you want
to make SetBufferCommitInfoNeedsSave a macro that invokes the other
routine (see the sketch at the end of this mail).

> - Removed the following now-unused buffer flag bits: BM_VALID,
> BM_DELETED, BM_JUST_DIRTIED, BM_IO_IN_PROGRESS, and shrunk the
> 'flags' field of the BufferDesc down to 8 bits (from 16)

I cannot believe that it's workable to remove BM_JUST_DIRTIED. How are
you handling the race conditions that this was for? I'm also
unconvinced that removing BM_IO_IN_PROGRESS is a good idea.

> - Make 'PrivateRefCount' an array of uint16s, rather than longs. This
> saves 2 bytes * shared_buffers per backend on 32-bit machines and 6
> bytes * shared_buffers per backend on some 64-bit machines. It means
> a given backend can only pin a single buffer 65,535 times, but that
> should be more than enough. Similarly, made LocalRefCount an array
> of uint16s.

I think both of these are ill-considered micro-optimizations. How do
you know that the pin count can't exceed 64K? Consider recursive
plpgsql functions for a likely counterexample.

> I was thinking of adding overflow checking to the refcount increment
> code to make sure we fail safely if a backend *does* try to exceed
> this number of pins, but I can't imagine a scenario when it would
> actually occur, so I haven't bothered.

Leaving the arrays as longs is a much saner approach IMHO.

> - Remove the BufferLocks array. AFAICS this wasn't actually necessary.

Please put that back. It is there to avoid unnecessary acquisitions of
buffer locks during UnlockBuffers (which is executed during any
transaction abort). Without it, you will need to lock every single
buffer during an abort in order to check its flags (see the second
sketch at the end of this mail).

			regards, tom lane
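PS: By "a macro that invokes the other routine" I mean nothing fancier
than this sketch (names as in the current bufmgr headers; untested):

    /*
     * Keep the old name purely as documentation of why the buffer is
     * being dirtied; it simply expands to the existing routine.
     */
    #define SetBufferCommitInfoNeedsSave(buffer)  WriteNoReleaseBuffer(buffer)

And on BufferLocks: the point of the array is that UnlockBuffers can
skip buffers this backend never locked, roughly as the current loop in
bufmgr.c does (simplified from memory):

    int     i;

    /*
     * At transaction abort, only buffers with a nonzero BufferLocks
     * entry need to have their cntx_lock looked at.
     */
    for (i = 0; i < NBuffers; i++)
    {
        bits8   buflocks = BufferLocks[i];

        if (buflocks == 0)
            continue;           /* never locked this buffer */

        /* ... release whatever locks the flags say we still hold ... */
    }

Without BufferLocks, that early "continue" goes away and every abort
has to take every buffer's lock just to discover there is nothing to do.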