Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

Andres Freund Sun, 05 Sep 2021 13:28:31 -0700

On 2021-09-05 14:22:51 +0530, Dilip Kumar wrote:
> On Sat, Sep 4, 2021 at 3:24 AM Andres Freund <and...@anarazel.de> wrote:
> 
> > Hi,
> >
> > On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> > > Yeah, we can surely lock the relation as described by Robert, but IMHO,
> > > while creating the database we are already holding the exclusive lock on
> > > the database and there is no one else allowed to be connected to the
> > > database, so do we actually need to bother about the lock for the
> > > correctness?
> >
> > The problem is that checkpointer, bgwriter, buffer reclaim don't care about
> > the database of the buffer they're working on... The exclusive lock on the
> > database doesn't change anything about that.
> 
> 
> But these directly operate on the buffers and In my patch, whether we are
> reading the pg_class for identifying the relfilenode or we are copying the
> relation block by block we are always holding the lock on the buffer.


I don't think a buffer lock is really sufficient. See e.g. code like:

static void
InvalidateBuffer(BufferDesc *buf)
{
...
        /*
         * We assume the only reason for it to be pinned is that someone else is
         * flushing the page out.  Wait for them to finish.  (This could be an
         * infinite loop if the refcount is messed up... it would be nice to 
time
         * out after awhile, but there seems no way to be sure how many loops 
may
         * be needed.  Note that if the other guy has pinned the buffer but not
         * yet done StartBufferIO, WaitIO will fall through and we'll 
effectively
         * be busy-looping here.)
         */
        if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
        {
                UnlockBufHdr(buf, buf_state);
                LWLockRelease(oldPartitionLock);
                /* safety check: should definitely not be our *own* pin */
                if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
                        elog(ERROR, "buffer is pinned in InvalidateBuffer");
                WaitIO(buf);
                goto retry;
        }

IOW, currently we assume that you're only allowed to pin a block in a relation
while you hold a lock on the relation. It might be a good idea to change that,
but it's not as trivial as one might think - consider e.g. dropping a
relation's buffers while holding an exclusive lock: If there's potential
concurrent reads of that buffer we'd be in trouble.


> 3. While copying the relation whether to use the bufmgr or directly use the
> smgr?
> 
> If we use the bufmgr then maybe we can avoid flushing some of the buffers
> to the disk and save some I/O but in general we copy from the template
> database so there might not be a lot of dirty buffers and we might not save
> anything

I would assume the big benefit would be that the *target* database does not
have to be written out / shared buffer is immediately populated.

Greetings,

Andres Freund

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

Reply via email to