RE: [Patch] Optimize dropping of relation buffers using dlist

[email protected] Wed, 18 Nov 2020 23:08:05 -0800

From: Andres Freund <[email protected]>
> DropRelFileNodeBuffers() in recovery? The most common path is
> DropRelationFiles()->smgrdounlinkall()->DropRelFileNodesAllBuffers(),
> which 3/4 doesn't address and 4/4 doesn't mention.
> 
> 4/4 seems to address DropRelationFiles(), but only talks about TRUNCATE?


Yes.  DropRelationFiles() is used in the following two paths:

[Replay of TRUNCATE during recovery]
xact_redo_commit/abort() -> DropRelationFiles()
 -> smgrdounlinkall() -> DropRelFileNodesAllBuffers()

[COMMIT/ROLLBACK PREPARED]
FinishPreparedTransaction() -> DropRelationFiles()
 -> smgrdounlinkall() -> DropRelFileNodesAllBuffers()



> I also don't get why 4/4 would be a good idea on its own. It uses
> BUF_DROP_FULL_SCAN_THRESHOLD to guard
> FindAndDropRelFileNodeBuffers() on a per relation basis. But since
> DropRelFileNodesAllBuffers() can be used for many relations at once, this
> could end up doing BUF_DROP_FULL_SCAN_THRESHOLD - 1 lookups a lot of
> times, once for each of nnodes relations?

So, the threshold value should be compared with the total number of blocks of 
all target relations, not each relation.  You seem to be right, got it.


> Also, how is 4/4 safe - this is outside of recovery too?

It seems that DropRelFileNodesAllBuffers() should trigger the new optimization 
path only when InRecovery == true, because it intentionally doesn't check the 
"accurate" value returned from smgrnblocks().


> Smaller comment:
> 
> +static void
> +FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber *forkNum,
> int nforks,
> +                                                       BlockNumber
> *nForkBlocks, BlockNumber *firstDelBlock)
> ...
> +                     /* Check that it is in the buffer pool. If not, do 
> nothing.
> */
> +                     LWLockAcquire(bufPartitionLock, LW_SHARED);
> +                     buf_id = BufTableLookup(&bufTag, bufHash);
> ...
> +                     bufHdr = GetBufferDescriptor(buf_id);
> +
> +                     buf_state = LockBufHdr(bufHdr);
> +
> +                     if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
> +                             bufHdr->tag.forkNum == forkNum[i] &&
> +                             bufHdr->tag.blockNum >= firstDelBlock[i])
> +                             InvalidateBuffer(bufHdr);       /* releases
> spinlock */
> +                     else
> +                             UnlockBufHdr(bufHdr, buf_state);
> 
> I'm a bit confused about the check here. We hold a buffer partition lock, and
> have done a lookup in the mapping table. Why are we then rechecking the
> relfilenode/fork/blocknum? And why are we doing so holding the buffer header
> lock, which is essentially a spinlock, so should only ever be held for very 
> short
> portions?
> 
> This looks like it's copying logic from DropRelFileNodeBuffers() etc, but 
> there
> the situation is different: We haven't done a buffer mapping lookup, and we
> don't hold a partition lock!

That's because the buffer partition lock is released immediately after the hash 
table has been looked up.  As an aside, InvalidateBuffer() requires the caller 
to hold the buffer header spinlock and doesn't hold the buffer partition lock.



Regards
Takayuki Tsunakawa

RE: [Patch] Optimize dropping of relation buffers using dlist

Reply via email to