I notice CopyXLogRecordToWAL contains this loop (in the case where the record being copied is a switch):

    while (CurrPos < EndPos)
    {
        /* initialize the next page (if not initialized already) */
        WALInsertLockUpdateInsertingAt(CurrPos);
        AdvanceXLInsertBuffer(CurrPos, false);
        CurrPos += XLOG_BLCKSZ;
    }

in which it calls AdvanceXLInsertBuffer one page at a time, even though
AdvanceXLInsertBuffer contains its own loop that can handle a sequence of
pages. A comment explains why:

    /*
     * We do this one page at a time, to make sure we don't deadlock
     * against ourselves if wal_buffers < XLOG_SEG_SIZE.
     */

I want to make sure I understand what the deadlock potential is in this
case. AdvanceXLInsertBuffer will call WaitXLogInsertionsToFinish before
writing any dirty buffer, and we do hold insertion slot locks (all of 'em,
in the case of a log switch, because that makes XLogInsertRecord call
WALInsertLockAcquireExclusive instead of just WALInsertLockAcquire as for
other record types).

Doesn't the fact that we hold all the insertion slots rule out the
possibility that any dirty buffer (preceding the one we're touching) needs
to be checked for in-flight insertions?

I've been thinking along the lines of adding another parameter to
AdvanceXLInsertBuffer to indicate when the caller is exactly this loop
filling out the tail after a log switch (originally, to avoid filling in
page headers). It now seems to me that, if AdvanceXLInsertBuffer has that
information, it could also safely skip the WaitXLogInsertionsToFinish call
in that case. Would that eliminate the deadlock potential, and allow the
loop in CopyXLogRecordToWAL to be replaced with a single call to
AdvanceXLInsertBuffer and a single WALInsertLockUpdateInsertingAt?

Or have I overlooked some other subtlety?

-Chap
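
For concreteness, here is a rough standalone sketch of the call shape I have in mind. It is a toy, not a patch: the stub bodies, the counters, the explicit `from` argument, the `tail_of_switch` parameter name, and the `FillSwitchTail` helper are all hypothetical, and the toy pretends every page would require a wait, which the real AdvanceXLInsertBuffer does only when it has to write out a dirty buffer. Whether skipping the wait is actually safe is the open question above; the sketch does not answer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* stand-in for the real typedef */
#define XLOG_BLCKSZ 8192            /* default WAL block size */

/* Counters so the toy can be inspected; nothing like this exists in xlog.c. */
static int pages_initialized = 0;
static int waits_performed = 0;
static XLogRecPtr inserting_at = 0;

/* Stub: the real function blocks until in-flight insertions have finished. */
static void
WaitXLogInsertionsToFinish(XLogRecPtr upto)
{
    (void) upto;
    waits_performed++;
}

/* Stub: the real function publishes our progress on the locks we hold. */
static void
WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
{
    inserting_at = insertingAt;
}

/*
 * Hypothetical variant of AdvanceXLInsertBuffer.  The real function takes
 * (upto, opportunistic); the explicit 'from' and the 'tail_of_switch' flag
 * are assumptions made only for this illustration.
 */
static void
AdvanceXLInsertBufferSketch(XLogRecPtr from, XLogRecPtr upto,
                            bool tail_of_switch)
{
    assert(from <= upto);

    for (XLogRecPtr pos = from; pos < upto; pos += XLOG_BLCKSZ)
    {
        /* Toy assumption: every page would otherwise need a wait. */
        if (!tail_of_switch)
            WaitXLogInsertionsToFinish(pos);
        pages_initialized++;
    }
}

/* Hypothetical replacement for the page-at-a-time loop shown above. */
static void
FillSwitchTail(XLogRecPtr CurrPos, XLogRecPtr EndPos)
{
    WALInsertLockUpdateInsertingAt(CurrPos);
    AdvanceXLInsertBufferSketch(CurrPos, EndPos, true);
}
```

Calling `FillSwitchTail(0, 4 * XLOG_BLCKSZ)` in this toy initializes four pages with one insertingAt update and no waits, which is the shape of the simplification being asked about.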