Hi, On 2019-08-20 21:02:18 +1200, Thomas Munro wrote: > Aside from code changes based on review (and I have more to come of > those), the attached experimental patchset (also at > https://github.com/EnterpriseDB/zheap/tree/undo) has a new protocol > that, I hope, allows for better concurrency, reliability and > readability, and removes a bunch of TODO notes about questionable > interlocking. However, I'm not quite done figuring out if the bufmgr > interaction is right and will be manageable on the undoaccess side, so > I'm hoping to get some feedback, not asking for anyone to rebase on > top of it yet. > > Previously, there were two LWLocks used to make sure that all > discarding was prevented while anyone was reading or writing data in > any part of an undo log, and (probably more problematically) vice > versa. Here's a new approach that removes that blocking: > > 1. Anyone is allowed to try to read or write data at any UndoRecPtr > that has been allocated, through the buffer pool (though you'd usually > want to check it with UndoRecPtrIsDiscarded() first, and only rely on > the system I'm describing to deal with races). > > 2. ReadBuffer() might return InvalidBuffer. This can happen for a > cache miss, if the smgrread implementation wants to indicate that the > buffer has been discarded/truncated and that is expected (md.c won't > ever do that, but undofile.c can).
Hm. This gives me a bit of a stomach ache. It somehow feels like a weird form of signalling. Can't quite put my finger on why it makes me feel queasy. > 3. UndoLogDiscard() uses DiscardBuffer() to invalidate any currently > unpinned buffers, and marks as BM_DISCARDED any that happen to be > pinned right now, so they can't be immediately invalidated. Such > buffers are never written back and are eligible for reuse on the next > clock sweep, even if they're written into by a backend that managed to > do that when we were trying to discard. Hm. When is it legitimate for a backend to write into such a buffer? I guess that's about updating the previous transaction's next pointer? Or progress info? > 5. Separating begin from discard allows the WAL logging for > UndoLogDiscard() to do filesystem actions before logging, and other > effects after logging, which have several nice properties if you work > through the various crash scenarios. Hm. ISTM we always need to log before doing some filesystem operation (see also my recent complaint Robert and I are discussing at the bottom of [1]). It's just that we can have a separate stage afterwards? [1] https://www.postgresql.org/message-id/CA%2BTgmoZc5JVYORsGYs8YnkSxUC%3DcLQF1Z%2BfcpH2TTKvqkS7MFg%40mail.gmail.com > So now I'd like to get feedback on the sanity of this scheme. I'm not > saying it doesn't have bugs right now -- I've been trying to figure > out good ways to test it and I'm not quite there yet -- but the > concept. One observation I have is that there were already code paths > in undoaccess.c that can tolerate InvalidBuffer in recovery, due to > the potentially different discard timing for DO vs REDO. I think > that's a point in favour of this scheme, but I can see that it's > inconvenient to have to deal with InvalidBuffer whenever you read. FWIW, I'm far from convinced that those are currently quite right. See discussion pointed to above. > I pulled in the latest code from undoprocessing as of today, and I > might be a bit confused about "Defect and enhancement in multi-log > support" some of which I have squashed into the make undolog patch. > BTW undoprocessing builds with initialized variable warnings in xact.c > on clang today. I've complained about that existance of that commit multiple times now. So far without any comments. Greetings, Andres Freund