On Fri, Dec 15, 2023 at 9:53 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> ... We've seen a system with ~30GB of files in there
> (note: full/untruncated it would be 2³² xids × sizeof(uint64_t) =
> 32GB).  It's not just a gradual disk space leak: according to disk
> space monitoring, this system suddenly wrote ~half of that data, which
> I think must be the while loop in SerialAdd() zeroing out pages.
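To make the scale of that concrete, here is a simplified back-of-envelope model of the zeroing loop (hypothetical helper name and simplified constants, not the real predicate.c/slru.c code): an 8kB SLRU page holds 8192 / sizeof(uint64_t) = 1024 CSN entries, so a headPage that lags by half the xid space implies ~16GB of zero pages in one call.

```c
#include <stdint.h>

/* Simplified model, not the real slru.c code: an 8kB SLRU page holds
 * 8192 / sizeof(uint64_t) = 1024 CSN entries. */
#define SERIAL_ENTRIES_PER_PAGE 1024
#define SERIAL_PAGE_SIZE 8192

/* Hypothetical helper: how many zero pages a SerialAdd()-style loop
 * must write to advance a stale headPage up to targetPage. */
static int64_t
zero_pages_to_write(int64_t headPage, int64_t targetPage)
{
    return (targetPage > headPage) ? targetPage - headPage : 0;
}
```

With headPage stuck at 0 and the target 2³¹ xids ahead, that is 2147483648 / 1024 = 2097152 pages × 8192 bytes = 16GiB of zeroes, consistent with the "~half of that data" observation above.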
Attempt at an analysis of this rare anti-social I/O pattern: SerialAdd() writes zero pages in a range from the old headPage up to some target page, but headPage can be any number, arbitrarily far in the past (or apparently, the future).  It only keeps up with the progress of the xid clock and spreads that work out if we happen to call SerialAdd() often enough.  If we call SerialAdd() only every couple of billion xids (e.g. very occasionally you leave a transaction open and go out to lunch on a very busy system using SERIALIZABLE everywhere), you might find yourself suddenly needing to write out many gigabytes of zeroes there.

One observation is that headPage gets periodically zapped to -1 by checkpoints, near the comment "SLRU is no longer needed", providing a periodic dice-roll that chops the range down.  Unfortunately the historical "apparent wraparound" bug prevents that from being reached.  That bug was fixed by commit d6b0c2b (master only, no back-patch).  On the system where we saw pg_serial going bananas, that message appeared regularly.

Attempts to find a solution: I think it might make sense to clamp firstZeroPage into the page range implied by tailXid and headXid.  Those values are eagerly maintained and interlock with snapshots and global xmin (correctly but under-documented-ly, AFAICS so far), and we will never try to look up the CSN for any xid outside that range.  I think that should exclude the pathological zero-writing cases.  I wouldn't want to do this without a working reproducer though, which will take some effort.

Another thought is that in the glorious 64 bit future, we might be able to invent a "sparse" SLRU, where if the file or page doesn't exist, we just return a zero CSN, and when we write a new page we just let the OS provide filesystem holes as required.  The reason I wouldn't want to invent sparse SLRUs with 32 bit indexing is that we have no confidence in the truncation logic, which might leave stray files from earlier epochs.
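The clamping idea could look something like this minimal sketch (hypothetical function names, and it deliberately ignores 32-bit xid wraparound in the comparisons, which the real code would have to handle): since no lookup can ever occur below tailXid, there is no point zeroing pages below the page that holds it.

```c
#include <stdint.h>

/* Simplified xid -> pg_serial page arithmetic (hypothetical names, not
 * the real predicate.c code): 1024 uint64_t CSN entries per 8kB page. */
#define SERIAL_ENTRIES_PER_PAGE 1024

static uint32_t
serial_page_of_xid(uint32_t xid)
{
    return xid / SERIAL_ENTRIES_PER_PAGE;
}

/* Proposed clamp: never start zeroing earlier than the page holding
 * tailXid.  NOTE: ignores xid wraparound; a real patch would need
 * modular comparisons. */
static uint32_t
clamp_first_zero_page(uint32_t firstZeroPage, uint32_t tailXid)
{
    uint32_t tailPage = serial_page_of_xid(tailXid);

    return (firstZeroPage < tailPage) ? tailPage : firstZeroPage;
}
```

So a headPage left billions of xids in the past would be pulled forward to tailXid's page before any zeroing begins, bounding the write burst by the tailXid..headXid window instead of the full lag.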
So I think we need zeroed pages (or perhaps at least to confirm that there is nothing already there, but I have zero desire to make the current wraparound-ridden system more complex).
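For what it's worth, the sparse-SLRU read path mentioned above could be as simple as this sketch (plain stdio stand-in with hypothetical names, ignoring locking, buffering and the real 64-bit segment naming): a missing segment file, or a read past EOF into a filesystem hole, is simply interpreted as a zero CSN.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of a "sparse SLRU" read: if the segment file does not exist,
 * or the read lands in a hole / past EOF, report a zero CSN rather
 * than an error.  Hypothetical helper, not the real slru.c API. */
static int
sparse_read_csn(const char *segpath, long offset, uint64_t *csn)
{
    FILE *f = fopen(segpath, "rb");

    if (f == NULL && errno == ENOENT)
    {
        *csn = 0;               /* whole segment missing => zero CSN */
        return 0;
    }
    if (f == NULL)
        return -1;              /* genuine I/O error */
    if (fseek(f, offset, SEEK_SET) != 0 ||
        fread(csn, sizeof(*csn), 1, f) != 1)
        *csn = 0;               /* short read / hole => zero CSN */
    fclose(f);
    return 0;
}
```

The write side would then simply seek to the target offset and write, letting the OS materialize any skipped range as holes, which is exactly what makes the 32-bit version risky: a stray file from an earlier epoch would be silently read back as real CSNs.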