On Wed, Feb  3, 2021 at 03:29:13PM -0800, Andres Freund wrote:
> > Is the above case valid, and would it cause two full page writes to WAL?
> > More specifically, wouldn't it cause every write of the page to the file
> > system to use a new LSN?
>
> No. 8) won't happen. Look e.g. at XLogSaveBufferForHint():
>
> 	/*
> 	 * Update RedoRecPtr so that we can make the right decision
> 	 */
> 	RedoRecPtr = GetRedoRecPtr();
>
> 	/*
> 	 * We assume page LSN is first data on *every* page that can be passed to
> 	 * XLogInsert, whether it has the standard page layout or not. Since we're
> 	 * only holding a share-lock on the page, we must take the buffer header
> 	 * lock when we look at the LSN.
> 	 */
> 	lsn = BufferGetLSNAtomic(buffer);
>
> 	if (lsn <= RedoRecPtr)
> 		/* wal log hint bit */
>
> The RedoRecPtr is determined at 1. and doesn't change between 4) and
> 8). The LSN for 4) has to be *past* the RedoRecPtr from 1). Therefore we
> don't do another FPW.

OK, so, what is happening is that it knows the page LSN is after the
start of the current checkpoint (the redo point), so it knows not to do
a full page write again?  Smart, and makes sense.

> Changing this is *completely* infeasible. In a lot of workloads it'd
> cause a *massive* explosion of WAL volume. Like quadratically. You'll
> need to find another way to generate a nonce.

Do we often do multiple writes to the file system of the same page
during a single checkpoint, particularly only-hint-bit-modified pages?
I didn't think so.

> In the non-hint bit case you'll automatically have a higher LSN in 7/8
> though. So you won't need to do anything about getting a higher nonce.

Yes, I was counting on that.  :-)

> For the hint bit case in 8 you could consider just using any LSN generated
> after 4 (preferably already flushed to disk) - but that seems somewhat
> ugly from a debuggability POV :/. Alternatively you could just create
> tiny WAL record to get a new LSN, but that'll sometimes trigger new WAL
> flushes when the pages are dirtied.

Yes, that would make sense.  I do need the first full page write during
a checkpoint to be sure I don't have torn pages that have some part of
the page encrypted with one LSN and a second part with a different LSN.
You are right that I don't need a second full page write during the same
checkpoint because a torn page would just restore the first full page
write and throw away the second LSN and hint bit changes, which is fine.
I hadn't gotten to asking about that until I found out whether the
previous assumptions were true, which they were not.

Is the logical approach here to modify XLogSaveBufferForHint() so that,
if a full page write is not needed, it creates a dummy WAL record that
just advances the WAL location and updates the page LSN?  (Is there a
small WAL record I should reuse?)  I can try to add a hint-bit-page-write
counter, but that might overflow, and then we will need a way to change
the LSN anyway.

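To make the question concrete, here is a minimal sketch of the behavior I
have in mind.  It is a toy model, not PostgreSQL code: ToyWal and the
toy_* functions are hypothetical names, the record lengths are arbitrary,
and the "dummy record" only stands in for whatever small WAL record we
would end up using.  The only part taken from the real code is the
lsn <= RedoRecPtr test quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a WAL position; not the PostgreSQL definition. */
typedef uint64_t XLogRecPtr;

/* Arbitrary record sizes, chosen only for illustration. */
#define TOY_FPW_LEN   8192
#define TOY_DUMMY_LEN 32

/* Hypothetical model of the WAL state relevant to hint-bit writes. */
typedef struct ToyWal
{
	XLogRecPtr	insert_ptr;		/* end of the last record inserted */
	XLogRecPtr	redo_ptr;		/* redo point of the current checkpoint */
	int			fpw_count;		/* full page images emitted */
	int			dummy_count;	/* dummy LSN-bump records emitted */
} ToyWal;

/* Append a record of the given length and return its end position. */
static XLogRecPtr
toy_wal_insert(ToyWal *wal, uint32_t len)
{
	wal->insert_ptr += len;
	return wal->insert_ptr;
}

/* Starting a checkpoint fixes the redo point at the current insert position. */
static void
toy_checkpoint_start(ToyWal *wal)
{
	wal->redo_ptr = wal->insert_ptr;
}

/*
 * Called for a hint-bit-only page write.  Returns the LSN to stamp on the
 * page.  Assumes page_lsn was handed out by this WAL, so it is never ahead
 * of insert_ptr.
 */
static XLogRecPtr
toy_save_buffer_for_hint(ToyWal *wal, XLogRecPtr page_lsn)
{
	XLogRecPtr	new_lsn;

	if (page_lsn <= wal->redo_ptr)
	{
		/* First change since the redo point: a full page image is needed. */
		new_lsn = toy_wal_insert(wal, TOY_FPW_LEN);
		wal->fpw_count++;
	}
	else
	{
		/* Assumed new behavior: a tiny record whose only job is a new LSN. */
		new_lsn = toy_wal_insert(wal, TOY_DUMMY_LEN);
		wal->dummy_count++;
	}

	/* Every write of the page gets a strictly higher LSN to use as a nonce. */
	assert(new_lsn > page_lsn);
	return new_lsn;
}
```

In this model, three hint-bit-only writes of the same page within one
checkpoint produce one full page image and two dummy records, and each
write sees a different page LSN.  It does not model the extra WAL flushes
Andres mentions, which is the cost I still need to measure.
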
I am researching this so I can give a clear report on the impact of
adding this feature.  I will update the wiki once we figure this out.

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee