On Tue, Dec 20, 2022 at 5:44 PM Jeff Davis <pg...@j-davis.com> wrote: > Comments on 0002: > > Can you explain the following portion of the diff: > > > - else if (MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff)) > + else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact)) > > ... > > + /* Can't violate the MultiXactCutoff invariant, either */ > + if (!need_replace) > + need_replace = MultiXactIdPrecedes( > + multi, cutoffs->MultiXactCutoff);
Don't forget the historic context: before Postgres 15's commit 0b018fab, VACUUM's final relfrozenxid always came from FreezeLimit. Almost all of this code predates that work. So the general idea that you can make a "should I freeze or should I ratchet back my relfrozenxid tracker instead?" trade-off at the level of individual tuples and pages is still a very new one. Right now it's only applied within lazy_scan_noprune(), but 0002 leverages the same principles here. Before now, these heapam.c freezing routines had cutoffs called cutoff_xid and cutoff_multi. These had values that actually came from vacuumlazy.c's FreezeLimit and MultiXactCutoff cutoffs (which was rather unclear). But cutoff_xid and cutoff_multi were *also* used as inexact proxies for OldestXmin and OldestMxact (also kind of unclear, but true). For example, there are some sanity checks in heapam.c that kind of pretend that cutoff_xid is OldestXmin, even though it usually isn't the same value (it can be, but only during VACUUM FREEZE, or when the min freeze age is 0 in some other way). So 0002 teaches the same heapam.c code about everything -- about all of the different cutoffs, and about the true requirements of VACUUM around relfrozenxid advancement. In fact, 0002 makes vacuumlazy.c cede a lot of control of "XID stuff" to the same heapam.c code, freezing it up to think about freezing as something that works at the level of physical pages. This is key to allowing vacuumlazy.c to reason about freezing at the level of the whole table. It thinks about physical blocks, leaving logical XIDs up to heapam.c code. This business that you asked about in FreezeMultiXactId() is needed so that we can allow vacuumlazy.c to "think in terms of physical pages", while at the same time avoiding allocating new Multis in VACUUM -- which requires "thinking about individual xmax fields" instead -- a somewhat conflicting goal. We're really trying to have it both ways (we get page-level freezing, with a little tuple level freezing on the side, sufficient to to avoid allocating new Multis during VACUUMs in roughly the same way as we do right now). In most cases "freezing a page" removes all XIDs < OldestXmin, and all MXIDs < OldestMxact. It doesn't quite work that way in certain rare cases involving MultiXacts, though. It is convenient to define "freeze the page" in a way that gives heapam.c's FreezeMultiXactId() the leeway to put off the work of processing an individual tuple's xmax, whenever it happens to be a MultiXactId that would require an expensive second pass to process aggressively (allocating a new Multi during VACUUM is especially worth avoiding here). Our definition of "freeze the page" is a bit creative, at least if you're used to thinking about it in terms of strict XID-wise cutoffs like OldestXmin/FreezeLimit. But even if you do think of it in terms of XIDs, the practical difference is extremely small in practice. FreezeMultiXactId() effectively makes a decision on how to proceed with processing at the level of each individual xmax field. Its no-op multi processing "freezes" an xmax in the event of a costly-to-process xmax on a page when (for whatever reason) page-level freezing is triggered. If, on the other hand, page-level freezing isn't triggered for the page, then page-level no-op processing takes care of the multi for us instead. Either way, the remaining Multi will ratchet back VACUUM's relfrozenxid and/or relminmxid trackers as required, and we won't need an expensive second pass over the multi (unless we really have no choice, for example during a VACUUM FREEZE, where OldestXmin==FreezeLimit). > Regarding correctness, it seems like the basic structure and invariants > are the same, and it builds on the changes already in 9e5405993c. Patch > 0002 seems *mostly* about making choices within the existing framework. > That gives me more confidence. You're right that it's the same basic invariants as before, of course. Turns out that those invariants can be pushed quite far. Though note that I kind of invented a new invariant (not really, sort of). Well, it's a postcondition, which is a sort of invariant: any scanned heap page that can be cleanup locked must never have any remaining XIDs < FreezeLimit, nor can any MXIDs < MultiXactCutoff remain. But a cleanup-locked page does *not* need to get rid of all XIDs < OldestXmin, nor MXIDs < OldestMxact. This flexibility is mostly useful because it allows lazy_scan_prune to just decide to not freeze. But, to a much lesser degree, it's useful because of the edge case with multis -- in general we might just need the same leeway when lazy_scan_prune "freezes the page". > That being said, it does push harder against the limits on both sides. > If I understand correctly, that means pages with wider distributions of > xids are going to persist longer, which could expose pre-existing bugs > in new and interesting ways. I don't think it's fundamentally different to what we're already doing in lazy_scan_noprune. It's just more complicated, because you have to tease apart slightly different definitions of freezing to understand code around FreezeMultiXactId(). This is more or less needed to provide maximum flexibility, where we delay decisions about what to do until the very last moment. > Next, the 'freeze_required' field suggests that it's more involved in > the control flow that causes freezing than it actually is. All it does > is communicate how the trackers need to be adjusted. The return value > of heap_prepare_freeze_tuple() (and underneath, the flags set by > FreezeMultiXactId()) are what actually control what happens. It would > be nice to make this more clear somehow. I'm not sure what you mean. Page-level freezing *doesn't* have to go ahead when freeze_required is not ever set to true for any tuple on the page (which is most of the time, in practice). lazy_scan_prune gets to make a choice about freezing the page, when the choice is available. Note also that the FRM_NOOP case happens when a call to FreezeMultiXactId() takes place that won't leave behind a freeze plan for the tuple (unless its xmin happens to necessitate a freeze plan for the same tuple). And yet, it will do useful work, needed iff the "freeze the page" path is ultimately taken by lazy_scan_prune -- FreezeMultiXactId() itself will ratchet back FreezePageRelfrozenXid/NewRelfrozenXid as needed to make everything safe. > The comment: > > /* > * If we freeze xmax, make absolutely sure that it's not an XID that > * is important. (Note, a lock-only xmax can be removed independent > * of committedness, since a committed lock holder has released the > * lock). > */ > > caused me to go down a rabbit hole looking for edge cases where we > might want to freeze an xmax but not an xmin; e.g. tup.xmax < > OldestXmin < tup.xmin or the related case where tup.xmax < RecentXmin < > tup.xmin. I didn't find a problem, so that's good news. This is an example of what I meant about the heapam.c code using a cutoff that actually comes from FreezeLimit, when it would be more sensible to use OldestXmin instead. > I also tried some pgbench activity along with concurrent vacuums (and > vacuum freezes) along with periodic verify_heapam(). No problems there. > > Did you already describe the testing you've done for 0001+0002 > specfiically? It's not radically new logic, but it would be good to try > to catch minor state-handling errors. Lots of stuff with contrib/amcheck, which, as you must already know, will notice when an XID/MXID is contained in a table whose relfrozenxid and/or relminmxid indicates that it shouldn't be there. (Though VACUUM itself does the same thing, albeit not as effectively.) Obviously the invariants haven't changed here. In many ways it's a very small set of changes. But in one or two ways it's a significant shift. It depends on how you think about it. -- Peter Geoghegan