On Fri, Jul 28, 2023 at 11:13 AM Melanie Plageman <melanieplage...@gmail.com> wrote:
> if (pagefrz.freeze_required || tuples_frozen == 0 ||
>     (prunestate->all_visible && prunestate->all_frozen &&
>      fpi_before != pgWalUsage.wal_fpi))
> {
>
> I'm trying to understand the condition fpi_before !=
> pgWalUsage.wal_fpi -- don't eager freeze if pruning emitted an FPI.

You mean "prunestate->all_visible && prunestate->all_frozen", which is
a condition of applying FPI-based eager freezing, but not traditional
lazy freezing?

Obviously, the immediate impact of that is that the FPI trigger
condition is not met unless we know for sure that the page will be
marked all-visible and all-frozen in the visibility map afterwards. A
page that's eligible to become all-visible will also be seen as
eligible to become all-frozen in the vast majority of cases, but there
are some rare and obscure cases involving MultiXacts that must be
considered here.

There is little point in freezing early unless we have at least some
chance of not having to freeze the same page again in the future
(ideally forever). There is generally no point in freezing most of the
tuples on a page when there'll still be one or more not-yet-eligible
unfrozen tuples that get left behind -- we might as well not bother
with freezing at all when we see a page where that'll be the outcome
from freezing.

However, there is (once again) a need to account for rare and extreme
cases -- it still needs to be possible to do that. Specifically, we
may be forced to freeze a page that's left with some remaining
unfrozen tuples when VACUUM is fundamentally incapable of freezing
them due to its OldestXmin/removable cutoff being too old. That can
happen when VACUUM needs to freeze according to the traditional
age-based settings, and yet the OldestXmin/removable cutoff gets held
back (by a leaked replication slot or whatever).

(Actually, VACUUM FREEZE will freeze only a subset of the tuples from
some heap pages far more often. VACUUM FREEZE seems like a bad design
to me, though -- it uses the most aggressive possible XID cutoff for
freezing when it should probably hold off on freezing those individual
pages where we determine that it makes little sense. We need to focus
more on physical pages and their costs, and less on XID cutoffs.)

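To make the shape of the quoted condition concrete, here is a minimal
standalone sketch in C. It is not PostgreSQL's actual code: the struct
PageFreezeInputs and the function takes_freeze_branch are hypothetical
names invented for illustration, and the two FPI counters stand in for
fpi_before and pgWalUsage.wal_fpi. The comment on the tuples_frozen == 0
case reflects my reading of the intent, so treat it as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the inputs to the quoted "if" condition.
 * These are NOT PostgreSQL's real structs; the field names only mirror
 * the variables that appear in the quoted snippet.
 */
typedef struct PageFreezeInputs
{
    bool    freeze_required;  /* lazy, age-based freezing forces it */
    int     tuples_frozen;    /* number of tuples with freeze plans */
    bool    all_visible;      /* page would become all-visible */
    bool    all_frozen;       /* page would become all-frozen */
    int64_t fpi_before;       /* FPI counter sampled before pruning */
    int64_t fpi_now;          /* FPI counter sampled after pruning */
} PageFreezeInputs;

/*
 * Returns true when the page would take the "freeze" branch.
 */
static bool
takes_freeze_branch(const PageFreezeInputs *in)
{
    /* Traditional lazy freezing: always honored, even for partial pages. */
    if (in->freeze_required)
        return true;

    /* Assumed intent: nothing to freeze, so taking the branch is free. */
    if (in->tuples_frozen == 0)
        return true;

    /*
     * FPI trigger: only freeze eagerly when pruning already emitted an
     * FPI AND the page ends up both all-visible and all-frozen.
     */
    return in->all_visible && in->all_frozen &&
           in->fpi_before != in->fpi_now;
}
```

Note that the all_visible/all_frozen test only gates the third arm. A
page with freeze_required set takes the branch even when some unfrozen
tuples will be left behind, which is the "rare and extreme" case
described above.
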
> Is this test meant to guard against unnecessary freezing or to avoid
> freezing when the cost is too high? That is, are we trying to
> determine how likely it is that the page has been recently modified
> and avoid eager freezing when it would be pointless (because the page
> will soon be modified again)?

Sort of. The cost of freezing over time is weirdly nonlinear, so it's
hard to give a simple answer.

The justification for the FPI trigger optimization is that FPIs are
overwhelmingly the cost that really matters when it comes to freezing
(and vacuuming in general) -- so we might as well make the best out of
a bad situation when pruning happens to get an FPI. There can easily
be a 10x or more cost difference (measured in total WAL volume)
between freezing without an FPI and freezing with an FPI.

> Or are we trying to determine how likely
> the freeze record is to emit an FPI and avoid eager freezing when it
> isn't worth the cost?

No, that's not something that we're doing right now (we just talked
about doing something like that). In 16 VACUUM just "makes the best
out of a bad situation" when an FPI was already required during
pruning. We have already "paid for the privilege" of writing some WAL
for the page at that point, so it's reasonable to not squander a
window of opportunity to avoid future FPIs in future VACUUM
operations, by freezing early.

We're "taking a chance" on being able to get freezing out of the way
early when an FPI triggers freezing. It's not guaranteed to work out
in each individual case, of course, but even if we assume it's fairly
unlikely to work out (which is very pessimistic) it's still very
likely a good deal.

This strategy (the 16 strategy of freezing eagerly because we already
got an FPI) seems safer than a strategy involving freezing eagerly
because we won't get an FPI as a result. If for no other reason than
this: with the approach in 16 we already know for sure that we'll have
written an FPI anyway.
It's hard to imagine somebody being okay with the FPIs, but not being
okay with the other extra WAL.

> But, the final rationale is still not clear to me. Could we add a
> comment above the if condition specifying both:
> a) what the test is a proxy for
> b) the intended outcome (when do we expect to eager freeze)
> And perhaps we could even describe a scenario where this heuristic is
> effective?

There are lots of scenarios where it'll be effective. I agree that
there is a need to document this stuff a lot better. I have a pending
doc patch that overhauls the user-facing docs in this area.

My latest draft is here:

https://postgr.es/m/CAH2-Wz=UUJz+MMb1AxFzz-HDA=1t1fx_kmrdovopzxkpa-t...@mail.gmail.com
https://www.postgresql.org/message-id/attachment/146830/routine-vacuuming.html

I've been meaning to get back to that, but other commitments have kept
me from it. I'd welcome your involvement with that effort.

--
Peter Geoghegan