Re: Eager page freeze criteria clarification

Peter Geoghegan Fri, 28 Jul 2023 12:00:46 -0700

On Fri, Jul 28, 2023 at 11:13 AM Melanie Plageman
<[email protected]> wrote:
>     if (pagefrz.freeze_required || tuples_frozen == 0 ||
>         (prunestate->all_visible && prunestate->all_frozen &&
>          fpi_before != pgWalUsage.wal_fpi))
>     {
>
> I'm trying to understand the condition fpi_before !=
> pgWalUsage.wal_fpi -- don't eager freeze if pruning emitted an FPI.


You mean "prunestate->all_visible && prunestate->all_frozen", which is
a condition of applying FPI-based eager freezing, but not traditional
lazy freezing?

Obviously, the immediate impact of that is that the FPI trigger
condition is not met unless we know for sure that the page will be
marked all-visible and all-frozen in the visibility map afterwards. A
page that's eligible to become all-visible will also be seen as
eligible to become all-frozen in the vast majority of cases, but there
are some rare and obscure cases involving MultiXacts that must be
considered here. There is little point in freezing early unless we
have at least some chance of not having to freeze the same page again
in the future (ideally forever).

There is generally no point in freezing most of the tuples on a page
when there'll still be one or more not-yet-eligible unfrozen tuples
that get left behind -- we might as well not bother with freezing at
all when we see a page where that'll be the outcome from freezing.
However, there is (once again) a need to account for rare and extreme
cases -- it still needs to be possible to freeze only some of the
tuples on a page. Specifically, we
may be forced to freeze a page that's left with some remaining
unfrozen tuples when VACUUM is fundamentally incapable of freezing
them due to its OldestXmin/removable cutoff being too old. That can
happen when VACUUM needs to freeze according to the traditional
age-based settings, and yet the OldestXmin/removable cutoff gets held
back (by a leaked replication slot or whatever).
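
To make the three-way condition quoted at the top easier to follow, here
is a minimal standalone sketch of it. The struct and function names
(FreezeInputs, should_freeze_page) are hypothetical and exist only for
illustration; the real test in VACUUM reads pagefrz, prunestate and
pgWalUsage directly, and nothing here is meant as the actual
implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for the values the quoted test reads.
 * These are assumptions made for illustration, not PostgreSQL's real types.
 */
typedef struct FreezeInputs
{
    bool    freeze_required; /* age-based (lazy) freezing forces a freeze */
    int     tuples_frozen;   /* number of tuples with a freeze plan */
    bool    all_visible;     /* page would be all-visible afterwards */
    bool    all_frozen;      /* page would also be all-frozen afterwards */
    int64_t fpi_before;      /* FPI counter sampled before pruning */
    int64_t fpi_now;         /* FPI counter sampled after pruning */
} FreezeInputs;

/* Returns true when the quoted "if" branch would be taken. */
static bool
should_freeze_page(const FreezeInputs *in)
{
    /* Traditional lazy freezing: no choice, even if tuples get left behind. */
    if (in->freeze_required)
        return true;

    /* Nothing to freeze, so taking the branch costs nothing. */
    if (in->tuples_frozen == 0)
        return true;

    /*
     * FPI-triggered eager freezing: only when pruning already emitted an
     * FPI for this page AND the page can end up all-frozen in the VM.
     */
    bool pruning_emitted_fpi = (in->fpi_before != in->fpi_now);

    return in->all_visible && in->all_frozen && pruning_emitted_fpi;
}
```

Note how the all_visible/all_frozen requirement only gates the last
(eager) case; freeze_required bypasses it entirely, which is what allows
the "forced to freeze a page that keeps some unfrozen tuples" outcome
described above.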

(Actually, VACUUM FREEZE will freeze only a subset of the
tuples from some heap pages far more often. VACUUM FREEZE seems like a
bad design to me, though -- it uses the most aggressive possible XID
cutoff for freezing when it should probably hold off on freezing those
individual pages where we determine that it makes little sense. We
need to focus more on physical pages and their costs, and less on XID
cutoffs.)

> Is this test meant to guard against unnecessary freezing or to avoid
> freezing when the cost is too high? That is, are we trying to
> determine how likely it is that the page has been recently modified
> and avoid eager freezing when it would be pointless (because the page
> will soon be modified again)?

Sort of. The cost of freezing over time is weirdly nonlinear, so it's
hard to give a simple answer.

The justification for the FPI trigger optimization is that FPIs are
overwhelmingly the cost that really matters when it comes to freezing
(and vacuuming in general) -- so we might as well make the best out of
a bad situation when pruning happens to get an FPI. There can easily
be a 10x or more cost difference (measured in total WAL volume)
between freezing without an FPI and freezing with an FPI.

> Or are we trying to determine how likely
> the freeze record is to emit an FPI and avoid eager freezing when it
> isn't worth the cost?

No, that's not something that we're doing right now (we just talked
about doing something like that). In Postgres 16, VACUUM just "makes the best
out of a bad situation" when an FPI was already required during
pruning. We have already "paid for the privilege" of writing some WAL
for the page at that point, so it's reasonable to not squander a
window of opportunity to avoid future FPIs in future VACUUM
operations, by freezing early.

We're "taking a chance" on being able to get freezing out of the way
early when an FPI triggers freezing. It's not guaranteed to work out
in each individual case, of course, but even if we assume it's fairly
unlikely to work out (which is very pessimistic) it's still very
likely a good deal.

This strategy (the Postgres 16 strategy of freezing eagerly because we already
got an FPI) seems safer than a strategy involving freezing eagerly
because we won't get an FPI as a result. If for no other reason than
this: with the approach in 16 we already know for sure that we'll have
written an FPI anyway. It's hard to imagine somebody being okay with
the FPIs, but not being okay with the other extra WAL.

> But, the final rationale is still not clear to me. Could we add a
> comment above the if condition specifying both:
> a) what the test is a proxy for
> b) the intended outcome (when do we expect to eager freeze)
> And perhaps we could even describe a scenario where this heuristic is 
> effective?

There are lots of scenarios where it'll be effective. I agree that
there is a need to document this stuff a lot better. I have a pending
doc patch that overhauls the user-facing docs in this area.

My latest draft is here:

https://postgr.es/m/CAH2-Wz=UUJz+MMb1AxFzz-HDA=1t1fx_kmrdovopzxkpa-t...@mail.gmail.com
https://www.postgresql.org/message-id/attachment/146830/routine-vacuuming.html

I've been meaning to get back to that, but other commitments have kept
me from it. I'd welcome your involvement with that effort.

--
Peter Geoghegan
