On Thu, Mar 24, 2022 at 3:28 PM Peter Geoghegan <p...@bowt.ie> wrote:
> But non-aggressive VACUUMs have always been able to do that.
>
> How about: "Set relfrozenxid to oldest extant XID seen by VACUUM"

Sure, that sounds nice.

> Believe it or not, I avoided functional changes in 0002 -- at least in
> one important sense. That's why you had difficulty spotting any. This
> must sound peculiar, since the commit message very clearly says that
> the commit avoids a problem seen only in the non-aggressive case. It's
> really quite subtle.

Well, I think the goal in revising the code is to be as un-subtle as
possible. Commits that people can't easily understand breed future
bugs.

> What you're saying here boils down to this: it doesn't matter what the
> visibility map would say right this microsecond (in the aggressive
> case) were we to call VM_ALL_FROZEN(): we know for sure that the VM
> said that this page was all-frozen *in the recent past*. That's good
> enough; we will never fail to scan a page that might have an XID <
> OldestXmin (ditto for XMIDs) this way, which is all that really
> matters.

Makes sense. So maybe the commit message should try to emphasize this
point e.g. "If a page is all-frozen at the time we check whether it
can be skipped, don't allow it to affect the relfrozenxmin and
relminmxid which we set for the relation. This was previously true for
aggressive vacuums, but not for non-aggressive vacuums, which was
inconsistent. (The reason this is a safe thing to do is that any new
XIDs or MXIDs that appear on the page after we initially observe it to
be frozen must be newer than any relfrozenxid or relminmxid the
current vacuum could possibly consider storing into pg_class.)"

> This is absolutely mandatory in the aggressive case, because otherwise
> relfrozenxid advancement might be seen as unsafe. My observation is:
> Why should we accept the same race in the non-aggressive case? Why not
> do essentially the same thing in every VACUUM?

Sure, that seems like a good idea. I think I basically agree with the
goals of the patch. My concern is just about making the changes
understandable to future readers. This area is notoriously subtle, and
people are going to introduce more bugs even if the comments and code
organization are fantastic.

> And so you could almost say that there is now behavioral change at
> all.

I vigorously object to this part, though. We should always err on the
side of saying that commits *do* have behavioral changes. We should go
out of our way to call out in the commit message any possible way that
someone might notice the difference between the post-commit situation
and the pre-commit situation. It is fine, even good, to also be clear
about how we're maintaining continuity and why we don't think it's a
problem, but the only commits that should be described as not having
any behavioral change are ones that do mechanical code movement, or
are just changing comments, or something like that.

> It seems kinda tricky to split up 0002 like that. It's possible, but
> I'm not sure if it's possible to split it in a way that highlights the
> issue that I just described. Because we already avoided the race in
> the aggressive case.

I do see that there are some difficulties there. I'm not sure what to
do about that. I think a sufficiently clear commit message could
possibly be enough, rather than trying to split the patch. But I also
think splitting the patch should be considered, if that can reasonably
be done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Reply via email to