On Fri, Jul 31, 2020 at 12:33 PM Andres Freund <and...@anarazel.de> wrote:
> I'm not sure what I was thinking "back then", but right now I'd argue
> that the best lock against vacuum isn't a SUE, but announcing the
> correct ->xmin, so you can be sure that clog entries won't be yanked out
> from under you. Potentially with the right flag sets to avoid old enough
> tuples being pruned.
I was just thinking about this some more (and talking it over with Mark) and I think this might actually be a really bad idea. One problem with it is that it means that the oldest-xmin value can go backward, which is something that I think has caused us some problems before. There are some other cases where it can happen, and I'm not sure that there's any necessarily fatal problem with doing it in this case, but it would definitely be a shame if this contrib module broke something for core in a way that was hard to fix.

But let's leave that aside and suppose that there is no fatal problem there. Essentially what we're talking about here is advertising the table's relfrozenxid as our xmin. How old is that likely to be? Maybe pretty old. The default value of vacuum_freeze_table_age is 150 million transactions, and that's just the trigger to start vacuuming; the actual value of age(relfrozenxid) could easily be higher than that. But even if it's only a fraction of that, it's still pretty bad. Advertising an xmin half that old (75 million transactions) is equivalent to keeping a snapshot open for an amount of time equal to however long it takes you to burn through 75 million XIDs. For instance, if you burn 10 million XIDs/hour, that's the equivalent of keeping a snapshot open for 7.5 hours. In other words, it's quite likely that doing this is going to make VACUUM (and HOT pruning) drastically less effective throughout the entire database cluster.

To me, this seems a lot worse than just taking ShareUpdateExclusiveLock on the table. After all, ShareUpdateExclusiveLock will prevent VACUUM from running on that table, but it only affects that one table rather than the whole cluster, and it "only" stops VACUUM from running, which is still better than having it do lots of I/O but not clean anything up.

I think I see another problem with this approach, too: it's racy.
If some other process has entered vac_update_datfrozenxid() and has gotten past the calls to GetOldestXmin() and GetOldestMultiXactId(), and we then advertise an older xmin (and I guess also oldestMXact), it can still go on to update datfrozenxid/datminmxid and then truncate the SLRUs. Even holding XactTruncationLock is insufficient to protect against this race condition, and there doesn't seem to be any other obvious approach, either.

So I would like to back up a minute and lay out the possible solutions as I understand them. The specific problem I'm talking about here is: how do we keep from looking up an XID or MXID whose information might have been truncated away from the relevant SLRU?

1. Take a ShareUpdateExclusiveLock on the table. This prevents VACUUM from running concurrently on this table (which sucks), but it does guarantee that the table's relfrozenxid and relminmxid can't advance, which precludes a concurrent CLOG truncation.

2. Advertise an older xmin and minimum MXID. See above.

3. Acquire XactTruncationLock for each lookup, like pg_xact_status(). One downside here is a lot of extra lock acquisitions, but we can mitigate that to some degree by caching the results of lookups, and by not doing it for XIDs that are newer than our advertised xmin (which must be OK) or at least as old as the newest XID we previously discovered to be unsafe to look up (because those must not be OK either). The problem case is a table with lots of different XIDs that are all new enough to look up but older than our xmin, e.g. a table populated using many single-row inserts. But even if we hit this case, how bad is it really? I don't think XactTruncationLock is particularly hot, so maybe it just doesn't matter very much. We could contend against other sessions checking other tables, or against widespread use of pg_xact_status(), but I think that's about it.
Another downside of this approach is that I'm not sure it does anything to help us with the MXID case; fixing that might require building some new infrastructure similar to XactTruncationLock but for MXIDs.

4. Provide entrypoints for looking up XIDs that fail gently instead of throwing errors. I've got my doubts about how practical this is; if it's easy, why didn't we do that instead of inventing XactTruncationLock?

Maybe there are other options here, too? At the moment, I'm thinking that (2) and (4) are just bad, and so we ought to either do (3) if it doesn't suck too much for performance (which I don't quite see why it should, but it might) or else fall back on (1). (1) doesn't feel clever enough, but it might be better to be not clever enough than to be too clever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company