Re: Defining (and possibly skipping) useless VACUUM operations

2021-12-14 Thread Peter Geoghegan
On Tue, Dec 14, 2021 at 10:47 AM Robert Haas  wrote:
> Well I just don't understand why you insist on using the word
> "skipping." I think what we're talking about - or at least what we
> should be talking about - is whether relation_needs_vacanalyze() sets
> *wraparound = true right after the comment that says /* Force vacuum
> if table is at risk of wraparound */. And adding some kind of
> exception to the logic that's there now.

Actually, I agree. Skipping is the wrong term, especially because the
phrase "VACUUM skips..." is already too overloaded. Not necessarily in
vacuumlazy.c itself, but certainly on the mailing list.

> Yeah, I hadn't thought about it from that perspective, but that does
> seem very good. I think it's inevitable that there will be cases where
> that doesn't work out - e.g. you can always force the bad case by
> holding a table lock until your newborn heads off to college, or just
> by overthrottling autovacuum so that it can't get through the database
> in any reasonable amount of time - but it will be nice when it does
> work out, for sure.

Right. But when the patch doesn't manage to totally prevent
anti-wraparound VACUUMs, things still work out a lot better than they
would now. I would expect that in practice this will usually only
happen when non-aggressive autovacuums keep getting canceled. And
sure, it's still not ideal that things have come to that. But because
we now do freezing earlier (when it's relatively inexpensive), and
because we set all-frozen bits incrementally, the anti-wraparound
autovacuum will at least be able to reuse any freezing that we manage
to do in all those canceled autovacuums.

I think that this tends to make anti-wraparound VACUUMs mostly about
not being cancelable -- not so much about reliably advancing
relfrozenxid. I mean it doesn't change the basic rules (there is no
change to the definition of aggressive VACUUM), but in practice I
think that it'll just work that way.  Which makes a great deal of
sense. I hope to be able to totally get rid of
vacuum_freeze_table_age.

The freeze map work in PostgreSQL 9.6 was really great, and very
effective. But I think that it had an undesirable interaction with
vacuum_freeze_min_age: if we set a heap page as all-visible (but not
all-frozen) before some of its tuples reached that age (which is very
likely), then tuples younger than vacuum_freeze_min_age aren't going to
get frozen until whenever we do an aggressive autovacuum. Very often, this
will only happen when we next do an anti-wraparound VACUUM (at least
before Postgres 13). I suspect we risk running into a "debt cliff" in
the eventual anti-wraparound autovacuum. And so while
vacuum_freeze_min_age kinda made sense prior to 9.6, it now seems to
make a lot less sense.

> > I guess that that makes avoiding useless vacuuming seem like less of a
> > priority. ISTM that it should be something that is squarely aimed at
> > keeping things stable in truly pathological cases.
>
> Yes. I think "pathological cases" is a good summary of what's wrong
> with autovacuum.

This is 100% my focus, in general. The main goal of the patch I'm
working on isn't so much improving performance as making it more
predictable over time. Focussing on freezing while costs are low has a
natural tendency to spread the costs out over time. The system should
never "get in over its head" with debt that vacuum is expected to
eventually deal with.

> When there's nothing too crazy happening, it actually
> does pretty well. But, when resources are tight or other corner cases
> occur, really dumb things start to happen. So it's reasonable to think
> about how we can install guard rails that prevent complete insanity.

Another thing that I really want to stamp out is anything involving a
tiny, seemingly-insignificant adverse event that has the potential to
cause disproportionate impact over time. For example, right now a
non-aggressive VACUUM will never be able to advance relfrozenxid when
it cannot get a cleanup lock on one heap page. That one failure is
actually extremely unlikely to have much of any impact, at least when
you determine the new relfrozenxid for the table intelligently. Failing
to acquire one cleanup lock on one heap page of a huge table should not
have such an extreme impact.

It's even worse when the systemic impact over time is considered.
Let's say you only have a 20% chance of failing to acquire one or more
cleanup locks during a non-aggressive autovacuum for a given large
table, meaning that you'll fail to advance relfrozenxid in at least
20% of all non-aggressive autovacuums. I think that that might be a
lot worse than it sounds, because the impact compounds over time --
I'm not sure that 20% is much worse than 60%, or much better than 5%
(very hard to model it). But if we make the high-level, abstract idea
of "aggressiveness" more of a continuous thing, and not something
that's defined by sharp (and largely meaningless) XID-based cutoffs,
we have every chance of nipping these problems in the bud.

Re: Defining (and possibly skipping) useless VACUUM operations

2021-12-14 Thread Robert Haas
On Tue, Dec 14, 2021 at 1:16 PM Peter Geoghegan  wrote:
> I think that you'd agree that the arguments in favor of skipping are
> strongest ...

Well I just don't understand why you insist on using the word
"skipping." I think what we're talking about - or at least what we
should be talking about - is whether relation_needs_vacanalyze() sets
*wraparound = true right after the comment that says /* Force vacuum
if table is at risk of wraparound */. And adding some kind of
exception to the logic that's there now.

> What I see with the draft patch series is that the oldest XID just
> isn't that old anymore, consistently -- we literally never fail to
> advance relfrozenxid, in any autovacuum, for any table. And the value
> that we end up with is consistently quite recent. This is something
> that I see both with BenchmarkSQL, and pgbench. There is a kind of
> virtuous circle, which prevents us from ever getting anywhere near
> having any table age in the tens of millions of XIDs.

Yeah, I hadn't thought about it from that perspective, but that does
seem very good. I think it's inevitable that there will be cases where
that doesn't work out - e.g. you can always force the bad case by
holding a table lock until your newborn heads off to college, or just
by overthrottling autovacuum so that it can't get through the database
in any reasonable amount of time - but it will be nice when it does
work out, for sure.

> I guess that that makes avoiding useless vacuuming seem like less of a
> priority. ISTM that it should be something that is squarely aimed at
> keeping things stable in truly pathological cases.

Yes. I think "pathological cases" is a good summary of what's wrong
with autovacuum. When there's nothing too crazy happening, it actually
does pretty well. But, when resources are tight or other corner cases
occur, really dumb things start to happen. So it's reasonable to think
about how we can install guard rails that prevent complete insanity.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Defining (and possibly skipping) useless VACUUM operations

2021-12-14 Thread Peter Geoghegan
On Tue, Dec 14, 2021 at 6:05 AM Robert Haas  wrote:
> I think this is a reasonable line of thinking, but I think it's a
> little imprecise. In general, we could be vacuuming a relation to
> advance relfrozenxid, but we could also be vacuuming a relation to
> advance relminmxid, or we could be vacuuming a relation to fight
> bloat, or set pages all-visible. It is possible that there's no hope
> of advancing relfrozenxid but that we can still accomplish one of the
> other goals. In that case, the vacuuming is not useless.  I think the
> place to put logic around this would be in the triggering logic for
> autovacuum. If we're going to force a relation to be vacuumed because
> of (M)XID wraparound danger, we could first check whether there seems
> to be any hope of advancing relfrozenxid(minmxid). If not, we discount
> that as a trigger for vacuum, but may still decide to vacuum if some
> other trigger warrants it. In most cases, if there's no hope of
> advancing relfrozenxid, there won't be any bloat to remove either, but
> aborted transactions are a counterexample. And the XID and MXID
> horizons can advance at completely different rates.

I think that you'd agree that the arguments in favor of skipping are
strongest for an aggressive anti-wraparound autovacuum (as opposed to
any other kind of aggressive VACUUM, including aggressive autovacuum).
Aside from the big benefit I pointed out already (avoiding blocking a
useful anti-wraparound vacuum that starts a little later by not
starting a conflicting, useless anti-wraparound vacuum now), there is
also more certainty about downsides. We can know the following things
for sure:

* We only launch an (aggressive) anti-wraparound autovacuum because we
need to advance relfrozenxid. In other words, if we didn't need to
advance relfrozenxid then (for better or worse) we definitely wouldn't
be launching anything.

* Our would-be OldestXmin exactly matches the preexisting
pg_class.relfrozenxid (and pg_class.relminmxid). And so it follows
that we're definitely not going to be able to do the thing that is
ostensibly the whole point of anti-wraparound vacuum (advance
relfrozenxid/relminmxid).

> One reason I haven't pursued this kind of optimization is that it
> doesn't really feel like it's fixing the whole problem. It would be a
> little bit sad if we did a perfect job preventing useless vacuuming
> but still allowed almost-useless vacuuming. Suppose we have a 1TB
> relation and we trigger autovacuum. It cleans up a few things but
> relfrozenxid is still old. On the next pass, we see that the
> system-wide xmin has not advanced, so we don't trigger autovacuum
> again. Then on the pass after that we see that the system-wide xmin
> has advanced by 1. Shall we trigger an autovacuum of the whole
> relation now, to be able to do relfrozenxid++? Seems dubious.

I can see what you mean, but just fixing the most extreme case can be
a useful goal. It's often enough to stop the system from going into a
tailspin, which is the real underlying goal here. Things that approach
the most extreme case (but don't quite hit it) don't have that
quality.

An anti-wraparound vacuum is supposed to be a mechanism that the
system escalates to when nothing else triggers an autovacuum worker to
run (which is aggressive but not anti-wraparound). That's not really
true in practice, of course; anti-wraparound autovacuum often becomes a
routine thing. But I think that it's a good ideal to strive for -- it
should be rare.

The draft patch series now adds opportunistic freezing -- I should be
able to post a new version in a few days time, once I've tied up some
loose ends. My testing shows an interesting effect, when opportunistic
freezing is applied on top of the relfrozenxid thing: every autovacuum
manages to advance relfrozenxid, and so we'll never have to run an
aggressive autovacuum (much less an aggressive anti-wraparound
autovacuum) in practice. And so (for example) when autovacuum runs
against the pgbench_history table, it always sets its relfrozenxid to
a value very close to the OldestXmin -- usually the exact OldestXmin.

Opportunistic freezing makes us avoid setting the all-visible bit for
a heap page without also setting the all-frozen bit -- when we're
about to do that, we go freeze the heap tuples and then set the entire
page all-frozen (so we freeze anything <= OldestXmin, not <=
FreezeLimit). We also freeze based on this more aggressive <=
OldestXmin cutoff when pruning had to delete some tuples.

The patch still needs more polishing, but I think that we can make
anti-wraparound vacuums truly exceptional with this design -- which
would make autovacuum a lot easier to deal with operationally. This
seems like a feasible goal for Postgres 15, even (though still quite
ambitious). The opportunistic freezing stuff isn't free (the WAL
records aren't tiny), but it's still not all that expensive. Plus I
think that the cost can be further reduced, with a little more work.

> Part of the problem here, for 

Re: Defining (and possibly skipping) useless VACUUM operations

2021-12-14 Thread Robert Haas
On Sun, Dec 12, 2021 at 8:47 PM Peter Geoghegan  wrote:
> I am currently working on decoupling advancing relfrozenxid from tuple
> freezing [1]. That is, I'm teaching VACUUM to keep track of
> information that it uses to generate an "optimal value" for the
> table's final relfrozenxid: the most recent XID value that might still
> be in the table. This patch is based on the observation that we don't
> actually have to use the FreezeLimit cutoff for our new
> pg_class.relfrozenxid. We need only obey the basic relfrozenxid
> invariant, which is that the final value must be <= any extant XID in
> the table.  Using FreezeLimit is needlessly conservative.

Right.

> It now occurs to me to push this patch in another direction, on top of
> all that: the OldestXmin behavior hints at a precise, robust way of
> defining "useless vacuuming". We can condition skipping a VACUUM (i.e.
> whether a VACUUM is considered "definitely won't be useful if allowed
> to execute") on whether or not our preexisting pg_class.relfrozenxid
> precisely equals our newly-acquired OldestXmin for an about-to-begin
> VACUUM operation.  (We'd also want to add an "unchangeable
> pg_class.relminmxid" test, I think.)

I think this is a reasonable line of thinking, but I think it's a
little imprecise. In general, we could be vacuuming a relation to
advance relfrozenxid, but we could also be vacuuming a relation to
advance relminmxid, or we could be vacuuming a relation to fight
bloat, or set pages all-visible. It is possible that there's no hope
of advancing relfrozenxid but that we can still accomplish one of the
other goals. In that case, the vacuuming is not useless.  I think the
place to put logic around this would be in the triggering logic for
autovacuum. If we're going to force a relation to be vacuumed because
of (M)XID wraparound danger, we could first check whether there seems
to be any hope of advancing relfrozenxid(minmxid). If not, we discount
that as a trigger for vacuum, but may still decide to vacuum if some
other trigger warrants it. In most cases, if there's no hope of
advancing relfrozenxid, there won't be any bloat to remove either, but
aborted transactions are a counterexample. And the XID and MXID
horizons can advance at completely different rates.

One reason I haven't pursued this kind of optimization is that it
doesn't really feel like it's fixing the whole problem. It would be a
little bit sad if we did a perfect job preventing useless vacuuming
but still allowed almost-useless vacuuming. Suppose we have a 1TB
relation and we trigger autovacuum. It cleans up a few things but
relfrozenxid is still old. On the next pass, we see that the
system-wide xmin has not advanced, so we don't trigger autovacuum
again. Then on the pass after that we see that the system-wide xmin
has advanced by 1. Shall we trigger an autovacuum of the whole
relation now, to be able to do relfrozenxid++? Seems dubious.

Part of the problem here, for both vacuuming-for-bloat and
vacuuming-for-relfrozenxid-advancement, is that we would really like to
know the distribution of old XIDs in the table. If we knew that a lot of
the inserts, updates, and deletes that are causing us to vacuum for
bloat containment were in a certain relatively narrow range, then we'd
probably want to not autovacuum for either purpose until the
system-wide xmin has crossed through at least a good chunk of that
range. And once it has fully crossed over that range, an immediate vacuum
looks extremely appealing: we'll both remove a bunch of dead tuples
and reclaim the associated line pointers, and at the same time we'll
be able to advance relfrozenxid. Nice! But we have no such
information.

So I'm not certain of the way forward here. Just because we can't
prevent almost-useless vacuuming is not a sufficient reason to
continue allowing entirely-useless vacuuming that we can prevent. And
it seems like we need a bunch of new bookkeeping to do any better than
that, which seems like a lot of work. So maybe it's the most practical
path forward for the time being, but it feels like more of a
special-purpose kludge than a truly high-quality solution.

-- 
Robert Haas
EDB: http://www.enterprisedb.com