On Sun, Oct 23, 2022 at 9:32 PM Jeff Davis <pg...@j-davis.com> wrote:
> It's possible this would be easier for users to understand: one process
> that does cleanup work over time in a way that minimizes interference;
> and another process that activates in more urgent situations (perhaps
> due to misconfiguration of the first process).

I think that the new "early" version of antiwraparound autovacuum (that
can still be autocancelled) would simply be called autovacuum. It
wouldn't appear as "autovacuum to prevent wraparound" in places like
pg_stat_activity. For the most part users wouldn't have to care about
the difference between these autovacuums and traditional
non-antiwraparound autovacuums. They really would be exactly the same
thing, so it would make sense if users typically noticed no difference
whatsoever (at least in contexts like pg_stat_activity).

> But we should be careful that we don't end up with more confusion. For
> something like that to work, we'd probably want the second process to
> not be configurable at all, and we'd want it to be issuing WARNINGs
> pointing to what might be misconfigured, and otherwise just be
> invisible.

There should be some simple scheme for determining when an
antiwraparound autovacuum (non-cancellable autovacuum to advance
relfrozenxid/relminmxid) should run (applied by the autovacuum.c
scheduling logic). Something like "table has attained an age that's now
2x autovacuum_freeze_max_age, or 1/2 of vacuum_failsafe_age, whichever
is less".

The really important thing is giving a regular/early autocancellable
autovacuum triggered by age(relfrozenxid) *some* opportunity to run. I
strongly suspect that the exact details won't matter too much, provided
we manage to launch at least one such autovacuum before escalating to
traditional antiwraparound autovacuum (which cannot be autocancelled).
Even if regular/early autovacuum had just one opportunity to run to
completion, we'd already be much better off. The hazards from blocking
automated DDL in a way that leads to a very disruptive traffic jam (like
in the Joyent Manta postmortem) would go way down.

> > That way we wouldn't be fighting against the widely held perception
> > that antiwraparound autovacuums are scary.
>
> There's certainly a terminology problem there.
> Just to brainstorm on some new names, we might want to call it
> something like "xid reclamation" or "xid horizon advancement".

I think that we should simply call it autovacuum. Under this scheme,
antiwraparound autovacuum would be a qualitatively different kind of
operation to users (though not to vacuumlazy.c), because it would not be
autocancellable in the standard way. And because users should take it as
a signal that things aren't really working well (otherwise we wouldn't
have reached the point of requiring a scary antiwraparound autovacuum in
the first place). Right now antiwraparound autovacuums are both an
emergency thing (or at least described as such in one or two areas of
the source code), and a completely routine occurrence. This is deeply
confusing.

Separately, I plan on breaking out insert-triggered autovacuums from
traditional dead tuple triggered autovacuums [1], which creates a need
to invent some kind of name to differentiate the new table age
triggering criteria from both insert-driven and dead tuple driven
autovacuums. These are all fundamentally the same operations with the
same urgency to users, though. We'd only need to describe the *criteria*
that *triggered* the autovacuum in our autovacuum log report (actually
we'd still report autovacuums as antiwraparound autovacuum in cases
where that still happened, which won't be presented as just another
triggering criterion in the report).

[1] https://www.postgresql.org/message-id/flat/cah2-wzneqmkmry8feudk8xdph37-4anygf7a04bwxoc1gkd...@mail.gmail.com
--
Peter Geoghegan
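
To make the "2x autovacuum_freeze_max_age, or 1/2 of vacuum_failsafe_age,
whichever is less" rule concrete, here is a rough sketch in Python (not
autovacuum.c code). The function and enum names are hypothetical. The
sketch assumes that the early, autocancellable autovacuum starts once
the table age reaches autovacuum_freeze_max_age, and that both
comparisons are inclusive. Neither detail is settled above. The default
values shown are the stock settings for the two GUCs.

```python
from enum import Enum


class VacuumKind(Enum):
    """Hypothetical labels for the two age-driven cases discussed above."""
    NONE = "no age-triggered autovacuum"
    EARLY = "autovacuum (autocancellable, triggered by table age)"
    ANTIWRAPAROUND = "antiwraparound autovacuum (not autocancellable)"


def antiwraparound_threshold(freeze_max_age: int, failsafe_age: int) -> int:
    """Age at which a non-cancellable antiwraparound autovacuum would run.

    Implements "2x autovacuum_freeze_max_age, or 1/2 of
    vacuum_failsafe_age, whichever is less".
    """
    return min(2 * freeze_max_age, failsafe_age // 2)


def classify(table_age: int,
             freeze_max_age: int = 200_000_000,     # autovacuum_freeze_max_age default
             failsafe_age: int = 1_600_000_000      # vacuum_failsafe_age default
             ) -> VacuumKind:
    """Classify a table by age(relfrozenxid) under the sketched scheme.

    Assumption: the early, cancellable autovacuum becomes eligible at
    freeze_max_age, which leaves it a window to run before escalation.
    """
    if table_age >= antiwraparound_threshold(freeze_max_age, failsafe_age):
        return VacuumKind.ANTIWRAPAROUND
    if table_age >= freeze_max_age:
        return VacuumKind.EARLY
    return VacuumKind.NONE


if __name__ == "__main__":
    for age in (100_000_000, 250_000_000, 400_000_000):
        print(f"{age:>13,}: {classify(age).value}")
```

With stock settings the escalation point works out to 400 million XIDs,
so a table gets a window of 200 million XIDs in which a cancellable
autovacuum can run before anything non-cancellable is launched.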