Hi, On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Hi Samay, > > On Tue, May 2, 2023 at 11:40 PM samay sharma <smilingsa...@gmail.com> > wrote: > > Thanks for taking the time to do this. It is indeed difficult work. > > Thanks for the review! I think that this is something that would > definitely benefit from a perspective such as yours. > Glad to hear that my feedback was helpful. > > > There are things I like about the changes you've proposed and some where > I feel that the previous section was easier to understand. > > That makes sense, and I think that I agree with every point you've > raised, bar none. I'm pleased to see that you basically agree with the > high level direction. > > I would estimate that the version you looked at (v2) is perhaps 35% > complete. So some of the individual problems you noticed were a direct > consequence of the work just not being anywhere near complete. I'll > try to do a better job of tracking the relative maturity of each > commit/patch in each commit message, going forward. > > Anything that falls under "25.2.1. Recovering Disk Space" is > particularly undeveloped in v2. The way that I broke that up into a > bunch of WARNINGs/NOTEs/TIPs was just a short term way of breaking it > up into pieces, so that the structure was very approximately what I > wanted. I actually think that the stuff about CLUSTER and VACUUM FULL > belongs in a completely different chapter. Since it is not "Routine > Vacuuming" at all. > > >> 2. Renamed "Preventing Transaction ID Wraparound Failures" to > >> "Freezing to manage the transaction ID space". Now we talk about > >> wraparound as a subtopic of freezing, not vice-versa. (This is a > >> complete rewrite, as described by later items in this list). > > > > +1 on this too. Freezing is a normal part of vacuuming and while the > aggressive vacuums are different, I think just talking about the worst case > scenario while referring to it is alarmist. > > Strangely enough, Postgres 16 is the first version that instruments > freezing in its autovacuum log reports. I suspect that some long term > users will find it quite surprising to see how much (or how little) > freezing takes place in non-aggressive VACUUMs. > > The introduction of page-level freezing will make it easier and more > natural to tune settings like vacuum_freeze_min_age, with the aim of > smoothing out the burden of freezing over time (particularly by making > non-aggressive VACUUMs freeze more). Page-level freezing removes any > question of not freezing every tuple on a page (barring cases where > "removable cutoff" is noticeably held back by an old MVCC snapshot). > This makes it more natural to think of freezing as a process that > makes it okay to store data in individual physical heap pages, long > term. > > > 1) While I agree that bundling VACUUM and VACUUM FULL is not the right > way, moving all VACUUM FULL references into tips and warnings also seems > excessive. I think it's probably best to just have a single paragraph which > talks about VACUUM FULL as I do think it should be mentioned in the > reclaiming disk space section. > > As I mentioned briefly already, my intention is to move it to another > chapter entirely. I was thinking of "Chapter 29. Monitoring Disk > Usage". The "Routine Vacuuming" docs would then link to this sect1 -- > something along the lines of "non-routine commands to reclaim a lot of > disk space in the event of extreme bloat". > > > 2) I felt that the new section, "Freezing to manage the transaction ID > space" could be made simpler to understand. As an example, I understood > what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do > and how they interact better in the previous version of the docs. > > Agreed. I'm going to split it up some more. I think that the current > "25.2.2.1. VACUUM's Aggressive Strategy" should be split in two, so we > go from talking about aggressive VACUUMs to Antiwraparound > autovacuums. Finding the least confusing way of explaining it has been > a focus of mine in the last few days. > To be honest, this was not super simple to understand even in the previous version. However, as our goal is to simplify this and make it easier to understand, I'll hold this patch-set to a higher standard :). I wish there was a simple representation (maybe even a table or something) which would explain the differences between a VACUUM which is not aggressive, a VACUUM which ends up being aggressive due to vacuum_freeze_table_age and an antiwraparound autovacuum. > > > 4) I think we should explicitly call out that seeing an anti-wraparound > VACUUM or "VACUUM table (to prevent wraparound)" is normal and that it's > just a VACUUM triggered due to the table having unfrozen rows with an XID > older than autovacuum_freeze_max_age. I've seen many users panicking on > seeing this and feeling that they are close to a wraparound. > > That has also been my exact experience. Users are terrified, usually > for no good reason at all. I'll make sure that this comes across in > the next revision of the patch series. > Thinking about it a bit more, I wonder if there's value in changing the "(to prevent wraparound)" to something else. It's understandable why people who just see that in pg_stat_activity and don't read docs might assume they are close to a wraparound. Regards, Samay > > > Also, we should be more clear about how it's different from VACUUMs > triggered due to the scale factors (cancellation behavior, being triggered > when autovacuum is disabled etc.). > > Right. Though I think that the biggest point of confusion for users is > how *few* differences there really are between antiwraparound > autovacuum, and any other kind of autovacuum that happens to use > VACUUM's aggressive strategy. There is really only one important > difference: the autocancellation behavior. This is an autovacuum > behavior, not a VACUUM behavior -- so the "VACUUM side" doesn't know > anything about that at all. > > 5) Can we use a better name for the XidStopLimit mode? It seems like a > very implementation centric name. Maybe a better version of "Running out of > the XID space" or something like that? > > Coming up with a new user-facing name for xidStopLimit is already on > my TODO list (it's surprisingly hard). I have used that name so far > because it unambiguously refers to the exact thing that I want to talk > about when discussing the worst case. Other than that, it's a terrible > name. > > > 6) In the XidStopLimit mode section, it would be good to explain briefly > why you could get to this scenario. It's not something which should happen > in a normal running system unless you have a long running transaction or > inactive replication slots or a badly configured system or something of > that sort. > > I agree that that's important. Note that there is already something > about "removable cutoff" being held back at the start of the > discussion of freezing -- that will prevent freezing in exactly the > same way as it prevents cleanup of dead tuples. > > That will become a WARNING box in the next revision. There should also > be a similar, analogous WARNING box (about "removable cutoff" being > held back) much earlier on in the docs -- this should appear in > "25.2.1. Recovering Disk Space". Obviously this structure suggests > that there is an isomorphism between freezing and removing bloat. For > example, if you cannot "freeze" an XID that appears in some tuple's > xmax, then you also cannot remove that tuple because VACUUM only sees > it as a recently dead tuple (if xmax is >= OldestXmin/removable > cutoff, and from a deleter that already committed). > > I don't think that we need to spell the "isomorphism" point out to the > reader directly, but having a subtle cue that that's how it works > seems like a good idea. > > > If you got to this point, other than running VACUUM to get out of the > situation, it's also important to figure out what got you there in the > first place as many VACUUMs should have attempted to advance the > relfrozenxid and failed. > > It's also true that problems that can lead to the system entering > xidStopLimit mode aren't limited to cases where doing required > freezing is fundamentally impossible due to something holding back > "removable cutoff". It's also possible that VACUUM simply can't keep > up (though the failsafe has helped with that problem a lot). > > I tend to agree that there needs to be more about this in the > xidStopLimit subsection (discussion of freezing being held back by > "removable cutoff" is insufficient), but FWIW that seems like it > should probably be treated as out of scope for this patch. It is more > the responsibility of the other patch [1] that aims to put the > xidStopLimit documentation on a better footing (and remove that > terrible HINT about single user mode). > > Of course, that other patch is closely related to this patch -- the > precise boundaries are unclear at this point. In any case I think that > this should happen, because I think that it's a good idea. > > > There are a few other small things I noticed along the way but my goal > was to look at the overall structure. > > Thanks again! This is very helpful. > > [1] > https://www.postgresql.org/message-id/flat/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg%40mail.gmail.com > -- > Peter Geoghegan >