On Thu, Jan 18, 2024 at 8:52 AM Robert Haas <robertmh...@gmail.com> wrote:
> But I also said one more thing that I'd still like to hear your
> thoughts about, which is: why is it right to update the FSM after the
> second heap pass rather than the first one? I can't help but suspect
> this is an algorithmic holdover from pre-HOT days, when VACUUM's first
> heap pass was read-only and all the work happened in the second pass.
> Now, nearly all of the free space that will ever become free becomes
> free in the first pass, so why not advertise it then, instead of
> waiting?
I don't think that doing everything FSM-related in the first heap pass is a bad idea, especially not if it buys you something elsewhere.

The problem with your justification for moving things in that direction (if any) is that it is occasionally not quite true: there are at least some cases where line pointer truncation, after a page's LP_DEAD items have been set LP_UNUSED, will actually matter. Plus PageGetHeapFreeSpace() will return 0 if and when "PageGetMaxOffsetNumber(page) >= MaxHeapTuplesPerPage && !PageHasFreeLinePointers(page)".

Of course, nothing stops you from compensating for this by anticipating what will happen later on, and assuming that the page already has that much free space. It might even be okay not to try to compensate for anything, PageGetHeapFreeSpace-wise: just do all FSM stuff in the first heap pass, and ignore all this. I happen to believe that an FSM_CATEGORIES of 256 is far too much granularity to be useful in practice. I just don't have any faith in the idea that that degree of granularity is useful (quite the opposite, in fact).

A further justification might be what we already do in the heapam.c REDO routines: the way that we use XLogRecordPageWithFreeSpace there already operates with far less precision than the corresponding code in vacuumlazy.c. heap_xlog_prune() already has recovery do what you propose to do during original execution; it doesn't try to avoid duplicating an anticipated call to XLogRecordPageWithFreeSpace that'll take place when heap_xlog_vacuum() runs against the same page a bit later on.

You'd likely prefer a simpler argument for doing this, one that doesn't require abandoning or discrediting the idea that a high degree of FSM_CATEGORIES-wise precision is a valuable thing. Not sure that that's possible. The current design is at least correct on its own terms, and what you propose to do will probably be less correct on those same terms, silly though they are.

--
Peter Geoghegan