On Sat, 20 Dec 2025 at 02:10, Melanie Plageman <[email protected]> wrote: > > Attached v29 addresses some feedback and also corrects a small error > with the assertion I had added in the previous version's 0009. > > On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <[email protected]> wrote: > > > > I’ve done a basic review of patches 1 and 2. Here are some comments > > which may be somewhat immature, as this is a fairly large change set > > and I’m new to some parts of the code. > > > > 1) Potential stale old_vmbits after VM repair n v2 > > Good catch! I've fixed this in attached v29. > > > 2) Add Assert(BufferIsDirty(buf)) > > > > Since the patch's core claim is "buffer must be dirty before WAL > > registration", an assertion encodes this invariant. Should we add: > > > > Assert(BufferIsValid(buf)); > > Assert(BufferIsDirty(buf)); > > > > right before the visibilitymap_set() call? > > There are already assertions that will trip in various places -- most > importantly in XLogRegisterBuffer(), which is the one that inspired > this refactor. > > > The comment at lines: > > > "The only scenario where it is not already dirty is if the VM was > > > removed…" > > > > This phrasing could become misleading after future refactors. Can we > > make it more direct like: > > > > > "We must mark the heap buffer dirty before calling visibilitymap_set(), > > > because it may WAL-log the buffer and XLogRegisterBuffer() requires it." > > I see your point about future refactors missing updating comments like > this. But, I don't think we are going to refactor the code such that > we can have PD_ALL_VISIBLE set without the VM bits set more often. > Also, it is common practice in Postgres to describe very specific edge > cases or odd scenarios in order to explain code that may seem > confusing without the comment. It does risk that comment later > becoming stale, but it is better that future developers understand why > the code is there. > > That being said, I take your point that the comment is confusing. I > have updated it in a different way. > > > > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about > > > unnecessarily dirtying the heap buffer, as it must be marked dirty before > > > adding it to the WAL chain. The only scenario where it is not already > > > dirty is if the VM was removed..." > > > > In this test we now call MarkBufferDirty() on the heap page even when > > only setting the VM, so the comments claiming “does not need to modify > > the heap buffer”/“no heap page modification” might be misleading. It > > might be better to say the test doesn’t need to modify heap > > tuples/page contents or doesn’t need to prune/freeze. > > The point I'm trying to make is that we have to dirty the buffer even > if we don't modify the page because of the XLOG sub-system > requirements. And, it may seem like a waste to do that if not > modifying the page, but the page will rarely be clean anyway. I've > tried to make this more clear in attached v29. > > - Melanie
Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. Origins of this code track down to fdf9e21196a6 which was committed as part of [0], at which point there was no HeapTupleSatisfiesVacuumHorizon function. I guess this is the reason this optimization was not performed earlier. I also think this patch is correct, because we do similar things for HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and HeapTupleSatisfiesVacuumHorizon is just a proxy to HeapTupleSatisfiesVacuumHorizon with only difference in DEAD VS RECENTLY_DEAD handling. Similar change could be done at heapam_scan_analyze_next_tuple ... case HEAPTUPLE_DEAD: case HEAPTUPLE_RECENTLY_DEAD: /* Count dead and recently-dead rows */ *deadrows += 1; break; ... [0] https://www.postgresql.org/message-id/CABOikdP0meGuXPPWuYrP%3DvDvoqUdshF2xJAzZHWSKg03Rz_%2B9Q%40mail.gmail.com -- Best regards, Kirill Reshke
