On Thu, Mar 28, 2024 at 4:49 AM Heikki Linnakangas <hlinn...@iki.fi> wrote: > > On 28/03/2024 03:53, Melanie Plageman wrote: > > On Thu, Mar 28, 2024 at 01:04:04AM +0200, Heikki Linnakangas wrote: > >> One change with this is that live_tuples and many of the other fields are > >> now again updated, even if the caller doesn't need them. It was hard to > >> skip > >> them in a way that would save any cycles, with the other refactorings. > > > > I am worried we are writing checks that are going to have to come out of > > SELECT queries' bank accounts, but I'll do some benchmarking when we're > > all done with major refactoring. > > Sounds good, thanks.
Actually, after having looked at it again, there really are only a few more increments of various counters, the memory consumed by revisit, and the additional setting of offsets in marked. I think a few carefully constructed queries which do on-access pruning could test the impact of this (as opposed to a bigger benchmarking endeavor). I also wonder if there would be any actual impact of marking the various record_*() routines inline. > >> @@ -728,10 +832,12 @@ htsv_get_valid_status(int status) > >> * DEAD, our visibility test is just too coarse to detect it. > >> * > >> * In general, pruning must never leave behind a DEAD tuple that still > >> has > >> - * tuple storage. VACUUM isn't prepared to deal with that case. That's > >> why > >> + * tuple storage. VACUUM isn't prepared to deal with that case (FIXME: > >> it no longer cares, right?). > >> + * That's why > >> * VACUUM prunes the same heap page a second time (without dropping its > >> lock > >> * in the interim) when it sees a newly DEAD tuple that we initially saw > >> as > >> - * in-progress. Retrying pruning like this can only happen when an > >> inserting > >> + * in-progress (FIXME: Really? Does it still do that?). > > > > Yea, I think all that is no longer true. I missed this comment back > > then. > > Committed a patch to remove it. > > >> @@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber > >> rootoffnum, > >> * RECENTLY_DEAD tuples just in case there's a DEAD one after > >> them; > >> * but we can't advance past anything else. We have to make > >> sure that > >> * we don't miss any DEAD tuples, since DEAD tuples that > >> still have > >> - * tuple storage after pruning will confuse VACUUM. > >> + * tuple storage after pruning will confuse VACUUM (FIXME: > >> not anymore > >> + * I think?). > > > > Meaning, it won't confuse vacuum anymore or there won't be DEAD tuples > > with storage after pruning anymore? > > I meant that it won't confuse VACUUM anymore. lazy_scan_prune() doesn't > loop through the items on the page checking their visibility anymore. > > Hmm, one confusion remains though: In the 2nd phase of vacuum, we remove > all the dead line pointers that we have now removed from the indexes. > When we do that, we assume them all to be dead line pointers, without > storage, rather than normal tuples that happen to be HEAPTUPLE_DEAD. So > it's important that if pruning would leave behind HEAPTUPLE_DEAD tuples, > they are not included in 'deadoffsets'. It seems like master only adds items it is marking LP_DEAD to deadoffsets. And things marked LP_DEAD have lp_len set to 0. > In any case, let's just make sure that pruning doesn't leave > HEAPTUPLE_DEAD tuples. There's no reason it should. Maybe worth adding an assert to static void heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum) { ... Assert(!ItemIdHasStorage(itemid)); prstate->deadoffsets[prstate->lpdead_items++] = offnum; } By the way, I wasn't quite sure about the purpose of heap_prune_record_unchanged_lp_dead(). It is called in heap_prune_chain() in a place where we didn't add things to deadoffsets before, no? /* * Likewise, a dead line pointer can't be part of the chain. (We * already eliminated the case of dead root tuple outside this * function.) */ if (ItemIdIsDead(lp)) { /* * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead * line pointers LP_UNUSED now. */ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW)) heap_prune_record_unused(prstate, offnum, false); else heap_prune_record_unchanged_lp_dead(lp, prstate, offnum); break; } > >> @@ -1224,10 +1327,21 @@ heap_prune_record_live_or_recently_dead(Page page, > >> PruneState *prstate, OffsetNu > >> * ensure the math works out. The assumption that the transaction will > >> * commit and update the counters after we report is a bit shaky; but > >> it > >> * is what acquire_sample_rows() does, so we do the same to be > >> consistent. > >> + * > >> + * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*() > >> + * subroutines. They don't count dead items like > >> acquire_sample_rows() > >> + * does, because we assume that all dead items will become LP_UNUSED > >> + * before VACUUM finishes. This difference is only superficial. > >> VACUUM > >> + * effectively agrees with ANALYZE about DEAD items, in the end. > >> VACUUM > >> + * won't remember LP_DEAD items, but only because they're not > >> supposed to > >> + * be left behind when it is done. (Cases where we bypass index > >> vacuuming > >> + * will violate this optimistic assumption, but the overall impact of > >> that > >> + * should be negligible.) FIXME: I don't understand that last > >> sentence in > >> + * parens. It was copied from elsewhere. > > > > If we bypass index vacuuming, there will be LP_DEAD items left behind > > when we are done because we didn't do index vacuuming and then reaping > > of those dead items. All of these comments are kind of a copypasta, > > though. > > Ah, gotcha, makes sense now. I didn't remember that we sometimes by pass > index vacuuming if there are very few dead items. I thought this talked > about the case that there are no indexes, but that case is OK. These comments could use another pass. I had added some extra (probably redundant) content when I thought I was refactoring it a certain way and then changed my mind. Attached is a diff with some ideas I had for a bit of code simplification. Are you working on cleaning this patch up or should I pick it up? I wonder if it makes sense to move some of the changes to heap_prune_chain() with setting things in marked/revisited to the start of the patch set (to be committed first). - Melanie
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index a8ed11a1858..2f477aa43b1 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -143,7 +143,7 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal); static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum); -static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum); +static void heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum); static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum); static void page_verify_redirects(Page page); @@ -766,7 +766,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer, * we can advance relfrozenxid and relminmxid to the values in * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid. * MFIXME: which one should be pick if presult->nfrozen == 0 and - * presult->all_frozen = True. + * presult->all_frozen = True. MTODO: see Peter's response here + * https://www.postgresql.org/message-id/CAH2-Wz%3DLmOs%3DiJ%3DFfCERnma0q7QjaNSnCgWEp7zOK7hD24YC_w%40mail.gmail.com */ if (new_relfrozen_xid) { @@ -868,9 +869,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, OffsetNumber latestdead = InvalidOffsetNumber, maxoff = PageGetMaxOffsetNumber(dp), offnum; + + /* TODO: while maybe self-explanatory, I would prefer if chainitems and */ + /* nchain had a comment up here */ OffsetNumber chainitems[MaxHeapTuplesPerPage]; - int nchain = 0, - i; + int nchain = 0; + int i = 0; rootlp = PageGetItemId(dp, rootoffnum); @@ -943,6 +947,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, */ prstate->revisit[prstate->nrevisit++] = rootoffnum; + /* TODO: I don't like this comment now */ /* Nothing more to do */ return; } @@ -1020,7 +1025,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW)) heap_prune_record_unused(prstate, offnum, false); else - heap_prune_record_unchanged_lp_dead(prstate, offnum); + heap_prune_record_unchanged_lp_dead(lp, prstate, offnum); break; } @@ -1044,6 +1049,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, */ tupdead = recent_dead = false; + /* TODO: maybe this should just be an if statement now */ switch (htsv_get_valid_status(prstate->htsv[offnum])) { case HEAPTUPLE_DEAD: @@ -1132,42 +1138,34 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, * dead and the root line pointer can be marked dead. Otherwise just * redirect the root to the correct chain member. */ - if (i >= nchain) - heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp)); - else - { + if (i < nchain) heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], ItemIdIsNormal(rootlp)); - - /* the rest of tuples in the chain are normal, unchanged tuples */ - for (; i < nchain; i++) - heap_prune_record_unchanged(dp, prstate, chainitems[i]); - } + else + heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp)); } - else + else if ((i = ItemIdIsRedirected(rootlp))) { - i = 0; - if (ItemIdIsRedirected(rootlp)) + if (i < nchain) { - if (nchain < 2) - { - /* - * We found a redirect item that doesn't point to a valid - * follow-on item. This can happen if the loop in - * heap_page_prune_and_freeze() caused us to visit the dead - * successor of a redirect item before visiting the redirect - * item. We can clean up by setting the redirect item to DEAD - * state or LP_UNUSED if the caller indicated. - */ - heap_prune_record_dead_or_unused(prstate, rootoffnum, false); - } - else - heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum); - i++; + heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum); + } + else + { + /* + * We found a redirect item that doesn't point to a valid + * follow-on item. This can happen if the loop in + * heap_page_prune_and_freeze() caused us to visit the dead + * successor of a redirect item before visiting the redirect item. + * We can clean up by setting the redirect item to DEAD state or + * LP_UNUSED if the caller indicated. + */ + heap_prune_record_dead_or_unused(prstate, rootoffnum, false); } - /* the rest of tuples in the chain are normal, unchanged tuples */ - for (; i < nchain; i++) - heap_prune_record_unchanged(dp, prstate, chainitems[i]); } + + /* the rest of tuples in the chain are normal, unchanged tuples */ + for (; i < nchain; i++) + heap_prune_record_unchanged(dp, prstate, chainitems[i]); } /* Record lowest soon-prunable XID */ @@ -1486,7 +1484,7 @@ heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum) * Record line pointer that was already LP_DEAD and is left unchanged. */ static void -heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum) +heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum) { /* * Deliberately don't set hastup for LP_DEAD items. We make the soft @@ -1506,6 +1504,7 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum) Assert(!prstate->marked[offnum]); prstate->marked[offnum] = true; + Assert(!ItemIdHasStorage(itemid)); prstate->deadoffsets[prstate->lpdead_items++] = offnum; }