Hi,
On 2023-03-22 13:45:52 -0700, Andres Freund wrote:
> On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev
> > <[email protected]> wrote:
> > > The patch needed a rebase due to a4f23f9b. PFA v12.
> >
> > I have committed this after tidying up a bunch of things in the test
> > case file that I found too difficult to understand -- or in some cases
> > just incorrect, like:
>
> As noticed by Andrew
> https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and
> then reproduced on HEAD by me, there's something not right here.
>
> At the very least there's missing verification that tids actually exists in
> the
> "Update chain validation" loop, leading to:
> TRAP: failed Assert("ItemIdHasStorage(itemId)"), File:
> "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line:
> 355, PID: 645093
>
> Which makes sense - the earlier loop adds t_ctid to the successor array, which
> we then query without checking if there still is such a tid on the page.
It's not quite so simple - I see now that the lp_valid check tries to prevent
that. However, it's not sufficient, because there is no guarantee that
lp_valid[nextoffnum] is initialized. Consider what happens if t_ctid of a heap
tuple points to beyond the end of the item array - lp_valid[nextoffnum] won't
be initialized.
Why are redirections now checked in two places? There already was a
ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's
the ItemIdIsRedirected() check in the "Update chain validation." loop as well
- and the output of that is confusing, because it'll just mention the target
of the redirect.
I also think it's not quite right that some of checks inside if
(ItemIdIsRedirected()) continue in case of corruption, others don't. While
there's a later continue, that means the corrupt tuples get added to the
predecessor array. Similarly, in the non-redirect portion, the successor
array gets filled with corrupt tuples, which doesn't seem quite right to me.
A patch addressing some, but not all, of those is attached. With that I don't
see any crashes or false-positives anymore.
Greetings,
Andres Freund
diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c
index 663fb65dee6..43f93f7784a 100644
--- i/contrib/amcheck/verify_heapam.c
+++ w/contrib/amcheck/verify_heapam.c
@@ -486,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS)
report_corruption(&ctx,
psprintf("line pointer redirection to unused item at offset %u",
(unsigned) rdoffnum));
+ else if (ItemIdIsDead(rditem))
+ report_corruption(&ctx,
+ psprintf("line pointer redirection to dead item at offset %u",
+ (unsigned) rdoffnum));
+ else if (ItemIdIsRedirected(rditem))
+ report_corruption(&ctx,
+ psprintf("line pointer redirection to another redirect at offset %u",
+ (unsigned) rdoffnum));
/*
* Record the fact that this line pointer has passed basic
@@ -564,16 +572,19 @@ verify_heapam(PG_FUNCTION_ARGS)
TransactionId next_xmin;
OffsetNumber nextoffnum = successor[ctx.offnum];
+ /* the current line pointer may not have a successor */
+ if (nextoffnum == InvalidOffsetNumber)
+ continue;
+
/*
- * The current line pointer may not have a successor, either
- * because it's not valid or because it didn't point to anything.
- * In either case, we have to give up.
- *
- * If the current line pointer does point to something, it's
- * possible that the target line pointer isn't valid. We have to
- * give up in that case, too.
+ * The successor is located beyond the end of the line item array,
+ * which can happen when the array is truncated.
*/
- if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum])
+ if (nextoffnum > maxoff)
+ continue;
+
+ /* the successor is not valid, have to give up */
+ if (!lp_valid[nextoffnum])
continue;
/* We have two valid line pointers that we can examine. */
@@ -583,14 +594,8 @@ verify_heapam(PG_FUNCTION_ARGS)
/* Handle the cases where the current line pointer is a redirect. */
if (ItemIdIsRedirected(curr_lp))
{
- /* Can't redirect to another redirect. */
- if (ItemIdIsRedirected(next_lp))
- {
- report_corruption(&ctx,
- psprintf("redirected line pointer points to another redirected line pointer at offset %u",
- (unsigned) nextoffnum));
- continue;
- }
+ /* should have filtered out all the other cases */
+ Assert(ItemIdIsNormal(next_lp));
/* Can only redirect to a HOT tuple. */
next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
@@ -602,8 +607,12 @@ verify_heapam(PG_FUNCTION_ARGS)
}
/*
- * Redirects are created by updates, so successor should be
- * the result of an update.
+ * Redirects are created by HOT updates, so successor should
+ * be the result of an HOT update.
+ *
+ * XXX: HeapTupleHeaderIsHeapOnly() should always imply
+ * HEAP_UPDATED. This should be checked even when the tuple
+ * isn't a target of a redirect.
*/
if ((next_htup->t_infomask & HEAP_UPDATED) == 0)
{
@@ -665,15 +674,18 @@ verify_heapam(PG_FUNCTION_ARGS)
* tuple should be marked as a heap-only tuple. Conversely, if the
* current tuple isn't marked as HOT-updated, then the next tuple
* shouldn't be marked as a heap-only tuple.
+ *
+ * NB: Can't use HeapTupleHeaderIsHotUpdated() as it checks if
+ * hint bits indicate xmin/xmax aborted.
*/
- if (!HeapTupleHeaderIsHotUpdated(curr_htup) &&
+ if (!(curr_htup->t_infomask2 & HEAP_HOT_UPDATED) &&
HeapTupleHeaderIsHeapOnly(next_htup))
{
report_corruption(&ctx,
psprintf("non-heap-only update produced a heap-only tuple at offset %u",
(unsigned) nextoffnum));
}
- if (HeapTupleHeaderIsHotUpdated(curr_htup) &&
+ if ((curr_htup->t_infomask2 & HEAP_HOT_UPDATED) &&
!HeapTupleHeaderIsHeapOnly(next_htup))
{
report_corruption(&ctx,