Re: HOT chain validation in verify_heapam()

Andres Freund Wed, 22 Mar 2023 14:56:43 -0700

Hi,

On 2023-03-22 13:45:52 -0700, Andres Freund wrote:
> On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev
> > <[email protected]> wrote:
> > > The patch needed a rebase due to a4f23f9b. PFA v12.
> >
> > I have committed this after tidying up a bunch of things in the test
> > case file that I found too difficult to understand -- or in some cases
> > just incorrect, like:
>
> As noticed by Andrew
> https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and
> then reproduced on HEAD by me, there's something not right here.
>
> At the very least there's missing verification that tids actually exists in 
> the
> "Update chain validation" loop, leading to:
> TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: 
> "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: 
> 355, PID: 645093
>
> Which makes sense - the earlier loop adds t_ctid to the successor array, which
> we then query without checking if there still is such a tid on the page.


It's not quite so simple - I see now that the lp_valid check tries to prevent
that. However, it's not sufficient, because there is no guarantee that
lp_valid[nextoffnum] is initialized. Consider what happens if t_ctid of a heap
tuple points to beyond the end of the item array - lp_valid[nextoffnum] won't
be initialized.

Why are redirections now checked in two places? There already was a
ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's
the ItemIdIsRedirected() check in the "Update chain validation." loop as well
- and the output of that is confusing, because it'll just mention the target
of the redirect.


I also think it's not quite right that some of checks inside if
(ItemIdIsRedirected()) continue in case of corruption, others don't. While
there's a later continue, that means the corrupt tuples get added to the
predecessor array.  Similarly, in the non-redirect portion, the successor
array gets filled with corrupt tuples, which doesn't seem quite right to me.


A patch addressing some, but not all, of those is attached. With that I don't
see any crashes or false-positives anymore.

Greetings,

Andres Freund

diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c
index 663fb65dee6..43f93f7784a 100644
--- i/contrib/amcheck/verify_heapam.c
+++ w/contrib/amcheck/verify_heapam.c
@@ -486,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS)
 					report_corruption(&ctx,
 									  psprintf("line pointer redirection to unused item at offset %u",
 											   (unsigned) rdoffnum));
+				else if (ItemIdIsDead(rditem))
+					report_corruption(&ctx,
+									  psprintf("line pointer redirection to dead item at offset %u",
+											   (unsigned) rdoffnum));
+				else if (ItemIdIsRedirected(rditem))
+					report_corruption(&ctx,
+									  psprintf("line pointer redirection to another redirect at offset %u",
+											   (unsigned) rdoffnum));
 
 				/*
 				 * Record the fact that this line pointer has passed basic
@@ -564,16 +572,19 @@ verify_heapam(PG_FUNCTION_ARGS)
 			TransactionId next_xmin;
 			OffsetNumber nextoffnum = successor[ctx.offnum];
 
+			/* the current line pointer may not have a successor */
+			if (nextoffnum == InvalidOffsetNumber)
+				continue;
+
 			/*
-			 * The current line pointer may not have a successor, either
-			 * because it's not valid or because it didn't point to anything.
-			 * In either case, we have to give up.
-			 *
-			 * If the current line pointer does point to something, it's
-			 * possible that the target line pointer isn't valid. We have to
-			 * give up in that case, too.
+			 * The successor is located beyond the end of the line item array,
+			 * which can happen when the array is truncated.
 			 */
-			if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum])
+			if (nextoffnum > maxoff)
+				continue;
+
+			/* the successor is not valid, have to give up */
+			if (!lp_valid[nextoffnum])
 				continue;
 
 			/* We have two valid line pointers that we can examine. */
@@ -583,14 +594,8 @@ verify_heapam(PG_FUNCTION_ARGS)
 			/* Handle the cases where the current line pointer is a redirect. */
 			if (ItemIdIsRedirected(curr_lp))
 			{
-				/* Can't redirect to another redirect. */
-				if (ItemIdIsRedirected(next_lp))
-				{
-					report_corruption(&ctx,
-									  psprintf("redirected line pointer points to another redirected line pointer at offset %u",
-											   (unsigned) nextoffnum));
-					continue;
-				}
+				/* should have filtered out all the other cases */
+				Assert(ItemIdIsNormal(next_lp));
 
 				/* Can only redirect to a HOT tuple. */
 				next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
@@ -602,8 +607,12 @@ verify_heapam(PG_FUNCTION_ARGS)
 				}
 
 				/*
-				 * Redirects are created by updates, so successor should be
-				 * the result of an update.
+				 * Redirects are created by HOT updates, so successor should
+				 * be the result of an HOT update.
+				 *
+				 * XXX: HeapTupleHeaderIsHeapOnly() should always imply
+				 * HEAP_UPDATED. This should be checked even when the tuple
+				 * isn't a target of a redirect.
 				 */
 				if ((next_htup->t_infomask & HEAP_UPDATED) == 0)
 				{
@@ -665,15 +674,18 @@ verify_heapam(PG_FUNCTION_ARGS)
 			 * tuple should be marked as a heap-only tuple. Conversely, if the
 			 * current tuple isn't marked as HOT-updated, then the next tuple
 			 * shouldn't be marked as a heap-only tuple.
+			 *
+			 * NB: Can't use HeapTupleHeaderIsHotUpdated() as it checks if
+			 * hint bits indicate xmin/xmax aborted.
 			 */
-			if (!HeapTupleHeaderIsHotUpdated(curr_htup) &&
+			if (!(curr_htup->t_infomask2 & HEAP_HOT_UPDATED) &&
 				HeapTupleHeaderIsHeapOnly(next_htup))
 			{
 				report_corruption(&ctx,
 								  psprintf("non-heap-only update produced a heap-only tuple at offset %u",
 										   (unsigned) nextoffnum));
 			}
-			if (HeapTupleHeaderIsHotUpdated(curr_htup) &&
+			if ((curr_htup->t_infomask2 & HEAP_HOT_UPDATED) &&
 				!HeapTupleHeaderIsHeapOnly(next_htup))
 			{
 				report_corruption(&ctx,

Re: HOT chain validation in verify_heapam()

Reply via email to