HOT chain validation in verify_heapam()
Hi,

On Mon, Apr 4, 2022 at 11:46 PM Andres Freund wrote:
> I think there's a few more things that'd be good to check. For example amcheck
> doesn't verify that HOT chains are reasonable, which can often be spotted
> looking at an individual page. Which is a bit unfortunate, given how many bugs
> we had in that area.
>
> Stuff to check around that:
> - target of redirect has HEAP_ONLY_TUPLE, HEAP_UPDATED set
> - In a valid ctid chain within a page (i.e. xmax = xmin):
>   - tuples have HEAP_UPDATED set
>   - HEAP_ONLY_TUPLE / HEAP_HOT_UPDATED matches across chain elements

(I changed the subject because the attached patch is related to HOT chain validation.)

Please find attached a patch implementing the above HOT chain validation (within a page), plus a few more validations:

* If the predecessor's xmin is aborted or in progress, the current tuple's xmin should be aborted or in progress respectively. Also, both xmins must be equal.
* If the predecessor's xmin is not frozen, then the current tuple's must not be either.
* If the predecessor's xmin is equal to the current tuple's xmin, the current tuple's cmin should be greater than the predecessor's cmin.
* If the current tuple is not HOT, then its predecessor must not be HEAP_HOT_UPDATED.
* If the current tuple is HOT, then its predecessor must be HEAP_HOT_UPDATED, and vice versa.
* If xmax is 0, meaning this is the last tuple in the chain, it must not be HEAP_HOT_UPDATED.
* If the current tuple is the last tuple in the HOT chain, then the next tuple should not be HOT.

I am looking into adding TAP tests for these changes and into finding a way to corrupt a page from a TAP test. I will try to include these test cases in an upcoming version of the patch.
-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com

From f3ae2f783a255109655cade770ebc74e01aef34c Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya
Date: Thu, 18 Aug 2022 13:20:51 +0530
Subject: [PATCH v1] HOT chain validation in verify_heapam.

---
 contrib/amcheck/verify_heapam.c | 144
 1 file changed, 144 insertions(+)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index c875f3e5a2..ae2c100de2 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -399,6 +399,7 @@ verify_heapam(PG_FUNCTION_ARGS)
 	for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
 	{
 		OffsetNumber maxoff;
+		OffsetNumber predecessor[MaxOffsetNumber] = {0};
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -433,6 +434,9 @@ verify_heapam(PG_FUNCTION_ARGS)
 		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
 			 ctx.offnum = OffsetNumberNext(ctx.offnum))
 		{
+			OffsetNumber nextTupOffnum;
+			HeapTupleHeader htup;
+
 			ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
 
 			/* Skip over unused/dead line pointers */
@@ -469,6 +473,13 @@ verify_heapam(PG_FUNCTION_ARGS)
 					report_corruption(&ctx,
 									  psprintf("line pointer redirection to unused item at offset %u",
 											   (unsigned) rdoffnum));
+
+				htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem);
+				if (!(HeapTupleHeaderIsHeapOnly(htup) && htup->t_infomask & HEAP_UPDATED))
+					report_corruption(&ctx,
+									  psprintf("Redirected Tuple at line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple",
+											   (unsigned) rdoffnum));
+
 				continue;
 			}
 
@@ -507,6 +518,139 @@ verify_heapam(PG_FUNCTION_ARGS)
 			/* Ok, ready to check this next tuple */
 			check_tuple(&ctx);
 
+			/*
+			 * Add line pointer offset to predecessor array.
+			 * 1) If xmax is matching with xmin of next Tuple (reaching via its t_ctid).
+			 * 2) If next tuple is in the same page.
+			 * Raise corruption if:
+			 * We have two tuples having same predecessor.
+			 *
+			 * We add offset to predecessor irrespective of transaction (t_xmin) status. We will
+			 * do validation related to transaction status (and also all other validations)
+			 * when we loop over predecessor array.
+			 */
+			nextTupOffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid);
+			if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno &&
+				nextTupOffnum != ctx.offnum)
+			{
+				TransactionId currTupXmax;
+				ItemId		lp;
+
+				if (predecessor[nextTupOffnum] != 0)
+				{
+					report_corruption(&ctx,
+									  psprintf("Tuple at offset %u is reachable from two or more updated tuple",
+											   (unsigned) nextTupOffnum));
+					continue;
+				}
+				currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr);
+				lp = PageGetItemId(ctx.page, nextTupOffnum);
+
+				htup = (HeapTupleHeader) PageGetItem(ctx.page, lp);
+
+				if (TransactionIdIsValid(currTupXmax) &&
+					TransactionIdEquals(HeapTupleHeaderGetXmin(htup), currTupXmax))
+				{
+					pr
Re: HOT chain validation in verify_heapam()
Hi hackers,

> Fixed.

I noticed that this patch got stuck a little, and decided to take another look. It seems to be well written, covered with tests, and my understanding is that all the previous feedback was accounted for. To your knowledge, is there anything that prevents us from moving it to "Ready for Committer"?

-- Best regards, Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
On Thu, Jan 19, 2023 at 8:55 AM Aleksander Alekseev wrote: > I noticed that this patch stuck a little and decided to take another look. > > It seems to be well written, covered with tests and my understanding > is that all the previous feedback was accounted for. To your knowledge > is there anything that prevents us from moving it to "Ready for > Committer"? Thanks for taking a look, and for pinging the thread. I think that the handling of lp_valid[] in the loop that begins with "Loop over offset and populate predecessor array from all entries that are present in successor array" is very confusing. I think that lp_valid[] should be answering the question "is the line pointer basically sane?". That is, if it's a redirect, it needs to point to something within the line pointer array (and we also check that it must be an entry in the line pointer array that is used, which seems fine). If it's not a redirect, it needs to point to space that's entirely within the block, properly aligned, and big enough to contain a tuple. We determine the answers to all of these questions in the first loop, the one that starts with /* Perform tuple checks */. Nothing that happens in the second loop, where we populate the predecessor array, can reverse our previous conclusion that the line pointer is valid, so this loop shouldn't be resetting entries in lp_valid[] to false. The reason that it's doing so seems to be that it wants to use lp_valid[] to control the behavior of the third loop, where we perform checks against things that have entries in the predecessor array. As written, the code ensures that we always set lp_valid[nextoffnum] to false unless we set predecessor[nextoffnum] to a value other than InvalidOffsetNumber. But that is needlessly complex: the third loop doesn't need to look at lp_valid[] at all. It can just check whether predecessor[currentoffnum] is valid. If it is, perform checks. Otherwise, skip it. It seems to me that this would be significantly simpler. 
To put the above complaint another way, a variable shouldn't mean two different things depending on where you are in the function. Right now, at the end of the first loop, lp_valid[x] answers the question "is line pointer x basically valid?". But by the end of the second loop, it answers the question "is line pointer x valid and does it also have a valid predecessor?". That kind of definitional change is something to be avoided.

The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin)) seems wrong to me. Shouldn't it be &&? Has this code been tested at all? It doesn't seem to have a test case. Some of these other errors don't, either. Maybe there's some that we can't easily test in an automated way, but we should test what we can. I guess maybe casual testing wouldn't reveal the problem here because of the recheck, but it's worrying to find logic that doesn't look right with no corresponding comments or test cases.

Some error message kibitzing:

psprintf("redirected tuple at line pointer offset %u is not heap only tuple",

It seems to me that this should say "redirected line pointer pointing to a non-heap-only tuple at offset %u". There is no such thing as a redirected tuple -- and even if there were, what we have here is clearly a redirected line pointer.

psprintf("redirected tuple at line pointer offset %u is not heap only tuple",

And I think for the same reasons this one should say something like "redirected line pointer pointing to a non-heap-only tuple at offset %u".

psprintf("redirected tuple at line pointer offset %u is not heap updated tuple",

Possibly all of these would sound better with "points" rather than "pointing" -- if so, we'd need to change an existing message in the same way. And this one should say something like "redirected line pointer pointing to a tuple not produced by an update at offset %u".

psprintf("tuple is root of chain but it is marked as heap-only tuple"));

I think this would sound better if you deleted the word "it".
I don't know whether it's worth arguing about -- it feels like we've argued too much already about this sort of thing -- but I am not very convinced by initializers like OffsetNumber predecessor[MaxOffsetNumber] = {InvalidOffsetNumber}. That style is only correct because InvalidOffsetNumber happens to be zero. If it were up to me, I'd use memset to clear the predecessor array. I would not bulk-initialize successor and lp_valid but make sure that the first loop always sets them, possibly by having the top of the loop set them to InvalidOffsetNumber and false initially and then letting code later in the loop change the value, or possibly in some other way.

-- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Fri, Jan 20, 2023 at 12:38 AM Robert Haas wrote: > > I think that the handling of lp_valid[] in the loop that begins with > "Loop over offset and populate predecessor array from all entries that > are present in successor array" is very confusing. I think that > lp_valid[] should be answering the question "is the line pointer > basically sane?". That is, if it's a redirect, it needs to point to > something within the line pointer array (and we also check that it > must be an entry in the line pointer array that is used, which seems > fine). If it's not a redirect, it needs to point to space that's > entirely within the block, properly aligned, and big enough to contain > a tuple. We determine the answers to all of these questions in the > first loop, the one that starts with /* Perform tuple checks */. > > Nothing that happens in the second loop, where we populate the > predecessor array, can reverse our previous conclusion that the line > pointer is valid, so this loop shouldn't be resetting entries in > lp_valid[] to false. The reason that it's doing so seems to be that it > wants to use lp_valid[] to control the behavior of the third loop, > where we perform checks against things that have entries in the > predecessor array. As written, the code ensures that we always set > lp_valid[nextoffnum] to false unless we set predecessor[nextoffnum] to > a value other than InvalidOffsetNumber. But that is needlessly > complex: the third loop doesn't need to look at lp_valid[] at all. It > can just check whether predecessor[currentoffnum] is valid. If it is, > perform checks. Otherwise, skip it. It seems to me that this would be > significantly simpler. > I was trying to use lp_valid as I need to identify the root of the HOT chain and we are doing validation on the root of the HOT chain when we loop over the predecessor array. 
if (nextoffnum == InvalidOffsetNumber || !lp_valid[ctx.offnum] ||
    !lp_valid[nextoffnum])
{
    /*
     * Set lp_valid of nextoffnum to false if current tuple's
     * lp_valid is true. We don't add this to predecessor array as
     * it's of no use to validate tuple if its predecessor is
     * already corrupted but we need to identify all those tuple's
     * so that we can differentiate between all the cases of
     * missing offset in predecessor array, this will help in
     * validating the root of chain when we loop over predecessor
     * array.
     */
    if (!lp_valid[ctx.offnum] && lp_valid[nextoffnum])
        lp_valid[nextoffnum] = false;

I was resetting lp_valid in the last patch because we don't add data to predecessor[], and while looping over the predecessor array we need to isolate (and identify) all cases of missing data in the predecessor array to exactly identify the root of the HOT chain.

One solution is to always add data to the predecessor array while looping over the successor array; then, while looping over the predecessor array, we can continue with the other validation if "(lp_valid[predecessor[currentoffnum]] && lp_valid[currentoffnum])" is true, but in that case our third loop will also look at lp_valid[].

> To put the above complaint another way, a variable shouldn't mean two
> different things depending on where you are in the function. Right
> now, at the end of the first loop, lp_valid[x] answers the question
> "is line pointer x basically valid?". But by the end of the second
> loop, it answers the question "is line pointer x valid and does it
> also have a valid predecessor?". That kind of definitional change is
> something to be avoided.

Agreed.

> The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin))
> seems wrong to me. Shouldn't it be &&? Has this code been tested at
> all? It doesn't seem to have a test case. Some of these other errors
> don't, either. Maybe there's some that we can't easily test in an
> automated way, but we should test what we can.
> I guess maybe casual
> testing wouldn't reveal the problem here because of the recheck, but
> it's worrying to find logic that doesn't look right with no
> corresponding comments or test cases.

This is totally my mistake, apologies for that. I will fix this in my next patch. Regarding the missing test cases, I need one in-progress transaction for these test cases to be included in 004_verify_heapam.pl, but I don't find a clear way to have an in-progress transaction (as per the design of 004_verify_heapam.pl) that I can use in the test cases. I will be doing more research on a solution to add these missing test cases.

> Some error message kibitzing:
>
> psprintf("redirected tuple at line pointer offset %u is not heap
Re: HOT chain validation in verify_heapam()
On Sun, Jan 22, 2023 at 10:19 AM Himanshu Upadhyaya wrote: > I was trying to use lp_valid as I need to identify the root of the HOT chain > and we are doing validation on the root of the HOT chain when we loop over > the predecessor array. > Was resetting lp_valid in the last patch because we don't add data to > predecessor[] and while looping over the predecessor array we need to isolate > (and identify) all cases of missing data in the predecessor array to exactly > identify the root of HOT chain. > One solution is to always add data to predecessor array while looping over > successor array and then while looping over predecessor array we can continue > for other validation "if (lp_valid [predecessor[currentoffnum]] && > lp_valid[currentoffnum]" is true but in this case also our third loop will > also look at lp_valid[]. I don't mind if the third loop looks at lp_valid if it has a reason to do that, but I don't think we should be resetting values from true to false. Once we know a line pointer to be valid, it doesn't stop being valid later because we found out some other thing about something else. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi Hackers,

On Sun, Jan 22, 2023 at 8:48 PM Himanshu Upadhyaya <upadhyaya.himan...@gmail.com> wrote:
>> The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin))
>> seems wrong to me. Shouldn't it be &&? Has this code been tested at
>> all? It doesn't seem to have a test case. Some of these other errors
>> don't, either. Maybe there's some that we can't easily test in an
>> automated way, but we should test what we can. I guess maybe casual
>> testing wouldn't reveal the problem here because of the recheck, but
>> it's worrying to find logic that doesn't look right with no
>> corresponding comments or test cases.
>
> This is totally my mistake, apologies for that. I will fix this in my
> next patch. Regarding the missing test cases, I need one in-progress
> transaction for these test cases to be included in 004_verify_heapam.pl,
> but I don't find a clear way to have an in-progress transaction (as per
> the design of 004_verify_heapam.pl) that I can use in the test cases. I
> will be doing more research on a solution to add these missing test cases.

I am trying to add test cases related to in-progress transactions in 004_verify_heapam.pl, but I am not able to find a proper way to achieve this. We have logic that manually corrupts each tuple; please refer to the code just after the below comment in 004_verify_heapam.pl:

# Corrupt the tuples, one type of corruption per tuple. Some types of
# corruption cause verify_heapam to skip to the next tuple without
# performing any remaining checks, so we can't exercise the system properly if
# we focus all our corruption on a single tuple.

Before this we stop the node with "$node->stop;", and only then do we proceed to manual corruption. This aborts all running/in-progress transactions. So, if we create an in-progress transaction and comment out "$node->stop;", then somehow all the code that we have for manual corruption does not work.
I think we are required to stop the server before proceeding with manual corruption? If that is the case, please suggest whether there is a way to get an in-progress transaction that we can use for manual corruption.

-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi Himanshu, > Test cases are now part of this v6 patch. I believe the patch is in pretty good shape now. I'm going to change its status to "Ready for Committer" soon unless there are going to be any objections. -- Best regards, Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
Hi, To start with: I think this is an extremely helpful and important feature. Both for checking production systems and for finding problems during development. > From 08fe01f5073c0a850541265494bb4a875bec7d3f Mon Sep 17 00:00:00 2001 > From: Himanshu Upadhyaya > Date: Fri, 30 Sep 2022 17:44:56 +0530 > Subject: [PATCH v6] Implement HOT chain validation in verify_heapam() > > Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev > > Discussion: > https://postgr.es/m/CAPF61jBBR2-iE-EmN_9v0hcQEfyz_17e5Lbb0%2Bu2%3D9ukA9sWmQ%40mail.gmail.com > --- > contrib/amcheck/verify_heapam.c | 207 ++ > src/bin/pg_amcheck/t/004_verify_heapam.pl | 192 ++-- > 2 files changed, 388 insertions(+), 11 deletions(-) > > diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c > index c875f3e5a2..007f7b2f37 100644 > --- a/contrib/amcheck/verify_heapam.c > +++ b/contrib/amcheck/verify_heapam.c > @@ -399,6 +399,9 @@ verify_heapam(PG_FUNCTION_ARGS) > for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) > { > OffsetNumber maxoff; > + OffsetNumber predecessor[MaxOffsetNumber] = {0}; > + OffsetNumber successor[MaxOffsetNumber] = {0}; > + boollp_valid[MaxOffsetNumber] = {false}; > > CHECK_FOR_INTERRUPTS(); > > @@ -433,6 +436,8 @@ verify_heapam(PG_FUNCTION_ARGS) > for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; >ctx.offnum = OffsetNumberNext(ctx.offnum)) > { > + OffsetNumber nextoffnum; > + > ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); > > /* Skip over unused/dead line pointers */ > @@ -469,6 +474,13 @@ verify_heapam(PG_FUNCTION_ARGS) > report_corruption(&ctx, > > psprintf("line pointer redirection to unused item at offset %u", > >(unsigned) rdoffnum)); > + > + /* > + * make entry in successor array, redirected > tuple will be > + * validated at the time when we loop over > successor array > + */ > + successor[ctx.offnum] = rdoffnum; > + lp_valid[ctx.offnum] = true; > continue; > } > > @@ -504,9 +516,197 @@ 
verify_heapam(PG_FUNCTION_ARGS) > /* It should be safe to examine the tuple's header, at > least */ > ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, > ctx.itemid); > ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); > + lp_valid[ctx.offnum] = true; > > /* Ok, ready to check this next tuple */ > check_tuple(&ctx); > + > + /* > + * Add the data to the successor array if next updated > tuple is in > + * the same page. It will be used later to generate the > + * predecessor array. > + * > + * We need to access the tuple's header to populate the > + * predecessor array. However the tuple is not > necessarily sanity > + * checked yet so delaying construction of predecessor > array until > + * all tuples are sanity checked. > + */ > + nextoffnum = > ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); > + if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == > ctx.blkno && > + nextoffnum != ctx.offnum) > + { > + successor[ctx.offnum] = nextoffnum; > + } I don't really understand this logic - why can't we populate the predecessor array, if we can construct a successor entry? > + } > + > + /* > + * Loop over offset and populate predecessor array from all > entries > + * that are present in successor array. > + */ > + ctx.attnum = -1; > + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; > + ctx.offnum = OffsetNumberNext(ctx.offnum)) > +
Re: HOT chain validation in verify_heapam()
On Wed, Nov 9, 2022 at 2:08 PM Andres Freund wrote: > To start with: I think this is an extremely helpful and important > feature. Both for checking production systems and for finding problems during > development. +1. It's painful to get this in, in part because we now have to actually decide what the rules really are with total precision, including for all of the tricky edge cases. The exercise of writing this code should "keep us honest" about whether or not we really know what the invariants are, which is more than half the battle. > > + /* > > + * Add a line pointer offset to the predecessor array > > if xmax is > > + * matching with xmin of next tuple (reaching via its > > t_ctid). > > + * Prior to PostgreSQL 9.4, we actually changed the > > xmin to > > + * FrozenTransactionId > > I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood > of getting that right seems low and I don't see us gaining much by even > trying. This is the kind of comment that I'd usually agree with, but I disagree in this instance because of special considerations that apply to amcheck (or should IMV apply, at least). We're living in a world where we have to assume that the pre-9.4 format can occur in the field. If we can't get it right in amcheck, what chance do we have with other new code that tickles the same areas? I think that we need to support obsolescent heapam representations (both FrozenTransactionId and xvac) here on general principle. Besides, why not accept some small chance of getting this wrong? The worst that can happen is that we'll have a relatively benign bug. If we get it wrong then it's a short term problem, but also an opportunity to be less wrong in the future -- including in places where the consequences of being wrong are much more serious. > I doubt it is correct to enter this path with next_xmin == > FrozenTransactionId. 
> This is following a ctid chain that we normally wouldn't
> follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin condition.

We should never see FrozenTransactionId in an xmax field (nor should it be returned by HeapTupleHeaderGetUpdateXid() under any circumstances). We can "freeze xmax" during VACUUM, but that actually means setting xmax to InvalidTransactionId (in rare cases it might mean replacing a Multi with a new Multi). The terminology in this area is a bit tricky.

Anyway, it follows that we cannot expect "next_xmin == FrozenTransactionId", because that would mean that we'd called HeapTupleHeaderGetUpdateXid() which returned FrozenTransactionId -- an impossibility. (Maybe we should be checking that it really is an impossibility by checking the HeapTupleHeaderGetUpdateXid() return value, but that should be enough.)

> I don't immediately see what prevents the frozen tuple being from an entirely
> different HOT chain than the two tuples pointing to it.

In my view it simply isn't possible for a valid HOT chain to be in this state in the first place. So by definition it wouldn't be a HOT chain. That would be a form of corruption, which is something that would probably be detected by noticing orphaned heap-only tuples (heap-only tuples not reachable from some root item on the same page, or some other intermediary heap-only tuple reachable from a root item).

> Can't we have an update chain that is e.g.
> xmin 10, xmax 5 -> xmin 5, xmax invalid
>
> and a vacuum cutoff of 7? That'd prevent the first tuple from being removed,
> but would allow 5 to be frozen.

I don't see how that can be possible. That is contradictory, and cannot possibly work, since it supposes a situation where every possible MVCC snapshot sees the update that generated the second/successor tuple as committed, while at the same time also somehow needing the original tuple to stay in place. Surely both things can never be true at the same time.
I believe you're right that an update chain that looks like this one is possible. However, I don't think it's possible for OldestXmin/FreezeLimit to take on a value like that (i.e. a value that "skewers" the update chain like this, the value 7 from your example). We ought to be able to rely on an OldestXmin value that can never let such a situation emerge. Right? -- Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-09 15:03:39 -0800, Peter Geoghegan wrote:
> > > + /*
> > > + * Add a line pointer offset to the predecessor array if xmax is
> > > + * matching with xmin of next tuple (reaching via its t_ctid).
> > > + * Prior to PostgreSQL 9.4, we actually changed the xmin to
> > > + * FrozenTransactionId
> >
> > I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood
> > of getting that right seems low and I don't see us gaining much by even
> > trying.
>
> This is the kind of comment that I'd usually agree with, but I
> disagree in this instance because of special considerations that apply
> to amcheck (or should IMV apply, at least). We're living in a world
> where we have to assume that the pre-9.4 format can occur in the
> field. If we can't get it right in amcheck, what chance do we have
> with other new code that tickles the same areas? I think that we need
> to support obsolescent heapam representations (both
> FrozenTransactionId and xvac) here on general principle.

To me this is extending the problem into more areas rather than reducing it. I'd have *zero* confidence in any warnings that amcheck issued that involved <9.4 special cases.

We've previously discussed adding a pg_class column tracking the PG version that last scanned the whole relation. We really should get to that one of these decades :(.

> Besides, why not accept some small chance of getting this wrong? The
> worst that can happen is that we'll have a relatively benign bug. If
> we get it wrong then it's a short term problem, but also an
> opportunity to be less wrong in the future -- including in places
> where the consequences of being wrong are much more serious.

I think it doesn't just affect the < 9.4 path, but also makes us implement things differently for >= 9.4. And we lose some accuracy due to that.

> I doubt it is correct to enter this path with next_xmin ==
> FrozenTransactionId.
> > This is following a ctid chain that we normally wouldn't
> > follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin
> > condition.
>
> We should never see FrozenTransactionId in an xmax field (nor should
> it be returned by HeapTupleHeaderGetUpdateXid() under any
> circumstances).

The field we check for FrozenTransactionId in the code I was quoting is the xmin of the follower tuple. We follow the chain if either

cur->xmax == next->xmin or if next->xmin == FrozenTransactionId

What I'm doubting is the FrozenTransactionId path.

> Anyway, it follows that we cannot expect "next_xmin ==
> FrozenTransactionId", because that would mean that we'd called
> HeapTupleHeaderGetUpdateXid() which returned FrozenTransactionId -- an
> impossibility. (Maybe we should be checking that it really is an
> impossibility by checking the HeapTupleHeaderGetUpdateXid() return
> value, but that should be enough.)

next_xmin is acquired via HeapTupleHeaderGetXmin(next_htup), not HeapTupleHeaderGetUpdateXid(cur_tup).

> > I don't immediately see what prevents the frozen tuple being from an entirely
> > different HOT chain than the two tuples pointing to it.
>
> In my view it simply isn't possible for a valid HOT chain to be in
> this state in the first place. So by definition it wouldn't be a HOT
> chain.

We haven't done any visibility checking at this point, and my whole point is that there's no guarantee that the pointed-to tuple actually belongs to the same hot chain, given that we follow as soon as "xmin == FrozenXid". So the pointing tuple might be an orphaned tuple.

> That would be a form of corruption, which is something that
> would probably be detected by noticing orphaned heap-only tuples
> (heap-only tuples not reachable from some root item on the same page,
> or some other intermediary heap-only tuple reachable from a root
> item).
You're saying that there's no way that there's a tuple pointing to another tuple on the same page, with the pointed-to tuple belonging to a different HOT chain? I'm fairly certain that that at least used to be possible, and likely is still possible. Isn't that pretty much what you'd expect to happen if there's concurrent aborts leading to abandoned hot chains?

> > Can't we have an update chain that is e.g.
> > xmin 10, xmax 5 -> xmin 5, xmax invalid
> >
> > and a vacuum cutoff of 7? That'd prevent the first tuple from being removed,
> > but would allow 5 to be frozen.
>
> I don't see how that can be possible. That is contradictory, and
> cannot possibly work, since it supposes a situation where every
> possible MVCC snapshot sees the update that generated the
> second/successor tuple as committed, while at the same time also
> somehow needing the original tuple to stay in place. Surely both
> things can never be true at the same time.

The xmin horizon is very coarse grained. Just because it is 7 doesn't mean that xid 10 is still running. All it means that one backend or slot has an xmin or xid of 7.
Re: HOT chain validation in verify_heapam()
On Wed, Nov 9, 2022 at 4:15 PM Andres Freund wrote:
> To me this is extending the problem into more areas rather than reducing
> it. I'd have *zero* confidence in any warnings that amcheck issued that
> involved <9.4 special cases.

Maybe you would at first. But then we get to learn what mistake we made. And then we get to fix the bug, and get to know better next time around. Next time (or maybe the time after that) you really will have confidence in amcheck, because it'll have been battle tested at that point. For something like this that seems like the best way to go. Either we support an on-disk format that includes legacy representations, or we don't.

> I think it doesn't just affect the < 9.4 path, but also makes us implement
> things differently for >= 9.4. And we lose some accuracy due to that.

I don't follow. How so?

> The field we check for FrozenTransactionId in the code I was quoting is the
> xmin of the follower tuple. We follow the chain if either
> cur->xmax == next->xmin or if next->xmin == FrozenTransactionId
>
> What I'm doubting is the FrozenTransactionId path.

AFAICT we shouldn't be treating it as part of the same HOT chain. Only the first heap-only tuple in a valid HOT chain should have an xmin that is FrozenTransactionId (and only with an LP_REDIRECT root item, I think). Otherwise the "prev_xmax==xmin" HOT chain traversal logic used in places like heap_hot_search_buffer() simply won't work.

> > In my view it simply isn't possible for a valid HOT chain to be in
> > this state in the first place. So by definition it wouldn't be a HOT
> > chain.
>
> We haven't done any visibility checking at this point and my whole point is
> that there's no guarantee that the pointed-to tuple actually belongs to the
> same hot chain, given that we follow as soon as "xmin == FrozenXid". So the
> pointing tuple might be an orphaned tuple.

But an orphaned heap-only tuple shouldn't ever have xmin==FrozenTransactionId to begin with.
The only valid source of orphaned heap-only tuples is transaction aborts. Aborted XID xmin fields are never frozen.

> > That would be a form of corruption, which is something that
> > would probably be detected by noticing orphaned heap-only tuples
> > (heap-only tuples not reachable from some root item on the same page,
> > or some other intermediary heap-only tuple reachable from a root
> > item).
>
> You're saying that there's no way that there's a tuple pointing to another
> tuple on the same page, with the pointed-to tuple belonging to a different
> HOT chain?

Define "belongs to a different HOT chain". You can get orphaned heap-only tuples, obviously. But only due to transaction abort. Any page with an orphaned heap-only tuple that is not consistent with it being from an earlier abort is a corrupt heap page.

> > > Can't we have an update chain that is e.g.
> > > xmin 10, xmax 5 -> xmin 5, xmax invalid
> > >
> > > and a vacuum cutoff of 7? That'd prevent the first tuple from being
> > > removed, but would allow 5 to be frozen.
> >
> > I don't see how that can be possible. That is contradictory, and
> > cannot possibly work, since it supposes a situation where every
> > possible MVCC snapshot sees the update that generated the
> > second/successor tuple as committed, while at the same time also
> > somehow needing the original tuple to stay in place. Surely both
> > things can never be true at the same time.
>
> The xmin horizon is very coarse grained. Just because it is 7 doesn't mean
> that xid 10 is still running. All it means is that one backend or slot has an
> xmin or xid of 7.

Of course that's true. But I wasn't talking about the general case -- I was talking about your "xmin 10, xmax 5 -> xmin 5, xmax invalid" update chain case specifically, with its "skewered" OldestXmin of 7.
> s1: acquire xid 5
> s2: acquire xid 7
> s3: acquire xid 10
>
> s3: insert
> s3: commit
> s1: update
> s1: commit
>
> s2: get a new snapshot, xmin 7 (or just hold no snapshot)
>
> At this point the xmin horizon is 7. The first tuple's xmin can't be
> frozen. The second tuple's xmin can be.

Basically what I'm saying about OldestXmin is that it ought to "work transitively", from the updater to the inserter that inserted the now-updated tuple. That is, the OldestXmin should either count both XIDs that appear in the update chain, or neither XID.

> > I believe you're right that an update chain that looks like this one
> > is possible. However, I don't think it's possible for
> > OldestXmin/FreezeLimit to take on a value like that (i.e. a value that
> > "skewers" the update chain like this, the value 7 from your example).
> > We ought to be able to rely on an OldestXmin value that can never let
> > such a situation emerge. Right?
>
> I don't see anything that'd guarantee that currently, nor do I immediately
> see a possible way to get there.
>
> What do you think prevents such an OldestXmin?

ComputeXidHorizons() computes VACUUM's OldestXmin (actually it computes h->data_oldest_nonremovable values) by scanning the proc array. And counts PGPROC.xmin from each running xact. So ultimately the inserter and updater are tied together by that.
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-09 17:32:46 -0800, Peter Geoghegan wrote:
> > The xmin horizon is very coarse grained. Just because it is 7 doesn't mean
> > that xid 10 is still running. All it means is that one backend or slot has
> > an xmin or xid of 7.
>
> Of course that's true. But I wasn't talking about the general case --
> I was talking about your "xmin 10, xmax 5 -> xmin 5, xmax invalid"
> update chain case specifically, with its "skewered" OldestXmin of 7.

The sequence below produces such an OldestXmin:

> > s1: acquire xid 5
> > s2: acquire xid 7
> > s3: acquire xid 10
> >
> > s3: insert
> > s3: commit
> > s1: update
> > s1: commit
> >
> > s2: get a new snapshot, xmin 7 (or just hold no snapshot)
> >
> > At this point the xmin horizon is 7. The first tuple's xmin can't be
> > frozen. The second tuple's xmin can be.
>
> Basically what I'm saying about OldestXmin is that it ought to "work
> transitively", from the updater to the inserter that inserted the
> now-updated tuple. That is, the OldestXmin should either count both
> XIDs that appear in the update chain, or neither XID.

It doesn't work that way. The above sequence shows one case where it doesn't.

> > > I believe you're right that an update chain that looks like this one
> > > is possible. However, I don't think it's possible for
> > > OldestXmin/FreezeLimit to take on a value like that (i.e. a value that
> > > "skewers" the update chain like this, the value 7 from your example).
> > > We ought to be able to rely on an OldestXmin value that can never let
> > > such a situation emerge. Right?
> >
> > I don't see anything that'd guarantee that currently, nor do I immediately
> > see a possible way to get there.
> >
> > What do you think prevents such an OldestXmin?
>
> ComputeXidHorizons() computes VACUUM's OldestXmin (actually it
> computes h->data_oldest_nonremovable values) by scanning the proc
> array. And counts PGPROC.xmin from each running xact. So ultimately
> the inserter and updater are tied together by that.
> It's either an OldestXmin that includes both, or one that includes neither.
>
> Here are some facts that I think we both agree on already:
>
> 1. It is definitely possible to have an update chain like your "xmin
> 10, xmax 5 -> xmin 5, xmax invalid" example.
>
> 2. It is definitely not possible to "freeze xmax" by setting its value
> to FrozenTransactionId or something similar -- there is simply no code
> path that can do that, and never has been. (The term "freeze xmax" is
> a bit ambiguous, though it usually means set xmax to
> InvalidTransactionId.)
>
> 3. There is no specific reason to believe that there is a live bug here.

I don't think there's a live bug here. I think the patch isn't dealing correctly with that issue though.

> Putting all 3 together: doesn't it seem quite likely that the way that
> we compute OldestXmin is the factor that prevents "skewering" of an
> update chain? What else could possibly be preventing corruption here?
> (Theoretically it might never have been discovered, but that seems
> pretty hard to believe.)

I don't see how that follows. The existing code is just ok with that. In fact we have explicit code trying to exploit this:

    /*
     * If the DEAD tuple is at the end of the chain, the entire chain is
     * dead and the root line pointer can be marked dead. Otherwise just
     * redirect the root to the correct chain member.
     */
    if (i >= nchain)
        heap_prune_record_dead(prstate, rootoffnum);
    else
        heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
Hi,

And thinking about it, it'd be quite bad if the horizon worked that way. You can easily construct a workload where every single xid would "skewer" some chain, never allowing the horizon to be raised.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: HOT chain validation in verify_heapam()
On Wed, Nov 9, 2022 at 5:46 PM Andres Freund wrote:
> > Putting all 3 together: doesn't it seem quite likely that the way that
> > we compute OldestXmin is the factor that prevents "skewering" of an
> > update chain? What else could possibly be preventing corruption here?
> > (Theoretically it might never have been discovered, but that seems
> > pretty hard to believe.)
>
> I don't see how that follows. The existing code is just ok with that.

My remarks about "3 facts we agree on" were not intended to be a watertight argument. More like: what else could it possibly be that prevents problems in practice, if not *something* to do with how we compute OldestXmin? Leaving aside the specifics of how OldestXmin is computed for a moment: what alternative explanation is even remotely plausible? There just aren't that many moving parts involved here.

The idea that we can ever freeze the xmin of a successor tuple/version from an update chain without also pruning away earlier versions of the same chain is wildly implausible. It sounds totally contradictory.

> In fact we have explicit code trying to exploit this:
>
>     /*
>      * If the DEAD tuple is at the end of the chain, the entire chain is
>      * dead and the root line pointer can be marked dead. Otherwise just
>      * redirect the root to the correct chain member.
>      */
>     if (i >= nchain)
>         heap_prune_record_dead(prstate, rootoffnum);
>     else
>         heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);

I don't see why this code is relevant.

--
Peter Geoghegan
Re: HOT chain validation in verify_heapam()
On Wed, Nov 9, 2022 at 6:10 PM Andres Freund wrote:
> And thinking about it, it'd be quite bad if the horizon worked that way. You
> can easily construct a workload where every single xid would "skewer" some
> chain, never allowing the horizon to be raised.

Your whole scenario is one involving an insert of a tuple by XID 10, which is then updated by XID 5 -- a lower XID. Obviously that's possible, but it's relatively rare. I have to imagine that the vast majority of updates affect tuples inserted by an XID before the XID of the updater.

My use of the term "skewer" was limited to updates that look like that. So I don't know what you mean about never allowing the horizon to be raised.

--
Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-09 18:35:12 -0800, Peter Geoghegan wrote:
> On Wed, Nov 9, 2022 at 6:10 PM Andres Freund wrote:
> > And thinking about it, it'd be quite bad if the horizon worked that way.
> > You can easily construct a workload where every single xid would "skewer"
> > some chain, never allowing the horizon to be raised.
>
> Your whole scenario is one involving an insert of a tuple by XID 10,
> which is then updated by XID 5 -- a lower XID. Obviously that's
> possible, but it's relatively rare. I have to imagine that the vast
> majority of updates affect tuples inserted by an XID before the XID of
> the updater.
>
> My use of the term "skewer" was limited to updates that look like
> that. So I don't know what you mean about never allowing the horizon
> to be raised.

You don't need it to happen all the time, it's enough when it happens occasionally, since that'd "block" the whole range of xids between. So you'd just need occasional transactions to prevent the horizon from increasing.

Anyway, I played a bit around with this. It's hard to hit, not because we somehow won't choose such a horizon, but because we'll commonly prune the earlier tuple version away due to xmax being old enough. It *is* possible to hit, if the horizon increases between the two tuple version checks (e.g. if there's another tuple in between that we check the visibility of). I think there's another way it can happen in older clusters, but I don't want to spend the time to verify it.

Either way, we can't error out in this situation - there's nothing invalid about it.

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Mon, Nov 14, 2022 at 9:38 AM Andres Freund wrote:
> Anyway, I played a bit around with this. It's hard to hit, not because we
> somehow won't choose such a horizon, but because we'll commonly prune the
> earlier tuple version away due to xmax being old enough.

That must be a bug, then. Since, as I said, I can't see how it could possibly be okay to freeze an xmin of a tuple in a HOT chain without also making sure that it has no earlier versions left behind. If there are earlier versions that we have to go through to get to the frozen-xmin tuple (not just an LP_REDIRECT), we're going to break the HOT chain traversal logic in code like heap_hot_search_buffer() in a rather obvious way.

HOT chain traversal code will interpret the frozen xmin from the tuple as FrozenTransactionId (not as its raw xmin). So traversal is just broken in this scenario.

> It *is* possible to
> hit, if the horizon increases between the two tuple version checks (e.g. if
> there's another tuple in between that we check the visibility of).

I suppose that that's the detail that "protects" us, then -- that would explain the apparent lack of problems in the field. Your sequence requires 3 sessions, not just 2.

--
Peter Geoghegan
Re: HOT chain validation in verify_heapam()
On Wed, Nov 9, 2022 at 5:08 PM Andres Freund wrote:
> I don't really understand this logic - why can't we populate the predecessor
> array, if we can construct a successor entry?

This whole thing was my idea, so let me try to explain. I think the naming and comments need work, but I believe the fundamental idea may be sound.

successor[x] = y means that when we looked at line pointer x, we saw that it was either a redirect to line pointer y, or else it had storage and the associated tuple's CTID pointed to line pointer y. At this point, we do not have any idea whether y is at all sane, nor do we know anything about which of x and y is larger. Furthermore, it is possible that successor[x] = successor[x'] since the page might be corrupted and we haven't checked otherwise.

predecessor[y] = x means that successor[x] = y but in addition we've checked that y is sane, and that x.xmax = y.xmin. If there are multiple tuples for which these conditions hold, we've issued complaints about all but one and entered the last into the predecessor array.

An earlier version of the algorithm had only a predecessor[] array, but the code to populate it in a single pass was complex and looked ugly and error-prone. To set a predecessor entry in one step, we had to sanity-check each of x and y, but only if that hadn't yet been done, which was quite awkward. For example, imagine line pointers 1 and 2 both point to 3, and line pointer 3 points backward to line pointer 1 (because of corruption, since it shouldn't ever be circular). We can't reason about the relationship between 1 and 3 without first making sure that each one is sane in isolation. But if we do that when we're at line pointer 1, then when we get to 2, we need to check 2 but don't need to recheck 3, and when we get to 3 we need to recheck neither 3 nor 1. This algorithm lets us run through and do all the basic sanity checks first, while populating the successor array, and then check relationships in later stages.
Part of the motivation here is also driven by trying to figure out how to word the complaints. We have a dedicated field in the amcheck report that can hold one tuple offset or the other, but if we're checking the relationships between tuples, what do we put there? I feel it will be easiest to understand if we put the offset of the older tuple in that field and then phrase the complaint as the patch does, e.g.:

    tuple with uncommitted xmin %u was updated to produce a tuple at offset %u with differing xmin %u

We could flip that around and put the newer tuple offset in the field and then phrase the complaint the other way around, but it seems a bit awkward, e.g.:

    tuple with uncommitted xmin %u at offset %u was updated to produce this tuple with differing xmin %u

I think if we did do it that way around (and figured out how to phrase the messages) we might not need both arrays any more (though I'm not positive about that). It seems hard to avoid needing at least one, else you can't explicitly notice two converging HOT chains, which seems like a case we probably ought to notice. But the "to produce this tuple" phrasing is just confusing, I think, and removing "this" doesn't help. You need to somehow get people to understand that the offset they probably saw in another field is the second tuple, not the first one. Maybe:

    xmin %u does not match xmax %u of prior tuple at offset %u

Hmm. Anyway, whether it was the right idea or not, the desire to have the earlier tuple be the focus of the error messages was part of the motivation here.

> I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood
> of getting that right seems low and I don't see us gaining much by even
> trying.

I agree with Peter. We have to try to get that case right. If we can eventually eliminate it as a valid case by some mechanism, hooray. But in the meantime we have to deal with it as best we can.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-14 09:50:48 -0800, Peter Geoghegan wrote:
> On Mon, Nov 14, 2022 at 9:38 AM Andres Freund wrote:
> > Anyway, I played a bit around with this. It's hard to hit, not because we
> > somehow won't choose such a horizon, but because we'll commonly prune the
> > earlier tuple version away due to xmax being old enough.
>
> That must be a bug, then. Since, as I said, I can't see how it could
> possibly be okay to freeze an xmin of a tuple in a HOT chain without
> also making sure that it has no earlier versions left behind.

Hard to imagine us having bugs in this code. Ahem. I really wish I knew of a reasonably complex way to utilize coverage guided fuzzing on heap pruning / vacuuming.

I wonder if we ought to add an error check to heap_prepare_freeze_tuple() against this scenario. We're working towards being more aggressive around freezing, which will make it more likely to hit corner cases around this.

> If there are earlier versions that we have to go through to get to the
> frozen-xmin tuple (not just an LP_REDIRECT), we're going to break the HOT
> chain traversal logic in code like heap_hot_search_buffer in a rather
> obvious way.
>
> HOT chain traversal logic code will interpret the frozen xmin from the
> tuple as FrozenTransactionId (not as its raw xmin). So traversal is
> just broken in this scenario.

Which'd still be fine if the whole chain were already "fully dead". One way I think this can happen is <= PG 13's HEAPTUPLE_DEAD handling in lazy_scan_heap().

I now suspect that the seemingly-odd "We will advance past RECENTLY_DEAD tuples just in case there's a DEAD one after them;" logic in heap_prune_chain() might be required for correctness. Which IIRC we'd been talking about getting rid of elsewhere?

At least as long as correctness requires not ending up in endless loops - indeed. We end up with lazy_scan_prune() endlessly retrying. Without a chance to interrupt. Shouldn't there at least be a CFI somewhere?
The attached isolationtester spec has a commented out test for this.

I think the problem partially is that the proposed verify_heapam() code is too "aggressive" considering things to be part of the same hot chain - which then means we have to be very careful about erroring out.

The attached isolationtester test triggers:
  "unfrozen tuple was updated to produce a tuple at offset %u which is frozen"
  "updated version at offset 3 is also the updated version of tuple at offset %u"

Despite there afaict not being any corruption. Worth noting that this happens regardless of hot/non-hot updates being used (uncomment s3ci to see).

> > It *is* possible to
> > hit, if the horizon increases between the two tuple version checks (e.g. if
> > there's another tuple in between that we check the visibility of).
>
> I suppose that that's the detail that "protects" us, then -- that
> would explain the apparent lack of problems in the field. Your
> sequence requires 3 sessions, not just 2.

One important protection right now is that vacuumlazy.c uses a more pessimistic horizon than pruneheap.c. Even if visibility determinations within pruning recompute the horizon, vacuumlazy.c won't freeze based on the advanced horizon. I don't quite know where we'd best put a comment with a warning about this fact.
Greetings,

Andres Freund

diff --git c/src/test/isolation/expected/skewer.out i/src/test/isolation/expected/skewer.out
new file mode 100644
index 000..e69de29bb2d
diff --git c/src/test/isolation/specs/skewer.spec i/src/test/isolation/specs/skewer.spec
new file mode 100644
index 000..f9fa6ea8aa1
--- /dev/null
+++ i/src/test/isolation/specs/skewer.spec
@@ -0,0 +1,74 @@
+setup
+{
+DROP TABLE IF EXISTS t;
+DROP EXTENSION IF EXISTS amcheck;
+DROP EXTENSION IF EXISTS pageinspect;
+
+CREATE EXTENSION amcheck;
+CREATE EXTENSION pageinspect;
+CREATE TABLE t(key int, d int);
+}
+
+session s1
+step s1b {BEGIN; SELECT txid_current();}
+step s1u1 {UPDATE t SET d = d + 1 WHERE key = 1; }
+step s1i3 {INSERT INTO t VALUES (3, 1);}
+step s1c {COMMIT;}
+
+session s2
+step s2b {BEGIN; SELECT txid_current();}
+step s2c {COMMIT;}
+
+session s3
+step s3b {BEGIN; SELECT txid_current();}
+step s3i1 {INSERT INTO t VALUES (1, 1);}
+step s3i2 {INSERT INTO t VALUES (2, 1);}
+step s3c {COMMIT;}
+step s3a {ABORT;}
+step s3u1 {UPDATE t SET d = d + 1 WHERE key = 1; }
+step s3u2 {UPDATE t SET d = d + 1 WHERE key = 2; }
+
+step s3ci { CREATE INDEX ON t(d)}
+
+session s4
+step s4show { SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, mask.raw_flags, mask.combined_flags, t_hoff, t_bits FROM heap_page_items(get_raw_page('t', 0)), heap_tuple_infomask_flags(t_infomask, t_infomask2) AS mask;}
+step s4verify { SELECT * FROM verify_heapam('t'); SELECT pg_backend_pid();}
+step s4vac { VACUUM (VERBOSE) t; }
+step s4vacfreeze { VACUUM (VERBOSE, FREEZE) t; }
+
+session s5
+step s5pin { BEGIN; DECLARE foo CURSOR FOR SELECT * FROM t; FETCH FROM foo;}
+
Re: HOT chain validation in verify_heapam()
On Mon, Nov 14, 2022 at 11:28 AM Robert Haas wrote:
> Part of the motivation here is also driven by trying to figure out how
> to word the complaints. We have a dedicated field in the amcheck report that
> can hold one tuple offset or the other, but if we're checking the
> relationships between tuples, what do we put there? I feel it will be
> easiest to understand if we put the offset of the older tuple in that
> field and then phrase the complaint as the patch does, e.g.:

That makes a lot of sense to me, and reminds me of how things work in verify_nbtree.c.

At a high level verify_nbtree.c works by doing a breadth-first traversal of the tree. The search makes each distinct page the "target page" exactly once. The target page is the clear focal point for everything -- almost every complaint about corruption frames the problem as a problem in the target page. We consistently describe things in terms of their relationship with the target page, so under this scheme everybody is...on the same page (ahem).

Being very deliberate about that probably had some small downsides. Maybe it would have made a little more sense to word certain particular corruption report messages in a way that placed blame on "ancillary" pages like sibling/child pages (not the target page), as problems in the ancillary page itself, not the target page. This still seems like the right trade-off -- the control flow can be broken up into understandable parts once you understand that the target page is the thing that we use to describe every other page.

> > I'm doubtful it's a good idea to try to validate the 9.4 case. The
> > likelihood of getting that right seems low and I don't see us gaining much
> > by even trying.
>
> I agree with Peter. We have to try to get that case right. If we can
> eventually eliminate it as a valid case by some mechanism, hooray. But
> in the meantime we have to deal with it as best we can.

Practiced intellectual humility seems like the way to go here.
On some level I suspect that we'll have problems in exactly the places that we don't look for them.

--
Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-14 14:27:54 -0500, Robert Haas wrote:
> On Wed, Nov 9, 2022 at 5:08 PM Andres Freund wrote:
> > I don't really understand this logic - why can't we populate the
> > predecessor array, if we can construct a successor entry?
>
> This whole thing was my idea, so let me try to explain. I think the
> naming and comments need work, but I believe the fundamental idea may
> be sound.
>
> successor[x] = y means that when we looked at line pointer x, we saw
> that it was either a redirect to line pointer y, or else it had
> storage and the associated tuple's CTID pointed to line pointer y.
> At this point, we do not have any idea whether y is at all sane, nor do we
> know anything about which of x and y is larger.

What do you mean by "larger" here?

> Furthermore, it is
> possible that successor[x] = successor[x'] since the page might be corrupted
> and we haven't checked otherwise.
>
> predecessor[y] = x means that successor[x] = y but in addition we've
> checked that y is sane, and that x.xmax = y.xmin. If there are multiple
> tuples for which these conditions hold, we've issued complaints about
> all but one and entered the last into the predecessor array.

As shown by the isolationtester test I just posted, this doesn't quite work right now. Probably fixable.

I don't think we can follow non-HOT ctid chains if they're older than the xmin horizon, including all cases of xmin being frozen. There's just nothing guaranteeing that the tuples are actually "related".

It seems like we should do a bit more validation within a chain of tuples. E.g. that no live tuple can follow a !DidCommit xmin?

> > I'm doubtful it's a good idea to try to validate the 9.4 case. The
> > likelihood of getting that right seems low and I don't see us gaining much
> > by even trying.
>
> I agree with Peter. We have to try to get that case right. If we can
> eventually eliminate it as a valid case by some mechanism, hooray.
> But in the meantime we have to deal with it as best we can.

I now think that the 9.4 specific reasoning is bogus in the first place. The patch says:

    * Add a line pointer offset to the predecessor array if xmax is
    * matching with xmin of next tuple (reaching via its t_ctid).
    * Prior to PostgreSQL 9.4, we actually changed the xmin to
    * FrozenTransactionId so we must add offset to predecessor
    * array(irrespective of xmax-xmin matching) if updated tuple xmin
    * is frozen, so that we can later do validation related to frozen
    * xmin. Raise corruption if we have two tuples having the same
    * predecessor.

but it's simply not correct to iterate through xmin == FrozenTransactionId - as shown in the isolationtester test. And that's unrelated to 9.4, because we couldn't rely on the raw xmin value either: even if they match, they could be from different epochs.

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Mon, Nov 14, 2022 at 12:58 PM Andres Freund wrote:
> I wonder if we ought to add an error check to heap_prepare_freeze_tuple()
> against this scenario. We're working towards being more aggressive around
> freezing, which will make it more likely to hit corner cases around this.

In theory my work on freezing doesn't change the basic rules about how freezing works, and doesn't know anything about HOT, so it shouldn't introduce any new risk. Even still, I agree that this seems like something to do in the scope of the same work, just in case. Plus it's just important.

It would be possible to have exhaustive heap_prepare_freeze_tuple checks in assert-only builds -- we can exhaustively check the final array of prepared freeze plans that we collected for a given heap page, and check it against the page exhaustively right before freezing is executed. That's not perfect, but it would be a big improvement. Right now I am not entirely sure what I would need to check in such a mechanism. I am legitimately unsure of what the rules are in light of this new information.

> Which'd still be fine if the whole chain were already "fully dead". One way I
> think this can happen is <= PG 13's HEAPTUPLE_DEAD handling in
> lazy_scan_heap().

You mean the tupgone thing? Perhaps it would have avoided this particular problem, or one like it. But it had so many other problems that I don't see why it matters now.

> At least as long as correctness requires not ending up in endless loops -
> indeed. We end up with lazy_scan_prune() endlessly retrying. Without a chance
> to interrupt. Shouldn't there at least be a CFI somewhere?

Probably, but that wouldn't change the fact that it's a bug when this happens. Obviously it's more important to avoid such a bug than it is to ameliorate it.
> I think the problem partially is that the proposed verify_heapam() code is
> too "aggressive" considering things to be part of the same hot chain - which
> then means we have to be very careful about erroring out.
>
> The attached isolationtester test triggers:
> "unfrozen tuple was updated to produce a tuple at offset %u which is frozen"
> "updated version at offset 3 is also the updated version of tuple at offset %u"
>
> Despite there afaict not being any corruption. Worth noting that this happens
> regardless of hot/non-hot updates being used (uncomment s3ci to see).

Why don't you think that there is corruption? The terminology here is tricky. It's possible that the amcheck patch makes a very good point here, even without necessarily complaining about a state that leads to obviously wrong behavior. It's also possible that there really is wrong behavior, at least in my mind -- I don't know what your remarks about no corruption are really based on.

I feel like I'm repeating myself more than I should, but: why isn't it as simple as "HOT chain traversal logic is broken by frozen xmin in the obvious way, therefore all bets are off"? Maybe you're right about the proposed new functionality getting things wrong with your adversarial isolation test, but I seem to have missed the underlying argument. Are you just talking about regular update chains here, not HOT chains? Something else?

> One important protection right now is that vacuumlazy.c uses a more
> pessimistic horizon than pruneheap.c. Even if visibility determinations
> within pruning recompute the horizon, vacuumlazy.c won't freeze based on the
> advanced horizon. I don't quite know where we'd best put a comment with a
> warning about this fact.

We already have comments discussing the relationship between OldestXmin and vistest (as well as rel_pages) in heap_vacuum_rel(). That seems like the obvious place to put something like this, at least to me.

--
Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-14 14:13:10 -0800, Peter Geoghegan wrote:
> > I think the problem partially is that the proposed verify_heapam() code is
> > too "aggressive" considering things to be part of the same hot chain -
> > which then means we have to be very careful about erroring out.
> >
> > The attached isolationtester test triggers:
> > "unfrozen tuple was updated to produce a tuple at offset %u which is frozen"
> > "updated version at offset 3 is also the updated version of tuple at offset %u"
> >
> > Despite there afaict not being any corruption. Worth noting that this
> > happens regardless of hot/non-hot updates being used (uncomment s3ci to
> > see).
>
> Why don't you think that there is corruption?

I looked at the state after the test and the complaint is bogus. It's caused by the patch ignoring the cur->xmax == next->xmin condition if next->xmin is FrozenTransactionId. The isolationtester test creates a situation where that leads to verify_heapam() considering tuples to be part of the same chain even though they aren't.

> Because I feel like I'm repeating myself more than I should, but: why isn't
> it as simple as "HOT chain traversal logic is broken by frozen xmin in the
> obvious way, therefore all bets are off"?

Because that's irrelevant for the testcase and a good number of my concerns.

> Maybe you're right about the proposed new functionality getting things wrong
> with your adversarial isolation test, but I seem to have missed the
> underlying argument. Are you just talking about regular update chains here,
> not HOT chains? Something else?

As I noted, it happens regardless of HOT being used or not. The tuples aren't part of the same chain, but the patch treats them as if they were. The reason the patch considers them to be part of the same chain is precisely the FrozenTransactionId condition I was worried about.
Just because t_ctid points to a tuple on the same page and the next tuple has xmin == FrozenTransactionId, doesn't mean they're part of the same chain. Once you encounter a tuple with a frozen xmin you simply cannot assume it's part of the chain you've been following.

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Mon, Nov 14, 2022 at 2:33 PM Andres Freund wrote: > > Why don't you think that there is corruption? > > I looked at the state after the test and the complaint is bogus. It's caused > by the patch ignoring the cur->xmax == next->xmin condition if next->xmin is > FrozenTransactionId. The isolationtester test creates a situation where that > leads to verify_heapam() considering tuples to be part of the same chain even > though they aren't. Having looked at your isolation test in more detail, it seems like you were complaining about a fairly specific and uncontroversial shortcoming in the patch itself: it complains about a newly inserted tuple that gets frozen. It thinks that the inserted tuple is part of the same HOT chain (or at least the same update chain) as other tuples on the same heap page, when in fact it's just some wholly unrelated tuple/logical row. It seems as if the new amcheck code doesn't get all the details of validating HOT chains right, and so jumps the gun here, reporting corruption based on a faulty assumption that the frozen-xmin tuple is in any way related to the chain. I was confused about whether we were talking about this patch, bugs in HEAD, or both. > > Maybe you're right about the proposed new functionality getting things wrong > > with your adversarial isolation test, but I seem to have missed the > > underlying argument. Are you just talking about regular update chains here, > > not HOT chains? Something else? > > As I noted, it happens regardless of HOT being used or not. The tuples aren't > part of the same chain, but the patch treats them as if they were. The reason > the patch considers them to be part of the same chain is precisely the > FrozenTransactionId condition I was worried about. Just because t_ctid points > to a tuple on the same page and the next tuple has xmin == > FrozenTransactionId, doesn't mean they're part of the same chain. 
Once you > encounter a tuple with a frozen xmin you simply cannot assume it's part of the > chain you've been following. Got it. That seems relatively straightforward and uncontroversial to me, because it's just how code like heap_hot_search_buffer (HOT chain traversal code) works already. The patch got some of those details wrong, and should be revised. What does this have to tell us, if anything, about the implications for code on HEAD? I don't see any connection between this problem and the possibility of a live bug on HEAD involving freezing later tuple versions in a HOT chain, leaving earlier non-frozen versions behind to break HOT chain traversal code. Should I have noticed such a connection? -- Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi, On 2022-11-14 14:42:16 -0800, Peter Geoghegan wrote: > What does this have to tell us, if anything, about the implications > for code on HEAD? Nothing, really: the test I sent (*) was meant to advance the discussion about the patch being wrong as-is in a concrete way. This logic was one of my main complaints in https://postgr.es/m/20221109220803.t25sosmfvkeglhy4%40awork3.anarazel.de and you went in a very different direction in your reply. Hence a test showcasing the issue. Note that neither of my complaints around FrozenTransactionId in that email actually requires that HOT is involved. The code in the patch doesn't differentiate between hot and not-hot until later. > I don't see any connection between this problem and the possibility of a > live bug on HEAD involving freezing later tuple versions in a HOT chain, > leaving earlier non-frozen versions behind to break HOT chain traversal > code. Should I have noticed such a connection? No. Greetings, Andres Freund (*) the commented-out test perhaps is an argument for expanding the comment around "We will advance past RECENTLY_DEAD tuples just in case there's a DEAD one after them;" in heap_prune_chain()
Re: HOT chain validation in verify_heapam()
On Mon, Nov 14, 2022 at 2:58 PM Andres Freund wrote: > On 2022-11-14 14:42:16 -0800, Peter Geoghegan wrote: > > What does this have to tell us, if anything, about the implications > > for code on HEAD? > > Nothing really test I sent (*) - I wanted to advance the discussion about the > patch being wrong as-is in a concrete way. Got it. > This logic was one of my main complaints in > https://postgr.es/m/20221109220803.t25sosmfvkeglhy4%40awork3.anarazel.de > and you went in a very different direction in your reply. Hence a test > showcasing the issue. I guess I was also confused by the fact that you called it "skewer.diff", which is terminology I invented to describe the scary HOT chain freezing bug. You probably started out writing an isolation test to do something like that, but then repurposed it to show a bug in the patch. Anyway, never mind, I understand you now. > Note that neither of my complaints around FrozenTransactionId in that email > actually require that HOT is involved. The code in the patch doesn't > differentiate between hot and not-hot until later. I understand. I mentioned HOT only because it's more obviously not okay with HOT -- you can point to the precise code that is broken quite easily (index scans break when HOT chain traversal gives wrong answers, in the problem scenario of freezing HOT chains in the wrong place). I'd really like to know if the scary HOT chain freezing scenario is possible, for the very obvious reason. Have you tried to write a test case for that? -- Peter Geoghegan
Re: HOT chain validation in verify_heapam()
On Mon, Nov 14, 2022 at 5:02 PM Andres Freund wrote: > On 2022-11-14 14:27:54 -0500, Robert Haas wrote: > > On Wed, Nov 9, 2022 at 5:08 PM Andres Freund wrote: > > > I don't really understand this logic - why can't we populate the > > > predecessor > > > array, if we can construct a successor entry? > > > > This whole thing was my idea, so let me try to explain. I think the > > naming and comments need work, but I believe the fundamental idea may > > be sound. > > > > successor[x] = y means that when we looked at line pointer x, we saw > > that it was either a redirect to line pointer y, or else it had > > storage and the associated tuple's CTID pointed to line pointer y. > > > At this point, we do not have any idea whether y is at all sane, nor we do > > we know anything about which of x and y is larger. > > What do you mean with "larger" here? Numerically bigger. As in, a redirect line pointer or CTID will most commonly point to a tuple that appears later in the line pointer array, because we assign offset numbers in ascending order. But we also reuse line pointers, so it's possible for a redirect line pointer or CTID to point backwards to a lower-number offset. > > Furthermore, it is > > possible that successor[x] = successor[x'] since the page might be corrupted > > and we haven't checked otherwise. > > > > predecessor[y] = x means that successor[x] = y but in addition we've > > checked that y is sane, and that x.xmax=y.xmin. If there are multiple > > tuples for which these conditions hold, we've issued complaints about > > all but one and entered the last into the predecessor array. > > As shown by the isolationtester test I just posted, this doesn't quite work > right now. Probably fixable. > > I don't think we can follow non-HOT ctid chains if they're older than the xmin > horizon, including all cases of xmin being frozen. There's just nothing > guaranteeing that the tuples are actually "related". Yeah, glad you caught that. 
I think it's clearly wrong to regard A and B as related if B.xmin is frozen. It doesn't matter whether it's "old-style" frozen where we actually put 2 into the xmin field, or new-style frozen where we set hint bits. Because, even if it's new-style frozen and the value actually stored in B.xmin is numerically equal to A.xmax, it could be from a different epoch. If it's from the same epoch, then something is corrupted, because when we froze B we should have pruned A. But we have no way of knowing whether that's the case, and shouldn't assume corruption. > It seems like we should do a bit more validation within a chain of > tuples. E.g. that no live tuple can follow an !DidCommit xmin? I think this check is already present in stronger form. If we see a !DidCommit xmin, the xmin of the next tuple in the chain not only can't be committed, but had better be the same. See "tuple with uncommitted xmin %u was updated to produce a tuple at offset %u with differing xmin %u". > I now think that the 9.4 specific reasoning is bogus in the first place. The > patch says: > > * Add a line pointer offset to the predecessor array > if xmax is > * matching with xmin of next tuple (reaching via its > t_ctid). > * Prior to PostgreSQL 9.4, we actually changed the > xmin to > * FrozenTransactionId so we must add offset to > predecessor > * array(irrespective of xmax-xmin matching) if > updated tuple xmin > * is frozen, so that we can later do validation > related to frozen > * xmin. Raise corruption if we have two tuples > having the same > * predecessor. > > but it's simply not correct to iterate through xmin=FrozenTransactionId - as > shown in the isolationtester test. And that's unrelated to 9.4, because we > couldn't rely on the raw xmin value either, because even if they match, they > could be from different epochs. I agree completely. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi, On 2022-11-15 11:36:21 -0500, Robert Haas wrote: > On Mon, Nov 14, 2022 at 5:02 PM Andres Freund wrote: > > It seems like we should do a bit more validation within a chain of > > tuples. E.g. that no live tuple can follow an !DidCommit xmin? > > I think this check is already present in stronger form. If we see a > !DidCommit xmin, the xmin of the next tuple in the chain not only can't be > committed, but had better be the same. As I think I mentioned before, I don't think the "better be the same" aspect is correct, think subxacts. E.g. off 0: xmin: top, xmax: child_1 off 1: xmin: child_1, xmax: invalid If top hasn't committed yet, the current logic afaict will warn about this situation, no? And I don't think we can generally determine the subxid parent at this point, unfortunately (we might have truncated subtrans). Different aspect: Is it ok that we use TransactionIdDidCommit() without a preceding IsInProgress() check? I do think there's some potential for additional checks that don't run into the above issue, e.g. checking that no in-progress xids follow an explicitly aborted xact, that a committed xid can't follow an uncommitted xid, etc. Greetings, Andres Freund
Re: HOT chain validation in verify_heapam()
On Tue, Nov 15, 2022 at 2:50 PM Andres Freund wrote: > On 2022-11-15 11:36:21 -0500, Robert Haas wrote: > > On Mon, Nov 14, 2022 at 5:02 PM Andres Freund wrote: > > > It seems like we should do a bit more validation within a chain of > > > tuples. E.g. that no live tuple can follow an !DidCommit xmin? > > > > I think this check is already present in stronger form. If we see a > > !DidCommit xmin, the xmin of the next tuple in the chain not only can't be > > committed, but had better be the same. > > As I think I mentioned before, I don't think the "better be the same" aspect > is correct, think subxacts. E.g. > > off 0: xmin: top, xmax: child_1 > > off 1: xmin: child_1, xmax: invalid > > If top hasn't committed yet, the current logic afaict will warn about this > situation, no? And I don't think we can generally determine the subxid parent at this > point, unfortunately (might have truncated subtrans). Woops, you're right. > Different aspect: Is it ok that we use TransactionIdDidCommit() without a > preceding IsInProgress() check? Well, the code doesn't match the comments here, sadly. The comments claim that we want to check that if the prior tuple's xmin was aborted or in progress the current one is in the same state. If that's actually what the code checked, we'd definitely need to check both TransactionIdIsInProgress and TransactionIdDidCommit and be wary of the possibility of the value changing concurrently. But the code doesn't actually check the status of more than one XID, nor does it care about the distinction between aborted and in progress, so I don't think that the current code is buggy in that particular way, just in a bunch of other ways. > I do think there's some potential for additional checks that don't run into > the above issue, e.g. checking that no in-progress xids follow an explicitly > aborted xact, that a committed xid can't follow an uncommitted xid etc. Yeah, maybe so. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi, On 2022-11-14 15:07:05 -0800, Peter Geoghegan wrote: > I'd really like to know if the scary HOT chain freezing scenario is > possible, for the very obvious reason. Have you tried to write a test > case for that? I tried. Unfortunately, even if the bug exists, we currently don't have the infrastructure to write isolationtester tests for it. There are just too many points where we'd need to wait, and I don't know of ways to wait there with isolationtester. I'm quite certain that it's possible to end up freezing an earlier row version in a hot chain in < 14, I got there with careful gdb orchestration. Of course it's possible I screwed something up, given I did it once, interactively. Not sure if trying to fix it is worth the risk of backpatching all the necessary changes to switch to the retry approach. Greetings, Andres Freund
Re: HOT chain validation in verify_heapam()
On Thu, Nov 10, 2022 at 3:38 AM Andres Freund wrote: > > > + } > > + > > + /* > > + * Loop over offset and populate predecessor array from > all entries > > + * that are present in successor array. > > + */ > > + ctx.attnum = -1; > > + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; > > + ctx.offnum = OffsetNumberNext(ctx.offnum)) > > + { > > + ItemId curr_lp; > > + ItemId next_lp; > > + HeapTupleHeader curr_htup; > > + HeapTupleHeader next_htup; > > + TransactionId curr_xmax; > > + TransactionId next_xmin; > > + > > + OffsetNumber nextoffnum = successor[ctx.offnum]; > > + > > + curr_lp = PageGetItemId(ctx.page, ctx.offnum); > > Why do we get the item when nextoffnum is 0? > > Yes, right, I will move this call to PageGetItemId, just after the next "if" condition in the patch. > > > + if (nextoffnum == 0 || !lp_valid[ctx.offnum] || > !lp_valid[nextoffnum]) > > + { > > + /* > > + * This is either the last updated tuple > in the chain or a > > + * corruption raised for this tuple. > > + */ > > "or a corruption raised" isn't quite right grammatically. 
> > will change to "This is either the last updated tuple in the chain or corruption has been raised for this tuple" > > > + continue; > > + } > > + if (ItemIdIsRedirected(curr_lp)) > > + { > > + next_lp = PageGetItemId(ctx.page, > nextoffnum); > > + if (ItemIdIsRedirected(next_lp)) > > + { > > + report_corruption(&ctx, > > + > psprintf("redirected line pointer pointing to another redirected line > pointer at offset %u", > > + > (unsigned) nextoffnum)); > > + continue; > > + } > > + next_htup = (HeapTupleHeader) PageGetItem( > ctx.page, next_lp); > > + if (!HeapTupleHeaderIsHeapOnly(next_htup)) > > + { > > + report_corruption(&ctx, > > + > psprintf("redirected tuple at line pointer offset %u is not heap only > tuple", > > + > (unsigned) nextoffnum)); > > + } > > + if ((next_htup->t_infomask & HEAP_UPDATED) > == 0) > > + { > > + report_corruption(&ctx, > > + > psprintf("redirected tuple at line pointer offset %u is not heap updated > tuple", > > + > (unsigned) nextoffnum)); > > + } > > + continue; > > + } > > + > > + /* > > + * Add a line pointer offset to the predecessor > array if xmax is > > + * matching with xmin of next tuple (reaching via > its t_ctid). > > + * Prior to PostgreSQL 9.4, we actually changed > the xmin to > > + * FrozenTransactionId > > I'm doubtful it's a good idea to try to validate the 9.4 case. The > likelihood > of getting that right seems low and I don't see us gaining much by even > trying. > > > > so we must add offset to predecessor > > + * array(irrespective of xmax-xmin matching) if > updated tuple xmin > > + * is frozen, so that we can later do validation > related to frozen > > + * xmin. Raise corruption if we have two tuples > having the same > > + * predecessor. > > + * We add the offset to the predecessor array > irrespective of the > > + * transaction (t_xmin) status. We will do > validation related to > > + * the transaction status (and also all other > validations) when we > > + * loop over the predecessor array. 
> > + */ > > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, > curr_lp); > > + curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup); > > + next_lp = PageGetItemId(ctx.page, nextoffnum); > > + next_htup = (HeapTupleHeader) PageGetItem(ctx.page, > next_lp); > > + next_xmin = HeapTupleHeaderGetXmin(next_htup); > > + if (TransactionIdIsValid(curr_xmax) && > > + (TransactionIdEquals(curr_xmax, next_x
Re: HOT chain validation in verify_heapam()
On Wed, Nov 16, 2022 at 1:58 AM Robert Haas wrote: > On Tue, Nov 15, 2022 at 2:50 PM Andres Freund wrote: > > On 2022-11-15 11:36:21 -0500, Robert Haas wrote: > > > On Mon, Nov 14, 2022 at 5:02 PM Andres Freund > wrote: > > > > It seems like we should do a bit more validation within a chain of > > > > tuples. E.g. that no live tuple can follow an !DidCommit xmin? > > > > > > I think this check is already present in stronger form. If we see a > > > !DidCommit xmin, the xmin of the next tuple in the chain not only > can't be > > > committed, but had better be the same. > > > > As I think I mentioned before, I don't think the "better be the same" > aspect > > is correct, think subxacts. E.g. > > > > off 0: xmin: top, xmax: child_1 > > off 1: xmin: child_1, xmax: invalid > > > > If top hasn't committed yet, the current logic afaict will warn about > this > > situation, no? And I don't think we can generally the subxid parent at > this > > point, unfortunately (might have truncated subtrans). > > Woops, you're right. Yes, got it. I tried to test this and it reports false corruption in the case of a subtransaction. I think a better way to have this check is: if pred_xmin is aborted, then current_xmin must also be aborted. So there is no way for us to validate corruption involving an in-progress txid. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Nov 16, 2022 at 4:51 AM Himanshu Upadhyaya wrote: > yes, got it, have tried to test and it is giving false corruption in case of > subtransaction. > I think a better way to have this check is, we need to check that if > pred_xmin is > aborted then current_xmin should be aborted only. So there is no way that we > validate corruption with in_progress txid. Please note that you can't use TransactionIdDidAbort here, because that will return false for transactions aborted by a crash. You have to check that it's not in progress and then afterwards check that it's not committed. Also note that if you check whether it's committed first and then check whether it's in progress afterwards, there's a race condition: it might commit just after you verify that it isn't committed yet, and then it won't be in progress any more and will look aborted. I disagree with the idea that we can't check in progress. I think the checks could look something like this:

    pred_in_progress = TransactionIdIsInProgress(pred_xmin);
    current_in_progress = TransactionIdIsInProgress(current_xmin);

    if (pred_in_progress)
    {
        if (current_in_progress)
            return ok;
        /* recheck to avoid race condition */
        if (TransactionIdIsInProgress(pred_xmin))
        {
            if (TransactionIdDidCommit(current_xmin))
                return corruption: predecessor xmin in progress, but current xmin committed;
            else
                return corruption: predecessor xmin in progress, but current xmin aborted;
        }
        /*
         * fallthrough: when we entered this if-block pred_xmin was still
         * in progress, but it no longer is
         */
        pred_in_progress = false;
    }

    if (TransactionIdDidCommit(pred_xmin))
        return ok;

    if (current_in_progress)
        return corruption: predecessor xmin aborted, but current xmin in progress;
    else if (TransactionIdDidCommit(current_xmin))
        return corruption: predecessor xmin aborted, but current xmin committed;

The error messages as phrased here aren't actually what we should use; they would need rephrasing. But I think, or hope anyway, that the logic works.
I think you basically just need the 1 recheck: if you see the predecessor xmin in progress and then the current xmin in progress, you have to go back and check that the predecessor xmin is still in progress, because otherwise both could have committed or aborted together in between. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Mon, Jan 30, 2023 at 8:24 AM Himanshu Upadhyaya wrote: > Before this we stop the node by "$node->stop;" and then only we progress to > manual corruption. This will abort all running/in-progress transactions. > So, if we create an in-progress transaction and comment "$node->stop;" > then somehow all the code that we have for manual corruption does not work. > > I think it is required to stop the server and then only proceed for manual > corruption? > If this is the case then please suggest if there is a way to get an > in-progress transaction > that we can use for manual corruption. How about using a prepared transaction? -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Tue, Jan 31, 2023 at 7:20 PM Robert Haas wrote:
> On Mon, Jan 30, 2023 at 8:24 AM Himanshu Upadhyaya wrote:
> > Before this we stop the node by "$node->stop;" and then only we progress to
> > manual corruption. This will abort all running/in-progress transactions.
> > So, if we create an in-progress transaction and comment "$node->stop;"
> > then somehow all the code that we have for manual corruption does not work.
> >
> > I think it is required to stop the server and then only proceed for manual corruption?
> > If this is the case then please suggest if there is a way to get an in-progress transaction
> > that we can use for manual corruption.
>
> How about using a prepared transaction?

Thanks, yes it's working fine with Prepared Transaction.
Please find attached the v9 patch incorporating all the review comments.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

From d241937841c5990ff4df00d1abc6bfbb3491ac3e Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya
Date: Sun, 5 Feb 2023 12:50:21 +0530
Subject: [PATCH v9] Implement HOT chain validation in verify_heapam()

Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev, Andres Freund
---
 contrib/amcheck/verify_heapam.c           | 297 ++
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 241 +-
 2 files changed, 527 insertions(+), 11 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72..01c639b5e1 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -399,9 +399,14 @@ verify_heapam(PG_FUNCTION_ARGS)
 	for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
 	{
 		OffsetNumber maxoff;
+		OffsetNumber predecessor[MaxOffsetNumber];
+		OffsetNumber successor[MaxOffsetNumber];
+		bool		lp_valid[MaxOffsetNumber];
 
 		CHECK_FOR_INTERRUPTS();
 
+		memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber);
+
 		/* Optionally skip over all-frozen or all-visible blocks */
 		if (skip_option != SKIP_PAGES_NONE)
 		{
@@ -433,6 +438,10 @@ verify_heapam(PG_FUNCTION_ARGS)
 		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
 			 ctx.offnum = OffsetNumberNext(ctx.offnum))
 		{
+			OffsetNumber nextoffnum;
+
+			successor[ctx.offnum] = InvalidOffsetNumber;
+			lp_valid[ctx.offnum] = false;
 			ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
 
 			/* Skip over unused/dead line pointers */
@@ -469,6 +478,13 @@ verify_heapam(PG_FUNCTION_ARGS)
 				report_corruption(&ctx,
 								  psprintf("line pointer redirection to unused item at offset %u",
 										   (unsigned) rdoffnum));
+
+				/*
+				 * Make entry in successor array, redirected lp will be
+				 * validated at the time when we loop over successor array.
+				 */
+				successor[ctx.offnum] = rdoffnum;
+				lp_valid[ctx.offnum] = true;
 				continue;
 			}
 
@@ -504,9 +520,283 @@ verify_heapam(PG_FUNCTION_ARGS)
 			/* It should be safe to examine the tuple's header, at least */
 			ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
 			ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+			lp_valid[ctx.offnum] = true;
 
 			/* Ok, ready to check this next tuple */
 			check_tuple(&ctx);
+
+			/*
+			 * Add the data to the successor array if next updated tuple is in
+			 * the same page. It will be used later to generate the
+			 * predecessor array.
+			 *
+			 * We need to access the tuple's header to populate the
+			 * predecessor array. However the tuple is not necessarily sanity
+			 * checked yet so delaying construction of predecessor array until
+			 * all tuples are sanity checked.
+			 */
+			nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid);
+			if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno &&
+				nextoffnum != ctx.offnum)
+			{
+				successor[ctx.offnum] = nextoffnum;
+			}
+		}
+
+		/*
+		 * Loop over offset and populate predecessor array from all entries
+		 * that are present in successor array.
+		 */
+		ctx.attnum = -1;
+		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+			 ctx.offnum = OffsetNumberNext(ctx.offnum))
+		{
+			ItemId		curr_lp;
+			ItemId		next_lp;
+			HeapTupleHeader curr_htup;
+			HeapTupleHeader next_htup;
+			TransactionId curr_xmax;
+			TransactionId next_xmin;
+
+			OffsetNumber nextoffnum = successor[ctx.offnum];
+
+			if (nextoffnum == InvalidOffsetNumber)
+			{
+				/*
+				 * This is either the last updated tuple in the chain or a
+				 * corrupted Tuple/lp or unused/dead line pointer.
+				 */
+				continue;
+			}
+
+			/*
+			 * Add data to the predecessor array even if the current or
+			 * successor's LP is not valid. We will not process/validate these
+			 * offset entries while looping over the predecessor array but
+			 * h
Re: HOT chain validation in verify_heapam()
On Sun, Feb 5, 2023 at 3:57 AM Himanshu Upadhyaya wrote: > Thanks, yes it's working fine with Prepared Transaction. > Please find attached the v9 patch incorporating all the review comments. I don't know quite how we're still going around in circles about this, but this code makes no sense to me at all:

    /*
     * Add data to the predecessor array even if the current or
     * successor's LP is not valid. We will not process/validate these
     * offset entries while looping over the predecessor array but
     * having all entries in the predecessor array will help in
     * identifying(and validating) the Root of a chain.
     */
    if (!lp_valid[ctx.offnum] || !lp_valid[nextoffnum])
    {
        predecessor[nextoffnum] = ctx.offnum;
        continue;
    }

If the current offset number is not for a valid line pointer, then it makes no sense to talk about the successor. An invalid redirected line pointer is one that points off the end of the line pointer array, or to before the beginning of the line pointer array, or to a line pointer that is unused. An invalid line pointer that is LP_USED is one which points to a location outside the page, or to a location inside the page. In none of these cases does it make any sense to talk about the next tuple. If the line pointer isn't valid, it's pointing to some invalid location where there cannot possibly be a tuple. In other words, if lp_valid[ctx.offnum] is false, then nextoffnum is a garbage value, and therefore referencing predecessor[nextoffnum] is useless and dangerous. If the next offset number is not for a valid line pointer, we could in theory still assign to the predecessor array, as you propose here. In that case, the tuple or line pointer at ctx.offnum is pointing to the line pointer at nextoffnum and that is all fine. But what is the point? The comment claims that the point is that it will help us identify and validate the root of the hot chain. But if the line pointer at nextoffnum is not valid, it can't be the root of a hot chain.
When we're talking about the root of a HOT chain, we're speaking about a tuple. If lp_valid[nextoffnum] is false, there is no tuple. Instead of pointing to a tuple, that line pointer is pointing to garbage. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Feb 8, 2023 at 11:17 PM Robert Haas wrote: > On Sun, Feb 5, 2023 at 3:57 AM Himanshu Upadhyaya > wrote: > > Thanks, yes it's working fine with Prepared Transaction. > > Please find attached the v9 patch incorporating all the review comments. > > I don't know quite how we're still going around in circles about this, > but this code makes no sense to me at all: > > /* > * Add data to the predecessor array even if the current or > * successor's LP is not valid. We will not process/validate > these > * offset entries while looping over the predecessor array but > * having all entries in the predecessor array will help in > * identifying(and validating) the Root of a chain. > */ > if (!lp_valid[ctx.offnum] || !lp_valid[nextoffnum]) > { > predecessor[nextoffnum] = ctx.offnum; > continue; > } > > If the current offset number is not for a valid line pointer, then it > makes no sense to talk about the successor. An invalid redirected line > pointer is one that points off the end of the line pointer array, or > to before the beginning of the line pointer array, or to a line > pointer that is unused. An invalid line pointer that is LP_USED is one > which points to a location outside the page, or to a location inside > the page. In none of these cases does it make any sense to talk about > the next tuple. If the line pointer isn't valid, it's pointing to some > invalid location where there cannot possibly be a tuple. In other > words, if lp_valid[ctx.offnum] is false, then nextoffnum is a garbage > value, and therefore referencing predecessor[nextoffnum] is useless > and dangerous. > > If the next offset number is not for a valid line pointer, we could in > theory still assign to the predecessor array, as you propose here. In > that case, the tuple or line pointer at ctx.offnum is pointing to the > line pointer at nextoffnum and that is all fine. But what is the > point? 
The comment claims that the point is that it will help us > identify and validate the root of the hot chain. But if the line > pointer at nextoffnum is not valid, it can't be the root of a hot > chain. When we're talking about the root of a HOT chain, we're > speaking about a tuple. If lp_valid[nextoffnum] is false, there is no > tuple. Instead of pointing to a tuple, that line pointer is pointing > to garbage. > > Initially, while implementing the logic to identify the root of the HOT chain, I was getting crashes and regression failures; at that time I thought of having this check along with a few other changes that were required. But you are right, it's unnecessary to add data to the predecessor array in this case, and it is not required. I am removing this from the patch.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

From 19c93cce0189150d6bfe68237eb3d5a414a18ad9 Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya
Date: Thu, 9 Feb 2023 22:00:25 +0530
Subject: [PATCH v10] Implement HOT chain validation in verify_heapam()

Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev, Andres Freund
---
 contrib/amcheck/verify_heapam.c           | 303 +-
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 241 -
 2 files changed, 527 insertions(+), 17 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72..7fc984dd33 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -150,7 +150,7 @@ typedef struct HeapCheckContext
 } HeapCheckContext;
 
 /* Internal implementation */
-static void check_tuple(HeapCheckContext *ctx);
+static void check_tuple(HeapCheckContext *ctx, bool *lp_valid);
 static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
 							  ToastedAttribute *ta, int32 *expected_chunk_seq,
 							  uint32 extsize);
@@ -160,7 +160,7 @@ static void check_toasted_attribute(HeapCheckContext *ctx,
 									ToastedAttribute *ta);
 
 static bool check_tuple_header(HeapCheckContext *ctx);
-static bool check_tuple_visibility(HeapCheckContext *ctx);
+static bool check_tuple_visibility(HeapCheckContext *ctx, bool *lp_valid);
 
 static void report_corruption(HeapCheckContext *ctx, char *msg);
 static void report_toast_corruption(HeapCheckContext *ctx,
@@ -399,9 +399,14 @@ verify_heapam(PG_FUNCTION_ARGS)
 	for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
 	{
 		OffsetNumber maxoff;
+		OffsetNumber predecessor[MaxOffsetNumber];
+		OffsetNumber successor[MaxOffsetNumber];
+		bool		lp_valid[MaxOffsetNumber];
 
 		CHECK_FOR_INTERRUPTS();
 
+		memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber);
+
 		/* Option
Re: HOT chain validation in verify_heapam()
On Thu, Feb 9, 2023 at 12:09 PM Himanshu Upadhyaya wrote: > Initially while implementing logic to identify the root of the HOT chain > I was getting crash and regression failure's that time I thought of having > this check along with a few other changes that were required, > but you are right, it's unnecessary to add data to the predecessor > array(in this case) and is not required. I am removing this from the patch. I finally found time to look at this today -- apologies for the long delay -- and I don't think that it addresses my objections. When I proposed lp_valid, I had a very simple idea in mind: it tells you whether or not the line pointer is, at some basic level, valid. Like, it contains numbers that could point to a tuple on the page, at least hypothetically. But that is something that can be determined strictly by inspecting the line pointer, and yet you have check_tuple_visibility() changing the value based on the visibility status of xmin. So it seems that we still don't have a patch where the value of a variable called lp_valid corresponds to whether or not the L.P. is valid. -- Robert Haas EDB: http://www.enterprisedb.com
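The distinction Robert draws above — that lp_valid must be decidable by inspecting the line pointer alone, never by looking at a tuple's xmin — can be sketched in miniature. The following is an illustrative model with invented function names and simplified page geometry, not PostgreSQL's actual ItemIdData macros:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: a line pointer's validity depends only on its own contents
 * and the page geometry, never on the visibility status of any tuple.
 */

/* A redirect is valid iff its target is an in-range, used line pointer. */
static bool
redirect_lp_valid(uint16_t target, uint16_t maxoff, bool target_is_used)
{
    return target >= 1 && target <= maxoff && target_is_used;
}

/*
 * A normal (LP_USED) line pointer is valid iff the tuple it addresses
 * lies wholly between the end of the line pointer array (pd_lower here)
 * and the end of the usable tuple space (pd_special here).
 */
static bool
normal_lp_valid(uint16_t lp_off, uint16_t lp_len,
                uint16_t pd_lower, uint16_t pd_special)
{
    return lp_off >= pd_lower &&
           (uint32_t) lp_off + lp_len <= pd_special;
}
```

Because these predicates never consult xmin, a later visibility check has no business flipping them — which is exactly the objection being raised here.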
Re: HOT chain validation in verify_heapam()
On Mon, Mar 6, 2023 at 12:36 PM Robert Haas wrote: > So it seems that we still don't have a patch where the > value of a variable called lp_valid corresponds to whether or not the > L.P. is valid. Here's a worked-over version of this patch. Changes: - I got rid of the code that sets lp_valid in funny places and instead arranged to have check_tuple_visibility() pass up the information on the XID status. That's important, because we can't casually apply operations like TransactionIdIsCommitted() to XIDs that, for all we know, might not even be in the range covered by CLOG. In such cases, we should not perform any HOT chain validation because we can't do it sensibly; the new code accomplishes this, and also reduces the number of CLOG lookups as compared with your version. - I moved most of the HOT chain checks from the loop over the predecessor[] array to the loop over the successor[] array. It didn't seem to have any value to put them in the third loop; it forces us to expend extra code to distinguish between redirects and tuples, information that we already had in the second loop. The only check that seems to make sense to do in that last loop is the one for a HOT chain that starts with a HOT tuple, which can't be done any earlier. - I realized that your patch had a guard against setting the predecessor[] when it was set already only for tuples, not for redirects. That means if a redirect pointed into the middle of a HOT chain we might not report corruption appropriately. I fixed this and reworded the associated messages a bit. - Assorted cosmetic and comment changes. I think this is easier to follow and more nearly correct, but what do you (and others) think? -- Robert Haas EDB: http://www.enterprisedb.com 0001-Implement-HOT-chain-validation-in-verify_heapam.patch Description: Binary data
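For readers following along, the overall shape of the validation — one pass recording each tuple's ctid successor, one pass linking tuples whose xmax/xmin match and checking HOT flags across the link, and a final pass over chain roots — can be sketched on a toy page model. This is an invented, simplified reimplementation showing only three of the checks, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 8                /* toy page: offsets 1..NSLOTS */

typedef struct
{
    bool     used;
    uint32_t xmin;
    uint32_t xmax;              /* 0 = no updater */
    int      next;              /* t_ctid link within the page, 0 = none */
    bool     heap_only;         /* models HEAP_ONLY_TUPLE */
    bool     hot_updated;       /* models HEAP_HOT_UPDATED */
} ToyTuple;

static int
count_corruptions(const ToyTuple tup[NSLOTS + 1])
{
    int successor[NSLOTS + 1] = {0};
    int predecessor[NSLOTS + 1] = {0};
    int ncorrupt = 0;

    /* Pass 1: record each tuple's ctid successor. */
    for (int i = 1; i <= NSLOTS; i++)
        if (tup[i].used && tup[i].next != 0)
            successor[i] = tup[i].next;

    /*
     * Pass 2: link x -> y only when xmax(x) == xmin(y), then check that
     * HEAP_HOT_UPDATED / HEAP_ONLY_TUPLE agree across the link.
     */
    for (int x = 1; x <= NSLOTS; x++)
    {
        int y = successor[x];

        if (y == 0 || !tup[y].used)
            continue;
        if (tup[x].xmax == 0 || tup[x].xmax != tup[y].xmin)
            continue;           /* not part of the same update chain */
        predecessor[y] = x;
        if (tup[x].hot_updated && !tup[y].heap_only)
            ncorrupt++;         /* HOT update produced non-heap-only tuple */
        if (!tup[x].hot_updated && tup[y].heap_only)
            ncorrupt++;         /* non-HOT update produced heap-only tuple */
    }

    /*
     * Pass 3: a tuple with no predecessor is a chain root; the root of a
     * HOT chain must not itself be heap-only.
     */
    for (int y = 1; y <= NSLOTS; y++)
        if (tup[y].used && tup[y].heap_only && predecessor[y] == 0)
            ncorrupt++;

    return ncorrupt;
}
```

Here the heap-only-root check sits in the last pass because, as noted above, it can only be made once all predecessors are known.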
Re: HOT chain validation in verify_heapam()
> On Mar 7, 2023, at 10:16 AM, Robert Haas wrote: > > On Mon, Mar 6, 2023 at 12:36 PM Robert Haas wrote: >> So it seems that we still don't have a patch where the >> value of a variable called lp_valid corresponds to whether or not the >> L.P. is valid. > > Here's a worked-over version of this patch. Changes: > > - I got rid of the code that sets lp_valid in funny places and instead > arranged to have check_tuple_visibility() pass up the information on > the XID status. That's important, because we can't casually apply > operations like TransactionIdIsCommitted() to XIDs that, for all we > know, might not even be in the range covered by CLOG. In such cases, > we should not perform any HOT chain validation because we can't do it > sensibly; the new code accomplishes this, and also reduces the number > of CLOG lookups as compared with your version. > > - I moved most of the HOT chain checks from the loop over the > predecessor[] array to the loop over the successor[] array. It didn't > seem to have any value to put them in the third loop; it forces us to > expend extra code to distinguish between redirects and tuples, > information that we already had in the second loop. The only check > that seems to make sense to do in that last loop is the one for a HOT > chain that starts with a HOT tuple, which can't be done any earlier. > > - I realized that your patch had a guard against setting the > predecessor[] when it was set already only for tuples, not for > redirects. That means if a redirect pointed into the middle of a HOT > chain we might not report corruption appropriately. I fixed this and > reworded the associated messages a bit. > > - Assorted cosmetic and comment changes. > > I think this is easier to follow and more nearly correct, but what do > you (and others) think? Thanks, Robert. Quickly skimming over this patch, it looks like something reviewable. Your changes to t/004_verify_heapam.pl appear to be consistent with how that test was intended to function. 
Note that I have not tried any of this yet. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: HOT chain validation in verify_heapam()
Hi,

> Note that I have not tried any of this yet.

I did, both with Meson and Autotools. All in all the patch looks very
good, but I have a few little nitpicks.

```
+/* HOT chains should not intersect. */
+if (predecessor[nextoffnum] != InvalidOffsetNumber)
+{
+	report_corruption(&ctx,
+					  psprintf("redirect line pointer points to offset %u, but offset %u also points there",
+							   (unsigned) nextoffnum, (unsigned) predecessor[nextoffnum]));
+	continue;
+}
```

This type of corruption doesn't seem to be test-covered.

```
+/*
+ * If the next line pointer is a redirect, or if it's a tuple
+ * but the XMAX of this tuple doesn't match the XMIN of the next
+ * tuple, then the two aren't part of the same update chain and
+ * there is nothing more to do.
+ */
+if (ItemIdIsRedirected(next_lp))
+	continue;
```

lcov shows that the `continue` path is never executed. This is
probably not a big deal however.

```
+$node->append_conf('postgresql.conf','max_prepared_transactions=100');
```

From what I can tell this line is not needed.

--
Best regards,
Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
On Wed, Mar 8, 2023 at 5:35 AM Aleksander Alekseev wrote:
> I did, both with Meson and Autotools. All in all the patch looks very
> good, but I have a few little nitpicks.

Thank you for the nitpicks.

> ```
> +/* HOT chains should not intersect. */
> +if (predecessor[nextoffnum] != InvalidOffsetNumber)
> +{
> +	report_corruption(&ctx,
> +					  psprintf("redirect line pointer points to offset %u, but offset %u also points there",
> +							   (unsigned) nextoffnum, (unsigned) predecessor[nextoffnum]));
> +	continue;
> +}
> ```
>
> This type of corruption doesn't seem to be test-covered.

Himanshu, would you be able to try to write a test case for this? I
think you need something like this: update a tuple with a lower TID to
produce a tuple with a higher TID, e.g. (0,10) is updated to produce
(0,11). But then have a redirect line pointer that also points to the
result of the update, in this case (0,11).

> ```
> +/*
> + * If the next line pointer is a redirect, or if it's a tuple
> + * but the XMAX of this tuple doesn't match the XMIN of the next
> + * tuple, then the two aren't part of the same update chain and
> + * there is nothing more to do.
> + */
> +if (ItemIdIsRedirected(next_lp))
> +	continue;
> ```
>
> lcov shows that the `continue` path is never executed. This is
> probably not a big deal however.

It might be good to have a negative test case for this, though. Let's
say we, e.g. update (0,1) to produce (0,2), but then abort. The page
is HOT-pruned. Then we insert a new tuple at (0,2), HOT-update it to
produce (0,3), and commit. Then we HOT-prune again. Possibly we could
try to write a test case that verifies that this does NOT produce any
corruption indication.

> ```
> +$node->append_conf('postgresql.conf','max_prepared_transactions=100');
> ```
>
> From what I can tell this line is not needed.

That surprises me, because the new test cases involve preparing a
transaction, and by default max_prepared_transactions=0.
So it seems to me (without testing) that this ought to be required. Did you test that it works without this setting? The value of 100 seems a bit excessive, though. Most TAP tests seem to use 10. -- Robert Haas EDB: http://www.enterprisedb.com
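The intersection scenario being discussed — an update chain and a redirect both converging on the same offset — is what the predecessor[] bookkeeping catches: the first pointer to an offset is recorded, and any later pointer to the same offset is corruption. A minimal sketch with invented names (the real code records offsets the same first-writer-wins way):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OFF 16

/* predecessor[i] == 0 means no pointer to offset i has been seen yet. */
static uint16_t predecessor[MAX_OFF + 1];

/*
 * Record that line pointer 'from' (a redirect, or an updated tuple's
 * t_ctid) points at offset 'to'.  Returns 0 if this is the first pointer
 * to 'to'; otherwise returns the offset that already claimed it, i.e.
 * the "offset %u also points there" corruption from the patch.
 */
static uint16_t
record_predecessor(uint16_t from, uint16_t to)
{
    if (predecessor[to] != 0)
        return predecessor[to];
    predecessor[to] = from;
    return 0;
}
```

In Robert's test scenario above, the update (0,10) -> (0,11) records 10 as the predecessor of 11, so the bogus redirect at, say, offset 5 that also targets 11 is the second writer and gets reported.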
Re: HOT chain validation in verify_heapam()
Hi, > > ``` > > +$node->append_conf('postgresql.conf','max_prepared_transactions=100'); > > ``` > > > > From what I can tell this line is not needed. > > That surprises me, because the new test cases involve preparing a > transaction, and by default max_prepared_transactions=0. So it seems > to me (without testing) that this ought to be required. Did you test > that it works without this setting? Sorry, I was wrong the first time. The test fails without this line: ``` 112/238 postgresql:pg_amcheck / pg_amcheck/004_verify_heapam ERROR 4.94s exit status 29 ``` -- Best regards, Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
On Wed, Mar 8, 2023 at 7:06 PM Robert Haas wrote: > > > +/* HOT chains should not intersect. */ > > +if (predecessor[nextoffnum] != InvalidOffsetNumber) > > +{ > > +report_corruption(&ctx, > > + psprintf("redirect line pointer > > points to offset %u, but offset %u also points there", > > + (unsigned) nextoffnum, > > (unsigned) predecessor[nextoffnum])); > > +continue; > > +} > > ``` > > > > This type of corruption doesn't seem to be test-covered. > > Himanshu, would you be able to try to write a test case for this? I > think you need something like this: update a tuple with a lower TID to > produce a tuple with a higher TID, e.g. (0,10) is updated to produce > (0,11). But then have a redirect line pointer that also points to the > result of the update, in this case (0,11). > > Sure Robert, I will work on this. > > ``` > > +/* > > + * If the next line pointer is a redirect, or if it's a > tuple > > + * but the XMAX of this tuple doesn't match the XMIN of the > next > > + * tuple, then the two aren't part of the same update chain > and > > + * there is nothing more to do. > > + */ > > +if (ItemIdIsRedirected(next_lp)) > > +continue; > > ``` > > > > lcov shows that the `continue` path is never executed. This is > > probably not a big deal however. > > It might be good to have a negative test case for this, though. Let's > say we, e.g. update (0,1) to produce (0,2), but then abort. The page > is HOT-pruned. Then we add insert a new tuple at (0,2), HOT-update it > to produce (0,3), and commit. Then we HOT-prune again. Possibly we > could try to write a test case that verifies that this does NOT > produce any corruption indication. > > will work on this too. > > ``` > > +$node->append_conf('postgresql.conf','max_prepared_transactions=100'); > > ``` > > > > From what I can tell this line is not needed. > > That surprises me, because the new test cases involve preparing a > transaction, and by default max_prepared_transactions=0. 
> So it seems
> to me (without testing) that this ought to be required. Did you test
> that it works without this setting?
>
> The value of 100 seems a bit excessive, though. Most TAP tests seem to use
> 10.

We need this for the prepared transaction; I will change it to 10.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Nov 16, 2022 at 11:23 PM Robert Haas wrote:
> On Wed, Nov 16, 2022 at 4:51 AM Himanshu Upadhyaya wrote:
> > yes, got it, have tried to test and it is giving false corruption in
> > case of subtransaction. I think a better way to have this check is, we
> > need to check that if pred_xmin is aborted then current_xmin should be
> > aborted only. So there is no way that we validate corruption with
> > in_progress txid.
>
> Please note that you can't use TransactionIdDidAbort here, because
> that will return false for transactions aborted by a crash. You have
> to check that it's not in progress and then afterwards check that it's
> not committed. Also note that if you check whether it's committed
> first and then check whether it's in progress afterwards, there's a
> race condition: it might commit just after you verify that it isn't
> committed yet, and then it won't be in progress any more and will look
> aborted.
>
> I disagree with the idea that we can't check in progress. I think the
> checks could look something like this:
>
> pred_in_progress = TransactionIdIsInProgress(pred_xmin);
> current_in_progress = TransactionIdIsInProgress(current_xmin);
> if (pred_in_progress)
> {
>     if (current_in_progress)
>         return ok;
>     // recheck to avoid race condition
>     if (TransactionIdIsInProgress(pred_xmin))
>     {
>         if (TransactionIdDidCommit(current_xmin))
>             return corruption: predecessor xmin in progress, but current xmin committed;
>         else
>             return corruption: predecessor xmin in progress, but current xmin aborted;
>     }

I think we can have a situation where pred_xmin is in progress but
curr_xmin is aborted; consider the example below:

postgres[14723]=# BEGIN;
BEGIN
postgres[14723]=#* insert into test2 values (1,1);
INSERT 0 1
postgres[14723]=#* savepoint s1;
SAVEPOINT
postgres[14723]=#* update test2 set a =2;
UPDATE 1
postgres[14723]=#* rollback to savepoint s1;
ROLLBACK

Now pred_xmin is in progress but curr_xmin is aborted; am I missing
anything here?
I think if pred_xmin is aborted and curr_xmin is in progress we should consider it as a corruption case but vice versa is not true. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
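The rule that emerges from this exchange — a successor's xmin may be "less alive" than its predecessor's (a rolled-back savepoint), but never "more alive" — can be written down as a small decision table. This sketch uses invented enum and function names and deliberately ignores the race-condition recheck the real code needs when statuses are sampled at different times:

```c
#include <assert.h>
#include <stdbool.h>

/* Transaction status as amcheck can observe it. */
typedef enum
{
    XID_IN_PROGRESS,
    XID_COMMITTED,
    XID_ABORTED
} ToyXidStatus;

/*
 * An aborted predecessor can never have an in-progress or committed
 * successor, and an in-progress predecessor cannot have a committed
 * successor (a subtransaction cannot outlive its parent's commit).
 * But an in-progress predecessor with an aborted successor is fine:
 * that is exactly the ROLLBACK TO SAVEPOINT case shown above.
 */
static bool
xmin_pair_is_corrupt(ToyXidStatus pred_xmin, ToyXidStatus curr_xmin)
{
    if (pred_xmin == XID_ABORTED && curr_xmin != XID_ABORTED)
        return true;
    if (pred_xmin == XID_IN_PROGRESS && curr_xmin == XID_COMMITTED)
        return true;
    return false;               /* every other combination is legitimate */
}
```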
Re: HOT chain validation in verify_heapam()
On Wed, Nov 16, 2022 at 10:57 PM Himanshu Upadhyaya wrote: > I think if pred_xmin is aborted and curr_xmin is in progress we should > consider it as a corruption case but vice versa is not true. Yeah, you're right. I'm being stupid about subtransactions again. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Nov 16, 2022 at 12:41 PM Himanshu Upadhyaya < upadhyaya.himan...@gmail.com> wrote: > > >> > + } >> > + >> > + /* Loop over offsets and validate the data in the >> predecessor array. */ >> > + for (OffsetNumber currentoffnum = FirstOffsetNumber; >> currentoffnum <= maxoff; >> > + currentoffnum = OffsetNumberNext(currentoffnum)) >> > + { >> > + HeapTupleHeader pred_htup; >> > + HeapTupleHeader curr_htup; >> > + TransactionId pred_xmin; >> > + TransactionId curr_xmin; >> > + ItemId pred_lp; >> > + ItemId curr_lp; >> > + >> > + ctx.offnum = predecessor[currentoffnum]; >> > + ctx.attnum = -1; >> > + >> > + if (ctx.offnum == 0) >> > + { >> > + /* >> > + * Either the root of the chain or an >> xmin-aborted tuple from >> > + * an abandoned portion of the HOT chain. >> > + */ >> >> Hm - couldn't we check that the tuple could conceivably be at the root of >> a >> chain? I.e. isn't HEAP_HOT_UPDATED? Or alternatively has an aborted xmin? >> >> > I don't see a way to check if tuple is at the root of HOT chain because > predecessor array will always be having either xmin from non-abandoned > transaction or it will be zero. We can't differentiate root or tuple > inserted via abandoned transaction. > > I was wrong here. I think this can be done and will be doing these changes in my next patch. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Tue, Nov 15, 2022 at 3:32 AM Andres Freund wrote:
> > Furthermore, it is
> > possible that successor[x] = successor[x'] since the page might be
> > corrupted and we haven't checked otherwise.
> >
> > predecessor[y] = x means that successor[x] = y but in addition we've
> > checked that y is sane, and that x.xmax=y.xmin. If there are multiple
> > tuples for which these conditions hold, we've issued complaints about
> > all but one and entered the last into the predecessor array.
>
> As shown by the isolationtester test I just posted, this doesn't quite work
> right now. Probably fixable.
>
> I don't think we can follow non-HOT ctid chains if they're older than the
> xmin horizon, including all cases of xmin being frozen. There's just nothing
> guaranteeing that the tuples are actually "related".

I understand the problem with frozen tuples, but I don't understand the
concern with non-HOT chains; could you please explain it in a bit more
detail?

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi, On 2022-11-17 21:33:17 +0530, Himanshu Upadhyaya wrote: > On Tue, Nov 15, 2022 at 3:32 AM Andres Freund wrote: > > > Furthermore, it is > > > possible that successor[x] = successor[x'] since the page might be > > corrupted > > > and we haven't checked otherwise. > > > > > > predecessor[y] = x means that successor[x] = y but in addition we've > > > checked that y is sane, and that x.xmax=y.xmin. If there are multiple > > > tuples for which these conditions hold, we've issued complaints about > > > all but one and entered the last into the predecessor array. > > > > As shown by the isolationtester test I just posted, this doesn't quite work > > right now. Probably fixable. > > > > I don't think we can follow non-HOT ctid chains if they're older than the > > xmin > > horizon, including all cases of xmin being frozen. There's just nothing > > guaranteeing that the tuples are actually "related". > > > I understand the problem with frozen tuples but don't understand the > concern with non-HOT chains, > could you please help with some explanation around it? I think there might be cases where following non-HOT ctid-chains across tuples within a page will trigger spurious errors, if the tuple versions are older than the xmin horizon. But it's a bit hard to say without seeing the code with a bunch of the other bugs fixed. Greetings, Andres Freund
Re: HOT chain validation in verify_heapam()
On Tue, Nov 15, 2022 at 10:55 PM Andres Freund wrote: > I'm quite certain that it's possible to end up freezing an earlier row > versions in a hot chain in < 14, I got there with careful gdb > orchestration. Of course possible I screwed something up, given I did it once, > interactively. Not sure if trying to fix it is worth the risk of backpatching > all the necessary changes to switch to the retry approach. There is code in heap_prepare_freeze_tuple() that treats a raw xmax as "xmax_already_frozen = true", even when the raw xmax value isn't already set to InvalidTransactionId. I'm referring to this code: if ( ... ) // process raw xmax else if (TransactionIdIsNormal(xid)) else if ((tuple->t_infomask & HEAP_XMAX_INVALID) || !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple))) { freeze_xmax = false; xmax_already_frozen = true; /* No need for relfrozenxid_out handling for already-frozen xmax */ } else ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg_internal("found xmax %u (infomask 0x%04x) not frozen, not multi, not normal", xid, tuple->t_infomask))); Why should it be okay to not process xmax during this call (by setting "xmax_already_frozen = true"), just because HEAP_XMAX_INVALID happens to be set? Isn't HEAP_XMAX_INVALID purely a hint? (HEAP_XMIN_FROZEN is *not* a hint, but we're dealing with xmax here.) I'm not sure how relevant this is to the concerns you have about frozen xmax, or even if it's any kind of problem, but it still seems worth fixing. It seems to me that there should be clear rules on what special transaction IDs can appear in xmax. Namely: the only special transaction ID that can ever appear in xmax is InvalidTransactionId. (Also, it's not okay to see *any* other XID in the "xmax_already_frozen = true" path, nor would it be okay to leave any other XID behind in xmax in the nearby "freeze_xmax = true" path.) -- Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-20 11:58:12 -0800, Peter Geoghegan wrote:
> There is code in heap_prepare_freeze_tuple() that treats a raw xmax as
> "xmax_already_frozen = true", even when the raw xmax value isn't
> already set to InvalidTransactionId. I'm referring to this code:
>
> if ( ... ) // process raw xmax
>
> else if (TransactionIdIsNormal(xid))
>
> else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
>          !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple)))
> {
>     freeze_xmax = false;
>     xmax_already_frozen = true;
>     /* No need for relfrozenxid_out handling for already-frozen xmax */
> }
> else
>     ereport(ERROR,
>             (errcode(ERRCODE_DATA_CORRUPTED),
>              errmsg_internal("found xmax %u (infomask 0x%04x) not frozen, not multi, not normal",
>                              xid, tuple->t_infomask)));
>
> Why should it be okay to not process xmax during this call (by setting
> "xmax_already_frozen = true"), just because HEAP_XMAX_INVALID happens
> to be set? Isn't HEAP_XMAX_INVALID purely a hint? (HEAP_XMIN_FROZEN is
> *not* a hint, but we're dealing with xmax here.)

Hm. But to get to that point we already need to have decided that xmax
is not a normal xid. Unhelpfully we reuse the 'xid' variable for xmax as
well:

xid = HeapTupleHeaderGetRawXmax(tuple);

I don't really know what the HEAP_XMAX_INVALID branch is trying to do. For
one, xid already is set to HeapTupleHeaderGetRawXmax(), why is it
refetching the value?

So it looks to me like this path should just test !TransactionIdIsValid(xid)?

> It seems to me that there should be clear rules on what
> special transaction IDs can appear in xmax. Namely: the only special
> transaction ID that can ever appear in xmax is InvalidTransactionId.
> (Also, it's not okay to see *any* other XID in the
> "xmax_already_frozen = true" path, nor would it be okay to leave any
> other XID behind in xmax in the nearby "freeze_xmax = true" path.)

Yea.

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Mon, Nov 21, 2022 at 1:34 PM Andres Freund wrote: > Hm. But to get to that point we already need to have decided that xmax > is not a normal xid. Unhelpfully we reuse the 'xid' variable for xmax as > well: > xid = HeapTupleHeaderGetRawXmax(tuple); > > I don't really know the HEAP_XMAX_INVALID branch is trying to do. For > one, xid already is set to HeapTupleHeaderGetRawXmax(), why is it > refetching the value? Right, that detail is correct, but still weird. And suggests that it might not have been super well thought through. > So it looks to me like this path should just test !TransactionIdIsValid(xid)? Agreed. Plus there should be a comment that reminds you that this is a normal regular transaction ID (easy to miss, because the initial "if" block for Multis is rather large). I will push something like that soon. -- Peter Geoghegan
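The tightening agreed on here — treat xmax as already frozen only when the raw value really is InvalidTransactionId, rather than trusting the HEAP_XMAX_INVALID hint bit — amounts to a three-way classification of the raw xmax. A standalone sketch with toy types (the xid constants mirror PostgreSQL's 0/1/2/3 special values; multixacts are ignored, and deciding whether a normal xid actually needs freezing is left out):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define InvalidTransactionId     ((ToyXid) 0)
#define BootstrapTransactionId   ((ToyXid) 1)
#define FrozenTransactionId      ((ToyXid) 2)
#define FirstNormalTransactionId ((ToyXid) 3)

typedef enum
{
    XMAX_PROCESS,           /* normal xid: run the usual freeze logic */
    XMAX_ALREADY_FROZEN,    /* nothing stored: nothing to do */
    XMAX_CORRUPT            /* a special xid that must never appear in xmax */
} XmaxVerdict;

/*
 * Classify a raw xmax without consulting any hint bit.  Only
 * InvalidTransactionId may legitimately appear in xmax among the special
 * xids; Bootstrap/FrozenTransactionId there is corruption.
 */
static XmaxVerdict
classify_raw_xmax(ToyXid raw_xmax)
{
    if (raw_xmax >= FirstNormalTransactionId)
        return XMAX_PROCESS;
    if (raw_xmax == InvalidTransactionId)
        return XMAX_ALREADY_FROZEN;
    return XMAX_CORRUPT;    /* Bootstrap or Frozen xid stored in xmax */
}
```

Note how a stale or corrupt HEAP_XMAX_INVALID bit can no longer hide a bogus special xid: classification depends solely on the raw value, matching the "should just test !TransactionIdIsValid(xid)" conclusion above.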
Re: HOT chain validation in verify_heapam()
*/ > > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, > curr_lp); > > + curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup); > > + next_lp = PageGetItemId(ctx.page, nextoffnum); > > + next_htup = (HeapTupleHeader) PageGetItem(ctx.page, > next_lp); > > + next_xmin = HeapTupleHeaderGetXmin(next_htup); > > + if (TransactionIdIsValid(curr_xmax) && > > + (TransactionIdEquals(curr_xmax, next_xmin) > || > > + next_xmin == FrozenTransactionId)) > > + { > > + if (predecessor[nextoffnum] != 0) > > + { > > + report_corruption(&ctx, > > + > psprintf("updated version at offset %u is also the updated version of > tuple at offset %u", > > + > (unsigned) nextoffnum, (unsigned) predecessor[nextoffnum])); > > + continue; > > I doubt it is correct to enter this path with next_xmin == > FrozenTransactionId. This is following a ctid chain that we normally > wouldn't > follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin > condition. > > removed this frozen check. > + } > > + > > + /* Loop over offsets and validate the data in the > predecessor array. */ > > + for (OffsetNumber currentoffnum = FirstOffsetNumber; > currentoffnum <= maxoff; > > + currentoffnum = OffsetNumberNext(currentoffnum)) > > + { > > + HeapTupleHeader pred_htup; > > + HeapTupleHeader curr_htup; > > + TransactionId pred_xmin; > > + TransactionId curr_xmin; > > + ItemId pred_lp; > > + ItemId curr_lp; > > + > > + ctx.offnum = predecessor[currentoffnum]; > > + ctx.attnum = -1; > > + > > + if (ctx.offnum == 0) > > + { > > + /* > > + * Either the root of the chain or an > xmin-aborted tuple from > > + * an abandoned portion of the HOT chain. > > + */ > > Hm - couldn't we check that the tuple could conceivably be at the root of a > chain? I.e. isn't HEAP_HOT_UPDATED? Or alternatively has an aborted xmin? > > Done, I have added code to identify cases of missing offset in the predecessor[] array and added validation that root of the chain must not be HEAP_ONLY_TUPLE. 
> > > + continue; > > + } > > + > > + curr_lp = PageGetItemId(ctx.page, currentoffnum); > > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, > curr_lp); > > + curr_xmin = HeapTupleHeaderGetXmin(curr_htup); > > + > > + ctx.itemid = pred_lp = PageGetItemId(ctx.page, > ctx.offnum); > > + pred_htup = (HeapTupleHeader) PageGetItem(ctx.page, > pred_lp); > > + pred_xmin = HeapTupleHeaderGetXmin(pred_htup); > > + > > + /* > > + * If the predecessor's xmin is aborted or in > progress, the > > + * current tuples xmin should be aborted or in > progress > > + * respectively. Also both xmin's must be equal. > > + */ > > + if (!TransactionIdEquals(pred_xmin, curr_xmin) && > > + !TransactionIdDidCommit(pred_xmin)) > > + { > > + report_corruption(&ctx, > > + > psprintf("tuple with uncommitted xmin %u was updated to produce a tuple at > offset %u with differing xmin %u", > > + > (unsigned) pred_xmin, (unsigned) currentoffnum, (unsigned) > curr_xmin)); > > Is this necessarily true? What about a tuple that was inserted in a > subtransaction and then updated in another subtransaction of the same > toplevel > transaction? > > patch has been updated to handle cases of sub-transaction. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From c8ef516cde2578543b23faba3598ed014a0fb95f Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Wed, 30 Nov
Re: HOT chain validation in verify_heapam()
Hi, On 2022-11-30 16:09:19 +0530, Himanshu Upadhyaya wrote: > has been updated to handle cases of sub-transaction. Thanks! > + /* Loop over offsets and validate the data in the predecessor > array. */ > + for (OffsetNumber currentoffnum = FirstOffsetNumber; > currentoffnum <= maxoff; > + currentoffnum = OffsetNumberNext(currentoffnum)) > + { > + HeapTupleHeader pred_htup; > + HeapTupleHeader curr_htup; > + TransactionId pred_xmin; > + TransactionId curr_xmin; > + ItemId pred_lp; > + ItemId curr_lp; > + boolpred_in_progress; > + XidCommitStatus xid_commit_status; > + XidBoundsViolation xid_status; > + > + ctx.offnum = predecessor[currentoffnum]; > + ctx.attnum = -1; > + curr_lp = PageGetItemId(ctx.page, currentoffnum); > + if (!lp_valid[currentoffnum] || > ItemIdIsRedirected(curr_lp)) > + continue; I don't think we should do PageGetItemId(ctx.page, currentoffnum); if !lp_valid[currentoffnum]. > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, > curr_lp); > + curr_xmin = HeapTupleHeaderGetXmin(curr_htup); > + xid_status = get_xid_status(curr_xmin, &ctx, > &xid_commit_status); > + if (!(xid_status == XID_BOUNDS_OK || xid_status == > XID_INVALID)) > + continue; Why can we even get here if the xid status isn't XID_BOUNDS_OK? > + if (ctx.offnum == 0) For one, I think it'd be better to use InvalidOffsetNumber here. But more generally, storing the predecessor in ctx.offnum seems quite confusing. > + { > + /* > + * No harm in overriding value of ctx.offnum as > we will always > + * continue if we are here. > + */ > + ctx.offnum = currentoffnum; > + if (TransactionIdIsInProgress(curr_xmin) || > TransactionIdDidCommit(curr_xmin)) Is it actually ok to call TransactionIdDidCommit() here? There's a reason get_xid_status() exists after all. And we do have the xid status for xmin already, so this could just check xid_commit_status, no? Greetings, Andres Freund
Re: HOT chain validation in verify_heapam()
Hi,

On Fri, Dec 2, 2022 at 5:43 AM Andres Freund wrote:
> > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
> > + curr_xmin = HeapTupleHeaderGetXmin(curr_htup);
> > + xid_status = get_xid_status(curr_xmin, &ctx, &xid_commit_status);
> > + if (!(xid_status == XID_BOUNDS_OK || xid_status == XID_INVALID))
> > +     continue;
>
> Why can we even get here if the xid status isn't XID_BOUNDS_OK?

@@ -504,9 +516,269 @@ verify_heapam(PG_FUNCTION_ARGS)
 /* It should be safe to examine the tuple's header, at least */
 ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
 ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+ lp_valid[ctx.offnum] = true;

 /* Ok, ready to check this next tuple */
 check_tuple(&ctx);

Referring to the above code: check_tuple(&ctx) does have this check, but
we populate lp_valid before that call. Populating lp_valid before
check_tuple() is intentional: even if we changed check_tuple() to return
a status and used that to populate lp_valid, it would be hard to validate
cases that depend on an aborted transaction (like "tuple with aborted
xmin %u was updated to produce a tuple at offset %u with committed xmin
%u"), because check_tuple_visibility() also looks for an aborted xmin and
returns false if the tuple's xmin is aborted. In fact, we could add one
more parameter to check_tuple() to get the status of the transaction if
it is aborted and set lp_valid accordingly, but that would add
unnecessary complexity, and I don't find that implementation convincing.
Alternatively, I found that rechecking xid_status is simpler and more
straightforward.

> > + if (ctx.offnum == 0)
>
> For one, I think it'd be better to use InvalidOffsetNumber here. But more
> generally, storing the predecessor in ctx.offnum seems quite confusing.

OK, I will change it to InvalidOffsetNumber in all the places. We need
ctx.offnum to hold the value from the predecessor array, as it is used
internally by the report_corruption function to generate the message (e.g.
below), and the format of these messages seems simpler and more
meaningful for reporting corruption.

report_corruption(&ctx,
                  psprintf("heap-only update produced a non-heap only tuple at offset %u",
                           (unsigned) currentoffnum));

Here we don't need to mention ctx.offnum explicitly in the above message,
as this will be taken care of by the code below.

"report_corruption_internal(Tuplestorestate *tupstore, TupleDesc tupdesc,
BlockNumber blkno, OffsetNumber offnum, AttrNumber attnum, char *msg)
{
    Datum values[HEAPCHECK_RELATION_COLS] = {0};
    bool nulls[HEAPCHECK_RELATION_COLS] = {0};
    HeapTuple tuple;

    values[0] = Int64GetDatum(blkno);
    values[1] = Int32GetDatum(offnum);"

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Fri, Dec 2, 2022 at 5:43 AM Andres Freund wrote: > > > + /* Loop over offsets and validate the data in the > predecessor array. */ > > + for (OffsetNumber currentoffnum = FirstOffsetNumber; > currentoffnum <= maxoff; > > + currentoffnum = OffsetNumberNext(currentoffnum)) > > + { > > + HeapTupleHeader pred_htup; > > + HeapTupleHeader curr_htup; > > + TransactionId pred_xmin; > > + TransactionId curr_xmin; > > + ItemId pred_lp; > > + ItemId curr_lp; > > + boolpred_in_progress; > > + XidCommitStatus xid_commit_status; > > + XidBoundsViolation xid_status; > > + > > + ctx.offnum = predecessor[currentoffnum]; > > + ctx.attnum = -1; > > + curr_lp = PageGetItemId(ctx.page, currentoffnum); > > + if (!lp_valid[currentoffnum] || > ItemIdIsRedirected(curr_lp)) > > + continue; > > I don't think we should do PageGetItemId(ctx.page, currentoffnum); if > !lp_valid[currentoffnum]. > > Fixed. > > > + if (ctx.offnum == 0) > > For one, I think it'd be better to use InvalidOffsetNumber here. But more > generally, storing the predecessor in ctx.offnum seems quite confusing. > > changed all relevant places to use InvalidOffsetNumber. > > > + { > > + /* > > + * No harm in overriding value of > ctx.offnum as we will always > > + * continue if we are here. > > + */ > > + ctx.offnum = currentoffnum; > > + if (TransactionIdIsInProgress(curr_xmin) > || TransactionIdDidCommit(curr_xmin)) > > Is it actually ok to call TransactionIdDidCommit() here? There's a reason > get_xid_status() exists after all. And we do have the xid status for xmin > already, so this could just check xid_commit_status, no? > > > I think it will be good to pass NULL to get_xid_status like "get_xid_status(curr_xmin, &ctx, NULL);" so that we can only check the xid status at the time when it is actually required. This way we can avoid checking xid status in cases when we simply 'continue' due to some check. 
-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 02f0d19bd82f390df8e3d732d60cf3eb0ae1dc97 Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Mon, 5 Dec 2022 18:28:33 +0530 Subject: [PATCH v8] Implement HOT chain validation in verify_heapam() Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev, Andres Freund --- contrib/amcheck/verify_heapam.c | 278 ++ src/bin/pg_amcheck/t/004_verify_heapam.pl | 193 ++- 2 files changed, 460 insertions(+), 11 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index b72a5c96d1..ebad73d64b 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -399,6 +399,9 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber] = {InvalidOffsetNumber}; + OffsetNumber successor[MaxOffsetNumber] = {InvalidOffsetNumber}; + bool lp_valid[MaxOffsetNumber] = {false}; CHECK_FOR_INTERRUPTS(); @@ -433,6 +436,8 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextoffnum; + ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +474,13 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +/* + * make entry in successor array, redirected tuple will be + * validated at the time when we loop over successor array + */ +successor[ctx.offnum] = rdoffnum; +lp_valid[ctx.offnum] = true; continue; } @@ -504,9 +516,268 @@ verify_heapam(PG_FUNCTION_ARGS) /* It should be safe to examine the tuple's header, at least */ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); + lp_valid[ctx.offnum] = true; /* Ok, ready to check this next tuple */ 
check_tuple(&ctx); + + /* +
Re: HOT chain validation in verify_heapam()
Hi, Thanks for working on this. +htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem); +if (!(HeapTupleHeaderIsHeapOnly(htup) && htup->t_infomask & HEAP_UPDATED)) +report_corruption(&ctx, + psprintf("Redirected Tuple at line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple", + (unsigned) rdoffnum)); This isn't safe because the line pointer referenced by rditem may not have been sanity-checked yet. Refer to the comment just below where it says "Sanity-check the line pointer's offset and length values". There are multiple problems with this error message. First, if you take a look at the existing messages - which is always a good thing to do when adding new ones - you will see that they are capitalized differently. Try to match the existing style. Second, we made a real effort with the existing messages to avoid referring to the names of identifiers that only exist at the C level. For example, just above you will see a message which says "line pointer redirection to item at offset %u precedes minimum offset %u". It deliberately does not say "line pointer redirection to item at offset %u is less than FirstOffsetNumber" even though that would be an equally correct statement of the problem. The intent here is to make the messages at least somewhat accessible to users who are somewhat familiar with how PostgreSQL storage works but may not read the C code. These comments apply to every message you add in the patch. The message also does not match the code. The code tests for HEAP_UPDATED, but the message claims that the code is testing for either HEAP_ONLY_TUPLE or HEAP_UPDATED. As a general rule, it is best not to club related tests together in cases like this, because it enables better and more specific error messages. It would be clearer to make an explicit comparison to 0, like (htup->t_infomask & HEAP_UPDATED) != 0, rather than relying on 0 being false and non-zero being true. It doesn't matter to the compiler, but it may help human readers. 
+/* + * Add line pointer offset to predecessor array. + * 1) If xmax is matching with xmin of next Tuple(reaching via its t_ctid). + * 2) If next tuple is in the same page. + * Raise corruption if: + * We have two tuples having same predecessor. + * + * We add offset to predecessor irrespective of transaction(t_xmin) status. We will + * do validation related to transaction status(and also all other validations) + * when we loop over predecessor array. + */ The formatting of this comment will, I think, be mangled if pgindent is run against the file. You can use - markers to prevent that, I believe, or (better) write this as a paragraph without relying on the lines ending up uneven in length. +if (predecessor[nextTupOffnum] != 0) +{ +report_corruption(&ctx, +psprintf("Tuple at offset %u is reachable from two or more updated tuple", +(unsigned) nextTupOffnum)); +continue; +} You need to do this test after xmin/xmax matching. Otherwise you might get false positives. Also, the message should be more specific and match the style of the existing messages. ctx.offnum is already going to be reported in another column, but we can report both nextTupOffnum and predecessor[nextTupOffnum] here e.g. "updated version at offset %u is also the updated version of tuple at offset %u". +currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr); +lp = PageGetItemId(ctx.page, nextTupOffnum); + +htup = (HeapTupleHeader) PageGetItem(ctx.page, lp); This has the same problem I mentioned in my first comment above, namely, we haven't necessarily sanity-checked the length and offset values for nextTupOffnum yet. Saying that another way, if the contents of lp are corrupt and point off the page, we want that to be reported as corruption (which the current code will already do) and we want this check to be skipped so that we don't crash or access random memory addresses. You need to think about how to rearrange the code so that we only perform checks that are known to be safe. 
+/* Now loop over offset and validate data in predecessor array.*/ +for ( ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; + ctx.offnum = OffsetNumberNext(ctx.offnum)) Please take the time to format your code according to the PostgreSQL standard practice. If you don't know what that looks like, use pgindent. +{ +HeapTupleHeader pred_htup, curr_htup; +TransactionId pred_xmin, curr_xmin, curr_xmax; +ItemId pred_lp, curr_lp; Same here. Within this loop, y
Re: HOT chain validation in verify_heapam()
if (!TransactionIdIsValid(curr_xmax) && > +HeapTupleHeaderIsHotUpdated(curr_htup)) > +{ > +report_corruption(&ctx, > +psprintf("Current tuple at offset %u is > HOT but is last tuple in the HOT chain.", > +(unsigned) ctx.offnum)); > +} > > This check has nothing to do with the predecessor[] array, so it seems > like it belongs in check_tuple() rather than here. Also, the message > is rather confused, because the test is checking whether the tuple has > been HOT-updated, while the message is talking about whether the tuple > was *itself* created by a HOT update. Also, when we're dealing with > corruption, statements like "is last tuple in the HOT chain" are > pretty ambiguous. Also, isn't this an issue for both HOT-updated > tuples and also just regular updated tuples? i.e. maybe what we should > be complaining about here is something like "tuple has been updated, > but xmax is 0" and then make the test check exactly that. > Moved to check_tuple_header. This should be applicable for both HOT and normal updates but even the last updated tuple in the normal update is HEAP_UPDATED so not sure how we can apply this check for a normal update? +if (!HeapTupleHeaderIsHotUpdated(pred_htup) && > +HeapTupleHeaderIsHeapOnly(pred_htup) && > +HeapTupleHeaderIsHeapOnly(curr_htup)) > +{ > +report_corruption(&ctx, > +psprintf("Current tuple at offset %u is > HOT but it is next updated tuple of last Tuple in HOT chain.", > + (unsigned) ctx.offnum)); > +} > > Three if-statements up, you tested two out of these three conditions > and complained if they were met. So any time this fires, that will > have also fired. > Yes, the above condition is not required. Now removed. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 5adc1d3c6c4526b723a114ca00eacdea20143861 Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Tue, 6 Sep 2022 15:59:40 +0530 Subject: [PATCH v2] HOT chain validation in verify_heapam. 
--- contrib/amcheck/verify_heapam.c | 228 +--- 1 file changed, 206 insertions(+), 22 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index c875f3e5a2..cc69ccb239 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -158,6 +158,7 @@ static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx, static bool check_tuple_attribute(HeapCheckContext *ctx); static void check_toasted_attribute(HeapCheckContext *ctx, ToastedAttribute *ta); +static bool check_lp(HeapCheckContext *ctx, uint16 lp_len, uint16 lp_off); static bool check_tuple_header(HeapCheckContext *ctx); static bool check_tuple_visibility(HeapCheckContext *ctx); @@ -399,6 +400,7 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber] = {0}; CHECK_FOR_INTERRUPTS(); @@ -433,6 +435,10 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextTupOffnum; + HeapTupleHeader htup; + OffsetNumber currOffnum = ctx.offnum; + ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,44 +475,178 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +ctx.offnum = rdoffnum; +if (!check_lp(&ctx, ItemIdGetLength(rditem), ItemIdGetOffset(rditem))) + continue; +ctx.offnum = currOffnum; + +htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem); +if (!HeapTupleHeaderIsHeapOnly(htup)) +{ + report_corruption(&ctx, + psprintf("redirected tuple at line pointer offset %u is not heap only tuple", + (unsigned) rdoffnum)); +} +if ((htup->t_infomask & HEAP_UPDATED) == 0) +{ + report_corruption(&ctx, + psprintf("redirected tuple at line pointer offset %u is not heap updated tuple", + (unsigned) rdoffnum)); +} 
continue; } /* Sanity-check the line pointer's offset and length values */ ctx.lp_len = ItemIdGetLength(ctx.itemid); ctx.lp_off = ItemIdGetOffset(ctx.itemid); + /* + * If offnum is already present in predecessor array means it was + * previously sanity-checked. + */ + if (predecessor[ctx.offnum]
Re: HOT chain validation in verify_heapam()
Hi Himanshu, Many thanks for working on this! > Please find attached the patch with the above idea of HOT chain's validation Please correct me if I'm wrong, but don't we have a race condition here: ``` +if ((TransactionIdDidAbort(pred_xmin) || TransactionIdIsInProgress(pred_xmin)) +&& !TransactionIdEquals(pred_xmin, curr_xmin)) { ``` The scenario that concerns me is the following: 1. TransactionIdDidAbort(pred_xmin) returns false 2. The transaction aborts 3. TransactionIdIsInProgress(pred_xmin) returns false 4. (false || false) gives us false. An error is reported, although actually the condition should have been true. -- Best regards, Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
Hi again, > I am looking into the process of adding the TAP test for these changes and > finding a way to corrupt a page in the tap test Please note that currently the patch breaks many existing tests. I suggest fixing these first. For the details please see the cfbot report [1] or execute the tests locally. Personally I'm using a little script for this [2]. [1]: http://cfbot.cputube.org/ [2]: https://github.com/afiskon/pgscripts/blob/master/full-build.sh -- Best regards, Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
Hi hackers, > Please correct me if I'm wrong, but don't we have a race condition here: > > ``` > +if ((TransactionIdDidAbort(pred_xmin) || > TransactionIdIsInProgress(pred_xmin)) > +&& !TransactionIdEquals(pred_xmin, curr_xmin)) > { > ``` > > The scenario that concerns me is the following: > > 1. TransactionIdDidAbort(pred_xmin) returns false > 2. The transaction aborts > 3. TransactionIdIsInProgress(pred_xmin) returns false > 4. (false || false) gives us false. An error is reported, although > actually the condition should have been true. It looks like I had a slight brain fade here. In order to report a false error either TransactionIdDidAbort() or TransactionIdIsInProgress() should return true and TransactionIdEquals() should be false. So actually under rare conditions the error will NOT be reported while it should. Other than that we seem to be safe from the concurrency perspective, unless I'm missing something again. Personally I don't have a strong opinion on whether we should bother about this scenario. Probably an explicit comment will not hurt. Also I suggest checking TransactionIdEquals() first though since it's cheaper. -- Best regards, Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
On Tue, Sep 6, 2022 at 9:38 AM Aleksander Alekseev wrote: > > Please correct me if I'm wrong, but don't we have a race condition here: > > > > ``` > > +if ((TransactionIdDidAbort(pred_xmin) || > > TransactionIdIsInProgress(pred_xmin)) > > +&& !TransactionIdEquals(pred_xmin, curr_xmin)) > > { > > ``` > > It looks like I had a slight brain fade here. > > In order to report a false error either TransactionIdDidAbort() or > TransactionIdIsInProgress() should return true and > TransactionIdEquals() should be false. So actually under rare > conditions the error will NOT be reported while it should. Other than > that we seem to be safe from the concurrency perspective, unless I'm > missing something again. > > Personally I don't have a strong opinion on whether we should bother > about this scenario. Probably an explicit comment will not hurt. > > Also I suggest checking TransactionIdEquals() first though since it's cheaper. I think the check should be written like this: !TransactionIdEquals(pred_xmin, curr_xmin) && !TransactionIdDidCommit(pred_xmin) The TransactionIdEquals check should be done first for the reason you state: it's cheaper. I think that we shouldn't be using TransactionIdDidAbort() at all, because it can sometimes return false even when the transaction actually did abort. See test_lockmode_for_conflict() and TransactionIdIsInProgress() for examples of logic that copes with this. What's happening here is that TransactionIdDidAbort doesn't ever get called if the system crashes while a transaction is running. So we can use TransactionIdDidAbort() only as an optimization: if it returns true, the transaction is definitely aborted, but if it returns false, we have to check whether it's still running. If not, it aborted anyway. TransactionIdDidCommit() does not have the same issue. A transaction can abort without updating CLOG if the system crashes, but it can never commit without updating CLOG.
If the transaction didn't commit, then it is either aborted or still in progress (and we don't care which, because neither is an error here). As to whether the existing formulation of the test has an error condition, you're generally right that we should test TransactionIdIsInProgress() before TransactionIdDidCommit/Abort, because during commit or abort, we first set the status in CLOG - which is queried by TransactionIdDidCommit/Abort - and only afterward update the procarray - which is queried by TransactionIdIsInProgress. So normally TransactionIdIsInProgress should be checked first, and TransactionIdDidCommit/Abort should only be checked if it returns false, at which point we know that the return values of the latter calls can't ever change. Possibly there is an argument for including the TransactionIdIsInProgress check here too: !TransactionIdEquals(pred_xmin, curr_xmin) && (TransactionIdIsInProgress(pred_xmin) || !TransactionIdDidCommit(pred_xmin)) ...but I don't think it could change the answer. Most places that check TransactionIdIsInProgress() first are concerned with MVCC semantics, and here we are not. I think the only effects of including or excluding the TransactionIdIsInProgress() test are (1) performance, in that searching the procarray might avoid expense if it's cheaper than searching clog, or add expense if the reverse is true and (2) slightly changing the time at which we're first able to detect this form of corruption. I am inclined to prefer the simpler form of the test without TransactionIdIsInProgress() unless someone can say why we shouldn't go that route. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Tue, Sep 6, 2022 at 6:34 AM Himanshu Upadhyaya wrote: >> This isn't safe because the line pointer referenced by rditem may not >> have been sanity-checked yet. Refer to the comment just below where it >> says "Sanity-check the line pointer's offset and length values". >> > handled by creating a new function check_lp and calling it before accessing > the redirected tuple. I think this is going to result in duplicate error messages, because if A redirects to B, what keeps us from calling check_lp(B) once when we reach A and again when we reach B? I am kind of generally suspicious of the idea that, both for redirects and for ctid links, you just have it check_lp() on the target line pointer and then maybe try to skip doing that again later when we get there. That feels error-prone to me. I think we should try to find a way of organizing the code where we do the check_lp() checks on all line pointers in order without skipping around or backing up. It's not 100% clear to me what the best way of accomplishing that is, though. But here's one random idea: add a successor[] array and an lp_valid[] array. In the first loop, set lp_valid[offset] = true if it passes the check_lp() checks, and set successor[A] = B if A redirects to B or has a CTID link to B, without matching xmin/xmax. Then, in a second loop, iterate over the successor[] array. If successor[A] = B && lp_valid[A] && lp_valid[B], then check whether A.xmax = B.xmin; if so, then complain if predecessor[B] is already set, else set predecessor[B] = A. Then, in the third loop, iterate over the predecessor array just as you're doing now. Then it's clear that we do the lp_valid checks exactly once for every offset that might need them, and in order. And it's also clear that the predecessor-based checks can never happen unless the lp_valid checks passed for both of the offsets involved. > Done, reformatted using pg_indent. 
Thanks, but the new check_lp() function's declaration is not formatted according to pgindent guidelines. It's not enough to fix the problems once, you have to avoid reintroducing them. >> +if (!TransactionIdIsValid(curr_xmax) && >> +HeapTupleHeaderIsHotUpdated(curr_htup)) >> +{ >> +report_corruption(&ctx, >> +psprintf("Current tuple at offset %u is >> HOT but is last tuple in the HOT chain.", >> +(unsigned) ctx.offnum)); >> +} >> >> This check has nothing to do with the predecessor[] array, so it seems >> like it belongs in check_tuple() rather than here. Also, the message >> is rather confused, because the test is checking whether the tuple has >> been HOT-updated, while the message is talking about whether the tuple >> was *itself* created by a HOT update. Also, when we're dealing with >> corruption, statements like "is last tuple in the HOT chain" are >> pretty ambiguous. Also, isn't this an issue for both HOT-updated >> tuples and also just regular updated tuples? i.e. maybe what we should >> be complaining about here is something like "tuple has been updated, >> but xmax is 0" and then make the test check exactly that. > > Moved to check_tuple_header. This should be applicable for both HOT and > normal updates but even the last updated tuple in the normal update is > HEAP_UPDATED so not sure how we can apply this check for a normal update? Oh, yeah. You're right. I was thinking that HEAP_UPDATED was like HEAP_HOT_UPDATED, but it's not: HEAP_UPDATED gets set on the new tuple, while HEAP_HOT_UPDATED gets set on the old tuple. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Sep 7, 2022 at 2:49 AM Robert Haas wrote: > > But here's one random idea: add a successor[] array and an lp_valid[] > array. In the first loop, set lp_valid[offset] = true if it passes the > check_lp() checks, and set successor[A] = B if A redirects to B or has > a CTID link to B, without matching xmin/xmax. Then, in a second loop, > iterate over the successor[] array. If successor[A] = B && lp_valid[A] > && lp_valid[B], then check whether A.xmax = B.xmin; if so, then > complain if predecessor[B] is already set, else set predecessor[B] = > A. Then, in the third loop, iterate over the predecessor array just as > you're doing now. Then it's clear that we do the lp_valid checks > exactly once for every offset that might need them, and in order. And > it's also clear that the predecessor-based checks can never happen > unless the lp_valid checks passed for both of the offsets involved. > > > The approach of introducing a successor array is good, but I see one overhead with having both a successor and a predecessor array: we will traverse each offset on the page thrice (one more than the original loop over offsets), and for each offset we have to retrieve/reach the ItemId (PageGetItemId) and the Item (PageGetItem) itself. This is not much overhead, as these are all preprocessor macros, but there will be some. How about having a new array (initialised with LP_NOT_CHECKED) of enum LPStatus as below typedef enum LPStatus { LP_NOT_CHECKED, LP_VALID, LP_NOT_VALID }LPStatus; and validating and setting the proper status at three places: 1) while validating the redirected tuple, 2) while populating the predecessor array, and 3) at the original place of the "sanity check", something like: " if (lpStatus[rdoffnum] == LP_NOT_CHECKED) { ctx.offnum = rdoffnum; if (!check_lp(&ctx, ItemIdGetLength(rditem), ItemIdGetOffset(rditem))) { lpStatus[rdoffnum] = LP_NOT_VALID; continue; } lpStatus[rdoffnum] = LP_VALID; } else if (lpStatus[rdoffnum] == LP_NOT_VALID) continue; " thoughts? 
-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Fri, Sep 9, 2022 at 10:00 AM Himanshu Upadhyaya wrote: > Approach of introducing a successor array is good but I see one overhead with > having both successor and predecessor array, that is, we will traverse each > offset on page thrice(one more for original loop on offset) and with each > offset we have to retrieve > /reach an ItemID(PageGetItemId) and Item(PageGetItem) itself. This is not > much overhead as they are all preprocessors but there will be some overhead. > How about having new array(initialised with LP_NOT_CHECKED) of enum LPStatus > as below > > typedef enum LPStatus > { > LP_NOT_CHECKED, > LP_VALID, > LP_NOT_VALID > }LPStatus; > > and validating and setting with proper status at three places > 1) while validating Redirect Tuple > 2) while validating populating predecessor array and > 3) at original place of "sanity check" Well, having to duplicate the logic in three places doesn't seem all that clean to me. Admittedly, I haven't tried implementing my proposal, so maybe that doesn't come out very clean either. I don't know. But I think having the code be clearly correct here is the most important thing, not shaving a few CPU cycles here or there. It's not even clear to me that your way would be cheaper, because an "if" statement is certainly not free, and in fact is probably more expensive than an extra call to PageGetItem() or PageGetItemId(). Branches are usually more expensive than math. But actually I doubt that it matters much either way. I think the way to figure out whether we have a performance problem is to write the code in the stylistically best way and then test it. There may be no problem at all, and if there is a problem, it may not be where we think it will be. In short, let's apply Knuth's optimization principle. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Sep 7, 2022 at 2:49 AM Robert Haas wrote: > > But here's one random idea: add a successor[] array and an lp_valid[] > array. In the first loop, set lp_valid[offset] = true if it passes the > check_lp() checks, and set successor[A] = B if A redirects to B or has > a CTID link to B, without matching xmin/xmax. Then, in a second loop, > iterate over the successor[] array. If successor[A] = B && lp_valid[A] > && lp_valid[B], then check whether A.xmax = B.xmin; if so, then > complain if predecessor[B] is already set, else set predecessor[B] = > A. Then, in the third loop, iterate over the predecessor array just as > you're doing now. Then it's clear that we do the lp_valid checks > exactly once for every offset that might need them, and in order. And > it's also clear that the predecessor-based checks can never happen > unless the lp_valid checks passed for both of the offsets involved. > > ok, I have introduced a new approach to first construct a successor array and then loop over the successor array to construct a predecessor array. 
-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 4a32d7cbdec0d090e99a9a58cb4deb5cb4c03aef Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Mon, 19 Sep 2022 13:50:47 +0530 Subject: [PATCH v3] HOT chain validation in verify_heapam --- contrib/amcheck/verify_heapam.c | 209 1 file changed, 209 insertions(+) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index c875f3e5a2..3da4c09665 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -399,6 +399,9 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber] = {0}; + OffsetNumber successor[MaxOffsetNumber] = {0}; + bool lp_valid[MaxOffsetNumber] = {false}; CHECK_FOR_INTERRUPTS(); @@ -433,6 +436,8 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextoffnum; + ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +474,12 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +/* + * make entry in successor array, redirected tuple will be + * validated at the time when we loop over successor array + */ +successor[ctx.offnum] = rdoffnum; continue; } @@ -504,9 +515,200 @@ verify_heapam(PG_FUNCTION_ARGS) /* It should be safe to examine the tuple's header, at least */ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); + lp_valid[ctx.offnum] = true; /* Ok, ready to check this next tuple */ check_tuple(&ctx); + + /* + * Now add data to successor array, it will later be used to + * generate the predecessor array. 
We need to access the tuple's + * header to populate the predecessor array but tuple is not + * necessarily sanity checked yet so delaying construction of + * predecessor array until all tuples are sanity checked. + */ + nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); + if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno && +nextoffnum != ctx.offnum) + { +successor[ctx.offnum] = nextoffnum; + } + } + + /* + * Loop over offset and populate predecessor array from all entries + * that are present in successor array. + */ + ctx.attnum = -1; + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; + ctx.offnum = OffsetNumberNext(ctx.offnum)) + { + ItemId curr_lp; + ItemId next_lp; + HeapTupleHeader curr_htup; + HeapTupleHeader next_htup; + TransactionId curr_xmax; + TransactionId next_xmin; + OffsetNumber nextoffnum = successor[ctx.offnum]; + + curr_lp = PageGetItemId(ctx.page, ctx.offnum); + if (nextoffnum == 0) + { +/* + * Either last updated tuple in chain or corruption raised for + * this tuple. + */ +continue; + } + if (ItemIdIsRedirected(curr_lp) && lp_valid[nextoffnum]) + { +next_lp = PageGetItemId(ctx.page, nextoffnum); +next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);; +if (!HeapTupleHeaderIsHeapOnly(next_htup)) +{ + report_corruption(&ctx, + psprintf("redirected tuple at line pointer offset %u is not heap only tuple", + (unsigned) nextoffnum)); +} +if ((next_htup->t_infomask & HEAP_UPDATED) == 0) +{ + report_corruption(&ctx, + psprintf("redirected tuple at l
Re: HOT chain validation in verify_heapam()
On Wed, Sep 7, 2022 at 12:11 AM Robert Haas wrote: > > I think the check should be written like this: > > !TransactionIdEquals(pred_xmin, curr_xmin) && > !TransactionIdDidCommit(pred_xmin) > > The TransactionIdEquals check should be done first for the reason you > state: it's cheaper. > > I think that we shouldn't be using TransactionIdDidAbort() at all, > because it can sometimes return false even when the transaction > actually did abort. See test_lockmode_for_conflict() and > TransactionIdIsInProgress() for examples of logic that copes with > this. What's happening here is that TransactionIdDidAbort doesn't ever > get called if the system crashes while a transaction is running. So we > can use TransactionIdDidAbort() only as an optimization: if it returns > true, the transaction is definitely aborted, but if it returns false, > we have to check whether it's still running. If not, it aborted > anyway. > > TransactionIdDidCommit() does not have the same issue. A transaction > can abort without updating CLOG if the system crashes, but it can > never commit without updating CLOG. If the transaction didn't commit, > then it is either aborted or still in progress (and we don't care > which, because neither is an error here). > > As to whether the existing formulation of the test has an error > condition, you're generally right that we should test > TransactionIdIsInProgress() before TransactionIdDidCommit/Abort, > because during commit or abort, we first set the status in CLOG - > which is queried by TransactionIdDidCommit/Abort - and only afterward > update the procarray - which is queried by TransactionIdIsInProgress. > So normally TransactionIdIsInProgress should be checked first, and > TransactionIdDidCommit/Abort should only be checked if it returns > false, at which point we know that the return values of the latter > calls can't ever change. 
Possibly there is an argument for including > the TransactionIdIsInProgress check here too: > > !TransactionIdEquals(pred_xmin, curr_xmin) && > (TransactionIdIsInProgress(pred_xmin) || > !TransactionIdDidCommit(pred_xmin)) > > ...but I don't think it could change the answer. Most places that > check TransactionIdIsInProgress() first are concerned with MVCC > semantics, and here we are not. I think the only effects of including > or excluding the TransactionIdIsInProgress() test are (1) performance, > in that searching the procarray might avoid expense if it's cheaper > than searching clog, or add expense if the reverse is true and (2) > slightly changing the time at which we're first able to detect this > form of corruption. I am inclined to prefer the simpler form of the > test without TransactionIdIsInProgress() unless someone can say why we > shouldn't go that route. > > Done, updated in the v3 patch. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi Himanshu, > Done, updated in the v3 patch. Thanks for the updated patch. Here is v4 with fixed compiler warnings and some minor tweaks from me. I didn't put too much thought into the algorithm but I already see something strange. At verify_heapam.c:553 you declared curr_xmax and next_xmin. However the variables are not used/initialized until you do: ``` if (lp_valid[nextoffnum] && lp_valid[ctx.offnum] && TransactionIdIsValid(curr_xmax) && TransactionIdEquals(curr_xmax, next_xmin)) { /* ... */ ``` In v4 I elected to initialize both curr_xmax and next_xmin with InvalidTransactionId for safety and in order to silence the compiler but still there is no way this condition can succeed. Please make sure there is no logic missing. -- Best regards, Aleksander Alekseev v4-0001-Implement-HOT-chain-validation-in-verify_heapam.patch Description: Binary data
Re: HOT chain validation in verify_heapam()
On Mon, Sep 19, 2022 at 8:27 PM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi Himanshu, > > > Done, updated in the v3 patch. > > Thanks for the updated patch. > > Here is v4 with fixed compiler warnings and some minor tweaks from me. > > I didn't put too much thought into the algorithm but I already see > something strange. At verify_heapam.c:553 you declared curr_xmax and > next_xmin. However the variables are not used/initialized until you > do: > > ``` > if (lp_valid[nextoffnum] && lp_valid[ctx.offnum] && > TransactionIdIsValid(curr_xmax) && > TransactionIdEquals(curr_xmax, next_xmin)) { > /* ... */ > ``` > > In v4 I elected to initialize both curr_xmax and next_xmin with > InvalidTransactionId for safety and in order to silence the compiler > but still there is no way this condition can succeed. > > Please make sure there is no logic missing. > > Hi Aleksander, Thanks for the feedback. It's my mistake, sorry about that: I was trying to merge two if conditions and forgot to move the initialization of xmin and xmax. Now I think it will be better to have a nested if, with an inner if condition that tests whether xmax and xmin match. This way we retrieve and populate xmin and xmax only when they are actually required for the inner condition. I have changed this in the attached patch.
-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 1a51c544c5c9a14441831f5299b9d6fe275fbf53 Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Mon, 19 Sep 2022 21:44:34 +0530 Subject: [PATCH v4] HOT chain validation in verify_heapam --- contrib/amcheck/verify_heapam.c | 211 1 file changed, 211 insertions(+) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index c875f3e5a2..2b9244ca69 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -399,6 +399,9 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber] = {0}; + OffsetNumber successor[MaxOffsetNumber] = {0}; + bool lp_valid[MaxOffsetNumber] = {false}; CHECK_FOR_INTERRUPTS(); @@ -433,6 +436,8 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextoffnum; + ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +474,12 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +/* + * make entry in successor array, redirected tuple will be + * validated at the time when we loop over successor array + */ +successor[ctx.offnum] = rdoffnum; continue; } @@ -504,9 +515,202 @@ verify_heapam(PG_FUNCTION_ARGS) /* It should be safe to examine the tuple's header, at least */ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); + lp_valid[ctx.offnum] = true; /* Ok, ready to check this next tuple */ check_tuple(&ctx); + + /* + * Now add data to successor array, it will later be used to + * generate the predecessor array. 
We need to access the tuple's + * header to populate the predecessor array but tuple is not + * necessarily sanity checked yet so delaying construction of + * predecessor array until all tuples are sanity checked. + */ + nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); + if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno && +nextoffnum != ctx.offnum) + { +successor[ctx.offnum] = nextoffnum; + } + } + + /* + * Loop over offset and populate predecessor array from all entries + * that are present in successor array. + */ + ctx.attnum = -1; + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; + ctx.offnum = OffsetNumberNext(ctx.offnum)) + { + ItemId curr_lp; + ItemId next_lp; + HeapTupleHeader curr_htup; + HeapTupleHeader next_htup; + TransactionId curr_xmax; + TransactionId next_xmin; + OffsetNumber nextoffnum = successor[ctx.offnum]; + + curr_lp = PageGetItemId(ctx.page, ctx.offnum); + if (nextoffnum == 0) + { +/* + * Either last updated tuple in chain or corruption raised for + * this tuple. + */ +continue; + } + if (ItemIdIsRedirected(curr_lp) && lp_valid[nextoffnum]) + { +next_lp = PageGetItemId(ctx.page, nextoffnum); +next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);; +if (!HeapTupleH
Re: HOT chain validation in verify_heapam()
Hi Himanshu, > I have changed this in the attached patch. If it's not too much trouble could you please base your changes on v4 that I submitted? I put some effort into writing a proper commit message, editing the comments, etc. The easiest way of doing this is using `git am` and `git format-patch`. -- Best regards, Aleksander Alekseev
Re: HOT chain validation in verify_heapam()
On Mon, Sep 19, 2022 at 10:04 PM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi Himanshu, > > > I have changed this in the attached patch. > > If it's not too much trouble could you please base your changes on v4 > that I submitted? I put some effort into writing a proper commit > message, editing the comments, etc. The easiest way of doing this is > using `git am` and `git format-patch`. > > Please find it attached. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 3f3290ffb857117385f79f85aa599c588340924b Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Tue, 20 Sep 2022 14:23:31 +0530 Subject: [PATCH v5] Implement HOT chain validation in verify_heapam() Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev Discussion: https://postgr.es/m/CAPF61jBBR2-iE-EmN_9v0hcQEfyz_17e5Lbb0%2Bu2%3D9ukA9sWmQ%40mail.gmail.com --- contrib/amcheck/verify_heapam.c | 214 1 file changed, 214 insertions(+) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index c875f3e5a2..e90e1ef2f3 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -399,6 +399,9 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber] = {0}; + OffsetNumber successor[MaxOffsetNumber] = {0}; + bool lp_valid[MaxOffsetNumber] = {false}; CHECK_FOR_INTERRUPTS(); @@ -433,6 +436,8 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextoffnum; + ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +474,12 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +/* + * make entry in successor array, redirected tuple will be + * validated at the time when 
we loop over successor array + */ +successor[ctx.offnum] = rdoffnum; continue; } @@ -504,9 +515,205 @@ verify_heapam(PG_FUNCTION_ARGS) /* It should be safe to examine the tuple's header, at least */ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); + lp_valid[ctx.offnum] = true; /* Ok, ready to check this next tuple */ check_tuple(&ctx); + + /* + * Add the data to the successor array. It will be used later to + * generate the predecessor array. + * + * We need to access the tuple's header to populate the + * predecessor array. However the tuple is not necessarily sanity + * checked yet so delaying construction of predecessor array until + * all tuples are sanity checked. + */ + nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); + if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno && +nextoffnum != ctx.offnum) + { +successor[ctx.offnum] = nextoffnum; + } + } + + /* + * Loop over offset and populate predecessor array from all entries + * that are present in successor array. + */ + ctx.attnum = -1; + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; + ctx.offnum = OffsetNumberNext(ctx.offnum)) + { + ItemId curr_lp; + ItemId next_lp; + HeapTupleHeader curr_htup; + HeapTupleHeader next_htup; + TransactionId curr_xmax; + TransactionId next_xmin; + + OffsetNumber nextoffnum = successor[ctx.offnum]; + + curr_lp = PageGetItemId(ctx.page, ctx.offnum); + if (nextoffnum == 0) + { +/* + * This is either the last updated tuple in the chain or a + * corruption raised for this tuple. 
+ */ +continue; + } + if (ItemIdIsRedirected(curr_lp) && lp_valid[nextoffnum]) + { +next_lp = PageGetItemId(ctx.page, nextoffnum); +next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);; +if (!HeapTupleHeaderIsHeapOnly(next_htup)) +{ + report_corruption(&ctx, + psprintf("redirected tuple at line pointer offset %u is not heap only tuple", + (unsigned) nextoffnum)); +} +if ((next_htup->t_infomask & HEAP_UPDATED) == 0) +{ + report_corruption(&ctx, + psprintf("redirected tuple at line pointer offset %u is not heap updated tuple", + (unsigned) nextoffnum)); +} +continue; + } + + /* + * Add a line pointer offset to the predecessor array. 1) If xmax + * is matching with xmin of next tuple (reaching via its t_ctid). + * 2) If the next tuple is in the same page. Raise corruption if + * we have two tuples having the same predecessor. + * + * We add the offset to
Re: HOT chain validation in verify_heapam()
On Tue, Sep 20, 2022 at 5:00 AM Himanshu Upadhyaya wrote: > Please find it attached. This patch still has no test cases. Just as we have test cases for the existing corruption checks, we should have test cases for these new corruption checks, showing cases where they actually fire. I think I would be inclined to set lp_valid[x] = true in both the redirected and non-redirected case, and then have the very first thing that the second loop does be if (nextoffnum == 0 || !lp_valid[ctx.offnum]) continue. I think that would be more clear about the intent to ignore line pointers that failed validation. Also, if you did it that way, then you could catch the case of a redirected line pointer pointing to another redirected line pointer, which is a corruption condition that the current code does not appear to check. +/* + * Validation via the predecessor array. 1) If the predecessor's + * xmin is aborted or in progress, the current tuples xmin should + * be aborted or in progress respectively. Also both xmin's must + * be equal. 2) If the predecessor's xmin is not frozen, then + * current tuple's shouldn't be either. 3) If the predecessor's + * xmin is equal to the current tuple's xmin, the current tuple's + * cmin should be greater than the predecessor's cmin. 4) If the + * current tuple is not HOT then its predecessor's tuple must not + * be HEAP_HOT_UPDATED. 5) If the current tuple is HOT then its + * predecessor's tuple must be HEAP_HOT_UPDATED. + */ This comment needs to be split up into pieces and the pieces need to be moved closer to the tests to which they correspond. + psprintf("unfrozen tuple was updated to produce a tuple at offset %u which is not frozen", Shouldn't this say "which is frozen"? + * Not a corruption if current tuple is updated/deleted by a + * different transaction, means t_cid will point to cmax (that is + * command id of deleting transaction) and cid of predecessor not + * necessarily will be smaller than cid of current tuple. 
t_cid I think that the next person who reads this code is likely to understand that the CIDs of different transactions are numerically unrelated. What's less obvious is that if the XID is the same, the newer update must have a higher CID. + * can hold combo command id but we are not worrying here since + * combo command id of the next updated tuple (if present) must be + * greater than combo command id of the current tuple. So here we + * are not checking HEAP_COMBOCID flag and simply doing t_cid + * comparison. I disapprove of ignoring the HEAP_COMBOCID flag. Emitting a message claiming that the CID has a certain value when that's actually a combo CID is misleading, so at least a different message wording is needed in such cases. But it's also not clear to me that the newer update has to have a higher combo CID, because combo CIDs can be reused. If you have multiple cursors open in the same transaction, the updates can be interleaved, and it seems to me that it might be possible for an older CID to have created a certain combo CID after a newer CID, and then both cursors could update the same page in succession and end up with combo CIDs out of numerical order. Unless somebody can make a convincing argument that this isn't possible, I think we should just skip this check for cases where either tuple has a combo CID. +if (TransactionIdEquals(pred_xmin, curr_xmin) && +(TransactionIdEquals(curr_xmin, curr_xmax) || + !TransactionIdIsValid(curr_xmax)) && pred_cmin >= curr_cmin) I don't understand the reason for the middle part of this condition -- TransactionIdEquals(curr_xmin, curr_xmax) || !TransactionIdIsValid(curr_xmax). I suppose the comment is meant to explain this, but I still don't get it. If a tuple with XMIN 12345 CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's corruption, regardless of what the XMAX of the second tuple may happen to be. 
+if (HeapTupleHeaderIsHeapOnly(curr_htup) && +!HeapTupleHeaderIsHotUpdated(pred_htup)) +if (!HeapTupleHeaderIsHeapOnly(curr_htup) && +HeapTupleHeaderIsHotUpdated(pred_htup)) I think it would be slightly clearer to write these tests the other way around i.e. check the previous tuple's state first. +if (!TransactionIdIsValid(curr_xmax) && HeapTupleHeaderIsHotUpdated(tuphdr)) +{ +report_corruption(ctx, + psprintf("tuple has been updated, but xmax is 0")); +result = false; +} I guess this message needs to say "tuple has been HOT updated, but xmax is 0" or something like that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Tue, Sep 20, 2022 at 6:43 PM Robert Haas wrote: > I disapprove of ignoring the HEAP_COMBOCID flag. Emitting a message > claiming that the CID has a certain value when that's actually a combo > CID is misleading, so at least a different message wording is needed > in such cases. But it's also not clear to me that the newer update has > to have a higher combo CID, because combo CIDs can be reused. If you > have multiple cursors open in the same transaction, the updates can be > interleaved, and it seems to me that it might be possible for an older > CID to have created a certain combo CID after a newer CID, and then > both cursors could update the same page in succession and end up with > combo CIDs out of numerical order. Unless somebody can make a > convincing argument that this isn't possible, I think we should just > skip this check for cases where either tuple has a combo CID. > > Here our objective is to validate that, when the predecessor's xmin and the current tuple's xmin are the same, the predecessor's cmin must be less than the current tuple's cmin. When both tuples' xmins are the same, I think the predecessor's t_cid will always hold a combo CID. So either one or both tuples will always have a combo CID, and skipping this check whenever "either tuple has a combo CID" would make the condition always evaluate to false. > +if (TransactionIdEquals(pred_xmin, curr_xmin) && > +(TransactionIdEquals(curr_xmin, curr_xmax) || > + !TransactionIdIsValid(curr_xmax)) && pred_cmin >= > curr_cmin) > > I don't understand the reason for the middle part of this condition -- > TransactionIdEquals(curr_xmin, curr_xmax) || > !TransactionIdIsValid(curr_xmax). I suppose the comment is meant to > explain this, but I still don't get it. If a tuple with XMIN 12345 > CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's > corruption, regardless of what the XMAX of the second tuple may happen > to be.
 tuple | t_xmin | t_xmax | t_cid | t_ctid | tuple_data_split                        | heap_tuple_infomask_flags
-------+--------+--------+-------+--------+-----------------------------------------+---------------------------
     1 |    971 |    971 |     0 | (0,3)  | {"\\x1774657374312020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{})
     2 |    971 |      0 |     1 | (0,2)  | {"\\x1774657374322020202020","\\x0200"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
     3 |    971 |    971 |     1 | (0,4)  | {"\\x1774657374322020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     4 |    971 |    972 |     0 | (0,5)  | {"\\x1774657374332020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     5 |    972 |      0 |     0 | (0,5)  | {"\\x1774657374342020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})

In the above case, the chain 1 -> 3 -> 4 was inserted and updated by xid 971, and tuple 4 was then updated by xid 972. Here the t_cid of tuple 4 is 0 whereas its predecessor's t_cid is 1, because tuple 4's t_cid holds the command ID of the deleting transaction (cmax); that is why we need to check the tuple's xmax. Please correct me if I am missing anything here? -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Sat, Sep 24, 2022 at 8:45 AM Himanshu Upadhyaya wrote: > Here our objective is to validate if both Predecessor's xmin and current > Tuple's xmin are same then cmin of predecessor must be less than current > Tuple's cmin. In case when both tuple xmin's are same then I think > predecessor's t_cid will always hold combo CID. > Then either one or both tuple will always have a combo CID and skipping this > check based on "either tuple has a combo CID" will make this if condition to > be evaluated to false. Fair point. I think maybe we should just remove the CID validation altogether. I thought initially that it would be possible to have a newer update with a numerically lower combo CID, but after some experimentation I don't see a way to do it. However, it also doesn't seem very useful to me to check that the combo CIDs are in ascending order. I mean, even if that's not supposed to happen and does anyway, there aren't really any enduring consequences, because command IDs are ignored completely outside of the transaction that performed the operation originally. So even if the combo CIDs were set to completely random values, is that really corruption? At most it messes things up for the duration of one transaction. And if we just have plain CIDs rather than combo CIDs, the same thing is true: they could be totally messed up and it wouldn't really matter beyond the lifetime of that one transaction. Also, it would be a lot more tempting to check this if we could check it in all cases, but we can't. If a tuple is inserted in transaction T1 and then updated twice in transaction T2, we'll have only one combo CID and nothing to compare it against, nor any way to decode what CMIN and CMAX it originally represented. And this is probably a pretty common type of case. 
>> +if (TransactionIdEquals(pred_xmin, curr_xmin) && >> +(TransactionIdEquals(curr_xmin, curr_xmax) || >> + !TransactionIdIsValid(curr_xmax)) && pred_cmin >= >> curr_cmin) >> >> I don't understand the reason for the middle part of this condition -- >> TransactionIdEquals(curr_xmin, curr_xmax) || >> !TransactionIdIsValid(curr_xmax). I suppose the comment is meant to >> explain this, but I still don't get it. If a tuple with XMIN 12345 >> CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's >> corruption, regardless of what the XMAX of the second tuple may happen >> to be. > > tuple | t_xmin | t_xmax | t_cid | t_ctid | tuple_data_split > | > heap_tuple_infomask_flags > > ---+++---++-+-- > - > 1 |971 |971 | 0 | (0,3) | > {"\\x1774657374312020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{}) > 2 |971 | 0 | 1 | (0,2) | > {"\\x1774657374322020202020","\\x0200"} | > ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{}) > 3 |971 |971 | 1 | (0,4) | > {"\\x1774657374322020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY > _TUPLE}",{}) > 4 |971 |972 | 0 | (0,5) | > {"\\x1774657374332020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{}) > 5 |972 | 0 | 0 | (0,5) | > {"\\x1774657374342020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{}) > > In the above case Tuple 1->3->4 is inserted and updated by xid 971 and tuple > 4 is next update by xid 972, here t_cid of tuple 4 is 0 where as its > predecessor's t_cid is 1, because in Tuple 4 t_cid is having command ID of > deleting transaction(cmax), that is why we need to check xmax of the Tuple. > > Please correct me if I am missing anything here? Hmm, I see, so basically you're trying to check whether the CID field contains a CMIN as opposed to a CMAX. 
But I'm not sure this test is entirely reliable, because heap_prepare/execute_freeze_tuple() can set a tuple's xmax to InvalidTransactionId even after it's had some other value, and that won't do anything to the contents of the CID field. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Tue, Sep 27, 2022 at 1:35 AM Robert Haas wrote: > On Sat, Sep 24, 2022 at 8:45 AM Himanshu Upadhyaya > wrote: > > Here our objective is to validate if both Predecessor's xmin and current > Tuple's xmin are same then cmin of predecessor must be less than current > Tuple's cmin. In case when both tuple xmin's are same then I think > predecessor's t_cid will always hold combo CID. > > Then either one or both tuple will always have a combo CID and skipping > this check based on "either tuple has a combo CID" will make this if > condition to be evaluated to false ''. > > Fair point. I think maybe we should just remove the CID validation > altogether. I thought initially that it would be possible to have a > newer update with a numerically lower combo CID, but after some > experimentation I don't see a way to do it. However, it also doesn't > seem very useful to me to check that the combo CIDs are in ascending > order. I mean, even if that's not supposed to happen and does anyway, > there aren't really any enduring consequences, because command IDs are > ignored completely outside of the transaction that performed the > operation originally. So even if the combo CIDs were set to completely > random values, is that really corruption? At most it messes things up > for the duration of one transaction. And if we just have plain CIDs > rather than combo CIDs, the same thing is true: they could be totally > messed up and it wouldn't really matter beyond the lifetime of that > one transaction. > > Also, it would be a lot more tempting to check this if we could check > it in all cases, but we can't. If a tuple is inserted in transaction > T1 and ten updated twice in transaction T2, we'll have only one combo > CID and nothing to compare it against, nor any way to decode what CMIN > and CMAX it originally represented. And this is probably a pretty > common type of case. > > ok, I will be removing this entire validation of cmin/cid in my next patch. 
> >> +if (TransactionIdEquals(pred_xmin, curr_xmin) && > >> +(TransactionIdEquals(curr_xmin, curr_xmax) || > >> + !TransactionIdIsValid(curr_xmax)) && pred_cmin >= > curr_cmin) > >> > >> I don't understand the reason for the middle part of this condition -- > >> TransactionIdEquals(curr_xmin, curr_xmax) || > >> !TransactionIdIsValid(curr_xmax). I suppose the comment is meant to > >> explain this, but I still don't get it. If a tuple with XMIN 12345 > >> CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's > >> corruption, regardless of what the XMAX of the second tuple may happen > >> to be. > > > > tuple | t_xmin | t_xmax | t_cid | t_ctid | > tuple_data_split | >heap_tuple_infomask_flags > > > > > ---+++---++-+-- > > - > > 1 |971 |971 | 0 | (0,3) | > {"\\x1774657374312020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{}) > > 2 |971 | 0 | 1 | (0,2) | > {"\\x1774657374322020202020","\\x0200"} | > ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{}) > > 3 |971 |971 | 1 | (0,4) | > {"\\x1774657374322020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY > > _TUPLE}",{}) > > 4 |971 |972 | 0 | (0,5) | > {"\\x1774657374332020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{}) > > 5 |972 | 0 | 0 | (0,5) | > {"\\x1774657374342020202020","\\x0100"} | > ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{}) > > > > In the above case Tuple 1->3->4 is inserted and updated by xid 971 and > tuple 4 is next update by xid 972, here t_cid of tuple 4 is 0 where as its > predecessor's t_cid is 1, because in Tuple 4 t_cid is having command ID of > deleting transaction(cmax), that is why we need to check xmax of the Tuple. > > > > Please correct me if I am missing anything here? 
> > Hmm, I see, so basically you're trying to check whether the CID field > contains a CMIN as opposed to a CMAX. But I'm not sure this test is > entirely reliable, because heap_prepare/execute_freeze_tuple() can set > a tuple's xmax to InvalidTransactionId even after it's had some other > value, and that won't do anything to the contents of the CID field. > OK, got it. As we are removing this cmin/cid validation, no change is needed here. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
he second tuple may happen > to be. > > As discussed in our last discussion, I am removing this check altogether. > +if (HeapTupleHeaderIsHeapOnly(curr_htup) && > +!HeapTupleHeaderIsHotUpdated(pred_htup)) > > +if (!HeapTupleHeaderIsHeapOnly(curr_htup) && > +HeapTupleHeaderIsHotUpdated(pred_htup)) > > I think it would be slightly clearer to write these tests the other > way around i.e. check the previous tuple's state first. > > Done. > +if (!TransactionIdIsValid(curr_xmax) && > HeapTupleHeaderIsHotUpdated(tuphdr)) > +{ > + report_corruption(ctx, > + psprintf("tuple has been updated, but xmax is > 0")); > +result = false; > +} > > I guess this message needs to say "tuple has been HOT updated, but > xmax is 0" or something like that. > > Done. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 08fe01f5073c0a850541265494bb4a875bec7d3f Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Fri, 30 Sep 2022 17:44:56 +0530 Subject: [PATCH v6] Implement HOT chain validation in verify_heapam() Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev Discussion: https://postgr.es/m/CAPF61jBBR2-iE-EmN_9v0hcQEfyz_17e5Lbb0%2Bu2%3D9ukA9sWmQ%40mail.gmail.com --- contrib/amcheck/verify_heapam.c | 207 ++ src/bin/pg_amcheck/t/004_verify_heapam.pl | 192 ++-- 2 files changed, 388 insertions(+), 11 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index c875f3e5a2..007f7b2f37 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -399,6 +399,9 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber] = {0}; + OffsetNumber successor[MaxOffsetNumber] = {0}; + bool lp_valid[MaxOffsetNumber] = {false}; CHECK_FOR_INTERRUPTS(); @@ -433,6 +436,8 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = 
OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextoffnum; + ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +474,13 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +/* + * make entry in successor array, redirected tuple will be + * validated at the time when we loop over successor array + */ +successor[ctx.offnum] = rdoffnum; +lp_valid[ctx.offnum] = true; continue; } @@ -504,9 +516,197 @@ verify_heapam(PG_FUNCTION_ARGS) /* It should be safe to examine the tuple's header, at least */ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); + lp_valid[ctx.offnum] = true; /* Ok, ready to check this next tuple */ check_tuple(&ctx); + + /* + * Add the data to the successor array if next updated tuple is in + * the same page. It will be used later to generate the + * predecessor array. + * + * We need to access the tuple's header to populate the + * predecessor array. However the tuple is not necessarily sanity + * checked yet so delaying construction of predecessor array until + * all tuples are sanity checked. + */ + nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); + if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno && +nextoffnum != ctx.offnum) + { +successor[ctx.offnum] = nextoffnum; + } + } + + /* + * Loop over offset and populate predecessor array from all entries + * that are present in successor array. 
+ */ + ctx.attnum = -1; + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; + ctx.offnum = OffsetNumberNext(ctx.offnum)) + { + ItemId curr_lp; + ItemId next_lp; + HeapTupleHeader curr_htup; + HeapTupleHeader next_htup; + TransactionId curr_xmax; + TransactionId next_xmin; + + OffsetNumber nextoffnum = successor[ctx.offnum]; + + curr_lp = PageGetItemId(ctx.page, ctx.offnum); + if (nextoffnum == 0 || !lp_valid[ctx.offnum] || !lp_valid[nextoffnum]) + { +/* + * This is either the last updated tuple in the chain or a + * corruption raised for this tuple. + */ +continue; + } + if (ItemIdIsRedirected(curr_lp)) + { +next_lp = PageGetItemId(ctx.page, nextoffnum); +if (ItemIdIsRedirected(next_lp)) +{ + report_corruption(&ctx, + psprintf("redirected line pointer pointing to another redirected li
Re: HOT chain validation in verify_heapam()
On Wed, Mar 8, 2023 at 7:30 PM Himanshu Upadhyaya < upadhyaya.himan...@gmail.com> wrote: Please find the v11 patch with all these changes. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From f2b262e95fe07dddfec994f20a6d2e76bc12b410 Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Thu, 9 Mar 2023 21:18:58 +0530 Subject: [PATCH v11] Implement HOT chain validation in verify_heapam() Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev, Andres Freund. Some revisions by Robert Haas. --- contrib/amcheck/verify_heapam.c | 291 +- src/bin/pg_amcheck/t/004_verify_heapam.pl | 288 - 2 files changed, 562 insertions(+), 17 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index 4fcfd6df72..9cd4a795a0 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -150,7 +150,9 @@ typedef struct HeapCheckContext } HeapCheckContext; /* Internal implementation */ -static void check_tuple(HeapCheckContext *ctx); +static void check_tuple(HeapCheckContext *ctx, + bool *xmin_commit_status_ok, + XidCommitStatus *xmin_commit_status); static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx, ToastedAttribute *ta, int32 *expected_chunk_seq, uint32 extsize); @@ -160,7 +162,9 @@ static void check_toasted_attribute(HeapCheckContext *ctx, ToastedAttribute *ta); static bool check_tuple_header(HeapCheckContext *ctx); -static bool check_tuple_visibility(HeapCheckContext *ctx); +static bool check_tuple_visibility(HeapCheckContext *ctx, + bool *xmin_commit_status_ok, + XidCommitStatus *xmin_commit_status); static void report_corruption(HeapCheckContext *ctx, char *msg); static void report_toast_corruption(HeapCheckContext *ctx, @@ -399,9 +403,16 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber]; + OffsetNumber successor[MaxOffsetNumber]; + bool 
lp_valid[MaxOffsetNumber]; + bool xmin_commit_status_ok[MaxOffsetNumber]; + XidCommitStatus xmin_commit_status[MaxOffsetNumber]; CHECK_FOR_INTERRUPTS(); + memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber); + /* Optionally skip over all-frozen or all-visible blocks */ if (skip_option != SKIP_PAGES_NONE) { @@ -433,6 +444,12 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + BlockNumber nextblkno; + OffsetNumber nextoffnum; + + successor[ctx.offnum] = InvalidOffsetNumber; + lp_valid[ctx.offnum] = false; + xmin_commit_status_ok[ctx.offnum] = false; ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +/* + * Record the fact that this line pointer has passed basic + * sanity checking, and also the offset number to which it + * points. + */ +lp_valid[ctx.offnum] = true; +successor[ctx.offnum] = rdoffnum; continue; } @@ -502,11 +527,237 @@ verify_heapam(PG_FUNCTION_ARGS) } /* It should be safe to examine the tuple's header, at least */ + lp_valid[ctx.offnum] = true; ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); /* Ok, ready to check this next tuple */ - check_tuple(&ctx); + check_tuple(&ctx, + &xmin_commit_status_ok[ctx.offnum], + &xmin_commit_status[ctx.offnum]); + + /* + * If the CTID field of this tuple seems to point to another tuple + * on the same page, record that tuple as the successor of this + * one. + */ + nextblkno = ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid); + nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); + if (nextblkno == ctx.blkno && nextoffnum != ctx.offnum) +successor[ctx.offnum] = nextoffnum; + } + + /* + * Update chain validation. 
Check each line pointer that's got a valid + * successor against that successor. + */ + ctx.attnum = -1; + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; + ctx.offnum = OffsetNumberNext(ctx.offnum)) + { + ItemId curr_lp; + ItemId next_lp; + HeapTupleHeader curr_htup; + HeapTupleHeader next_htup; + TransactionId curr_xmin; + TransactionId curr_xmax; + TransactionId next_xmin; + OffsetNumber nextoffnum = successor[ctx.offnum]; + + /* + * The current line pointer may not have a successor, either + * because it's not valid or because it didn't point to anything. + * I
Re: HOT chain validation in verify_heapam()
Hi, > On Wed, Mar 8, 2023 at 7:30 PM Himanshu Upadhyaya > wrote: > Please find the v11 patch with all these changes. The patch needed a rebase due to a4f23f9b. PFA v12. -- Best regards, Aleksander Alekseev v12-0001-Implement-HOT-chain-validation-in-verify_heapam.patch Description: Binary data
Re: HOT chain validation in verify_heapam()
On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev wrote: > The patch needed a rebase due to a4f23f9b. PFA v12. I have committed this after tidying up a bunch of things in the test case file that I found too difficult to understand -- or in some cases just incorrect, like: elsif ($offnum == 35) { -# set xmax to invalid transaction id. $tup->{t_xmin} = $in_progress_xid; $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED; push @expected, -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Robert Haas writes: > I have committed this after tidying up a bunch of things in the test > case file that I found too difficult to understand -- or in some cases > just incorrect, like: My animal mamba doesn't like this one bit. I suspect the reason is that it's big-endian (PPC) and the endianness hacking in the test is simply wrong: syswrite($file, pack("L", $ENDIANNESS eq 'little' ? 0x00010019 : 0x19000100)) or BAIL_OUT("syswrite failed: $!"); pack's L code should already be performing an endianness swap, so why are we doing another one in the argument? regards, tom lane
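[Editorial note: the point Tom makes hinges on the fact that Perl's pack("L") already writes a 32-bit value in the machine's native byte order. A value written and read back natively round-trips on either architecture, so hand-swapping the constant amounts to a double swap on big-endian machines. A minimal C sketch of this (only the 0x00010019 constant comes from the test; the function names are made up for illustration):]

```c
#include <stdint.h>
#include <string.h>

/*
 * Write a 32-bit value in native byte order (as Perl's pack("L") does)
 * and read it back the same way.  This round-trips on any architecture,
 * which is why no manual endianness swap of the constant is needed.
 */
uint32_t
native_roundtrip(uint32_t value)
{
	unsigned char buf[4];
	uint32_t	readback;

	memcpy(buf, &value, sizeof(value));			/* models the syswrite */
	memcpy(&readback, buf, sizeof(readback));	/* models reading the page */
	return readback;
}

/*
 * Only the byte-wise layout differs between architectures: the first
 * byte of the buffer is 0x19 on little-endian and 0x00 on big-endian,
 * but the value read back natively is the same either way.
 */
int
first_byte(uint32_t value)
{
	unsigned char buf[4];

	memcpy(buf, &value, sizeof(value));
	return buf[0];
}
```

So using 0x00010019 unconditionally in the test produces the intended line pointer on both endiannesses.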
Re: HOT chain validation in verify_heapam()
Hi,

On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev wrote:
> > The patch needed a rebase due to a4f23f9b. PFA v12.
>
> I have committed this after tidying up a bunch of things in the test
> case file that I found too difficult to understand -- or in some cases
> just incorrect, like:

As noticed by Andrew https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and then reproduced on HEAD by me, there's something not right here.

At the very least there's missing verification that tids actually exist in the "Update chain validation" loop, leading to:

TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: 355, PID: 645093

Which makes sense - the earlier loop adds t_ctid to the successor array, which we then query without checking if there still is such a tid on the page. I suspect we don't just need a !ItemIdIsUsed(), but also a check against the max offset on the page.

WRT these failures:

non-heap-only update produced a heap-only tuple at offset 20

I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s definition:

/*
 * Note that we stop considering a tuple HOT-updated as soon as it is known
 * aborted or the would-be updating transaction is known aborted. For best
 * efficiency, check tuple visibility before using this macro, so that the
 * INVALID bits will be as up to date as possible.
 */
#define HeapTupleHeaderIsHotUpdated(tup) \
( \
	((tup)->t_infomask2 & HEAP_HOT_UPDATED) != 0 && \
	((tup)->t_infomask & HEAP_XMAX_INVALID) == 0 && \
	!HeapTupleHeaderXminInvalid(tup) \
)

Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set and expects to find an item it can dereference - but I don't think that's something we can rely on: Afaics HOT pruning can break chains, but doesn't reset xmax.

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 1:45 PM Andres Freund wrote: > At the very least there's missing verification that tids actually exists in > the > "Update chain validation" loop, leading to: > TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: > "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: > 355, PID: 645093 > > Which makes sense - the earlier loop adds t_ctid to the successor array, which > we then query without checking if there still is such a tid on the page. > > I suspect we don't just need a !ItemIdIsUsed(), but also a check gainst the > max offset on the page. We definitely need to do it that way, since a heap-only tuple's t_ctid is allowed to point to almost anything. I guess it can't point to some completely different heap block, but that's about the only restriction. In particular, it can point to an item that's past the end of the page following line pointer array truncation (truncation can happen during pruning or when the second heap pass takes place in VACUUM). OTOH the rules for LP_REDIRECT items *are* very strict. They need to be, since it's the root item of the HOT chain, referenced by TIDs in indexes, and have no heap tuple header metadata to use in cross-checks that take place during HOT chain traversal (traversal by code in places such as heap_hot_search_buffer). > WRT these failures: > non-heap-only update produced a heap-only tuple at offset 20 > > I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s > definition: That has to be a problem for verify_heapam. > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set > and expects to find an item it can dereference - but I don't think that's > something we can rely on: Afaics HOT pruning can break chains, but doesn't > reset xmax. I think that we need two passes to be completely thorough. 
An initial pass, that works pretty much as-is, plus a second pass that locates any orphaned heap-only tuples -- heap-only tuples that were not deemed part of a valid HOT chain during the first pass. These remaining orphaned heap-only tuples should be verified as having come from now-aborted transactions (they should definitely be fully DEAD) -- otherwise we have corruption. That's what my abandoned patch to make heap pruning more robust did, you'll recall. -- Peter Geoghegan
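[Editorial note: the two-pass idea above can be sketched with a toy model. Nothing below is PostgreSQL code -- the types, names, and the ToyXminStatus enum are all made up for illustration; only the rule being checked (an orphaned heap-only tuple must come from an aborted transaction) is from the message.]

```c
#include <stdbool.h>

/* Toy model of what pass 1 would record for each item on a page. */
typedef enum
{
	TOY_XMIN_COMMITTED,
	TOY_XMIN_ABORTED,
	TOY_XMIN_IN_PROGRESS
} ToyXminStatus;

typedef struct
{
	bool		heap_only;		/* HEAP_ONLY_TUPLE set? */
	bool		in_valid_chain; /* reached during the pass-1 chain walk? */
	ToyXminStatus xmin_status;	/* status of the inserting transaction */
} ToyItem;

/*
 * Pass 2: a heap-only tuple that no valid HOT chain reached during
 * pass 1 must have been inserted by a now-aborted transaction;
 * anything else indicates corruption.  Returns the number of corrupt
 * orphans found.
 */
int
count_corrupt_orphans(const ToyItem *items, int nitems)
{
	int			corrupt = 0;

	for (int i = 0; i < nitems; i++)
	{
		if (items[i].heap_only &&
			!items[i].in_valid_chain &&
			items[i].xmin_status != TOY_XMIN_ABORTED)
			corrupt++;
	}
	return corrupt;
}
```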
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 2:14 PM Peter Geoghegan wrote: > > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is > > set > > and expects to find an item it can dereference - but I don't think that's > > something we can rely on: Afaics HOT pruning can break chains, but doesn't > > reset xmax. > > I think that we need two passes to be completely thorough. An initial > pass, that works pretty much as-is, plus a second pass that locates > any orphaned heap-only tuples -- heap-only tuples that were not deemed > part of a valid HOT chain during the first pass. These remaining > orphaned heap-only tuples should be verified as having come from > now-aborted transactions (they should definitely be fully DEAD) -- > otherwise we have corruption. I see that there is a second pass over the heap page in verify_heapam(), in fact. Kind of. I'm referring to this loop: /* * An update chain can start either with a non-heap-only tuple or with * a redirect line pointer, but not with a heap-only tuple. * * (This check is in a separate loop because we need the predecessor * array to be fully populated before we can perform it.) 
*/ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { if (xmin_commit_status_ok[ctx.offnum] && (xmin_commit_status[ctx.offnum] == XID_COMMITTED || xmin_commit_status[ctx.offnum] == XID_IN_PROGRESS) && predecessor[ctx.offnum] == InvalidOffsetNumber) { ItemId curr_lp; curr_lp = PageGetItemId(ctx.page, ctx.offnum); if (!ItemIdIsRedirected(curr_lp)) { HeapTupleHeader curr_htup; curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp); if (HeapTupleHeaderIsHeapOnly(curr_htup)) report_corruption(&ctx, psprintf("tuple is root of chain but is marked as heap-only tuple")); } } } However, this "second pass over page" loop has roughly the same problem as the nearby HeapTupleHeaderIsHotUpdated() coding pattern: it doesn't account for the fact that a tuple whose xmin was XID_IN_PROGRESS a little earlier on may not be in that state once we reach the second pass loop. Concurrent transaction abort needs to be accounted for. The loop needs to recheck xmin status, at least in the initially-XID_IN_PROGRESS-xmin case. -- Peter Geoghegan
Re: HOT chain validation in verify_heapam()
Hi, On 2023-03-22 13:45:52 -0700, Andres Freund wrote: > On 2023-03-22 09:19:18 -0400, Robert Haas wrote: > > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev > > wrote: > > > The patch needed a rebase due to a4f23f9b. PFA v12. > > > > I have committed this after tidying up a bunch of things in the test > > case file that I found too difficult to understand -- or in some cases > > just incorrect, like: > > As noticed by Andrew > https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and > then reproduced on HEAD by me, there's something not right here. > > At the very least there's missing verification that tids actually exists in > the > "Update chain validation" loop, leading to: > TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: > "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: > 355, PID: 645093 > > Which makes sense - the earlier loop adds t_ctid to the successor array, which > we then query without checking if there still is such a tid on the page. It's not quite so simple - I see now that the lp_valid check tries to prevent that. However, it's not sufficient, because there is no guarantee that lp_valid[nextoffnum] is initialized. Consider what happens if t_ctid of a heap tuple points to beyond the end of the item array - lp_valid[nextoffnum] won't be initialized. Why are redirections now checked in two places? There already was a ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's the ItemIdIsRedirected() check in the "Update chain validation." loop as well - and the output of that is confusing, because it'll just mention the target of the redirect. I also think it's not quite right that some of checks inside if (ItemIdIsRedirected()) continue in case of corruption, others don't. While there's a later continue, that means the corrupt tuples get added to the predecessor array. 
Similarly, in the non-redirect portion, the successor array gets filled with corrupt tuples, which doesn't seem quite right to me. A patch addressing some, but not all, of those is attached. With that I don't see any crashes or false-positives anymore. Greetings, Andres Freund diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c index 663fb65dee6..43f93f7784a 100644 --- i/contrib/amcheck/verify_heapam.c +++ w/contrib/amcheck/verify_heapam.c @@ -486,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); +else if (ItemIdIsDead(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to dead item at offset %u", + (unsigned) rdoffnum)); +else if (ItemIdIsRedirected(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to another redirect at offset %u", + (unsigned) rdoffnum)); /* * Record the fact that this line pointer has passed basic @@ -564,16 +572,19 @@ verify_heapam(PG_FUNCTION_ARGS) TransactionId next_xmin; OffsetNumber nextoffnum = successor[ctx.offnum]; + /* the current line pointer may not have a successor */ + if (nextoffnum == InvalidOffsetNumber) +continue; + /* - * The current line pointer may not have a successor, either - * because it's not valid or because it didn't point to anything. - * In either case, we have to give up. - * - * If the current line pointer does point to something, it's - * possible that the target line pointer isn't valid. We have to - * give up in that case, too. + * The successor is located beyond the end of the line item array, + * which can happen when the array is truncated. */ - if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum]) + if (nextoffnum > maxoff) +continue; + + /* the successor is not valid, have to give up */ + if (!lp_valid[nextoffnum]) continue; /* We have two valid line pointers that we can examine. 
*/ @@ -583,14 +594,8 @@ verify_heapam(PG_FUNCTION_ARGS) /* Handle the cases where the current line pointer is a redirect. */ if (ItemIdIsRedirected(curr_lp)) { -/* Can't redirect to another redirect. */ -if (ItemIdIsRedirected(next_lp)) -{ - report_corruption(&ctx, - psprintf("redirected line pointer points to another redirected line pointer at offset %u", - (unsigned) nextoffnum)); - continue; -} +/* should have filtered out all the other cases */ +Assert(ItemIdIsNormal(next_lp)); /* Can only redirect to a HOT tuple. */ next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp); @@ -602,8 +607,12 @@ verify_heapam(PG_FUNCTION_ARGS) } /* - * Redirects are created by updates, so successor should be - * the result of an update. + * Redirects are created by HOT updates, so successor should + * be the result of an HOT update. +
Re: HOT chain validation in verify_heapam()
Hi, On 2023-03-22 14:56:22 -0700, Andres Freund wrote: > A patch addressing some, but not all, of those is attached. With that I don't > see any crashes or false-positives anymore. That patch missed that, as committed, the first if (ItemIdIsRedirected()) check sets lp_valid[n] = true even if the target of the redirect is unused. With that fixed, 004_verify_heapam doesn't cause crash anymore - it doesn't pass though, because there's a bunch of unadjusted error messages. Andres diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c index 663fb65dee6..d0c462e28ea 100644 --- i/contrib/amcheck/verify_heapam.c +++ w/contrib/amcheck/verify_heapam.c @@ -486,14 +486,24 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); - -/* - * Record the fact that this line pointer has passed basic - * sanity checking, and also the offset number to which it - * points. - */ -lp_valid[ctx.offnum] = true; -successor[ctx.offnum] = rdoffnum; +else if (ItemIdIsDead(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to dead item at offset %u", + (unsigned) rdoffnum)); +else if (ItemIdIsRedirected(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to another redirect at offset %u", + (unsigned) rdoffnum)); +else +{ + /* + * Record the fact that this line pointer has passed basic + * sanity checking, and also the offset number to which it + * points. + */ + lp_valid[ctx.offnum] = true; + successor[ctx.offnum] = rdoffnum; +} continue; } @@ -564,16 +574,19 @@ verify_heapam(PG_FUNCTION_ARGS) TransactionId next_xmin; OffsetNumber nextoffnum = successor[ctx.offnum]; + /* the current line pointer may not have a successor */ + if (nextoffnum == InvalidOffsetNumber) +continue; + /* - * The current line pointer may not have a successor, either - * because it's not valid or because it didn't point to anything. 
- * In either case, we have to give up. - * - * If the current line pointer does point to something, it's - * possible that the target line pointer isn't valid. We have to - * give up in that case, too. + * The successor is located beyond the end of the line item array, + * which can happen when the array is truncated. */ - if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum]) + if (nextoffnum > maxoff) +continue; + + /* the successor is not valid, have to give up */ + if (!lp_valid[nextoffnum]) continue; /* We have two valid line pointers that we can examine. */ @@ -583,14 +596,8 @@ verify_heapam(PG_FUNCTION_ARGS) /* Handle the cases where the current line pointer is a redirect. */ if (ItemIdIsRedirected(curr_lp)) { -/* Can't redirect to another redirect. */ -if (ItemIdIsRedirected(next_lp)) -{ - report_corruption(&ctx, - psprintf("redirected line pointer points to another redirected line pointer at offset %u", - (unsigned) nextoffnum)); - continue; -} +/* should have filtered out all the other cases */ +Assert(ItemIdIsNormal(next_lp)); /* Can only redirect to a HOT tuple. */ next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp); @@ -602,8 +609,12 @@ verify_heapam(PG_FUNCTION_ARGS) } /* - * Redirects are created by updates, so successor should be - * the result of an update. + * Redirects are created by HOT updates, so successor should + * be the result of an HOT update. + * + * XXX: HeapTupleHeaderIsHeapOnly() should always imply + * HEAP_UPDATED. This should be checked even when the tuple + * isn't a target of a redirect. */ if ((next_htup->t_infomask & HEAP_UPDATED) == 0) { @@ -665,15 +676,18 @@ verify_heapam(PG_FUNCTION_ARGS) * tuple should be marked as a heap-only tuple. Conversely, if the * current tuple isn't marked as HOT-updated, then the next tuple * shouldn't be marked as a heap-only tuple. + * + * NB: Can't use HeapTupleHeaderIsHotUpdated() as it checks if + * hint bits indicate xmin/xmax aborted. 
*/ - if (!HeapTupleHeaderIsHotUpdated(curr_htup) && + if (!(curr_htup->t_infomask2 & HEAP_HOT_UPDATED) && HeapTupleHeaderIsHeapOnly(next_htup)) { report_corruption(&ctx, psprintf("non-heap-only update produced a heap-only tuple at offset %u", (unsigned) nextoffnum)); } - if (HeapTupleHeaderIsHotUpdated(curr_htup) && + if ((curr_htup->t_infomask2 & HEAP_HOT_UPDATED) && !HeapTupleHeaderIsHeapOnly(next_htup)) { report_corruption(&ctx,
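[Editorial note: the divergence the patch above addresses -- the hint-bit-sensitive HeapTupleHeaderIsHotUpdated() versus the raw infomask2 test -- can be shown with a toy header. The two flag values are copied from htup_details.h; the struct and function names are made up for illustration, and the xmin check in the real macro is omitted for brevity.]

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_XMAX_INVALID	0x0800	/* t_infomask: xmax invalid/aborted */
#define HEAP_HOT_UPDATED	0x4000	/* t_infomask2: tuple was HOT-updated */

typedef struct
{
	uint16_t	t_infomask;
	uint16_t	t_infomask2;
} ToyHeader;

/* Hint-bit-sensitive test, in the spirit of HeapTupleHeaderIsHotUpdated(). */
bool
hot_updated_with_hints(const ToyHeader *t)
{
	return (t->t_infomask2 & HEAP_HOT_UPDATED) != 0 &&
		(t->t_infomask & HEAP_XMAX_INVALID) == 0;
}

/* Raw-bit test, as in the proposed fix. */
bool
hot_updated_raw(const ToyHeader *t)
{
	return (t->t_infomask2 & HEAP_HOT_UPDATED) != 0;
}
```

For a tuple whose would-be updater aborted, HEAP_XMAX_INVALID gets set while HEAP_HOT_UPDATED remains, so the two tests disagree -- which is why the verifier wants the raw bit.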
Re: HOT chain validation in verify_heapam()
Hi, On 2023-03-22 13:45:52 -0700, Andres Freund wrote: > On 2023-03-22 09:19:18 -0400, Robert Haas wrote: > > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev > > wrote: > > > The patch needed a rebase due to a4f23f9b. PFA v12. > > > > I have committed this after tidying up a bunch of things in the test > > case file that I found too difficult to understand -- or in some cases > > just incorrect, like: > > As noticed by Andrew > https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and > then reproduced on HEAD by me, there's something not right here. skink / valgrind reported in a while back and found another issue: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-03-22%2021%3A53%3A41 ==2490364== VALGRINDERROR-BEGIN ==2490364== Conditional jump or move depends on uninitialised value(s) ==2490364==at 0x11D459F2: check_tuple_visibility (verify_heapam.c:1379) ==2490364==by 0x11D46262: check_tuple (verify_heapam.c:1812) ==2490364==by 0x11D46FDF: verify_heapam (verify_heapam.c:535) ==2490364==by 0x3D5B2C: ExecMakeTableFunctionResult (execSRF.c:235) ==2490364==by 0x3E8225: FunctionNext (nodeFunctionscan.c:95) ==2490364==by 0x3D6685: ExecScanFetch (execScan.c:133) ==2490364==by 0x3D6709: ExecScan (execScan.c:182) ==2490364==by 0x3E813A: ExecFunctionScan (nodeFunctionscan.c:270) ==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464) ==2490364==by 0x3FF7E7: ExecProcNode (executor.h:262) ==2490364==by 0x3FFB15: ExecNestLoop (nodeNestloop.c:160) ==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464) ==2490364== Uninitialised value was created by a stack allocation ==2490364==at 0x11D45325: check_tuple_visibility (verify_heapam.c:994) ==2490364== ==2490364== VALGRINDERROR-END ==2490364== VALGRINDERROR-BEGIN ==2490364== Conditional jump or move depends on uninitialised value(s) ==2490364==at 0x11D45AC6: check_tuple_visibility (verify_heapam.c:1379) ==2490364==by 0x11D46262: check_tuple (verify_heapam.c:1812) 
==2490364==by 0x11D46FDF: verify_heapam (verify_heapam.c:535) ==2490364==by 0x3D5B2C: ExecMakeTableFunctionResult (execSRF.c:235) ==2490364==by 0x3E8225: FunctionNext (nodeFunctionscan.c:95) ==2490364==by 0x3D6685: ExecScanFetch (execScan.c:133) ==2490364==by 0x3D6709: ExecScan (execScan.c:182) ==2490364==by 0x3E813A: ExecFunctionScan (nodeFunctionscan.c:270) ==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464) ==2490364==by 0x3FF7E7: ExecProcNode (executor.h:262) ==2490364==by 0x3FFB15: ExecNestLoop (nodeNestloop.c:160) ==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464) ==2490364== Uninitialised value was created by a stack allocation ==2490364==at 0x11D45325: check_tuple_visibility (verify_heapam.c:994) ==2490364== Greetings, Andres Freund
Re: HOT chain validation in verify_heapam()
On Thu, Mar 23, 2023 at 2:15 AM Andres Freund wrote:
> Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
> and expects to find an item it can dereference - but I don't think that's
> something we can rely on: Afaics HOT pruning can break chains, but doesn't
> reset xmax.

We have the below code which, I think, takes care of matching xmin and xmax; only if they match do we add them to the predecessor array.

/*
 * If the next line pointer is a redirect, or if it's a tuple
 * but the XMAX of this tuple doesn't match the XMIN of the next
 * tuple, then the two aren't part of the same update chain and
 * there is nothing more to do.
 */
if (ItemIdIsRedirected(next_lp))
	continue;
curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup);
next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
next_xmin = HeapTupleHeaderGetXmin(next_htup);
if (!TransactionIdIsValid(curr_xmax) ||
	!TransactionIdEquals(curr_xmax, next_xmin))
	continue;

-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com
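[Editorial note: the quoted condition reduces to a small predicate. This is a toy restatement for illustration -- the xid type and names below model, but are not, PostgreSQL's TransactionIdIsValid/TransactionIdEquals.]

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyTransactionId;
#define TOY_INVALID_XID ((ToyTransactionId) 0)	/* models InvalidTransactionId */

/*
 * Treat two tuples as members of the same update chain only when the
 * earlier tuple's xmax is a valid xid equal to the later tuple's xmin.
 */
bool
same_update_chain(ToyTransactionId curr_xmax, ToyTransactionId next_xmin)
{
	return curr_xmax != TOY_INVALID_XID && curr_xmax == next_xmin;
}
```

An invalid (reset) xmax or a mismatched xid both mean the pair is not linked, so the predecessor array is only populated for matching pairs.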
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 3:27 PM Tom Lane wrote: > My animal mamba doesn't like this one bit. > > I suspect the reason is that it's big-endian (PPC) and the endianness > hacking in the test is simply wrong: > > syswrite($file, > pack("L", $ENDIANNESS eq 'little' ? 0x00010019 : 0x19000100)) > or BAIL_OUT("syswrite failed: $!"); > > pack's L code should already be performing an endianness swap, so why > are we doing another one in the argument? (Apologies for having missed responding to this yesterday afternoon.) Hmph. I didn't think very hard about that code and just assumed Himanshu had tested it. I don't have convenient access to a Big-endian test machine myself. Are you able to check whether using 0x00010019 unconditionally works? I think part of the reason that I thought this looked OK was because there are other places in this test case that do stuff differently based on endian-ness, but now that you mention it, I realize that stuff has to do with the varlena representation, which is endian-dependent in a way that the ItemIdData representation is not. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 4:45 PM Andres Freund wrote:
> At the very least there's missing verification that tids actually exists in the
> "Update chain validation" loop, leading to:
> TRAP: failed Assert("ItemIdHasStorage(itemId)"), File:
> "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line:
> 355, PID: 645093
>
> Which makes sense - the earlier loop adds t_ctid to the successor array, which
> we then query without checking if there still is such a tid on the page.

Ah, crap. If the /* Perform tuple checks */ loop finds a redirect line pointer, it verifies that the target is between FirstOffsetNumber and maxoff before setting lp_valid[ctx.offnum] = true. But in the case where it's a CTID link, the equivalent checks are missing. We could fix that like this:

--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -543,7 +543,8 @@ verify_heapam(PG_FUNCTION_ARGS)
 			 */
 			nextblkno = ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid);
 			nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid);
-			if (nextblkno == ctx.blkno && nextoffnum != ctx.offnum)
+			if (nextblkno == ctx.blkno && nextoffnum != ctx.offnum &&
+				nextoffnum >= FirstOffsetNumber && nextoffnum <= maxoff)
 				successor[ctx.offnum] = nextoffnum;
 		}

> I suspect we don't just need a !ItemIdIsUsed(), but also a check gainst the
> max offset on the page.

I don't see why we need an ItemIdIsUsed check any place where we don't have one already. lp_valid[x] can't be true if the item x isn't used, unless we're referencing off the initialized portion of the array, which we shouldn't do.

> WRT these failures:
> non-heap-only update produced a heap-only tuple at offset 20
>
> I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s
> definition:
> /*
>  * Note that we stop considering a tuple HOT-updated as soon as it is known
>  * aborted or the would-be updating transaction is known aborted. For best
>  * efficiency, check tuple visibility before using this macro, so that the
>  * INVALID bits will be as up to date as possible.
>  */
> #define HeapTupleHeaderIsHotUpdated(tup) \
> ( \
> 	((tup)->t_infomask2 & HEAP_HOT_UPDATED) != 0 && \
> 	((tup)->t_infomask & HEAP_XMAX_INVALID) == 0 && \
> 	!HeapTupleHeaderXminInvalid(tup) \
> )

Yeah, it's not good that we're looking at the hint bit or the xmin there -- it should just be checking the infomask2 bit and nothing else, I think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 5:56 PM Andres Freund wrote:
> Why are redirections now checked in two places? There already was a
> ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's
> the ItemIdIsRedirected() check in the "Update chain validation." loop as well
> - and the output of that is confusing, because it'll just mention the target
> of the redirect.

ctx.offnum is reported as part of the context message, and doesn't need to be duplicated in the message itself. Any other offset numbers do need to be mentioned in the message itself. I have attempted to make sure that the code consistently follows this rule, but if you see a case where I have failed to do so, let me know.

> I also think it's not quite right that some of checks inside if
> (ItemIdIsRedirected()) continue in case of corruption, others don't. While
> there's a later continue, that means the corrupt tuples get added to the
> predecessor array. Similarly, in the non-redirect portion, the successor
> array gets filled with corrupt tuples, which doesn't seem quite right to me.

I'm not entirely sure I agree with this. I mean, we should skip further checks if the data is so corrupt that further checks are not sensible, e.g. if the line pointer is just gibberish we can't validate anything about the tuple, because we can't even find the tuple. However, we don't want to skip further checks as soon as we see any kind of a problem whatsoever. For instance, even if the tuple data is utter gibberish, that should not and does not keep us from checking whether the update chain looks sane. If the tuple header is garbage (e.g. xmin and xmax are outside the bounds of clog) then at least some of the update-chain checks are not possible, because we can't know the commit status of the tuples, but garbage in the tuple data isn't really a problem for update chain validation per se.
So I think the places that lack a continue are making a judgement that continuing with further checks is reasonable in those scenarios, and it's not obvious to me that those judgements are wrong. For instance, suppose we reach the "Can only redirect to a HOT tuple." case. If we continue there, we won't check whether the HEAP_UPDATED is set on the successor tuple, and we won't perform the intersecting-HOT-chain check, and we won't set the predecessor[] array, and I don't really see why we shouldn't do those things. In my view, the fact that the redirected line pointer points to a tuple that is not marked as HOT does mean we have corruption, but it doesn't mean that we should give up on thinking of those two offset numbers as part of an update chain. > A patch addressing some, but not all, of those is attached. With that I don't > see any crashes or false-positives anymore. + else if (ItemIdIsDead(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to dead item at offset %u", + (unsigned) rdoffnum)); + else if (ItemIdIsRedirected(rditem)) + report_corruption(&ctx, + psprintf("line pointer redirection to another redirect at offset %u", + (unsigned) rdoffnum)); Hmm, the first of these definitely seems like a good additional check. The second one moves an existing check from the subsequent loop to this loop, which if we're going to add that additional check makes sense to do for consistency. +/* the current line pointer may not have a successor */ +if (nextoffnum == InvalidOffsetNumber) +continue; + /* - * The current line pointer may not have a successor, either - * because it's not valid or because it didn't point to anything. - * In either case, we have to give up. - * - * If the current line pointer does point to something, it's - * possible that the target line pointer isn't valid. We have to - * give up in that case, too. + * The successor is located beyond the end of the line item array, + * which can happen when the array is truncated. 
*/ -if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum]) +if (nextoffnum > maxoff) +continue; + +/* the successor is not valid, have to give up */ +if (!lp_valid[nextoffnum]) continue; I don't agree with this part -- I think we should instead do what I proposed before and prevent the successor[] array from ever getting entries that are out of bounds. - * Redirects are created by updates, so successor should be - * the result of an update. + * Redirects are created by HOT updates, so successor should + * be the result of an HOT update. + * + * XXX: HeapTupleHeaderIs
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 5:42 PM Peter Geoghegan wrote: > However, this "second pass over page" loop has roughly the same > problem as the nearby HeapTupleHeaderIsHotUpdated() coding pattern: it > doesn't account for the fact that a tuple whose xmin was > XID_IN_PROGRESS a little earlier on may not be in that state once we > reach the second pass loop. Concurrent transaction abort needs to be > accounted for. The loop needs to recheck xmin status, at least in the > initially-XID_IN_PROGRESS-xmin case. I don't understand why it would need to do that. If the transaction has subsequently committed, it doesn't change anything: we'll get the same report we would have gotten anyway. If the transaction has subsequently aborted, we'll get a report about corruption that would not have been reported if the abort had occurred slightly earlier. However, the abort doesn't remove the corruption, just our ability to detect it. Consider a page where TID 1 is a redirect to TID 4; TID 2 is dead; and TIDs 3 and 4 are heap-only tuples. Any other line pointers on the page are unused. The only way this can validly happen is if there was a tuple at TID 2 and it got updated to produce the tuple at TID 3 and then that transaction aborted. Then it got updated again and produced the tuple at TID 4 and that transaction was committed. But this implies that the xmin of TID 3 must be aborted. If we observe that it's in-progress, we know that the transaction that created TID 3 was still running after TID 4 had already shown up, which should be impossible, and so it's fair to report corruption. If the xmin of TID 3 then goes on to abort, a future attempt to verify this page won't be able to notice the corruption any more, because it won't be able to prove that TID 3's xmin aborted after TID 4's xmin committed. 
But a current attempt to verify this page that has seen TID 3's xmin as in-progress at any point after locking the page knows for sure that TID 4 showed up before TID 3's inserter aborted, and that's inconsistent with any legal order of operations. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Thu, Mar 23, 2023 at 9:42 AM Robert Haas wrote: > Hmph. I didn't think very hard about that code and just assumed > Himanshu had tested it. I don't have convenient access to a Big-endian > test machine myself. Are you able to check whether using 0x00010019 > unconditionally works? Oh, I see now you already pushed a fix. Thanks. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi,

On 2023-03-23 11:41:52 -0400, Robert Haas wrote:
> On Wed, Mar 22, 2023 at 5:56 PM Andres Freund wrote:
> > I also think it's not quite right that some of the checks inside if
> > (ItemIdIsRedirected()) continue in case of corruption, others don't. While
> > there's a later continue, that means the corrupt tuples get added to the
> > predecessor array. Similarly, in the non-redirect portion, the successor
> > array gets filled with corrupt tuples, which doesn't seem quite right to me.
>
> I'm not entirely sure I agree with this. I mean, we should skip
> further checks if the data is so corrupt that further checks are not
> sensible, e.g. if the line pointer is just gibberish we can't validate
> anything about the tuple, because we can't even find the tuple.
> However, we don't want to skip further checks as soon as we see any
> kind of a problem whatsoever. For instance, even if the tuple data is
> utter gibberish, that should not and does not keep us from checking
> whether the update chain looks sane. If the tuple header is garbage
> (e.g. xmin and xmax are outside the bounds of clog) then at least some
> of the update-chain checks are not possible, because we can't know the
> commit status of the tuples, but garbage in the tuple data isn't
> really a problem for update chain validation per se.

The cases I was complaining about were ones where important metadata is
involved. We shouldn't enter a tuple into lp_valid[] or successor[] if it
failed validity checks - the subsequent error reports that that generates
won't be helpful - or we'll crash. E.g. continuing after:

rditem = PageGetItemId(ctx.page, rdoffnum);
if (!ItemIdIsUsed(rditem))
    report_corruption(&ctx,
                      psprintf("line pointer redirection to unused item at offset %u",
                               (unsigned) rdoffnum));

means we'll look into the tuple in the "update chain validation" loop for
unused items. Where it afaict will lead to a crash or bogus results, because:

/* Can only redirect to a HOT tuple.
*/
next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
if (!HeapTupleHeaderIsHeapOnly(next_htup))
{
    report_corruption(&ctx,
                      psprintf("redirected line pointer points to a non-heap-only tuple at offset %u",
                               (unsigned) nextoffnum));
}

will just dereference the unused item.

> - * Redirects are created by updates, so successor should be
> - * the result of an update.
> + * Redirects are created by HOT updates, so successor should
> + * be the result of an HOT update.
> + *
> + * XXX: HeapTupleHeaderIsHeapOnly() should always imply
> + * HEAP_UPDATED. This should be checked even when the tuple
> + * isn't a target of a redirect.
>
> Hmm, OK. So the question is where to put this check. Maybe inside
> check_tuple_header(), making it independent of the update chain
> validation stuff?

Yes, check_tuple_header sounds sensible to me.

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Thu, Mar 23, 2023 at 1:26 PM Andres Freund wrote: > E.g. continuing after: > > rditem = PageGetItemId(ctx.page, rdoffnum); > if (!ItemIdIsUsed(rditem)) > report_corruption(&ctx, > > psprintf("line pointer redirection to unused item at offset %u", > > (unsigned) rdoffnum)); > > means we'll look into the tuple in the "update chain validation" loop for > unused items. Ah, yes, that's a goof for sure. > > - * Redirects are created by updates, so successor should be > > - * the result of an update. > > + * Redirects are created by HOT updates, so successor > > should > > + * be the result of an HOT update. > > + * > > + * XXX: HeapTupleHeaderIsHeapOnly() should always imply > > + * HEAP_UPDATED. This should be checked even when the tuple > > + * isn't a target of a redirect. > > > > Hmm, OK. So the question is where to put this check. Maybe inside > > check_tuple_header(), making it independent of the update chain > > validation stuff? > > Yes, check_tuple_header sounds sensible to me. OK, let me spend some more time on this and I'll post a patch (or patches) in a bit. -- Robert Haas EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
Hi,

On 2023-03-23 11:20:04 +0530, Himanshu Upadhyaya wrote:
> On Thu, Mar 23, 2023 at 2:15 AM Andres Freund wrote:
> >
> > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
> > and expects to find an item it can dereference - but I don't think that's
> > something we can rely on: Afaics HOT pruning can break chains, but doesn't
> > reset xmax.
>
> We have below code which I think takes care of xmin and xmax matching and
> if they match then only we add them to the predecessor array.
>
>     /*
>      * If the next line pointer is a redirect, or if it's a tuple
>      * but the XMAX of this tuple doesn't match the XMIN of the next
>      * tuple, then the two aren't part of the same update chain and
>      * there is nothing more to do.
>      */
>     if (ItemIdIsRedirected(next_lp))
>         continue;
>     curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
>     curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup);
>     next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
>     next_xmin = HeapTupleHeaderGetXmin(next_htup);
>     if (!TransactionIdIsValid(curr_xmax) ||
>         !TransactionIdEquals(curr_xmax, next_xmin))
>         continue;

The problem is that that doesn't help if the tuple points past maxoff,
because we can't even fetch the tuple and thus won't even reach these
checks. But Robert now put in defenses against that.

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Thu, Mar 23, 2023 at 1:34 PM Robert Haas wrote: > OK, let me spend some more time on this and I'll post a patch (or > patches) in a bit. All right, here are some more fixups. -- Robert Haas EDB: http://www.enterprisedb.com 0001-amcheck-Tighten-up-validation-of-redirect-line-point.patch Description: Binary data 0002-amcheck-Generalize-one-of-the-recently-added-update-.patch Description: Binary data
Re: HOT chain validation in verify_heapam()
On Wed, Mar 22, 2023 at 8:38 PM Andres Freund wrote:
> skink / valgrind reported in a while back and found another issue:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-03-22%2021%3A53%3A41
>
> ==2490364== VALGRINDERROR-BEGIN
> ==2490364== Conditional jump or move depends on uninitialised value(s)
> ==2490364==    at 0x11D459F2: check_tuple_visibility (verify_heapam.c:1379)
...
> ==2490364== Uninitialised value was created by a stack allocation
> ==2490364==    at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)

OK, so this is an interesting one. It's complaining about switch
(xmax_status), because the get_xid_status(xmax, ctx, &xmax_status) used in
the previous switch might not actually initialize xmax_status, and
apparently didn't in this case. get_xid_status() does not set xmax_status
except when it returns XID_BOUNDS_OK, and the previous switch falls through
both in that case and also when get_xid_status() returns XID_INVALID. That
seems like it must be the issue here. As far as I can see, this isn't
related to any of the recent changes but has been like this since this code
was introduced, so I'm a little confused about why it's only causing a
problem now.

Nonetheless, here's a patch. I notice that there's a similar problem in
another place, too. get_xid_status() is called a total of five times and it
looks like only three of them got it right. I suppose that if this is
correct we should back-patch it.

--
Robert Haas
EDB: http://www.enterprisedb.com

0001-amcheck-Fix-handling-when-get_xid_status-returns-XID.patch
Description: Binary data
Re: HOT chain validation in verify_heapam()
Hi,

On 2023-03-23 15:37:15 -0400, Robert Haas wrote:
> On Wed, Mar 22, 2023 at 8:38 PM Andres Freund wrote:
> > skink / valgrind reported in a while back and found another issue:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-03-22%2021%3A53%3A41
> >
> > ==2490364== VALGRINDERROR-BEGIN
> > ==2490364== Conditional jump or move depends on uninitialised value(s)
> > ==2490364==    at 0x11D459F2: check_tuple_visibility (verify_heapam.c:1379)
> ...
> > ==2490364== Uninitialised value was created by a stack allocation
> > ==2490364==    at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)
>
> OK, so this is an interesting one. It's complaining about switch
> (xmax_status), because the get_xid_status(xmax, ctx, &xmax_status)
> used in the previous switch might not actually initialize xmax_status,
> and apparently didn't in this case. get_xid_status() does not set
> xmax_status except when it returns XID_BOUNDS_OK, and the previous
> switch falls through both in that case and also when get_xid_status()
> returns XID_INVALID. That seems like it must be the issue here. As far
> as I can see, this isn't related to any of the recent changes but has
> been like this since this code was introduced, so I'm a little
> confused about why it's only causing a problem now.

Could it be that the tests didn't exercise the path before?

> Nonetheless, here's a patch. I notice that there's a similar problem
> in another place, too. get_xid_status() is called a total of five
> times and it looks like only three of them got it right. I suppose
> that if this is correct we should back-patch it.

Yea, I think you're right.

> +            report_corruption(ctx,
> +                              pstrdup("xmin is invalid"));

Not a correctness issue: nearly all callers of report_corruption() do a
psprintf(), the remaining a pstrdup(), as here. Seems like it'd be cleaner
to just make report_corruption() accept a format string?

Greetings,

Andres Freund
Re: HOT chain validation in verify_heapam()
On Thu, Mar 23, 2023 at 4:36 PM Andres Freund wrote:
> Could it be that the tests didn't exercise the path before?

Hmm, perhaps.

> > Nonetheless, here's a patch. I notice that there's a similar problem
> > in another place, too. get_xid_status() is called a total of five
> > times and it looks like only three of them got it right. I suppose
> > that if this is correct we should back-patch it.
>
> Yea, I think you're right.

> > +            report_corruption(ctx,
> > +                              pstrdup("xmin is invalid"));
>
> Not a correctness issue: nearly all callers of report_corruption() do a
> psprintf(), the remaining a pstrdup(), as here. Seems like it'd be
> cleaner to just make report_corruption() accept a format string?

Meh.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: HOT chain validation in verify_heapam()
On Thu, Mar 23, 2023 at 3:06 PM Robert Haas wrote: > On Thu, Mar 23, 2023 at 1:34 PM Robert Haas wrote: > > OK, let me spend some more time on this and I'll post a patch (or > > patches) in a bit. > > All right, here are some more fixups. It looks like e88754a1965c0f40a723e6e46d670cacda9e19bd made skink happy (although Peter Geoghegan has spotted a problem with it, see the thread that begins with the commit email) so I went ahead and committed these fixups. Hopefully that won't again make the buildfarm unhappy, but I guess we'll see. -- Robert Haas EDB: http://www.enterprisedb.com