On Thu, Nov 10, 2022 at 3:38 AM Andres Freund <and...@anarazel.de> wrote:
> > > + } > > + > > + /* > > + * Loop over offset and populate predecessor array from > all entries > > + * that are present in successor array. > > + */ > > + ctx.attnum = -1; > > + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; > > + ctx.offnum = OffsetNumberNext(ctx.offnum)) > > + { > > + ItemId curr_lp; > > + ItemId next_lp; > > + HeapTupleHeader curr_htup; > > + HeapTupleHeader next_htup; > > + TransactionId curr_xmax; > > + TransactionId next_xmin; > > + > > + OffsetNumber nextoffnum = successor[ctx.offnum]; > > + > > + curr_lp = PageGetItemId(ctx.page, ctx.offnum); > > Why do we get the item when nextoffnum is 0? > > Yes, right, I will move this call to PageGetItemId, just after the next "if" condition in the patch. > > > + if (nextoffnum == 0 || !lp_valid[ctx.offnum] || > !lp_valid[nextoffnum]) > > + { > > + /* > > + * This is either the last updated tuple > in the chain or a > > + * corruption raised for this tuple. > > + */ > > "or a corruption raised" isn't quite right grammatically. > > will change to "This is either the last updated tuple in the chain or corruption has been raised for this tuple" > > > + continue; > > + } > > + if (ItemIdIsRedirected(curr_lp)) > > + { > > + next_lp = PageGetItemId(ctx.page, > nextoffnum); > > + if (ItemIdIsRedirected(next_lp)) > > + { > > + report_corruption(&ctx, > > + > psprintf("redirected line pointer pointing to another redirected line > pointer at offset %u", > > + > (unsigned) nextoffnum)); > > + continue; > > + } > > + next_htup = (HeapTupleHeader) PageGetItem( > ctx.page, next_lp); > > + if (!HeapTupleHeaderIsHeapOnly(next_htup)) > > + { > > + report_corruption(&ctx, > > + > psprintf("redirected tuple at line pointer offset %u is not heap only > tuple", > > + > (unsigned) nextoffnum)); > > + } > > + if ((next_htup->t_infomask & HEAP_UPDATED) > == 0) > > + { > > + report_corruption(&ctx, > > + > psprintf("redirected tuple at line pointer offset %u is not heap updated > tuple", > > + > (unsigned) nextoffnum)); > > + } > > + continue; > > + } > > + > > + /* > > + * Add a line pointer offset to the predecessor > array if xmax is > > + * matching with xmin of next tuple (reaching via > its t_ctid). > > + * Prior to PostgreSQL 9.4, we actually changed > the xmin to > > + * FrozenTransactionId > > I'm doubtful it's a good idea to try to validate the 9.4 case. The > likelihood > of getting that right seems low and I don't see us gaining much by even > trying. > > > > so we must add offset to predecessor > > + * array(irrespective of xmax-xmin matching) if > updated tuple xmin > > + * is frozen, so that we can later do validation > related to frozen > > + * xmin. Raise corruption if we have two tuples > having the same > > + * predecessor. > > + * We add the offset to the predecessor array > irrespective of the > > + * transaction (t_xmin) status. We will do > validation related to > > + * the transaction status (and also all other > validations) when we > > + * loop over the predecessor array. > > + */ > > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, > curr_lp); > > + curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup); > > + next_lp = PageGetItemId(ctx.page, nextoffnum); > > + next_htup = (HeapTupleHeader) PageGetItem(ctx.page, > next_lp); > > + next_xmin = HeapTupleHeaderGetXmin(next_htup); > > + if (TransactionIdIsValid(curr_xmax) && > > + (TransactionIdEquals(curr_xmax, next_xmin) > || > > + next_xmin == FrozenTransactionId)) > > + { > > + if (predecessor[nextoffnum] != 0) > > + { > > + report_corruption(&ctx, > > + > psprintf("updated version at offset %u is also the updated version of > tuple at offset %u", > > + > (unsigned) nextoffnum, (unsigned) predecessor[nextoffnum])); > > + continue; > > I doubt it is correct to enter this path with next_xmin == > FrozenTransactionId. This is following a ctid chain that we normally > wouldn't > follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin > condition. > > I don't immediately see what prevents the frozen tuple being from an > entirely > different HOT chain than the two tuples pointing to it. > > > > Prior to 9.4 we can have xmin updated with FrozenTransactionId but with 9.4 (or later) we set XMIN_FROZEN bit in t_infomask. if updated tuple is via prior of 9.4 then "TransactionIdEquals(curr_xmax, next_xmin)" will be false for Frozen Tuple. The Intention of adding "next_xmin == FrozenTransactionId" to the path is because we wanted to do validation around Frozen Tuple when we loop over predecessor array. I need to look at the isolation test in details to understand how this can provide false alarm and but if there is a valid case then we can remove logic of raising corruption related with Frozen Tuple? > > + } > > + > > + /* Loop over offsets and validate the data in the > predecessor array. */ > > + for (OffsetNumber currentoffnum = FirstOffsetNumber; > currentoffnum <= maxoff; > > + currentoffnum = OffsetNumberNext(currentoffnum)) > > + { > > + HeapTupleHeader pred_htup; > > + HeapTupleHeader curr_htup; > > + TransactionId pred_xmin; > > + TransactionId curr_xmin; > > + ItemId pred_lp; > > + ItemId curr_lp; > > + > > + ctx.offnum = predecessor[currentoffnum]; > > + ctx.attnum = -1; > > + > > + if (ctx.offnum == 0) > > + { > > + /* > > + * Either the root of the chain or an > xmin-aborted tuple from > > + * an abandoned portion of the HOT chain. > > + */ > > Hm - couldn't we check that the tuple could conceivably be at the root of a > chain? I.e. isn't HEAP_HOT_UPDATED? Or alternatively has an aborted xmin? > > I don't see a way to check if tuple is at the root of HOT chain because predecessor array will always be having either xmin from non-abandoned transaction or it will be zero. We can't differentiate root or tuple inserted via abandoned transaction. > > + continue; > > + } > > + > > + curr_lp = PageGetItemId(ctx.page, currentoffnum); > > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, > curr_lp); > > + curr_xmin = HeapTupleHeaderGetXmin(curr_htup); > > + > > + ctx.itemid = pred_lp = PageGetItemId(ctx.page, > ctx.offnum); > > + pred_htup = (HeapTupleHeader) PageGetItem(ctx.page, > pred_lp); > > + pred_xmin = HeapTupleHeaderGetXmin(pred_htup); > > + > > + /* > > + * If the predecessor's xmin is aborted or in > progress, the > > + * current tuples xmin should be aborted or in > progress > > + * respectively. Also both xmin's must be equal. > > + */ > > + if (!TransactionIdEquals(pred_xmin, curr_xmin) && > > + !TransactionIdDidCommit(pred_xmin)) > > + { > > + report_corruption(&ctx, > > + > psprintf("tuple with uncommitted xmin %u was updated to produce a tuple at > offset %u with differing xmin %u", > > + > (unsigned) pred_xmin, (unsigned) currentoffnum, (unsigned) > curr_xmin)); > > Is this necessarily true? What about a tuple that was inserted in a > subtransaction and then updated in another subtransaction of the same > toplevel > transaction? > > not sure if I am getting? I have tried with below test and don't see any issue, ‘postgres[14723]=#’drop table test2; DROP TABLE ‘postgres[14723]=#’create table test2 (a int, b int primary key); CREATE TABLE ‘postgres[14723]=#’insert into test2 values (1,1); INSERT 0 1 ‘postgres[14723]=#’BEGIN; BEGIN ‘postgres[14723]=#*’update test2 set a =2 where a =1; UPDATE 1 ‘postgres[14723]=#*’savepoint s1; SAVEPOINT ‘postgres[14723]=#*’update test2 set a =6; UPDATE 1 ‘postgres[14723]=#*’rollback to savepoint s1; ROLLBACK ‘postgres[14723]=#*’update test2 set a =6; UPDATE 1 ‘postgres[14723]=#*’savepoint s2; SAVEPOINT ‘postgres[14723]=#*’update test2 set a =7; UPDATE 1 ‘postgres[14723]=#*’end; COMMIT ‘postgres[14723]=#’SELECT lp as tuple, t_xmin, t_xmax, t_field3 as t_cid, t_ctid,tuple_data_split('test2'::regclass, t_data, t_infomask, t_infomask2, t_bits), heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('test2', 0)); tuple | t_xmin | t_xmax | t_cid | t_ctid | tuple_data_split | heap_tuple_infomask_flags -------+--------+--------+-------+--------+-------------------------------+--------------------------------------------------------------------------- 1 | 1254 | 1255 | 0 | (0,2) | {"\\x01000000","\\x01000000"} | ("{HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED}",{}) 2 | 1255 | 1257 | 1 | (0,4) | {"\\x02000000","\\x01000000"} | ("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{}) 3 | 1256 | 0 | 1 | (0,3) | {"\\x06000000","\\x01000000"} | ("{HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{}) 4 | 1257 | 1258 | 2 | (0,5) | {"\\x06000000","\\x01000000"} | ("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{}) 5 | 1258 | 0 | 3 | (0,5) | {"\\x07000000","\\x01000000"} | ("{HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{}) (5 rows) > > > + } > > + > > + /* > > + * If the predecessor's xmin is not frozen, then > current tuple's > > + * shouldn't be either. > > + */ > > + if (pred_xmin != FrozenTransactionId && curr_xmin > == FrozenTransactionId) > > + { > > + report_corruption(&ctx, > > + > psprintf("unfrozen tuple was updated to produce a tuple at offset %u which > is frozen", > > + > (unsigned) currentoffnum)); > > + } > > Can't we have a an update chain that is e.g. > xmin 10, xmax 5 -> xmin 5, xmax invalid > > and a vacuum cutoff of 7? That'd preent the first tuple from being removed, > but would allow 5 to be frozen. > > I think there were recent patches proposing we don't freeze in that case, > but > we'll having done that in the past.... > > Not very sure about this, was trying with such case but found hard to reproduce this. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com