On Thu, Nov 10, 2022 at 3:38 AM Andres Freund <and...@anarazel.de> wrote:

>
> > +             }
> > +
> > +             /*
> > +              * Loop over offset and populate predecessor array from
> all entries
> > +              * that are present in successor array.
> > +              */
> > +             ctx.attnum = -1;
> > +             for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
> > +                      ctx.offnum = OffsetNumberNext(ctx.offnum))
> > +             {
> > +                     ItemId          curr_lp;
> > +                     ItemId          next_lp;
> > +                     HeapTupleHeader curr_htup;
> > +                     HeapTupleHeader next_htup;
> > +                     TransactionId curr_xmax;
> > +                     TransactionId next_xmin;
> > +
> > +                     OffsetNumber nextoffnum = successor[ctx.offnum];
> > +
> > +                     curr_lp = PageGetItemId(ctx.page, ctx.offnum);
>
> Why do we get the item when nextoffnum is 0?
>
> Yes, right, I will move this call to PageGetItemId, just after the next
"if" condition in the patch.

>
> > +                     if (nextoffnum == 0 || !lp_valid[ctx.offnum] ||
> !lp_valid[nextoffnum])
> > +                     {
> > +                             /*
> > +                              * This is either the last updated tuple
> in the chain or a
> > +                              * corruption raised for this tuple.
> > +                              */
>
> "or a corruption raised" isn't quite right grammatically.
>
> will change to "This is either the last updated tuple in the chain or
corruption has been raised for this tuple"

>
> > +                             continue;
> > +                     }
> > +                     if (ItemIdIsRedirected(curr_lp))
> > +                     {
> > +                             next_lp = PageGetItemId(ctx.page,
> nextoffnum);
> > +                             if (ItemIdIsRedirected(next_lp))
> > +                             {
> > +                                     report_corruption(&ctx,
> > +
>  psprintf("redirected line pointer pointing to another redirected line
> pointer at offset %u",
> > +
>                 (unsigned) nextoffnum));
> > +                                     continue;
> > +                             }
> > +                             next_htup = (HeapTupleHeader) PageGetItem(
> ctx.page, next_lp);
> > +                             if (!HeapTupleHeaderIsHeapOnly(next_htup))
> > +                             {
> > +                                     report_corruption(&ctx,
> > +
>  psprintf("redirected tuple at line pointer offset %u is not heap only
> tuple",
> > +
>                 (unsigned) nextoffnum));
> > +                             }
> > +                             if ((next_htup->t_infomask & HEAP_UPDATED)
> == 0)
> > +                             {
> > +                                     report_corruption(&ctx,
> > +
>  psprintf("redirected tuple at line pointer offset %u is not heap updated
> tuple",
> > +
>                 (unsigned) nextoffnum));
> > +                             }
> > +                             continue;
> > +                     }
> > +
> > +                     /*
> > +                      * Add a line pointer offset to the predecessor
> array if xmax is
> > +                      * matching with xmin of next tuple (reaching via
> its t_ctid).
> > +                      * Prior to PostgreSQL 9.4, we actually changed
> the xmin to
> > +                      * FrozenTransactionId
>
> I'm doubtful it's a good idea to try to validate the 9.4 case. The
> likelihood
> of getting that right seems low and I don't see us gaining much by even
> trying.
>
>
> > so we must add offset to predecessor
> > +                      * array(irrespective of xmax-xmin matching) if
> updated tuple xmin
> > +                      * is frozen, so that we can later do validation
> related to frozen
> > +                      * xmin. Raise corruption if we have two tuples
> having the same
> > +                      * predecessor.
> > +                      * We add the offset to the predecessor array
> irrespective of the
> > +                      * transaction (t_xmin) status. We will do
> validation related to
> > +                      * the transaction status (and also all other
> validations) when we
> > +                      * loop over the predecessor array.
> > +                      */
> > +                     curr_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> curr_lp);
> > +                     curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup);
> > +                     next_lp = PageGetItemId(ctx.page, nextoffnum);
> > +                     next_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> next_lp);
> > +                     next_xmin = HeapTupleHeaderGetXmin(next_htup);
> > +                     if (TransactionIdIsValid(curr_xmax) &&
> > +                             (TransactionIdEquals(curr_xmax, next_xmin)
> ||
> > +                              next_xmin == FrozenTransactionId))
> > +                     {
> > +                             if (predecessor[nextoffnum] != 0)
> > +                             {
> > +                                     report_corruption(&ctx,
> > +
>  psprintf("updated version at offset %u is also the updated version of
> tuple at offset %u",
> > +
>                 (unsigned) nextoffnum, (unsigned) predecessor[nextoffnum]));
> > +                                     continue;
>
> I doubt it is correct to enter this path with next_xmin ==
> FrozenTransactionId. This is following a ctid chain that we normally
> wouldn't
> follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin
> condition.
>
> I don't immediately see what prevents the frozen tuple being from an
> entirely
> different HOT chain than the two tuples pointing to it.
>
>
>
> Prior to 9.4 we can have xmin updated with FrozenTransactionId but with
9.4 (or later) we set XMIN_FROZEN bit in t_infomask. if updated tuple
is via prior of 9.4 then "TransactionIdEquals(curr_xmax, next_xmin)" will
be false for Frozen Tuple.
The Intention of adding  "next_xmin == FrozenTransactionId" to the path is
because we wanted to do validation around Frozen Tuple when we loop over
predecessor array.

 I need to look at the isolation test in details to understand how this can
provide false alarm and but if there is a valid case then we can remove
logic of raising corruption related with Frozen Tuple?


> > +             }
> > +
> > +             /* Loop over offsets and validate the data in the
> predecessor array. */
> > +             for (OffsetNumber currentoffnum = FirstOffsetNumber;
> currentoffnum <= maxoff;
> > +                      currentoffnum = OffsetNumberNext(currentoffnum))
> > +             {
> > +                     HeapTupleHeader pred_htup;
> > +                     HeapTupleHeader curr_htup;
> > +                     TransactionId pred_xmin;
> > +                     TransactionId curr_xmin;
> > +                     ItemId          pred_lp;
> > +                     ItemId          curr_lp;
> > +
> > +                     ctx.offnum = predecessor[currentoffnum];
> > +                     ctx.attnum = -1;
> > +
> > +                     if (ctx.offnum == 0)
> > +                     {
> > +                             /*
> > +                              * Either the root of the chain or an
> xmin-aborted tuple from
> > +                              * an abandoned portion of the HOT chain.
> > +                              */
>
> Hm - couldn't we check that the tuple could conceivably be at the root of a
> chain? I.e. isn't HEAP_HOT_UPDATED? Or alternatively has an aborted xmin?
>
>
 I don't see a way to check if tuple is at the root of HOT chain because
predecessor array will always be having either xmin from non-abandoned
transaction or it will be zero. We can't differentiate root or tuple
inserted via abandoned transaction.


> > +                             continue;
> > +                     }
> > +
> > +                     curr_lp = PageGetItemId(ctx.page, currentoffnum);
> > +                     curr_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> curr_lp);
> > +                     curr_xmin = HeapTupleHeaderGetXmin(curr_htup);
> > +
> > +                     ctx.itemid = pred_lp = PageGetItemId(ctx.page,
> ctx.offnum);
> > +                     pred_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> pred_lp);
> > +                     pred_xmin = HeapTupleHeaderGetXmin(pred_htup);
> > +
> > +                     /*
> > +                      * If the predecessor's xmin is aborted or in
> progress, the
> > +                      * current tuples xmin should be aborted or in
> progress
> > +                      * respectively. Also both xmin's must be equal.
> > +                      */
> > +                     if (!TransactionIdEquals(pred_xmin, curr_xmin) &&
> > +                             !TransactionIdDidCommit(pred_xmin))
> > +                     {
> > +                             report_corruption(&ctx,
> > +
>  psprintf("tuple with uncommitted xmin %u was updated to produce a tuple at
> offset %u with differing xmin %u",
> > +
>         (unsigned) pred_xmin, (unsigned) currentoffnum, (unsigned)
> curr_xmin));
>
> Is this necessarily true? What about a tuple that was inserted in a
> subtransaction and then updated in another subtransaction of the same
> toplevel
> transaction?
>
>
not sure if I am getting? I have tried with below test and don't see any
issue,

‘postgres[14723]=#’drop table test2;
DROP TABLE
‘postgres[14723]=#’create table test2 (a int, b int primary key);
CREATE TABLE
‘postgres[14723]=#’insert into test2 values (1,1);
INSERT 0 1
‘postgres[14723]=#’BEGIN;
BEGIN
‘postgres[14723]=#*’update test2 set a =2 where a =1;
UPDATE 1
‘postgres[14723]=#*’savepoint s1;
SAVEPOINT
‘postgres[14723]=#*’update test2 set a =6;
UPDATE 1
‘postgres[14723]=#*’rollback to savepoint s1;
ROLLBACK
‘postgres[14723]=#*’update test2 set a =6;
UPDATE 1
‘postgres[14723]=#*’savepoint s2;
SAVEPOINT
‘postgres[14723]=#*’update test2 set a =7;
UPDATE 1
‘postgres[14723]=#*’end;
COMMIT
‘postgres[14723]=#’SELECT lp as tuple, t_xmin, t_xmax, t_field3 as t_cid,
t_ctid,tuple_data_split('test2'::regclass, t_data, t_infomask, t_infomask2,
t_bits), heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM
heap_page_items(get_raw_page('test2', 0));
 tuple | t_xmin | t_xmax | t_cid | t_ctid |       tuple_data_split        |
                        heap_tuple_infomask_flags
-------+--------+--------+-------+--------+-------------------------------+---------------------------------------------------------------------------
     1 |   1254 |   1255 |     0 | (0,2)  | {"\\x01000000","\\x01000000"} |
("{HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED}",{})
     2 |   1255 |   1257 |     1 | (0,4)  | {"\\x02000000","\\x01000000"} |
("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     3 |   1256 |      0 |     1 | (0,3)  | {"\\x06000000","\\x01000000"} |
("{HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})
     4 |   1257 |   1258 |     2 | (0,5)  | {"\\x06000000","\\x01000000"} |
("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     5 |   1258 |      0 |     3 | (0,5)  | {"\\x07000000","\\x01000000"} |
("{HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})
(5 rows)


>
> > +                     }
> > +
> > +                     /*
> > +                      * If the predecessor's xmin is not frozen, then
> current tuple's
> > +                      * shouldn't be either.
> > +                      */
> > +                     if (pred_xmin != FrozenTransactionId && curr_xmin
> == FrozenTransactionId)
> > +                     {
> > +                             report_corruption(&ctx,
> > +
>  psprintf("unfrozen tuple was updated to produce a tuple at offset %u which
> is frozen",
> > +
>         (unsigned) currentoffnum));
> > +                     }
>
> Can't we have a an update chain that is e.g.
> xmin 10, xmax 5 -> xmin 5, xmax invalid
>
> and a vacuum cutoff of 7? That'd preent the first tuple from being removed,
> but would allow 5 to be frozen.
>
> I think there were recent patches proposing we don't freeze in that case,
> but
> we'll having done that in the past....
>
>
Not very sure about this, was trying with such case but found hard to
reproduce this.


-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

Reply via email to