I prototyped this two ways: one with a special t_ctid and one with LockTuple().

On Fri, Oct 27, 2023 at 04:26:12PM -0700, Noah Misch wrote:
> On Fri, Oct 27, 2023 at 06:40:55PM -0400, Tom Lane wrote:
> > Noah Misch <n...@leadboat.com> writes:
> > > On Fri, Oct 27, 2023 at 03:32:26PM -0400, Tom Lane wrote:

> > > I anticipate a new locktag per catalog that can receive inplace updates,
> > > i.e. LOCKTAG_RELATION_DEFINITION and LOCKTAG_DATABASE_DEFINITION.
> > 
> > We could perhaps make this work by using the existing tuple-lock
> > infrastructure, rather than inventing new locktags (a choice that
> > spills to a lot of places including clients that examine pg_locks).
> 
> That could be okay.  It would be weird to reuse a short-term lock like that
> one as something held till end of transaction.  But the alternative of new
> locktags ain't perfect, as you say.

That worked.

> > I would prefer though to find a solution that only depends on making
> > heap_inplace_update protect itself, without high-level cooperation
> > from the possibly-interfering updater.  This is basically because
> > I'm still afraid that we're defining the problem too narrowly.
> > For one thing, I have nearly zero confidence that GRANT et al are
> > the only problematic source of conflicting transactional updates.
> 
> Likewise here, but I have fair confidence that an assertion would flush out
> the rest.  heap_inplace_update() would assert that the backend holds one of
> the acceptable locks.  It could even be an elog; heap_inplace_update() can
> tolerate that cost.

That check would need to live in both heap_inplace_update() and heap_update();
a check in heap_inplace_update() alone won't detect an omission in GRANT.
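
Concretely, I'm imagining something like this sketch, placed in both
functions (the lock-mode policy and error wording are placeholders, not a
settled design):

  /* e.g. for a pg_class row; "relid" = OID stored in the row being changed */
  LOCKTAG         tag;

  SET_LOCKTAG_RELATION(tag, MyDatabaseId, relid);
  if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock))
          elog(ERROR, "missing lock on relation %u for catalog row change",
                   relid);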

> > For another, I'm worried that some extension may be using
> > heap_inplace_update against a catalog we're not considering here.
> 
> A pgxn search finds "citus" using heap_inplace_update().
> 
> > I'd also like to find a solution that fixes the case of a conflicting
> > manual UPDATE (although certainly that's a stretch goal we may not be
> > able to reach).
> 
> It would be nice.

I expect most approaches could get there by having ExecModifyTable() arrange
for the expected locking or other actions.  That's analogous to how
heap_update() takes care of sinval even for a manual UPDATE.
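
For example (pure hand-waving; IsInplaceUpdateRelation() here is a
hypothetical helper, not a function we have today), the UPDATE path could do
something like:

  /* before fetching the catalog row that the UPDATE will modify */
  if (IsInplaceUpdateRelation(resultRelationDesc))
          LockTuple(resultRelationDesc, tupleid, AccessExclusiveLock);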

> > I wonder if there's a way for heap_inplace_update to mark the tuple
> > header as just-updated in a way that regular heap_update could
> > recognize.  (For standard catalog updates, we'd then end up erroring
> > in simple_heap_update, which I think is fine.)  We can't update xmin,
> > because the VACUUM callers don't have an XID; but maybe there's some
> > other way?  I'm speculating about putting a funny value into xmax,
> > or something like that, and having heap_update check that what it
> > sees in xmax matches what was in the tuple the update started with.
> 
> Hmmm.  Achieving it without an XID would be the real trick.  (With an XID, we
> could use xl_heap_lock like heap_update() does.)  Thinking out loud, what if
> heap_inplace_update() sets HEAP_XMAX_INVALID and xmax =
> TransactionIdAdvance(xmax)?  Or change t_ctid in a similar way.  Then regular
> heap_update() could complain if the field changed vs. last seen value.  This
> feels like something to regret later in terms of limiting our ability to
> harness those fields for more-valuable ends or compact them away in a future
> page format.  I can't pinpoint a specific loss, so the idea might have legs.
> Nontransactional data in separate tables or in new metapages smells like the
> right long-term state.  A project wanting to reuse the tuple header bits could
> introduce such storage to unblock its own bit reuse.

heap_update() does not have the pre-modification xmax today, so I used t_ctid.
heap_modify_tuple() preserves t_ctid, so heap_update() already has the
pre-modification t_ctid in key cases.  For details of how the prototype uses
t_ctid, see comment at "#define InplaceCanaryOffsetNumber".  The prototype
doesn't prevent corruption in the following scenario, because the aborted
ALTER TABLE RENAME overwrites the special t_ctid:

  == session 1
  drop table t;
  create table t (c int);
  begin;
  -- in gdb, set breakpoint on heap_modify_tuple
  grant select on t to public;

  == session 2
  alter table t add primary key (c);
  begin; alter table t rename to t2; rollback;

  == back in session 1
  -- release breakpoint
  -- want error (would get it w/o the begin;alter;rollback)
  commit;

I'm missing how to mark the tuple in a fashion accessible to a second
heap_update() after a rolled-back heap_update().  The mark needs enough bits
"N" so it's implausible for 2^N inplace updates to happen between GRANT
fetching the old tuple and GRANT completing heap_update().  Looking for bits
that could persist across a rolled-back heap_update(), we have 3 in t_ctid, 2
in t_infomask2, and 0 in xmax.  I definitely don't want to paint us into a
corner by spending the t_infomask2 bits on this.  Even if I did, 2^(3+2)=32
wouldn't clearly be enough inplace updates.

Is there a way to salvage the goal of fixing the bug without modifying code
like ExecGrant_common()?  If not, I'm inclined to pick one of the following
designs (the first two are sketched just after this list):

- Acquire ShareUpdateExclusiveLock in GRANT ((B1) from previous list).  It
  does make GRANT more intrusive; e.g. GRANT will cancel autovacuum.  I'm
  leaning toward this one for two reasons.  First, it doesn't slow
  heap_update() outside of assert builds.  Second, it makes the GRANT
  experience more like the rest of our DDL, in that concurrent DDL will make
  GRANT block, not fail.

- GRANT passes to heapam the fixed-size portion of the pre-modification tuple.
  heap_update() compares those bytes to the oldtup in shared buffers to see if
  an inplace update happened.  (HEAD could get the bytes from a new
  heap_update() parameter, while back branches would need a different passing
  approach.)

- LockTuple(), as seen in the attached prototype.  I like this least at the
  moment, because it changes pg_locks content without having a clear advantage
  over the previous option.  Also, the prototype has enough FIXME markers that
  I expect this to get hairy before it's done.
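
To illustrate the first two (sketches only, not taken from the attached
patches): (B1) amounts to one extra line in ExecGrant_Relation(), which
already has relOid in scope:

  /* self-conflicting lock, so it also conflicts with inplace updaters */
  LockRelationOid(relOid, ShareUpdateExclusiveLock);
  tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));

The second design would end with heap_update() doing roughly this, where
"expected_header" stands for the hypothetical new parameter carrying the
bytes GRANT fetched:

  /* did an inplace update slip in after the caller fetched the tuple? */
  if (memcmp(expected_header, oldtup.t_data, SizeofHeapTupleHeader) != 0)
          elog(ERROR, "tuple concurrently updated");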

I might change my preference after further prototypes.  Does anyone have a
strong preference between those?  Briefly, I did consider these additional
alternatives:

- Just accept the yet-rarer chance of corruption from this message's test
  procedure.

- Hold a buffer lock long enough to solve things.

- Remember the tuples where we overwrote a special t_ctid, and reverse the
  overwrite during abort processing.  But I/O in the abort path sounds bad.

Thanks,
nm
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de815..c0c33e9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2801,6 +2801,10 @@ l1:
        HeapTupleHeaderSetXmax(tp.t_data, new_xmax);
        HeapTupleHeaderSetCmax(tp.t_data, cid, iscombo);
        /* Make sure there is no forward chain link in t_ctid */
+       /*
+        * FIXME this will overwrite any InplaceCanary ctid.  Leave t_ctid
+        * unchanged?  Accept that a rolled-back DROP could undo the protection?
+        */
        tp.t_data->t_ctid = tp.t_self;
 
        /* Signal that this is actually a move into another partition */
@@ -3348,6 +3352,8 @@ l2:
                if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
                {
                        result = TM_Updated;
+                       /* FIXME will InplaceCanary mechanism have race conditions w/ this
+                        * or other t_ctid asserts in this file? */
                        Assert(!ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid));
                }
        }
@@ -3730,6 +3736,13 @@ l2:
                                                                                   id_has_external,
                                                                                   &old_key_copied);
 
+       /* FIXME missing anything important via the IsValid tests? */
+       if (ItemPointerIsValid(&oldtup.t_data->t_ctid) &&
+               ItemPointerIsValid(&newtup->t_data->t_ctid) &&
+               ItemPointerGetOffsetNumber(&oldtup.t_data->t_ctid) == InplaceCanaryOffsetNumber &&
+               !ItemPointerEquals(&oldtup.t_data->t_ctid, &newtup->t_data->t_ctid))
+               elog(ERROR, "tuple concurrently updated");
+
        /* NO EREPORT(ERROR) from here till changes are logged */
        START_CRIT_SECTION();
 
@@ -4747,6 +4760,11 @@ failed:
         * updated, we need to follow the update chain to lock the new versions of
         * the tuple as well.
         */
+       /*
+        * FIXME this will overwrite any InplaceCanary ctid.  Leave t_ctid
+        * unchanged?  Accept that SELECT FOR UPDATE on the catalog table could
+        * undo the protection?
+        */
        if (HEAP_XMAX_IS_LOCKED_ONLY(new_infomask))
                tuple->t_data->t_ctid = *tid;
 
@@ -5892,6 +5910,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
        HeapTupleHeader htup;
        uint32          oldlen;
        uint32          newlen;
+       HeapTupleData oldtup;
 
        /*
         * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -5922,12 +5941,29 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
        if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
                elog(ERROR, "wrong tuple length");
 
+       /* evaluate whether updated */
+       /* FIXME wait for a concurrent updater, in case it aborts */
+       oldtup.t_tableOid = RelationGetRelid(relation);
+       oldtup.t_data = htup;
+       oldtup.t_len = ItemIdGetLength(lp);
+       oldtup.t_self = tuple->t_self;
+       if (TM_Ok !=
+               HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false), buffer))
+               elog(ERROR, "tuple concurrently updated");
+
        /* NO EREPORT(ERROR) from here till changes are logged */
        START_CRIT_SECTION();
 
        memcpy((char *) htup + htup->t_hoff,
                   (char *) tuple->t_data + tuple->t_data->t_hoff,
                   newlen);
+       if (ItemPointerGetOffsetNumber(&htup->t_ctid) ==
+               InplaceCanaryOffsetNumber)
+               BlockIdRetreat(&htup->t_ctid.ip_blkid);
+       else
+               ItemPointerSet(&htup->t_ctid,
+                                          MaxBlockNumber,
+                                          InplaceCanaryOffsetNumber);
 
        MarkBufferDirty(buffer);
 
@@ -5945,6 +5981,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
                XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
                XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);
 
+               /* don't log t_ctid: concurrent changes not happening in recovery */
                /* inplace updates aren't decoded atm, don't log the origin */
 
                recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
diff --git a/src/include/storage/block.h b/src/include/storage/block.h
index 31a036d..f8d9157 100644
--- a/src/include/storage/block.h
+++ b/src/include/storage/block.h
@@ -105,4 +105,14 @@ BlockIdGetBlockNumber(const BlockIdData *blockId)
        return (((BlockNumber) blockId->bi_hi) << 16) | ((BlockNumber) blockId->bi_lo);
 }
 
+/* wraps on underflow, avoids InvalidBlockNumber */
+static inline void
+BlockIdRetreat(BlockIdData *blockId)
+{
+       BlockNumber proposal = BlockIdGetBlockNumber(blockId) - 1;
+       if (proposal == InvalidBlockNumber)
+               proposal = MaxBlockNumber;
+       BlockIdSet(blockId, proposal);
+}
+
 #endif                                                 /* BLOCK_H */
diff --git a/src/include/storage/off.h b/src/include/storage/off.h
index 3540308..8ff776a 100644
--- a/src/include/storage/off.h
+++ b/src/include/storage/off.h
@@ -27,6 +27,19 @@ typedef uint16 OffsetNumber;
 #define FirstOffsetNumber              ((OffsetNumber) 1)
 #define MaxOffsetNumber                        ((OffsetNumber) (BLCKSZ / sizeof(ItemIdData)))
 
+/*
+ * If a t_ctid contains InplaceCanaryOffsetNumber, it's a special ctid
+ * signifying that the tuple received a heap_inplace_update().  This is
+ * expected only in system catalogs, though extensions can use it elsewhere.
+ * (Offsets greater than MaxOffsetNumber are otherwise unattested.)  The
+ * BlockNumber acts as a counter to distinguish multiple inplace updates.  It
+ * starts at MaxBlockNumber, counts down to 0, and wraps back to
+ * MaxBlockNumber after 4B inplace updates.  (Counting upward or other ways
+ * would be fine, but this choice maximizes these special TIDs looking
+ * different from regular TIDs.)
+ */
+#define InplaceCanaryOffsetNumber      ((OffsetNumber) (InvalidOffsetNumber - 1))
+
 /* ----------------
  *             support macros
  * ----------------
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..e7404e8
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,29 @@
+Parsed test spec with 2 sessions
+
+starting permutation: b1 g1 s2 a2 s2 c1 s2
+step b1: BEGIN ISOLATION LEVEL READ COMMITTED;
+step g1: GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
+step a2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c);
+ERROR:  tuple concurrently updated
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
+step c1: COMMIT;
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index b2be88e..fcc4ad4 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -86,6 +86,7 @@ test: alter-table-3
 test: alter-table-4
 test: create-trigger
 test: sequence-ddl
+test: intra-grant-inplace
 test: async-notify
 test: vacuum-no-cleanup-lock
 test: timeouts
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..b5720c5
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,29 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, such as
+# relhasindex=true, need special code to cope.
+#
+# FIXME this is an isolation test on the assumption that I'll change
+# heap_inplace_update() to wait for the GRANT to commit or abort.  If I don't
+# do that, this could be a non-isolation test using dblink().
+
+setup
+{
+  CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+  DROP TABLE intra_grant_inplace;
+}
+
+session s1
+step b1  { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step g1  { GRANT SELECT ON intra_grant_inplace TO PUBLIC; }
+step c1  { COMMIT; }
+
+session s2
+step s2  { SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass; }
+step a2  { ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+
+permutation b1 g1 s2 a2 s2 c1 s2
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de815..8e51557 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3730,6 +3730,15 @@ l2:
                                                                                   id_has_external,
                                                                                   &old_key_copied);
 
+       /*
+        * FIXME fail if updating pg_class without holding either (a)
+        * LOCKTAG_RELATION on this row's rel, in a mode conflicting w/ ShareLock,
+        * or (b) LOCKTAG_TUPLE.  Fail if updating pg_database without holding
+        * either (a) LOCKTAG_OBJECT on this row's database OID, in FIXME mode, or
+        * (b) LOCKTAG_TUPLE.  This may imply the executor taking the tuple locks
+        * during SQL UPDATE of those tables.
+        */
+
        /* NO EREPORT(ERROR) from here till changes are logged */
        START_CRIT_SECTION();
 
@@ -5892,6 +5901,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
        HeapTupleHeader htup;
        uint32          oldlen;
        uint32          newlen;
+       HeapTupleData oldtup;
 
        /*
         * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -5904,6 +5914,8 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
                                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                                 errmsg("cannot update tuples during a parallel 
operation")));
 
+       /* FIXME fail unless holding LockTuple(relation, &tuple->t-self, 
AccessExclusiveLock */
+
       buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        page = (Page) BufferGetPage(buffer);
@@ -5922,6 +5934,16 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
        if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
                elog(ERROR, "wrong tuple length");
 
+       /* evaluate whether updated */
+       /* FIXME wait for a concurrent updater, in case it aborts */
+       oldtup.t_tableOid = RelationGetRelid(relation);
+       oldtup.t_data = htup;
+       oldtup.t_len = ItemIdGetLength(lp);
+       oldtup.t_self = tuple->t_self;
+       if (TM_Ok !=
+               HeapTupleSatisfiesUpdate(&oldtup, GetCurrentCommandId(false), buffer))
+               elog(ERROR, "tuple concurrently updated");
+
        /* NO EREPORT(ERROR) from here till changes are logged */
        START_CRIT_SECTION();
 
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 3ce6c09..dba3137 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -1853,7 +1853,7 @@ ExecGrant_Relation(InternalGrant *istmt)
                HeapTuple       tuple;
                ListCell   *cell_colprivs;
 
-               tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+               tuple = SearchSysCacheLocked1(relation, RELOID, ObjectIdGetDatum(relOid));
                if (!HeapTupleIsValid(tuple))
                        elog(ERROR, "cache lookup failed for relation %u", relOid);
                pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2190,7 +2190,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
                Oid                *oldmembers;
                Oid                *newmembers;
 
-               tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+               tuple = SearchSysCacheLocked1(relation, cacheid, ObjectIdGetDatum(objectid));
                if (!HeapTupleIsValid(tuple))
                        elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 143fae0..7c02380 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2860,11 +2860,19 @@ index_update_stats(Relation rel,
                tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
                tuple = heap_copytuple(tuple);
                table_endscan(pg_class_scan);
+               LockTuple(pg_class, &tuple->t_self, AccessExclusiveLock);
+               /* FIXME loop in case the pg_class tuple for pg_class moved */
        }
        else
        {
+               HeapTuple cachetup;
+
                /* normal case, use syscache */
-               tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
+               cachetup = SearchSysCacheLocked1(pg_class,
+                                                                                RELOID, ObjectIdGetDatum(relid));
+               tuple = heap_copytuple(cachetup);
+               if (HeapTupleIsValid(cachetup))
+                       ReleaseSysCache(cachetup);
        }
 
        if (!HeapTupleIsValid(tuple))
@@ -2933,6 +2941,7 @@ index_update_stats(Relation rel,
                CacheInvalidateRelcacheByTuple(tuple);
        }
 
+       UnlockTuple(pg_class, &tuple->t_self, AccessExclusiveLock);
        heap_freetuple(tuple);
 
        table_close(pg_class, RowExclusiveLock);
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 8dbda00..76be8b5 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -76,6 +76,7 @@
 #include "catalog/pg_type.h"
 #include "catalog/pg_user_mapping.h"
 #include "lib/qunique.h"
+#include "storage/lmgr.h"
 #include "utils/catcache.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -827,6 +828,70 @@ SearchSysCache1(int cacheId,
        return SearchCatCache1(SysCache[cacheId], key1);
 }
 
+/*
+ * Like SearchSysCache1, but a LOCKTAG_TUPLE heavyweight lock is held in
+ * AccessExclusiveMode on return.
+ *
+ * pg_class and pg_database are subject to both heap_inplace_update() and
+ * regular heap_update().  We must not lose the effects of any inplace update.
+ * That might arise by heap_inplace_update() mutating a tuple, the xmax of
+ * which then commits.  Alternatively, it could arise by heap_inplace_update()
+ * mutating a tuple before an ongoing transaction's heap_update(), after the
+ * ongoing transaction fetches the old tuple to modify.  We prevent both with
+ * locking as follows.  All heap_inplace_update(pg_class) callers acquire two
+ * locks.  First, they acquire LOCKTAG_RELATION in mode ShareLock,
+ * ShareUpdateExclusiveLock, or a mode with strictly more conflicts.  Second,
+ * they acquire LOCKTAG_TUPLE.  heap_update(pg_class) callers can take any
+ * lock that conflicts with one of those two.  Acquire one's choice of
+ * lock before fetching the tuple to be modified.  FIXME write corresponding
+ * story for pg_database.
+ *
+ * This function bundles the tasks of retrieving a tuple and acquiring the
+ * LOCKTAG_TUPLE.  Since much DDL opts to acquire LOCKTAG_RELATION, the
+ * LOCKTAG_TUPLE doesn't block such updaters of the returned tuple.  Callers
+ * shall reject superseded tuples, such as by checking with
+ * HeapTupleSatisfiesUpdate() while holding an exclusive lock on the tuple's
+ * buffer.
+ *
+ * We opt to loop until the search finds the same tuple we've locked.  FIXME
+ * This might not be strictly necessary, but it likely avoids some spurious
+ * errors.  In the happy case, this takes two fetches, one to determine the
+ * tid to lock and another to confirm that the TID remains the latest tuple.
+ *
+ * FIXME consider dropping Relation arg and deducing applicable locktag fields
+ * from cacheId.
+ *
+ * FIXME this may belong in a different file, with available key counts other
+ * than 1, etc.
+ */
+HeapTuple
+SearchSysCacheLocked1(Relation rel,
+                                         int cacheId,
+                                         Datum key1)
+{
+       ItemPointerData tid;
+       bool retry = false;
+
+       for (;;)
+       {
+               HeapTuple       tuple;
+
+               tuple = SearchSysCache1(cacheId, key1);
+               if (!HeapTupleIsValid(tuple) ||
+                       (retry && ItemPointerEquals(&tid, &tuple->t_self)))
+                       return tuple;
+
+               if (retry)
+                       UnlockTuple(rel, &tid, AccessExclusiveLock);
+
+               tid = tuple->t_self;
+               ReleaseSysCache(tuple);
+               LockTuple(rel, &tid, AccessExclusiveLock);
+               retry = true;
+       }
+}
+
+
 HeapTuple
 SearchSysCache2(int cacheId,
                                Datum key1, Datum key2)
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index 5d47a65..aba5385 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -18,6 +18,7 @@
 
 #include "access/attnum.h"
 #include "access/htup.h"
+#include "utils/rel.h"
 /* we intentionally do not include utils/catcache.h here */
 
 /*
@@ -174,6 +175,10 @@ extern bool RelationInvalidatesSnapshotsOnly(Oid relid);
 extern bool RelationHasSysCache(Oid relid);
 extern bool RelationSupportsSysCache(Oid relid);
 
+extern HeapTuple SearchSysCacheLocked1(Relation rel,
+                                                                          int cacheId,
+                                                                          Datum key1);
+
 /*
  * The use of the macros below rather than direct calls to the corresponding
  * functions is encouraged, as it insulates the caller from changes in the
diff --git a/src/test/isolation/expected/intra-grant-inplace.out b/src/test/isolation/expected/intra-grant-inplace.out
new file mode 100644
index 0000000..a1eed4e
--- /dev/null
+++ b/src/test/isolation/expected/intra-grant-inplace.out
@@ -0,0 +1,23 @@
+Parsed test spec with 2 sessions
+
+starting permutation: b1 g1 s2 a2 c1 s2
+step b1: BEGIN ISOLATION LEVEL READ COMMITTED;
+step g1: GRANT SELECT ON intra_grant_inplace TO PUBLIC;
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
+step a2: ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); <waiting ...>
+step c1: COMMIT;
+step a2: <... completed>
+ERROR:  tuple concurrently updated
+step s2: SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass;
+relhasindex
+-----------
+f          
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index b2be88e..fcc4ad4 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -86,6 +86,7 @@ test: alter-table-3
 test: alter-table-4
 test: create-trigger
 test: sequence-ddl
+test: intra-grant-inplace
 test: async-notify
 test: vacuum-no-cleanup-lock
 test: timeouts
diff --git a/src/test/isolation/specs/intra-grant-inplace.spec b/src/test/isolation/specs/intra-grant-inplace.spec
new file mode 100644
index 0000000..85059ed
--- /dev/null
+++ b/src/test/isolation/specs/intra-grant-inplace.spec
@@ -0,0 +1,25 @@
+# GRANT's lock is the catalog tuple xmax.  GRANT doesn't acquire a heavyweight
+# lock on the object undergoing an ACL change.  In-place updates, such as
+# relhasindex=true, need special code to cope.
+
+setup
+{
+  CREATE TABLE intra_grant_inplace (c int);
+}
+
+teardown
+{
+  DROP TABLE intra_grant_inplace;
+}
+
+session s1
+step b1  { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step g1  { GRANT SELECT ON intra_grant_inplace TO PUBLIC; }
+step c1  { COMMIT; }
+
+session s2
+step s2  { SELECT relhasindex FROM pg_class
+           WHERE oid = 'intra_grant_inplace'::regclass; }
+step a2  { ALTER TABLE intra_grant_inplace ADD PRIMARY KEY (c); }
+
+permutation b1 g1 s2 a2 c1 s2
