Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

Alvaro Herrera Fri, 03 Feb 2017 10:43:42 -0800

Pavan Deolasee wrote:

> Looking at the history and some past discussions, it seems Tomas reported
> somewhat similar problem and Andres proposed a patch here
> https://www.postgresql.org/message-id/[email protected]
> which got committed via b23b0f5588d2e2. Not exactly the same issue, but
> related to relcache flush happening in index_open().
> 
> I wonder if we should just add a recheck logic
> in RelationGetIndexAttrBitmap() such that if rd_indexvalid is seen as 0 at
> the end, we go back and start again from RelationGetIndexList(). Or is
> there any code path that relies on finding the old information?


Good find on that old report.  It occured to me to re-run Tomas' test
case with this second patch you proposed (the "ddl" test takes 11
minutes in my laptop), and lo and behold -- your "recheck" code enters
an infinite loop.  Not good.  I tried some more tricks that came to
mind, but I didn't get any good results.  Pavan and I discussed it
further and he came up with the idea of returning the bitmapset we
compute, but if an invalidation occurs in index_open, simply do not
cache the bitmapsets so that next time this is called they are all
computed again.  Patch attached.

This appears to have appeased both test_decoding's ddl test under
CACHE_CLOBBER_ALWAYS, as well as the CREATE INDEX CONCURRENTLY bug.

I intend to commit this soon to all branches, to ensure it gets into the
next set of minors.

Here is a detailed reproducer for the CREATE INDEX CONCURRENTLY bug.
Line numbers are as of today's master; comments are supposed to help
locating good spots in other branches.  Given the use of gdb
breakpoints, there's no good way to integrate this with regression
tests, which is a pity because this stuff looks a little brittle to me,
and if it breaks it is hard to detect.

Prep:
DROP TABLE IF EXISTS testtab;
CREATE TABLE testtab (a int unique, b int, c int);
INSERT INTO testtab VALUES (1, 100, 10);
CREATE INDEX testindx ON testtab (b);

S1:
SELECT * FROM testtab;
UPDATE testtab SET b = b + 1 WHERE a = 1;
SELECT pg_backend_pid();
  attach GDB to this session.
  break relcache.c:4800
      # right before entering the per-index loop
  cont

S2:
SELECT pg_backend_pid();
DROP INDEX testindx;
  attach GDB to this session
  handle SIGUSR1 nostop
  break indexcmds.c:666
     # right after index_create
  break indexcmds.c:790
     # right before the second CommitTransactionCommand
  cont
CREATE INDEX CONCURRENTLY testindx ON testtab (b);
  -- this blocks in breakpoint 1.  Leave it there.

S1:
UPDATE testtab SET b = b + 1 WHERE a = 1;
  -- this blocks in breakpoint 1. Leave it there.

S2:
  gdb -> cont
  -- This blocks waiting for S1

S1:
  gdb -> cont
  -- S1 finishes the update.  S2 awakens and blocks again in breakpoint 2.

S1:
  -- Now run more UPDATEs:
UPDATE testtab SET b = b + 1 WHERE a = 1;
  This produces a broken HOT chain.
  This can be done several times; all these updates should be non-HOT
  because of the index created by CIC, but they are marked HOT.

S2: "cont"
  From this point onwards, update are correct.

SELECT * FROM testtab;
SET enable_seqscan TO 0;
SET enable_bitmapscan TO 0;
SELECT * FROM testtab WHERE b between 100 and 110;

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

commit 7054c21883154f67058d42180606e0c5428d2745[m
Author:     Alvaro Herrera <[email protected]>
AuthorDate: Fri Feb 3 15:40:37 2017 -0300
CommitDate: Fri Feb 3 15:41:59 2017 -0300

    Fix CREATE INDEX CONCURRENTLY relcache bug

diff --git a/src/backend/utils/cache/relcache.c 
b/src/backend/utils/cache/relcache.c
index 8a7c560..003ccfa 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4745,9 +4745,11 @@ RelationGetIndexPredicate(Relation relation)
  * Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
  * we can include system attributes (e.g., OID) in the bitmap representation.
  *
- * Caller had better hold at least RowExclusiveLock on the target relation
- * to ensure that it has a stable set of indexes.  This also makes it safe
- * (deadlock-free) for us to take locks on the relation's indexes.
+ * While all callers should at least RowExclusiveLock on the target relation,
+ * we still can't guarantee a stable set of indexes because CREATE INDEX
+ * CONCURRENTLY and DROP INDEX CONCURRENTLY can change set of indexes, without
+ * taking any conflicting lock. So we must be prepared to handle changes in the
+ * set of indexes and recompute bitmaps, when necessary.
  *
  * The returned result is palloc'd in the caller's memory context and should
  * be bms_free'd when not needed anymore.
@@ -4763,7 +4765,6 @@ RelationGetIndexAttrBitmap(Relation relation, 
IndexAttrBitmapKind attrKind)
        Oid                     relpkindex;
        Oid                     relreplindex;
        ListCell   *l;
-       MemoryContext oldcxt;
 
        /* Quick exit if we already computed the result. */
        if (relation->rd_indexattr != NULL)
@@ -4807,6 +4808,8 @@ RelationGetIndexAttrBitmap(Relation relation, 
IndexAttrBitmapKind attrKind)
        relpkindex = relation->rd_pkindex;
        relreplindex = relation->rd_replidindex;
 
+       Assert(relation->rd_indexvalid != 0);
+
        /*
         * For each index, add referenced attributes to indexattrs.
         *
@@ -4893,18 +4896,23 @@ RelationGetIndexAttrBitmap(Relation relation, 
IndexAttrBitmapKind attrKind)
        relation->rd_idattr = NULL;
 
        /*
-        * Now save copies of the bitmaps in the relcache entry.  We 
intentionally
-        * set rd_indexattr last, because that's the one that signals validity 
of
-        * the values; if we run out of memory before making that copy, we won't
-        * leave the relcache entry looking like the other ones are valid but
-        * empty.
+        * Now save copies of the bitmaps in the relcache entry, but only if no
+        * invalidation occured in the meantime.  We intentionally set 
rd_indexattr
+        * last, because that's the one that signals validity of the values; if 
we
+        * run out of memory before making that copy, we won't leave the 
relcache
+        * entry looking like the other ones are valid but empty.
         */
-       oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
-       relation->rd_keyattr = bms_copy(uindexattrs);
-       relation->rd_pkattr = bms_copy(pkindexattrs);
-       relation->rd_idattr = bms_copy(idindexattrs);
-       relation->rd_indexattr = bms_copy(indexattrs);
-       MemoryContextSwitchTo(oldcxt);
+       if (relation->rd_indexvalid != 0)
+       {
+               MemoryContext oldcxt;
+
+               oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+               relation->rd_keyattr = bms_copy(uindexattrs);
+               relation->rd_pkattr = bms_copy(pkindexattrs);
+               relation->rd_idattr = bms_copy(idindexattrs);
+               relation->rd_indexattr = bms_copy(indexattrs);
+               MemoryContextSwitchTo(oldcxt);
+       }
 
        /* We return our original working copy for caller to play with */
        switch (attrKind)

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

Reply via email to