On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The simplest possible fix is to use ShareLock
> instead ShareUpdateExclusiveLock in the index_concurrently_swap
> 
>             oldClassRel = relation_open(oldIndexId, ShareLock);
>             newClassRel = relation_open(newIndexId, ShareLock);
> 
> But this is not a "concurrent" way. But such update should be fast enough
> as far as I understand.

Nope, that won't fly far.  We should not use a ShareLock in this step
or we are going to conflict with row exclusive locks, impacting all
workloads when doing a REINDEX CONCURRENTLY.

That may be a long shot, but the issue is that we do the swap of all
the indexes in a single transaction, but do not wait for them to
complete when committing the swap's transaction in phase 4.  Your
report is telling us that we really have a good reason to wait for all
the transactions that may use these indexes to finish.  One thing
coming on top of my mind to keep things concurrent-safe while allowing
a clean use of the arbiter indexes would be to stick a
WaitForLockersMultiple() on AccessExclusiveLock just *before* the
transaction commit of phase 4, say, lacking the progress report part:
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4131,6 +4131,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid 
relationOid, const Rein
                CommandCounterIncrement();
        }
 
+       WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
+
        /* Commit this transaction and make index swaps visible */
        CommitTransactionCommand();
        StartTransactionCommand();

This is a non-fresh Friday-afternoon idea, but it would make sure that
we don't have any transactions using the indexes switched to _ccold
with indisvalid that are waiting for a drop in phase 5.  Your tests
seem to pass with that, and that keeps the operation intact
concurrent-wise (I'm really wishing for isolation tests with injection
points just now, because I could use them here).

> +             Assert(indexRelation->rd_index->indislive);
> +             Assert(indexRelation->rd_index->indisvalid);
> +
>               if (!indexRelation->rd_index->indimmediate)
>                       ereport(ERROR,
>                                       
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

This kind of validation check may be a good idea in the long term.
That seems incredibly useful to me if we were to add more code paths
that do concurrent index rebuilds, to make sure that we don't rely on
an index we should not use at all.  That's a HEAD-only thing IMO,
though.
--
Michael

Attachment: signature.asc
Description: PGP signature

Reply via email to