On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote: > The simplest possible fix is to use ShareLock > instead ShareUpdateExclusiveLock in the index_concurrently_swap > > oldClassRel = relation_open(oldIndexId, ShareLock); > newClassRel = relation_open(newIndexId, ShareLock); > > But this is not a "concurrent" way. But such update should be fast enough > as far as I understand.
Nope, that won't fly far. We should not use a ShareLock in this step or we are going to conflict with row exclusive locks, impacting all workloads when doing a REINDEX CONCURRENTLY. That may be a long shot, but the issue is that we do the swap of all the indexes in a single transaction, but do not wait for them to complete when committing the swap's transaction in phase 4. Your report is telling us that we really have a good reason to wait for all the transactions that may use these indexes to finish. One thing coming on top of my mind to keep things concurrent-safe while allowing a clean use of the arbiter indexes would be to stick a WaitForLockersMultiple() on AccessExclusiveLock just *before* the transaction commit of phase 4, say, lacking the progress report part: --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -4131,6 +4131,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein CommandCounterIncrement(); } + WaitForLockersMultiple(lockTags, AccessExclusiveLock, true); + /* Commit this transaction and make index swaps visible */ CommitTransactionCommand(); StartTransactionCommand(); This is a non-fresh Friday-afternoon idea, but it would make sure that we don't have any transactions using the indexes switched to _ccold with indisvalid that are waiting for a drop in phase 5. Your tests seem to pass with that, and that keeps the operation intact concurrent-wise (I'm really wishing for isolation tests with injection points just now, because I could use them here). > + Assert(indexRelation->rd_index->indislive); > + Assert(indexRelation->rd_index->indisvalid); > + > if (!indexRelation->rd_index->indimmediate) > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), This kind of validation check may be a good idea in the long term. That seems incredibly useful to me if we were to add more code paths that do concurrent index rebuilds, to make sure that we don't rely on an index we should not use at all. That's a HEAD-only thing IMO, though. -- Michael
signature.asc
Description: PGP signature