On Sun, Mar 11, 2018 at 3:22 AM, David Rowley
<david.row...@2ndquadrant.com> wrote:
> Due to the failure during the index build, it appears that the
> PG_TRY/PG_CATCH block in reindex_relation() causes the reindex_index()
> to abort and jump out to the catch block. Here there's a call to
> ResetReindexPending(), which complains as we're still left in parallel
> mode from the aborted _bt_begin_parallel() call which has called
> EnterParallelMode(), but not managed to make it all the way to
> _bt_end_parallel() (called from btbuild()), where ExitParallelMode()
> is normally called.
>
> Subsequent attempts to refresh the materialized view result in an
> Assert failure in list_member_oid()

Thanks for the report.

> I've not debugged that, but I assume it's because
> pendingReindexedIndexes is left as a non-empty list but has had its
> memory context obliterated due to the previous query having ended.

It's not really related to memory lifetime, so much as a corruption of
the state that tracks reindexed indexes within a backend. This is of
course due to that "cannot modify reindex state during a parallel
operation" error you saw.

> The comment in the following fragment is not well honored by the
> ResetReindexPending() since it does not clear the list if there's an
> error.

> A perhaps simple fix would be just to have ResetReindexPending() only
> reset the list to NIL again and not try to raise any error.

I noticed a very similar bug in ResetReindexProcessing() just before
parallel CREATE INDEX was committed. The fix there was simply not
throwing a "can't happen" error. I agree that the same fix should be
used here. It's not worth enforcing !IsInParallelMode() in the reset
functions; just enforcing !IsInParallelMode() in the set functions is
sufficient. Attached patch does this.

-- 
Peter Geoghegan
From 79f6708165c83c39b3e1bf785539bee84a28f650 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@bowt.ie>
Date: Sun, 11 Mar 2018 12:18:25 -0700
Subject: [PATCH] Fix corruption of backend REINDEX processing state.

When parallel index builds within the PG_TRY/PG_CATCH block in
reindex_relation() raised any error, reindex_index() jumped out to the
reindex_relation() catch block.  ResetReindexPending() was called from
there, without being prepared for the possibility that the backend is
still in parallel mode due to being in an error/cleanup path.  By
raising an error before the backend's state could be reset, the state
could never get reset.  Reindexing could continually fail within an
affected backend.

To fix, make ResetReindexPending() take the same approach as
ResetReindexProcessing(), and simply don't enforce that we cannot be in
parallel mode.

Author: Peter Geoghegan
Reported-By: David Rowley
Discussion: https://postgr.es/m/CAKJS1f91kq1wfYR8rnRRfKtxyhU2woEA+=whd640uxmyu+o...@mail.gmail.com
---
 src/backend/catalog/index.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 564f206..0da37d9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4054,8 +4054,7 @@ RemoveReindexPending(Oid indexOid)
 static void
 ResetReindexPending(void)
 {
-	if (IsInParallelMode())
-		elog(ERROR, "cannot modify reindex state during a parallel operation");
+	/* This may be called in leader error path */
 	pendingReindexedIndexes = NIL;
 }
 
-- 
2.7.4

Reply via email to