On Thu, Apr 22, 2021 at 01:36:03PM -0700, Andres Freund wrote:
> The sequence in StartupXLOG() leading to the issue is the following:
> 
> 1) RecoverPreparedTransactions();
> 2) ShutdownRecoveryTransactionEnvironment();
> 3) XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
> 
> Because 2) resets the KnownAssignedXids machinery, snapshots that happen
> between 2) and 3) will not actually look at the procarray to compute
> snapshots, as that only happens within
> 
>       snapshot->takenDuringRecovery = RecoveryInProgress();
>       if (!snapshot->takenDuringRecovery)
> 
> and RecoveryInProgress() checks XLogCtl->SharedRecoveryState !=
> RECOVERY_STATE_DONE, which is set in 3).

Oh, indeed.  It is easy to see RecentXmin jumping back-and-worth while
running 023_pitr_prepared_xact.pl with a small sleep added just after
ShutdownRecoveryTransactionEnvironment().

> Without prepared transaction there probably isn't an issue, because there
> shouldn't be any other in-progress xids at that point?

Yes, there should not be any as far as I recall.  2PC is kind of
special with its fake ProcArray entries.

> I think to fix the issue we'd have to move
> ShutdownRecoveryTransactionEnvironment() to after XLogCtl->SharedRecoveryState
> = RECOVERY_STATE_DONE.
> 
> The acquisition of ProcArrayLock() in
> ShutdownRecoveryTransactionEnvironment()->ExpireAllKnownAssignedTransactionIds()
> should prevent the data from being removed between the RecoveryInProgress()
> and the KnownAssignedXidsGetAndSetXmin() calls in GetSnapshotData().
> 
> I haven't yet figured out whether there would be a problem with deferring the
> other tasks in ShutdownRecoveryTransactionEnvironment() until after
> RECOVERY_STATE_DONE.

Hmm.  This would mean releasing all the exclusive locks tracked by a
standby, as of StandbyReleaseAllLocks(), after opening the instance
for writes after a promotion.  I don't think that's unsafe, but it
would be intrusive.

Anyway, isn't the issue ExpireAllKnownAssignedTransactionIds() itself,
where we should try to not wipe out the 2PC entries to make sure that
all those snapshots still see the 2PC transactions as something to
count on?  I am attaching a crude patch to show the idea.

Thoughts?
--
Michael
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 42a89fc5dc..2e819163c4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4853,10 +4853,24 @@ KnownAssignedXidsRemovePreceding(TransactionId removeXid)
 
 	if (!TransactionIdIsValid(removeXid))
 	{
-		elog(trace_recovery(DEBUG4), "removing all KnownAssignedXids");
-		pArray->numKnownAssignedXids = 0;
-		pArray->headKnownAssignedXids = pArray->tailKnownAssignedXids = 0;
-		return;
+		tail = pArray->tailKnownAssignedXids;
+		head = pArray->headKnownAssignedXids;
+
+		for (i = tail; i < head; i++)
+		{
+			if (KnownAssignedXidsValid[i])
+			{
+				TransactionId knownXid = KnownAssignedXids[i];
+
+				if (!StandbyTransactionIdIsPrepared(knownXid))
+				{
+					KnownAssignedXidsValid[i] = false;
+					count++;
+				}
+			}
+		}
+
+		goto recompute;
 	}
 
 	elog(trace_recovery(DEBUG4), "prune KnownAssignedXids to %u", removeXid);
@@ -4885,6 +4899,9 @@ KnownAssignedXidsRemovePreceding(TransactionId removeXid)
 		}
 	}
 
+	/* fall through */
+recompute:
+
 	pArray->numKnownAssignedXids -= count;
 	Assert(pArray->numKnownAssignedXids >= 0);
 

Attachment: signature.asc
Description: PGP signature

Reply via email to