Re: [HACKERS] Hot Standy introduced problem with query cancel behavior

Andres Freund Mon, 11 Jan 2010 21:31:09 -0800

On 01/07/10 22:37, Andres Freund wrote:

On Thursday 07 January 2010 22:28:46 Tom Lane wrote:

Andres Freund<and...@anarazel.de>  writes:

I did not want to suggest using Simon's code there. Sorry for the brevity.
It should have read as "revert to old code and add two step killing (that's
likely around 10 lines or so)".


"two step killing" meaning that we signal ERROR a few times and, if
nothing happens that we like, we signal FATAL.
As the code already loops around signaling anyway, that should be easy to
implement.

Ah.  This loop happens in the process that's trying to send the cancel
signal, correct, not the one that needs to respond to it?  That sounds
fairly sane to me.

Yes.

There are some things we could do to make it more likely that a cancel
of this type is accepted --- for instance, give it a distinct SQLSTATE
code that *can not* be trapped by plpgsql EXCEPTION blocks --- but there
is no practical way to guarantee it except elog(FATAL).  I'm not
entirely convinced that an untrappable error would be a good thing
anyway; it's hard to argue that that's much better than a FATAL.

Well a session which is usable after a transaction abort is quite sensible -
quite some software I know handles a failing transaction much more gracefully
than a session abort (e.g. because it has to deal with serialization failures
and such anyway).

So making it caught in fewer places and degrading to FATAL sounds sensible and
relatively easy to me.
Unless somebody disagrees I will give it a shot.

Ok, here is a stab at that:

1. Allow the signal multiplexing facility to transfer one sig_atomic_t
worth of data. This is useful e.g. for making operations idempotent
or more precise.

In this patch the LocalTransactionId is transported - that should make it impossible to cancel the wrong transaction.
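To illustrate why carrying the transaction id in the signal payload makes the cancel idempotent, here is a minimal standalone sketch. It is not the patch's code: LocalXid, pending_cancel_xid, current_xid and both function names are hypothetical stand-ins, and a real handler would be driven by the signal multiplexing facility rather than called directly.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real definitions. */
typedef unsigned int LocalXid;
#define INVALID_LOCAL_XID 0u

/* Written by the signal handler, read at a safe point in the main loop. */
static volatile sig_atomic_t pending_cancel_xid = INVALID_LOCAL_XID;

/* The transaction this backend is running right now. */
static LocalXid current_xid = INVALID_LOCAL_XID;

/* Handler side: only store the payload, nothing else. */
void
note_cancel_request(sig_atomic_t payload)
{
	pending_cancel_xid = payload;
}

/*
 * Safe-point side: consume the request. Returns true only if the request
 * names the transaction that is still current, so a late signal cannot
 * abort a newer transaction.
 */
bool
cancel_applies_to_current(void)
{
	LocalXid	target = (LocalXid) pending_cancel_xid;

	pending_cancel_xid = INVALID_LOCAL_XID;
	return target != INVALID_LOCAL_XID && target == current_xid;
}
```

Resending the same request is harmless in this sketch: it either names the transaction that is still running or is ignored once that transaction is gone.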


Use the signal multiplexing facility.

2.

AbortTransactionAndAnySubtransactions is only used in the main loop's error handler, as it should be unproblematic there.

In the current CVS code, ConditionalVirtualXactLockTableWait() in ResolveRecoveryConflictWithVirtualXIDs does the wait for every attempt at cancelling the other transaction.

I moved the retry logic into CancelVirtualTransaction(). If sending ERROR 50 times does nothing, it degrades to FATAL.


XXX: I temporarily do not use the skipExistingConflicts argument of
GetConflictingVirtualXIDs - I don't understand its purpose, and a bit of infrastructure is missing right now as the recoveryConflictMode is not stored anywhere anymore. It can easily be added back though.
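The retry-then-escalate idea from this item can be sketched in isolation as follows. This is a simplified model, not the code in the patch: CancelOps, cancel_with_escalation and the StubbornBackend toy target are hypothetical names, and the exact retry count in the patch's loop differs slightly from this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { CANCEL_ERROR, CANCEL_FATAL } CancelLevel;

/* Callbacks the canceling process would supply (all hypothetical). */
typedef struct
{
	bool	(*target_gone) (void *ctx);	/* has the transaction ended? */
	void	(*send_cancel) (void *ctx, CancelLevel level);
	void	(*nap) (void *ctx);		/* back off between tries; may be NULL */
	void   *ctx;
} CancelOps;

/*
 * Signal the target until it is gone. Start at the given level and
 * switch from ERROR to FATAL after max_soft_tries unsuccessful ERRORs.
 * Returns the level that was in effect when the target went away.
 */
CancelLevel
cancel_with_escalation(const CancelOps *ops, CancelLevel level,
					   int max_soft_tries)
{
	int			soft_tries = 0;

	while (!ops->target_gone(ops->ctx))
	{
		if (level == CANCEL_ERROR && soft_tries++ >= max_soft_tries)
			level = CANCEL_FATAL;

		ops->send_cancel(ops->ctx, level);

		if (ops->nap != NULL)
			ops->nap(ops->ctx);
	}
	return level;
}

/* Toy target for demonstration: ignores ERROR, goes away on FATAL. */
typedef struct
{
	int			errors_seen;
	int			fatals_seen;
	bool		gone;
} StubbornBackend;

bool
stubborn_gone(void *ctx)
{
	return ((StubbornBackend *) ctx)->gone;
}

void
stubborn_cancel(void *ctx, CancelLevel level)
{
	StubbornBackend *b = ctx;

	if (level == CANCEL_FATAL)
	{
		b->fatals_seen++;
		b->gone = true;
	}
	else
		b->errors_seen++;
}
```

With the toy target and max_soft_tries set to 3, the loop sends three ERROR requests and then a single FATAL one. Keeping the loop on the sender's side means the receiving backend needs no retry state of its own.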



3.
Add a new error code ERRCODE_QUERY_CANCELED_HS for use with HS
indicating a failure that is more than a plain
ERRCODE_QUERY_CANCELED - namely it should not be caught from
various places like savepoints and in PLs.

As an example, I check for that error code in plpgsql and make it uncatchable.

I am not sure how new error codes get chosen though - and the name is not that nice.
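A stripped-down sketch of what "uncatchable" means for an exception handler's matching rule is shown below. The numeric codes and the name handler_matches are made up for illustration; the real plpgsql check also handles error categories, which this sketch leaves out.

```c
#include <stdbool.h>

/* Hypothetical error codes; the real ones are SQLSTATE values. */
#define ERR_QUERY_CANCELED		1
#define ERR_QUERY_CANCELED_HS	2
#define ERR_DIVISION_BY_ZERO	3

/* A handler condition of 0 stands for "OTHERS" in this sketch. */
#define MATCH_OTHERS			0

/*
 * Decide whether a handler written for 'wanted' may trap 'raised'.
 * The recovery-conflict code is rejected before any other rule is
 * consulted, so neither OTHERS nor an explicit match can trap it.
 */
bool
handler_matches(int raised, int wanted)
{
	if (raised == ERR_QUERY_CANCELED_HS)
		return false;

	if (wanted == MATCH_OTHERS)
		return raised != ERR_QUERY_CANCELED;	/* OTHERS skips plain cancel */

	return raised == wanted;
}
```

The plain cancel code can still be trapped by naming it explicitly, while the recovery-conflict code is rejected in every case.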


Opinions on that?

I copied quite a bit from Simon's earlier patch...

---

Currently the patch does not yet do anything to avoid letting the protocol get out of sync. What do you think about adding a flag for error codes not to communicate with the client (similarly to COMMERROR)?


So that one could do an elog(ERROR | ERROR_NO_SEND_CLIENT, .. or such?
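One way such a flag could work is to reserve a bit above the severity values and strip it before comparing levels. The sketch below is purely illustrative: the constants, their values and both function names are assumptions, not anything that exists in PostgreSQL.

```c
#include <stdbool.h>

/* Hypothetical severity values and flag bit, chosen for illustration only. */
#define LEVEL_ERROR				20
#define LEVEL_FATAL				21
#define LEVEL_MASK				0xFF
#define LEVEL_NO_SEND_CLIENT	0x100	/* report to the log, not the client */

/* Strip any flag bits to recover the plain severity. */
int
severity_of(int elevel)
{
	return elevel & LEVEL_MASK;
}

/* True unless the caller asked to keep the message away from the client. */
bool
should_send_to_client(int elevel)
{
	return (elevel & LEVEL_NO_SEND_CLIENT) == 0;
}
```

A caller would pass something like LEVEL_ERROR | LEVEL_NO_SEND_CLIENT. The flag has to be combined with a bitwise OR, since an AND of the two values would lose the severity.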

Andres

PS: My current MUA suffers from an upgrade gone wrong, so I have no idea how this message will appear.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 90e7b4a..a6e2405 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** AbortOutOfAnyTransaction(void)
*** 3712,3717 ****
--- 3712,3761 ----
  }
  
  /*
+  *	AbortTransactionAndAnySubtransactions
+  *
+  * Similar to AbortCurrentTransaction but if any subtransactions
+  * in progress we abort them *and* all of their parents. So this is
+  * used when the caller wishes to make the abort untrappable by the user.
+  * After this has run IsAbortedTransactionBlockState() will be true.
+  */
+ void
+ AbortTransactionAndAnySubtransactions(void)
+ {
+ 	while(true){
+ 		TransactionState s = CurrentTransactionState;
+ 
+ 		switch (s->blockState)
+ 		{
+ 			case TBLOCK_DEFAULT:
+ 			case TBLOCK_STARTED:
+ 			case TBLOCK_BEGIN:
+ 			case TBLOCK_INPROGRESS:
+ 			case TBLOCK_END:
+ 			case TBLOCK_ABORT:
+ 			case TBLOCK_SUBABORT:
+ 			case TBLOCK_ABORT_END:
+ 			case TBLOCK_ABORT_PENDING:
+ 			case TBLOCK_PREPARE:
+ 			case TBLOCK_SUBABORT_END:
+ 			case TBLOCK_SUBABORT_RESTART:
+ 				AbortCurrentTransaction();
+ 				return;
+ 				break;
+ 
+ 			case TBLOCK_SUBINPROGRESS:
+ 			case TBLOCK_SUBBEGIN:
+ 			case TBLOCK_SUBEND:
+ 			case TBLOCK_SUBABORT_PENDING:
+ 			case TBLOCK_SUBRESTART:
+ 				AbortSubTransaction();
+ 				CleanupSubTransaction();
+ 				break;
+ 		}
+ 	}
+ }
+ 
+ /*
   * IsTransactionBlock --- are we within a transaction block?
   */
  bool
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 9cb28f7..d8ea470 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 51,56 ****
--- 51,57 ----
  #include "access/xact.h"
  #include "access/twophase.h"
  #include "miscadmin.h"
+ #include "storage/lmgr.h"
  #include "storage/procarray.h"
  #include "storage/standby.h"
  #include "utils/builtins.h"
*************** ProcArrayClearTransaction(PGPROC *proc)
*** 377,383 ****
  	proc->xid = InvalidTransactionId;
  	proc->lxid = InvalidLocalTransactionId;
  	proc->xmin = InvalidTransactionId;
- 	proc->recoveryConflictMode = 0;
  
  	/* redundant, but just in case */
  	proc->vacuumFlags &= ~PROC_VACUUM_STATE_MASK;
--- 378,383 ----
*************** GetConflictingVirtualXIDs(TransactionId 
*** 1665,1672 ****
  		if (proc->pid == 0)
  			continue;
  
! 		if (skipExistingConflicts && proc->recoveryConflictMode > 0)
! 			continue;
  
  		if (!OidIsValid(dbOid) ||
  			proc->databaseId == dbOid)
--- 1665,1677 ----
  		if (proc->pid == 0)
  			continue;
  
! 		/*
! 		 * XXX: doesn't skipExistingConflicts have the danger of
! 		 * shadowing an FATAL error?
! 		 * I.e. ERROR is signalled first and then FATAL is signalled...
! 		 * if (skipExistingConflicts && ...)
! 		 *   continue;
! 		 */
  
  		if (!OidIsValid(dbOid) ||
  			proc->databaseId == dbOid)
*************** GetConflictingVirtualXIDs(TransactionId 
*** 1704,1717 ****
   * Returns pid of the process signaled, or 0 if not found.
   */
  pid_t
! CancelVirtualTransaction(VirtualTransactionId vxid, int cancel_mode)
  {
  	ProcArrayStruct *arrayP = procArray;
  	int			index;
  	pid_t		pid = 0;
  
! 	LWLockAcquire(ProcArrayLock, LW_SHARED);
  
  	for (index = 0; index < arrayP->numProcs; index++)
  	{
  		VirtualTransactionId procvxid;
--- 1709,1728 ----
   * Returns pid of the process signaled, or 0 if not found.
   */
  pid_t
! CancelVirtualTransaction(VirtualTransactionId vxid,
! 						 recovery_conflict_mode cancel_mode)
  {
  	ProcArrayStruct *arrayP = procArray;
  	int			index;
  	pid_t		pid = 0;
  
! 	ProcSignalReason  sigmode = PROCSIG_CONFLICT_ERROR_INTERRUPT;
! 	size_t num_locks = 0;
  
+ 	/*
+ 	 * Find pid
+ 	 */
+ 	LWLockAcquire(ProcArrayLock, LW_SHARED);
  	for (index = 0; index < arrayP->numProcs; index++)
  	{
  		VirtualTransactionId procvxid;
*************** CancelVirtualTransaction(VirtualTransact
*** 1722,1747 ****
  		if (procvxid.backendId == vxid.backendId &&
  			procvxid.localTransactionId == vxid.localTransactionId)
  		{
- 			/*
- 			 * Issue orders for the proc to read next time it receives SIGINT
- 			 */
- 			if (proc->recoveryConflictMode < cancel_mode)
- 				proc->recoveryConflictMode = cancel_mode;
- 
  			pid = proc->pid;
  			break;
  		}
  	}
- 
  	LWLockRelease(ProcArrayLock);
  
! 	if (pid != 0)
! 	{
  		/*
! 		 * Kill the pid if it's still here. If not, that's what we wanted
! 		 * so ignore any errors.
  		 */
! 		kill(pid, SIGINT);
  	}
  
  	return pid;
--- 1733,1782 ----
  		if (procvxid.backendId == vxid.backendId &&
  			procvxid.localTransactionId == vxid.localTransactionId)
  		{
  			pid = proc->pid;
  			break;
  		}
  	}
  	LWLockRelease(ProcArrayLock);
  
! 	if(!pid)
! 		return pid;
! 
! 	/*
! 	 * Kill the transaction. We retry 50 times - if we didnt finish
! 	 * the transaction till then, we start using FATAL.
! 	 *
! 	 * We dont need to lock while signalling as the messages contain
! 	 * only sig_atomic_t wide contents which itself identify the
! 	 * operation completely. I.e. it contains the local transaction
! 	 * id, so it can't abort the wrong transaction.
! 	 */
! 	if (cancel_mode == CONFLICT_MODE_FATAL)
! 		sigmode = PROCSIG_CONFLICT_FATAL_INTERRUPT;
! 
! 	while(!ConditionalVirtualXactLockTableWait(vxid)){
! 		if(sigmode == PROCSIG_CONFLICT_ERROR_INTERRUPT && num_locks++ > 50){
! 			sigmode = PROCSIG_CONFLICT_FATAL_INTERRUPT;
! 
! 			elog(trace_recovery(DEBUG1),
! 				 "recovery tried to cancel virtual transaction %u/%u pid %ld. No success so far. Using FATAL",
! 				 vxid.backendId,
! 				 vxid.localTransactionId,
! 				 (long) pid);
! 		}
! 
! 		if(sigmode == PROCSIG_CONFLICT_FATAL_INTERRUPT){
! 			SendProcSignal(pid, sigmode, vxid.backendId, pid);
! 		}
! 		else{
! 			SendProcSignal(pid, sigmode, vxid.backendId, vxid.localTransactionId);
! 		}
! 
  		/*
! 		 * Wait awhile for it to die so that we avoid flooding an
! 		 * unresponsive backend when system is heavily loaded.
  		 */
! 		pg_usleep(5000);
  	}
  
  	return pid;
*************** CancelDBBackends(Oid databaseid)
*** 1842,1851 ****
  	{
  		volatile PGPROC *proc = arrayP->procs[index];
  
  		if (proc->databaseId == databaseid)
  		{
! 			proc->recoveryConflictMode = CONFLICT_MODE_FATAL;
! 			kill(proc->pid, SIGINT);
  		}
  	}
  
--- 1877,1887 ----
  	{
  		volatile PGPROC *proc = arrayP->procs[index];
  
+ 
  		if (proc->databaseId == databaseid)
  		{
! 			SendProcSignal(proc->pid, PROCSIG_CONFLICT_FATAL_INTERRUPT,
! 						   InvalidBackendId, true);
  		}
  	}
  
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 2892727..1fbb886 100644
*** a/src/backend/storage/ipc/procsignal.c
--- b/src/backend/storage/ipc/procsignal.c
***************
*** 24,29 ****
--- 24,31 ----
  #include "storage/procsignal.h"
  #include "storage/shmem.h"
  #include "storage/sinval.h"
+ #include "storage/standby.h"
+ #include "tcop/tcopprot.h"
  
  
  /*
*************** void
*** 255,260 ****
--- 257,276 ----
  procsignal_sigusr1_handler(SIGNAL_ARGS)
  {
  	int		save_errno = errno;
+ 	sig_atomic_t signal_data;
+ 
+ 	/*
+ 	 * We check the possible signals in decreasing order of
+ 	 * importance. For example if were FATALing the backend there is
+ 	 * no point in sending out NOTIFYs before that.
+ 	 * XXX: Possibly we should not calll the other handlers at all when
+ 	 * receiving FATAL
+ 	 */
+ 	if ((signal_data = CheckProcSignal(PROCSIG_CONFLICT_FATAL_INTERRUPT)))
+ 		RecoveryConflictInterrupt(CONFLICT_MODE_FATAL, signal_data);
+ 
+ 	if ((signal_data = CheckProcSignal(PROCSIG_CONFLICT_ERROR_INTERRUPT)))
+ 		RecoveryConflictInterrupt(CONFLICT_MODE_ERROR, signal_data);
  
  	if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
  		HandleCatchupInterrupt();
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 227b9cc..614914d 100644
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
*************** WaitExceedsMaxStandbyDelay(void)
*** 164,170 ****
   */
  void
  ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   char *reason, int cancel_mode)
  {
  	char		waitactivitymsg[100];
  
--- 164,170 ----
   */
  void
  ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   char *reason, recovery_conflict_mode cancel_mode)
  {
  	char		waitactivitymsg[100];
  
*************** ResolveRecoveryConflictWithVirtualXIDs(V
*** 234,246 ****
  					{
  						case CONFLICT_MODE_FATAL:
  							elog(trace_recovery(DEBUG1),
! 								 "recovery disconnects session with pid %ld because of conflict with %s",
  								 (long) pid,
  								 reason);
  							break;
  						case CONFLICT_MODE_ERROR:
  							elog(trace_recovery(DEBUG1),
! 								 "recovery cancels virtual transaction %u/%u pid %ld because of conflict with %s",
  								 waitlist->backendId,
  								 waitlist->localTransactionId,
  								 (long) pid,
--- 234,246 ----
  					{
  						case CONFLICT_MODE_FATAL:
  							elog(trace_recovery(DEBUG1),
! 								 "recovery disconnected session with pid %ld because of conflict with %s",
  								 (long) pid,
  								 reason);
  							break;
  						case CONFLICT_MODE_ERROR:
  							elog(trace_recovery(DEBUG1),
! 								 "recovery canceled virtual transaction %u/%u pid %ld because of conflict with %s",
  								 waitlist->backendId,
  								 waitlist->localTransactionId,
  								 (long) pid,
*************** ResolveRecoveryConflictWithVirtualXIDs(V
*** 250,261 ****
  							/* No conflict pending, so fall through */
  							break;
  					}
- 
- 					/*
- 					 * Wait awhile for it to die so that we avoid flooding an
- 					 * unresponsive backend when system is heavily loaded.
- 					 */
- 					pg_usleep(5000);
  				}
  			}
  		}
--- 250,255 ----
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7daa5e8..3d4bd10 100644
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
*************** InitProcess(void)
*** 318,324 ****
  	MyProc->waitProcLock = NULL;
  	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
  		SHMQueueInit(&(MyProc->myProcLocks[i]));
- 	MyProc->recoveryConflictMode = 0;
  
  	/*
  	 * We might be reusing a semaphore that belonged to a failed process. So
--- 318,323 ----
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index be20bf3..919b1a3 100644
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** static int	UseNewLine = 1;		/* Use newli
*** 172,177 ****
--- 172,178 ----
  static int	UseNewLine = 0;		/* Use EOF as query delimiters */
  #endif   /* TCOP_DONTUSENEWLINE */
  
+ static LocalTransactionId CanceledLocalTransaction = InvalidLocalTransactionId;
  
  /* ----------------------------------------------------------------
   *		decls for routines only used in this file
*************** StatementCancelHandler(SIGNAL_ARGS)
*** 2642,2648 ****
  		 * service the interrupt immediately
  		 */
  		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! 			CritSectionCount == 0)
  		{
  			/* bump holdoff count to make ProcessInterrupts() a no-op */
  			/* until we are done getting ready for it */
--- 2643,2649 ----
  		 * service the interrupt immediately
  		 */
  		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! 			CritSectionCount == 0 && !DoingCommandRead)
  		{
  			/* bump holdoff count to make ProcessInterrupts() a no-op */
  			/* until we are done getting ready for it */
*************** ProcessInterrupts(void)
*** 2711,2819 ****
  					(errcode(ERRCODE_ADMIN_SHUTDOWN),
  			 errmsg("terminating connection due to administrator command")));
  	}
  	if (QueryCancelPending)
  	{
  		QueryCancelPending = false;
  		if (ClientAuthInProgress)
- 		{
- 			ImmediateInterruptOK = false;	/* not idle anymore */
- 			DisableNotifyInterrupt();
- 			DisableCatchupInterrupt();
- 			/* As in quickdie, don't risk sending to client during auth */
- 			if (whereToSendOutput == DestRemote)
- 				whereToSendOutput = DestNone;
  			ereport(ERROR,
  					(errcode(ERRCODE_QUERY_CANCELED),
  					 errmsg("canceling authentication due to timeout")));
! 		}
! 		if (cancel_from_timeout)
! 		{
! 			ImmediateInterruptOK = false;	/* not idle anymore */
! 			DisableNotifyInterrupt();
! 			DisableCatchupInterrupt();
  			ereport(ERROR,
  					(errcode(ERRCODE_QUERY_CANCELED),
  					 errmsg("canceling statement due to statement timeout")));
! 		}
! 		if (IsAutoVacuumWorkerProcess())
! 		{
! 			ImmediateInterruptOK = false;	/* not idle anymore */
! 			DisableNotifyInterrupt();
! 			DisableCatchupInterrupt();
  			ereport(ERROR,
  					(errcode(ERRCODE_QUERY_CANCELED),
  					 errmsg("canceling autovacuum task")));
! 		}
! 		{
! 			int cancelMode = MyProc->recoveryConflictMode;
  
  			/*
! 			 * XXXHS: We don't yet have a clean way to cancel an
! 			 * idle-in-transaction session, so make it FATAL instead.
! 			 * This isn't as bad as it looks because we don't issue a
! 			 * CONFLICT_MODE_ERROR for a session with proc->xmin == 0
! 			 * on cleanup conflicts. There's a possibility that we
! 			 * marked somebody as a conflict and then they go idle.
  			 */
! 			if (DoingCommandRead && IsTransactionBlock() &&
! 				cancelMode == CONFLICT_MODE_ERROR)
! 			{
! 				cancelMode = CONFLICT_MODE_FATAL;
  			}
  
- 			switch (cancelMode)
- 			{
- 				case CONFLICT_MODE_FATAL:
- 					ImmediateInterruptOK = false;	/* not idle anymore */
- 					DisableNotifyInterrupt();
- 					DisableCatchupInterrupt();
- 					Assert(RecoveryInProgress());
- 					ereport(FATAL,
- 							(errcode(ERRCODE_QUERY_CANCELED),
- 							 errmsg("canceling session due to conflict with recovery")));
  
! 				case CONFLICT_MODE_ERROR:
! 					/*
! 					 * We are aborting because we need to release
! 					 * locks. So we need to abort out of all
! 					 * subtransactions to make sure we release
! 					 * all locks at whatever their level.
! 					 *
! 					 * XXX Should we try to examine the
! 					 * transaction tree and cancel just enough
! 					 * subxacts to remove locks? Doubt it.
! 					 */
! 					ImmediateInterruptOK = false;	/* not idle anymore */
! 					DisableNotifyInterrupt();
! 					DisableCatchupInterrupt();
! 					Assert(RecoveryInProgress());
! 					AbortOutOfAnyTransaction();
! 					ereport(ERROR,
! 							(errcode(ERRCODE_QUERY_CANCELED),
! 							 errmsg("canceling statement due to conflict with recovery")));
  
! 				default:
! 					/* No conflict pending, so fall through */
! 					break;
! 			}
  		}
  
  		/*
! 		 * If we are reading a command from the client, just ignore the
! 		 * cancel request --- sending an extra error message won't
! 		 * accomplish anything.  Otherwise, go ahead and throw the error.
  		 */
! 		if (!DoingCommandRead)
  		{
! 			ImmediateInterruptOK = false;	/* not idle anymore */
  			DisableNotifyInterrupt();
  			DisableCatchupInterrupt();
! 			ereport(ERROR,
! 					(errcode(ERRCODE_QUERY_CANCELED),
! 					 errmsg("canceling statement due to user request")));
  		}
  	}
! 	/* If we get here, do nothing (probably, QueryCancelPending was reset) */
  }
  
  
--- 2712,2818 ----
  					(errcode(ERRCODE_ADMIN_SHUTDOWN),
  			 errmsg("terminating connection due to administrator command")));
  	}
+ 
  	if (QueryCancelPending)
  	{
  		QueryCancelPending = false;
+ 		ImmediateInterruptOK = false;	/* not idle anymore */
+ 		DisableNotifyInterrupt();
+ 		DisableCatchupInterrupt();
+ 		/* As in quickdie, don't risk sending to client during auth */
+ 		if (ClientAuthInProgress && whereToSendOutput == DestRemote)
+ 			whereToSendOutput = DestNone;
  		if (ClientAuthInProgress)
  			ereport(ERROR,
  					(errcode(ERRCODE_QUERY_CANCELED),
  					 errmsg("canceling authentication due to timeout")));
! 		else if (cancel_from_timeout)
  			ereport(ERROR,
  					(errcode(ERRCODE_QUERY_CANCELED),
  					 errmsg("canceling statement due to statement timeout")));
! 		else if (IsAutoVacuumWorkerProcess())
  			ereport(ERROR,
  					(errcode(ERRCODE_QUERY_CANCELED),
  					 errmsg("canceling autovacuum task")));
! 		else
! 			ereport(ERROR,
! 					(errcode(ERRCODE_QUERY_CANCELED),
! 					 errmsg("canceling statement due to user request")));
! 	}
! 
  
+ 	{
+ 		LocalTransactionId current_canceled = CanceledLocalTransaction;
+ 
+ 		/*
+ 		 * To avoid cancelling the wrong transaction because the
+ 		 * normal transaction already finished and generated a new
+ 		 * error before the signal handler or ProcessInterrupts has
+ 		 * run we recheck the current LocalTransactionId.
+ 		 */
+ 
+ 		if(CanceledLocalTransaction != InvalidLocalTransactionId)
+ 		{
+ 			ImmediateInterruptOK = false;	/* not idle anymore */
  			/*
! 			 * Cancel the transaction whether or not it was idle, so don't mention
! 			 * the word idle in the message.
  			 */
! 			if (current_canceled == MyProc->lxid){
! 				ereport(ERROR,
! 						(errcode(ERRCODE_QUERY_CANCELED_HS),
! 						 errmsg("canceling transaction due to conflict with recovery")));
  			}
+ 		}
+ 	}
+ }
  
  
! void
! RecoveryConflictInterrupt(recovery_conflict_mode conflict_mode, sig_atomic_t signal_data)
! {
! 	int			save_errno = errno;
  
! 	/*
! 	 * Don't joggle the elbow of proc_exit
! 	 */
! 	if (!proc_exit_inprogress)
! 	{
! 		switch (conflict_mode)
! 		{
! 			case CONFLICT_MODE_FATAL:
! 				ProcDiePending = true;
! 				break;
! 
! 			case CONFLICT_MODE_ERROR:
! 				CanceledLocalTransaction = (LocalTransactionId)signal_data;
! 				break;
! 
! 			default:
! 				elog(ERROR, "Unknown conflict mode");
  		}
  
+ 		InterruptPending = true;
+ 
  		/*
! 		 * If it's safe to interrupt, and we're waiting for input or a lock,
! 		 * service the interrupt immediately. Same as in die()
  		 */
! 		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! 			CritSectionCount == 0)
  		{
! 			/* bump holdoff count to make ProcessInterrupts() a no-op */
! 			/* until we are done getting ready for it */
! 			InterruptHoldoffCount++;
! 			LockWaitCancel();	/* prevent CheckDeadLock from running */
  			DisableNotifyInterrupt();
  			DisableCatchupInterrupt();
! 			InterruptHoldoffCount--;
! 			ProcessInterrupts();
  		}
  	}
! 
! 	errno = save_errno;
  }
  
  
*************** PostgresMain(int argc, char *argv[], con
*** 3560,3568 ****
  		debug_query_string = NULL;
  
  		/*
! 		 * Abort the current transaction in order to recover.
  		 */
! 		AbortCurrentTransaction();
  
  		/*
  		 * Now return to normal top-level context and clear ErrorContext for
--- 3559,3593 ----
  		debug_query_string = NULL;
  
  		/*
! 		 * Abort the current transaction in order to recover. If were
! 		 * in HS this is the only safe point where we can abort not
! 		 * only the current transaction but also all transaction above
! 		 * the current as there exists no higher point to jump to and
! 		 * thus were not playing around with somebodys execution context.
! 		 *
! 		 * To avoid cancelling the wrong transaction because the
! 		 * normal transaction already finished and generated a new
! 		 * error before the signal handler has run we recheck the
! 		 * current LocalTransactionId.
! 		 *
! 		 * XXX: Possibly this should use the new error code for a
! 		 * transaction canceled by HS instead.
  		 */
! 		{
! 			LocalTransactionId current_canceled = CanceledLocalTransaction;
! 			if(current_canceled != InvalidLocalTransactionId &&
! 			   current_canceled == MyProc->lxid)
! 				AbortTransactionAndAnySubtransactions();
! 			else
! 				AbortCurrentTransaction();
! 		}
! 
! 		/*
! 		 * We cannot overwrite a newly canceled LocalTransactionId
! 		 * here because we would have to leave that block to start a
! 		 * new transaction.
! 		 */
! 		CanceledLocalTransaction = InvalidLocalTransactionId;
  
  		/*
  		 * Now return to normal top-level context and clear ErrorContext for
*************** PostgresMain(int argc, char *argv[], con
*** 3598,3603 ****
--- 3623,3630 ----
  
  	for (;;)
  	{
+ 		CanceledLocalTransaction = InvalidLocalTransactionId;
+ 
  		/*
  		 * At top of loop, reset extended-query-message flag, so that any
  		 * errors encountered in "idle" state don't provoke skip.
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 4c67be5..47b325a 100644
*** a/src/include/access/xact.h
--- b/src/include/access/xact.h
*************** extern bool IsTransactionBlock(void);
*** 204,209 ****
--- 204,210 ----
  extern bool IsTransactionOrTransactionBlock(void);
  extern char TransactionBlockStatusCode(void);
  extern void AbortOutOfAnyTransaction(void);
+ extern void AbortTransactionAndAnySubtransactions(void);
  extern void PreventTransactionChain(bool isTopLevel, const char *stmtType);
  extern void RequireTransactionChain(bool isTopLevel, const char *stmtType);
  extern bool IsInTransactionChain(bool isTopLevel);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index de0df3b..46c74ef 100644
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
*************** struct PGPROC
*** 95,107 ****
  
  	uint8		vacuumFlags;	/* vacuum-related flags, see above */
  
- 	/*
- 	 * While in hot standby mode, setting recoveryConflictMode instructs
- 	 * the backend to commit suicide. Possible values are the same as those
- 	 * passed to ResolveRecoveryConflictWithVirtualXIDs().
- 	 */
- 	int			recoveryConflictMode;
- 
  	/* Info about LWLock the process is currently waiting for, if any. */
  	bool		lwWaiting;		/* true if waiting for an LW lock */
  	bool		lwExclusive;	/* true if waiting for exclusive access */
--- 95,100 ----
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 314491d..868058b 100644
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
***************
*** 17,22 ****
--- 17,23 ----
  #include "storage/lock.h"
  #include "storage/standby.h"
  #include "utils/snapshot.h"
+ #include "storage/procsignal.h"
  
  
  extern Size ProcArrayShmemSize(void);
*************** extern VirtualTransactionId *GetCurrentV
*** 59,65 ****
  extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin,
  					Oid dbOid, bool skipExistingConflicts);
  extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid,
! 						 int cancel_mode);
  
  extern int	CountActiveBackends(void);
  extern int	CountDBBackends(Oid databaseid);
--- 60,66 ----
  extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin,
  					Oid dbOid, bool skipExistingConflicts);
  extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid,
! 									  recovery_conflict_mode cancel_mode);
  
  extern int	CountActiveBackends(void);
  extern int	CountDBBackends(Oid databaseid);
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 3b0d56d..795346c 100644
*** a/src/include/storage/procsignal.h
--- b/src/include/storage/procsignal.h
***************
*** 30,35 ****
--- 30,38 ----
   */
  typedef enum
  {
+ 	PROCSIG_CONFLICT_FATAL_INTERRUPT,	/* recovery conflict fatal */
+ 	PROCSIG_CONFLICT_ERROR_INTERRUPT,	/* recovery conflict error */
+ 
  	PROCSIG_CATCHUP_INTERRUPT,	/* sinval catchup interrupt */
  	PROCSIG_NOTIFY_INTERRUPT,	/* listen/notify interrupt */
  
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7a1c41f..ac950b7 100644
*** a/src/include/storage/standby.h
--- b/src/include/storage/standby.h
***************
*** 20,31 ****
  extern int	vacuum_defer_cleanup_age;
  
  /* cancel modes for ResolveRecoveryConflictWithVirtualXIDs */
! #define CONFLICT_MODE_NOT_SET		0
! #define CONFLICT_MODE_ERROR			1	/* Conflict can be resolved by canceling query */
! #define CONFLICT_MODE_FATAL			2	/* Conflict can only be resolved by disconnecting session */
  
  extern void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   char *reason, int cancel_mode);
  
  extern void InitRecoveryTransactionEnvironment(void);
  extern void ShutdownRecoveryTransactionEnvironment(void);
--- 20,33 ----
  extern int	vacuum_defer_cleanup_age;
  
  /* cancel modes for ResolveRecoveryConflictWithVirtualXIDs */
! typedef enum {
! 	CONFLICT_MODE_NOT_SET, /* No conflict */
! 	CONFLICT_MODE_FATAL, /* Conflict can only be resolved by disconnecting session */
! 	CONFLICT_MODE_ERROR /* Conflict can be resolved by canceling query */
! } recovery_conflict_mode;
  
  extern void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
! 									   char *reason, recovery_conflict_mode cancel_mode);
  
  extern void InitRecoveryTransactionEnvironment(void);
  extern void ShutdownRecoveryTransactionEnvironment(void);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 1298f7d..b901b6a 100644
*** a/src/include/tcop/tcopprot.h
--- b/src/include/tcop/tcopprot.h
***************
*** 19,27 ****
--- 19,30 ----
  #ifndef TCOPPROT_H
  #define TCOPPROT_H
  
+ #include <signal.h>
  #include "executor/execdesc.h"
  #include "nodes/parsenodes.h"
  #include "utils/guc.h"
+ #include "storage/standby.h"
+ 
  
  
  /* Required daylight between max_stack_depth and the kernel limit, in bytes */
*************** extern bool assign_max_stack_depth(int n
*** 63,68 ****
--- 66,73 ----
  extern void die(SIGNAL_ARGS);
  extern void quickdie(SIGNAL_ARGS);
  extern void StatementCancelHandler(SIGNAL_ARGS);
+ 
+ extern void RecoveryConflictInterrupt(recovery_conflict_mode conflict_mode, sig_atomic_t signal_data);
  extern void FloatExceptionHandler(SIGNAL_ARGS);
  extern void prepare_for_client_read(void);
  extern void client_read_ended(void);
diff --git a/src/include/utils/errcodes.h b/src/include/utils/errcodes.h
index 52c09ca..279f0e4 100644
*** a/src/include/utils/errcodes.h
--- b/src/include/utils/errcodes.h
***************
*** 328,333 ****
--- 328,334 ----
  /* Class 57 - Operator Intervention (class borrowed from DB2) */
  #define ERRCODE_OPERATOR_INTERVENTION		MAKE_SQLSTATE('5','7', '0','0','0')
  #define ERRCODE_QUERY_CANCELED				MAKE_SQLSTATE('5','7', '0','1','4')
+ #define ERRCODE_QUERY_CANCELED_HS			MAKE_SQLSTATE('5','7', '0','1','5')
  #define ERRCODE_ADMIN_SHUTDOWN				MAKE_SQLSTATE('5','7', 'P','0','1')
  #define ERRCODE_CRASH_SHUTDOWN				MAKE_SQLSTATE('5','7', 'P','0','2')
  #define ERRCODE_CANNOT_CONNECT_NOW			MAKE_SQLSTATE('5','7', 'P','0','3')
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index b9ca54f..810ef8c 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*************** exception_matches_conditions(ErrorData *
*** 897,903 ****
  		 * OTHERS matches everything *except* query-canceled; if you're
  		 * foolish enough, you can match that explicitly.
  		 */
! 		if (sqlerrstate == 0)
  		{
  			if (edata->sqlerrcode != ERRCODE_QUERY_CANCELED)
  				return true;
--- 897,905 ----
  		 * OTHERS matches everything *except* query-canceled; if you're
  		 * foolish enough, you can match that explicitly.
  		 */
! 		if (edata->sqlerrcode == ERRCODE_QUERY_CANCELED_HS)
! 			;
! 		else if (sqlerrstate == 0)
  		{
  			if (edata->sqlerrcode != ERRCODE_QUERY_CANCELED)
  				return true;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
