On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote:
> Hot Standby node can freeze when startup process calls LockBufferForCleanup().
> This bug can be reproduced by the following procedure.
>
> 0. start Hot Standby, with one active node (node A) and one standby node (node B)
> 1. create table X and table Y in node A
> 2. insert several rows in table X in node A
> 3. delete one row from table X in node A
> 4. begin xact 1 in node A, execute following commands, and leave xact 1 open
> 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE
> 5. wait until WAL's for above actions are applied in node B
> 6. begin xact 2 in node B, and execute following commands
> 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X;
> 6.2 FETCH test_cursor;
> 6.3 SELECT * FROM table Y;
> 7. execute VACUUM FREEZE table X in node A
> 8. commit xact 1 in node A
>
> ...then in node B occurs following "deadlock" situation, which is not
> detected by deadlock check.
> * startup process waits for xact 2 to release buffers in table X (in LockBufferForCleanup())
> * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table Y
The deadlock was prevented by a stop-gap measure in the December commit.
The attached patch is the full resolution for Startup process waits on buffer pins.
The Startup process now sets a SIGALRM timer when it waits on a buffer pin. If
woken by the alarm, it sends SIGUSR1 to all backends, requesting that they check
whether they are blocking the Startup process. If so, they throw ERROR/FATAL, as
for other conflict resolutions. The deadlock stop-gap is removed.
The max_standby_delay = -1 option is removed to prevent deadlock.
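As a rough illustration of the timer arithmetic in ResolveRecoveryConflictWithBufferPin(), here is a standalone sketch. The struct standby_wait_plan and the function plan_standby_wait() are hypothetical names used only for this example and are not in the patch; the patch performs the same steps inline and then calls setitimer().

```c
#include <assert.h>

/* Hypothetical helper type for illustration; not part of the patch. */
typedef struct
{
    int  signal_now;  /* 1 if the delay budget is already used up */
    long secs;        /* timer seconds, meaningful when signal_now == 0 */
    int  usecs;       /* timer microseconds, meaningful when signal_now == 0 */
} standby_wait_plan;

/*
 * Given max_standby_delay (in seconds) and how far replay currently lags
 * (lag_secs, lag_usecs), decide whether to signal backends immediately or
 * to arm a timer for the remaining part of the budget.
 */
static standby_wait_plan
plan_standby_wait(int max_delay, long lag_secs, int lag_usecs)
{
    standby_wait_plan plan = {0, 0, 0};

    assert(max_delay >= 0 && lag_secs >= 0);
    assert(lag_usecs >= 0 && lag_usecs < 1000000);

    /* No patience at all, or patience already exhausted: signal now. */
    if (max_delay == 0 || lag_secs >= (long) max_delay)
    {
        plan.signal_now = 1;
        return plan;
    }

    /* Remaining budget = max_delay - lag, borrowing one second if needed. */
    plan.secs = max_delay - lag_secs;
    if (lag_usecs > 0)
    {
        plan.secs -= 1;
        plan.usecs = 1000000 - lag_usecs;
    }

    /* Defensive: an all-zero itimerval would cancel the timer, not set it. */
    if (plan.secs == 0 && plan.usecs == 0)
        plan.usecs = 1;

    return plan;
}
```

For example, with max_standby_delay = 30 and a replay lag of 10.25 seconds, the sketch arms the timer for 19 seconds and 750000 microseconds.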
Reviews welcome; otherwise I will commit at the end of the week.
--
Simon Riggs www.2ndQuadrant.com
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 2399,2405 **** primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
</listitem>
<listitem>
<para>
! Waiting to acquire buffer cleanup locks (for which there is no time out)
</para>
</listitem>
<listitem>
--- 2399,2405 ----
</listitem>
<listitem>
<para>
! Waiting to acquire buffer cleanup locks
</para>
</listitem>
<listitem>
***************
*** 2536,2546 **** primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
Three-way deadlocks are possible between AccessExclusiveLocks arriving from
the primary, cleanup WAL records that require buffer cleanup locks and
user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
! are currently resolved by the cancellation of user processes that would
! need to wait on a lock. This is heavy-handed and generates more query
! cancellations than we need to, though does remove the possibility of deadlock.
! This behaviour is expected to improve substantially for the main release
! version of 8.5.
</para>
<para>
--- 2536,2542 ----
Three-way deadlocks are possible between AccessExclusiveLocks arriving from
the primary, cleanup WAL records that require buffer cleanup locks and
user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
! are resolved by time-out when we exceed <varname>max_standby_delay</>.
</para>
<para>
***************
*** 2630,2640 **** LOG: database system is ready to accept read only connections
<varname>max_standby_delay</> or even set it to zero, though that is a
very aggressive setting. If the standby server is tasked as an additional
server for decision support queries then it may be acceptable to set this
! to a value of many hours (in seconds). It is also possible to set
! <varname>max_standby_delay</> to -1 which means wait forever for queries
! to complete, if there are conflicts; this will be useful when performing
! an archive recovery from a backup.
! </para>
<para>
Transaction status "hint bits" written on primary are not WAL-logged,
--- 2626,2633 ----
<varname>max_standby_delay</> or even set it to zero, though that is a
very aggressive setting. If the standby server is tasked as an additional
server for decision support queries then it may be acceptable to set this
! to a value of many hours (in seconds).
! </para>
<para>
Transaction status "hint bits" written on primary are not WAL-logged,
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 1825,1838 **** archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows
<listitem>
<para>
When server acts as a standby, this parameter specifies a wait policy
! for queries that conflict with incoming data changes. Valid settings
! are -1, meaning wait forever, or a wait time of 0 or more seconds.
! If a conflict should occur the server will delay up to this
! amount before it begins trying to resolve things less amicably, as
described in <xref linkend="hot-standby-conflict">. Typically,
this parameter makes sense only during replication, so when
! performing an archive recovery to recover from data loss a
! parameter setting of 0 is recommended. The default is 30 seconds.
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
</para>
--- 1825,1839 ----
<listitem>
<para>
When server acts as a standby, this parameter specifies a wait policy
! for queries that conflict with data changes being replayed by recovery.
! If a conflict should occur the server will delay up to this number
! of seconds before it begins trying to resolve things less amicably, as
described in <xref linkend="hot-standby-conflict">. Typically,
this parameter makes sense only during replication, so when
! performing an archive recovery to recover from data loss a very high
! parameter setting is recommended. The default is 30 seconds.
! There is no wait-forever setting because of the potential for deadlock
! which that setting would introduce.
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
</para>
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 8759,8767 **** StartupProcessMain(void)
*/
pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
pqsignal(SIGINT, SIG_IGN); /* ignore query cancel */
! pqsignal(SIGTERM, StartupProcShutdownHandler); /* request shutdown */
! pqsignal(SIGQUIT, startupproc_quickdie); /* hard crash time */
! pqsignal(SIGALRM, SIG_IGN);
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, SIG_IGN);
pqsignal(SIGUSR2, SIG_IGN);
--- 8759,8770 ----
*/
pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
pqsignal(SIGINT, SIG_IGN); /* ignore query cancel */
! pqsignal(SIGTERM, StartupProcShutdownHandler); /* request shutdown */
! pqsignal(SIGQUIT, startupproc_quickdie); /* hard crash time */
! if (XLogRequestRecoveryConnections)
! pqsignal(SIGALRM, handle_standby_sig_alarm); /* ignored unless InHotStandby */
! else
! pqsignal(SIGALRM, SIG_IGN);
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, SIG_IGN);
pqsignal(SIGUSR2, SIG_IGN);
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 44,49 ****
--- 44,50 ----
#include "storage/ipc.h"
#include "storage/proc.h"
#include "storage/smgr.h"
+ #include "storage/standby.h"
#include "utils/rel.h"
#include "utils/resowner.h"
***************
*** 2417,2430 **** LockBufferForCleanup(Buffer buffer)
PinCountWaitBuf = bufHdr;
UnlockBufHdr(bufHdr);
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
/* Wait to be signaled by UnpinBuffer() */
! ProcWaitForSignal();
PinCountWaitBuf = NULL;
/* Loop back and try again */
}
}
/*
* ConditionalLockBufferForCleanup - as above, but don't wait to get the lock
*
* We won't loop, but just check once to see if the pin count is OK. If
--- 2418,2459 ----
PinCountWaitBuf = bufHdr;
UnlockBufHdr(bufHdr);
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
/* Wait to be signaled by UnpinBuffer() */
! if (InHotStandby)
! {
! /* Share the bufid that Startup process waits on */
! SetStartupBufferPinWaitBufId(buffer - 1);
! /* Set alarm and then wait to be signaled by UnpinBuffer() */
! ResolveRecoveryConflictWithBufferPin();
! SetStartupBufferPinWaitBufId(-1);
! }
! else
! ProcWaitForSignal();
!
PinCountWaitBuf = NULL;
/* Loop back and try again */
}
}
/*
+ * Check called from RecoveryConflictInterrupt handler when Startup
+ * process requests cancelation of all pin holders that are blocking it.
+ */
+ bool
+ HoldingBufferPinThatDelaysRecovery(void)
+ {
+ int bufid = GetStartupBufferPinWaitBufId();
+
+ Assert(bufid >= 0);
+
+ if (PrivateRefCount[bufid] > 0)
+ return true;
+
+ return false;
+ }
+
+ /*
* ConditionalLockBufferForCleanup - as above, but don't wait to get the lock
*
* We won't loop, but just check once to see if the pin count is OK. If
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 1620,1634 **** GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
* latestCompletedXid since doing so would be a performance issue during
* normal running, so we check it essentially for free on the standby.
*
! * If dbOid is valid we skip backends attached to other databases. Some
! * callers choose to skipExistingConflicts.
*
* Be careful to *not* pfree the result from this function. We reuse
* this array sufficiently often that we use malloc for the result.
*/
VirtualTransactionId *
! GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid,
! bool skipExistingConflicts)
{
static VirtualTransactionId *vxids;
ProcArrayStruct *arrayP = procArray;
--- 1620,1632 ----
* latestCompletedXid since doing so would be a performance issue during
* normal running, so we check it essentially for free on the standby.
*
! * If dbOid is valid we skip backends attached to other databases.
*
* Be careful to *not* pfree the result from this function. We reuse
* this array sufficiently often that we use malloc for the result.
*/
VirtualTransactionId *
! GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
{
static VirtualTransactionId *vxids;
ProcArrayStruct *arrayP = procArray;
***************
*** 1667,1675 **** GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid,
if (proc->pid == 0)
continue;
- if (skipExistingConflicts && proc->recoveryConflictPending)
- continue;
-
if (!OidIsValid(dbOid) ||
proc->databaseId == dbOid)
{
--- 1665,1670 ----
***************
*** 1826,1832 **** CountDBBackends(Oid databaseid)
* CancelDBBackends --- cancel backends that are using specified database
*/
void
! CancelDBBackends(Oid databaseid)
{
ProcArrayStruct *arrayP = procArray;
int index;
--- 1821,1827 ----
* CancelDBBackends --- cancel backends that are using specified database
*/
void
! CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
{
ProcArrayStruct *arrayP = procArray;
int index;
***************
*** 1839,1851 **** CancelDBBackends(Oid databaseid)
{
volatile PGPROC *proc = arrayP->procs[index];
! if (proc->databaseId == databaseid)
{
VirtualTransactionId procvxid;
GET_VXID_FROM_PGPROC(procvxid, *proc);
! proc->recoveryConflictPending = true;
pid = proc->pid;
if (pid != 0)
{
--- 1834,1846 ----
{
volatile PGPROC *proc = arrayP->procs[index];
! if (databaseid == InvalidOid || proc->databaseId == databaseid)
{
VirtualTransactionId procvxid;
GET_VXID_FROM_PGPROC(procvxid, *proc);
! proc->recoveryConflictPending = conflictPending;
pid = proc->pid;
if (pid != 0)
{
***************
*** 1853,1860 **** CancelDBBackends(Oid databaseid)
* Kill the pid if it's still here. If not, that's what we wanted
* so ignore any errors.
*/
! (void) SendProcSignal(pid, PROCSIG_RECOVERY_CONFLICT_DATABASE,
! procvxid.backendId);
}
}
}
--- 1848,1854 ----
* Kill the pid if it's still here. If not, that's what we wanted
* so ignore any errors.
*/
! (void) SendProcSignal(pid, sigmode, procvxid.backendId);
}
}
}
*** a/src/backend/storage/ipc/procsignal.c
--- b/src/backend/storage/ipc/procsignal.c
***************
*** 272,276 **** procsignal_sigusr1_handler(SIGNAL_ARGS)
--- 272,279 ----
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+ if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
+ RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+
errno = save_errno;
}
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***************
*** 126,135 **** WaitExceedsMaxStandbyDelay(void)
long delay_secs;
int delay_usecs;
- /* max_standby_delay = -1 means wait forever, if necessary */
- if (MaxStandbyDelay < 0)
- return false;
-
/* Are we past max_standby_delay? */
TimestampDifference(GetLatestXLogTime(), GetCurrentTimestamp(),
&delay_secs, &delay_usecs);
--- 126,131 ----
***************
*** 241,248 **** ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid)
VirtualTransactionId *backends;
backends = GetConflictingVirtualXIDs(latestRemovedXid,
! InvalidOid,
! true);
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
--- 237,243 ----
VirtualTransactionId *backends;
backends = GetConflictingVirtualXIDs(latestRemovedXid,
! InvalidOid);
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
***************
*** 273,280 **** ResolveRecoveryConflictWithTablespace(Oid tsid)
* non-transactional.
*/
temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
! InvalidOid,
! false);
ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
}
--- 268,274 ----
* non-transactional.
*/
temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
! InvalidOid);
ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
}
***************
*** 295,301 **** ResolveRecoveryConflictWithDatabase(Oid dbid)
*/
while (CountDBBackends(dbid) > 0)
{
! CancelDBBackends(dbid);
/*
* Wait awhile for them to die so that we avoid flooding an
--- 289,295 ----
*/
while (CountDBBackends(dbid) > 0)
{
! CancelDBBackends(dbid, PROCSIG_RECOVERY_CONFLICT_DATABASE, true);
/*
* Wait awhile for them to die so that we avoid flooding an
***************
*** 331,338 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
else
{
backends = GetConflictingVirtualXIDs(InvalidTransactionId,
! InvalidOid,
! true);
report_memory_error = true;
}
--- 325,331 ----
else
{
backends = GetConflictingVirtualXIDs(InvalidTransactionId,
! InvalidOid);
report_memory_error = true;
}
***************
*** 346,351 **** ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
--- 339,451 ----
}
/*
+ * ResolveRecoveryConflictWithBufferPin is called from LockBufferForCleanup()
+ * to resolve conflicts with other backends holding buffer pins.
+ *
+ * We either resolve conflicts immediately or set a SIGALRM to wake us at
+ * the limit of our patience. The sleep in LockBufferForCleanup() is
+ * performed here, for code clarity.
+ *
+ * Resolve conflict by sending a SIGUSR1 reason to all backends to check if
+ * they hold one of the buffer pins that is blocking Startup process. If so,
+ * backends will take an appropriate error action, ERROR or FATAL.
+ *
+ * A secondary purpose of this is to avoid deadlocks that might occur between
+ * the Startup process and lock waiters. Deadlocks occur because if queries
+ * wait on a lock, that must be behind an AccessExclusiveLock, which can only
+ * be cleared if the Startup process replays a transaction completion record.
+ * If Startup process is waiting then that is a deadlock. If we allowed a
+ * setting of max_standby_delay that meant "wait forever" we would then need
+ * special code to protect against deadlock. Such deadlocks are rare, so the
+ * code would be almost certainly buggy, so we avoid both long waits and
+ * deadlocks using the same mechanism.
+ */
+ void
+ ResolveRecoveryConflictWithBufferPin(void)
+ {
+ bool sig_alarm_enabled = false;
+
+ Assert(InHotStandby);
+
+ /*
+ * Signal immediately or set alarm for later.
+ */
+ if (MaxStandbyDelay == 0)
+ SendRecoveryConflictWithBufferPin();
+ else
+ {
+ TimestampTz now;
+ long standby_delay_secs; /* How far Startup process is lagging */
+ int standby_delay_usecs;
+
+ now = GetCurrentTimestamp();
+
+ /* Are we past max_standby_delay? */
+ TimestampDifference(GetLatestXLogTime(), now,
+ &standby_delay_secs, &standby_delay_usecs);
+
+ if (standby_delay_secs >= (long) MaxStandbyDelay)
+ SendRecoveryConflictWithBufferPin();
+ else
+ {
+ TimestampTz fin_time; /* Expected wake-up time by timer */
+ long timer_delay_secs; /* Amount of time we set timer for */
+ int timer_delay_usecs = 0;
+
+ /*
+ * How much longer we should wait?
+ */
+ timer_delay_secs = MaxStandbyDelay - standby_delay_secs;
+ if (standby_delay_usecs > 0)
+ {
+ timer_delay_secs -= 1;
+ timer_delay_usecs = 1000000 - standby_delay_usecs;
+ }
+
+ /*
+ * It's possible that the difference is less than a microsecond;
+ * ensure we don't cancel, rather than set, the interrupt.
+ */
+ if (timer_delay_secs == 0 && timer_delay_usecs == 0)
+ timer_delay_usecs = 1;
+
+ /*
+ * When is the finish time? We recheck this if we are woken early.
+ */
+ fin_time = TimestampTzPlusMilliseconds(now,
+ (timer_delay_secs * 1000) +
+ (timer_delay_usecs / 1000));
+
+ if (enable_standby_sig_alarm(timer_delay_secs, timer_delay_usecs, fin_time))
+ sig_alarm_enabled = true;
+ else
+ elog(FATAL, "could not set timer for process wakeup");
+ }
+ }
+
+ /* Wait to be signaled by UnpinBuffer() */
+ ProcWaitForSignal();
+
+ if (sig_alarm_enabled)
+ {
+ if (!disable_standby_sig_alarm())
+ elog(FATAL, "could not disable timer for process wakeup");
+ }
+ }
+
+ void
+ SendRecoveryConflictWithBufferPin(void)
+ {
+ /*
+ * We send signal to all backends to ask them if they are holding
+ * the buffer pin which is delaying the Startup process. We must
+ * not set the conflict flag yet, since most backends will be innocent.
+ * Let the SIGUSR1 handling in each backend decide their own fate.
+ */
+ CancelDBBackends(InvalidOid, PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, false);
+ }
+
+ /*
* -----------------------------------------------------
* Locking in Recovery Mode
* -----------------------------------------------------
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
***************
*** 815,839 **** LockAcquireExtended(const LOCKTAG *locktag,
}
/*
- * In Hot Standby we abort the lock wait if Startup process is waiting
- * since this would result in a deadlock. The deadlock occurs because
- * if we are waiting it must be behind an AccessExclusiveLock, which
- * can only clear when a transaction completion record is replayed.
- * If Startup process is waiting we never will clear that lock, so to
- * wait for it just causes a deadlock.
- */
- if (RecoveryInProgress() && !InRecovery &&
- locktag->locktag_type == LOCKTAG_RELATION)
- {
- LWLockRelease(partitionLock);
- ereport(ERROR,
- (errcode(ERRCODE_T_R_DEADLOCK_DETECTED),
- errmsg("possible deadlock detected"),
- errdetail("process conflicts with recovery - please resubmit query later"),
- errdetail_log("process conflicts with recovery")));
- }
-
- /*
* Set bitmask of locks this process already holds on this object.
*/
MyProc->heldLocks = proclock->holdMask;
--- 815,820 ----
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***************
*** 73,78 **** NON_EXEC_STATIC PGPROC *AuxiliaryProcs = NULL;
--- 73,79 ----
static LOCALLOCK *lockAwaited = NULL;
/* Mark these volatile because they can be changed by signal handler */
+ static volatile bool standby_timeout_active = false;
static volatile bool statement_timeout_active = false;
static volatile bool deadlock_timeout_active = false;
static volatile DeadLockState deadlock_state = DS_NOT_YET_CHECKED;
***************
*** 89,94 **** static void RemoveProcFromArray(int code, Datum arg);
--- 90,96 ----
static void ProcKill(int code, Datum arg);
static void AuxiliaryProcKill(int code, Datum arg);
static bool CheckStatementTimeout(void);
+ static bool CheckStandbyTimeout(void);
/*
***************
*** 107,112 **** ProcGlobalShmemSize(void)
--- 109,116 ----
size = add_size(size, mul_size(MaxBackends, sizeof(PGPROC)));
/* ProcStructLock */
size = add_size(size, sizeof(slock_t));
+ /* startupBufferPinWaitBufId */
+ size = add_size(size, sizeof(NBuffers));
return size;
}
***************
*** 487,497 **** PublishStartupProcessInformation(void)
--- 491,540 ----
procglobal->startupProc = MyProc;
procglobal->startupProcPid = MyProcPid;
+ procglobal->startupBufferPinWaitBufId = -1;
SpinLockRelease(ProcStructLock);
}
/*
+ * Used from bufmgr to share the value of the buffer that Startup waits on,
+ * or to reset the value to "not waiting" (-1). This allows processing
+ * of recovery conflicts for buffer pins.
+ */
+ void
+ SetStartupBufferPinWaitBufId(int bufid)
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ SpinLockAcquire(ProcStructLock);
+
+ procglobal->startupBufferPinWaitBufId = bufid;
+
+ SpinLockRelease(ProcStructLock);
+ }
+
+ /*
+ * Used by backends when they receive a request to check for buffer pin waits.
+ */
+ int
+ GetStartupBufferPinWaitBufId(void)
+ {
+ int bufid;
+
+ /* use volatile pointer to prevent code rearrangement */
+ volatile PROC_HDR *procglobal = ProcGlobal;
+
+ SpinLockAcquire(ProcStructLock);
+
+ bufid = procglobal->startupBufferPinWaitBufId;
+
+ SpinLockRelease(ProcStructLock);
+
+ return bufid;
+ }
+
+ /*
* Check whether there are at least N free PGPROC objects.
*
* Note: this is designed on the assumption that N will generally be small.
***************
*** 1542,1548 **** CheckStatementTimeout(void)
/*
! * Signal handler for SIGALRM
*
* Process deadlock check and/or statement timeout check, as needed.
* To avoid various edge cases, we must be careful to do nothing
--- 1585,1591 ----
/*
! * Signal handler for SIGALRM for normal user backends
*
* Process deadlock check and/or statement timeout check, as needed.
* To avoid various edge cases, we must be careful to do nothing
***************
*** 1565,1567 **** handle_sig_alarm(SIGNAL_ARGS)
--- 1608,1714 ----
errno = save_errno;
}
+
+ /*
+ * Signal handler for SIGALRM in Startup process
+ *
+ * To avoid various edge cases, we must be careful to do nothing
+ * when there is nothing to be done. We also need to be able to
+ * reschedule the timer interrupt if called before end of statement.
+ */
+ bool
+ enable_standby_sig_alarm(long delay_s, int delay_us, TimestampTz fin_time)
+ {
+ struct itimerval timeval;
+
+ Assert(delay_s >= 0 && delay_us >= 0);
+
+ statement_fin_time = fin_time;
+
+ standby_timeout_active = true;
+
+ MemSet(&timeval, 0, sizeof(struct itimerval));
+ timeval.it_value.tv_sec = delay_s;
+ timeval.it_value.tv_usec = delay_us;
+ if (setitimer(ITIMER_REAL, &timeval, NULL))
+ return false;
+ return true;
+ }
+
+ bool
+ disable_standby_sig_alarm(void)
+ {
+ /*
+ * Always disable the interrupt if it is active; this avoids being
+ * interrupted by the signal handler and thereby possibly getting
+ * confused.
+ *
+ * We will re-enable the interrupt if necessary in CheckStandbyTimeout.
+ */
+ if (standby_timeout_active)
+ {
+ struct itimerval timeval;
+
+ MemSet(&timeval, 0, sizeof(struct itimerval));
+ if (setitimer(ITIMER_REAL, &timeval, NULL))
+ {
+ standby_timeout_active = false;
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ /*
+ * CheckStandbyTimeout() runs unconditionally in the Startup process
+ * SIGALRM handler. Timers will only be set when InHotStandby.
+ * We simply ignore any signals unless the timer has been set.
+ */
+ static bool
+ CheckStandbyTimeout(void)
+ {
+ TimestampTz now;
+
+ if (!standby_timeout_active)
+ return true; /* do nothing if not active */
+
+ now = GetCurrentTimestamp();
+
+ if (now >= statement_fin_time)
+ SendRecoveryConflictWithBufferPin();
+ else
+ {
+ /* Not time yet, so (re)schedule the interrupt */
+ long secs;
+ int usecs;
+ struct itimerval timeval;
+
+ TimestampDifference(now, statement_fin_time,
+ &secs, &usecs);
+
+ /*
+ * It's possible that the difference is less than a microsecond;
+ * ensure we don't cancel, rather than set, the interrupt.
+ */
+ if (secs == 0 && usecs == 0)
+ usecs = 1;
+ MemSet(&timeval, 0, sizeof(struct itimerval));
+ timeval.it_value.tv_sec = secs;
+ timeval.it_value.tv_usec = usecs;
+ if (setitimer(ITIMER_REAL, &timeval, NULL))
+ return false;
+ }
+
+ return true;
+ }
+
+ void
+ handle_standby_sig_alarm(SIGNAL_ARGS)
+ {
+ int save_errno = errno;
+
+ (void) CheckStandbyTimeout();
+
+ errno = save_errno;
+ }
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***************
*** 2718,2723 **** RecoveryConflictInterrupt(ProcSignalReason reason)
--- 2718,2735 ----
{
switch (reason)
{
+ case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+ /*
+ * If we aren't blocking the Startup process there is
+ * nothing more to do.
+ */
+ if (!HoldingBufferPinThatDelaysRecovery())
+ return;
+
+ MyProc->recoveryConflictPending = true;
+
+ /* Intentional drop through to error handling */
+
case PROCSIG_RECOVERY_CONFLICT_LOCK:
case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 1383,1389 **** static struct config_int ConfigureNamesInt[] =
NULL
},
&MaxStandbyDelay,
! 30, -1, INT_MAX, NULL, NULL
},
{
--- 1383,1389 ----
NULL
},
&MaxStandbyDelay,
! 30, 0, INT_MAX, NULL, NULL
},
{
*** a/src/include/storage/bufmgr.h
--- b/src/include/storage/bufmgr.h
***************
*** 198,203 **** extern void LockBuffer(Buffer buffer, int mode);
--- 198,204 ----
extern bool ConditionalLockBuffer(Buffer buffer);
extern void LockBufferForCleanup(Buffer buffer);
extern bool ConditionalLockBufferForCleanup(Buffer buffer);
+ extern bool HoldingBufferPinThatDelaysRecovery(void);
extern void AbortBufferIO(void);
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
***************
*** 16,22 ****
#include "storage/lock.h"
#include "storage/pg_sema.h"
!
/*
* Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
--- 16,22 ----
#include "storage/lock.h"
#include "storage/pg_sema.h"
! #include "utils/timestamp.h"
/*
* Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
***************
*** 145,150 **** typedef struct PROC_HDR
--- 145,152 ----
/* The proc of the Startup process, since not in ProcArray */
PGPROC *startupProc;
int startupProcPid;
+ /* Buffer id of the buffer that Startup process waits for pin on */
+ int startupBufferPinWaitBufId;
} PROC_HDR;
/*
***************
*** 177,182 **** extern void InitProcessPhase2(void);
--- 179,186 ----
extern void InitAuxiliaryProcess(void);
extern void PublishStartupProcessInformation(void);
+ extern void SetStartupBufferPinWaitBufId(int bufid);
+ extern int GetStartupBufferPinWaitBufId(void);
extern bool HaveNFreeProcs(int n);
extern void ProcReleaseLocks(bool isCommit);
***************
*** 194,197 **** extern bool enable_sig_alarm(int delayms, bool is_statement_timeout);
--- 198,205 ----
extern bool disable_sig_alarm(bool is_statement_timeout);
extern void handle_sig_alarm(SIGNAL_ARGS);
+ extern bool enable_standby_sig_alarm(long delay_s, int delay_us, TimestampTz fin_time);
+ extern bool disable_standby_sig_alarm(void);
+ extern void handle_standby_sig_alarm(SIGNAL_ARGS);
+
#endif /* PROC_H */
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
***************
*** 57,69 **** extern bool IsBackendPid(int pid);
extern VirtualTransactionId *GetCurrentVirtualXIDs(TransactionId limitXmin,
bool excludeXmin0, bool allDbs, int excludeVacuum,
int *nvxids);
! extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin,
! Oid dbOid, bool skipExistingConflicts);
extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode);
extern int CountActiveBackends(void);
extern int CountDBBackends(Oid databaseid);
! extern void CancelDBBackends(Oid databaseid);
extern int CountUserBackends(Oid roleid);
extern bool CountOtherDBBackends(Oid databaseId,
int *nbackends, int *nprepared);
--- 57,68 ----
extern VirtualTransactionId *GetCurrentVirtualXIDs(TransactionId limitXmin,
bool excludeXmin0, bool allDbs, int excludeVacuum,
int *nvxids);
! extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid);
extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode);
extern int CountActiveBackends(void);
extern int CountDBBackends(Oid databaseid);
! extern void CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending);
extern int CountUserBackends(Oid roleid);
extern bool CountOtherDBBackends(Oid databaseId,
int *nbackends, int *nprepared);
*** a/src/include/storage/procsignal.h
--- b/src/include/storage/procsignal.h
***************
*** 37,42 **** typedef enum
--- 37,43 ----
PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
PROCSIG_RECOVERY_CONFLICT_LOCK,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
+ PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;
*** a/src/include/storage/standby.h
--- b/src/include/storage/standby.h
***************
*** 19,30 ****
extern int vacuum_defer_cleanup_age;
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
! extern void InitRecoveryTransactionEnvironment(void);
! extern void ShutdownRecoveryTransactionEnvironment(void);
/*
* Standby Rmgr (RM_STANDBY_ID)
--- 19,33 ----
extern int vacuum_defer_cleanup_age;
+ extern void InitRecoveryTransactionEnvironment(void);
+ extern void ShutdownRecoveryTransactionEnvironment(void);
+
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
! extern void ResolveRecoveryConflictWithBufferPin(void);
! extern void SendRecoveryConflictWithBufferPin(void);
/*
* Standby Rmgr (RM_STANDBY_ID)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)