Re: [HACKERS] Hot Standby remaining issues
Regarding this item from the wiki page: The standby delay is measured as current timestamp - timestamp of last replayed commit record. If there's little activity in the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8 hours. The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long reporting query is started in the standby. Ten minutes later, a small transaction is executed in the master that conflicts with the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting transaction began, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was replayed. * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) Update recoveryLastXTime at checkpoints doesn't help when the master is completely idle, because we skip checkpoints in that case. It's better than nothing, of course. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
On Fri, 2009-12-04 at 10:37 +0200, Heikki Linnakangas wrote: Regarding this item from the wiki page: The standby delay is measured as current timestamp - timestamp of last replayed commit record. If there's little activity in the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8 hours. The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long reporting query is started in the standby. Ten minutes later, a small transaction is executed in the master that conflicts with the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting transaction began, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was replayed. * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) Update recoveryLastXTime at checkpoints doesn't help when the master is completely idle, because we skip checkpoints in that case. It's better than nothing, of course. Not if archive_timeout is set, which it would be in warm standby case. We can do even better than this with SR. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
Simon Riggs wrote: On Fri, 2009-12-04 at 10:37 +0200, Heikki Linnakangas wrote: Regarding this item from the wiki page: The standby delay is measured as current timestamp - timestamp of last replayed commit record. If there's little activity in the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8 hours. The standby is fully up-to-date with the master, and there's no write activity in master. After 10 hours, a long reporting query is started in the standby. Ten minutes later, a small transaction is executed in the master that conflicts with the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting transaction began, but it is in fact canceled immediately, because it's over 8 hours since the last commit record was replayed. * Simon says... changed to allow checkpoints to update recoveryLastXTime (Simon DONE) Update recoveryLastXTime at checkpoints doesn't help when the master is completely idle, because we skip checkpoints in that case. It's better than nothing, of course. Not if archive_timeout is set, which it would be in warm standby case. We can do even better than this with SR. If the system is completely idle, and no WAL is written, we skip xlog switches too, even if archive_timeout is set . It would be pointless to create a stream of WAL files with no content except for the XLOG_SWITCH records. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: If the system is completely idle, and no WAL is written, we skip xlog switches too, even if archive_timeout is set . It would be pointless to create a stream of WAL files with no content except for the XLOG_SWITCH records. It's not pointless if you want to monitor that your backup system is healthy. This was previously mentioned a couple years ago: http://archives.postgresql.org/pgsql-general/2007-10/msg01448.php It turns out that it's been working fine under 8.3. Of course, we can always add a crontab job to do some small bogus update to force WAL switches, so it's not the end of the world if we lose the 8.3 behavior; but my preference would be that if a WAL switch interval is specified, the WAL files switch at least that often. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
Heikki Linnakangas wrote: Simon Riggs wrote: @@ -654,10 +656,13 @@ LockAcquire(const LOCKTAG *locktag, elog(PANIC, lock table corrupted); } LWLockRelease(partitionLock); -ereport(ERROR, -(errcode(ERRCODE_OUT_OF_MEMORY), - errmsg(out of shared memory), - errhint(You might need to increase max_locks_per_transaction.))); +if (reportLockTableError) +ereport(ERROR, +(errcode(ERRCODE_OUT_OF_MEMORY), + errmsg(out of shared memory), + errhint(You might need to increase max_locks_per_transaction.))); +else +return LOCKACQUIRE_NOT_AVAIL; } locallock-proclock = proclock; That seems dangerous when dontWait==false. Ah, I see now that you're only setting reportLockTableError just before you call LockAcquire, and reset it afterwards. It's safe then, but it should rather be another argument to the function, as how the global variable is really being used. The patch doesn't actually fix the issue it was supposed to fix. If a read-only transaction holds a lot of locks, consuming so much lock space that there's none left for the startup process to hold the lock it wants, it will abort and bring down postmaster. The patch attempts to kill any conflicting lockers, but those are handled fine already (if there's any conflicting locks, LockAcquire will return LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting locks using up the lock space. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
On Wed, 2009-12-02 at 12:49 +0200, Heikki Linnakangas wrote: If a read-only transaction holds a lot of locks, consuming so much lock space that there's none left for the startup process to hold the lock it wants, it will abort and bring down postmaster. The patch attempts to kill any conflicting lockers, but those are handled fine already (if there's any conflicting locks, LockAcquire will return LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting locks using up the lock space. Oh dear, another nuke 'em all from orbit scenario. Will do. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
Simon Riggs wrote: On Wed, 2009-12-02 at 12:49 +0200, Heikki Linnakangas wrote: If a read-only transaction holds a lot of locks, consuming so much lock space that there's none left for the startup process to hold the lock it wants, it will abort and bring down postmaster. The patch attempts to kill any conflicting lockers, but those are handled fine already (if there's any conflicting locks, LockAcquire will return LOCKACQUIRE_NOT_AVAIL anyway). The problem is with non-conflicting locks using up the lock space. Oh dear, another nuke 'em all from orbit scenario. Will do. Yeah. This case is much like the OOM killer on Linux. Not really nuke 'em all but nuke someone, don't care who.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
On Wed, 2009-12-02 at 16:41 +, Simon Riggs wrote: On Tue, 2009-12-01 at 20:26 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: commit 02c3eadb766201db084b668daa271db4a900adc9 Author: Simon Riggs sri...@ebony.(none) Date: Sat Nov 28 06:23:33 2009 + Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off. Various comments added also. This patch makes it unsafe to start hot standby mode from a shutdown checkpoint, because we don't know if wal_standby_info was enabled in the master. Hmm, what happens if someone enables wal_standby_info in postgresql.conf while the server is shutdown. It would still be a valid starting point in that case. I'll just make a note, I think. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
Simon Riggs wrote: Hmm, what happens if someone enables wal_standby_info in postgresql.conf while the server is shutdown. It would still be a valid starting point in that case. Yeah, true. I'll just make a note, I think. Yeah, a manual (or automatic, if you just wait) checkpoint will produce a new checkpoint record showing that it's safe to start standby again. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
Simon Riggs wrote: commit 02c3eadb766201db084b668daa271db4a900adc9 Author: Simon Riggs sri...@ebony.(none) Date: Sat Nov 28 06:23:33 2009 + Added wal_standby_info GUC to turn RM_STANDBY_ID messages on/off. Various comments added also. This patch makes it unsafe to start hot standby mode from a shutdown checkpoint, because we don't know if wal_standby_info was enabled in the master. I still don't think we need the GUC. But for future-proofing, perhaps we should add a flag to shutdown checkpoint records, indicating whether it's safe to start hot standby from it. That way, if we decide to add a GUC like that at a later stage, we don't need to change the on-disk format. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
Simon Riggs wrote: @@ -654,10 +656,13 @@ LockAcquire(const LOCKTAG *locktag, elog(PANIC, lock table corrupted); } LWLockRelease(partitionLock); - ereport(ERROR, - (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg(out of shared memory), - errhint(You might need to increase max_locks_per_transaction.))); + if (reportLockTableError) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg(out of shared memory), + errhint(You might need to increase max_locks_per_transaction.))); + else + return LOCKACQUIRE_NOT_AVAIL; } locallock-proclock = proclock; That seems dangerous when dontWait==false. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
On Wed, 2009-11-25 at 13:00 +0200, Heikki Linnakangas wrote: I've put up a wiki page with the issues I see with the patch as it stands. They're roughly categorized by seriousness. http://wiki.postgresql.org/wiki/Hot_Standby_TODO New issues can and probably will still pop up, let's add them to the list as they're found so that we know what still needs to be done. You had a list of work items at the hot standby main page, but I believe it's badly out-of-date. Please move any still relevant items to the above list, if any. 4 changes on TODO included here, including all must-fix issues. This is a combined patch. Will push changes to git also, so each commit is visible separately. Lots of wiki comments added. -- Simon Riggs www.2ndQuadrant.com diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 75aca23..5af05fe 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1985,7 +1985,7 @@ if (!triggered) /listitem listitem para - LOCK TABLE, though only when explicitly IN ACCESS SHARE MODE + LOCK TABLE, though only when explicitly in one of these modes: ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE. /para /listitem listitem @@ -2033,7 +2033,7 @@ if (!triggered) listitem para LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE. - LOCK TABLE that explicitly requests a lock other than ACCESS SHARE MODE. + LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE. /para /listitem listitem @@ -2110,10 +2110,10 @@ if (!triggered) para In recovery, transactions will not be permitted to take any table lock - higher than AccessShareLock. In addition, transactions may never assign + higher than RowExclusiveLock. In addition, transactions may never assign a TransactionId and may never write WAL. Any LOCK TABLE command that runs on the standby and requests a specific - lock type other than AccessShareLock will be rejected. + lock mode higher than ROW EXCLUSIVE MODE will be rejected. /para para @@ -2207,15 +2207,16 @@ if (!triggered) para We have a number of choices for resolving query conflicts. The default - is that we wait and hope the query completes. The server will wait automatically until the lag between - primary and standby is at most max_standby_delay seconds. Once that grace - period expires, we take one of the following actions: + is that we wait and hope the query completes. The server will wait + automatically until the lag between primary and standby is at most + max_standby_delay seconds. Once that grace period expires, we take one + of the following actions: itemizedlist listitem para - If the conflict is caused by a lock, we cancel the standby transaction - immediately, even if it is idle-in-transaction. + If the conflict is caused by a lock, we cancel the conflicting standby + transaction immediately, even if it is idle-in-transaction. /para /listitem listitem @@ -2224,7 +2225,9 @@ if (!triggered) that a conflict has occurred and that it must cancel itself to avoid the risk that it silently fails to read relevant data because that data has been removed. (This is very similar to the much feared - error message snapshot too old). + error message snapshot too old). Some cleanup records only cause + conflict with older queries, though some types of cleanup record + affect all queries. /para para @@ -2364,6 +2367,9 @@ LOG: database system is ready to accept read only connections para As a result, you cannot create additional indexes that exist solely on the standby, nor can statistics that exist solely on the standby. + If these administrator commands are needed they should be executed + on the primary so that the changes will propagate through to the + standby. /para para @@ -2373,7 +2379,8 @@ LOG: database system is ready to accept read only connections managed by the Startup process that owns all AccessExclusiveLocks held by transactions being replayed by recovery. pg_stat_activity does not show an entry for the Startup process, nor do recovering transactions - show as active. + show as active. Note that Startup process does not acquire locks to + make database changes and thus no locks are shown in pg_locks. /para para diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c index 364756d..05e1c5e 100644 --- a/src/backend/access/nbtree/nbtpage.c +++ b/src/backend/access/nbtree/nbtpage.c @@ -659,7 +659,10 @@ _bt_page_recyclable(Page page) * order when replaying the effects of a VACUUM, just as we do for the * original VACUUM itself. lastBlockVacuumed allows us to tell whether an * intermediate range of blocks has had no changes at all by VACUUM, - * and so must be scanned anyway during replay. + * and so must be scanned anyway during replay. We always
[HACKERS] Hot Standby remaining issues
I've put up a wiki page with the issues I see with the patch as it stands. They're roughly categorized by seriousness. http://wiki.postgresql.org/wiki/Hot_Standby_TODO New issues can and probably will still pop up, let's add them to the list as they're found so that we know what still needs to be done. You had a list of work items at the hot standby main page, but I believe it's badly out-of-date. Please move any still relevant items to the above list, if any. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby remaining issues
On Wed, 2009-11-25 at 13:00 +0200, Heikki Linnakangas wrote: I've put up a wiki page with the issues I see with the patch as it stands. They're roughly categorized by seriousness. http://wiki.postgresql.org/wiki/Hot_Standby_TODO New issues can and probably will still pop up, let's add them to the list as they're found so that we know what still needs to be done. You had a list of work items at the hot standby main page, but I believe it's badly out-of-date. Please move any still relevant items to the above list, if any. I've linked the two pages together and identified the ones I'm currently working on, plus added a few comments. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers