Re: [HACKERS] Hot standby and b-tree killed items
Heikki Linnakangas wrote:
> Gregory Stark wrote:
>> A vacuum being replayed -- even in a different database -- could trigger the error. Or with the btree split issue, a data load -- again even in a different database -- would quite likely cause your SELECT to be killed.
>
> Hmm, I wonder if we should/could track the latestRemovedXid separately for each database. There's no reason why we need to kill a read-only query in database X when a table in database Y is vacuumed.

What about shared catalogs?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 10:49 +0200, Heikki Linnakangas wrote:
> Whenever a B-tree index scan fetches a heap tuple that turns out to be dead, the B-tree item is marked as killed by calling _bt_killitems. When the page gets full, all the killed items are removed by calling _bt_vacuum_one_page. That's a problem for hot standby. If any of the killed b-tree items point to a tuple that is still visible to a running read-only transaction, we have the same situation as with vacuum, and have to either wait for the read-only transaction to finish before applying the WAL record or kill the transaction. It looks like there are some cosmetic changes related to that in the patch -- the signature of _bt_delitems is modified -- but there are no actual changes that would handle that situation. I didn't see it on the TODO on the hot standby wiki either. Am I missing something, or the patch?

ResolveRedoVisibilityConflicts() describes the current patch's position on this point, which on review is wrong, I agree. It looks like I assumed that _bt_delitems is only called during VACUUM, which I knew it wasn't. I know I was going to split XLOG_BTREE_VACUUM into two record types at one point, one for delete, one for vacuum. In the end I didn't. Anyhow, it's wrong.

We have infrastructure in place to make this work correctly; we just need to add a latestRemovedXid field to xl_btree_vacuum. So that part is easily solved.

Thanks for spotting it. More like that please!

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby and b-tree killed items
Simon Riggs wrote:
> We have infrastructure in place to make this work correctly, just need to add latestRemovedXid field to xl_btree_vacuum. So that part is easily solved.

That's tricky, because there's no xmin/xmax on index tuples. You could conservatively use OldestXmin as latestRemovedXid, but that could stall the WAL redo a lot more than necessary. Or you could store latestRemovedXid in the page header, but that would need to be WAL-logged to ensure that it's valid after crash. Or you could look at the heap to fetch the xmin/xmax, but that would be expensive.

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
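[For readers following along: the trade-off Heikki describes can be sketched as below. This is not PostgreSQL source; the type aliases and the helper name are simplified stand-ins. The point is only that, with no xmin/xmax stored in index tuples, a cleanup WAL record must either pay for a heap lookup to get an exact conflict horizon, or fall back to the conservative OldestXmin, which can stall redo on the standby longer than necessary.]

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical sketch: choose the latestRemovedXid to stamp into an
 * xl_btree_vacuum-style WAL record.  Index tuples carry no xmin/xmax,
 * so unless we paid the cost of fetching the xids from the heap, the
 * only safe value is the conservative horizon (OldestXmin). */
static TransactionId
choose_latest_removed_xid(TransactionId exact_heap_xmax, /* InvalidTransactionId if not fetched */
                          TransactionId oldest_xmin)
{
    if (exact_heap_xmax != InvalidTransactionId)
        return exact_heap_xmax;  /* precise: looked up in the heap */
    return oldest_xmin;          /* conservative fallback */
}
```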
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 12:24 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> We have infrastructure in place to make this work correctly, just need to add latestRemovedXid field to xl_btree_vacuum. So that part is easily solved.
>
> That's tricky because there's no xmin/xmax on index tuples.

Doh.

> You could conservatively use OldestXmin as latestRemovedXid, but that could stall the WAL redo a lot more than necessary. Or you could store latestRemovedXid in the page header, but that would need to be WAL-logged to ensure that it's valid after crash. Or you could look at the heap to fetch the xmin/xmax, but that would be expensive.

Agreed. Probably need to use OldestXmin then.

If I was going to add anything to the btree page header, it would be latestRemovedLSN, only set during recovery. That way we don't have to explicitly kill queries; we can do a wait on OldestXmin and then let them ERROR out when they find a page that has been modified. I have a suspicion that we may need some modification of that solution for all data blocks, so we don't kill too many queries.

Hmmm. I wonder if we can track latestRemovedLSN for all of shared_buffers. That was initially rejected, but if we set the latestRemovedLSN to be the block's LSN when we read it in, that would be fairly useful. Either way we use 8 bytes of RAM per buffer.

BTW, I noticed the other day that Oracle 11g only allows you to have a read-only slave *or* allows you to continue replaying. You need to manually switch back and forth between those modes. They can't do *both*, as Postgres will be able to do. That's because their undo information is stored off-block in the Undo Tablespace, so it is not available for standby queries. Nice one, Postgres.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
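[The latestRemovedLSN idea above boils down to a single comparison, sketched here for illustration. This is hypothetical code, not part of any patch: the type alias and function name are invented, and a real implementation would use the server's XLogRecPtr handling. The idea is that each page (or shared buffer) remembers the LSN of the last cleanup that removed tuples from it; a standby query records the WAL position at which its snapshot was taken, and touching a page cleaned after that point means required tuples may already be gone, so the query must ERROR out rather than return a wrong answer.]

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;  /* simplified: a WAL position as a plain 64-bit value */

/* Hypothetical sketch of the latestRemovedLSN check: a standby query is
 * unsafe on a page whose last tuple-removing cleanup was replayed after
 * the point at which the query's snapshot was taken. */
static bool
page_unsafe_for_snapshot(XLogRecPtr page_latest_removed_lsn,
                         XLogRecPtr snapshot_taken_lsn)
{
    return page_latest_removed_lsn > snapshot_taken_lsn;
}
```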
Re: [HACKERS] Hot standby and b-tree killed items
I'm confused: shouldn't read-only transactions on the slave just be hacked to not set any hint bits, including lp_delete?

--
Greg

On 19 Dec 2008, at 03:49, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> Whenever a B-tree index scan fetches a heap tuple that turns out to be dead, the B-tree item is marked as killed by calling _bt_killitems. When the page gets full, all the killed items are removed by calling _bt_vacuum_one_page. That's a problem for hot standby. If any of the killed b-tree items point to a tuple that is still visible to a running read-only transaction, we have the same situation as with vacuum, and have to either wait for the read-only transaction to finish before applying the WAL record or kill the transaction. It looks like there are some cosmetic changes related to that in the patch -- the signature of _bt_delitems is modified -- but there are no actual changes that would handle that situation. I didn't see it on the TODO on the hot standby wiki either. Am I missing something, or the patch?
>
> --
> Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 09:22 -0500, Greg Stark wrote:
> I'm confused: shouldn't read-only transactions on the slave just be hacked to not set any hint bits, including lp_delete?

They could be, though I see no value in doing so. But that is not Heikki's point. He is discussing what happens on the primary and the effects that must then occur on the standby. He has rightly pointed out a (pluggable) hole in my logic.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby and b-tree killed items
Simon Riggs si...@2ndquadrant.com wrote:
> If I was going to add anything to the btree page header, it would be latestRemovedLSN, only set during recovery. That way we don't have to explicitly kill queries; we can do a wait on OldestXmin and then let them ERROR out when they find a page that has been modified. I have a suspicion that we may need some modification of that solution for all data blocks, so we don't kill too many queries.

If the failure is caused by the timing of various database transactions, and the query is likely to run successfully after a delay and a retry, please use a SQLSTATE of '40001'. Some software (ours, for one) will recognize this and retry the query automatically, so that the user impact is essentially the same as blocking.

-Kevin
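[The client-side behaviour Kevin describes can be sketched as a generic retry loop. This is illustrative only and not tied to any real driver: run_query(), simulated_query(), fail_times and MAX_RETRIES are all invented names, and a real client would call its database API and inspect the SQLSTATE it reports. The point is that treating '40001' (serialization_failure) as retryable makes a cancelled query look to the end user like mere blocking.]

```c
#include <string.h>
#include <stdbool.h>

#define MAX_RETRIES 3

/* A query function reports success/failure and, on failure, fills in
 * the five-character SQLSTATE (plus NUL) through the out parameter. */
typedef bool (*query_fn)(char sqlstate_out[6]);

/* Retry a query automatically when it fails with SQLSTATE '40001'
 * (serialization_failure); give up on any other error, or after
 * MAX_RETRIES additional attempts. */
static bool
run_with_retry(query_fn run_query)
{
    char sqlstate[6];

    for (int attempt = 0; attempt <= MAX_RETRIES; attempt++)
    {
        if (run_query(sqlstate))
            return true;                  /* query succeeded */
        if (strcmp(sqlstate, "40001") != 0)
            return false;                 /* not a retryable failure */
        /* serialization failure: loop and resubmit */
    }
    return false;                         /* gave up after MAX_RETRIES */
}

/* Simulated query for demonstration: fails with '40001' the first
 * fail_times calls, then succeeds. */
static int fail_times;

static bool
simulated_query(char sqlstate_out[6])
{
    if (fail_times > 0)
    {
        fail_times--;
        strcpy(sqlstate_out, "40001");
        return false;
    }
    return true;
}
```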
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 11:54 -0600, Kevin Grittner wrote:
> Simon Riggs si...@2ndquadrant.com wrote:
>> If I was going to add anything to the btree page header, it would be latestRemovedLSN, only set during recovery. That way we don't have to explicitly kill queries; we can do a wait on OldestXmin and then let them ERROR out when they find a page that has been modified. I have a suspicion that we may need some modification of that solution for all data blocks, so we don't kill too many queries.
>
> If the failure is caused by the timing of various database transactions, and the query is likely to run successfully after a delay and a retry, please use a SQLSTATE of '40001'. Some software (ours, for one) will recognize this and retry the query automatically, so that the user impact is essentially the same as blocking.

I understand the need, but we won't be using SQLSTATE = 40001. That corresponds to ERRCODE_T_R_SERIALIZATION_FAILURE, which this error would not be. The error message ought to be "snapshot too old", which could raise a chuckle, so I called it something else.

The point you raise is a good one, and I think we should publish a list of retryable error messages. I once contemplated proposing a special log level for a retryable error, but decided it was not quite a good idea.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 10:52 +0000, Simon Riggs wrote:
>> You could conservatively use OldestXmin as latestRemovedXid, but that could stall the WAL redo a lot more than necessary. Or you could store latestRemovedXid in the page header, but that would need to be WAL-logged to ensure that it's valid after crash. Or you could look at the heap to fetch the xmin/xmax, but that would be expensive.
>
> Agreed. Probably need to use OldestXmin then.

Just finished coding this up, plus the TODO item to pin every index page. Will post after some further testing. Used RecentOldestXmin.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby and b-tree killed items
Simon Riggs si...@2ndquadrant.com wrote:
> I understand the need, but we won't be using SQLSTATE = 40001. That corresponds to ERRCODE_T_R_SERIALIZATION_FAILURE, which that error would not be.

Isn't it a problem with serialization of database transactions? You hit it in a different way, but if it is a temporary failure due to the timing of the transactions, I strongly feel that that is the correct SQLSTATE to use. Perhaps more information to provide any useful context could be in the info or hint areas?

-Kevin
Re: [HACKERS] Hot standby and b-tree killed items
Simon Riggs si...@2ndquadrant.com writes:
> The error message ought to be "snapshot too old", which could raise a chuckle, so I called it something else. The point you raise is a good one and I think we should publish a list of retryable error messages. I contemplated once proposing a special log level for a retryable error, but not quite a good idea.

I'm a bit concerned about the idea of killing off queries to allow WAL to proceed. While I have nothing against that being an option, I think we should be aiming to make it not necessary for correctness and not the default. By default I think WAL replay should stick to stalling, and only resort to killing queries if the user specifically requests it.

--
Gregory Stark EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 18:59 +0000, Gregory Stark wrote:
> I'm a bit concerned about the idea of killing off queries to allow WAL to proceed. While I have nothing against that being an option, I think we should be aiming to make it not necessary for correctness and not the default. By default I think WAL replay should stick to stalling, and only resort to killing queries if the user specifically requests it.

Increasing the waiting time increases the failover time and thus decreases the value of the standby as an HA system. Others value high availability more highly than you do, and so we had agreed to provide an option to allow the max waiting time to be set. max_standby_delay is set in recovery.conf, value 0 (forever) to 2,000,000 secs, settable in milliseconds. So think of it like a deadlock detector for recovery apply.

Also, there is a set of functions to control the way recovery proceeds, much as you might control an MP3 player (start, stop, pause). There are also functions to pause at a specific xid, pause at a specific time, pause at the next cleanup record. That allows you to set max_standby_delay lower and then freeze the server for longer to run a long query if required. It also allows you to do PITR by trial and error rather than one-shot specify-in-advance settings. There is a function to manually end recovery at a useful place if desired.

I hope your needs and wishes are catered for by that? (I have a Plan B in case we need it during wider user testing, as explained upthread.)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
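[The "deadlock detector for recovery apply" analogy can be made concrete with a small sketch. This is hypothetical illustration, not patch code: the function and parameter names are invented, and the real patch works in WAL-apply machinery rather than a standalone predicate. It models only the decision described above: redo waits for conflicting standby queries, and once apply has been held up longer than max_standby_delay the queries are cancelled; a setting of 0 means wait forever.]

```c
#include <stdint.h>
#include <stdbool.h>

typedef int64_t TimestampMs;

/* Hypothetical sketch of the max_standby_delay decision: cancel
 * conflicting standby queries once WAL apply has been held up longer
 * than the configured maximum.  0 means "wait forever, never cancel". */
static bool
should_cancel_conflicting_queries(TimestampMs standby_delay_ms,     /* how long apply has waited */
                                  TimestampMs max_standby_delay_ms) /* configured limit; 0 = forever */
{
    if (max_standby_delay_ms == 0)
        return false;
    return standby_delay_ms > max_standby_delay_ms;
}
```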
Re: [HACKERS] Hot standby and b-tree killed items
Simon Riggs si...@2ndquadrant.com wrote:
> max_standby_delay is set in recovery.conf, value 0 (forever) to 2,000,000 secs, settable in milliseconds. So think of it like a deadlock detector for recovery apply.

Aha! A deadlock is a type of serialization failure. (In fact, on databases with lock-based concurrency control rather than MVCC, it can be the ONLY type of serialization failure.)

-Kevin
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 13:47 -0600, Kevin Grittner wrote:
> Aha! A deadlock is a type of serialization failure. (In fact, on databases with lock-based concurrency control rather than MVCC, it can be the ONLY type of serialization failure.)

The SQL Standard specifically names this error as thrown when it detects "the inability to guarantee the serializability of two or more concurrent SQL-transactions". Now that really should only apply when running with SERIALIZABLE transactions, but I grant you the standard doesn't explicitly say that.

You give me the strange sense that you want this because of some quirk in your software, rather than an overwhelming desire to see these two situations described the same. I guess making it that SQLSTATE would make it simpler to understand why the error occurs and also how to handle it (i.e. resubmit). So there probably is a wide argument for making developers' jobs a little easier by doing it; i.e. usability will be improved if we do that.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby and b-tree killed items
Simon Riggs si...@2ndquadrant.com writes:
> Increasing the waiting time increases the failover time and thus decreases the value of the standby as an HA system. Others value high availability more highly than you do, and so we had agreed to provide an option to allow the max waiting time to be set.

Sure, it's a nice option to have. But I think the default should be to pause WAL replay. The question I had was whether your solution for btree pointers marked dead and later dropped from the index works when the user hasn't configured a timeout and doesn't want standby queries killed.

--
Gregory Stark EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Re: [HACKERS] Hot standby and b-tree killed items
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> Simon Riggs si...@2ndquadrant.com wrote:
>> max_standby_delay is set in recovery.conf, value 0 (forever) to 2,000,000 secs, settable in milliseconds. So think of it like a deadlock detector for recovery apply.
>
> Aha! A deadlock is a type of serialization failure. (In fact, on databases with lock-based concurrency control rather than MVCC, it can be the ONLY type of serialization failure.)

I think the fundamental difference is that a deadlock or serialization failure can be predicted as a potential problem when writing the code. This is something that can happen for any query at any time, even plain old read-only SELECT queries.

--
Gregory Stark EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Re: [HACKERS] Hot standby and b-tree killed items
Simon Riggs si...@2ndquadrant.com wrote:
> The SQL Standard specifically names this error as thrown when it detects "the inability to guarantee the serializability of two or more concurrent SQL-transactions". Now that really should only apply when running with SERIALIZABLE transactions,

I disagree. Database integrity could not be guaranteed without detection of conflicting modifications in READ COMMITTED on up, and this is the normal means of indicating these problems.

> but I grant you the standard doesn't explicitly say that.

I think that's intentional.

> You give me the strange sense that you want this because of some quirk in your software, rather than an overwhelming desire to see these two situations described the same.

Well, we are very unlikely ever to use this feature, so it's not really something I care about for us; it just struck me that there may be others who care about categorizing errors accurately according to the SQL standard, and that what you were describing sounded like a new type of serialization failure in the PostgreSQL environment, and should be classified that way. The primary quirkiness of our software is that it needs to be able to run with a number of different database products, and we do want to take advantage of whatever information is available in a portable format. This is not the only standard SQLSTATE we look for and handle appropriately for the documented meaning, but it is an important one, as it has simplified application programming and reduced the confusing error messages which reach our end users.

> I guess making it that SQLSTATE would make it simpler to understand why the error occurs and also how to handle it (i.e. resubmit).

Precisely.

-Kevin
Re: [HACKERS] Hot standby and b-tree killed items
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> PostgreSQL is much less prone to serialization failures, but it is certainly understandable if hot standby replication introduces new cases of it.

In this case it will be possible to get this error even if you're just running a single SELECT query -- and that's the *only* query in the database at all. A vacuum being replayed -- even in a different database -- could trigger the error. Or with the btree split issue, a data load -- again even in a different database -- would quite likely cause your SELECT to be killed.

--
Gregory Stark EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Re: [HACKERS] Hot standby and b-tree killed items
Gregory Stark st...@enterprisedb.com wrote:
> In this case it will be possible to get this error even if you're just running a single SELECT query -- and that's the *only* query in the database at all. A vacuum being replayed -- even in a different database -- could trigger the error. Or with the btree split issue, a data load -- again even in a different database -- would quite likely cause your SELECT to be killed.

OK. Does that make "serialization failure" a bad description of the problem? If these steps are serialized (run one after the other), is there a problem? It just seems that the hot standby near-synchronous replication creates a situation where tasks on multiple, linked databases might need to be serialized.

It does seem like it will be important to provide as much information as possible to the user about what's causing the problem, and hints about what to do. PostgreSQL has nice features for that, though.

Since I have no vested interest here, I'm not inclined to belabor the point. I was really just trying to make sure the feature was as useful as possible to others, some of whom might be looking for standard SQLSTATE values to help the software take the right course. If others feel the 40001 code would confuse more than enlighten, I'll respect that.

-Kevin
Re: [HACKERS] Hot standby and b-tree killed items
Gregory Stark st...@enterprisedb.com wrote:
> I think the fundamental difference is that a deadlock or serialization failure can be predicted as a potential problem when writing the code. This is something that can happen for any query at any time, even plain old read-only SELECT queries.

I've heard that on Oracle it is (or at least was) possible to get a serialization failure on a single SELECT statement which was the only user-requested activity on the system, because it could conflict with automatic maintenance operations. In Sybase and Microsoft databases it is definitely possible for a plain old read-only SELECT statement to be a deadlock victim (reported as a serialization failure) if some of the data it is referencing is being updated concurrently. In these (and many other) products, a lock must be acquired before a row can be read. Imagine: the SELECT locks a row against updates, another transaction locks some other row against any access, then the UPDATE tries to change the row locked by the SELECT while the SELECT tries to read the row locked by the UPDATE.

PostgreSQL is much less prone to serialization failures, but it is certainly understandable if hot standby replication introduces new cases of it.

-Kevin
Re: [HACKERS] Hot standby and b-tree killed items
On Friday 19 December 2008 05:52:42 Simon Riggs wrote:
> BTW, I noticed the other day that Oracle 11g only allows you to have a read-only slave *or* allows you to continue replaying. You need to manually switch back and forth between those modes. They can't do *both*, as Postgres will be able to do. That's because their undo information is stored off-block in the Undo Tablespace, so it is not available for standby queries. Nice one, Postgres.

I think this is true for physical replay, but Oracle also offers the option to do logical replay (where transaction logs are converted into SQL and run against the standby; I believe this is similar to what Continuent is trying to do with their latest offering). In that scenario you can do read and replay at the same time, though I think there are some conflicts possible; fewer than what Postgres will have, since I think most of their DDL can be done online. (This might require some extra modules / a high-end version of Oracle; please consult your local Oracle wizard for more details.)

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 20:54 +0000, Gregory Stark wrote:
> Kevin Grittner kevin.gritt...@wicourts.gov writes:
>> PostgreSQL is much less prone to serialization failures, but it is certainly understandable if hot standby replication introduces new cases of it.
>
> In this case it will be possible to get this error even if you're just running a single SELECT query -- and that's the *only* query in the database at all. A vacuum being replayed -- even in a different database -- could trigger the error. Or with the btree split issue, a data load -- again even in a different database -- would quite likely cause your SELECT to be killed.

Quite likely? "You're all doomed, I say!", his eyes rolling wildly. :-)

The standby is an extension of the primary and is quite literally running the same transactions. This "only query" idea isn't the right way to think about it. It's fairly easy to predict that it will happen, and it will happen only in the same database as transactions on the primary. And as we just said, you can control whether and/or after how long this will happen in some detail. Industry context: in the worst case this is as bad as Oracle 11g; in many/most cases it is much better.

Perhaps we should listen to the people that have said they don't want queries cancelled, even if the alternative is inconsistent answers. That is easily possible yet is not currently an option. Plus we have the option I referred to upthread, which is to defer query cancel until the query reads a modified data block. I'm OK with implementing either of those as non-default options. Do we need those options or are we OK?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby and b-tree killed items
On Fri, 2008-12-19 at 19:29 -0500, Robert Treat wrote:
> I think this is true for physical replay, but Oracle also offers the option to do logical replay (where transaction logs are converted into SQL and run against the standby). In that scenario you can do read and replay at the same time, though I think there are some conflicts possible.

That is also an option I have argued that we need, BTW.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby and b-tree killed items
Gregory Stark wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> Increasing the waiting time increases the failover time and thus decreases the value of the standby as an HA system. Others value high availability more highly than you do, and so we had agreed to provide an option to allow the max waiting time to be set.
>
> Sure, it's a nice option to have. But I think the default should be to pause WAL replay.

I think I agree that pausing should be the default, if for no other reason than that I can't think of a good default for max_standby_delay.

It would be nice to have a setting to specify the max amount of unapplied WAL before killing queries. When the primary isn't doing much, you might want to wait longer before killing queries, and if you're falling behind a lot, you might want to kill queries more aggressively to catch up. I guess that doesn't quite fit the current architecture; you'd need to peek ahead to see how much unapplied WAL there is.

> The question I had was whether your solution for btree pointers marked dead and later dropped from the index works when the user hasn't configured a timeout and doesn't want standby queries killed.

Yes; it's not any different from vacuum WAL records.

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
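[One way to picture the backlog-based idea above: shrink the allowed wait time linearly as the amount of unapplied WAL grows. This is purely an illustrative sketch under invented names (effective_delay_ms, wal_budget_bytes); nothing like it exists in the patch, and as noted it would need a way to peek ahead at the unapplied WAL.]

```c
#include <stdint.h>

/* Hypothetical sketch: scale the permitted standby delay down linearly
 * as the unapplied-WAL backlog grows, so a lightly loaded primary lets
 * queries run longer while a standby that has fallen far behind cancels
 * conflicting queries immediately to catch up. */
static int64_t
effective_delay_ms(int64_t base_delay_ms,        /* delay allowed when fully caught up */
                   uint64_t unapplied_wal_bytes, /* current replay backlog */
                   uint64_t wal_budget_bytes)    /* backlog at which delay drops to zero */
{
    if (unapplied_wal_bytes >= wal_budget_bytes)
        return 0;  /* far behind: cancel conflicting queries at once */
    return base_delay_ms
        * (int64_t) (wal_budget_bytes - unapplied_wal_bytes)
        / (int64_t) wal_budget_bytes;
}
```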
Re: [HACKERS] Hot standby and b-tree killed items
Heikki Linnakangas wrote:
> Gregory Stark wrote:
>> The question I had was whether your solution for btree pointers marked dead and later dropped from the index works when the user hasn't configured a timeout and doesn't want standby queries killed.
>
> Yes; it's not any different from vacuum WAL records.

No wait, there is a nasty corner-case. When an index tuple is marked as killed, no WAL record is written. Since there's no WAL record, it won't be killed in the slave yet. But if we take a full-page image of that page later for some other operation, the LP_DEAD flag is included in the full-page image. If the flag sneaks into the slave without an explicit WAL record like that, there's no latestRemovedXid for the slave to wait on.

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
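[One conceivable way to plug this corner case, sketched here as an illustration only (this is not what the patch does, and the flag layout is heavily simplified): since LP_DEAD is only a hint, the standby could conservatively strip it when restoring a full-page image during recovery, rather than trust killed-item hints that arrived with no latestRemovedXid to wait on.]

```c
#include <stdint.h>
#include <stddef.h>

#define LP_DEAD 0x04  /* simplified stand-in for the item-pointer "dead" hint flag */

/* Hypothetical sketch: when a full-page image is restored on a hot
 * standby, clear any LP_DEAD hints it carries.  The hints are merely an
 * optimization on the primary, so discarding them is always safe, and
 * it prevents killed-item state from sneaking past conflict handling. */
static void
clear_dead_hints_on_restore(uint8_t item_flags[], size_t nitems)
{
    for (size_t i = 0; i < nitems; i++)
        item_flags[i] &= (uint8_t) ~LP_DEAD;
}
```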
Re: [HACKERS] Hot Standby (commit fest version - v5)
Hi, I found that no one is registered as hot standby reviewer. I'd like to review the patch, mainly by testing through some benchmark test. Regards; 2008/11/2 Simon Riggs [EMAIL PROTECTED]: Hot Standby patch, including all major planned features. Allows users to connect to server in archive recovery and run queries. Applies cleanly, passes make check. There's no flaky code, kludges or other last minute rush-for-deadline stuff. It all works, though really for a patch this size and scope I expect many bugs. As a result I am describing this as WIP, though it is much more than a prototype. All the code has been planned out in advance, so there's been very little on-the-fly changes required during final development. I'm fully committed to making all required changes and fixes in reasonable times. I will continue detailed testing over the next few weeks to re-check everything prior to commit. Initially, I would ask people to spend time thinking about this conceptually to check that I have all the correct subsystems and cater for all the little side tweaks that exist in various parts of the server When you test this, please do it on a server built with --enable-cassert and keep a detailed log using trace_recovery_messages = DEBUG4 (or 2-3). Code has been specifically designed to be performance neutral or better for normal workloads, so the WAL volume, number of times we take WAL locks etc should be identical (on 64-bit systems). The patch includes further tuning of subtransaction commits, so I hope the patch may even be a few % win on more complex workloads with many PL/pgSQL functions using EXCEPTION. Enabling the bgwriter during recovery seems likely to be a huge gain on performance for larger shared_buffers settings, which should offset a considerable increase in CPU usage during recovery. 
Performance test results would be appreciated:
* for normal running - just to test that it really is neutral
* for recovery - if query access is not used, is it faster than before?
Points of significant note for detailed reviewers:
* Prepared transactions not implemented yet. No problems foreseen, but want to wait to see if other refactorings are required.
* Touching every block of a btree index during a replay of VACUUM is not yet implemented. Would like some second opinions on whether it is even required. I have code prototyped for it, but it feels too wacky even after Heikki and I agreed it was required. Do we still think that?
* Locking correctness around flat file refresh still not enabled.
* Need some discussion around how to handle max_connections changes cleanly.
-- Koichi Suzuki
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote: Hi, I found that no one is registered as a hot standby reviewer. I'd like to review the patch, mainly by testing it through some benchmark tests. You can edit the Wiki page yourself, though you need to register first. But it's very straightforward. I added myself as a reviewer for Hot Standby a few days back since I did some code review and testing. I intend to spend some more time after the new patch is posted. Thanks, Pavan Pavan Deolasee. EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Hot Standby (commit fest version - v5)
Thank you for your advice. I'll edit the Wiki page. 2008/12/3 Pavan Deolasee [EMAIL PROTECTED]: On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote: Hi, I found that no one is registered as a hot standby reviewer. I'd like to review the patch, mainly by testing it through some benchmark tests. You can edit the Wiki page yourself, though you need to register first. But it's very straightforward. I added myself as a reviewer for Hot Standby a few days back since I did some code review and testing. I intend to spend some more time after the new patch is posted. Thanks, Pavan Pavan Deolasee. EnterpriseDB http://www.enterprisedb.com -- Koichi Suzuki
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Thu, Nov 20, 2008 at 8:15 PM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Thu, Nov 20, 2008 at 7:50 PM, Simon Riggs [EMAIL PROTECTED] wrote: (I assume you mean bgwriter, not archiver process). Yeah, it's the bgwriter; IIRC it hung while taking a checkpoint. Sorry, it's the startup process that's stuck in the checkpoint. Here is the stack trace:
(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x0095564b in semop () from /lib/libc.so.6
#2  0x0825c703 in PGSemaphoreLock (sema=0xb7c52c7c, interruptOK=0 '\0') at pg_sema.c:420
#3  0x0829ff5e in LWLockAcquire (lockid=WALInsertLock, mode=LW_EXCLUSIVE) at lwlock.c:456
#4  0x080d5c7e in XLogInsert (rmid=0 '\0', info=16 '\020', rdata=0xbfda1798) at xlog.c:746
#5  0x080e2e0f in CreateCheckPoint (flags=6) at xlog.c:6674
#6  0x080e1afd in StartupXLOG () at xlog.c:6077
#7  0x08104f2f in AuxiliaryProcessMain (argc=2, argv=0xbfda19e4) at bootstrap.c:421
#8  0x0826d285 in StartChildProcess (type=StartupProcess) at postmaster.c:4104
#9  0x082690d9 in PostmasterMain (argc=3, argv=0x9c89a60) at postmaster.c:1034
#10 0x081f90ff in main (argc=3, argv=0x9c89a60) at main.c:188
ISTM that the postmaster somehow does not receive PMSIGNAL_RECOVERY_START (maybe because of what I reported in my other mail) and hence the bgwriter is not started. The startup process then itself tries to take a checkpoint and hangs. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Fri, 2008-11-21 at 17:08 +0530, Pavan Deolasee wrote: Sorry, it's the startup process that's stuck in the checkpoint. Here is the stack trace: Already fixed in the new version I'm preparing for you. Both the startup process and the bgwriter can perform restartpoints, so it's not a problem whether or not we reached the point at which we send the pmsignal. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Thu, 2008-11-20 at 11:51 +0530, Pavan Deolasee wrote: I wonder if we should refactor lazy_scan_heap() so that *all* the real work of collecting information about dead tuples happens only in heap_page_prune(). Frankly, there is only a rare chance that a tuple may become DEAD after the pruning happened on the page. We can ignore such tuples; they will be vacuumed/pruned in the next cycle. This would save us a second check of HeapTupleSatisfiesVacuum on the tuples which are just now checked in heap_page_prune(). In addition, the following additional WAL records are then not necessary because heap_page_prune() must have already logged the latestRemovedXid. I like this idea. I've attempted to plug every gap, but perhaps the best way here is to remove the gap completely. In my testing, I only saw this case happen a couple of times in many tests. Rarely executed code gives sporadic bugs, so I would be happy to remove it and the standby support stuff that goes with it. I would suggest that we just remove the switch statement: switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf)) and alter the following if test since tupgone is also removed. That will cause HEAPTUPLE_DEAD rows to be fed to heap_freeze_tuple(). Comments on that function claim that is a bad thing, but we know that any row that was *not* removed by heap_page_prune() and is now dead must have died very recently and so will never be frozen. Let me know if you're happy with that change and I'll make it so. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Thu, Nov 20, 2008 at 3:38 PM, Simon Riggs [EMAIL PROTECTED] wrote: I would suggest that we just remove the switch statement: switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf)) and alter the following if test since tupgone is also removed. That will cause HEAPTUPLE_DEAD rows to be fed to heap_freeze_tuple(). Comments on that function claim that is a bad thing, but we know that any row that was *not* removed by heap_page_prune() and is now dead must have died very recently and so will never be frozen. Let me know if you're happy with that change and I'll make it so. Yeah, I think we should be safe. We hold the exclusive lock on the buffer continuously from the time the prune operation is carried out, so the only new DEAD tuples can appear because some transaction aborted in between, changing an INSERT_IN_PROGRESS tuple to DEAD. But those tuples won't pass the cutoff_xid test and should never be frozen. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
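The safety argument above boils down to the cutoff comparison: cutoff_xid is never newer than OldestXmin, while an insert that aborted after pruning has an xmin at or after OldestXmin, so it can never satisfy the freeze test. A simplified model of that test (the real code uses TransactionIdPrecedes(), which handles XID wraparound; plain `<` is used here for clarity):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Simplified model of heap_freeze_tuple()'s cutoff test: a tuple's xmin
 * is only replaced with FrozenTransactionId when it precedes cutoff_xid.
 * Wraparound-aware comparison is deliberately omitted in this sketch.
 */
static bool
would_freeze(TransactionId xmin, TransactionId cutoff_xid)
{
    return xmin < cutoff_xid;
}
```

Since any tuple turning DEAD after the prune has xmin >= OldestXmin >= cutoff_xid, feeding such tuples to heap_freeze_tuple() is a no-op rather than a hazard.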
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Thu, 2008-11-20 at 12:03 +0530, Pavan Deolasee wrote: On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED] wrote: Hot Standby patch, including all major planned features. While experimenting with the patch, I noticed that sometimes the archiver process indefinitely waits for WALInsertLock. I haven't spent much time debugging that, but the following chunk clearly seems to be buggy. The WALInsertLock is not released if (leavingArchiveRecovery == true). Mmmm, it seems this is correct. I had to reconstruct this section of code after recent bitrot, so it looks I introduced a bug doing that. What I'm surprised about is that I got a similar hang myself in testing and in my notes I have this ticked as resolved. The fix is trivial, though I'm sorry it was there for you to find at all. (I assume you mean bgwriter, not archiver process). -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Thu, Nov 20, 2008 at 7:50 PM, Simon Riggs [EMAIL PROTECTED] wrote: (I assume you mean bgwriter, not archiver process). Yeah, its the bgwriter, IIRC hung while taking checkpoint. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Hot Standby (commit fest version - v5)
Pavan Deolasee [EMAIL PROTECTED] writes: I wonder if we should refactor lazy_scan_heap() so that *all* the real work of collecting information about dead tuples happens only in heap_page_prune(). Frankly, there is only a rare chance that a tuple may become DEAD after the pruning happened on the page. We can ignore such tuples; they will be vacuumed/pruned in the next cycle. This would save us a second check of HeapTupleSatisfiesVacuum on the tuples which are just now checked in heap_page_prune(). In addition, the following additional WAL records are then not necessary because heap_page_prune() must have already logged the latestRemovedXid. I don't think you can do that. Couldn't someone else have run heap_page_prune between vacuum's first and second visit to the page? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Thu, 2008-11-20 at 10:33 -0500, Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: I wonder if we should refactor lazy_scan_heap() so that *all* the real work of collecting information about dead tuples happens only in heap_page_prune(). Frankly, there is only a rare chance that a tuple may become DEAD after the pruning happened on the page. We can ignore such tuples; they will be vacuumed/pruned in the next cycle. This would save us a second check of HeapTupleSatisfiesVacuum on the tuples which are just now checked in heap_page_prune(). In addition, the following additional WAL records are then not necessary because heap_page_prune() must have already logged the latestRemovedXid. I don't think you can do that. Couldn't someone else have run heap_page_prune between vacuum's first and second visit to the page? I just looked at that in more detail and decided it was more difficult than it first appeared. So I've left it for now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Thu, Nov 20, 2008 at 9:03 PM, Tom Lane [EMAIL PROTECTED] wrote: I don't think you can do that. Couldn't someone else have run heap_page_prune between vacuum's first and second visit to the page? You mean the second visit in the first pass, where we again check HeapTupleSatisfiesVacuum? We hold the exclusive lock continuously in the first pass, so it's not possible for someone else to call heap_page_prune. If it's the second visit in the second heap scan, then it removes only the dead tuples recorded in the first pass. So we should be good there too. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED] wrote: Hot Standby patch, including all major planned features. I wonder if we should refactor lazy_scan_heap() so that *all* the real work of collecting information about dead tuples happens only in heap_page_prune(). Frankly, there is only a rare chance that a tuple may become DEAD after the pruning happened on the page. We can ignore such tuples; they will be vacuumed/pruned in the next cycle. This would save us a second check of HeapTupleSatisfiesVacuum on the tuples which are just now checked in heap_page_prune(). In addition, the following additional WAL records are then not necessary because heap_page_prune() must have already logged the latestRemovedXid.
+ /*
+  * For Hot Standby we need to know the highest transaction id that will
+  * be removed by any change. VACUUM proceeds in a number of passes so
+  * we need to consider how each pass operates. The first pass runs
+  * heap_page_prune(), which can issue XLOG_HEAP2_CLEAN records as it
+  * progresses - these will have a latestRemovedXid on each record.
+  * In many cases this removes all of the tuples to be removed.
+  * Then we look at tuples to be removed, but do not actually remove them
+  * until phase three. However, index records for those rows are removed
+  * in phase two and index blocks do not have MVCC information attached.
+  * So before we can allow removal of *any* index tuples we need to issue
+  * a WAL record indicating what the latestRemovedXid will be at the end
+  * of phase three. This then allows Hot Standby queries to block at the
+  * correct place, i.e. before phase two, rather than during phase three
+  * as we issue more XLOG_HEAP2_CLEAN records. If we need to run multiple
+  * phase two/three because of memory constraints we need to issue multiple
+  * log records also.
+  */
+ static void
+ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
+ {
+     /*
+      * No need to log changes for temp tables, they do not contain
+      * data visible on the standby server.
+      */
+     if (rel->rd_istemp)
+         return;
+
+     (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ }
Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED] wrote: Hot Standby patch, including all major planned features. While experimenting with the patch, I noticed that sometimes the archiver process indefinitely waits for WALInsertLock. I haven't spent much time debugging that, but the following chunk clearly seems to be buggy. The WALInsertLock is not released if (leavingArchiveRecovery == true).
--- 6565,6592 ----
          }
      }

!     if (leavingArchiveRecovery)
!         checkPoint.redo = GetRedoLocationForArchiveCheckpoint();
!     else
      {
!         /*
!          * Compute new REDO record ptr = location of next XLOG record.
!          *
!          * NB: this is NOT necessarily where the checkpoint record itself will be,
!          * since other backends may insert more XLOG records while we're off doing
!          * the buffer flush work. Those XLOG records are logically after the
!          * checkpoint, even though physically before it. Got that?
!          */
!         checkPoint.redo = GetRedoLocationForCheckpoint();

!         /*
!          * Now we can release WAL insert lock, allowing other xacts to proceed
!          * while we are flushing disk buffers.
!          */
!         LWLockRelease(WALInsertLock);
      }

      /*
       * If enabled, log checkpoint start. We postpone this until now so as not
       * to log anything if we decided to skip the checkpoint.
       */
Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
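The fix is mechanical: compute the redo pointer in whichever branch applies, then release WALInsertLock unconditionally after the branch. A toy model of the corrected control flow (the lock and the redo-location calls are stubbed out; names mirror the hunk above but this is not the actual fix committed):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the bug: in the quoted hunk, LWLockRelease() sits inside
 * the else branch, so the leavingArchiveRecovery path returns with
 * WALInsertLock still held and every later XLogInsert() blocks forever.
 * The corrected shape releases the lock after the branch, on both paths.
 */
static bool wal_insert_lock_held;

static void LWLockAcquireSim(void) { wal_insert_lock_held = true;  }
static void LWLockReleaseSim(void) { wal_insert_lock_held = false; }

static void
compute_checkpoint_redo(bool leavingArchiveRecovery)
{
    LWLockAcquireSim();

    if (leavingArchiveRecovery)
    {
        /* checkPoint.redo = GetRedoLocationForArchiveCheckpoint(); */
    }
    else
    {
        /* checkPoint.redo = GetRedoLocationForCheckpoint(); */
    }

    /* Release unconditionally, whichever branch was taken. */
    LWLockReleaseSim();
}
```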
Re: [HACKERS] Hot standby v5 patch assertion failure
Simon Riggs wrote: On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote: Trying out a few different scenarios I ran across this: 1/ Setup master and replica with replica using pg_standby 2/ Create a new database (bench in my case) 3/ Initialize pgbench schema size 100 4/ Run with 2 clients and 1 transactions 5/ Replica gets assertion failure I've been unable to reproduce this error in more than 2 days of bashing. The bash test I use is a pgbench variant designed to cause write contention, while at the same time running reads against those same blocks on standby, plus running parallel installcheck. I suspect now there was a problem in ProcArrayClearUnobservedXids(), so I clear the array each time now, whether or not we are in assert mode. i.e. better hygiene around reused data structures. So I *haven't* reworked my earlier code, just checked it all again. So, new patch enclosed. This fixes everything reported so far, plus another 2 bugs I found and fixed during re-test. Patching with v5d, I can no longer reproduce this either. Excellent! Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby v5 patch assertion failure
On Mon, 2008-11-03 at 06:41 +, Simon Riggs wrote: On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote: Trying out a few different scenarios I ran across this: CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59 DEBUG: start recovery xid = 7002 lsn = 0/6F012EE4 CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59 TRAP: FailedAssertion(!(!((UnobservedXids[index]) != ((TransactionId) 0))), File: procarray.c, Line: 2037) OK, thanks Mark. I'll start looking at it now. It's nice to know the exact line something fails on. I'd instrumented the whole of the UnobservedXids code to trap failures. I've had a couple of errors in that already during development. But what to do about it? I'm thinking the best way to handle this is just to simplify this part of the code some, rather than continue tweaking it. The code attempts to optimise the removal of UnobservedXids, but that feels now like a premature optimisation. So I can probably drop ~100 lines of code. I'm now adding the btree logic also, as well as updating the patch to current head. So I'll return with an updated patch as soon as all that's done and I've run a complete re-test. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby v5 patch assertion failure
On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote: Trying out a few different scenarios I ran across this:
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
DEBUG: start recovery xid = 7002 lsn = 0/6F012EE4
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
TRAP: FailedAssertion(!(!((UnobservedXids[index]) != ((TransactionId) 0))), File: procarray.c, Line: 2037)
OK, thanks Mark. I'll start looking at it now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Hot Standby: Caches and Locks
On Thu, 2008-10-30 at 10:13 +0000, Simon Riggs wrote: On Tue, 2008-10-21 at 15:06 +0100, Simon Riggs wrote: We can't augment the commit/abort messages because we must cater for non-transactional invalidations also, plus commit xlrecs are already complex enough. So we log invalidations prior to commit, queue them and then trigger the send at commit (if it happens). Augmenting the commit messages seems like the better approach. It allows invalidation messages to be fired as they are read off the xlrec. Still need the additional message type to handle nontransactional invalidation. There are other messages possibly more complex than this already. Just a quick note to say that this approach has worked fine and I now have both cache invalidation and locking working correctly. Rather than submit something now in an unseemly rush, I'll tidy up and post it to the list tomorrow. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Hot Standby: Caches and Locks
On Tue, 2008-10-21 at 15:06 +0100, Simon Riggs wrote: We can't augment the commit/abort messages because we must cater for non-transactional invalidations also, plus commit xlrecs are already complex enough. So we log invalidations prior to commit, queue them and then trigger the send at commit (if it happens). Augmenting the commit messages seems like the better approach. It allows invalidation messages to be fired as they are read off the xlrec. Still need the additional message type to handle nontransactional invalidation. There are other messages possibly more complex than this already. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Caches and Locks
Simon Riggs [EMAIL PROTECTED] writes: We can't augment the commit/abort messages because we must cater for non-transactional invalidations also, plus commit xlrecs are already complex enough. So we log invalidations prior to commit, queue them and then trigger the send at commit (if it happens). Augmenting the commit messages seems like the better approach. It allows invalidation messages to be fired as they are read off the xlrec. Still need the additional message type to handle nontransactional invalidation. There are other messages possibly more complex than this already. I guess I hadn't been paying attention, but: adding syscache inval traffic to WAL seems like a completely horrid idea, both from the complexity and performance standpoints. What about using the existing syscache logic to re-derive inval information from watching the update operations? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Caches and Locks
On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: We can't augment the commit/abort messages because we must cater for non-transactional invalidations also, plus commit xlrecs are already complex enough. So we log invalidations prior to commit, queue them and then trigger the send at commit (if it happens). Augmenting the commit messages seems like the better approach. It allows invalidation messages to be fired as they are read off the xlrec. Still need the additional message type to handle nontransactional invalidation. There are other messages possibly more complex than this already. I guess I hadn't been paying attention, but: adding syscache inval traffic to WAL seems like a completely horrid idea, both from the complexity and performance standpoints. Well, it's coming out fairly simple actually. Can you explain where you think the performance loss is? My expectation is less than a 0.1% WAL volume overhead for typical systems. My comment this morning was to say I've managed to augment the commit record, so we're not even sending many additional messages. It also makes much of the Hot Standby patch fairly simple, even if it is large. Write something to WAL, act on it on the other side. I've paid very close attention to minimising the effects on both the number of lock acquisitions and total WAL volume, but having said that I expect there to be many tuning opportunities. What about using the existing syscache logic to re-derive inval information from watching the update operations? That does sound possible, but it makes some big assumptions about transactional machinery being in place. It ain't. Subtransactions make everything about 5 times more difficult, so it seems pretty scary to me. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
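For concreteness, a sketch of what "augmenting the commit record" could look like — the struct and field names below are illustrative, not the patch's actual definitions. The transaction's pending shared-invalidation messages are appended after the commit record header, so recovery can fire them while applying the commit instead of needing separate WAL records for the common transactional case:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch only -- NOT the patch's actual structs.  The idea:
 * append pending shared-invalidation messages to the commit xlrec, so a
 * hot standby can replay them as it applies the commit record.
 */
typedef struct
{
    int8_t   id;         /* catcache id, or a relcache/smgr marker */
    uint32_t dbId;       /* database the message applies to */
    uint32_t hashValue;  /* simplified payload */
} SharedInvalMsgSim;

typedef struct
{
    int64_t xact_time;   /* commit timestamp */
    int     nsubxacts;   /* committed subxact XIDs that follow */
    int     nmsgs;       /* invalidation messages after the subxacts */
} XlXactCommitSim;

/* Total record size: header, then subxact XIDs, then inval messages. */
static size_t
commit_record_size(int nsubxacts, int nmsgs)
{
    return sizeof(XlXactCommitSim) +
           (size_t) nsubxacts * sizeof(uint32_t) +
           (size_t) nmsgs * sizeof(SharedInvalMsgSim);
}
```

This shape also makes the claimed overhead plausible: a transaction with no catalog changes has nmsgs = 0 and pays nothing beyond the existing header.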
Re: [HACKERS] Hot Standby: Caches and Locks
Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote: What about using the existing syscache logic to re-derive inval information from watching the update operations? That does sound possible, but it makes some big assumptions about transactional machinery being in place. It ain't. Subtransactions make everything about 5 times more difficult, so it seems pretty scary to me. Um. Yeah, subtransactions would be a PITA. Never mind that then ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
On Monday 20 October 2008 05:25:29 Simon Riggs wrote: I'm looking to implement the following functions for Hot Standby, to allow those with administrative tools or management applications to have more control during recovery. Please let me know if other functions are required. What else do we need? Is it possible to give the master/slave knowledge about each other? -- Robert Treat Conjecture: http://www.xzilla.net Consulting: http://www.omniti.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
On Mon, 2008-10-27 at 11:42 -0400, Robert Treat wrote: On Monday 20 October 2008 05:25:29 Simon Riggs wrote: I'm looking to implement the following functions for Hot Standby, to allow those with administrative tools or management applications to have more control during recovery. Please let me know if other functions are required. What else do we need? Is it possible to give the master/slave knowledge about each other? Yes, but for what reason? The project I'm working on is Hot Standby, not streaming replication. That will link things together better than they are now, so I'd probably rather not prejudge/duplicate that. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
Simon Riggs wrote: On Mon, 2008-10-27 at 11:42 -0400, Robert Treat wrote: On Monday 20 October 2008 05:25:29 Simon Riggs wrote: I'm looking to implement the following functions for Hot Standby, to allow those with administrative tools or management applications to have more control during recovery. Please let me know if other functions are required. What else do we need? Is it possible to give the master/slave knowledge about each other? Yes, but for what reason? The project I'm working on is Hot Standby, not streaming replication. That will link things together better than they are now, so I'd probably rather not prejudge/duplicate that. I think this could make sense in the case of a network partition (split brain). If we need to perform a reconciliation after a partition we will need to know when a slave has switched to master and what transactions have been processed on each side. Another use case is probably to implement failback once the former master has been restarted to just send the diff since failover happened. Actually to complement pg_last_recovered_xact_xid(), pg_last_completed_xact_xid() or pg_last_recovered_xact_timestamp(), I would like to have something like pg_xact_xid_status(txid) that would return something of a xid_status type that can be completed, recovered or not_found with a timestamp that would only be meaningful if the status is recovered or completed. Regards, Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development Consulting -- Web: http://www.frogthinker.org email: [EMAIL PROTECTED] Skype: emmanuel_cecchet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
On Mon, 2008-10-27 at 13:08 -0400, Robert Treat wrote: Was thinking that admin tools that show hot standby information might also want to show the corresponding slave information (from the point of view of the master). Well, the standby might be persuaded to know something about the master, but not the other way around. The master:standby relationship is 1:Many, and not restricted in the way things are daisy-chained. So it's more than just a pair of servers. All the slaves will say they are the same if you ask them, so you can't rely on that to identify them. So you need to specify which two servers you're interested in comparing and how to identify them externally. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
On Monday 27 October 2008 12:12:18 Simon Riggs wrote: On Mon, 2008-10-27 at 11:42 -0400, Robert Treat wrote: On Monday 20 October 2008 05:25:29 Simon Riggs wrote: I'm looking to implement the following functions for Hot Standby, to allow those with administrative tools or management applications to have more control during recovery. Please let me know if other functions are required. What else do we need? Is it possible to give the master/slave knowledge about each other? Yes, but for what reason? Was thinking that admin tools that show hot standby information might also want to show the corresponding slave information (from the point of view of the master). It might also allow tools to not have to be configured for all servers... ie connect to one and lookup the other. The project I'm working on is Hot Standby, not streaming replication. That will link things together better than they are now, so I'd probably rather not prejudge/duplicate that. It's possible this type of information isn't appropriate for our Hot Standby implementation, but it is somewhat common in asynchronous and/or master/slave systems. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
On Mon, 2008-10-20 at 10:25 +0100, Simon Riggs wrote: What else do we need? * pg_freeze_recovery() * pg_unfreeze_recovery() Two more functions pg_freeze_recovery_cleanup() pg_unfreeze_recovery_cleanup() These would allow recovery to continue normally, except for row removal operations which would halt the progress of recovery. It would eventually be possible to have a function that halts recovery whenever row removal takes place for a list of tables. Not planning on implementing that initially though. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
* pg_last_recovered_xact_xid() Will throw an ERROR if *not* executed in recovery mode. returns bigint * pg_last_completed_xact_xid() Will throw an ERROR *if* executed in recovery mode. returns bigint Should these return xid? And shouldn't these two be folded together? It seems most usages of this xid (or lsn?) will be agnostic to the recovery mode. Or if not, it seems more convenient to have a function that returns both the recovery mode and the xid, no? Andreas
Re: [HACKERS] Hot Standby utility and administrator functions
On Tue, 2008-10-21 at 09:44 +0200, Zeugswetter Andreas OSB sIT wrote: * pg_last_recovered_xact_xid() Will throw an ERROR if *not* executed in recovery mode. returns bigint * pg_last_completed_xact_xid() Will throw an ERROR *if* executed in recovery mode. returns bigint Should these return xid? And shouldn't these two be folded together? It seems most usages of this xid(/lsn?) will be agnostic to the recovery mode. Or if not, it seems more convenient to have a function that returns both recovery mode and xid, no? You are right that it would be better to have a single function. Functions that return multiple values are a pain to use and develop, plus we can always run the other function if we are in doubt. txid_last_completed() returns bigint (txid) seems better. I am more than happy to add an xid version as well, if anybody sees the need for that. Just say. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
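From a client's point of view, the single folded function would read like the sketch below. txid_last_completed() is only the name proposed in this message (it was never committed under that name), so the whole example is illustrative:

```sql
-- Works on either node; the meaning depends on the mode:
--   on the primary: id of the last completed transaction,
--   on the standby: id of the last recovered transaction.
-- Returns bigint (txid), matching txid_current().
SELECT txid_last_completed();
```

A monitoring tool could call this on both nodes and take the difference as a rough measure of apply lag in transactions.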
Re: [HACKERS] Hot Standby utility and administrator functions
* pg_last_recovered_xact_xid() Will throw an ERROR if *not* executed in recovery mode. returns bigint * pg_last_completed_xact_xid() Will throw an ERROR *if* executed in recovery mode. returns bigint Should these return xid? ...Robert
Re: [HACKERS] Hot Standby utility and administrator functions
On Mon, 2008-10-20 at 16:22 -0400, Robert Haas wrote: * pg_last_recovered_xact_xid() Will throw an ERROR if *not* executed in recovery mode. returns bigint * pg_last_completed_xact_xid() Will throw an ERROR *if* executed in recovery mode. returns bigint Should these return xid? Perhaps, but they match txid_current() which returns bigint. http://developer.postgresql.org/pgdocs/postgres/functions-info.html Thanks for checking. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Hot Standby: First integrated patch
On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote: On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote: First integrated patch for Hot Standby, allowing queries to be executed while in recovery mode. The patch tests successfully with the enclosed files: * primary_setup_test.sql - run it on primary node * standby_allowed.sql - run on standby - should all succeed * standby_disallowed.sql - run on standby - should all fail plus other manual testing. This is still WIP - it's good enough to release for comments, though I am not yet confident enough to claim it bug free. What this doesn't do YET: * cope fully with subxid cache overflows (some parts still to add) * cope with prepared transactions on master * work correctly when running queries AND replaying WAL * work correctly with regard to AccessExclusiveLocks, which should prevent access to tables These last four points are what I'm working on over the next two weeks, plus any other holes people point out along the way. I have worked out designs for most of these aspects and will discuss them on -hackers, though most design notes are in the Wiki. I'm still looking into prepared transactions. Comments appreciated. It appears to be working, at least in some fashion. The supplied tests all pass. Cool Thanks for testing so far. At first glance it seems like I have to force changes to the standby with pg_switch_xlog(). hmm. You'll have to explain some more. Normally files don't get sent until they are full, so yes, you would need to do a pg_switch_xlog(). This is not streaming replication. Others are working on that. This probably isn't right: postgres=# \d No relations found. postgres=# select count(*) from foo; count - 100 (1 row) I created a table, pg_switch_xlog, queried several times, I dropped a table, pg_switch_xlog, table is 'gone', but still returns data; exit/enter session, now it's gone. Sometimes I have to exit/enter session to get an up-to-date standby. These are just first impressions... 
Replaying and queries don't mix yet, so that is expected. I'm working on this in phases. This patch is phase 1 - it is not the finished patch. Phase 2: working on correct block locking to allow concurrent DML changes to occur while we run queries. Phase 3: working on correct relation locking/relcache to allow concurrent DDL changes to occur while we run queries. I have designs of the above and expect to complete in the next two weeks. The reason for the above behaviour is that DDL changes need to fire relcache invalidation messages so that the query backend sees the change. The reason the table is still there is because the files haven't been dropped yet. So everything you have seen is expected, by me. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
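For a file-shipping standby like this, "forcing changes across" is just closing out the current WAL segment on the primary. pg_switch_xlog() is the real function of that era; the surrounding commentary is illustrative:

```sql
-- On the primary: force the current (partially filled) WAL
-- segment to be completed and handed to archive_command, so
-- the standby can replay recent changes immediately instead
-- of waiting for the 16 MB segment to fill up on its own.
SELECT pg_switch_xlog();
```

Until the replay-while-querying phases described above land, a standby session may still need to reconnect before replayed changes become visible, which matches Merlin's observations.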
Re: [HACKERS] Hot Standby: First integrated patch
On Sat, Oct 18, 2008 at 4:11 AM, Simon Riggs [EMAIL PROTECTED] wrote: On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote: snip At first glance it seems like I have to force changes to the standby with pg_switch_xlog(). hmm. You'll have to explain some more. Normally files don't get sent until they are full, so yes, you would need to do a pg_switch_xlog(). This is not streaming replication. Others are working on that. right...this was expected. less the missing parts, things are working very well. merlin
Re: [HACKERS] Hot Standby: First integrated patch
On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote: First integrated patch for Hot Standby, allowing queries to be executed while in recovery mode. The patch tests successfully with the enclosed files: * primary_setup_test.sql - run it on primary node * standby_allowed.sql - run on standby - should all succeed * standby_disallowed.sql - run on standby - should all fail plus other manual testing. This is still WIP - it's good enough to release for comments, though I am not yet confident enough to claim it bug free. What this doesn't do YET: * cope fully with subxid cache overflows (some parts still to add) * cope with prepared transactions on master * work correctly when running queries AND replaying WAL * work correctly with regard to AccessExclusiveLocks, which should prevent access to tables These last four points are what I'm working on over the next two weeks, plus any other holes people point out along the way. I have worked out designs for most of these aspects and will discuss them on -hackers, though most design notes are in the Wiki. I'm still looking into prepared transactions. Comments appreciated. It appears to be working, at least in some fashion. The supplied tests all pass. At first glance it seems like I have to force changes to the standby with pg_switch_xlog(). hmm. This probably isn't right: postgres=# \d No relations found. postgres=# select count(*) from foo; count - 100 (1 row) I created a table, pg_switch_xlog, queried several times, I dropped a table, pg_switch_xlog, table is 'gone', but still returns data; exit/enter session, now it's gone. Sometimes I have to exit/enter session to get an up-to-date standby. These are just first impressions... merlin
Re: [HACKERS] Hot Standby Design
On Wed, 2008-09-24 at 00:30 -0400, Robert Treat wrote: here are some scattered thoughts I had while reading it: Thanks for your comments. It is very detailed, so further feedback is going to be very beneficial in making this all work in the way we hope and expect. * However, some WAL redo actions will be for DDL actions. These DDL actions are repeating actions that have already committed on the primary node, so they must not fail on the standby node. These DDL locks take priority and will automatically cancel any read-only transactions that get in their way. Some people will want the option to stack-up ddl transactions behind long-running queries (one of the main use cases of a hot slave is reporting style and other long running queries) Scheduling tools can help here. Run set of queries on Standby, then when complete run DDL on Primary and have its changes filter through. * Actions not allowed on Standby are: DML - Insert, Update, Delete, COPY FROM, Truncate copy from surprised me a bit, since it is something I could see people wanting to do... did you mean COPY TO in this case? I checked... COPY TO is allowed, since it extracts data. So pg_dump will run. COPY FROM will not be allowed since it loads data. So pg_restore will not run. * Statspack functions should work OK, so tools such as pgAdminIII should work. pgAgent will not. I presume this means the backend kill function would work? Yes it will. Also, can you go into why pgAgent would not work? (I presume it's because you can't update information that needs to be changed when jobs run; if that's the case it might be worth thinking about making pgAgent work across different clusters) Yes * Looking for suggestions about what monitoring capability will be required. one place to start might be to think about which checks in check_nagios might still apply. Possibly also looking to systems like slony for some guidance... for example check_pgsql will work, but it's very simple. 
check_postgres will also work, though some actions could give different or confusing results: e.g. last vacuum time will not be maintained, since no vacuum occurs on the standby. slony won't run on the standby either, so those checks won't work either. everyone will want some way to check how far the lag is on a standby machine. Agreed * The following commands will not be accepted during recovery mode GRANT, REVOKE, REASSIGN How is user management done on a standby? can you have users that don't exist on the primary (it would seem not). No you can't have different users. (In time, I see this as a good thing because it will allow us to move queries around to different nodes for execution so that a single database acts like a hive mind.) Nearly everything must be set via the Primary. Users, passwords. Changes to .conf files will be possible. There is no mechanism to auto synchronise the .conf files between nodes. The Standby is a Clone and not a Slave. A Slave is a separate database that is forced to duplicate the actions of the Master. The Standby is an exact copy, in every detail that matters. I can see it might be desirable to have it work that way, but that's not going to happen in the first release. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
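The COPY asymmetry is easy to demonstrate from a standby session. COPY TO/COPY FROM are the real commands; the error shown for the FROM case is an assumption about wording, since the patch was still WIP:

```sql
-- Extracting data only reads, so it is allowed on the standby
-- (this is why pg_dump works against it):
COPY important_table TO '/tmp/important_table.csv' WITH CSV;

-- Loading data is DML and is rejected in recovery mode
-- (this is why pg_restore will not run):
COPY important_table FROM '/tmp/important_table.csv' WITH CSV;
-- ERROR (wording illustrative): cannot COPY FROM during recovery
```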
Re: [HACKERS] Hot Standby Design
On Wed, Sep 24, 2008 at 5:30 AM, Robert Treat [EMAIL PROTECTED] wrote: I presume this means the backend kill function would work? Also, can you go into why pgAgent would not work? (I presume it's because you can't update information that needs to be changed when jobs run; if that's the case it might be worth thinking about making pgAgent work across different clusters) Probably - it needs to be able to update its catalogs to record job results and ensure that only a single node runs any given job instance. We could probably update the code to optionally accept a connection string instead of a database name, which would allow us to operate against servers other than the one with the catalogs installed. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Re: [HACKERS] Hot Standby Design
On Wed, 2008-09-24 at 08:28 +0100, Dave Page wrote: On Wed, Sep 24, 2008 at 5:30 AM, Robert Treat [EMAIL PROTECTED] wrote: I presume this means the backend kill function would work? Also, can you go into why pgAgent would not work? (I presume it's because you can't update information that needs to be changed when jobs run; if that's the case it might be worth thinking about making pgAgent work across different clusters) Probably - it needs to be able to update its catalogs to record job results and ensure that only a single node runs any given job instance. We could probably update the code to optionally accept a connection string instead of a database name, which would allow us to operate against servers other than the one with the catalogs installed. It would be useful to be able to schedule jobs against multiple nodes. Not much need for maintenance actions on a Standby node, but I think if I say there aren't any needed at all that would likely be wrong. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Hot Standby Design
On Wednesday 24 September 2008 03:27:44 Simon Riggs wrote: On Wed, 2008-09-24 at 00:30 -0400, Robert Treat wrote: here are some scattered thoughts I had while reading it: Thanks for your comments. It is very detailed, so further feedback is going to be very beneficial in making this all work in the way we hope and expect. * However, some WAL redo actions will be for DDL actions. These DDL actions are repeating actions that have already committed on the primary node, so they must not fail on the standby node. These DDL locks take priority and will automatically cancel any read-only transactions that get in their way. Some people will want the option to stack-up ddl transactions behind long-running queries (one of the main use cases of a hot slave is reporting style and other long running queries) Scheduling tools can help here. Run set of queries on Standby, then when complete run DDL on Primary and have its changes filter through. true... I am just reminded of Oracle 8's log replay, where you had to stop replay to run any queries... given that was a useful feature, I suspect we'll hear complaints about it not going that way. I think one could argue that we might be able to provide such an option in the future, if not in the first release. * Actions not allowed on Standby are: DML - Insert, Update, Delete, COPY FROM, Truncate copy from surprised me a bit, since it is something I could see people wanting to do... did you mean COPY TO in this case? I checked... COPY TO is allowed, since it extracts data. So pg_dump will run. COPY FROM will not be allowed since it loads data. So pg_restore will not run. ah, of course... the late hour had me thinking backwards. pg_dump on the clone will be a big plus. snip * Looking for suggestions about what monitoring capability will be required. one place to start might be to think about which checks in check_nagios might still apply. Possibly also looking to systems like slony for some guidance... 
for example check_pgsql will work, but it's very simple. check_postgres will also work, though some actions could give different or confusing results: e.g. last vacuum time will not be maintained, since no vacuum occurs on the standby. yep. slony won't run on the standby either, so those checks won't work either. everyone will want some way to check how far the lag is on a standby machine. Agreed right... my thought with slony was, what do people monitor on their slony slaves? (there is actually a nagios script for that iirc) snip The Standby is a Clone and not a Slave. A Slave is a separate database that is forced to duplicate the actions of the Master. The Standby is an exact copy, in every detail that matters. This is an interesting clarification. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Hot Standby Design
On Wed, 2008-09-24 at 12:35 -0400, Robert Treat wrote: On Wednesday 24 September 2008 03:27:44 Simon Riggs wrote: On Wed, 2008-09-24 at 00:30 -0400, Robert Treat wrote: * However, some WAL redo actions will be for DDL actions. These DDL actions are repeating actions that have already committed on the primary node, so they must not fail on the standby node. These DDL locks take priority and will automatically cancel any read-only transactions that get in their way. Some people will want the option to stack-up ddl transactions behind long-running queries (one of the main use cases of a hot slave is reporting style and other long running queries) Scheduling tools can help here. Run set of queries on Standby, then when complete run DDL on Primary and have its changes filter through. true... I am just reminded of Oracle 8's log replay, where you had to stop replay to run any queries... given that was a useful feature, I suspect we'll hear complaints about it not going that way. I think one could argue that we might be able to provide such an option in the future, if not in the first release. It probably sounds worse than it is. If you drop a table on the Primary, then somebody running a query against it on the Standby is in for a shock. But on the other hand, why are you dropping a table that people still consider worth reading? DROP TABLE should be carefully researched before use, so I don't think it's a big problem. We could make it wait for a while before cancelling, as an option, if you think it's important? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Hot Standby Design
On Tuesday 23 September 2008 14:15:34 Simon Riggs wrote: Hot Standby design has been growing on the PostgreSQL Wiki for some weeks now. I've updated the design to reflect all feedback received so far on various topics. http://wiki.postgresql.org/wiki/Hot_Standby It's not hugely detailed in every area, but it gives a flavour of the topics and issues related to them. Comments or questions welcome here, or I will discuss specific areas in more detail as I tackle those topics. very nice work. very in depth. unfortunately, this means it is long and the hour is late... so here are some scattered thoughts I had while reading it: * However, some WAL redo actions will be for DDL actions. These DDL actions are repeating actions that have already committed on the primary node, so they must not fail on the standby node. These DDL locks take priority and will automatically cancel any read-only transactions that get in their way. Some people will want the option to stack-up ddl transactions behind long-running queries (one of the main use cases of a hot slave is reporting style and other long running queries) * Actions not allowed on Standby are: DML - Insert, Update, Delete, COPY FROM, Truncate copy from surprised me a bit, since it is something I could see people wanting to do... did you mean COPY TO in this case? * Statspack functions should work OK, so tools such as pgAdminIII should work. pgAgent will not. I presume this means the backend kill function would work? Also, can you go into why pgAgent would not work? (I presume it's because you can't update information that needs to be changed when jobs run; if that's the case it might be worth thinking about making pgAgent work across different clusters) * Looking for suggestions about what monitoring capability will be required. one place to start might be to think about which checks in check_nagios might still apply. Possibly also looking to systems like slony for some guidance... 
for example everyone will want some way to check how far the lag is on a standby machine. * The following commands will not be accepted during recovery mode GRANT, REVOKE, REASSIGN How is user management done on a standby? can you have users that don't exist on the primary (it would seem not). ... more to come I'm sure, but fading out... thanks again for the work so far Simon. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] hot standby system
On Fri, 2006-07-21 at 20:00, Chris Campbell wrote: The documentation [1] says this about On-line backup and point-in-time recovery: If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a hot standby system: at any point we can bring up the second machine and it will have a nearly-current copy of the database. Is this possible today in a stable and robust way? If so, can we document the procedure? If not, should we alter the documentation so it's not misleading? I've had several people ask me where to enable the hot standby feature, not realizing that PostgreSQL only has some of the raw materials that could be used to architect such a thing. We have written a management script that manages the setup and failover procedures. It is a part of the SkyTools package we announced at the PostgreSQL conference in Toronto. We failed to put it up on PgFoundry during the code sprint (it took a whole day to enable the project there :( ) and have not had enough time to do it after coming back. But it will be on PgFoundry eventually. Marko has the details if you need it faster -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
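The "raw materials" the documentation alludes to are WAL archiving on the primary plus a restore loop on the standby. A minimal sketch of the 8.1-era setup follows; the archive path and the waiting wrapper script are placeholders I've named for illustration, not SkyTools code:

```
# postgresql.conf on the primary: ship each completed WAL segment
archive_command = 'cp %p /shared/walarchive/%f'

# recovery.conf on the standby: fetch segments as they appear.
# For a *standing* standby, the command must wait for the next
# segment to arrive rather than fail when it is missing; that
# waiting wrapper (here the placeholder wait_for_wal.sh) is
# exactly the kind of thing SkyTools' management script automates.
restore_command = '/usr/local/bin/wait_for_wal.sh /shared/walarchive/%f %p'
```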
Re: [HACKERS] hot standby system
Is this possible today in a stable and robust way? If so, can we document the procedure? If not, should we alter the documentation so it's not misleading? I've had several people ask me where to enable the hot standby feature, not realizing that PostgreSQL only has some of the raw materials that could be used to architect such a thing. Well it works fine depending on how you set it up :) Please feel free to submit a patch to the docs. Sincerely, Joshua D. Drake Thanks! - Chris [1] http://www.postgresql.org/docs/8.1/interactive/backup-online.html -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/