Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Thu, Jan 31, 2013 at 9:48 PM, Josh Berkus j...@agliodbs.com wrote:
> On 02/01/2013 12:01 PM, Josh Berkus wrote:
>>> If we're going to start installing safeguards against doing stupid things, there's a long list of scenarios that happen far more regularly than this ever will and cause far more damage.
>> What's wrong with making it easier for sysadmins to troubleshoot things? Again, I'm not talking about erroring out, I'm talking about logging a warning.
> Or to put it another way: Robert, you just did a "nobody wants that" to me. I thought you were opposed to such things on this list.

I respectfully disagree. I'm saying that *I* don't want that, which I think is different. To interpret my opposition to saying "nobody wants that" as meaning that you can never oppose anything someone else thinks is a good idea would preclude meaningful dialogue on most of what we talk about here. And clearly there is at least some demand for this feature, because you and Craig Ringer both want it.

So let me try to restate my objection to this specific feature more clearly. I think that we should be careful about warning the user about things that might not actually be mistakes. I'm not aware that we currently issue ANY warnings of that type. When we emit error messages, we sometimes suggest one possible cause of the error, and such messages are clearly labelled as HINT. But we don't, for example, emit a WARNING or ERROR about a DELETE or UPDATE statement that lacks a WHERE clause, even though many people might like to have such a feature. We don't warn a user "hey, float8 is imprecise, consider using numeric" or "hey, numeric is slow, consider using float8" or "setting autovacuum_naptime to an hour is probably dumber than pouring sugar in your gas tank", even though all of those things are true and some people might like to be warned. We only warn or error out when something happens that we are 100% sure is bad.
And, in this particular case, it has been suggested that there are legitimate reasons why a replication topology might temporarily involve loops, so I believe this fails that criterion.

Second, we have often discussed the importance of avoiding log spam. Warnings that are likely to be repeated a large number of times when they occur have repeatedly been voted down on those grounds. I believe that objection also applies to this case. It is more appropriate to make information about the status of the system available via some status-inquiry function; for example, if you were to recast this as adding a slave-side function that attempts to return the IP of the current master, or NULL if no master, that would answer this objection (but not necessarily all of the other ones).

Third, we usually apply a criterion that warnings or errors must represent conditions that we can reliably detect; in other words, we typically do not add checks for situations that we will only sometimes be able to identify. And, in this case, it's a little unclear how we would actually identify loops. Presumably, we'd do it by sending a chain of unique per-node identifiers along with the WAL, and looking for your own identifier in the path, but we don't have any sort of unique per-node identifier right now, and how would you create one? If someone shuts down the cluster, duplicates it, and starts up both copies, we want that to work. Any identifier embedded in the cluster by such a process would be duplicated. You could use something like the node IP and port number, which wouldn't have that pitfall, but as we all know, IPs can be duplicated (e.g. due to NAT) so this isn't necessarily reliable either. If you do come up with a suitable unique per-node identifier, then this is fairly simple to make work for streaming replication, but it's tricky to see how to make it work with archiving.

Is that more clear?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
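The path-based check described in the message above (send a chain of per-node identifiers along with the WAL and look for your own identifier in the path) can be illustrated with a small simulation. This is only a sketch under stated assumptions: PostgreSQL has no such protocol field, the node names are made up, and the functions `forward` and `detect_loop` are hypothetical. It also assumes every node has a unique identifier, which is exactly what the message says is not available today.

```python
from typing import Dict, List, Optional


def forward(path: List[str], receiver: str) -> Optional[List[str]]:
    """Receiver-side check (hypothetical): if the receiver's own identifier
    is already in the path the WAL has taken, return the loop; else None."""
    if receiver in path:
        return path[path.index(receiver):] + [receiver]
    return None


def detect_loop(downstream_of: Dict[str, List[str]], origin: str) -> Optional[List[str]]:
    """Simulate WAL flowing from `origin` to its downstream nodes, extending
    the path at every hop. Assumes node identifiers are unique."""
    stack = [(origin, [origin])]
    while stack:
        node, path = stack.pop()
        for nxt in downstream_of.get(node, []):
            loop = forward(path, nxt)
            if loop is not None:
                return loop
            stack.append((nxt, path + [nxt]))
    return None


# A three-node ring: a feeds b, b feeds c, c feeds a.
ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
# An ordinary cascading tree: m feeds s1 and s2, s1 feeds s3.
tree = {"m": ["s1", "s2"], "s1": ["s3"]}

print(detect_loop(ring, "a"))
print(detect_loop(tree, "m"))
```

The sketch covers only the streaming case. It says nothing about WAL that passes through an archive, and two cloned clusters sharing one identifier would produce a false positive.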
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 01/02/13 20:43, Peter Geoghegan wrote:
> On Sunday, 27 January 2013, Robert Haas robertmh...@gmail.com wrote:
>> If we're going to start installing safeguards against doing stupid things, there's a long list of scenarios that happen far more regularly than this ever will and cause far more damage.
> +1

+1 ...and there are other areas that we could spend our energy on that would be more worthwhile, I think. One I'd like to see is the opposite of:

$ pg_ctl promote

i.e.:

$ pg_ctl demote

So a retired master would read a (perhaps newly supplied) recovery.conf and start to apply changes from there (with suitable safeguards). We have made failover pretty painless now... but reconstructing the original primary as a new standby is still too fiddly and too resource- and time-consuming.

Regards

Mark
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 01/28/2013 02:13 AM, Robert Haas wrote:
> If we're going to start installing safeguards against doing stupid things, there's a long list of scenarios that happen far more regularly than this ever will and cause far more damage.

I'm not sure this approach is consistent with other decisions made in the past, and with the project's general goals of stability and quality. In particular, by the above rationale, why would the project have removed the implicit casts to/from 'text' for 8.3? It's a minor safeguard against users doing stupid things. Many other such safeguards exist - and despite my frustration with some details of the implicit casts, overall I think that such safeguards are a good and desirable thing.

Safeguards to stop users doing stupid things are part of good usability, so long as they do not interfere excessively with performance, functionality, maintainability, etc. Even where data loss/corruption is not a risk, if it's feasible to detect a misconfiguration, bad command, etc, IMHO it generally makes sense to do so. Otherwise we land up in MySQL-land, littered with foot-guns, quirks, features that are easy to misconfigure and hard to tell if they're misconfigured (PITR and streaming replication are already in that category IMO), and generally get a reputation as being hard to use, hard to troubleshoot, and painful.

I don't mean to be contrary; I realise that I've raised a differing view a fair bit recently, but I'm only doing so when I think there's something to contribute by doing so. Posting a "me too" when I agree is rather less helpful, so it's the differing views that tend to actually get posted, despite being in the minority.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
If we're going to start installing safeguards against doing stupid things, there's a long list of scenarios that happen far more regularly than this ever will and cause far more damage. What's wrong with making it easier for sysadmins to troubleshoot things? Again, I'm not talking about erroring out, I'm talking about logging a warning. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 02/01/2013 12:01 PM, Josh Berkus wrote: If we're going to start installing safeguards against doing stupid things, there's a long list of scenarios that happen far more regularly than this ever will and cause far more damage. What's wrong with making it easier for sysadmins to troubleshoot things? Again, I'm not talking about erroring out, I'm talking about logging a warning. Or to put it another way: Robert, you just did a "nobody wants that" to me. I thought you were opposed to such things on this list. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Sunday, 27 January 2013, Robert Haas robertmh...@gmail.com wrote: If we're going to start installing safeguards against doing stupid things, there's a long list of scenarios that happen far more regularly than this ever will and cause far more damage. +1 -- Regards, Peter Geoghegan
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
All, So while testing some replication stuff on 9.2.2 I discovered that it's completely possible to connect a replica to itself. Seems like we ought to at least be able to detect and log *that*. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 27 January 2013 11:30, Josh Berkus j...@agliodbs.com wrote: So while testing some replication stuff on 9.2.2 I discovered that it's completely possible to connect a replica to itself. Seems like we ought to at least be able to detect and log *that*. How do we do that? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Sun, Jan 27, 2013 at 6:30 AM, Josh Berkus j...@agliodbs.com wrote: So while testing some replication stuff on 9.2.2 I discovered that it's completely possible to connect a replica to itself. Seems like we ought to at least be able to detect and log *that*. We could certainly alter the protocol so that it can detect that situation, but like Simon, I dowanna. I rarely get the chance to agree wholeheartedly with Simon, so let me just take a moment to revel in it here: you have discovered a non-problem problem. Sure, if you do that, nothing useful will happen. But there are lots of non-useful things in the world you can do, and it is neither practical nor sensible to try to prevent them all. And again, yes, you could do that by accident when you meant to do something more sane, but again, there are any number of other ways to accidentally do something truly worthless. If we're going to start installing safeguards against doing stupid things, there's a long list of scenarios that happen far more regularly than this ever will and cause far more damage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 9 January 2013 01:51, Josh Berkus j...@agliodbs.com wrote:
> Anyway, I'm not saying we solve this now. I'm saying, put it on the TODO list in case someone has time/an itch to scratch.

I think it's reasonable to ask whether a usability feature needs to exist whenever a problem is encountered. That shouldn't need to translate to a new feature/TODO every time we ask the question though. IMHO, in this case, we should document this as an issue that can happen and we should caution that careful testing is required.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote:
> On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say.
> Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure?

Daniel already did object.

I think we have other problems that need solving much more than this.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 08, 2013 at 10:46:12AM -0800, Josh Berkus wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? I'm pretty sure the logical change stuff Andres et al. are working on will be able to include the originating node, which makes cycle detection dead simple. Other restrictions on the graph like, must be a tree might be more complicated. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 8 January 2013 19:53, David Fetter da...@fetter.org wrote:
> On Tue, Jan 08, 2013 at 10:46:12AM -0800, Josh Berkus wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure?
> I'm pretty sure the logical change stuff Andres et al. are working on will be able to include the originating node, which makes cycle detection dead simple.

That's a different thing really, but I see what you mean. The problem here is how you tell whether an indirect connection is connected to the master. It's not just a hard problem, it's a transient problem, where any one person's view of the answer might be in the midst of changing as you measure it. So throwing an error message might make certain cluster configs inoperable. I'd prefer to be able to bring up a complex cluster in any order, rather than in waves of startups all needing synchronisation to avoid error.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 8, 2013 at 11:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. Impossible to justify, I'd say. Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure? Daniel already did object To briefly reiterate my objection, I observed that one may want to enter a case of cyclicality on a temporary basis -- to assist with some intermediate states in remastering, and it'd be nice if Postgres didn't try to get in the way of that. I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration error, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, as they require the server to be running and available for hot standby to run. It seems like reading recovery.conf or its successor is probably the best option here. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
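An external tool of the kind suggested above, one that reads each node's recovery.conf instead of calling a function on a running server, might look roughly like the following. This is a sketch under stated assumptions. `primary_conninfo` is the real recovery.conf setting, but the helper names, the sample addresses, and the choice of "host:port" as a node's identity are assumptions for illustration. The parser is deliberately simple and does not handle quoted or escaped values inside the connection string.

```python
import re
from typing import Dict, List, Optional

# Matches a line such as: primary_conninfo = 'host=10.0.0.2 port=5432 user=rep'
CONNINFO_RE = re.compile(r"^\s*primary_conninfo\s*=\s*'(.*)'\s*(?:#.*)?$", re.MULTILINE)


def upstream_from_recovery_conf(text: str) -> Optional[str]:
    """Return 'host:port' of the configured upstream, or None if the file
    has no primary_conninfo (for example, an archive-only standby)."""
    match = CONNINFO_RE.search(text)
    if match is None:
        return None
    params = dict(kv.split("=", 1) for kv in match.group(1).split() if "=" in kv)
    host = params.get("host")
    if host is None:
        return None
    return f"{host}:{params.get('port', '5432')}"


def find_cycle(upstream: Dict[str, Optional[str]]) -> Optional[List[str]]:
    """Follow upstream pointers from every node; return the first loop found."""
    for start in upstream:
        seen = [start]
        node = upstream.get(start)
        while node is not None and node not in seen:
            seen.append(node)
            node = upstream.get(node)
        if node is not None:
            return seen[seen.index(node):] + [node]
    return None


# Hypothetical file contents collected from two servers that point at each other.
confs = {
    "10.0.0.1:5432": "primary_conninfo = 'host=10.0.0.2 port=5432 user=rep'",
    "10.0.0.2:5432": "standby_mode = 'on'\nprimary_conninfo = 'host=10.0.0.1'",
}
topology = {name: upstream_from_recovery_conf(text) for name, text in confs.items()}
print(find_cycle(topology))
```

Because it works from files, this approach does not need hot standby to be enabled. It only sees the configured topology, though, so a host name or virtual IP that resolves differently from what the file suggests would mislead it.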
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
Daniel, To briefly reiterate my objection, I observed that one may want to enter a case of cyclicality on a temporary basis -- to assist with some intermediate states in remastering, and it'd be nice if Postgres didn't try to get in the way of that. I don't think it *should* fail. I think it should write a WARNING to the logs, to make it easy to debug if the cycle was created accidentally. I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration error, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, as they require the server to be running and available for hot standby to run. It seems like reading recovery.conf or its successor is probably the best option here. Well, pg_conninfo will still be in postgresql.conf. But that doesn't help you if you're playing fast and loose with virtual IP addresses ... and arguably, people using Virtual IPs are more likely to accidentally create a cycle. Anyway, I'm not saying we solve this now. I'm saying, put it on the TODO list in case someone has time/an itch to scratch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 8, 2013 at 5:51 PM, Josh Berkus j...@agliodbs.com wrote: Daniel, To briefly reiterate my objection, I observed that one may want to enter a case of cyclicality on a temporary basis -- to assist with some intermediate states in remastering, and it'd be nice if Postgres didn't try to get in the way of that. I don't think it *should* fail. I think it should write a WARNING to the logs, to make it easy to debug if the cycle was created accidentally. Well, in the conversation so long ago that was more openly considered, which may not be true in the present era...just covering my old tracks exactly. I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration error, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, as they require the server to be running and available for hot standby to run. It seems like reading recovery.conf or its successor is probably the best option here. Well, pg_conninfo will still be in postgresql.conf. But that doesn't help you if you're playing fast and loose with virtual IP addresses ... and arguably, people using Virtual IPs are more likely to accidentally create a cycle. That's a good point. Even simpler than virtual-IP is DNS, wherein the resolution can be rebound, but a pre-existing connection to an old IP will happily remain, and will be hard to know that from postgresql.conf and friends. I guess that means the hard case is when hot standby is not (yet) on, but the server is actively recovering WAL...UDFs are out, and scanning postgresql.conf is not necessarily an accurate picture of the situation. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
Robert,

> I'm sure it's possible; I don't *think* it's terribly easy. The usual algorithm for cycle detection is to have each node send to the next node the path that the data has taken. But, there's no unique identifier for each slave that I know of - you could use IP address, but that's not really unique. And, if the WAL passes through an archive, how do you deal with that?

Not that I know how to do this, but it seems like a more direct approach is to check whether there's a master anywhere up the line. Hmm. Still sounds fairly difficult.

> I'm sure somebody could figure all of this stuff out, but it seems fairly complicated for the benefit we'd get. I just don't think this is going to be a terribly common problem; if it turns out I'm wrong, I may revise my opinion. :-)

I don't think it'll be that common either. The problem is that when it does happen, it'll be very hard for the hapless sysadmin involved to troubleshoot.

> To me, it seems that lag monitoring between master and standby is something that anyone running a complex replication configuration should be doing - and yeah, I think anything involving four standbys (or cascading) qualifies as complex. If you're doing that, you should notice pretty quickly that your replication lag is increasing steadily.

There are many reasons why replication lag would increase steadily.

> You might also check pg_stat_replication on the master and notice that there are no connections there any more.

Well, if you've created a true cycle, every server has one or more replicas. The original case I presented was the most probable cause of accidental cycles: the original master dies, and the on-call sysadmin accidentally connects the first replica to the last replica while trying to recover the cluster. AFAICT, the only way to troubleshoot a cycle is to test every server in the network to see if it's a master and has replicas, and if no server is a master with replicas, it's a cycle. Again, not fast or intuitive.
> Could someone miss those tell-tale signs? Sure. But they could also set autovacuum_naptime to an hour and then file a support ticket complaining about table bloat - and they do. Personally, as user screw-ups go, I'd consider that scenario (and its fourteen cousins, twenty-seven second cousins, and three hundred and ninety two other extended family members) as higher-priority and lower effort to fix than this particular thing.

I agree that this isn't a particularly high-priority issue. I do think it should go on the TODO list, though, just in case we get a GSOC student or other new contributor who wants to tackle it.

--Josh
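The manual procedure described above (test every server to see whether it is a master with replicas, and conclude there is a cycle if none is) can be written down as a small check. This is a sketch, not an existing tool. It assumes the two facts per server have already been collected, for example with `SELECT pg_is_in_recovery();` and `SELECT count(*) FROM pg_stat_replication;` on each node. The `ServerStatus` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ServerStatus:
    name: str
    in_recovery: bool    # True if the server is a standby (still replaying WAL)
    replica_count: int   # number of walsender connections the server reports


def looks_like_cycle(statuses: Iterable[ServerStatus]) -> bool:
    """Heuristic from the thread: if no server is a master that has replicas,
    the topology is presumed to be a cycle (or the master is simply gone)."""
    servers = list(statuses)
    has_feeding_master = any(
        not s.in_recovery and s.replica_count > 0 for s in servers
    )
    return bool(servers) and not has_feeding_master


# Healthy cascade: one master feeding r1, which feeds r2.
healthy = [
    ServerStatus("m", in_recovery=False, replica_count=1),
    ServerStatus("r1", in_recovery=True, replica_count=1),
    ServerStatus("r2", in_recovery=True, replica_count=0),
]
# Accidental ring after a failed recovery: every node is a standby with a replica.
ring = [
    ServerStatus("r1", in_recovery=True, replica_count=1),
    ServerStatus("r2", in_recovery=True, replica_count=1),
    ServerStatus("r3", in_recovery=True, replica_count=1),
]

print(looks_like_cycle(healthy), looks_like_cycle(ring))
```

The heuristic cannot tell a true cycle from a cluster whose master is merely unreachable, and it requires every server to be queried, which is the cost the message complains about.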
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Thu, Dec 20, 2012 at 5:28 PM, Joshua Berkus j...@agliodbs.com wrote:
>>> What would such a test look like? It's not obvious to me that there's any rapid way for a user to detect this situation, without checking each server individually.
>> Change something on the master and observe that none of the supposed standbys notice?
> That doesn't sound like an infallible test, or a 60-second one. My point is that in a complex situation (imagine a shop with 9 replicated servers in 3 different cascaded groups, immediately after a failover of the original master), it would be easy for a sysadmin, responding to a middle-of-the-night page, to accidentally fat-finger an IP address and create a cycle instead of a new master. And once he's done that, it's a longish troubleshooting process to figure out what's wrong and why writes aren't working, especially if he goes to bed and some other sysadmin picks up the "Writes failing to PostgreSQL" ticket. *If* it's relatively easy for us to detect cycles (that's a big if, I'm not sure how we'd do it), then it would help a lot for us to at least emit a WARNING. That would short-cut a lot of troubleshooting.

I'm sure it's possible; I don't *think* it's terribly easy. The usual algorithm for cycle detection is to have each node send to the next node the path that the data has taken. But, there's no unique identifier for each slave that I know of - you could use IP address, but that's not really unique. And, if the WAL passes through an archive, how do you deal with that? I'm sure somebody could figure all of this stuff out, but it seems fairly complicated for the benefit we'd get. I just don't think this is going to be a terribly common problem; if it turns out I'm wrong, I may revise my opinion. :-)

To me, it seems that lag monitoring between master and standby is something that anyone running a complex replication configuration should be doing - and yeah, I think anything involving four standbys (or cascading) qualifies as complex.
If you're doing that, you should notice pretty quickly that your replication lag is increasing steadily. You might also check pg_stat_replication on the master and notice that there are no connections there any more.

Could someone miss those tell-tale signs? Sure. But they could also set autovacuum_naptime to an hour and then file a support ticket complaining about table bloat - and they do. Personally, as user screw-ups go, I'd consider that scenario (and its fourteen cousins, twenty-seven second cousins, and three hundred and ninety two other extended family members) as higher-priority and lower effort to fix than this particular thing. YMMV, of course.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
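The lag-monitoring signal mentioned above ("replication lag is increasing steadily") could be checked with something as small as the following. This is a minimal sketch under assumptions: the lag samples are taken to be seconds of replay delay gathered periodically on a standby, for instance from `now() - pg_last_xact_replay_timestamp()`. The function name and the three-sample window are arbitrary choices, not anything from the thread.

```python
from typing import Sequence


def lag_steadily_increasing(samples: Sequence[float], min_points: int = 3) -> bool:
    """Return True if the most recent `min_points` lag samples (oldest first)
    are strictly increasing. Fewer samples than that gives False."""
    if len(samples) < min_points:
        return False
    recent = samples[-min_points:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Lag in seconds, sampled once a minute on a standby that has stopped receiving WAL.
print(lag_steadily_increasing([2.0, 61.5, 121.7, 181.9]))
# A standby that is keeping up.
print(lag_steadily_increasing([1.2, 0.8, 1.1, 0.9]))
```

A rising trend only says that something is wrong. As noted elsewhere in the thread, lag can grow for many reasons, so this cannot identify a cycle on its own.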
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Wed, Dec 19, 2012 at 5:14 PM, Joshua Berkus j...@agliodbs.com wrote: What would such a test look like? It's not obvious to me that there's any rapid way for a user to detect this situation, without checking each server individually. Change something on the master and observe that none of the supposed standbys notice? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
Robert,

>> What would such a test look like? It's not obvious to me that there's any rapid way for a user to detect this situation, without checking each server individually.
> Change something on the master and observe that none of the supposed standbys notice?

That doesn't sound like an infallible test, or a 60-second one. My point is that in a complex situation (imagine a shop with 9 replicated servers in 3 different cascaded groups, immediately after a failover of the original master), it would be easy for a sysadmin, responding to a middle-of-the-night page, to accidentally fat-finger an IP address and create a cycle instead of a new master. And once he's done that, it's a longish troubleshooting process to figure out what's wrong and why writes aren't working, especially if he goes to bed and some other sysadmin picks up the "Writes failing to PostgreSQL" ticket.

*If* it's relatively easy for us to detect cycles (that's a big if, I'm not sure how we'd do it), then it would help a lot for us to at least emit a WARNING. That would short-cut a lot of troubleshooting.

--Josh Berkus
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 12/18/2012 11:57 PM, Simon Riggs wrote: On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote: So, my question is: 1. should we detect for replication cycles? *Can* we? 2. should we warn the user, or refuse to start up? Why not just monitor the config you just created? Anybody that actually tests their config would spot this. I think you are being optimistic. We should probably have some logic that prevents circular replication. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 19 December 2012 08:11, Joshua D. Drake j...@commandprompt.com wrote: On 12/18/2012 11:57 PM, Simon Riggs wrote: On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote: So, my question is: 1. should we detect for replication cycles? *Can* we? 2. should we warn the user, or refuse to start up? Why not just monitor the config you just created? Anybody that actually tests their config would spot this. I think you are being optimistic. We should probably have some logic that prevents circular replication. My logic is that if you make a 1 minute test you will notice your mistake, which is glaringly obvious. That is sufficient to prevent that mistake, IMHO. If you don't test your config and don't monitor either, good luck with HA. Patches welcome, if you think this important enough. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 12/19/2012 12:34 AM, Simon Riggs wrote:
> My logic is that if you make a 1 minute test you will notice your mistake, which is glaringly obvious. That is sufficient to prevent that mistake, IMHO.
>
> If you don't test your config and don't monitor either, good luck with HA.

I am not arguing whether you are right. I am arguing whether or not we want to shoot all but our expert users in the foot. People make mistakes; when reasonable, we should help them not make those mistakes.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
Simon,

> My logic is that if you make a 1 minute test you will notice your mistake, which is glaringly obvious. That is sufficient to prevent that mistake, IMHO.

What would such a test look like? It's not obvious to me that there's any rapid way for a user to detect this situation, without checking each server individually.

If there's a quick and easy way to test for cycles from the user side, we should put it in documentation somewhere.

--Josh
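One possible shape for such a user-side test, offered as a sketch rather than a recommendation: sample each standby's replay location (on 9.x, `SELECT pg_last_xlog_replay_location()`), make a change on the master, sample again, and flag any standby that did not move. The helper names and the sample values below are hypothetical, and the samples are passed in as plain dictionaries so the example runs without a database. A stalled standby does not prove a cycle (it could be a network problem), so this is not an infallible test, and it still requires visiting every server.

```python
def lsn_to_int(lsn):
    """Convert a textual WAL location such as '0/3000060' to a byte position.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def stalled_standbys(before, after):
    """Return the standbys whose replay location did not advance.

    before and after map a standby name to the replay location sampled
    before and after a change was made on the master.
    """
    return sorted(
        name for name in before
        if lsn_to_int(after[name]) <= lsn_to_int(before[name])
    )


# Hypothetical samples: r1 never saw the change, r2 did.
before = {"r1": "0/3000060", "r2": "0/3000060"}
after = {"r1": "0/3000060", "r2": "0/5000148"}
print(stalled_standbys(before, after))  # ['r1']
```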
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Dec 18, 2012 at 7:03 PM, Josh Berkus j...@agliodbs.com wrote:
> 2. should we warn the user, or refuse to start up?

One nice property of allowing cyclicity is that it's easier to syndicate application of WAL to a series of standbys before promotion of exactly one to act as a primary (basically, to perform catch-up). One could imagine someone wanting a configuration that was like:

  +---> r2
  |      |
  r1 <---+

This is only one step before:

  r1 -> r2 or r2 -> r1

(and, most importantly, after the cycle quiesces one can choose either one)

For my use, I'm not convinced that such checks and warnings are useful if delivered by default, and I think outright rejection of cyclicity is harmful.

--
fdr
[HACKERS] Cascading replication: should we detect/prevent cycles?
Folks,

So as a test I tried to connect a group of 9.3 streaming replicas in a circle (4 replicas). This was very easy to do:

1. create r1 as replica of master
2. create r2 as replica of r1
3. create r3 as replica of r2
4. create r4 as replica of r3
5. start traffic on master
6. shut down r1
7. point r1 to r4 in recovery.conf
8. start r1

Replicas are now successfully connected in this pattern:

  r1 ---> r2
  ^        |
  |        |
  |        v
  r4 <--- r3

pg_stat_replication displays the correct information on each replica.

So, my question is:

1. should we detect for replication cycles? *Can* we?
2. should we warn the user, or refuse to start up?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
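To make the resulting topology concrete, the following Python sketch models the steps above as a map from each standby to the server it replicates from, then follows the upstream pointers looking for a loop. It is an external illustration under stated assumptions, not something the server does: `find_cycle` and the dictionary layout are invented for this example, and in practice the map would have to be collected by hand from each server's recovery.conf.

```python
def find_cycle(upstream_of):
    """Return the nodes forming a cycle, or None if every chain reaches a primary.

    upstream_of maps each standby to the node it replicates from.
    A node that does not appear as a key is treated as a primary.
    """
    for start in upstream_of:
        seen = []
        node = start
        while node in upstream_of:
            if node in seen:
                return seen[seen.index(node):]
            seen.append(node)
            node = upstream_of[node]
    return None


# After step 4: a plain cascade hanging off the master.
cascade = {"r1": "master", "r2": "r1", "r3": "r2", "r4": "r3"}
# After step 8: r1 now follows r4 instead of the master.
cycle = dict(cascade, r1="r4")

print(find_cycle(cascade))  # None
print(find_cycle(cycle))    # ['r1', 'r4', 'r3', 'r2']
```

In the second case no chain ever reaches the master, which is why none of the four replicas would see new changes even though each one reports a healthy connection.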
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:
> So, my question is:
> 1. should we detect for replication cycles? *Can* we?
> 2. should we warn the user, or refuse to start up?

Why not just monitor the config you just created? Anybody that actually tests their config would spot this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services