Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-02-02 Thread Robert Haas

On Thu, Jan 31, 2013 at 9:48 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/01/2013 12:01 PM, Josh Berkus wrote:
 If we're going to start installing safeguards against doing stupid
 things, there's a long list of scenarios that happen far more
 regularly than this ever will and cause far more damage.

 What's wrong with making it easier for sysadmins to troubleshoot things?
  Again, I'm not talking about erroring out, I'm talking about logging a
 warning.

 Or to put it another way:  Robert, you just did a "nobody wants that" to
 me.  I thought you were opposed to such things on this list.

I respectfully disagree.  I'm saying that *I* don't want that, which I
think is different.  To interpret my opposition to saying "nobody
wants that" to mean you can never oppose anything someone else thinks
is a good idea would preclude meaningful dialogue on most of what we
talk about here.  And clearly there is at least some demand for this
feature, because you and Craig Ringer both want it.  So let me try to
restate my objection to this specific feature more clearly.

I think that we should be careful about warning the user about things
that might not actually be mistakes.  I'm not aware that we currently
issue ANY warnings of that type.  When we emit error messages, we
sometimes suggest one possible cause of the error, and such messages
are clearly labelled as HINT.  But we don't, for example, emit a
WARNING or ERROR about a DELETE or UPDATE statement that
lacks a WHERE clause, even though many people might like to have such
a feature.  We don't warn a user "hey, float8 is imprecise, consider
using numeric" or "hey, numeric is slow, consider using float8" or
"setting autovacuum_naptime to an hour is probably dumber than pouring
sugar in your gas tank", even though all of those things are true and
some people might like to be warned.  We only warn or error out when
something happens that we are 100% sure is bad.  And, in this
particular case, it has been suggested that there are legitimate
reasons why a replication topology might temporarily involve loops, so
I believe this fails that criterion.

Second, we have often discussed the importance of avoiding log spam.
Warnings that are likely to be repeated a large number of times when
they occur have repeatedly been voted down on those grounds.  I
believe that objection also applies to this case.  It is more
appropriate to make information about the status of the system
available via some status-inquiry function; for example, if you were
to recast this as adding a slave-side function that attempts to return
the IP of the current master, or NULL if no master, that would answer
this objection (but not necessarily all of the other ones).

Third, we usually apply a criterion that warnings or errors must
represent conditions that we can reliably detect; in other words, we
typically do not add checks for situations that we will only sometimes
be able to identify.  And, in this case, it's a little unclear how we
would actually identify loops.  Presumably, we'd do it by sending a
chain of unique per-node identifiers along with the WAL, and looking
for your own identifier in the path, but we don't have any sort of
unique per-node identifier right now, and how would you create one?
If someone shuts down the cluster, duplicates it, and starts up both
copies, we want that to work.  Any identifier embedded in the cluster
by such a process would be duplicated.  You could use something like
the node IP and port number, which wouldn't have that pitfall, but as
we all know, IPs can be duplicated (e.g. due to NAT) so this isn't
necessarily reliable either.  If you do come up with a suitable unique
per-node identifier, then this is fairly simple to make work for
streaming replication, but it's tricky to see how to make it work with
archiving.
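
A minimal sketch of the path-based scheme described above, assuming each
node has some string identifier (which, as noted, PostgreSQL does not
currently provide).  The names forward_path and walk_chain are
hypothetical and only simulate the idea in Python; nothing here is part
of the replication protocol.  The last call illustrates the
cloned-cluster pitfall: two distinct servers sharing one identifier look
like a loop.

```python
from typing import List, Tuple


def forward_path(node_id: str, incoming_path: List[str]) -> Tuple[bool, List[str]]:
    """Return (loop_detected, path_to_send_downstream).

    A node reports a loop when its own identifier already appears in the
    path that arrived with the data.
    """
    loop_detected = node_id in incoming_path
    return loop_detected, incoming_path + [node_id]


def walk_chain(node_ids: List[str]) -> bool:
    """Pass a path through the nodes in order; True if any node sees itself."""
    path: List[str] = []
    for node_id in node_ids:
        loop, path = forward_path(node_id, path)
        if loop:
            return True
    return False


# A healthy cascade: master -> a -> b
print(walk_chain(["master", "a", "b"]))    # False
# A loop: a -> b -> c -> back to a
print(walk_chain(["a", "b", "c", "a"]))    # True
# False positive: "a" was cloned, so two different servers share one id
print(walk_chain(["master", "a", "a"]))    # True
```

The third case is why the identifier has to be genuinely unique per node
before a check like this could be trusted.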

Is that more clear?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-02-01 Thread Mark Kirkwood


On 01/02/13 20:43, Peter Geoghegan wrote:



On Sunday, 27 January 2013, Robert Haas robertmh...@gmail.com wrote:
  If we're going to start installing safeguards against doing stupid
  things, there's a long list of scenarios that happen far more
  regularly than this ever will and cause far more damage.

+1


+1

...and there are other areas that we could spend our energy on that 
would be more worthwhile I think. One I'd like to see is the opposite of:


$ pg_ctl promote

i.e:

$ pg_ctl demote

So a retired master would read a (newly supplied perhaps) 
recovery.conf and start to apply changes from there (with suitable 
safeguards). We have failover pretty painless now... but reconstruction 
of the original primary as a new standby is still too 
fiddly/resource/time consuming etc.


Regards

Mark




Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-02-01 Thread Craig Ringer

On 01/28/2013 02:13 AM, Robert Haas wrote:
 If we're going to start installing safeguards against doing stupid
 things, there's a long list of scenarios that happen far more
 regularly than this ever will and cause far more damage. 
I'm not sure this approach is consistent with other decisions made in
the past, and with the project's general goals of stability and quality.

In particular, by the above rationale, why would the project have for
8.3 removed the implicit casts to/from 'text'? It's a minor safeguard
against users doing stupid things. Many other such safeguards exist -
and despite my frustration with some details of the implicit casts,
overall I think that such safeguards are a good and desirable thing.

Safeguards to stop users doing stupid things are part of good usability,
so long as they do not interfere excessively with performance,
functionality, maintainability, etc. Even where data loss/corruption is
not a risk, if it's feasible to detect a misconfiguration, bad command,
etc, IMHO it generally makes sense to do so. Otherwise we land up in
MySQL-land, littered with foot-guns, quirks, features that are easy to
misconfigure and hard to tell if they're misconfigured (PITR and
streaming replication are already in that category IMO), and generally
get a reputation as being hard to use, hard to troubleshoot, and painful.

I don't mean to be contrary; I realise that I've raised a differing view
a fair bit recently, but I'm only doing so when I think there's
something to contribute by doing so. Posting a "me too" when I agree is
rather less helpful, so it's the differing views that tend to actually
get posted, despite being in the minority.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-31 Thread Josh Berkus


 If we're going to start installing safeguards against doing stupid
 things, there's a long list of scenarios that happen far more
 regularly than this ever will and cause far more damage.

What's wrong with making it easier for sysadmins to troubleshoot things?
 Again, I'm not talking about erroring out, I'm talking about logging a
warning.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-31 Thread Josh Berkus

On 02/01/2013 12:01 PM, Josh Berkus wrote:
 
 If we're going to start installing safeguards against doing stupid
 things, there's a long list of scenarios that happen far more
 regularly than this ever will and cause far more damage.
 
 What's wrong with making it easier for sysadmins to troubleshoot things?
  Again, I'm not talking about erroring out, I'm talking about logging a
 warning.

Or to put it another way:  Robert, you just did a "nobody wants that" to
me.  I thought you were opposed to such things on this list.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-31 Thread Peter Geoghegan

On Sunday, 27 January 2013, Robert Haas robertmh...@gmail.com wrote:
 If we're going to start installing safeguards against doing stupid
 things, there's a long list of scenarios that happen far more
 regularly than this ever will and cause far more damage.

+1


-- 
Regards,
Peter Geoghegan

Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-27 Thread Josh Berkus

All,

So while testing some replication stuff on 9.2.2 I discovered that it's
completely possible to connect a replica to itself.  Seems like we ought
to at least be able to detect and log *that*.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-27 Thread Simon Riggs

On 27 January 2013 11:30, Josh Berkus j...@agliodbs.com wrote:

 So while testing some replication stuff on 9.2.2 I discovered that it's
 completely possible to connect a replica to itself.  Seems like we ought
 to at least be able to detect and log *that*.

How do we do that?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-27 Thread Robert Haas

On Sun, Jan 27, 2013 at 6:30 AM, Josh Berkus j...@agliodbs.com wrote:
 So while testing some replication stuff on 9.2.2 I discovered that it's
 completely possible to connect a replica to itself.  Seems like we ought
 to at least be able to detect and log *that*.

We could certainly alter the protocol so that it can detect that
situation, but like Simon, I dowanna.  I rarely get the chance to
agree wholeheartedly with Simon, so let me just take a moment to revel
in it here: you have discovered a non-problem problem.  Sure, if you
do that, nothing useful will happen.  But there are lots of non-useful
things in the world you can do, and it is neither practical nor
sensible to try to prevent them all.  And again, yes, you could do
that by accident when you meant to do something more sane, but again,
there are any number of other ways to accidentally do something truly
worthless.

If we're going to start installing safeguards against doing stupid
things, there's a long list of scenarios that happen far more
regularly than this ever will and cause far more damage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-09 Thread Simon Riggs

On 9 January 2013 01:51, Josh Berkus j...@agliodbs.com wrote:

 Anyway, I'm not saying we solve this now.  I'm saying, put it on the
 TODO list in case someone has time/an itch to scratch.

I think it's reasonable to ask whether a usability feature needs to
exist whenever a problem is encountered. That shouldn't need to
translate to a new feature/TODO every time we ask the question though.

IMHO, in this case, we should document this as an issue that can
happen and we should caution that careful testing is required.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-08 Thread Josh Berkus


On 1/5/13 1:21 PM, Peter Geoghegan wrote:

On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:

I'm sure it's possible; I don't *think* it's terribly easy.


I'm inclined to agree that this isn't a terribly pressing issue.
Certainly, the need to introduce a bunch of new infrastructure to
detect this case seems hard to justify.


Impossible to justify, I'd say.

Does anyone have any objections to my adding this to the TODO list, in 
case some clever GSOC student comes up with a way to do it *without* 
adding a bunch of infrastructure?


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-08 Thread Simon Riggs

On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote:
 On 1/5/13 1:21 PM, Peter Geoghegan wrote:

 On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:

 I'm sure it's possible; I don't *think* it's terribly easy.


 I'm inclined to agree that this isn't a terribly pressing issue.
 Certainly, the need to introduce a bunch of new infrastructure to
 detect this case seems hard to justify.


 Impossible to justify, I'd say.

 Does anyone have any objections to my adding this to the TODO list, in case
 some clever GSOC student comes up with a way to do it *without* adding a
 bunch of infrastructure?

Daniel already did object

I think we have other problems that need solving much more than this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-08 Thread David Fetter

On Tue, Jan 08, 2013 at 10:46:12AM -0800, Josh Berkus wrote:
 On 1/5/13 1:21 PM, Peter Geoghegan wrote:
 On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:
 I'm sure it's possible; I don't *think* it's terribly easy.
 
 I'm inclined to agree that this isn't a terribly pressing issue.
 Certainly, the need to introduce a bunch of new infrastructure to
 detect this case seems hard to justify.
 
 Impossible to justify, I'd say.
 
 Does anyone have any objections to my adding this to the TODO list,
 in case some clever GSOC student comes up with a way to do it
 *without* adding a bunch of infrastructure?

I'm pretty sure the logical change stuff Andres et al. are working on
will be able to include the originating node, which makes cycle
detection dead simple.

Other restrictions on the graph like "must be a tree" might be more
complicated.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-08 Thread Simon Riggs

On 8 January 2013 19:53, David Fetter da...@fetter.org wrote:
 On Tue, Jan 08, 2013 at 10:46:12AM -0800, Josh Berkus wrote:
 On 1/5/13 1:21 PM, Peter Geoghegan wrote:
 On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:
 I'm sure it's possible; I don't *think* it's terribly easy.
 
 I'm inclined to agree that this isn't a terribly pressing issue.
 Certainly, the need to introduce a bunch of new infrastructure to
 detect this case seems hard to justify.

 Impossible to justify, I'd say.

 Does anyone have any objections to my adding this to the TODO list,
 in case some clever GSOC student comes up with a way to do it
 *without* adding a bunch of infrastructure?

 I'm pretty sure the logical change stuff Andres et al. are working on
 will be able to include the originating node, which makes cycle
 detection dead simple.

That's a different thing really, but I see what you mean.

The problem here is how you tell whether an indirect connection is
connected to the master. It's not just a hard problem, it's a transient
problem, where any one person's view of the answer might be in the
midst of changing as you measure it. So throwing an error message
might make certain cluster configs inoperable.

I'd prefer to be able to bring up a complex cluster in any order,
rather than in waves of startups all needing synchronisation to avoid
error.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-08 Thread Daniel Farina

On Tue, Jan 8, 2013 at 11:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote:
 On 1/5/13 1:21 PM, Peter Geoghegan wrote:

 On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:

 I'm sure it's possible; I don't *think* it's terribly easy.


 I'm inclined to agree that this isn't a terribly pressing issue.
 Certainly, the need to introduce a bunch of new infrastructure to
 detect this case seems hard to justify.


 Impossible to justify, I'd say.

 Does anyone have any objections to my adding this to the TODO list, in case
 some clever GSOC student comes up with a way to do it *without* adding a
 bunch of infrastructure?

 Daniel already did object

To briefly reiterate my objection, I observed that one may want to
enter a case of cyclicality on a temporary basis -- to assist with
some intermediate states in remastering, and it'd be nice if Postgres
didn't try to get in the way of that.

I would like to have enough reporting to be able to write tools that
detect cyclicity and other configuration error, and I think that may
exist already in recovery.conf/its successor in postgresql.conf.  A
notable problem here is that UDFs, by their mechanical nature, don't
quite cover all the use cases, as they require the server to be
running and available for hot standby to run.  It seems like reading
recovery.conf or its successor is probably the best option here.
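
A rough sketch of the kind of external tool described above, assuming
the recovery.conf text of every server has already been collected and
keyed by a "host:port" string.  The function names and sample addresses
are hypothetical, and the conninfo parsing is deliberately simplified
(it ignores quoted values containing spaces).  It only sees configured
addresses, not the connections actually in use.

```python
import re
from typing import Dict, List, Optional

# Matches an uncommented primary_conninfo line and captures the quoted value.
CONNINFO_RE = re.compile(r"^\s*primary_conninfo\s*=\s*'([^']*)'", re.MULTILINE)


def upstream_from_recovery_conf(text: str) -> Optional[str]:
    """Return 'host:port' from primary_conninfo, or None if it is not set."""
    match = CONNINFO_RE.search(text)
    if match is None:
        return None
    fields = dict(
        item.split("=", 1) for item in match.group(1).split() if "=" in item
    )
    host = fields.get("host")
    if host is None:
        return None
    return f"{host}:{fields.get('port', '5432')}"


def find_cycle(upstream: Dict[str, Optional[str]], start: str) -> Optional[List[str]]:
    """Follow upstream links from start; return the loop if one is found.

    A node with no upstream (or one we have no config for) ends the walk.
    """
    seen: List[str] = []
    node: Optional[str] = start
    while node is not None:
        if node in seen:
            return seen[seen.index(node):] + [node]
        seen.append(node)
        node = upstream.get(node)
    return None


# Hypothetical three-node cluster where every server follows another one.
configs = {
    "10.0.0.1:5432": "primary_conninfo = 'host=10.0.0.3 port=5432 user=repl'\n",
    "10.0.0.2:5432": "standby_mode = 'on'\nprimary_conninfo = 'host=10.0.0.1 user=repl'\n",
    "10.0.0.3:5432": "primary_conninfo = 'host=10.0.0.2 port=5432'\n",
}
upstream = {name: upstream_from_recovery_conf(text) for name, text in configs.items()}
print(find_cycle(upstream, "10.0.0.1:5432"))
```

Because this works on files rather than SQL, it does not need the server
to be accepting connections.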

--
fdr



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-08 Thread Josh Berkus

Daniel,


 To briefly reiterate my objection, I observed that one may want to
 enter a case of cyclicality on a temporary basis -- to assist with
 some intermediate states in remastering, and it'd be nice if Postgres
 didn't try to get in the way of that.

I don't think it *should* fail.  I think it should write a WARNING to
the logs, to make it easy to debug if the cycle was created accidentally.

 I would like to have enough reporting to be able to write tools that
 detect cyclicity and other configuration error, and I think that may
 exist already in recovery.conf/its successor in postgresql.conf.  A
 notable problem here is that UDFs, by their mechanical nature, don't
 quite cover all the use cases, as they require the server to be
 running and available for hot standby to run.  It seems like reading
 recovery.conf or its successor is probably the best option here.

Well, primary_conninfo will still be in postgresql.conf.  But that doesn't
help you if you're playing fast and loose with virtual IP addresses ...
and arguably, people using Virtual IPs are more likely to accidentally
create a cycle.

Anyway, I'm not saying we solve this now.  I'm saying, put it on the
TODO list in case someone has time/an itch to scratch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-08 Thread Daniel Farina

On Tue, Jan 8, 2013 at 5:51 PM, Josh Berkus j...@agliodbs.com wrote:
 Daniel,


 To briefly reiterate my objection, I observed that one may want to
 enter a case of cyclicality on a temporary basis -- to assist with
 some intermediate states in remastering, and it'd be nice if Postgres
 didn't try to get in the way of that.

 I don't think it *should* fail.  I think it should write a WARNING to
 the logs, to make it easy to debug if the cycle was created accidentally.

Well, in the conversation so long ago that was more openly considered,
which may not be true in the present era...just covering my old tracks
exactly.

 I would like to have enough reporting to be able to write tools that
 detect cyclicity and other configuration error, and I think that may
 exist already in recovery.conf/its successor in postgresql.conf.  A
 notable problem here is that UDFs, by their mechanical nature, don't
 quite cover all the use cases, as they require the server to be
 running and available for hot standby to run.  It seems like reading
 recovery.conf or its successor is probably the best option here.

 Well, primary_conninfo will still be in postgresql.conf.  But that doesn't
 help you if you're playing fast and loose with virtual IP addresses ...
 and arguably, people using Virtual IPs are more likely to accidentally
 create a cycle.

That's a good point. Even simpler than virtual-IP is DNS, wherein the
resolution can be rebound, but a pre-existing connection to an old IP
will happily remain, and that will be hard to detect from
postgresql.conf and friends.  I guess that means the hard case is when
hot standby is not (yet) on, but the server is actively recovering
WAL...UDFs are out, and scanning postgresql.conf is not necessarily an
accurate picture of the situation.

--
fdr



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-05 Thread Joshua Berkus

Robert,

 I'm sure it's possible; I don't *think* it's terribly easy.  The usual
 algorithm for cycle detection is to have each node send to the next
 node the path that the data has taken.  But, there's no unique
 identifier for each slave that I know of - you could use IP address,
 but that's not really unique.  And, if the WAL passes through an
 archive, how do you deal with that?

Not that I know how to do this, but it seems like a more direct approach is to 
check whether there's a master anywhere up the line.  Hmmm.  Still sounds 
fairly difficult.

 I'm sure somebody could figure all of this stuff out, but it seems
 fairly complicated for the benefit we'd get.  I just don't think this
 is going to be a terribly common problem; if it turns out I'm wrong, I
 may revise my opinion.  :-)

I don't think it'll be that common either.  The problem is that when it does 
happen, it'll be very hard for the hapless sysadmin involved to troubleshoot.

 To me, it seems that lag monitoring between master and standby is
 something that anyone running a complex replication configuration
 should be doing - and yeah, I think anything involving four standbys
 (or cascading) qualifies as complex.  If you're doing that, you should
 notice pretty quickly that your replication lag is increasing
 steadily.

There are many reasons why replication lag would increase steadily.

 You might also check pg_stat_replication on the master and
 notice that there are no connections there any more.

Well, if you've created a true cycle, every server has one or more replicas.  
The original case I presented was the most probable cause of accidental cycles: 
the original master dies, and the on-call sysadmin accidentally connects the 
first replica to the last replica while trying to recover the cluster.

AFAICT, the only way to troubleshoot a cycle is to test every server in the 
network to see if it's a master and has replicas, and if no server is a master 
with replicas, it's a cycle.  Again, not fast or intuitive.
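
The check described above can be sketched as follows, assuming the two
facts about each server have already been gathered by hand or by script,
for example from SELECT pg_is_in_recovery() and a row count of
pg_stat_replication.  The ServerStatus type, the function name and the
server names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServerStatus:
    in_recovery: bool    # True if the server is running as a standby
    replica_count: int   # number of standbys connected to this server


def no_master_with_replicas(statuses: Dict[str, ServerStatus]) -> bool:
    """True if no server is both a master and feeding at least one replica."""
    return not any(
        (not status.in_recovery) and status.replica_count > 0
        for status in statuses.values()
    )


# Every server is a standby and every server has a replica: the cycle case.
cycle = {
    "db1": ServerStatus(in_recovery=True, replica_count=1),
    "db2": ServerStatus(in_recovery=True, replica_count=1),
    "db3": ServerStatus(in_recovery=True, replica_count=1),
}
# A normal cascade: db1 is the master, db2 cascades to db3.
healthy = {
    "db1": ServerStatus(in_recovery=False, replica_count=1),
    "db2": ServerStatus(in_recovery=True, replica_count=1),
    "db3": ServerStatus(in_recovery=True, replica_count=0),
}
print(no_master_with_replicas(cycle))    # True
print(no_master_with_replicas(healthy))  # False
```

This still requires visiting every server, which is the point being made:
it is neither fast nor intuitive.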

 Could someone miss those tell-tale signs?  Sure.  But they could also
 set autovacuum_naptime to an hour and then file a support ticket
 complaining about table bloat - and they do.  Personally, as user
 screw-ups go, I'd consider that scenario (and its fourteen cousins,
 twenty-seven second cousins, and three hundred and ninety two other
 extended family members) as higher-priority and lower effort to fix
 than this particular thing.

I agree that this isn't a particularly high-priority issue.  I do think it 
should go on the TODO list, though, just in case we get a GSOC student or other 
new contributor who wants to tackle it.

--Josh





Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-05 Thread Peter Geoghegan

On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:
 I'm sure it's possible; I don't *think* it's terribly easy.

I'm inclined to agree that this isn't a terribly pressing issue.
Certainly, the need to introduce a bunch of new infrastructure to
detect this case seems hard to justify.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-21 Thread Robert Haas

On Thu, Dec 20, 2012 at 5:28 PM, Joshua Berkus j...@agliodbs.com wrote:
  What would such a test look like?  It's not obvious to me that
  there's any rapid way for a user to detect this situation, without
  checking each server individually.

 Change something on the master and observe that none of the supposed
 standbys notice?

 That doesn't sound like an infallible test, or a 60-second one.

 My point is that in a complex situation (imagine a shop with 9 replicated
 servers in 3 different cascaded groups, immediately after a failover of the
 original master), it would be easy for a sysadmin, responding to a
 middle-of-the-night page, to accidentally fat-finger an IP address and create
 a cycle instead of a new master.  And once he's done that, a longish
 troubleshooting process follows to figure out what's wrong and why writes
 aren't working, especially if he goes to bed and some other sysadmin picks up
 the "Writes failing to PostgreSQL" ticket.

 *if* it's relatively easy for us to detect cycles (that's a big if, I'm not 
 sure how we'd do it), then it would help a lot for us to at least emit a 
 WARNING.  That would short-cut a lot of troubleshooting.

I'm sure it's possible; I don't *think* it's terribly easy.  The usual
algorithm for cycle detection is to have each node send to the next
node the path that the data has taken.  But, there's no unique
identifier for each slave that I know of - you could use IP address,
but that's not really unique.  And, if the WAL passes through an
archive, how do you deal with that?  I'm sure somebody could figure
all of this stuff out, but it seems fairly complicated for the benefit
we'd get.  I just don't think this is going to be a terribly common
problem; if it turns out I'm wrong, I may revise my opinion.  :-)

To me, it seems that lag monitoring between master and standby is
something that anyone running a complex replication configuration
should be doing - and yeah, I think anything involving four standbys
(or cascading) qualifies as complex.  If you're doing that, you should
notice pretty quickly that your replication lag is increasing
steadily.  You might also check pg_stat_replication on the master and
notice that there are no connections there any more.  Could someone
miss those tell-tale signs?  Sure.  But they could also set
autovacuum_naptime to an hour and then file a support ticket
complaining about table bloat - and they do.  Personally, as user
screw-ups go, I'd consider that scenario (and its fourteen cousins,
twenty-seven second cousins, and three hundred and ninety two other
extended family members) as higher-priority and lower effort to fix
than this particular thing.

YMMV, of course.
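
As a rough illustration of the lag monitoring mentioned above, here is a
small sketch that flags steadily increasing lag.  It assumes lag samples
in seconds are collected periodically from a standby, for instance from
now() - pg_last_xact_replay_timestamp(); the function name and the
window of 5 samples are arbitrary choices for the example.

```python
from typing import Sequence


def lag_is_growing(samples: Sequence[float], window: int = 5) -> bool:
    """True if each of the last `window` samples is larger than the one before."""
    recent = samples[-window:]
    if len(recent) < window:
        # Not enough history yet to say anything.
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


print(lag_is_growing([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))  # True
print(lag_is_growing([1.0, 2.0, 3.0, 2.0, 5.0, 6.0]))  # False
print(lag_is_growing([1.0, 2.0, 3.0]))                 # False
```

As pointed out elsewhere in the thread, growing lag has many possible
causes, so a check like this narrows the search rather than proving a
cycle.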

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-20 Thread Robert Haas

On Wed, Dec 19, 2012 at 5:14 PM, Joshua Berkus j...@agliodbs.com wrote:
 What would such a test look like?  It's not obvious to me that there's any 
 rapid way for a user to detect this situation, without checking each server 
 individually.

Change something on the master and observe that none of the supposed
standbys notice?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-20 Thread Joshua Berkus

Robert,

  What would such a test look like?  It's not obvious to me that
  there's any rapid way for a user to detect this situation, without
  checking each server individually.
 
 Change something on the master and observe that none of the supposed
 standbys notice?

That doesn't sound like an infallible test, or a 60-second one.

My point is that in a complex situation (imagine a shop with 9 replicated
servers in 3 different cascaded groups, immediately after a failover of the
original master), it would be easy for a sysadmin, responding to a
middle-of-the-night page, to accidentally fat-finger an IP address and create
a cycle instead of a new master.  And once he's done that, a longish
troubleshooting process follows to figure out what's wrong and why writes
aren't working, especially if he goes to bed and some other sysadmin picks up
the "Writes failing to PostgreSQL" ticket.

*if* it's relatively easy for us to detect cycles (that's a big if, I'm not 
sure how we'd do it), then it would help a lot for us to at least emit a 
WARNING.  That would short-cut a lot of troubleshooting.

--Josh Berkus



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-19 Thread Joshua D. Drake



On 12/18/2012 11:57 PM, Simon Riggs wrote:


On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:


So, my question is:

1. should we detect for replication cycles?  *Can* we?
2. should we warn the user, or refuse to start up?


Why not just monitor the config you just created? Anybody that
actually tests their config would spot this.


I think you are being optimistic. We should probably have some logic 
that prevents circular replication.









Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-19 Thread Simon Riggs

On 19 December 2012 08:11, Joshua D. Drake j...@commandprompt.com wrote:

 On 12/18/2012 11:57 PM, Simon Riggs wrote:


 On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:

 So, my question is:

 1. should we detect for replication cycles?  *Can* we?
 2. should we warn the user, or refuse to start up?


 Why not just monitor the config you just created? Anybody that
 actually tests their config would spot this.


 I think you are being optimistic. We should probably have some logic that
 prevents circular replication.

My logic is that if you make a 1 minute test you will notice your
mistake, which is glaringly obvious. That is sufficient to prevent
that mistake, IMHO.

If you don't test your config and don't monitor either, good luck with HA.

Patches welcome, if you think this important enough.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-19 Thread Joshua D. Drake



On 12/19/2012 12:34 AM, Simon Riggs wrote:


My logic is that if you make a 1 minute test you will notice your
mistake, which is glaringly obvious. That is sufficient to prevent
that mistake, IMHO.

If you don't test your config and don't monitor either, good luck with HA.


I am not arguing whether you are right. I am arguing whether or not we 
want to shoot all but our expert users in the foot. People make 
mistakes; when reasonable, we should help them avoid those mistakes.


JD



--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-19 Thread Joshua Berkus

Simon,

 My logic is that if you make a 1 minute test you will notice your
 mistake, which is glaringly obvious. That is sufficient to prevent
 that mistake, IMHO.

What would such a test look like?  It's not obvious to me that there's any 
rapid way for a user to detect this situation, without checking each server 
individually.

If there's a quick and easy way to test for cycles from the user side, we 
should put it in documentation somewhere.
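
One user-side check suggested elsewhere in this thread is to change something 
on the master and see whether the standbys notice. A minimal sketch of that 
idea follows, assuming a heartbeat value is written on the master and then 
read back from each standby. The names `stale_standbys` and `read_marker` are 
hypothetical, and the database access is left to the caller. Replication lag 
can produce false positives, so a real check would retry before raising an 
alarm.

```python
def stale_standbys(marker, read_marker, standbys):
    """Return the standbys that do not yet show `marker`.

    `read_marker` is a hypothetical callable supplied by the caller: it
    queries one standby and returns the newest heartbeat value visible there.
    """
    return [name for name in standbys if read_marker(name) != marker]


# Simulated readings: marker 42 was written on the master, but the
# standbys (stuck in a cycle) still show the previous value, 41.
visible = {"r1": 41, "r2": 41, "r3": 41, "r4": 41}
stale = stale_standbys(42, visible.get, ["r1", "r2", "r3", "r4"])
```

If every standby is stale after a reasonable wait, none of them is receiving 
WAL from the master, which is the symptom a cycle would produce.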

--Josh



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-19 Thread Daniel Farina

On Tue, Dec 18, 2012 at 7:03 PM, Josh Berkus j...@agliodbs.com wrote:
 2. should we warn the user, or refuse to start up?

One nice property of allowing cyclicity is that it's easier to
syndicate application of WAL to a series of standbys before promotion
of exactly one to act as a primary (basically, to perform catch-up).
One could imagine someone wanting a configuration that was like:

 +---> r2
 |      |
r1 <----+

This is only one step before:

r1 ---> r2

or

r2 ---> r1

(and, most importantly, after the cycle quiesces one can choose either one)

For my use, I'm not convinced that such checks and warnings are useful
if delivered by default, and I think outright rejection of cyclicity
is harmful.

--
fdr



[HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-18 Thread Josh Berkus

Folks,

So as a test I tried to connect a group of 9.3 streaming replicas in a
circle (4 replicas).  This was very easy to do:

1. create r1 as replica of master
2. create r2 as replica of r1
3. create r3 as replica of r2
4. create r4 as replica of r3
5. start traffic on master
6. shut down r1
7. point r1 to r4 in recovery.conf
8. start r1

replicas are now successfully connected in this pattern:

r1 ---> r2
 ^      |
 |      |
 |      v
r4 <--- r3

pg_stat_replication displays the correct information on each replica.

So, my question is:

1. should we detect for replication cycles?  *Can* we?
2. should we warn the user, or refuse to start up?
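
As a rough illustration of what detection could look like from outside the 
server, the topology above can be written down as a map from each server to 
the server it streams from, and the upstream links followed until they either 
end at a master or come back around. This is only a sketch: the `find_cycle` 
function and the `topology` map are hypothetical, and nothing here reflects 
how PostgreSQL itself would implement such a check.

```python
def find_cycle(upstream):
    """Return the servers forming a replication cycle, or None.

    `upstream` maps each server name to the server it streams from.
    A master maps to None; a name missing from the map is treated the same.
    """
    for start in upstream:
        seen = []
        node = start
        # Follow upstream links until we reach a master or revisit a server.
        while node is not None and node not in seen:
            seen.append(node)
            node = upstream.get(node)
        if node is not None:
            # We walked back onto a server already visited: that is a cycle.
            return seen[seen.index(node):]
    return None


# Topology after step 8: r1 now streams from r4 instead of the master.
topology = {"master": None, "r1": "r4", "r2": "r1", "r3": "r2", "r4": "r3"}
cycle = find_cycle(topology)
```

The hard part raised by question 1 is not this walk but collecting the map in 
the first place, since each server only knows its own immediate upstream.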

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2012-12-18 Thread Simon Riggs

On 19 December 2012 03:03, Josh Berkus j...@agliodbs.com wrote:

 So, my question is:

 1. should we detect for replication cycles?  *Can* we?
 2. should we warn the user, or refuse to start up?

Why not just monitor the config you just created? Anybody that
actually tests their config would spot this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




29 matches
