Re: [HACKERS] Streaming-only Remastering
On Sun, Jun 17, 2012 at 1:11 PM, Josh Berkus wrote:
>
>> Instead of using re-synchronization (e.g. repmgr in its relation to
>> rsync), I intend to proxy and also inspect the streaming replication
>> traffic and then quiesce all standbys and figure out what node is
>> farthest ahead. Once I figure out the node that is farthest ahead, if
>> it is not a node that is eligible for promotion to the master, I need
>> to exchange its changes to nodes that are eligible for promotion[0],
>> and then promote one of those, repointing all other standbys to that
>> node. This must all take place nominally within a second or thirty.
>> Conceptually it is simple, but mechanically it's somewhat intense,
>> especially in relation to the inconvenience of doing this incorrectly.
>
> So you're suggesting that it would be great to be able to
> double-remaster? i.e. given OM = Original Master, 1S = standby furthest
> ahead, NM = desired new master, to do:

Yeah. Although it seems like it would degenerate to single-remastering
applied a couple times, no?

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming-only Remastering
> Instead of using re-synchronization (e.g. repmgr in its relation to
> rsync), I intend to proxy and also inspect the streaming replication
> traffic and then quiesce all standbys and figure out what node is
> farthest ahead. Once I figure out the node that is farthest ahead, if
> it is not a node that is eligible for promotion to the master, I need
> to exchange its changes to nodes that are eligible for promotion[0],
> and then promote one of those, repointing all other standbys to that
> node. This must all take place nominally within a second or thirty.
> Conceptually it is simple, but mechanically it's somewhat intense,
> especially in relation to the inconvenience of doing this incorrectly.

So you're suggesting that it would be great to be able to
double-remaster? i.e. given OM = Original Master, 1S = standby furthest
ahead, NM = desired new master, to do:

1S <--- OM ---> NM

OM dies, then:

1S ---> NM

until NM is caught up, then

1S <--- NM

Yes?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
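The double-remaster sequence in the message above can be sketched as a small planning function. This is only an illustration under assumptions: the node names, the integer WAL positions, and the step strings are hypothetical, and nothing here talks to PostgreSQL.

```python
def plan_remaster(positions, eligible):
    """Return ordered steps for remastering after the original master dies.

    positions: dict of standby name -> WAL position (hypothetical integers).
    eligible:  set of standby names that are allowed to become the new master.
    """
    if not positions:
        raise ValueError("no standbys available")
    candidates = [name for name in positions if name in eligible]
    if not candidates:
        raise ValueError("no standby is eligible for promotion")

    furthest = max(positions, key=positions.get)       # "1S" in the thread
    new_master = max(candidates, key=positions.get)    # "NM" in the thread

    steps = []
    if positions[new_master] < positions[furthest]:
        # Double remaster: NM first follows 1S until it has caught up.
        steps.append(f"stream {furthest} -> {new_master} until caught up")
    steps.append(f"promote {new_master}")
    for name in sorted(positions):
        if name != new_master:
            steps.append(f"repoint {name} to follow {new_master}")
    return steps
```

When the furthest-ahead standby is itself eligible, the plan collapses to a single promotion plus repointing, which matches the observation that double-remastering degenerates to single-remastering applied more than once.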
Re: [HACKERS] Streaming-only Remastering
Simon,

> The "major limitation" was solved by repmgr close to 2 years ago now.
> So while you're correct that the patch to fix that assumed that
> archiving worked as well, it has been possible to operate happily
> without it.

repmgr is not able to remaster using only streaming replication. It also
requires an SSH connection, as well as a bunch of other administrative
setup (and compiling from source on most platforms, a not at all
insignificant obstacle). So you haven't solved the problem, you've just
provided a somewhat less awkward packaged workaround.

It's certainly possible to devise all kinds of workarounds for the
problem; I have a few myself in Bash and Python. What I want is to stop
using workarounds.

Without the requirement for archiving, PostgreSQL binary replication is
almost ideally simple to set up and administer. Turn settings on in
server A and server B, run pg_basebackup and you're replicating. It's
like 4 steps, all but one of which can be scripted through puppet.
However, the moment you add log-shipping to the mix things get an order
of magnitude more complicated, repmgr or not.

There are really only two things standing in the way of binary
replication being completely developer-friendly. Remastering is the big
one, and the separate recovery.conf is the small one. We can fix both.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
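For context on the "separate recovery.conf" point above, here is a minimal sketch of generating that file for a standby. The setting names (standby_mode, primary_conninfo, recovery_target_timeline) are the 9.x recovery.conf parameters; the function name, the host name, and the "replicator" user are hypothetical. Whether a standby can actually follow a new timeline over streaming alone is exactly what this thread is debating, so the sketch only shows the file, not a working remaster.

```python
def render_recovery_conf(host, port=5432, user="replicator"):
    """Build recovery.conf text pointing a standby at a given upstream.

    The host and user values are placeholders for illustration only.
    """
    conninfo = f"host={host} port={port} user={user}"
    # A single quote inside a quoted parameter value is written doubled.
    quoted = conninfo.replace("'", "''")
    lines = [
        "standby_mode = 'on'",
        f"primary_conninfo = '{quoted}'",
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"
```

Keeping the file generation in one function is the kind of step that can be scripted through configuration management, as the message suggests for the rest of the setup.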
Re: [HACKERS] Streaming-only Remastering
On Fri, Jun 15, 2012 at 3:53 PM, Simon Riggs wrote:
> On 10 June 2012 19:47, Joshua Berkus wrote:
>
>> So currently we have a major limitation in binary replication, where it is
>> not possible to "remaster" your system (that is, designate the most
>> caught-up standby as the new master) based on streaming replication only.
>> This is a major limitation because the requirement to copy physical logs
>> over scp (or similar methods), manage and expire them more than doubles the
>> administrative overhead of managing replication. This becomes even more of
>> a problem if you're doing cascading replication.
>
> The "major limitation" was solved by repmgr close to 2 years ago now.
> So while you're correct that the patch to fix that assumed that
> archiving worked as well, it has been possible to operate happily
> without it.

Remastering has been one of the biggest thorns in my side over the last
year. I don't think it's a trivially mechanized issue yet, but I do need
to get there, and probably a few alterations in Postgres would help,
although I have not itemized what they are (rather, I was intending to
work around problems with what I have today).

But since it is apropos to this discussion, here's what I've been
thinking along these lines:

Instead of using re-synchronization (e.g. repmgr in its relation to
rsync), I intend to proxy and also inspect the streaming replication
traffic and then quiesce all standbys and figure out what node is
farthest ahead. Once I figure out the node that is farthest ahead, if
it is not a node that is eligible for promotion to the master, I need
to exchange its changes to nodes that are eligible for promotion[0],
and then promote one of those, repointing all other standbys to that
node. This must all take place nominally within a second or thirty.
Conceptually it is simple, but mechanically it's somewhat intense,
especially in relation to the inconvenience of doing this incorrectly.
I surmise someone could come up with supporting mechanisms to make it
less burdensome to write. One snarl is the interaction with the archive
and restore commands: Postgres might, for example, have been in the
middle of downloading and replaying a WAL segment even when I wish it
to be quiesced, and there's not a great way to stop it[1]. Ideally, I
could replace those archive/dearchive commands with software that speaks
the streaming replication protocol and just have less code involved
overall. I think that is technically possible today, but maybe could be
made easier, in particular being able to more easily chunk and align the
WAL stream into units of some kind from the streaming protocol. Maybe
it's already possible, but it will take a little thinking.

I had already written off getting this level of cohesion in the next
year (intending a detailed mix of archive_command and streaming protocol
software), but it's not something that leaves me close to satisfied by
any measure.

Furthermore, some use cases demand that, no matter what the user's
syncrep setting is, Postgres not make progress unless it has
synchronously replicated to a special piece of proxy software. This is
useful if one wants to offload the exact location and storage strategy
for crash recovery to another piece of software. That's the obvious next
step after a cohesive delegation of (de-)archiving.

So, all in all, Postgres has no great way to cohesively delegate all
WAL-persistence and WAL-restoration, and I don't know if the streaming
protocol + sync rep facilities can completely and conveniently subsume
all those use cases (but I think they probably can without enormous
modification). I think it should learn what it needs to learn to make
that happen. It might even allow the existing shell-command based
(de-)archiver to live as a contrib.
[0]: Use case: when a small standby used for some reporting happens to
be the farthest ahead.

[1]: Details: a simple touched file to no-op the restore_command is
unsatisfying, because the restore_command may have already been started
by postgres, so now you have to make your restore_command coordinate
with your streaming replication proxy software to be safe, or wait "long
enough" for a single segment to replay so that one can be assured that
the system is quiesced. I see this as an anti-feature of the current
file-based archiving strategy.

--
fdr
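The "figure out what node is farthest ahead" step in the message above can be sketched as follows. This assumes each node's WAL location has already been collected as a string in the usual "hi/lo" hexadecimal form (for example from pg_last_xlog_receive_location() on each standby, rather than from the stream-inspecting proxy the message proposes). The function and node names are hypothetical.

```python
def parse_lsn(text):
    """Turn a WAL location such as '0/3000148' into a comparable tuple."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def farthest_ahead(reported):
    """Pick the node with the greatest WAL location.

    reported: dict of node name -> WAL location string.
    """
    if not reported:
        raise ValueError("no nodes reported a position")
    return max(reported, key=lambda name: parse_lsn(reported[name]))
```

Comparing (hi, lo) tuples avoids assumptions about how the two halves combine into a single byte offset; it only orders positions, which is all this step needs. The comparison is meaningful only once the standbys are quiesced, as the message points out.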
Re: [HACKERS] Streaming-only Remastering
On Sat, Jun 16, 2012 at 6:53 AM, Simon Riggs wrote:
> On 10 June 2012 19:47, Joshua Berkus wrote:
>
>> So currently we have a major limitation in binary replication, where it is
>> not possible to "remaster" your system (that is, designate the most
>> caught-up standby as the new master) based on streaming replication only.
>> This is a major limitation because the requirement to copy physical logs
>> over scp (or similar methods), manage and expire them more than doubles the
>> administrative overhead of managing replication. This becomes even more of
>> a problem if you're doing cascading replication.
>
> The "major limitation" was solved by repmgr close to 2 years ago now.

It was solved for limited (but important) cases.

For example, repmgr does (afaik, maybe I missed a major update at some
point?) still require you to have set up ssh with trusted keys between
the servers. There are many use cases where that's not an acceptable
solution. One of the more obvious ones being when you're on Windows.

repmgr hasn't really *solved* it, it has provided a well-working
workaround...

IIRC repmgr is also GPLv3, which means that some companies just won't
look at it... Not many, but some. And it's a license that's incompatible
with PostgreSQL itself.

> So while you're correct that the patch to fix that assumed that
> archiving worked as well, it has been possible to operate happily
> without it.
>
> http://www.repmgr.org
>
> New versions for 9.2 will be out soon.

That's certainly good, but that doesn't actually solve the problem
either. It updates the good workaround.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Streaming-only Remastering
On 10 June 2012 19:47, Joshua Berkus wrote:

> So currently we have a major limitation in binary replication, where it is
> not possible to "remaster" your system (that is, designate the most caught-up
> standby as the new master) based on streaming replication only. This is a
> major limitation because the requirement to copy physical logs over scp (or
> similar methods), manage and expire them more than doubles the administrative
> overhead of managing replication. This becomes even more of a problem if
> you're doing cascading replication.

The "major limitation" was solved by repmgr close to 2 years ago now.
So while you're correct that the patch to fix that assumed that
archiving worked as well, it has been possible to operate happily
without it.

http://www.repmgr.org

New versions for 9.2 will be out soon.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Streaming-only Remastering
On 6/10/12 11:47 AM, Joshua Berkus wrote:
> So currently we have a major limitation in binary replication, where it is
> not possible to "remaster" your system (that is, designate the most caught-up
> standby as the new master) based on streaming replication only. This is a
> major limitation because the requirement to copy physical logs over scp (or
> similar methods), manage and expire them more than doubles the administrative
> overhead of managing replication. This becomes even more of a problem if
> you're doing cascading replication.
>
> Therefore I think this is a high priority for 9.3.
>
> As far as I can tell, the change required for remastering over streaming is
> relatively small; we just need to add a new record type to the streaming
> protocol, and then start writing the timeline change to that. Are there
> other steps required which I'm not seeing?

*sound of crickets chirping*

Is there other work involved which isn't immediately apparent?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Streaming-only Remastering
On Sun, Jun 10, 2012 at 11:47 AM, Joshua Berkus wrote:
> So currently we have a major limitation in binary replication, where it is
> not possible to "remaster" your system (that is, designate the most caught-up
> standby as the new master) based on streaming replication only. This is a
> major limitation because the requirement to copy physical logs over scp (or
> similar methods), manage and expire them more than doubles the administrative
> overhead of managing replication. This becomes even more of a problem if
> you're doing cascading replication.
>
> Therefore I think this is a high priority for 9.3.
>
> As far as I can tell, the change required for remastering over streaming is
> relatively small; we just need to add a new record type to the streaming
> protocol, and then start writing the timeline change to that. Are there
> other steps required which I'm not seeing?
>

Problem that may exist and is likely out of scope:
It is possible for a master with multiple slave servers to have slaves
which have not read all of the logs off of the master. It is annoying
to have to rebuild a replica because it was 1kb behind in reading logs
from the master. If the new master could deliver the last bit of the
old master's logs, that would be very nice.

--
Rob Wultsch
wult...@gmail.com
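The situation described in the message above, where a replica is slightly behind when the old master goes away, can be illustrated with a small classification sketch. It assumes every standby was fed by the same old master on the same timeline and that positions are plain byte offsets into the WAL; the function name, node names, and numbers are all hypothetical.

```python
def classify_standbys(promotion_point, standby_points):
    """Split standbys by whether the new master could bring them up to date.

    promotion_point: WAL position at which the new master was promoted.
    standby_points:  dict of standby name -> last WAL position received.
    Positions are hypothetical integers (bytes of WAL) for illustration.
    """
    missing = {}    # name -> bytes of old-master WAL the standby still needs
    diverged = []   # standbys holding WAL the new master never received
    for name, point in sorted(standby_points.items()):
        if point <= promotion_point:
            missing[name] = promotion_point - point
        else:
            diverged.append(name)
    return missing, diverged
```

A standby in `missing` only lacks a tail of the old master's log that the new master already holds, which is the case the message hopes could be served without a rebuild. A standby in `diverged` got further than the new master did and needs other handling.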
[HACKERS] Streaming-only Remastering
So currently we have a major limitation in binary replication, where it
is not possible to "remaster" your system (that is, designate the most
caught-up standby as the new master) based on streaming replication
only. This is a major limitation because the requirement to copy
physical logs over scp (or similar methods), manage and expire them more
than doubles the administrative overhead of managing replication. This
becomes even more of a problem if you're doing cascading replication.

Therefore I think this is a high priority for 9.3.

As far as I can tell, the change required for remastering over streaming
is relatively small; we just need to add a new record type to the
streaming protocol, and then start writing the timeline change to that.
Are there other steps required which I'm not seeing?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco