On 23 March 2017 at 09:39, Andres Freund <and...@anarazel.de> wrote:
> We can't just assume that snapbuild is going to work correctly when it's
> prerequisites - pinned xmin horizon - isn't working.
Makes sense.

>> What do _you_ see as the minimum acceptable way to achieve the ability
>> for a logical decoding client to follow failover of an upstream to a
>> physical standby? In the end, you're one of the main people whose view
>> carries weight in this area, and I don't want to develop yet another
>
> I think your approach here wasn't that bad? There's a lot of cleaning
> up/shoring up needed, and we probably need a smarter feedback system. I
> don't think anybody here has objected to the fundamental approach?

That's useful, thanks. I'm not arguing that the patch as it stands is ready, and I appreciate the input re the general design.

> I still think decoding-on-standby is simply not the right approach as
> the basic/first HA approach for logical rep. It's a nice later-on
> feature. But that's an irrelevant aside.

I don't really agree that it's irrelevant.

Right now Pg has no HA capability for logical decoding clients. We've now added logical replication, but it has no way to handle upstream node failure and ensure a consistent switch-over, whether to a logical or a physical replica. Since real-world servers fail or need maintenance, this is kind of a problem for practical production use.

Because logical replication serializes transactions for commit-order replay, it experiences saw-tooth replication lag: large or long xacts such as batch jobs effectively stall all later xacts until they are fully replicated. We cannot currently start replicating a big xact until it commits on the upstream, so that lag can easily be ~2x the runtime on the upstream. So while you can do sync rep on a logical standby, it tends to cause big delays on COMMITs relative to physical rep, even if apps are careful to keep transactions small. When the app DR planning people come and ask you what the max data loss window / max sync rep lag is, you have to say ".... dunno? depends on what else was running on the server at the time."
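To make the saw-tooth concrete, here's a toy model in Python. It is a hypothetical sketch, not how the PostgreSQL apply worker actually schedules work: `Xact`, `replay_commit_order` and the workload numbers are made up for illustration, and it assumes that nothing is sent before commit, that the downstream applies one xact at a time in commit order, and that applying an xact takes as long as it ran upstream.

```python
from dataclasses import dataclass


@dataclass
class Xact:
    name: str
    start: float   # when the xact began on the upstream (seconds)
    commit: float  # when it committed on the upstream (seconds)

    @property
    def runtime(self) -> float:
        return self.commit - self.start


def replay_commit_order(xacts):
    """Return {name: (applied_at, lag)} for serial, commit-ordered apply.

    Toy assumptions: nothing is sent until the xact commits, the downstream
    applies one xact at a time in commit order, and applying an xact takes
    as long as it ran on the upstream.
    """
    results = {}
    busy_until = 0.0  # when the downstream finishes its current xact
    for x in sorted(xacts, key=lambda t: t.commit):
        # Can't start before the commit arrives, nor before earlier xacts finish.
        begin_apply = max(x.commit, busy_until)
        busy_until = begin_apply + x.runtime
        results[x.name] = (busy_until, busy_until - x.commit)
    return results


# Hypothetical workload: one small xact, a 100 s batch job, then small
# xacts committing once a second right after the batch job.
workload = [Xact("early", 49.5, 50.0), Xact("batch", 0.0, 100.0)]
workload += [Xact(f"small{i}", 100 + i - 0.5, 100 + i) for i in range(1, 6)]

for name, (applied_at, lag) in replay_commit_order(workload).items():
    print(f"{name}: applied downstream at {applied_at}s, lag {lag}s")
```

Under those assumptions the early xact sees 0.5 s of lag, the batch job only becomes visible downstream at t=200 s (2x its 100 s upstream runtime, counted from when it began), and the small xacts queued behind it see 97.5 to 99.5 s of lag even though each ran for only 0.5 s.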
AFAICT, changing those things will require the ability to begin streaming reorder buffers for big xacts before commit, which, as the logical decoding on 2PC thread shows, is not exactly trivial. We'll also need to be able to apply them concurrently with other xacts on the other end. Those are both big and complex things IMO, and I'll be surprised if we can do either in Pg11, given that AFAIK nobody has even started work on either of them or has a detailed plan.

Presuming we get some kind of failover to logical replica upstreams into Pg11, it'll have significant limitations relative to what we can deliver to users by using physical replication, especially when it comes to bounded-time lag for HA, sync rep, etc. And I haven't seen a design for it, though Petr and I have discussed some of it with regard to pglogical.

That's why I think we need to do HA on the physical side first. It's going to take a long time to get equivalent functionality for logical-rep-based upstreams, and when we do, we'll still have to teach management tools and logical decoding clients other than logical rep about the new way of doing things. Whereas making physical HA setups support logical downstreams requires only relatively minor changes and gets us all the physical HA features _now_.

That's why we pursued failover slots, as a simple, minimal solution to allowing logical decoding clients to interoperate with Pg in a physical HA configuration. TBH, I still think we should just add them. Sure, they don't help us achieve decoding on standby, but they're a lot simpler, and they make Pg's behaviour with slots match user expectations for how the rest of Pg behaves, i.e. if it's on the master it'll be on the replica too. And as you've said, decoding on standby is a nice-to-have, whereas I think some kind of HA support is rather more important.
-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services