On 14 July 2016 at 03:26, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Jul 8, 2016 at 5:47 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> >> DDL is our standard way of getting things into the system catalogs.
> >> We have no system catalog metadata that is intended to be populated by
> >> any means other than DDL.
> >
> > Replication slots? (Arguably not catalogs, I guess)
> >
> > Replication origins?
>
> Those things aren't catalogs, are they? I mean, as I said in the
> other email I just sent in reply to Simon, if you did a pg_dump and a
> pg_restore, I don't think it would be useful to preserve replication
> slot LSNs afterwards. If I'm wrong, and that is a useful thing to do,
> then we should have a pg_dump flag to do it. Either way, I think we
> do have some work to do figuring out how you can dump, restore, and
> then resume logical replication, probably by establishing a new slot
> and then incrementally resynchronizing without having to copy
> unchanged rows.
Yes, I'd like that too. I'd also like to have fully parallelized
writeable queries right now. But we can't build everything all at once.
Before doing parallelized writes, things like dsm, dsm queues, group
locking, worker management, and read parallelism were all necessary.
It's the same with cluster-wide management, dump and restore of
replication state to re-create a replication setup elsewhere, etc. We
have to build the groundwork first. Trying to pour the top storey's
concrete when the bottom storey isn't even there yet isn't going to
work out.

You've argued effectively the same thing elsewhere, saying that the
pglogical submission tried to do too much and should be further cut
down. I think we're in broad agreement about the desirable direction.
What I'm trying to say is that dump and restore of a logical
replication configuration's state is way harder than you probably
expect it to be, and is not something it's realistic to do at the same
time as introducing the bare bones of logical replication. We
absolutely should dump replication set definitions, though.

> > We have no extension points for DDL.
> >
> > For function interfaces, we do.
> >
> > That, alone, makes a function based interface overwhelmingly
> > compelling unless there are specific things we *cannot reasonably
> > do* without DDL.
>
> I don't understand this. We add new DDL in new releases, and we avoid
> changing the meaning of existing DDL. Using function interfaces won't
> make it possible to change the meaning of existing syntax, and it
> won't make it any more possible to add new syntax. It will just make
> replication commands be spelled differently from everything else.

Say you want to upgrade from 9.4 to 10.0 using the new logical
replication features. How would that be possible if you can't add the
required interfaces for setting up the downstream side to 9.4 as an
extension? I think what we're leaning toward here is "don't do that".
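For concreteness, the function-style interface being argued for is the
shape logical decoding already uses today: slot management is done
through SQL-callable functions, not dedicated DDL, so extensions like
pglogical can layer on top of it. A minimal sketch with the stock
test_decoding contrib plugin:

```sql
-- Create a logical slot via the existing function interface
-- rather than new DDL syntax. test_decoding is the demo output
-- plugin shipped in contrib.
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Peek at pending changes without consuming them, then clean up.
SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);
SELECT pg_drop_replication_slot('demo_slot');
```

An extension can ship additional functions like these in any back
branch; it cannot ship new DDL syntax, which is the crux of the
extensibility argument.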
Tools like pglogical will have to carry that load until the Pg versions
with built-in replication become the "old" versions to be upgraded
_from_.

Ideally the new infrastructure won't have to make normal
(non-walsender) libpq connections and will work entirely over the
walsender protocol. That's not extensible at all, so the point becomes
kind of moot; it just can't be used for downversion upgrades. Pity, but
cleaner in the long run.

It does make me wonder if we should look at extension points for the
walsender protocol, though, now that we're likely to have a future
desire for newer versions to connect to older versions - it'd be great
if we could do something like pg_upgrade_support to allow an enhanced
logical migration from 10.0 to 11.0 by installing some extension in
10.0 first.

> > In many cases it's actively undesirable to dump and restore logical
> > replication state. Most, I'd say. There probably are cases where
> > it's desirable to retain logical replication state such that
> > restoring a dump resumes replication, but I challenge you to come
> > up with any sensible and sane way that can actually be implemented.
> > Especially since you must obviously consider the possibility of
> > both upstream and downstream being restored from dumps.
>
> Yes, these issues need lots of thought, but I think that replication
> set definitions, at least, are sensible to dump and reload.

Yes, I agree that replication set definitions should be able to be
dumped and reloaded.

> I'm concerned about dump-and-restore preserving as much state as is
> usefully possible, because I think that's critical for the user
> experience

Right. See the pain points caused by our current dump issues like the
brokenness around dumping security labels, grants, etc. on the database
itself. It certainly matters.

The keyword there is "usefully", though. Replication sets: definitely
useful.
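For anyone following along, the walsender protocol in question is what
you get on a replication connection, and its command set is fixed
grammar with no extension hook - which is exactly why it can't help
with downversion upgrades. A rough sketch of a session:

```sql
-- Open a logical replication connection:
--   psql "dbname=postgres replication=database"
-- Only replication-protocol commands are accepted on it, e.g.:

IDENTIFY_SYSTEM;
CREATE_REPLICATION_SLOT demo_slot LOGICAL test_decoding;
DROP_REPLICATION_SLOT demo_slot;
```

Contrast that with SQL-callable functions, where an extension loaded
into an old server can add whatever entry points a newer client needs.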
Knowledge about what peers we were connected to and what we were up to
on those peers: possibly useful, if we have some way to meaningfully
encode that knowledge, but far from crucial, especially since we can't
actually resume replay from them without replication slots and
replication identifiers, which we can't dump. It seems we were mostly
crossing wires about different assumptions about what dump and restore
would include.

> However, as far as sharding is concerned, no matter how it gets
> implemented, I think logical replication is a key feature.
> Postgres-XC/XL has the idea of "replicated" tables which are present
> on every data node, and that's very important for efficient
> implementation of joins. If you do a join between a little table and
> a big sharded table, you want to be able to push that down to the
> shards, and you can only do that if the entirety of the little table
> is present on every shard or by creating a temporary copy on every
> shard. In many cases, the former will be preferable. So, I think
> it's important for sharding that logical replication is fully
> integrated into core in such a manner as to be available as a
> building block for other features.

Yep. I've been looking at ways to integrate pglogical into XL, but
it's very far from easy. It's one area where moving it into core would
be helpful.

> At the least, I'm guessing that we'll want a way for whatever code is
> planning join execution to figure out which tables have up-to-date
> copies on servers that are involved in the query. As far as the
> FDW-based approach to sharding is concerned, one thing to think about
> is whether postgres_fdw and logical replication could share one
> notion of where the remote servers are.

Yeah, that brings us back to the whole node management concept. I
think there are a few folks doing some preliminary work on that who
may be able to chime in.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
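P.S. The "notion of where the remote servers are" that postgres_fdw
already maintains is its foreign server catalog. A sketch of what that
metadata looks like today (host, table, and role names are purely
illustrative); the node-management question is whether logical
replication subscriptions could reference the same server objects
rather than duplicating connection info:

```sql
CREATE EXTENSION postgres_fdw;

-- Connection metadata for one shard, stored in pg_foreign_server.
CREATE SERVER shard1
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', port '5432', dbname 'app');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER shard1
    OPTIONS (user 'app_user', password 'secret');

-- A remote table exposed locally via that server definition.
CREATE FOREIGN TABLE little_table (id int, label text)
    SERVER shard1
    OPTIONS (table_name 'little_table');
```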