The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested


This reply covers a 10,000-foot review of the feature (some of my other
replies to the thread cover specifics that came up in testing; code-level
review will come later).

1) Do we want logical replication in core/contrib

Ten years ago a popular feeling in the PostgreSQL project was that replication
didn't belong in core because there were too many different styles.  People
then went on to complain that there were too many replication projects to
choose from and that they were hard to use and had lots of corner cases.  The
evolution of WAL-based replication showed us how popular in-core replication
is.  Users like being able to use in-core features, and our community process
tends to produce better quality in-core solutions than external projects.

I am of the opinion that if we can come up with a solution that meets some
common use cases then it would be good to have those features in core/contrib.
At this stage I am not going to get into a discussion of a contrib extension
versus a built-in, non-extension solution.  I don't think a single replication
solution will ever meet all use cases.  I feel that the extensible
infrastructure we have so far built for logical replication means that people
who want to develop solutions for use cases not covered will be in a good
position.  This doesn't mean we can't or shouldn't try to cover some use cases
in core.


2) Does this patch provide a set of logical replication features that meet many 
popular use-cases

Below I will review some use-cases and try to assess how pglogical meets them.

 ** Streaming Postgresql Upgrade

pg_upgrade is great for many situations, but sometimes you don't want an
in-place upgrade; you want a streaming upgrade instead.  Possibly because you
don't want application downtime and would rather point your applications at
the upgraded database server in a controlled manner.  Other times you might
want to upgrade to a newer version of PG while retaining the option of rolling
back to the older version if things go badly.

I think pglogical should be able to handle this use case pretty well (assuming
the source version of PG is actually new enough to include pglogical).
Support for replicating sequences would need to be added before this is
smooth, but once sequence support is added I think this would work well.
I also don't see any reason why you couldn't replicate from 9.7 -> 9.6,
since the wire format is abstracted from the internal representation.  This is
of course dependent on the application not doing anything that is inherently
incompatible between the two versions.


** Query only replicas (with temp tables or additional indexes)

Sometimes you want a replica for long-running or heavy queries.  Requirements
for temp tables, additional indexes, or maybe the effect on vacuum mean that
our existing WAL-based replicas are unsuitable.

I think pglogical should be able to handle this use case pretty well with the 
caveat being that your replica is an asynchronous replica and will always lag
the origin by some amount.

** Replicating a subset of tables into a different database

Sometimes you want to replicate a handful of tables from one database to
another.  Maybe the first database is the system of record for the data and
the second database needs an up-to-date copy for querying.

Pglogical should meet this use case pretty well; it has flexible support for
selecting which tables get replicated from which source.  Pglogical doesn't
have any facilities to rename the tables between the origin and replica, but
those could be added later.
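For illustration, selecting a subset of tables with the replication-set
functions might look something like this (the object names here are invented,
and exact argument defaults should be checked against the patch):

```sql
-- On the origin: create a set and add the tables to replicate.
SELECT pglogical.create_replication_set(set_name := 'reporting');
SELECT pglogical.replication_set_add_table(
    set_name := 'reporting',
    relation := 'public.orders',
    synchronize_data := true);

-- On the replica: subscribe to just that set.
SELECT pglogical.create_subscription(
    subscription_name := 'reporting_sub',
    provider_dsn := 'host=origin dbname=sales',
    replication_sets := '{reporting}');
```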

** Sharding

Systems that do application-level sharding (or even sharding with an FDW)
often have non-sharded tables that need to be available on all shards for
referential integrity or joins.  Logical replication is one way to make sure
that the replicated data gets to all the shards.  Sharding systems also
sometimes want to take the data from individual shards and replicate it to a
consolidation server for reporting purposes.

Pglogical seems to meet this use case.  I guess you would have a designated
origin for the shared/global data that all shards would subscribe to, with a
set containing the designated data.  For the consolidation use case you would
have the consolidation server subscribe to all shards.

I am less clear about how someone would want DDL changes to work for these 
cases.  The DDL support in the patch is pretty limited so I am not going to 
think much now about how we would want DDL to work.


** Schema changes involving rewriting big tables

Sometimes you have a DDL change on a large table that will involve a table
rewrite, and the best way of deploying the change is to make the DDL change
on a replica, then once it is finished promote the replica to the origin in
some controlled fashion.  This avoids having to lock the table on the origin
for hours.

pglogical seems to allow minor schema changes on the replica, such as changing
a type, but it doesn't seem to allow a DO INSTEAD trigger on the replica.  I
don't think pglogical currently meets this use case particularly well.



** Failover

WAL replication is probably a better choice for someone just looking for
failover support from replication.  Someone who is looking at pglogical for
failover-related use cases probably has one or more of the other use cases I
mentioned and wants a logical node to take over for a failed origin.  If a
node fails you can take some of the remaining subscribers and have them
resubscribe to one of the remaining nodes, but there is no support for
a) figuring out which of the remaining nodes is furthest ahead, or
b) letting the subscribers figure out which updates from the old origin are
missing and getting them from a surviving node (they can truncate and re-copy
the data, but that might be very expensive).

I am not sure what would be involved in taking a streaming WAL replica and
having it stand in for the failed node.

The lack of sequence replication would also make failing over to a pglogical
replica awkward.

** Geographically distributed applications

Sometimes people have databases in different geographical locations and they
want to perform queries and writes locally but replicate all the data to all
the other locations.  This is a multi-master, eventually consistent use case.

The lack of sequence support would be an issue for these use cases.  I think
you could also only configure the cluster as a fully connected grid (with
forward_origins='none').  In a lot of deployments you would want some amount
of cascading and structure, which isn't yet supported.  I also suspect that
managing a grid cluster with more than a handful of nodes will be unwieldy
(particularly compared to some of the eventually consistent NoSQL
alternatives).
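As a sketch, a fully connected two-node grid would pair two mirror-image
subscriptions, each created with the non-forwarding setting mentioned above
(node names and DSNs are invented, and the exact spelling of the forwarding
parameter should be checked against the patch):

```sql
-- On node A: apply only changes that originated on node B, so A's own
-- writes coming back through B are not applied a second time.
SELECT pglogical.create_subscription(
    subscription_name := 'from_b',
    provider_dsn := 'host=node_b dbname=app',
    forward_origins := 'none');
-- A mirror-image subscription pointing at node A is created on node B.
```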

The features BDR has that were removed for pglogical are probably really
useful for this use case (which I think was the original BDR use case).

** Scaling across multiple machines

Sometimes people ask for replication systems that let them support more load
than a single database server can handle, but with consistency.  The previous
use case applies if you want eventual consistency; this use case is for
situations where you want something stronger than eventual consistency.

I don't think pglogical is intended to address this.



3) Do we like the design of pglogical

I like the fact that background workers are used instead of external daemon
processes.
I like the fact that you can configure almost everything through SQL.
I like that the output plugin is separate, because this patch is big enough as
it is and I can see uses for it other than pglogical.

The core abstractions are:
 * Nodes - every database in the pglogical cluster is a node
 * Sets - a collection of tables that behave similarly
 * Subscriptions - a link between a provider and a replica.  There can only be
one subscription between a provider and replica
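A minimal setup touching all three abstractions might look like the following
(the names and DSNs are invented for illustration):

```sql
-- Node: each participating database registers itself once.
SELECT pglogical.create_node(
    node_name := 'provider1',
    dsn := 'host=provider1 dbname=app');

-- Set: group tables that should behave similarly.
SELECT pglogical.replication_set_add_table('default', 'public.accounts');

-- Subscription: the single link between this provider and a replica
-- (run on the subscriber side).
SELECT pglogical.create_subscription(
    subscription_name := 'sub1',
    provider_dsn := 'host=provider1 dbname=app');
```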

Metadata is not transferred between nodes.  What I mean by this is that nodes
don't have a global view of the cluster; they know about their own
subscriptions but nothing else.  This is different from a system like Slony,
where sl_node and sl_subscription contain a global view of your cluster state.
Not sending metadata to all nodes in the cluster simplifies a bunch of things
(you don't have to worry about sending metadata around, or whether a given
piece of metadata is stale), but the downside is that I think the tooling to
perform a lot of cluster reconfiguration operations will need to be a lot
smarter.

Petr and Craig, have you thought about how you might support getting the
cluster back into a sane state after a node fails, with minimal pain?
A lot of the reason why Slony needs all this metadata is to support that kind
of thing.  I don't think we need this for the first version, but it would be
nice to know that the design could accommodate such a thing.


4) Do we like the syntax

I think the big debate about syntax is whether we want functions or pure SQL
(i.e. CREATE SUBSCRIPTION default1 provider_dsn=...).  If we want this as an
extension then it needs to be functions.  I support this decision; I think
keeping pglogical as an extension is the right tradeoff.  In a few releases we
can always add SQL syntax if we want it to no longer be an extension.
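To make the tradeoff concrete, here is the function form next to a
hypothetical dedicated syntax (the DDL form below is invented and is not part
of the patch):

```sql
-- Function interface, usable from a plain extension:
SELECT pglogical.create_subscription(
    subscription_name := 'default1',
    provider_dsn := 'host=origin dbname=app');

-- Hypothetical built-in syntax, which would require core grammar changes:
-- CREATE SUBSCRIPTION default1 PROVIDER DSN 'host=origin dbname=app';
```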

If I have any comments on the names or arguments of individual functions I
will send them later.


5) Is the maintenance burden of this patch too high

The contrib module is big, significantly bigger than most (all?) of the other
contrib modules, and that doesn't include the output plugin.  I see a lot of
potential use cases that I think pglogical can (or will eventually) be able to
handle, and I think that justifies the maintenance burden.  If others disagree
they should speak up.

I am concerned about testing.  I don't think the .sql-based regression tests
are going to adequately test a replication system that supports concurrent
activity on different databases/servers.  I remember hearing talk about a
Python-based test suite that was rejected in another thread.  Having Perl
tests that use DBI has also been rejected.

The other day Tom made this comment as part of the 'Releasing in September' 
thread:

---
I do not think we should necessarily try to include every testing tool
in the core distribution.  What is important is that they be readily
available: easy to find, easy to use, documented, portable.
-----

The standard contrib "make check" tests need to do some testing of pglogical,
but I think it will require testing much more thorough than what we can get
with the in-core test tooling.  I think this is a good case for looking at
test tooling that doesn't live in core.

Overall I am very impressed with pglogical and see a lot of potential.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers