On 12-06-15 04:03 PM, Robert Haas wrote:
> On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> I don't plan to throw in loads of conflict resolution smarts. The aim is to get
>> to the place where all the infrastructure is there so that a MM solution can
>> be built by basically plugging in a conflict resolution mechanism. Maybe
>> providing a very simple one.
>> I think without in-core support it's really, really hard to build a sensible MM
>> implementation. Which doesn't mean it has to live entirely in core.
> Of course, several people have already done it, perhaps most notably Bucardo.
>
> Anyway, it would be good to get opinions from more people here.  I am
> sure I am not the only person with an opinion on the appropriateness
> of trying to build a multi-master replication solution in core or,
> indeed, the only person with an opinion on any of these other issues.

This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change records that can be replayed on other databases, without relying on triggers and log tables, would be good to have.
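To make that a bit more concrete, here is a very rough sketch in C (entirely hypothetical; these names are mine and are not meant to match the structures in Andres's patch set) of the sort of information such a change record would need to carry so it can be replayed elsewhere without triggers or log tables on the origin:

/*
 * Purely illustrative sketch -- not the structures from the patch set.
 * The point is that a logical change record (LCR) carries logical
 * information (table name, column names, values) rather than physical
 * page/offset data.
 */
#include <stdint.h>

typedef enum LCRAction
{
    LCR_INSERT,
    LCR_UPDATE,
    LCR_DELETE
} LCRAction;

typedef struct LogicalChangeRecord
{
    LCRAction    action;          /* insert, update or delete */
    uint32_t     origin_node_id;  /* node the change originated on */
    uint64_t     origin_xid;      /* originating transaction id */
    const char  *schema_name;     /* table identified by name, not OID, */
    const char  *table_name;      /*   so the destination may differ    */
    int          natts;           /* number of columns captured */
    const char **att_names;       /* column names */
    const char **new_values;      /* textual new values (NULL for DELETE) */
    const char **key_values;      /* old key values for UPDATE/DELETE */
} LogicalChangeRecord;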

I think we want something flexible enough that people can write consumers of the LCRs to do conflict resolution for multi-master, but I am not sure that conflict resolution support actually belongs in core.

Most of the complexity of Slony (both in terms of lines of code and the issues people encounter using it) comes not from the log triggers or the replay of the logged data, but from the configuration of the cluster.
Controlling things like

* Which tables replicate from a node to which other nodes
* How the cluster configuration is changed on a running system (adding nodes, removing nodes, moving the origin of a table, adding tables to replication, etc.)

This is the harder part of the problem. I think we need to first get the infrastructure the current patch set deals with (capturing, transporting and translating the LCRs into the system) committed before we get too caught up in the configuration aspects. I think we will have a hard time agreeing on behaviours for some of that other stuff that are both flexible enough for most use cases and simple enough for administrators. I'd like to see in-core support for a lot of that stuff, but I'm not holding my breath.

> It is not good for those other opinions to be saved for a later date.

>> Hm. Yes, you could do that. But I have to say I don't really see a point.
>> Maybe the fact that I do envision multimaster systems at some point is
>> clouding my judgement though as it's far less easy in that case.
> Why?  I don't think that particularly changes anything.
>
>> It also complicates the WAL format as you now need to specify whether you
>> transport a full or a primary-key-only tuple...
> Why?  If the schemas are in sync, the target knows what the PK is
> perfectly well.  If not, you're probably in trouble anyway.



>> I think though that we do not want to enforce that mode of operation for
>> tightly coupled instances. For those I was thinking of using command triggers
>> to synchronize the catalogs.
>> One of the big screwups of the current replication solutions is exactly that
>> you cannot sensibly do DDL which is not a big problem if you have a huge
>> system with loads of different databases and very knowledgeable people et al.
>> but at the beginning it really sucks. I have no problem with making one of the
>> nodes the "schema master" in that case.
>> Also I would like to avoid the overhead of the proxy instance for use-cases
>> where you really want one node replicated as fully as possible with the slight
>> exception of being able to have summing tables, different indexes et al.
> In my view, a logical replication solution is precisely one in which
> the catalogs don't need to be in sync.  If the catalogs have to be in
> sync, it's not logical replication.  ISTM that what you're talking
> about is sort of a hybrid between physical replication (pages) and
> logical replication (tuples) - you want to ship around raw binary
> tuple data, but not entire pages.  The problem with that is it's going
> to be tough to make robust.  Users could easily end up with answers
> that are total nonsense, or probably even crash the server.


I see three catalogs in play here.
1. The catalog on the origin
2. The catalog on the proxy system (this is the catalog used to translate the WAL records to LCRs). The proxy system will need essentially the same pgsql binaries (same architecture, important compile flags, etc.) as the origin.
3. The catalog on the destination system(s).

Catalog 2 must be in sync with catalog 1; catalog 3 shouldn't need to be in sync with catalog 1. I think catalogs 2 and 3 are combined in the current patch set (though I haven't yet looked at the code closely). I think the performance optimizations Andres has implemented to update tuples through low-level functions should be left for later, and that we should be generating SQL in the apply cache so we don't start assuming much about catalog 3.
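To illustrate what I mean by generating SQL in the apply cache, here is an equally rough sketch (again hypothetical, reusing the record layout I sketched above rather than anything from the patch set) of turning an insert LCR into SQL text, so that apply goes through the normal parser/executor and makes no low-level assumptions about catalog 3:

/*
 * Illustrative sketch only, building on the LogicalChangeRecord sketch
 * above.  Rendering the change as SQL means catalog 3 only has to be
 * logically compatible, not binary-identical, with catalog 1.  Quoting
 * is deliberately naive; a real implementation would use the server's
 * quote_identifier()/quote_literal() facilities.
 */
#include <stdio.h>

static void
lcr_to_insert_sql(const LogicalChangeRecord *lcr, char *buf, size_t buflen)
{
    size_t  off = 0;
    int     i;

    off += snprintf(buf + off, buflen - off, "INSERT INTO \"%s\".\"%s\" (",
                    lcr->schema_name, lcr->table_name);
    for (i = 0; i < lcr->natts && off < buflen; i++)
        off += snprintf(buf + off, buflen - off, "%s\"%s\"",
                        i > 0 ? ", " : "", lcr->att_names[i]);
    if (off < buflen)
        off += snprintf(buf + off, buflen - off, ") VALUES (");
    for (i = 0; i < lcr->natts && off < buflen; i++)
        off += snprintf(buf + off, buflen - off, "%s'%s'",
                        i > 0 ? ", " : "", lcr->new_values[i]);
    if (off < buflen)
        snprintf(buf + off, buflen - off, ");");
}

For a one-column insert this would produce something along the lines of INSERT INTO "public"."accounts" ("id") VALUES ('1'); which catalog 3 can execute even if its physical layout differs from catalog 1.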

> guarantee.  And, without such a guarantee, I don't believe that we can
> create a high-performance, robust, in-core replication solution.


Part of what people expect from a robust in-core solution is that it should work with the other in-core features. If we have to list a bunch of in-core types as being incompatible with logical replication, then people will look at logical replication with the same 'there be dragons here' attitude that scares many people away from the existing third-party replication solutions. Non-core or third-party user-defined types are a slightly different matter, because we can't control what they do.


Steve

