Ok, what about the following plan:
1. Support custom WAL records (as far as I know 2ndQuadrant has such patch).
2. Add one more function to logical decoding allowing to deal with
custom records.
So the idea is that we somehow record DDL in WAL (for example using
executor hook),
then them are proceeded using logical decoding, calling special logical
deocding plugin function to handle this records.
For example we can store DDL in WAL just as SQL statements and so easily
replay them.
In this case DDL will be replicated using the same mechanism and through
the same channel as DML.
On 17.02.2016 12:16, Craig Ringer wrote:
On 17 February 2016 at 16:24, Konstantin Knizhnik
<k.knizh...@postgrespro.ru <mailto:k.knizh...@postgrespro.ru>> wrote:
Thanks for your explanation. I have to agree with your arguments
that in general case replication of DDL statement using logical
decoding seems to be problematic. But we are mostly considering
logical decoding in quite limited context: replication between two
identical Postgres database nodes (multimaster).
Yep, much like BDR. Where all this infrastructure came from and is/was
aimed at.
Do you think that it in this case replication of DLL can be done
as sequence of low level operations with system catalog tables
including manipulation with locks?
No.
For one thing logical decoding doesn't see catalog tuple changes right
now. Though I imagine that could be changed easily enough.
More importantly - oids. You add a column to a table:
ALTER TABLE mytable ADD COLUMN mycolumn some_type UNIQUE NOT NULL
DEFAULT some_function()
This writes to catalogs including:
pg_attribute
pg_constraint
pg_index
pg_class (for the index relation)
... probably more. It also refers to pg_class (for the definition of
mytable), pg_type (definition of some_type), pg_proc (definition of
some_function), the b-tree operator class for some_type in pg_opclass,
the b-tree indexam in pg_am, ... more.
Everything is linked by oids, and the oids are all node local. You
can't just blindly re-use them. If "some_type" is hstore, the oid of
hstore in pg_type might be different on the upstream and downstream.
The only exception is the oids of built-in types and even then that's
not guaranteed across major versions.
So if you blindly replicate catalog row changes you'll get a horrible
mess. That's before considering a table's relfilenode, which is
initially the same as its oid, but subject to change if truncated or
rewritten.
To even begin to do this half-sanely you'd have to maintain a mapping
of upstream object oids->names on the downstream, with invalidations
replicated from the upstream. That's only the beginning. There's
handling of extensions and lots more fun.
So in your example with ALTER TABLE statement, can we correctly
replicate it to other nodes
as request to set exclusive lock + some manipulations with catalog
tables and data table itself?
Nope. No hope, not unless "some manipulations with catalog tables and
data table its self" is a lot more comprehensive than I think you mean.
1. Add option whether to include operations on system catalog
tables in logical replication or not.
I would like to have this anyway.
2. Make it possible to replicate lock requests (can be useful not
only for DDLs)
I have no idea how you'd even begin to do that.
I looked how DDL was implemented in BDR and did it in similar way
in our multimaster.
But it is awful: we need to have two different channels for
propagating changes.
Yeah, it's not beautiful, but maybe you misunderstood something? The
DDL is written to a table, and that table's changes are replayed along
with everything else. It's consistent and keeps DDL changes as part of
the xact that performed them. Maybe you misunderstood how it works in
BDR and missed the indirection via a table?
Additionally, in multimaster we want to enforce cluster wide ACID.
It certainly includes operations with metadata. It will be very
difficult to implement if replication of DML and DDL is done in
two different ways...
That's pretty much why BDR does it this way, warts and all. Though it
doesn't offer cluster-wide ACID it does need atomic commit of xacts
that may contain DML, DDL, or some mix of the two.
Let me ask one more question concerning logical replication: how
difficult it will be from your point of view to support two phase
commit in logical replication? Are there some principle problems?
I haven't looked closely yet. Andres will know more.
I very, very badly want to be able to decode 2PC prepared xacts myself.
The main issue I'm aware of is locking - specifically the inability to
impersonate another backend and treat locks held by that backend
(which might be a fake backend for a pg_prepared_xacts entry) as held
by ourselves for the purpose of being able to access relations, etc.
The work Robert is doing on group locking looks absolutely ideal for
this, but won't land before 9.7.
(Closely related, I also want to be able to hook into commit and
transform a normal COMMIT into a PREPARE TRANSACTION, <do some stuff>,
COMMIT PREPARED with the application that issued the commit none the
wiser. This will allow pessimistic 2PC-based conflict handling for
must-succeed xacts like those that do DDL).
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company