On Dec 3, 2015, at 4:09 AM, Craig Ringer wrote:

> On 1 December 2015 at 00:20, Konstantin Knizhnik <k.knizh...@postgrespro.ru> 
> wrote:
> 
> > We have implemented ACID multimaster based on logical replication and our 
> > DTM (distributed transaction manager) plugin.
> 
> What are you using for an output plugin and for replay?

I have implemented an output plugin for multimaster based on Michael's 
decoder_raw+receiver_raw.
Right now it decodes WAL into the corresponding SQL insert/update statements.
That is certainly a very inefficient approach, and in the future I will replace 
it with some binary protocol, as is used for example in BDR
(though the BDR plugin contains a lot of machinery for detecting and handling 
conflicts which is not relevant for multimaster).
But right now the performance of multimaster is not limited by the logical 
replication protocol - if I remove the DTM and use asynchronous replication 
(a lightweight version of BDR :)
then I get 38k TPS instead of 12k.


> 
> I'd really like to collaborate using pglogical_output if at all possible. 
> Petr's working really hard to get the pglogical downstream out too, with me 
> helping where I can.
> 
> I'd hate to be wasting time and effort working in parallel on overlapping 
> functionality. I did a LOT of work to make pglogical_output extensible and 
> reusable for different needs, with hooks used heavily instead of making 
> things specific to the pglogical downstream. A protocol documented in detail. 
> A json output mode as an option. Parameters for clients to negotiate options. 
> etc.
> 
> Would a different name for the upstream output plugin help?


And where can I get the pglogical_output plugin? Sorry, but I can't quickly 
find a reference with Google...
I also wonder whether this plugin performs DDL replication (most likely not). 
Which raises the naive question - why was DDL excluded from the logical 
replication protocol?
Are there some fundamental problems with it? In BDR it is handled in an 
alternative way, using an executor callback. It would be much easier if DDL 
could be replicated in the same way as normal SQL statements.


>  
> > And according to 2ndQuadrant's results, BDR performance is very close to 
> > hot standby.
> 
> Yes... but it's asynchronous multi-master. Very different to what you're 
> doing.
>  
> > I wonder whether this is a fundamental limitation of the logical 
> > replication approach, which is efficient only for asynchronous 
> > replication, or whether it can be tuned/extended to efficiently support 
> > synchronous replication?
> 
> I'm certain there are improvements to be made for synchronous replication.
> 
> > We have also considered alternative approaches:
> > 1. Statement based replication.
> 
> Just don't go there. Really.
>  
> > It seems to be better to have one connection between nodes, but provide 
> > parallel execution of received transactions at the destination side.
> 
> I agree. This is something I'd like to be able to do through logical 
> decoding. As far as I can tell there's no fundamental barrier to doing so, 
> though there are a few limitations when streaming logical xacts:
> 
> - We can't avoid sending transactions that get rolled back
> 
> - We can't send the commit timestamp, commit LSN, etc at BEGIN time, so 
> last-update-wins
>   conflict resolution can't be done based on commit timestamp
> 
> - When streaming, the xid must be in each message, not just in begin/commit.
> 
> - The apply process can't use the SPI to apply changes directly since we 
> can't multiplex transactions. It'll need to use
>   shmem to communicate with a pool of workers, dispatching messages to 
> workers as they arrive. Or it can multiplex
>   a set of libpq connections in async mode, which I suspect may prove to be 
> better.
> 
> I've made provision for streaming support in the pglogical_output extension. 
> It'll need core changes to allow logical decoding to stream changes though.
> 
> Separately, I'd also like to look at decoding and sending sequence advances, 
> which are something that happens outside transaction boundaries.
> 
>  
> > We now have in PostgreSQL some infrastructure for background workers, but 
> > there is still no abstraction of a worker pool and job queue providing a 
> > simple way to organize parallel execution of jobs. I wonder if somebody is 
> > working on this now, or should we try to propose our solution?
> 
> I think a worker pool would be quite useful to have.
> 
> For BDR and for pglogical we had to build an infrastructure on top of static 
> and dynamic bgworkers. A static worker launches a dynamic bgworker for each 
> database. The dynamic bgworker for the database looks at extension-provided 
> user catalogs to determine whether it should launch more dynamic bgworkers 
> for each connection to a peer node.
> 
> Because the bgworker argument is a single by-value Datum the argument passed 
> is an index into a static shmem array of structs. The struct is populated 
> with the target database oid (or name, for 9.4, due to bgworker API 
> limitations) and other info needed to start the worker.
> 
> Because registered static and dynamic bgworkers get restarted by the 
> postmaster after a crash/restart cycle, and the restarted static worker will 
> register new dynamic workers after restart, we have to jump through some 
> annoying hoops to avoid duplicate bgworkers. A generation counter is stored 
> in postmaster memory and incremented on crash recovery then copied to shmem. 
> The high bits of the Datum argument to the workers embeds the generation 
> counter. They compare their argument's counter to the one in shmem and exit 
> if the counter differs, so the relaunched old generation of workers exits 
> after a crash/restart cycle. See the thread on BGW_NO_RESTART_ON_CRASH for 
> details.
> 
> In pglogical we're instead using BGW_NEVER_RESTART workers and doing restarts 
> ourselves when needed, ignoring the postmaster's ability to restart bgworkers 
> when the worker crashes.
> 
> It's likely that most projects using bgworkers for this sort of thing will 
> need similar functionality, so generalizing it into a worker pool API makes a 
> lot of sense. In the process we could really use an API to examine currently 
> registered and running bgworkers. Interested in collaborating on that?
> 
> Another thing I've wanted as part of this work is a way to get a one-time 
> authentication cookie from the server that can be passed as a libpq 
> connection option to get a connection without having to know a password or 
> otherwise mess with pg_hba.conf. Basically a way to say "I'm a bgworker 
> running with superuser rights within Pg, and I want to make a libpq 
> connection to this database. I'm inherently trusted, so don't mess with 
> pg_hba.conf and passwords, just let me in".
> 
> -- 
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
