Re: [HACKERS] Slow synchronous logical replication

Konstantin Knizhnik Sat, 07 Oct 2017 13:00:06 -0700

On 10/07/2017 10:42 PM, Andres Freund wrote:

Hi,


On 2017-10-07 22:39:09 +0300, konstantin knizhnik wrote:

In our sharded cluster project we are trying to use logical replication to
provide HA (maintaining redundant shard copies).
Asynchronous logical replication makes little sense in the context of HA,
which is why we are trying to use synchronous logical replication.
Unfortunately it shows very bad performance. With 50 shards and a redundancy
level of 1 (just one copy), the cluster is 20 times slower than without logical
replication.
With asynchronous replication it is "only" two times slower.

As far as I understand, the reason for such bad performance is that the synchronous replication
mechanism was originally developed for streaming replication, where all replicas have the same
content and LSNs. When it is used for logical replication, it behaves very inefficiently.
A commit has to wait for confirmations from all receivers mentioned in the
"synchronous_standby_names" list. So we are waiting not only for our own single
logical replication standby, but for all other standbys as well. The number of synchronous standbys is
equal to the number of shards divided by the number of nodes. To provide a uniform distribution, the number of
shards should be much greater than the number of nodes; for example, for 10 nodes we usually create 100
shards. As a result we get awful performance, and blocking of any replication channel blocks all
backends.
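
The arithmetic above can be illustrated with a small sketch. It is only a toy model of the behaviour as described in this message (a commit waits for every listed standby); the function names and latency figures are hypothetical and are neither PostgreSQL APIs nor measurements.

```python
# Toy model only: names and numbers are hypothetical illustrations,
# not PostgreSQL APIs or measured values.

def sync_standbys_per_node(n_shards: int, n_nodes: int, redundancy: int = 1) -> int:
    """Standbys each node waits for, using the estimate from the text
    (number of shards divided by number of nodes, per redundant copy)."""
    return (n_shards * redundancy) // n_nodes


def commit_wait(latencies_ms, own_index=None):
    """Time a commit spends waiting for confirmations.

    own_index=None: wait for every listed standby, as described above,
    so the slowest channel dominates.
    own_index=i: wait only for the standby of the shard that was modified.
    """
    if own_index is None:
        return max(latencies_ms)
    return latencies_ms[own_index]


standbys = sync_standbys_per_node(n_shards=100, n_nodes=10)
# Assumed confirmation latencies: all channels healthy except one stalled.
latencies = [2.0] * (standbys - 1) + [500.0]

print(standbys)                             # 10
print(commit_wait(latencies))               # 500.0
print(commit_wait(latencies, own_index=0))  # 2.0
```

Under these assumptions a single stalled channel sets the commit time for every backend, while waiting only for the shard's own standby would not be affected by it.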

So my question is whether my understanding is correct and synchronous logical
replication cannot be used efficiently in this manner.
If so, the next question is how difficult it would be to make the synchronous
replication mechanism more efficient for logical replication, and are there any
plans to work in this direction?

This seems to be a question that is a) about a commercial project we
don't know much about b) hasn't received a lot of investigation.

Sorry if I was not clear.
The question was about the logical replication mechanism in the mainstream version of
Postgres.
I think that most people are using asynchronous logical replication, and
synchronous LR is something exotic and not well tested or investigated.
It would be great if I am wrong :)

Concerning our sharded cluster (pg_shardman): it is not a commercial product
yet, it is in the development phase.
We are going to open its sources when it is more or less stable.
But unlike multimaster, this sharded cluster is mostly built from existing
components: pg_pathman + postgres_fdw + logical replication.
So we are just trying to combine them all into an integrated system.
Currently the least clear part is logical replication.

The main goal of my e-mail was to learn the opinion of the authors and users of
LR on whether it is a good idea to use LR to provide fault tolerance in a sharded
cluster.
Or are some other approaches, for example sharding with redundancy or using
streaming replication, preferable?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


