Hello Andrey,

I went through the thread for your patch and it seems to me to be an acceptable solution...

> The only case the patch does not handle is a sudden backend crash - Postgres will recover without a restart.

We also use an HA tool (Patroni). If the whole machine fails, it will find a new master and it should be OK. We use a 4-node setup (2 sync replicas and 1 async from every replica). If there is an issue just with a sync replica (the async operating normally) and the master fails completely in this situation, it will be solved by Patroni (the async replica becomes the new sync), but if it is just the backend process that crashes, the master will not fail over and the changes will still be visible...
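
Just to make the topology concrete, here is a minimal sketch of how such a "2 sync plus 1 potential sync" priority list can be expressed in plain PostgreSQL (Patroni typically manages synchronous_standby_names itself when its synchronous mode is enabled, so this is only an illustration); the standby names node_a, node_b and node_c are hypothetical:

    -- Hypothetical standby names; an illustration of a FIRST 2 priority list,
    -- not our actual Patroni-managed configuration.
    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 2 (node_a, node_b, node_c)';
    -- Commits wait for the two highest-priority connected standbys;
    -- node_c is a potential sync standby that takes over if node_a or node_b drops out.
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';  -- or 'on', depending on requirements
    SELECT pg_reload_conf();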

If the sync replica outage is temporary, it will resolve itself once the node establishes a replication slot again... If the outage is "long", Patroni will remove the "old" sync replica from the cluster and the async replica reading from the master will become the new sync. So yes... In a 2-node setup this can be an issue, but in a 4-node setup this seems to me like a solution. The only situation I can imagine is when the client connections use a different network than the replication network, and the replication network goes down completely while the client network stays up. In that case, the master can become an "isolated island" and, if it fails, we can lose the changed data. Is this situation also covered by your model: "transaction effects should not be observable on primary until requirements of synchronous_commit are satisfied"?
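
As a side note on observing that guarantee in practice, here is a small sketch of the stock monitoring views I would look at (nothing patch-specific; the column and wait-event names assume a reasonably recent PostgreSQL):

    -- Backends whose COMMIT is currently blocked waiting for a sync standby:
    SELECT pid, state, wait_event_type, wait_event
      FROM pg_stat_activity
     WHERE wait_event = 'SyncRep';

    -- Which standbys the primary currently treats as sync / potential / async:
    SELECT application_name, state, sync_priority, sync_state
      FROM pg_stat_replication;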

Do you agree with my thoughts?

Maybe it would be possible to implement this in PostgreSQL with a note in the documentation that a multinode (>=3 nodes) cluster is necessary.

Regards
Ondrej

On 22/04/2021 05:55, Andrey Borodin wrote:

Hi Ondrej!

On 19 Apr 2021, at 22:19, Ondřej Žižka <ondrej.zi...@stratox.cz> wrote:

Do you think that it would be possible to implement a process that would solve
this use case?
Thank you
Ondrej

Feel free to review the patch fixing this at [0]. It's classified as "Server
Features", but I'm sure it's a bug fix.

Yandex.Cloud PG has been running with this patch for more than half a year,
because we cannot afford losing data in HA clusters.

It's a somewhat incomplete solution, because a PG restart or crash recovery will
make waiting transactions visible. But we protect against this on the HA tool's side.

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/33/2402/

