Re: Disallow cancellation of waiting for synchronous replication

Maksim Milyutin Wed, 25 Dec 2019 01:35:08 -0800

On 21.12.2019 00:19, Tom Lane wrote:

>> There is still a problem when backend is not canceled, but terminated [2].
>
> Exactly.  If you don't have a fix that handles that case, you don't have
> anything.  In fact, you've arguably made things worse, by increasing the
> temptation to terminate or "kill -9" the nonresponsive session.

I assume that the termination of a backend that causes termination of the PostgreSQL instance in Andrey's patch proposal has to be handled by external HA agents, which could intercept such terminations as the parent process of the postmaster and make appropriate decisions, e.g., restart the PostgreSQL node in a state closed to external users (via pg_hba.conf manipulation) until all sync replicas have synchronized changes from the master. The Stolon HA tool implements this strategy [1]. This logic (waiting until all replicas declared in synchronous_standby_names have replicated all WAL from the master) could be implemented inside the PostgreSQL kernel, after the recovery process starts and before the database is opened to users, and this can be done separately later.
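
To make the "keep the node closed until the sync replicas catch up" decision concrete, here is a minimal sketch of the check such an agent might run. It assumes the agent has already collected the master's last WAL position and each standby's reported position as text LSNs (for example from pg_stat_replication). The function names and the input shapes are hypothetical and are not taken from Stolon or from Andrey's patch; only the `X/Y` hexadecimal LSN format is PostgreSQL's.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def safe_to_open(master_lsn: str, sync_standbys: list[str], replayed: dict[str, str]) -> bool:
    """Return True only if every declared sync standby has reached master_lsn.

    sync_standbys: names taken from synchronous_standby_names (hypothetical input).
    replayed: standby name -> last LSN it reported; a missing name means
              the standby is not connected yet.
    """
    target = parse_lsn(master_lsn)
    for name in sync_standbys:
        reported = replayed.get(name)
        if reported is None or parse_lsn(reported) < target:
            return False  # keep the node closed to external users
    return True
```

An agent would poll this check after restarting the node and restore the normal pg_hba.conf only once it returns True.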

Another approach is to implement two-phase commit over the master and sync replicas (as Oracle did in old versions [2]), where the risk of ending up with locally committed data on instance restart or query cancellation is minimal (once the final commit phase has started). But this approach has a latency penalty, and it is complex to resolve partial (prepared but not committed) transactions automatically when the coordinator (in this case the master node) fails. It would be nice if this approach were implemented later as an option of synchronous commit.
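
As a toy illustration of the two phases, and of where the partial state comes from, here is a sketch with in-memory stand-ins for the master and a replica. The `Node` class and `two_phase_commit` function are hypothetical; this is not how PostgreSQL or Oracle implement it, just the generic protocol shape.

```python
class Node:
    """In-memory stand-in for the master or a sync replica (hypothetical)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"  # idle -> prepared -> committed, or aborted

    def prepare(self):
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self):
        assert self.state == "prepared"
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(nodes):
    """Phase 1: prepare on every node. Phase 2: commit only if all prepared."""
    for node in nodes:
        if not node.prepare():
            # One refusal aborts the transaction on every node.
            for n in nodes:
                n.abort()
            return False
    # If the coordinator died at this point, every node would be left in
    # the "prepared" state; this is the partial transaction to be resolved.
    for node in nodes:
        node.commit()
    return True
```

The extra round trip for the prepare phase is the latency penalty mentioned above.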

[1] https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances

[2] https://docs.oracle.com/cd/B28359_01/server.111/b28326/repmaster.htm#i33607


--
Best regards,
Maksim Milyutin
