On 21.12.2019 00:19, Tom Lane wrote:

> > There is still a problem when backend is not canceled, but terminated [2].
> Exactly.  If you don't have a fix that handles that case, you don't have
> anything.  In fact, you've arguably made things worse, by increasing the
> temptation to terminate or "kill -9" the nonresponsive session.


I assume that the case in Andrey's patch proposal where terminating a backend brings down the whole PostgreSQL instance has to be handled by external HA agents. Such an agent, running as the parent process of the postmaster, could intercept the termination and make an appropriate decision, e.g., restart the PostgreSQL node in a state closed to external users (via pg_hba.conf manipulation) until all sync replicas have synchronized changes from the master. The Stolon HA tool implements this strategy [1]. This logic (waiting until all replicas declared in synchronous_standby_names have replicated all WAL from the master) could also be implemented inside the PostgreSQL kernel, after the recovery process starts and before the database is opened to users; that can be done separately later.
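
To make the "wait for all sync replicas before opening" check concrete, here is a rough sketch of the decision an HA agent (or the server itself) would have to make. It is only an illustration: the function names `parse_lsn` and `can_open_to_users` and the shape of the inputs are hypothetical, not part of PostgreSQL or Stolon. The only PostgreSQL fact it relies on is the textual LSN format, two hexadecimal numbers separated by a slash. It also treats synchronous_standby_names as a plain list in which every standby must catch up, ignoring priority and quorum settings.

```python
from typing import Dict, Iterable


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def can_open_to_users(master_flush_lsn: str,
                      sync_standby_names: Iterable[str],
                      standby_flush_lsn: Dict[str, str]) -> bool:
    """Return True only if every declared sync standby has caught up.

    standby_flush_lsn maps a standby name to the last LSN it reported as
    flushed (a hypothetical input; how it is collected is left open).
    A standby missing from the mapping counts as not caught up, so the
    node stays closed to users.
    """
    target = parse_lsn(master_flush_lsn)
    for name in sync_standby_names:
        reported = standby_flush_lsn.get(name)
        if reported is None or parse_lsn(reported) < target:
            return False
    return True
```

An agent would poll this check after restart and reopen access in pg_hba.conf only once it returns True.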

Another approach is to implement two-phase commit over the master and sync replicas (as Oracle did in old versions [2]), where the risk of ending up with locally committed data after an instance restart or query cancellation is minimal (once the final commit phase has started). But this approach carries a latency penalty and the complexity of automatically resolving partial (prepared but not committed) transactions when the coordinator (in this case the master node) fails. It would be nice if this approach were implemented later as an option of synchronous commit.
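
For readers less familiar with the idea, the two phases can be sketched with a toy in-memory model. This is not how Oracle or PostgreSQL implement it; the `Node` class and `two_phase_commit` function are hypothetical names made up for the illustration, and the sketch deliberately leaves out the hard part mentioned above, recovery of prepared transactions after a coordinator failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Toy participant: the master or one sync replica."""
    name: str
    reachable: bool = True
    # txid -> "prepared" | "committed" | "aborted"
    state: Dict[str, str] = field(default_factory=dict)

    def prepare(self, txid: str) -> bool:
        """Phase 1: durably record the transaction as prepared."""
        if not self.reachable:
            return False
        self.state[txid] = "prepared"
        return True

    def finish(self, txid: str, outcome: str) -> None:
        """Phase 2: apply the coordinator's decision if reachable."""
        if self.reachable:
            self.state[txid] = outcome


def two_phase_commit(txid: str, master: Node, replicas: List[Node]) -> str:
    """Commit only if the master and every replica prepared successfully."""
    participants = [master, *replicas]

    # Phase 1: ask everyone to prepare; stop at the first failure.
    all_prepared = True
    for node in participants:
        if not node.prepare(txid):
            all_prepared = False
            break

    # Phase 2: broadcast the decision to all reachable participants.
    outcome = "committed" if all_prepared else "aborted"
    for node in participants:
        node.finish(txid, outcome)
    return outcome
```

The extra round trip in phase 1 is where the latency penalty comes from: nothing is committed locally on the master until every replica has acknowledged the prepare.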


1. https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances

2. https://docs.oracle.com/cd/B28359_01/server.111/b28326/repmaster.htm#i33607

--
Best regards,
Maksim Milyutin


