Re: Disallow cancellation of waiting for synchronous replication

Andrey Borodin Thu, 02 Jan 2020 09:27:04 -0800

> 2 янв. 2020 г., в 19:13, Robert Haas <robertmh...@gmail.com> написал(а):
> 
> On Sun, Dec 29, 2019 at 4:13 AM Andrey Borodin <x4...@yandex-team.ru> wrote:
>> Not loosing data - is a nice property of the database either.
> 
> Sure, but there's more than one way to fix that problem, as I pointed
> out in my first response.
Sorry, it took some more reading iterations of your message for me to 
understand the problem you are writing about.

You proposed two solutions:
1. Client analyze warning an understand that data is not actually committed. 
This, as you pointed out, does not solve the problem: data is lost for another 
client, who never saw the warning.
Actually, "client" is a stateless number of connections unable to communicate 
with each other by any means beside database. They cannot share information 
about not committed transactions (they would need a database, thus chicken and 
the egg problem).

2. Add another message "CANCEL --force" to stop synchronous replication for 
specific backend.
We already have a way to stop synchronous replication "alter system set 
synchronous_standby_names to 'working.stand.by'; select pg_reload_conf();". 
This will stop it for every backend, but "CANCEL --force" will be more picky.
User still can loose data when they issue idempotent query based on data, 
committed by "CANCEL --force". Moreover, user can loose data if his upsert is 
based on data committed by someone else with "set synchronous_commit to off".
We could fix upserts: make them wait for replication even if nothing was 
changed, but this will not cover the case when user is doing SELECT and decides 
not to insert anything.
We can fix SELECT: if user asks for synchronous_commit=remote_write - give him 
snapshot no newer than synchronously committed data. ISTM this would solve all 
above problems, but I do not see implications of this approach. We should add 
all XIDs to XIP if their commit LSN > sync rep LSN. But I'm not sure all other 
transactional mechanics will be OK with this.

From practical point of view - when all writing queries use same 
synchronous_commit level - easiest solution is to just disallow cancel of sync 
replication. In psql we can just reset connection on second CTRL+C. That's more 
generic than "CANCEL --force".

When all queries runs with same synchronous_commit there is no point in 
protocol message for canceling sync rep for single connection. Just drop that 
connection. Ignoring cancel is the only way to satisfy synchronous_commit 
level, which is constant for transaction.
When queries run in various synchronous_commit - things are much more 
complicated. Adding protocol message to change synchronous_commit for running 
queries does not seems to be a viable option.

> I continue to think that the root cause of this issue is that we can't
> distinguish between cancelling the query and cancelling the sync rep
> wait.
Yes, it is. But canceling sync rep wait exists already. Just change 
synchronous_stanby_names. Canceling sync rep for one client - is, effectively, 
changing synchronous commit level for running transaction. It opens a way for 
way more difficult complications.

> The client in this case is asking for both when it really only
> wants the former, and then ignoring the warning that the latter is
> what actually occurred.
Client is not ignoring warnings. Data is lost for the client which never 
received warning. If we could just fix our code, I would not be making so much 
noise. There are workarounds, but they are very pleasant to explain.


Best regards, Andrey Borodin.
Re: Disallow cancellation of waiting for synchronous replication

Reply via email to