Re: Disallow cancellation of waiting for synchronous replication

Robert Haas Thu, 02 Jan 2020 06:26:53 -0800

On Mon, Dec 30, 2019 at 9:39 AM Bruce Momjian <br...@momjian.us> wrote:
> This gets to the heart of something I was hoping to discuss.  When is
> something committed?  You would think it is when the client receives the
> commit message, but Postgres can commit something, and try to inform the
> client but fail to inform, perhaps due to network problems.


This kind of problem can happen even without synchronous replication.
I've alluded to this problem in a couple of blog posts I've done on
sync rep.

If an application is connected to the database and sends a COMMIT
command (or a data-modifying command outside a transaction that will
commit implicitly) and the connection is closed before it receives a
response, it does not know whether the COMMIT actually happened. It
will have to wait until the database is back up and running and then
go examine the state of the database with SELECT statements and try to
figure out whether the changes it wanted actually got made. Otherwise
it doesn't know whether the failure that resulted in a loss of network
connectivity occurred before or after the commit.

I imagine that most applications are way too dumb to do this properly
and just report errors to the user and let the user decide what to do
to try to recover. And I imagine that most users are not terribly
careful about it and such events cause minor data loss/corruption on a
regular basis. But there are also probably some applications where
people are really fanatical about it.

>  In Robert's
> case above, we send a "success", but it is only a success on the primary
> and not on the synchronous standby.
>
> In the first case I mentioned, we commit without guaranteeing the client
> knows, but in the second case, we tell the client success with a warning
> that the synchronous standby didn't get the commit.  Are clients even
> checking warning messages?  You see it in psql, but what about
> applications that use Postgres.  Do they even check for warnings?
> Should administrators be informed via email or some command when this
> happens?

I suspect a lot of clients are not checking warning messages, but
whether that's really the server's problem is arguable. I think we've
developed a general practice over the years of trying to avoid warning
messages as a way of telling users about problems, and that's a good
idea in general precisely because they might just get ignored, but
there are cases where it is really the only possible way forward. It
would certainly be pretty bad to have the COMMIT succeed on the local
node but produce an ERROR; that would doubtless be much more confusing
than what it's doing now. There's nothing at all to prevent
administrators from watching the logs for such warnings and taking
whatever action they deem appropriate.

I continue to think that the root cause of this issue is that we can't
distinguish between cancelling the query and cancelling the sync rep
wait. The client in this case is asking for both when it really only
wants the former, and then ignoring the warning that the latter is
what actually occurred.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Disallow cancellation of waiting for synchronous replication

Reply via email to