On Mon, Dec 30, 2019 at 9:39 AM Bruce Momjian <br...@momjian.us> wrote: > This gets to the heart of something I was hoping to discuss. When is > something committed? You would think it is when the client receives the > commit message, but Postgres can commit something, and try to inform the > client but fail to inform, perhaps due to network problems.
This kind of problem can happen even without synchronous replication. I've alluded to this problem in a couple of blog posts I've done on sync rep. If an application is connected to the database and sends a COMMIT command (or a data-modifying command outside a transaction that will commit implicitly) and the connection is closed before it receives a response, it does not know whether the COMMIT actually happened. It will have to wait until the database is back up and running and then go examine the state of the database with SELECT statements and try to figure out whether the changes it wanted actually got made. Otherwise it doesn't know whether the failure that resulted in a loss of network connectivity occurred before or after the commit. I imagine that most applications are way too dumb to do this properly and just report errors to the user and let the user decide what to do to try to recover. And I imagine that most users are not terribly careful about it and such events cause minor data loss/corruption on a regular basis. But there are also probably some applications where people are really fanatical about it. > In Robert's > case above, we send a "success", but it is only a success on the primary > and not on the synchronous standby. > > In the first case I mentioned, we commit without guaranteeing the client > knows, but in the second case, we tell the client success with a warning > that the synchronous standby didn't get the commit. Are clients even > checking warning messages? You see it in psql, but what about > applications that use Postgres. Do they even check for warnings? > Should administrators be informed via email or some command when this > happens? I suspect a lot of clients are not checking warning messages, but whether that's really the server's problem is arguable. I think we've developed a general practice over the years of trying to avoid warning messages as a way of telling users about problems, and that's a good idea in general precisely because they might just get ignored, but there are cases where it is really the only possible way forward. It would certainly be pretty bad to have the COMMIT succeed on the local node but produce an ERROR; that would doubtless be much more confusing than what it's doing now. There's nothing at all to prevent administrators from watching the logs for such warnings and taking whatever action they deem appropriate. I continue to think that the root cause of this issue is that we can't distinguish between cancelling the query and cancelling the sync rep wait. The client in this case is asking for both when it really only wants the former, and then ignoring the warning that the latter is what actually occurred. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company