Dear hackers,

I recently had a hard time finding the root cause of some weird behavior with the async API of libpq when running client and server on Windows. When the connection aborts with an error, most notably an error during connection setup, it sometimes fails with the wrong error message:

Instead of:

    connection to server at "::1", port 5433 failed: FATAL:  role "a" does not exist

it fails with:

    connection to server at "::1", port 5433 failed: server closed the connection unexpectedly

I found out that the recv() function of the Winsock API behaves unexpectedly. If the connection receives a TCP RST flag, recv() immediately returns -1, regardless of whether all previously received data has been retrieved. So when the connection is closed hard, the behavior on the client side is timing dependent: the last packet is delivered to libpq only if libpq calls recv() quickly enough; otherwise it is dropped.
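
To make the three possible outcomes on the reading side concrete, here is a minimal sketch. It uses POSIX sockets as a stand-in for Winsock (ECONNRESET in place of WSAECONNRESET), and the helper name drain_socket is hypothetical, not part of libpq. It only classifies how the stream ended; it does not reproduce the Windows timing problem itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* How the stream ended, as seen by the reading side. */
typedef enum
{
    END_GRACEFUL,   /* recv() returned 0: peer closed after all data */
    END_RESET,      /* recv() failed with ECONNRESET: connection was reset */
    END_ERROR       /* any other recv() failure */
} stream_end;

/*
 * Hypothetical helper: read from fd until the stream ends. At most cap
 * bytes are stored in buf; the number of stored bytes is written to *len.
 */
static stream_end
drain_socket(int fd, char *buf, size_t cap, size_t *len)
{
    assert(buf != NULL && len != NULL);
    *len = 0;

    for (;;)
    {
        char    chunk[256];
        ssize_t n = recv(fd, chunk, sizeof(chunk), 0);

        if (n > 0)
        {
            /* Keep as much as fits; bytes beyond cap are discarded. */
            size_t  room = cap - *len;
            size_t  take = (size_t) n < room ? (size_t) n : room;

            memcpy(buf + *len, chunk, take);
            *len += take;
            continue;
        }
        if (n == 0)
            return END_GRACEFUL;
        if (errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        return (errno == ECONNRESET) ? END_RESET : END_ERROR;
    }
}
```

With a graceful close, the caller gets the complete error message followed by END_GRACEFUL. In the case described above, the same loop can return END_RESET before the last message was ever handed to the application.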

This behavior is described in the documentation of closesocket():
https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket

    This is called a hard or abortive close, because the socket's virtual circuit is reset immediately, and any unsent data is lost. On Windows, any *recv* call on the remote side of the circuit will fail with WSAECONNRESET <https://docs.microsoft.com/en-us/windows/desktop/WinSock/windows-sockets-error-codes-2>.

Unfortunately, a Windows PostgreSQL server closes each connection hard, with the TCP RST flag. This is due to another Winsock behavior: every socket that wasn't closed by the application is closed hard, with the RST flag, at process termination. I didn't find any official documentation of this behavior.

Explicitly closing the socket before process termination leads to a graceful close even on Windows. That is what the attached patch does. I think delivering the correct error message to the user is much more important than closing the process in sync with the socket.
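
The idea of "send the last message, then close the socket explicitly instead of leaving it to process exit" can be sketched as follows. This is a POSIX illustration under stated assumptions, not the backend code: close() stands in for closesocket(), and send_final_and_close is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send msg completely, then close fd explicitly so
 * that the peer sees an orderly end of stream after the data.
 * Returns 0 on success, -1 on failure. fd is closed in both cases.
 */
static int
send_final_and_close(int fd, const char *msg)
{
    size_t  left;
    int     result = 0;

    assert(msg != NULL);
    left = strlen(msg);

    while (left > 0)
    {
        ssize_t n = send(fd, msg, left, 0);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            result = -1;
            break;
        }
        msg += n;
        left -= (size_t) n;
    }

    /* Explicit close by the application, before the process exits. */
    if (close(fd) != 0)
        result = -1;
    return result;
}
```

On the receiving end, recv() then returns the message first and 0 (end of stream) afterwards, which is the order the client needs in order to report the real error.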


Some background: I'm the maintainer of ruby-pg, the PostgreSQL client library for Ruby. The next version of ruby-pg will switch to the async API for connection setup. Using this API changes the timing of socket operations and therefore often leads to the wrong message shown above. Previous versions used the sync API, which usually doesn't suffer from this issue. The original issue is here: https://github.com/ged/ruby-pg/issues/404

--

Kind Regards
Lars Kanis

From 079c3dce7c4580a797267b6e42b33399606b4f9d Mon Sep 17 00:00:00 2001
From: Lars Kanis <l...@greiz-reinsdorf.de>
Date: Wed, 17 Nov 2021 21:04:45 +0100
Subject: [PATCH] Windows: Gracefully close the socket on process exit

This is to avoid a hard close of the socket with tcp RST flag,
which in turn can lead to a dropped error message on a Windows client.
---
 src/backend/libpq/pqcomm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 9ebba025c..7a81fa913 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -286,6 +286,18 @@ socket_close(int code, Datum arg)
 		 * We do set sock to PGINVALID_SOCKET to prevent any further I/O,
 		 * though.
 		 */
+
+#ifdef WIN32
+		/*
+		 * On Windows we still do an explicit close here, because Windows
+		 * closes sockets with RST flag instead of FIN at process termination.
+		 * On the client side it then leads to a WSAECONNRESET error, which
+		 * can (depending on the timing) drop the last error message, leading
+		 * to a valuable information loss for the user.
+		 */
+		closesocket(MyProcPort->sock);
+#endif
+
 		MyProcPort->sock = PGINVALID_SOCKET;
 	}
 }
-- 
2.32.0
