On Thu Jul 3, 2025 at 2:03 AM CEST, Jacob Champion wrote:
On Wed, Jul 2, 2025 at 3:18 PM Jelte Fennema-Nio <postg...@jeltef.nl> wrote:
I will hold off on detailed review until Heikki gives an opinion on
the design (or we get closer to the end of the month), to avoid making
busy work for you -- but I will say that I think you need to prove
that the new `failure:` case in getBackendKeyData() is safe, because I
don't think any of the other failure modes behave that way inside
pqParseInput3().

I changed it slightly to align with the implementation of the
handleSyncLoss function.
From 973e7204cb6d99450c003cfe729586287a500359 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <github-t...@jeltef.nl>
Date: Wed, 25 Jun 2025 08:36:51 +0200
Subject: [PATCH v3 1/2] libpq: Complain about missing BackendKeyData later

It turns out that some third party backend implementations^1 don't send
BackendKeyData as a way to indicate that they don't support query
cancellation. While the protocol docs left it open to interpretation
whether that is valid behavior, libpq accepted it up to and including the
libpq shipped with PG17. That libpq behavior does not seem to have been
intentional though, since it worked by sending CancelRequest messages
with all zeros to such servers (instead of returning an error or making
the cancel a no-op).

For PG18 this behaviour was changed to return an error when trying to
create the cancel object (either a PGcancel or PGcancelConn). This
was done without any discussion, as part of supporting different
lengths of cancel packets for the new 3.2 version of the protocol.

This commit changes the behavior once more to only return an error when
the cancel object is actually used to send a cancellation, instead of
when merely creating the object. The reason is that some clients create
such cancel objects as part of their connection creation logic (thus
having the cancel object ready for later when they need it), so
returning an error when creating that object makes the connection
attempt fail. By changing when we return the error, such clients can
still connect to the third party backend implementations in question,
and when actually trying to cancel a query on one of these backends the
user is notified that this is not possible for the server they are
connected to.

^1: AWS RDS Proxy is definitely one of them, and CockroachDB might be
another.
---
 doc/src/sgml/protocol.sgml       | 16 +++++++++++++-
 src/interfaces/libpq/fe-cancel.c | 37 ++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 82fe3f93761..0eb96360134 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -535,7 +535,21 @@
         This message provides secret-key data that the frontend must
         save if it wants to be able to issue cancel requests later.
         The frontend should not respond to this message, but should
-        continue listening for a ReadyForQuery message.
+        continue listening for a ReadyForQuery message. The PostgreSQL backend
+        will always send this message before sending the initial ReadyForQuery
+        message but other backend implementations of the protocol may not.
+        Since protocol version 3.2, if the server sent no BackendKeyData, then
+        that means that the backend does not support canceling queries using
+        the CancelRequest messages. In protocol versions before 3.2 the
+        behaviour is undefined if the client receives no BackendKeyData.
+        Up until the libpq shipped in PostgreSQL 17, it would send a
+        CancelRequest with all zeros to 3.0 connections that did not send a
+        BackendKeyData message. Since the libpq shipped with PostgreSQL 18 it
+        does not send any CancelRequest at all for such connections anymore,
+        thus aligning with the behaviour for protocol 3.2. Throwing an error
+        as a client when not receiving BackendKeyData on a 3.0 connection is
+        not recommended, because certain server implementations are known not
+        to send it.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/interfaces/libpq/fe-cancel.c b/src/interfaces/libpq/fe-cancel.c
index 65517c5703b..c7e90135c06 100644
--- a/src/interfaces/libpq/fe-cancel.c
+++ b/src/interfaces/libpq/fe-cancel.c
@@ -86,13 +86,6 @@ PQcancelCreate(PGconn *conn)
 		return (PGcancelConn *) cancelConn;
 	}
 
-	/* Check that we have received a cancellation key */
-	if (conn->be_cancel_key_len == 0)
-	{
-		libpq_append_conn_error(cancelConn, "no cancellation key received");
-		return (PGcancelConn *) cancelConn;
-	}
-
 	/*
 	 * Indicate that this connection is used to send a cancellation
 	 */
@@ -111,7 +104,7 @@ PQcancelCreate(PGconn *conn)
 	 * Copy cancellation token data from the original connection
 	 */
 	cancelConn->be_pid = conn->be_pid;
-	if (conn->be_cancel_key != NULL)
+	if (conn->be_cancel_key_len > 0)
 	{
 		cancelConn->be_cancel_key = malloc(conn->be_cancel_key_len);
 		if (cancelConn->be_cancel_key == NULL)
@@ -206,6 +199,17 @@ PQcancelStart(PGcancelConn *cancelConn)
 	if (!cancelConn || cancelConn->conn.status == CONNECTION_BAD)
 		return 0;
 
+	/*
+	 * Check that we actually have a cancel key. We check this here as opposed
+	 * to in PQcancelCreate because users of libpq might call PQcancelCreate
+	 * even when they don't need to cancel a query.
+	 */
+	if (cancelConn->conn.be_cancel_key_len == 0)
+	{
+		libpq_append_conn_error(&cancelConn->conn, "no cancellation key received");
+		return 0;
+	}
+
 	if (cancelConn->conn.status != CONNECTION_ALLOCATED)
 	{
 		libpq_append_conn_error(&cancelConn->conn,
@@ -379,7 +383,14 @@ PQgetCancel(PGconn *conn)
 
 	/* Check that we have received a cancellation key */
 	if (conn->be_cancel_key_len == 0)
-		return NULL;
+	{
+		/*
+		 * In case there is no cancel key, we return an all-zero PGcancel
+		 * object. Actually calling PQcancel on this will fail, but we allow
+		 * creating the PGcancel object.
+		 */
+		return calloc(1, sizeof(PGcancel));
+	}
 
 	cancel_req_len = offsetof(CancelRequestPacket, cancelAuthCode) + conn->be_cancel_key_len;
 	cancel = malloc(offsetof(PGcancel, cancel_req) + cancel_req_len);
@@ -544,6 +555,14 @@ PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
 		return false;
 	}
 
+	if (cancel->cancel_pkt_len == 0)
+	{
+		strlcpy(errbuf, "PQcancel() -- no cancellation key received", errbufsize);
+		/* strlcpy probably doesn't change errno, but be paranoid */
+		SOCK_ERRNO_SET(save_errno);
+		return false;
+	}
+
 	/*
 	 * We need to open a temporary connection to the postmaster. Do this with
 	 * only kernel calls.

base-commit: fe05430ace8e0b3c945cf581564458a5983a07b6
-- 
2.43.0
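
To illustrate the behavior change in patch 1/2, here is a minimal,
self-contained sketch of the "create early, fail only on use" pattern.
It is a simplified model and not libpq code: ModelCancel,
model_cancel_create and model_cancel_start are hypothetical names that
only mirror the roles of PGcancelConn, PQcancelCreate and PQcancelStart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cancel object; not the real PGcancelConn. */
typedef struct
{
	size_t		key_len;		/* 0 means the server sent no BackendKeyData */
	char		errmsg[64];		/* last error message, empty if none */
} ModelCancel;

/* Creating the object always succeeds, even without a cancel key. */
static ModelCancel
model_cancel_create(size_t server_key_len)
{
	ModelCancel c;

	c.key_len = server_key_len;
	c.errmsg[0] = '\0';
	return c;
}

/* Using the object is the point where a missing key is reported. */
static bool
model_cancel_start(ModelCancel *c)
{
	assert(c != NULL);

	if (c->key_len == 0)
	{
		strncpy(c->errmsg, "no cancellation key received",
				sizeof(c->errmsg) - 1);
		c->errmsg[sizeof(c->errmsg) - 1] = '\0';
		return false;
	}

	/* A real client would send the CancelRequest here. */
	return true;
}
```

In this model, a client that calls model_cancel_create(0) during
connection setup gets a usable object and can keep connecting; only a
later model_cancel_start() call returns false and sets the error
message.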

From b8624b137ba291eaa422e7a5086d6a24801d9ce5 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <github-t...@jeltef.nl>
Date: Wed, 25 Jun 2025 08:54:15 +0200
Subject: [PATCH v3 2/2] libpq: Be strict about accepted cancel key lengths

The protocol documentation states that the maximum length of a cancel key
is 256 bytes. This starts checking for that limit in libpq. Otherwise
third party backend implementations will probably start using more bytes
anyway. We also start requiring that a protocol 3.0 connection does not
send a longer cancel key, to make sure that servers don't start breaking
old 3.0-only clients by accident. Finally this also restricts the
minimum key length to 4 bytes (both in the protocol spec and in the
libpq implementation).
---
 doc/src/sgml/protocol.sgml          |  2 +-
 src/interfaces/libpq/fe-connect.c   |  3 +++
 src/interfaces/libpq/fe-protocol3.c | 30 +++++++++++++++++++++++++++--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0eb96360134..982f4a8d210 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -4160,7 +4160,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
          message, indicated by the length field.
         </para>
         <para>
-          The maximum key length is 256 bytes. The
+          The minimum and maximum key lengths are 4 and 256 bytes, respectively. The
           <productname>PostgreSQL</productname> server only sends keys up to
           32 bytes, but the larger maximum size allows for future server
           versions, as well as connection poolers and other middleware, to use
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 51a9c416584..f094611fe0d 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -4322,6 +4322,9 @@ keep_going:						/* We will come back to here until there is
 				if (PQisBusy(conn))
 					return PGRES_POLLING_READING;
 
+				if (conn->status == CONNECTION_BAD)
+					goto error_return;
+
 				res = PQgetResult(conn);
 
 				/*
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 1599de757d1..818e7a47191 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -1547,13 +1547,31 @@ getBackendKeyData(PGconn *conn, int msgLength)
 
 	cancel_key_len = 5 + msgLength - (conn->inCursor - conn->inStart);
 
+	if (cancel_key_len != 4 && conn->pversion == PG_PROTOCOL(3, 0))
+	{
+		libpq_append_conn_error(conn, "received invalid BackendKeyData message: cancel key length %d is different from 4, which is not supported in version 3.0 of the protocol", cancel_key_len);
+		goto failure;
+	}
+
+	if (cancel_key_len < 4)
+	{
+		libpq_append_conn_error(conn, "received invalid BackendKeyData message: cancel key length %d is below minimum of 4 bytes", cancel_key_len);
+		goto failure;
+	}
+
+	if (cancel_key_len > 256)
+	{
+		libpq_append_conn_error(conn, "received invalid BackendKeyData message: cancel key length %d exceeds maximum of 256 bytes", cancel_key_len);
+		goto failure;
+	}
+
 	conn->be_cancel_key = malloc(cancel_key_len);
 	if (conn->be_cancel_key == NULL)
 	{
 		libpq_append_conn_error(conn, "out of memory");
-		/* discard the message */
-		return EOF;
+		goto failure;
 	}
+
 	if (pqGetnchar(conn->be_cancel_key, cancel_key_len, conn))
 	{
 		free(conn->be_cancel_key);
@@ -1562,6 +1580,14 @@ getBackendKeyData(PGconn *conn, int msgLength)
 	}
 	conn->be_cancel_key_len = cancel_key_len;
 	return 0;
+
+failure:
+	pqSaveErrorResult(conn);
+	conn->asyncStatus = PGASYNC_READY;	/* drop out of PQgetResult wait loop */
+	/* flush input data since we're giving up on processing it */
+	pqDropConnection(conn, true);
+	conn->status = CONNECTION_BAD;	/* No more connection to backend */
+	return EOF;
 }
 
 
-- 
2.43.0
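
The length rules that patch 2/2 enforces can be summarized in a small
standalone sketch. This is not the libpq implementation: the function
name, the macros, and the use of a bare minor version number are
assumptions made for illustration, and the real code reports an error
and drops the connection instead of returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Limits taken from the patch description; the macro names are made up. */
#define MODEL_MIN_KEY_LEN 4
#define MODEL_MAX_KEY_LEN 256

/*
 * Return true if a cancel key of key_len bytes is acceptable for protocol
 * version 3.<proto_minor>.  Protocol 3.0 requires exactly 4 bytes; later
 * minor versions allow anything from 4 to 256 bytes.
 */
static bool
model_cancel_key_len_ok(int key_len, int proto_minor)
{
	assert(proto_minor >= 0);

	if (proto_minor == 0)
		return key_len == MODEL_MIN_KEY_LEN;

	return key_len >= MODEL_MIN_KEY_LEN && key_len <= MODEL_MAX_KEY_LEN;
}
```

For example, a 32-byte key is accepted on a 3.2 connection but rejected
on a 3.0 connection, and a 3-byte or 257-byte key is rejected on both.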
