On Thu Jul 3, 2025 at 2:03 AM CEST, Jacob Champion wrote:
On Wed, Jul 2, 2025 at 3:18 PM Jelte Fennema-Nio <postg...@jeltef.nl> wrote: I will hold off on detailed review until Heikki gives an opinion on the design (or we get closer to the end of the month), to avoid making busy work for you -- but I will say that I think you need to prove that the new `failure:` case in getBackendKeyData() is safe, because I don't think any of the other failure modes behave that way inside pqParseInput3().
I changed it slightly now to align with the handleSyncLoss function its implementation.
From 973e7204cb6d99450c003cfe729586287a500359 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio <github-t...@jeltef.nl> Date: Wed, 25 Jun 2025 08:36:51 +0200 Subject: [PATCH v3 1/2] libpq: Complain about missing BackendKeyData later It turns out that some third party backend implementations^1 don't send BackendKeyData as a way to indicate that they don't. While the protocol docs left it up for interpretation if that is valid behavior, libpq has been accepting it up until the libpq shipped with PG17. It does not seem like that the libpq behavior was intentional though, since it did so by sending CancelRequest messages with all zeros to such servers (instead of returning an error or making the cancel a no-op). For PG18 this behaviour was changed to return an error when trying to create the cancel object (either a PGcancel or PGcancelConn). This wasn't done with any discussion, but was done as part of supporting different lengths of cancel packets for the new 3.2 version of the protocol. This commit changes the behavior once more to only return an error when the cancel object is actually used to send a cancellation, instead of when merely creating the object. The reason to do so is that some clients create such cancel objects as part of their connection creation logic (thus having the cancel object ready for later when they need it). So by returning an error when creating that object, the connection attempt would fail. So by changing when we return the error such clients will still be able to connect to the third party backend implementations in question, but when actually trying to cancel a query on one of these backends the user will be notified that that is not possible for the server that they are connected to. ^1: AWS RDS Proxy is definitely one of them, and CockroachDB might be another. --- doc/src/sgml/protocol.sgml | 16 +++++++++++++- src/interfaces/libpq/fe-cancel.c | 37 ++++++++++++++++++++++++-------- 2 files changed, 43 insertions(+), 10 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 82fe3f93761..0eb96360134 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -535,7 +535,21 @@ This message provides secret-key data that the frontend must save if it wants to be able to issue cancel requests later. The frontend should not respond to this message, but should - continue listening for a ReadyForQuery message. + continue listening for a ReadyForQuery message. The PostgreSQL backend + will always send this message before sending the inital ReadyForQuery + message but other backend implementations of the protocol may not. + Since protocol version 3.2, if the server sent no BackendKeyData, then + that means that the backend does not support canceling queries using + the CancelRequest messages. In protocol versions before 3.2 the + behaviour is undefined if the client receives no BackendKeyData. + Up until the libpq shipped in PostgreSQL 17, it would send a + CancelRequest with all zeros to 3.0 connections that did not send a + BackendKeyData message. Since the libpq shipped with PostgreSQL 18 it + does not send any CancelRequest at all for such connections anymore, + thus aligning with the behaviour for protocol 3.2. Throwing an error + as a client when not receiving BackendKeyData on a 3.0 connection is + not recommended, because certain server implementations are known not + to send it. </para> </listitem> </varlistentry> diff --git a/src/interfaces/libpq/fe-cancel.c b/src/interfaces/libpq/fe-cancel.c index 65517c5703b..c7e90135c06 100644 --- a/src/interfaces/libpq/fe-cancel.c +++ b/src/interfaces/libpq/fe-cancel.c @@ -86,13 +86,6 @@ PQcancelCreate(PGconn *conn) return (PGcancelConn *) cancelConn; } - /* Check that we have received a cancellation key */ - if (conn->be_cancel_key_len == 0) - { - libpq_append_conn_error(cancelConn, "no cancellation key received"); - return (PGcancelConn *) cancelConn; - } - /* * Indicate that this connection is used to send a cancellation */ @@ -111,7 +104,7 @@ PQcancelCreate(PGconn *conn) * Copy cancellation token data from the original connection */ cancelConn->be_pid = conn->be_pid; - if (conn->be_cancel_key != NULL) + if (conn->be_cancel_key_len > 0) { cancelConn->be_cancel_key = malloc(conn->be_cancel_key_len); if (cancelConn->be_cancel_key == NULL) @@ -206,6 +199,17 @@ PQcancelStart(PGcancelConn *cancelConn) if (!cancelConn || cancelConn->conn.status == CONNECTION_BAD) return 0; + /* + * Check that we actually have a concel key. We check this here as apposed + * to in PQcancelCreate because users of libpq might call PQcancelCreate + * even when they don't need to cancel a connection. + */ + if (cancelConn->conn.be_cancel_key_len == 0) + { + libpq_append_conn_error(&cancelConn->conn, "no cancellation key received"); + return 0; + } + if (cancelConn->conn.status != CONNECTION_ALLOCATED) { libpq_append_conn_error(&cancelConn->conn, @@ -379,7 +383,14 @@ PQgetCancel(PGconn *conn) /* Check that we have received a cancellation key */ if (conn->be_cancel_key_len == 0) - return NULL; + { + /* + * In case there is no cancel key, we return an all-zero PGCancel + * object. Actually calling PQcancel on this will fail, but we allow + * creating the PGCancel object + */ + return calloc(1, sizeof(PGcancel)); + } cancel_req_len = offsetof(CancelRequestPacket, cancelAuthCode) + conn->be_cancel_key_len; cancel = malloc(offsetof(PGcancel, cancel_req) + cancel_req_len); @@ -544,6 +555,14 @@ PQcancel(PGcancel *cancel, char *errbuf, int errbufsize) return false; } + if (cancel->cancel_pkt_len == 0) + { + strlcpy(errbuf, "PQcancel() -- no cancellation key received", errbufsize); + /* strlcpy probably doesn't change errno, but be paranoid */ + SOCK_ERRNO_SET(save_errno); + return false; + } + /* * We need to open a temporary connection to the postmaster. Do this with * only kernel calls. base-commit: fe05430ace8e0b3c945cf581564458a5983a07b6 -- 2.43.0
From b8624b137ba291eaa422e7a5086d6a24801d9ce5 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio <github-t...@jeltef.nl> Date: Wed, 25 Jun 2025 08:54:15 +0200 Subject: [PATCH v3 2/2] libpq: Be strict about accept cancel key lengths The protocol documentation states that the maximum length of a cancel key is 256 bytes. This starts checking for that limit in libpq. Otherwise third party backend implementations will probably start using more bytes anyway. We also start requiring that a protocol 3.0 connection does not send a longer cancel key, to make sure that servers don't start breaking old 3.0-only clients by accident. Finally this also restricts the minimum key length to 4 bytes (both in the protocol spec and in the libpq implementation). --- doc/src/sgml/protocol.sgml | 2 +- src/interfaces/libpq/fe-connect.c | 3 +++ src/interfaces/libpq/fe-protocol3.c | 30 +++++++++++++++++++++++++++-- 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 0eb96360134..982f4a8d210 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -4160,7 +4160,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" message, indicated by the length field. </para> <para> - The maximum key length is 256 bytes. The + The minimum and maximum key length are 4 and 256 bytes respectively. The <productname>PostgreSQL</productname> server only sends keys up to 32 bytes, but the larger maximum size allows for future server versions, as well as connection poolers and other middleware, to use diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 51a9c416584..f094611fe0d 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -4322,6 +4322,9 @@ keep_going: /* We will come back to here until there is if (PQisBusy(conn)) return PGRES_POLLING_READING; + if (conn->status == CONNECTION_BAD) + goto error_return; + res = PQgetResult(conn); /* diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c index 1599de757d1..818e7a47191 100644 --- a/src/interfaces/libpq/fe-protocol3.c +++ b/src/interfaces/libpq/fe-protocol3.c @@ -1547,13 +1547,31 @@ getBackendKeyData(PGconn *conn, int msgLength) cancel_key_len = 5 + msgLength - (conn->inCursor - conn->inStart); + if (cancel_key_len != 4 && conn->pversion == PG_PROTOCOL(3, 0)) + { + libpq_append_conn_error(conn, "received invalid BackendKeyData message: cancel key length %d is different from 4, which is not supported in version 3.0 of the protocol", cancel_key_len); + goto failure; + } + + if (cancel_key_len < 4) + { + libpq_append_conn_error(conn, "received invalid BackendKeyData message: cancel key length %d is below minimum of 4 bytes", cancel_key_len); + goto failure; + } + + if (cancel_key_len > 256) + { + libpq_append_conn_error(conn, "received invalid BackendKeyData message: cancel key length %d exceeds maximum of 256 bytes", cancel_key_len); + goto failure; + } + conn->be_cancel_key = malloc(cancel_key_len); if (conn->be_cancel_key == NULL) { libpq_append_conn_error(conn, "out of memory"); - /* discard the message */ - return EOF; + goto failure; } + if (pqGetnchar(conn->be_cancel_key, cancel_key_len, conn)) { free(conn->be_cancel_key); @@ -1562,6 +1580,14 @@ getBackendKeyData(PGconn *conn, int msgLength) } conn->be_cancel_key_len = cancel_key_len; return 0; + +failure: + pqSaveErrorResult(conn); + conn->asyncStatus = PGASYNC_READY; /* drop out of PQgetResult wait loop */ + /* flush input data since we're giving up on processing it */ + pqDropConnection(conn, true); + conn->status = CONNECTION_BAD; /* No more connection to backend */ + return EOF; } -- 2.43.0