Hi hackers,

Please find attached a patch to truncate (in ProcessStartupPacket())
the port->database_name and port->user_name in such a way to not break
multibyte character boundary.

Indeed, currently, one could create a database that way:

postgres=# create database ääääääääääääääääääääääääääääääää;
NOTICE:  identifier "ääääääääääääääääääääääääääääääää" will be truncated to 
"äääääääääääääääääääääääääääääää"
CREATE DATABASE

The database name has been truncated from 64 bytes to 62 bytes thanks to 
pg_mbcliplen()
which ensures to not break multibyte character boundary.

postgres=# select datname, OCTET_LENGTH(datname),encoding from pg_database;
             datname             | octet_length | encoding
---------------------------------+--------------+----------
 äääääääääääääääääääääääääääääää |           62 |        6

Trying to connect with the 64 bytes name:

$ psql -d ääääääääääääääääääääääääääääääää
psql: error: connection to server on socket "/tmp/.s.PGSQL.55448" failed: FATAL:  
database "äääääääääääääääääääääääääääääää" does not exist


It fails because the truncation done in ProcessStartupPacket():

"
if (strlen(port→database_name) >= NAMEDATALEN)
port→database_name[NAMEDATALEN - 1] = '\0';
"

does not take care about multibyte character boundary.

On the other hand it works with non multibyte character involved:

postgres=# create database 
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke;
NOTICE:  identifier "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke" 
will be truncated to "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk"
CREATE DATABASE

postgres=# select datname, OCTET_LENGTH(datname),encoding from pg_database;
                             datname                             | octet_length 
| encoding
-----------------------------------------------------------------+--------------+----------
 abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk |           63 
|        6

The database name is truncated to 63 bytes and then using the 64 bytes name 
would work:

$ psql -d abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke
psql (16beta1)
Type "help" for help.

The comment in ProcessStartupPacket() states:

"
    /*
     * Truncate given database and user names to length of a Postgres name.
     * This avoids lookup failures when overlength names are given.
     */
"

The last sentence is not right in case of mutlibyte character (as seen
in the first example).

About the patch:

As the database encoding is not known yet in ProcessStartupPacket() (
and we are even not sure the database provided does exist), the proposed
patch does not rely on pg_mbcliplen() but on pg_encoding_mbcliplen().

The proposed patch does use the client encoding that it retrieves that way:

- use the one requested in the startup packet (if we come across it)
- use the one from the locale (if we did not find a client encoding request
in the startup packet)
- use PG_SQL_ASCII (if none of the above have been satisfied)

Happy to discuss any other thoughts or suggestions if any.

With the proposed patch in place, using the first example above (and the
64 bytes name) we would get:

$ PGCLIENTENCODING=LATIN1 psql -d ääääääääääääääääääääääääääääääää
psql: error: connection to server on socket "/tmp/.s.PGSQL.55448" failed: FATAL:  
database "äääääääääääääääääääääääääääääää" does not exist

but this one would allow us to connect:

$ PGCLIENTENCODING=UTF8 psql -d ääääääääääääääääääääääääääääääää
psql (16beta1)
Type "help" for help.

The patch does not provide documentation update or related TAP test (but could 
be added
if we feel the need).

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 6ccb4f03f56dda9e9a2616c6e0875a97a8711d72 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Tue, 20 Jun 2023 10:26:16 +0000
Subject: [PATCH v1] multibyte truncation for database and user name

database_name and user_name truncation done in ProcessStartupPacket() do not
take care of multibyte character boundary. In case of multibyte, this somehow
breaks the initial goal to avoid lookup failures when overlength names are 
given.
---
 src/backend/postmaster/postmaster.c | 48 ++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 4 deletions(-)
 100.0% src/backend/postmaster/

diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index 4c49393fc5..72991c4eab 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1951,6 +1951,7 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool 
gss_done)
        char       *buf;
        ProtocolVersion proto;
        MemoryContext oldcontext;
+       int                     client_encoding = -1;
 
        pq_startmsgread();
 
@@ -2238,6 +2239,18 @@ retry1:
                                {
                                        port->application_name = 
pg_clean_ascii(valptr, 0);
                                }
+                               /* Record the client_encoding if we come across 
it */
+                               else if (strcmp(nameptr, "client_encoding") == 
0)
+                               {
+                                       client_encoding = 
pg_valid_client_encoding(valptr);
+
+                                       if (client_encoding < 0)
+                                               ereport(FATAL,
+                                                               
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                                                
errmsg("invalid value for parameter \"%s\": \"%s\"",
+                                                                               
"client_encoding",
+                                                                               
valptr)));
+                               }
                        }
                        offset = valoffset + strlen(valptr) + 1;
                }
@@ -2291,13 +2304,40 @@ retry1:
        }
 
        /*
-        * Truncate given database and user names to length of a Postgres name.
-        * This avoids lookup failures when overlength names are given.
+        * Given the client encoding, truncate given database and user names to
+        * length of a Postgres name. This avoids lookup failures when 
overlength
+        * names are given.
         */
+       if (client_encoding < 0)
+       {
+               /*
+                * client_encoding has not been found in the startup packet so 
let's
+                * try to get it from locale.
+                */
+               client_encoding = pg_get_encoding_from_locale(NULL, true);
+
+               if (client_encoding < 0)
+                       client_encoding = PG_SQL_ASCII;
+       }
+
        if (strlen(port->database_name) >= NAMEDATALEN)
-               port->database_name[NAMEDATALEN - 1] = '\0';
+       {
+               int                     newlen;
+
+               newlen = pg_encoding_mbcliplen(client_encoding, 
port->database_name,
+                                                                          
strlen(port->database_name),
+                                                                          
NAMEDATALEN - 1);
+               port->database_name[newlen] = '\0';
+       }
        if (strlen(port->user_name) >= NAMEDATALEN)
-               port->user_name[NAMEDATALEN - 1] = '\0';
+       {
+               int                     newlen;
+
+               newlen = pg_encoding_mbcliplen(client_encoding, port->user_name,
+                                                                          
strlen(port->user_name),
+                                                                          
NAMEDATALEN - 1);
+               port->user_name[newlen] = '\0';
+       }
 
        if (am_walsender)
                MyBackendType = B_WAL_SENDER;
-- 
2.34.1

Reply via email to