While working on [0], I was wondering why the collations ucs_basic and unicode are not in pg_collation.dat. I traced this back through history, and I think this was just lost in a game of telephone.

The initial commit for pg_collation.h (414c5a2ea6), which predates the .dat files, put only the default collation in that file and left everything else to initdb. Over time, the additional collations "C" and "POSIX" were moved to pg_collation.h, and other logic was moved from initdb to pg_import_system_collations(). But ucs_basic was left untouched. Commit 0b13b2a771 rearranged the relative order of operations in initdb and added the current comment "We don't want to pin these", but judging from the email[1], I think that comment was more a guess about the previous intent than a deliberate decision.

I suggest we fix this now; see attached patch.


[0]: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d3c2%40enterprisedb.com

[1]: https://www.postgresql.org/message-id/28195.1498172402%40sss.pgh.pa.us
From 0d2c6b92a3340833f13bab395e0556ce1f045226 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Tue, 28 Mar 2023 12:04:34 +0200
Subject: [PATCH] Move definition of standard collations from initdb to
 pg_collation.dat

---
 src/bin/initdb/initdb.c              | 15 +--------------
 src/include/catalog/pg_collation.dat |  7 +++++++
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index bae97539fc..9ccbf998ec 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1695,20 +1695,7 @@ setup_description(FILE *cmdfd)
 static void
 setup_collation(FILE *cmdfd)
 {
-       /*
-        * Add SQL-standard names.  We don't want to pin these, so they don't go
-        * in pg_collation.dat.  But add them before reading system collations, so
-        * that they win if libc defines a locale with the same name.
-        */
-       PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)"
-                                 "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n",
-                                 BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU);
-
-       PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
-                                 "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
-                                 BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);
-
-       /* Now import all collations we can find in the operating system */
+       /* Import all collations we can find in the operating system */
        PG_CMD_PUTS("SELECT pg_import_system_collations('pg_catalog');\n\n");
 }
 
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index f4bda1c769..14df398ad2 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -23,5 +23,12 @@
   descr => 'standard POSIX collation',
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
+{ oid => '962',
+  descr => 'sorts using the Unicode Collation Algorithm with default settings',
+  collname => 'unicode', collprovider => 'i', collencoding => '-1',
+  colliculocale => 'und' },
+{ oid => '963', descr => 'sorts by Unicode code point',
+  collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
+  collcollate => 'C', collctype => 'C' },
 
 ]
-- 
2.40.0
