On Sat, Oct 22, 2022 at 10:24 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> ... But it
> doesn't provide a way for me to create a new database that uses 63 on
> purpose when I know what I'm doing.  There are various reasons I might
> want to do that.

Thinking some more about this, I guess that could be addressed by
having an explicit way to request either the library version or
collversion-style version when creating a database or collation, but
not actually storing it in daticulocale/colliculocale.  That could be
done either as part of the string that is trimmed off before storing
it (so it's only used briefly during creation to find a non-default
library)... Perhaps that'd look like initdb --icu-locale "67:en" (ICU
library version) or "154.14:en" (individual collation version) or some
new syntax in a few places.  Thereafter, it would always be looked up
by searching for the right library by [dat]collversion as Peter E
suggested.

Let me try harder to vocalise some more thoughts that have stopped me
from trying to code the search-by-collversion design so far:

Suppose your pgdata encounters a PostgreSQL linked against a later ICU
library, most likely after an OS upgrade or migratoin, a pg_upgrade,
or via streaming replication.  You might get a new error "can't find
ICU collation 'en' with version '153.14'; HINT: install missing ICU
library version", and somehow you'll have to work out which one might
contain 'en' v153.14 and install it with apt-get etc.  Then it'll
magically work: your postgres linked against (say) 71 will happily
work with the dlopen'd 67.  This is enough if you want to stay on 67
until the heat death of the universe.  So far so good.

Problem 1:  Suppose you're ready to start using (say) v72.  I guess
you'd use the REFRESH command, which would open the main linked ICU's
collversion and stamp that into the catalogue, at which point new
sessions would start using that, and then you'd have to rebuild all
your indexes (with no help from PG to tell you how to find everything
that needs to be rebuilt, as belaboured in previous reverted work).
Aside from the possibility of getting the rebuilding job wrong (as
belaboured elsewhere), it's not great, because there is still a
transitional period where you can be using the wrong version for your
data.  So this requires some careful planning and understanding from
the administrator.

I admit that the upgrade story is a tiny bit better than the v5
DB2-style patch, which starts using the new version immediately if you
didn't use a prefix (and logs the usual warnings about collversion
mismatch) instead of waiting for you to run REFRESH.  But both of them
have a phase where they might use the wrong library to access an
index.  That's dissatisfying, and leads me to prefer the simple
DB2-style solution that at least admits up front that it's not very
clever.  The DB2-style patch could be improved a bit here with the
addition of one more GUC: default_icu_library, so the administrator,
rather than the packager, remains in control of which version we use
for non-prefixed iculocale values (likely to be what almost everyone
is interested in), defaulting to what the packager linked against.
I've added that to the patch for illustration (though obviously the
error messages produced by collversion mismatch could use some
adjustment, ie to clarify that the warning might be cleared by
installing and selecting a different library version).

Problem 2:  If ICU 67 ever decides to report a different version for a
given collation (would it ever do that?  I don't expect so, but ...),
we'd be unable to open the collation with the search-by-collversion
design, and potentially the database.  What is a user supposed to do
then?  Presumably our error/hint for that would be "please insert the
correct ICU library into drive A", but now there is no correct
library; if you can even diagnose what's happened, I guess you might
downgrade the ICU library using package tools or whatever if possible,
but otherwise you'd be stuck, if you just can't get the right library.
Is this a problem?  Would you want to be able to say "I don't care,
computer, please just press on"?  So I think we need a way to turn off
the search-by-collversion thing.  How should it look?

I'd love to hear others' thoughts on how we can turn this into a
workable solution.  Hopefully while staying simple...
From 0355984c9a80ff15bfac51677fea30b9be68226b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Wed, 8 Jun 2022 17:43:53 +1200
Subject: [PATCH v6] WIP: Multi-version ICU.

Add a layer of indirection when accessing ICU, so that multiple major
versions of the library can be used at once.  Versions other than the
one that PostgreSQL was linked against are opened with dlopen(), but we
refuse to open version higher than the one were were compiled against.
The ABI might change in future releases so that wouldn't be safe.

By default, the system linker's default search path is used to find
libraries, but icu_library_path may be used to specify an absolute path
to look in.  ICU libraries are expected to have been built without ICU's
--disable-renaming option.  That is, major versions must use distinct
symbol names.

This arrangement means that at least one major version of ICU is always
available -- the one that PostgreSQL was linked again.  It should be
simple on most software distributions to install extra versions using a
package manager, or to build extra libraries as required, to access
older ICU releases.  For example, on Debian bullseye the packages are
named libicu63, libicu67, libicu71.

In this version of the patch, '63:en' used as a database default locale
or COLLATION object requests ICU library 63, and 'en' requests the
library version seleted by the GUC default_icu_library_version,
defaulting to the version that the executable is linked against.

XXX Many other designs possible, to discuss!

Discussion: https://postgr.es/m/CA%2BhUKGL4VZRpP3CkjYQkv4RQ6pRYkPkSNgKSxFBwciECQ0mEuQ%40mail.gmail.com
---
 src/backend/access/hash/hashfunc.c            |  16 +-
 src/backend/commands/collationcmds.c          |  20 +
 src/backend/utils/adt/formatting.c            |  53 ++-
 src/backend/utils/adt/pg_locale.c             | 376 +++++++++++++++++-
 src/backend/utils/adt/varchar.c               |  16 +-
 src/backend/utils/adt/varlena.c               |  56 +--
 src/backend/utils/misc/guc_tables.c           |  28 ++
 src/backend/utils/misc/postgresql.conf.sample |   5 +
 src/include/utils/pg_locale.h                 |  74 ++++
 src/tools/pgindent/typedefs.list              |   3 +
 10 files changed, 581 insertions(+), 66 deletions(-)

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index b57ed946c4..0a61538efd 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -298,11 +298,11 @@ hashtext(PG_FUNCTION_ARGS)
 
 			ulen = icu_to_uchar(&uchar, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key));
 
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-									uchar, ulen, NULL, 0);
+			bsize = PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+													 uchar, ulen, NULL, 0);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-							uchar, ulen, buf, bsize);
+			PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+											 uchar, ulen, buf, bsize);
 
 			result = hash_any(buf, bsize);
 
@@ -355,11 +355,11 @@ hashtextextended(PG_FUNCTION_ARGS)
 
 			ulen = icu_to_uchar(&uchar, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key));
 
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-									uchar, ulen, NULL, 0);
+			bsize = PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+													 uchar, ulen, NULL, 0);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-							uchar, ulen, buf, bsize);
+			PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+											 uchar, ulen, buf, bsize);
 
 			result = hash_any_extended(buf, bsize, PG_GETARG_INT64(1));
 
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index fcfc02d2ae..26e747d9d7 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -812,6 +812,26 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 					CreateComments(collid, CollationRelationId, 0,
 								   icucomment);
 			}
+
+			/* Also create an object pinned to an ICU major version. */
+			collid = CollationCreate(psprintf("%s-x-icu-%d", langtag, U_ICU_VERSION_MAJOR_NUM),
+									 nspid, GetUserId(),
+									 COLLPROVIDER_ICU, true, -1,
+									 NULL, NULL,
+									 psprintf("%d:%s", U_ICU_VERSION_MAJOR_NUM, iculocstr),
+									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
+									 true, true);
+			if (OidIsValid(collid))
+			{
+				ncreated++;
+
+				CommandCounterIncrement();
+
+				icucomment = get_icu_locale_comment(name);
+				if (icucomment)
+					CreateComments(collid, CollationRelationId, 0,
+								   icucomment);
+			}
 		}
 	}
 #endif							/* USE_ICU */
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 26f498b5df..0c3c7724d7 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1599,6 +1599,11 @@ typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
 									 const UChar *src, int32_t srcLength,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
+typedef int32_t (*ICU_Convert_BI_Func) (UChar *dest, int32_t destCapacity,
+										const UChar *src, int32_t srcLength,
+										UBreakIterator *bi,
+										const char *locale,
+										UErrorCode *pErrorCode);
 
 static int32_t
 icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
@@ -1623,18 +1628,41 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
-				(errmsg("case conversion failed: %s", u_errorName(status))));
+				(errmsg("case conversion failed: %s",
+						PG_ICU_LIB(mylocale)->errorName(status))));
 	return len_dest;
 }
 
+/*
+ * Like icu_convert_case, but func takes a break iterator (which we don't
+ * make use of).
+ */
 static int32_t
-u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
-						const UChar *src, int32_t srcLength,
-						const char *locale,
-						UErrorCode *pErrorCode)
+icu_convert_case_bi(ICU_Convert_BI_Func func, pg_locale_t mylocale,
+					UChar **buff_dest, UChar *buff_source, int32_t len_source)
 {
-	return u_strToTitle(dest, destCapacity, src, srcLength,
-						NULL, locale, pErrorCode);
+	UErrorCode	status;
+	int32_t		len_dest;
+
+	len_dest = len_source;		/* try first with same length */
+	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+	status = U_ZERO_ERROR;
+	len_dest = func(*buff_dest, len_dest, buff_source, len_source, NULL,
+					mylocale->info.icu.locale, &status);
+	if (status == U_BUFFER_OVERFLOW_ERROR)
+	{
+		/* try again with adjusted length */
+		pfree(*buff_dest);
+		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+		status = U_ZERO_ERROR;
+		len_dest = func(*buff_dest, len_dest, buff_source, len_source, NULL,
+						mylocale->info.icu.locale, &status);
+	}
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("case conversion failed: %s",
+						PG_ICU_LIB(mylocale)->errorName(status))));
+	return len_dest;
 }
 
 #endif							/* USE_ICU */
@@ -1702,7 +1730,8 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 			UChar	   *buff_conv;
 
 			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToLower, mylocale,
+			len_conv = icu_convert_case(PG_ICU_LIB(mylocale)->strToLower,
+										mylocale,
 										&buff_conv, buff_uchar, len_uchar);
 			icu_from_uchar(&result, buff_conv, len_conv);
 			pfree(buff_uchar);
@@ -1824,7 +1853,8 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 			UChar	   *buff_conv;
 
 			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToUpper, mylocale,
+			len_conv = icu_convert_case(PG_ICU_LIB(mylocale)->strToUpper,
+										mylocale,
 										&buff_conv, buff_uchar, len_uchar);
 			icu_from_uchar(&result, buff_conv, len_conv);
 			pfree(buff_uchar);
@@ -1947,8 +1977,9 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 			UChar	   *buff_conv;
 
 			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToTitle_default_BI, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
+			len_conv = icu_convert_case_bi(PG_ICU_LIB(mylocale)->strToTitle,
+										   mylocale,
+										   &buff_conv, buff_uchar, len_uchar);
 			icu_from_uchar(&result, buff_conv, len_conv);
 			pfree(buff_uchar);
 			pfree(buff_conv);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..666a79b907 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -58,6 +58,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_control.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "utils/guc_hooks.h"
@@ -69,6 +70,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -79,14 +81,32 @@
 #include <shlwapi.h>
 #endif
 
+#include <dlfcn.h>
+
 #define		MAX_L10N_DATA		80
 
+#ifdef USE_ICU
+
+/*
+ * We don't want to call into dlopen'd ICU libraries that are newer than the
+ * one we were compiled and linked against, just in case there is an
+ * incompatible API change.
+ */
+#define PG_MAX_ICU_MAJOR_VERSION U_ICU_VERSION_MAJOR_NUM
+
+/* An old ICU release that we know has the right API. */
+#define PG_MIN_ICU_MAJOR_VERSION 54
+
+#endif
+
 
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
 char	   *locale_numeric;
 char	   *locale_time;
+char	   *icu_library_path;
+int			default_icu_library_version;
 
 /*
  * lc_time localization cache.
@@ -1398,29 +1418,354 @@ lc_ctype_is_c(Oid collation)
 	return (lookup_collation_cache(collation, true))->ctype_is_c;
 }
 
+#ifdef USE_ICU
+
 struct pg_locale_struct default_locale;
 
+/* Linked list of ICU libraries we have loaded. */
+static pg_icu_library *icu_library_list = NULL;
+
+/*
+ * Free an ICU library.  pg_icu_library objects that are successfully
+ * constructed stick around for the lifetime of the backend, but this is used
+ * to clean up if initialization fails.
+ */
+static void
+free_icu_library(pg_icu_library *lib)
+{
+	if (lib->libicui18n_handle)
+		dlclose(lib->libicui18n_handle);
+	if (lib->libicuuc_handle)
+		dlclose(lib->libicuuc_handle);
+	pfree(lib);
+}
+
+static void *
+get_icu_function(void *handle, const char *function, int version)
+{
+	char		name[80];
+
+	snprintf(name, sizeof(name), "%s_%d", function, version);
+
+	return dlsym(handle, name);
+}
+
+/*
+ * Probe a dynamically loaded library to see which major version of ICU it
+ * contains.
+ */
+static int
+get_icu_library_major_version(void *handle)
+{
+	for (int i = PG_MIN_ICU_MAJOR_VERSION; i <= PG_MAX_ICU_MAJOR_VERSION; ++i)
+		if (get_icu_function(handle, "ucol_open", i) ||
+			get_icu_function(handle, "u_strToUpper", i))
+			return i;
+
+	/*
+	 * It's a later version we don't dare use, an old version we don't
+	 * support, an ICU build with symbol suffixes disabled, or not ICU.
+	 */
+	return -1;
+}
+
+/*
+ * We have to load a couple of different libraries, so we'll reuse the code to
+ * do that.
+ */
+static void *
+load_icu_library(pg_icu_library *lib, const char *name)
+{
+	void	   *handle;
+	int			found_major_version;
+
+	handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
+	if (handle == NULL)
+	{
+		int			errno_save = errno;
+
+		free_icu_library(lib);
+		errno = errno_save;
+
+		ereport(ERROR,
+				(errmsg("could not load library \"%s\": %m", name)));
+	}
+
+	found_major_version = get_icu_library_major_version(handle);
+	if (found_major_version < 0)
+	{
+		free_icu_library(lib);
+		ereport(ERROR,
+				(errmsg("could not find compatible ICU major version in library \"%s\"",
+						name)));
+	}
+
+	if (found_major_version != lib->major_version)
+	{
+		free_icu_library(lib);
+		ereport(ERROR,
+				(errmsg("expected to find ICU major version %d in library \"%s\", but found %d",
+						lib->major_version, name, found_major_version)));
+	}
+
+	return handle;
+}
+
+/*
+ * Given an ICU major version number, return the object we need to access it,
+ * or fail while trying to load it.
+ */
+static pg_icu_library *
+get_icu_library(int major_version)
+{
+	pg_icu_library *lib;
+
+	/* XXX Move range check into guc_table.c? */
+	if (major_version < PG_MIN_ICU_MAJOR_VERSION ||
+		major_version > PG_MAX_ICU_MAJOR_VERSION)
+		elog(ERROR,
+			"ICU version must be between %d and %d",
+			 PG_MIN_ICU_MAJOR_VERSION,
+			 PG_MAX_ICU_MAJOR_VERSION);
+
+	/* Try to find it in our list of existing libraries. */
+	for (lib = icu_library_list; lib; lib = lib->next)
+		if (lib->major_version == major_version)
+			return lib;
+
+	/* Make a new entry. */
+	lib = MemoryContextAllocZero(TopMemoryContext, sizeof(*lib));
+	if (major_version == U_ICU_VERSION_MAJOR_NUM)
+	{
+		/*
+		 * This is the version we were compiled and linked against.  Simply
+		 * assign the function pointers.
+		 *
+		 * These assignments will fail to compile if an incompatible API
+		 * change is made to some future version of ICU, at which point we
+		 * might need to consider special treatment for different major
+		 * version ranges, with intermediate trampoline functions.
+		 */
+		lib->major_version = major_version;
+		lib->open = ucol_open;
+		lib->close = ucol_close;
+		lib->getVersion = ucol_getVersion;
+		lib->versionToString = u_versionToString;
+		lib->strcoll = ucol_strcoll;
+		lib->strcollUTF8 = ucol_strcollUTF8;
+		lib->getSortKey = ucol_getSortKey;
+		lib->nextSortKeyPart = ucol_nextSortKeyPart;
+		lib->setUTF8 = uiter_setUTF8;
+		lib->errorName = u_errorName;
+		lib->strToUpper = u_strToUpper;
+		lib->strToLower = u_strToLower;
+		lib->strToTitle = u_strToTitle;
+
+		/*
+		 * Also assert the size of a couple of types used as output buffers,
+		 * as a canary to tell us to add extra padding in the (unlikely) event
+		 * that a later release makes these values smaller.
+		 */
+		StaticAssertStmt(U_MAX_VERSION_STRING_LENGTH == 20,
+						 "u_versionToString output buffer size changed incompatibly");
+		StaticAssertStmt(U_MAX_VERSION_LENGTH == 4,
+						 "ucol_getVersion output buffer size changed incompatibly");
+	}
+	else
+	{
+		/* This is an older version, so we'll need to use dlopen(). */
+		char		libicui18n_name[MAXPGPATH];
+		char		libicuuc_name[MAXPGPATH];
+
+		/*
+		 * We don't like to open versions newer than what we're linked
+		 * against, to reduce the risk of an API change biting us.
+		 */
+		if (major_version > U_ICU_VERSION_MAJOR_NUM)
+			elog(ERROR, "ICU major version %d higher than linked version %d, refusing to open",
+				 major_version, U_ICU_VERSION_MAJOR_NUM);
+
+		lib->major_version = major_version;
+
+		/*
+		 * See
+		 * https://unicode-org.github.io/icu/userguide/icu4c/packaging.html#icu-versions
+		 * for conventions on library naming on POSIX and Windows systems.
+		 */
+
+		/* Load the collation library. */
+		snprintf(libicui18n_name,
+				 sizeof(libicui18n_name),
+#ifdef WIN32
+				 "%s%sicui18n%d." DLSUFFIX,
+				 icu_library_path,
+				 icu_library_path[0] ? "\\" : "",
+#else
+				 "%s%slibicui18n" DLSUFFIX ".%d",
+				 icu_library_path,
+				 icu_library_path[0] ? "/" : "",
+#endif
+				 major_version);
+		lib->libicui18n_handle = load_icu_library(lib, libicui18n_name);
+
+		/* Load the ctype library. */
+		snprintf(libicuuc_name,
+				 sizeof(libicuuc_name),
+#ifdef WIN32
+				 "%s%sicuuc%d." DLSUFFIX,
+				 icu_library_path,
+				 icu_library_path[0] ? "\\" : "",
+#else
+				 "%s%slibicuuc" DLSUFFIX ".%d",
+				 icu_library_path,
+				 icu_library_path[0] ? "/" : "",
+#endif
+				 major_version);
+		lib->libicuuc_handle = load_icu_library(lib, libicuuc_name);
+
+		/* Look up all the functions we need. */
+		lib->open = get_icu_function(lib->libicui18n_handle,
+									 "ucol_open",
+									 major_version);
+		lib->close = get_icu_function(lib->libicui18n_handle,
+									  "ucol_close",
+									  major_version);
+		lib->getVersion = get_icu_function(lib->libicui18n_handle,
+										   "ucol_getVersion",
+										   major_version);
+		lib->versionToString = get_icu_function(lib->libicui18n_handle,
+												"u_versionToString",
+												major_version);
+		lib->strcoll = get_icu_function(lib->libicui18n_handle,
+										"ucol_strcoll",
+										major_version);
+		lib->strcollUTF8 = get_icu_function(lib->libicui18n_handle,
+											"ucol_strcollUTF8",
+											major_version);
+		lib->getSortKey = get_icu_function(lib->libicui18n_handle,
+										   "ucol_getSortKey",
+										   major_version);
+		lib->nextSortKeyPart = get_icu_function(lib->libicui18n_handle,
+												"ucol_nextSortKeyPart",
+												major_version);
+		lib->setUTF8 = get_icu_function(lib->libicui18n_handle,
+										"uiter_setUTF8",
+										major_version);
+		lib->errorName = get_icu_function(lib->libicui18n_handle,
+										  "u_errorName",
+										  major_version);
+		lib->strToUpper = get_icu_function(lib->libicuuc_handle,
+										   "u_strToUpper",
+										   major_version);
+		lib->strToLower = get_icu_function(lib->libicuuc_handle,
+										   "u_strToLower",
+										   major_version);
+		lib->strToTitle = get_icu_function(lib->libicuuc_handle,
+										   "u_strToTitle",
+										   major_version);
+		if (!lib->open ||
+			!lib->close ||
+			!lib->getVersion ||
+			!lib->versionToString ||
+			!lib->strcoll ||
+			!lib->strcollUTF8 ||
+			!lib->getSortKey ||
+			!lib->nextSortKeyPart ||
+			!lib->setUTF8 ||
+			!lib->errorName ||
+			!lib->strToUpper ||
+			!lib->strToLower ||
+			!lib->strToTitle)
+		{
+			free_icu_library(lib);
+			ereport(ERROR,
+					(errmsg("could not find expected symbols in library \"%s\"",
+							libicui18n_name)));
+		}
+	}
+
+	lib->next = icu_library_list;
+	icu_library_list = lib;
+
+	return lib;
+}
+
+/*
+ * Look up the library to use for a given collcollate string.
+ */
+static pg_icu_library *
+get_icu_library_for_collation(const char *collcollate, const char **rest)
+{
+	int			major_version;
+	char	   *separator;
+	char	   *after_prefix;
+
+	separator = strchr(collcollate, ':');
+
+	/*
+	 * If it's a traditional value without a prefix, use the default ICU
+	 * library.  That's the one we were linked against, or another one if
+	 * default_icu_library_version has been set.
+	 */
+	if (separator == NULL)
+	{
+		*rest = collcollate;
+
+		if (default_icu_library_version > 0)
+			major_version = default_icu_library_version;
+		else
+			major_version = U_ICU_VERSION_MAJOR_NUM;
+		return get_icu_library(major_version);
+	}
+
+	/* If it has a prefix, interpret it as an ICU major version. */
+	major_version = strtol(collcollate, &after_prefix, 10);
+	if (after_prefix != separator)
+		elog(ERROR,
+			 "could not parse ICU major library version: \"%s\"",
+			 collcollate);
+	if (major_version < PG_MIN_ICU_MAJOR_VERSION ||
+		major_version > PG_MAX_ICU_MAJOR_VERSION)
+		elog(ERROR,
+			 "ICU major library verision out of supported range: \"%s\"",
+			 collcollate);
+
+	/* The part after the separate will be passed to the library. */
+	*rest = separator + 1;
+
+	return get_icu_library(major_version);
+}
+
+#endif
+
 void
 make_icu_collator(const char *iculocstr,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
+	pg_icu_library *lib;
 	UCollator  *collator;
 	UErrorCode	status;
 
+	lib = get_icu_library_for_collation(iculocstr, &iculocstr);
 	status = U_ZERO_ERROR;
-	collator = ucol_open(iculocstr, &status);
+	collator = lib->open(iculocstr, &status);
 	if (U_FAILURE(status))
 		ereport(ERROR,
 				(errmsg("could not open collator for locale \"%s\": %s",
-						iculocstr, u_errorName(status))));
+						iculocstr, lib->errorName(status))));
 
-	if (U_ICU_VERSION_MAJOR_NUM < 54)
+	/*
+	 * XXX can we just drop this cruft and make 54 the minimum supported
+	 * version?
+	 */
+	if (lib->major_version < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
+	resultp->info.icu.lib = lib;
 #else							/* not USE_ICU */
 	/* could get here if a collation was created by a build with ICU */
 	ereport(ERROR,
@@ -1651,21 +1996,23 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
+		pg_icu_library *lib;
 		UCollator  *collator;
 		UErrorCode	status;
 		UVersionInfo versioninfo;
 		char		buf[U_MAX_VERSION_STRING_LENGTH];
 
+		lib = get_icu_library_for_collation(collcollate, &collcollate);
 		status = U_ZERO_ERROR;
-		collator = ucol_open(collcollate, &status);
+		collator = lib->open(collcollate, &status);
 		if (U_FAILURE(status))
 			ereport(ERROR,
 					(errmsg("could not open collator for locale \"%s\": %s",
-							collcollate, u_errorName(status))));
-		ucol_getVersion(collator, versioninfo);
-		ucol_close(collator);
+							collcollate, lib->errorName(status))));
+		lib->getVersion(collator, versioninfo);
+		lib->close(collator);
 
-		u_versionToString(versioninfo, buf);
+		lib->versionToString(versioninfo, buf);
 		collversion = pstrdup(buf);
 	}
 	else
@@ -1733,6 +2080,8 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 
 
 #ifdef USE_ICU
+
+
 /*
  * Converter object for converting between ICU's UChar strings and C strings
  * in database encoding.  Since the database encoding doesn't change, we only
@@ -1954,19 +2303,22 @@ void
 check_icu_locale(const char *icu_locale)
 {
 #ifdef USE_ICU
+	pg_icu_library *lib;
 	UCollator  *collator;
 	UErrorCode	status;
 
+	lib = get_icu_library_for_collation(icu_locale, &icu_locale);
 	status = U_ZERO_ERROR;
-	collator = ucol_open(icu_locale, &status);
+	collator = lib->open(icu_locale, &status);
 	if (U_FAILURE(status))
 		ereport(ERROR,
 				(errmsg("could not open collator for locale \"%s\": %s",
-						icu_locale, u_errorName(status))));
+						icu_locale, lib->errorName(status))));
 
-	if (U_ICU_VERSION_MAJOR_NUM < 54)
+	/* XXX can we just drop this cruft? */
+	if (lib->major_version < 54)
 		icu_set_collation_attributes(collator, icu_locale);
-	ucol_close(collator);
+	lib->close(collator);
 #else
 	ereport(ERROR,
 			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 68e2e6f7a7..e0c86870e0 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -1026,11 +1026,11 @@ hashbpchar(PG_FUNCTION_ARGS)
 
 			ulen = icu_to_uchar(&uchar, keydata, keylen);
 
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-									uchar, ulen, NULL, 0);
+			bsize = PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+													 uchar, ulen, NULL, 0);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-							uchar, ulen, buf, bsize);
+			PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+											 uchar, ulen, buf, bsize);
 
 			result = hash_any(buf, bsize);
 
@@ -1087,11 +1087,11 @@ hashbpcharextended(PG_FUNCTION_ARGS)
 
 			ulen = icu_to_uchar(&uchar, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key));
 
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-									uchar, ulen, NULL, 0);
+			bsize = PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+													 uchar, ulen, NULL, 0);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-							uchar, ulen, buf, bsize);
+			PG_ICU_LIB(mylocale)->getSortKey(PG_ICU_COL(mylocale),
+											 uchar, ulen, buf, bsize);
 
 			result = hash_any_extended(buf, bsize, PG_GETARG_INT64(1));
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index c5e7ee7ca2..cf891a5654 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1667,13 +1667,14 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 					UErrorCode	status;
 
 					status = U_ZERO_ERROR;
-					result = ucol_strcollUTF8(mylocale->info.icu.ucol,
-											  arg1, len1,
-											  arg2, len2,
-											  &status);
+					result = PG_ICU_LIB(mylocale)->strcollUTF8(PG_ICU_COL(mylocale),
+															   arg1, len1,
+															   arg2, len2,
+															   &status);
 					if (U_FAILURE(status))
 						ereport(ERROR,
-								(errmsg("collation failed: %s", u_errorName(status))));
+								(errmsg("collation failed: %s",
+										PG_ICU_LIB(mylocale)->errorName(status))));
 				}
 				else
 #endif
@@ -1686,9 +1687,9 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 					ulen1 = icu_to_uchar(&uchar1, arg1, len1);
 					ulen2 = icu_to_uchar(&uchar2, arg2, len2);
 
-					result = ucol_strcoll(mylocale->info.icu.ucol,
-										  uchar1, ulen1,
-										  uchar2, ulen2);
+					result = PG_ICU_LIB(mylocale)->strcoll(PG_ICU_COL(mylocale),
+														   uchar1, ulen1,
+														   uchar2, ulen2);
 
 					pfree(uchar1);
 					pfree(uchar2);
@@ -2388,13 +2389,14 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 				UErrorCode	status;
 
 				status = U_ZERO_ERROR;
-				result = ucol_strcollUTF8(sss->locale->info.icu.ucol,
-										  a1p, len1,
-										  a2p, len2,
-										  &status);
+				result = PG_ICU_LIB(sss->locale)->strcollUTF8(PG_ICU_COL(sss->locale),
+															  a1p, len1,
+															  a2p, len2,
+															  &status);
 				if (U_FAILURE(status))
 					ereport(ERROR,
-							(errmsg("collation failed: %s", u_errorName(status))));
+							(errmsg("collation failed: %s",
+									PG_ICU_LIB(sss->locale)->errorName(status))));
 			}
 			else
 #endif
@@ -2407,9 +2409,9 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 				ulen1 = icu_to_uchar(&uchar1, a1p, len1);
 				ulen2 = icu_to_uchar(&uchar2, a2p, len2);
 
-				result = ucol_strcoll(sss->locale->info.icu.ucol,
-									  uchar1, ulen1,
-									  uchar2, ulen2);
+				result = PG_ICU_LIB(sss->locale)->strcoll(PG_ICU_COL(sss->locale),
+														  uchar1, ulen1,
+														  uchar2, ulen2);
 
 				pfree(uchar1);
 				pfree(uchar2);
@@ -2569,24 +2571,24 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
 					uint32_t	state[2];
 					UErrorCode	status;
 
-					uiter_setUTF8(&iter, sss->buf1, len);
+					PG_ICU_LIB(sss->locale)->setUTF8(&iter, sss->buf1, len);
 					state[0] = state[1] = 0;	/* won't need that again */
 					status = U_ZERO_ERROR;
-					bsize = ucol_nextSortKeyPart(sss->locale->info.icu.ucol,
-												 &iter,
-												 state,
-												 (uint8_t *) sss->buf2,
-												 Min(sizeof(Datum), sss->buflen2),
-												 &status);
+					bsize = PG_ICU_LIB(sss->locale)->nextSortKeyPart(PG_ICU_COL(sss->locale),
+																	 &iter,
+																	 state,
+																	 (uint8_t *) sss->buf2,
+																	 Min(sizeof(Datum), sss->buflen2),
+																	 &status);
 					if (U_FAILURE(status))
 						ereport(ERROR,
 								(errmsg("sort key generation failed: %s",
-										u_errorName(status))));
+										PG_ICU_LIB(sss->locale)->errorName(status))));
 				}
 				else
-					bsize = ucol_getSortKey(sss->locale->info.icu.ucol,
-											uchar, ulen,
-											(uint8_t *) sss->buf2, sss->buflen2);
+					bsize = PG_ICU_LIB(sss->locale)->getSortKey(PG_ICU_COL(sss->locale),
+																uchar, ulen,
+																(uint8_t *) sss->buf2, sss->buflen2);
 			}
 			else
 #endif
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 05ab087934..9489268b39 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2939,6 +2939,20 @@ struct config_int ConfigureNamesInt[] =
 		check_max_worker_processes, NULL, NULL
 	},
 
+	{
+		{"default_icu_library_version",
+			PGC_SIGHUP,
+			COMPAT_OPTIONS_PREVIOUS,
+			gettext_noop("Default major version of ICU library to use for collations if not specified."),
+			NULL
+		},
+		&default_icu_library_version,
+		0,
+		0,
+		1000,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"max_logical_replication_workers",
 			PGC_POSTMASTER,
@@ -3922,6 +3936,20 @@ struct config_string ConfigureNamesString[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"icu_library_path", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
+			gettext_noop("Sets the path for dynamically loadable ICU libraries."),
+			gettext_noop("If versions of ICU other than the one that "
+						 "PostgreSQL is linked against are needed, they will "
+						 "be opened from this directory.  If empty, the "
+						 "system linker search path will be used."),
+			GUC_SUPERUSER_ONLY
+		},
+		&icu_library_path,
+		"",
+		NULL, NULL, NULL
+	},
+
 	{
 		{"krb_server_keyfile", PGC_SIGHUP, CONN_AUTH_AUTH,
 			gettext_noop("Sets the location of the Kerberos server key file."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351..2713c92124 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -727,6 +727,11 @@
 #lc_numeric = 'C'			# locale for number formatting
 #lc_time = 'C'				# locale for time formatting
 
+#default_icu_library_version = 0	# default major version of ICU library
+					# (0 for the linked version)
+#icu_library_path = ''			# path for dynamically loaded ICU
+					# libraries
+
 # default configuration for text search
 #default_text_search_config = 'pg_catalog.simple'
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a875942123..d26e5738f9 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -17,6 +17,7 @@
 #endif
 #ifdef USE_ICU
 #include <unicode/ucol.h>
+#include <unicode/ubrk.h>
 #endif
 
 #ifdef USE_ICU
@@ -40,6 +41,8 @@ extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
 extern PGDLLIMPORT char *locale_numeric;
 extern PGDLLIMPORT char *locale_time;
+extern PGDLLIMPORT char *icu_library_path;
+extern PGDLLIMPORT int default_icu_library_version;
 
 /* lc_time localization cache */
 extern PGDLLIMPORT char *localized_abbrev_days[];
@@ -63,6 +66,71 @@ extern struct lconv *PGLC_localeconv(void);
 
 extern void cache_locale_time(void);
 
+#ifdef USE_ICU
+
+/*
+ * An ICU library version that we're either linked against or have loaded at
+ * runtime.
+ */
+typedef struct pg_icu_library
+{
+	int			major_version;
+	void	   *libicui18n_handle;
+	void	   *libicuuc_handle;
+	UCollator  *(*open) (const char *loc, UErrorCode *status);
+	void		(*close) (UCollator *coll);
+	void		(*getVersion) (const UCollator *coll, UVersionInfo info);
+	void		(*versionToString) (const UVersionInfo versionArray,
+									char *versionString);
+				UCollationResult(*strcoll) (const UCollator *coll,
+											const UChar *source,
+											int32_t sourceLength,
+											const UChar *target,
+											int32_t targetLength);
+				UCollationResult(*strcollUTF8) (const UCollator *coll,
+												const char *source,
+												int32_t sourceLength,
+												const char *target,
+												int32_t targetLength,
+												UErrorCode *status);
+	int32_t		(*getSortKey) (const UCollator *coll,
+							   const UChar *source,
+							   int32_t sourceLength,
+							   uint8_t *result,
+							   int32_t resultLength);
+	int32_t		(*nextSortKeyPart) (const UCollator *coll,
+									UCharIterator *iter,
+									uint32_t state[2],
+									uint8_t *dest,
+									int32_t count,
+									UErrorCode *status);
+	void		(*setUTF8) (UCharIterator *iter,
+							const char *s,
+							int32_t length);
+	const char *(*errorName) (UErrorCode code);
+	int32_t		(*strToUpper) (UChar *dest,
+							   int32_t destCapacity,
+							   const UChar *src,
+							   int32_t srcLength,
+							   const char *locale,
+							   UErrorCode *pErrorCode);
+	int32_t		(*strToLower) (UChar *dest,
+							   int32_t destCapacity,
+							   const UChar *src,
+							   int32_t srcLength,
+							   const char *locale,
+							   UErrorCode *pErrorCode);
+	int32_t		(*strToTitle) (UChar *dest,
+							   int32_t destCapacity,
+							   const UChar *src,
+							   int32_t srcLength,
+							   UBreakIterator *titleIter,
+							   const char *locale,
+							   UErrorCode *pErrorCode);
+	struct pg_icu_library *next;
+} pg_icu_library;
+
+#endif
 
 /*
  * We define our own wrapper around locale_t so we can keep the same
@@ -84,12 +152,18 @@ struct pg_locale_struct
 		{
 			const char *locale;
 			UCollator  *ucol;
+			pg_icu_library *lib;
 		}			icu;
 #endif
 		int			dummy;		/* in case we have neither LOCALE_T nor ICU */
 	}			info;
 };
 
+#ifdef USE_ICU
+#define PG_ICU_LIB(x) ((x)->info.icu.lib)
+#define PG_ICU_COL(x) ((x)->info.icu.ucol)
+#endif
+
 typedef struct pg_locale_struct *pg_locale_t;
 
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d9b839c979..0ccbbb711e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1101,6 +1101,7 @@ HeapTupleTableSlot
 HistControl
 HotStandbyState
 I32
+ICU_Convert_BI_Func
 ICU_Convert_Func
 ID
 INFIX
@@ -2854,6 +2855,7 @@ TypeName
 U
 U32
 U8
+UBreakIterator
 UChar
 UCharIterator
 UColAttribute
@@ -3482,6 +3484,7 @@ pg_funcptr_t
 pg_gssinfo
 pg_hmac_ctx
 pg_hmac_errno
+pg_icu_library
 pg_int64
 pg_local_to_utf_combined
 pg_locale_t
-- 
2.30.2

Reply via email to