On Fri, 2023-04-28 at 14:35 -0700, Jeff Davis wrote:
> On Thu, 2023-04-27 at 14:23 +0200, Daniel Verite wrote:
> > This should be pg_strcasecmp(...) == 0
> 
> Good catch, thank you! Fixed in updated patches.

Rebased patches.

=== 0001: do not convert C to en-US-u-va-posix

I plan to commit this soon. If someone specifies "C", they are probably
expecting memcmp()-like behavior, or some kind of error/warning that it
can't be provided.

Removing this transformation means that if you specify iculocale=C,
you'll get an error or warning (depending on icu_validation_level),
because C is not a recognized icu locale. Depending on how some of the
other issues in this thread are sorted out, we may want to relax the
validation.

=== 0002: fix @euro, etc. in ICU >= 64

I'd like to commit this soon too, but I'll wait for someone to take a
look. It makes it more reliable to map libc names to icu locale names
regardless of the ICU version.

It doesn't solve the problem for locales like "de__PHONEBOOK", but
those don't seem to be a libc format (I think just an old ICU format),
so I don't see a big reason to carry it forward. It might be another
reason to turn down the validation level to WARNING, though.

=== 0003: support C memcmp() behavior with ICU provider

The current patch 0003 has a problem, because in previous postgres
versions (going all the way back), we allowed "C" as a valid ICU
locale, that would actually be passed to ICU as a locale name. But ICU
didn't recognize it, and it would end up opening the root locale. So we
can't simply redefine "C" to mean "memcmp", because that would
potentially break indexes.

I see the following potential solutions:

   1. Represent the memcmp behavior with iculocale=NULL, or some other
catalog hack, so that we can distinguish between a locale "C" upgraded
from a previous version (which should pass "C" to ICU and get the root
locale), and a new collation defined with locale "C" (which should have
memcmp behavior). The catalog representation for locale information is
already complex, so I'm not excited about this option, but it will
work.

   2. When provider=icu and locale=C, magically transform that into
provider=libc to get memcmp-like behavior for new collations but
preserve the existing behavior for upgraded collations. Not especially
clean, but if we issue a NOTICE perhaps that would avoid confusion.

   3. Like #2, except create a new provider type "none" which may be
slightly less confusing.

=== 0004: make LOCALE apply to ICU for CREATE DATABASE

To understand this patch it helps to understand the confusing situation
with CREATE DATABASE in version 15:

The keywords LC_CTYPE and LC_COLLATE set the server environment
LC_CTYPE/LC_COLLATE for that database and can be specified regardless
of the provider. LOCALE can be specified along with (or instead of)
LC_CTYPE and LC_COLLATE, in which case whichever of LC_CTYPE or
LC_COLLATE is unspecified defaults to the setting of LOCALE. Iff the
provider is libc, LC_CTYPE and LC_COLLATE also act as the database
default collation's locale. If the provider is icu, then none of
LOCALE, LC_CTYPE, or LC_COLLATE affect the database default collation's
locale at all; that's controlled by ICU_LOCALE (which may be omitted if
the template's daticulocale is non-NULL).

The idea of patch 0004 is to address the last part, which is probably
the most confusing aspect. But for that to work smoothly, we need
something like 0003 so that LOCALE=C gives the same semantics
regardless of the provider.

Regards,
        Jeff Davis

From ddda683963959a175dff17ab0e3d8519641498b9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 21 Apr 2023 14:03:57 -0700
Subject: [PATCH v4 1/4] ICU: do not convert locale 'C' to 'en-US-u-va-posix'.

The conversion was intended to be for convenience, but it's more
likely to be confusing than useful.

The user can still directly specify 'en-US-u-va-posix' if desired.

Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.ca...@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 19 +------------------
 src/bin/initdb/initdb.c                       | 17 +----------------
 .../regress/expected/collate.icu.utf8.out     |  8 ++++++++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 ++++
 4 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f0b6567da1..51b4221a39 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2782,26 +2782,10 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		if (elevel > 0)
-			ereport(elevel,
-					(errmsg("could not get language from locale \"%s\": %s",
-							loc_str, u_errorName(status))));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2889,8 +2873,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..4086834458 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2238,24 +2238,10 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		pg_fatal("could not get language from locale \"%s\": %s",
-				 loc_str, u_errorName(status));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2327,8 +2313,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..99f12d2e73 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1020,6 +1020,7 @@ CREATE ROLE regress_test_role;
 CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' ||
@@ -1034,17 +1035,24 @@ BEGIN
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
+RESET icu_validation_level;
 RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+WARNING:  ICU locale "c" has unknown language "c"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..d9778faacc 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -358,6 +358,7 @@ CREATE SCHEMA test_schema;
 
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 
 do $$
 BEGIN
@@ -373,13 +374,16 @@ BEGIN
 END
 $$;
 
+RESET icu_validation_level;
 RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 RESET icu_validation_level;
 
-- 
2.34.1

From 3db6abd0fe56e9c4b6653e04e28f6f77381c2fc8 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 28 Apr 2023 12:22:41 -0700
Subject: [PATCH v4 2/4] ICU: fix up old libc-style locale strings.

Before transforming a locale string into a language tag, fix up old
libc-style locale strings such as 'fr_FR@euro'. Older ICU versions did
this automatically, but ICU version 64 removed that support.

Discussion: https://postgr.es/m/654a49f7ff7461bcf47be4181430678d45f93858.camel%40j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 ++++++++++++++++-
 src/bin/initdb/initdb.c                       | 63 ++++++++++++++++++-
 .../regress/expected/collate.icu.utf8.out     | 11 ++++
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +++
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 51b4221a39..0e7343b28b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2766,6 +2766,60 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 	pfree(lower_str);
 }
 
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = palloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pstrdup(loc_str);
+}
+
 #endif
 
 /*
@@ -2782,6 +2836,7 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
@@ -2798,7 +2853,7 @@ icu_language_tag(const char *loc_str, int elevel)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2818,6 +2873,8 @@ icu_language_tag(const char *loc_str, int elevel)
 		break;
 	}
 
+	pfree(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pfree(langtag);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4086834458..600c8d93f3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2229,6 +2229,64 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+#ifdef USE_ICU
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = pg_malloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pg_strdup(loc_str);
+}
+
+#endif
+
 /*
  * Convert to canonical BCP47 language tag. Must be consistent with
  * icu_language_tag().
@@ -2238,6 +2296,7 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
@@ -2254,7 +2313,7 @@ icu_language_tag(const char *loc_str)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2273,6 +2332,8 @@ icu_language_tag(const char *loc_str)
 		break;
 	}
 
+	pg_free(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pg_free(langtag);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 99f12d2e73..d520674edf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1046,6 +1046,8 @@ CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
+ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
@@ -1056,7 +1058,16 @@ HINT:  To disable ICU locale validation, set parameter icu_validation_level to D
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index d9778faacc..ab9a8484b9 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -381,12 +381,19 @@ CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, nee
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
 
-- 
2.34.1

From 05597bea2f48cb1ef78a745401bcabdd29245b84 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH v4 3/4] ICU: support locale "C" with the same behavior as
 libc.

The "C" locale doesn't actually use a provider at all, it's a special
locale that uses memcmp() and built-in character classification. Make
it behave the same in ICU as libc (even though it doesn't actually
make use of either provider).

Discussion: https://postgr.es/m/87v8hoexdv....@news-spur.riddles.org.uk
---
 src/backend/commands/collationcmds.c          | 43 ++++++----
 src/backend/commands/dbcommands.c             | 42 +++++----
 src/backend/utils/adt/pg_locale.c             | 86 ++++++++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 12 +--
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +-
 5 files changed, 131 insertions(+), 59 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..7e69a889fb 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,39 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 						 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the locale string. Otherwise,
-			 * canonicalize to a language tag.
-			 */
-			if (!IsBinaryUpgrade)
+			if (pg_strcasecmp(colliculocale, "C") == 0 ||
+				pg_strcasecmp(colliculocale, "POSIX") == 0)
 			{
-				char *langtag = icu_language_tag(colliculocale,
-												 icu_validation_level);
-
-				if (langtag && strcmp(colliculocale, langtag) != 0)
+				if (!collisdeterministic)
+					ereport(ERROR,
+							(errmsg("nondeterministic collations not supported for C or POSIX locale")));
+				if (collicurules != NULL)
+					ereport(ERROR,
+							(errmsg("RULES not supported for C or POSIX locale")));
+			}
+			else
+			{
+				/*
+				 * During binary upgrade, preserve the locale
+				 * string. Otherwise, canonicalize to a language tag.
+				 */
+				if (!IsBinaryUpgrade)
 				{
-					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
-									langtag, colliculocale)));
+					char *langtag = icu_language_tag(colliculocale,
+													 icu_validation_level);
+
+					if (langtag && strcmp(colliculocale, langtag) != 0)
+					{
+						ereport(NOTICE,
+								(errmsg("using standard form \"%s\" for locale \"%s\"",
+										langtag, colliculocale)));
 
-					colliculocale = langtag;
+						colliculocale = langtag;
+					}
 				}
-			}
 
-			icu_validate_locale(colliculocale);
+				icu_validate_locale(colliculocale);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..8ef33871f0 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,27 +1058,37 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
-		/*
-		 * During binary upgrade, or when the locale came from the template
-		 * database, preserve locale string. Otherwise, canonicalize to a
-		 * language tag.
-		 */
-		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		if (pg_strcasecmp(dbiculocale, "C") == 0 ||
+			pg_strcasecmp(dbiculocale, "POSIX") == 0)
 		{
-			char *langtag = icu_language_tag(dbiculocale,
-											 icu_validation_level);
-
-			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			if (dbicurules != NULL)
+				ereport(ERROR,
+						(errmsg("ICU_RULES not supported for C or POSIX locale")));
+		}
+		else
+		{
+			/*
+			 * During binary upgrade, or when the locale came from the
+			 * template database, preserve locale string. Otherwise,
+			 * canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
 			{
-				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
-								langtag, dbiculocale)));
+				char *langtag = icu_language_tag(dbiculocale,
+												 icu_validation_level);
+
+				if (langtag && strcmp(dbiculocale, langtag) != 0)
+				{
+					ereport(NOTICE,
+							(errmsg("using standard form \"%s\" for locale \"%s\"",
+									langtag, dbiculocale)));
 
-				dbiculocale = langtag;
+					dbiculocale = langtag;
+				}
 			}
-		}
 
-		icu_validate_locale(dbiculocale);
+			icu_validate_locale(dbiculocale);
+		}
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 0e7343b28b..76ca42441d 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1246,8 +1246,15 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		}
 		else
 		{
-			cache_entry->collate_is_c = false;
-			cache_entry->ctype_is_c = false;
+			Datum		datum;
+			const char *colliculocale;
+
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colliculocale);
+			colliculocale = TextDatumGetCString(datum);
+
+			cache_entry->collate_is_c = ((strcmp(colliculocale, "C") == 0) ||
+										 (strcmp(colliculocale, "POSIX") == 0));
+			cache_entry->ctype_is_c = cache_entry->collate_is_c;
 		}
 
 		cache_entry->flags_valid = true;
@@ -1279,16 +1286,27 @@ lc_collate_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_COLLATE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_COLLATE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_COLLATE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_COLLATE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1332,16 +1350,27 @@ lc_ctype_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_CTYPE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_CTYPE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_CTYPE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_CTYPE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1375,7 +1404,14 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 
-	collator = pg_ucol_open(iculocstr);
+	if (pg_strcasecmp(iculocstr, "C") == 0 ||
+		pg_strcasecmp(iculocstr, "POSIX") == 0)
+	{
+		Assert(icurules == NULL);
+		collator = NULL;
+	}
+	else
+		collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1650,6 +1686,10 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (pg_strcasecmp("C", collcollate) == 0 ||
+		pg_strcasecmp("POSIX", collcollate) == 0)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
@@ -1668,9 +1708,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	else
 #endif
 		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
+			pg_strncasecmp("C.", collcollate, 2) != 0)
 	{
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
@@ -2457,6 +2495,14 @@ pg_ucol_open(const char *loc_str)
 	if (loc_str == NULL)
 		elog(ERROR, "opening default collator is not supported");
 
+	/*
+	 * Must never open special values C or POSIX, which are treated specially
+	 * and not passed to the provider.
+	 */
+	if (pg_strcasecmp(loc_str, "C") == 0 ||
+		pg_strcasecmp(loc_str, "POSIX") == 0)
+		elog(ERROR, "unexpected ICU locale string: %s", loc_str);
+
 	/*
 	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
 	 * the root locale. If the first component of the locale is "und", replace
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d520674edf..f217658151 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1042,8 +1042,10 @@ ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
-ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+ERROR:  nondeterministic collations not supported for C or POSIX locale
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
+ERROR:  RULES not supported for C or POSIX locale
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
@@ -1051,16 +1053,14 @@ ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMEN
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
-WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-WARNING:  ICU locale "c" has unknown language "c"
-HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
 NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index ab9a8484b9..e4bbd2c009 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -379,16 +379,19 @@ RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
+
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
-- 
2.34.1

From 310bdcd136e44bfca1eea4da5181886eac02d52d Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v4 4/4] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107...@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 ++--
 doc/src/sgml/ref/createdb.sgml                |  5 ++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++--
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++---
 src/bin/initdb/initdb.c                       | 31 ++++++++++++-------
 src/bin/scripts/createdb.c                    | 13 +++-----
 src/bin/scripts/t/020_createdb.pl             |  4 +--
 src/test/icu/t/010_database.pl                | 23 +++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 28 ++++++++---------
 10 files changed, 80 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..844773ff44 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 7e69a889fb..e481f20dc8 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -288,7 +288,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					if (langtag && strcmp(colliculocale, langtag) != 0)
 					{
 						ereport(NOTICE,
-								(errmsg("using standard form \"%s\" for locale \"%s\"",
+								(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 										langtag, colliculocale)));
 
 						colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 8ef33871f0..b447dc55f3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1080,7 +1087,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				if (langtag && strcmp(dbiculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, dbiculocale)));
 
 					dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 600c8d93f3..7e316c8ba9 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2157,7 +2157,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2452,7 +2456,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2468,6 +2472,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	/*
@@ -2504,14 +2510,18 @@ setlocales(void)
 			printf(_("Using default ICU locale \"%s\".\n"), icu_locale);
 		}
 
-		/* canonicalize to a language tag */
-		langtag = icu_language_tag(icu_locale);
-		printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
-			   langtag, icu_locale);
-		pg_free(icu_locale);
-		icu_locale = langtag;
-
-		icu_validate_locale(icu_locale);
+		if (pg_strcasecmp(icu_locale, "C") != 0 &&
+			pg_strcasecmp(icu_locale, "POSIX") != 0)
+		{
+			/* canonicalize to a language tag */
+			langtag = icu_language_tag(icu_locale);
+			printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
+				   langtag, icu_locale);
+			pg_free(icu_locale);
+			icu_locale = langtag;
+
+			icu_validate_locale(icu_locale);
+		}
 
 		/*
 		 * In supported builds, the ICU locale ID will be opened during
@@ -3343,7 +3353,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..9ca86a3e53 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
@@ -219,6 +211,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..3db9fe931f 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -126,7 +126,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -134,7 +134,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index f217658151..566e91d2d9 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1063,11 +1063,11 @@ CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+NOTICE:  using standard form "und-u-cu-eur" for ICU locale "@EURO"
 CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+NOTICE:  using standard form "und-u-co-pinyin" for ICU locale "@pinyin"
 CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
+NOTICE:  using standard form "und-u-co-stroke" for ICU locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
@@ -1213,9 +1213,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1223,7 +1223,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1240,12 +1240,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1254,7 +1254,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1282,13 +1282,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1338,9 +1338,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

Reply via email to