On 2022-10-01 15:07, Peter Eisentraut wrote:
On 22.09.22 20:06, Marina Polyakova wrote:
On 2022-09-21 17:53, Peter Eisentraut wrote:
Committed with that test, thanks.  I think that covers all the ICU
issues you reported for PG15 for now?

I thought about the order of the ICU checks - if it is ok to check that the selected encoding is supported by ICU after printing all the locale & encoding information, why not to move almost all the ICU checks here?..

It's possible that we can do better, but I'm not going to add things
like that to PG 15 at this point unless it fixes a faulty behavior.

Will PG 15 always have this order of ICU checks, is the current behaviour correct enough? On the other hand, there may be a better fix for PG 16+ and not all changes can be backported...

On 2022-09-16 10:56, Peter Eisentraut wrote:
On 15.09.22 17:41, Marina Polyakova wrote:
I agree with you. Here's another version of the patch. The locale/encoding checks and reports in initdb have been reordered, because now the encoding is set first and only then the ICU locale is checked.

I committed something based on the first version of your patch.  This
reordering of the messages here was a little too much surgery for me
at this point.  For instance, there are also messages in #ifdef WIN32
code that would need to be reordered as well.  I kept the overall
structure of the code the same and just inserted the additional
proposed checks.

If you want to pursue the reordering of the checks and messages
overall, a patch for the master branch could be considered.

I've worked on this again (see attached patch) but I'm not sure if the messages of encoding mismatches are clear enough without the full locale information. For

$ initdb -D data --icu-locale en --locale-provider icu

compare the outputs:

The database cluster will be initialized with this locale configuration:
  provider:    icu
  ICU locale:  en
  LC_COLLATE:  de_DE.iso885915@euro
  LC_CTYPE:    de_DE.iso885915@euro
  LC_MESSAGES: en_US.utf8
  LC_MONETARY: de_DE.iso885915@euro
  LC_NUMERIC:  de_DE.iso885915@euro
  LC_TIME:     de_DE.iso885915@euro
The default database encoding has been set to "UTF8".
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

and

Encoding "UTF8" implied by locale will be set as the default database encoding.
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

The same without ICU, e.g. for

$ initdb -D data

the output with locale information:

The database cluster will be initialized with this locale configuration:
  provider:    libc
  LC_COLLATE:  en_US.utf8
  LC_CTYPE:    de_DE.iso885915@euro
  LC_MESSAGES: en_US.utf8
  LC_MONETARY: de_DE.iso885915@euro
  LC_NUMERIC:  de_DE.iso885915@euro
  LC_TIME:     de_DE.iso885915@euro
The default database encoding has accordingly been set to "LATIN9".
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (LATIN9) and the encoding that the selected locale uses (UTF8) do not match. This would lead to misbehavior in various character string processing functions. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

and the "shorter" version:

Encoding "LATIN9" implied by locale will be set as the default database encoding.
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (LATIN9) and the encoding that the selected locale uses (UTF8) do not match. This would lead to misbehavior in various character string processing functions. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

BTW, what did you mean that "there are also messages in #ifdef WIN32 code that would need to be reordered as well"?..

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 96b46cbc020ef8b85b6d54d3d4ca8ad116277832..242d68f58287aeb6f95619c2ce8b78e38433cf18 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
+#ifndef USE_ICU
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#else
 		if (!(is_encoding_supported_by_icu(encoding)))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -1046,6 +1051,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 errmsg("ICU locale must be specified")));
 
 		check_icu_locale(dbiculocale);
+#endif
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8b751bcb5cda3a1f4c7803a68bc0a4a..743f11e1d1fd62d500fc2846976472d13e08046b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1945,15 +1945,12 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 	}
 }
 
-#endif							/* USE_ICU */
-
 /*
  * Check if the given locale ID is valid, and ereport(ERROR) if it isn't.
  */
 void
 check_icu_locale(const char *icu_locale)
 {
-#ifdef USE_ICU
 	UCollator  *collator;
 	UErrorCode	status;
 
@@ -1967,13 +1964,10 @@ check_icu_locale(const char *icu_locale)
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, icu_locale);
 	ucol_close(collator);
-#else
-	ereport(ERROR,
-			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			 errmsg("ICU is not supported in this build")));
-#endif
 }
 
+#endif							/* USE_ICU */
+
 /*
  * These functions convert from/to libc's wchar_t, *not* pg_wchar_t.
  * Therefore we keep them here rather than with the mbutils code.
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f61a04305590c7d09ec59fe202d6f22c0a605827..47dde552e218055f5d73003799b4ccc7c96a49a7 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2043,24 +2043,35 @@ check_locale_encoding(const char *locale, int user_enc)
 }
 
 /*
- * check if the chosen encoding matches is supported by ICU
+ * check if the chosen locale and encoding are supported by ICU
  *
  * this should match the similar check in the backend createdb() function
  */
-static bool
-check_icu_locale_encoding(int user_enc)
+static void
+check_icu_locale_encoding(void)
 {
-	if (!(is_encoding_supported_by_icu(user_enc)))
+#ifndef USE_ICU
+	pg_fatal("ICU is not supported in this build");
+#else
+	if (!(is_encoding_supported_by_icu(encodingid)))
 	{
 		pg_log_error("encoding mismatch");
 		pg_log_error_detail("The encoding you selected (%s) is not supported with the ICU provider.",
-							pg_encoding_to_char(user_enc));
+							pg_encoding_to_char(encodingid));
 		pg_log_error_hint("Rerun %s and either do not specify an encoding explicitly, "
 						  "or choose a matching combination.",
 						  progname);
-		return false;
+		exit(1);
 	}
-	return true;
+
+	if (!icu_locale)
+		pg_fatal("ICU locale must be specified");
+
+	/*
+	 * In supported builds, the ICU locale ID will be checked by the backend
+	 * during post-bootstrap initialization.
+	 */
+#endif
 }
 
 /*
@@ -2113,20 +2124,6 @@ setlocales(void)
 	check_locale_name(LC_CTYPE, lc_messages, &canonname);
 	lc_messages = canonname;
 #endif
-
-	if (locale_provider == COLLPROVIDER_ICU)
-	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
-
-		/*
-		 * In supported builds, the ICU locale ID will be checked by the
-		 * backend during post-bootstrap initialization.
-		 */
-#ifndef USE_ICU
-		pg_fatal("ICU is not supported in this build");
-#endif
-	}
 }
 
 /*
@@ -2297,18 +2294,86 @@ setup_bin_paths(const char *argv0)
 	canonicalize_path(share_path);
 }
 
+static void
+set_encoding(void)
+{
+	int			ctype_enc;
+
+	if (encoding)
+	{
+		encodingid = get_encoding_id(encoding);
+		return;
+	}
+
+	if (locale_provider == COLLPROVIDER_ICU)
+	{
+		encodingid = PG_UTF8;
+		printf(_("Encoding \"%s\" implied by locale will be set as the default database encoding.\n"),
+			   pg_encoding_to_char(encodingid));
+		return;
+	}
+
+	ctype_enc = pg_get_encoding_from_locale(lc_ctype, true);
+
+	if (ctype_enc == -1)
+	{
+		/* Couldn't recognize the locale's codeset */
+		pg_log_error("could not find suitable encoding for locale \"%s\"",
+					 lc_ctype);
+		pg_log_error_hint("Rerun %s with the -E option.", progname);
+		pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+		exit(1);
+	}
+
+	if (!pg_valid_server_encoding_id(ctype_enc))
+	{
+		/*
+		 * We recognized it, but it's not a legal server encoding. On Windows,
+		 * UTF-8 works with any locale, so we can fall back to UTF-8.
+		 */
+#ifdef WIN32
+		encodingid = PG_UTF8;
+		printf(_("Encoding \"%s\" implied by locale is not allowed as a server-side encoding.\n"
+				 "The default database encoding will be set to \"%s\" instead.\n"),
+			   pg_encoding_to_char(ctype_enc),
+			   pg_encoding_to_char(encodingid));
+		return;
+#else
+		pg_log_error("locale \"%s\" requires unsupported encoding \"%s\"",
+					 lc_ctype, pg_encoding_to_char(ctype_enc));
+		pg_log_error_detail("Encoding \"%s\" is not allowed as a server-side encoding.",
+							pg_encoding_to_char(ctype_enc));
+		pg_log_error_hint("Rerun %s with a different locale selection.",
+						  progname);
+		exit(1);
+#endif
+	}
+
+	encodingid = ctype_enc;
+	printf(_("Encoding \"%s\" implied by locale will be set as the default database encoding.\n"),
+		   pg_encoding_to_char(encodingid));
+}
+
 void
 setup_locale_encoding(void)
 {
 	setlocales();
 
+	set_encoding();
+
+	if (!check_locale_encoding(lc_ctype, encodingid) ||
+		!check_locale_encoding(lc_collate, encodingid))
+		exit(1);				/* check_locale_encoding printed the error */
+
+	if (locale_provider == COLLPROVIDER_ICU)
+		check_icu_locale_encoding();
+
 	if (locale_provider == COLLPROVIDER_LIBC &&
 		strcmp(lc_ctype, lc_collate) == 0 &&
 		strcmp(lc_ctype, lc_time) == 0 &&
 		strcmp(lc_ctype, lc_numeric) == 0 &&
 		strcmp(lc_ctype, lc_monetary) == 0 &&
-		strcmp(lc_ctype, lc_messages) == 0 &&
-		(!icu_locale || strcmp(lc_ctype, icu_locale) == 0))
+		strcmp(lc_ctype, lc_messages) == 0)
 		printf(_("The database cluster will be initialized with locale \"%s\".\n"), lc_ctype);
 	else
 	{
@@ -2329,68 +2394,6 @@ setup_locale_encoding(void)
 			   lc_numeric,
 			   lc_time);
 	}
-
-	if (!encoding && locale_provider == COLLPROVIDER_ICU)
-	{
-		encodingid = PG_UTF8;
-		printf(_("The default database encoding has been set to \"%s\".\n"),
-			   pg_encoding_to_char(encodingid));
-	}
-	else if (!encoding)
-	{
-		int			ctype_enc;
-
-		ctype_enc = pg_get_encoding_from_locale(lc_ctype, true);
-
-		if (ctype_enc == -1)
-		{
-			/* Couldn't recognize the locale's codeset */
-			pg_log_error("could not find suitable encoding for locale \"%s\"",
-						 lc_ctype);
-			pg_log_error_hint("Rerun %s with the -E option.", progname);
-			pg_log_error_hint("Try \"%s --help\" for more information.", progname);
-			exit(1);
-		}
-		else if (!pg_valid_server_encoding_id(ctype_enc))
-		{
-			/*
-			 * We recognized it, but it's not a legal server encoding. On
-			 * Windows, UTF-8 works with any locale, so we can fall back to
-			 * UTF-8.
-			 */
-#ifdef WIN32
-			encodingid = PG_UTF8;
-			printf(_("Encoding \"%s\" implied by locale is not allowed as a server-side encoding.\n"
-					 "The default database encoding will be set to \"%s\" instead.\n"),
-				   pg_encoding_to_char(ctype_enc),
-				   pg_encoding_to_char(encodingid));
-#else
-			pg_log_error("locale \"%s\" requires unsupported encoding \"%s\"",
-						 lc_ctype, pg_encoding_to_char(ctype_enc));
-			pg_log_error_detail("Encoding \"%s\" is not allowed as a server-side encoding.",
-								pg_encoding_to_char(ctype_enc));
-			pg_log_error_hint("Rerun %s with a different locale selection.",
-							  progname);
-			exit(1);
-#endif
-		}
-		else
-		{
-			encodingid = ctype_enc;
-			printf(_("The default database encoding has accordingly been set to \"%s\".\n"),
-				   pg_encoding_to_char(encodingid));
-		}
-	}
-	else
-		encodingid = get_encoding_id(encoding);
-
-	if (!check_locale_encoding(lc_ctype, encodingid) ||
-		!check_locale_encoding(lc_collate, encodingid))
-		exit(1);				/* check_locale_encoding printed the error */
-
-	if (locale_provider == COLLPROVIDER_ICU &&
-		!check_icu_locale_encoding(encodingid))
-		exit(1);
 }
 
 
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 164fc11cbffc8c466f84d51c07106b602d022bc6..3a5166408ce2ef7831d7bd80e1ec95cb8867517a 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -102,12 +102,14 @@ if ($ENV{with_icu} eq 'yes')
 		qr/initdb: error: ICU locale must be specified/,
 		'locale provider ICU requires --icu-locale');
 
-	command_ok(
+	command_like(
 		[
 			'initdb',                '--no-sync',
+			'-A',                    'trust',
 			'--locale-provider=icu', '--icu-locale=en',
 			"$tempdir/data3"
 		],
+		qr/Encoding "UTF8" implied by locale will be set as the default database encoding/,
 		'option --icu-locale');
 
 	command_fails_like(
@@ -123,28 +125,41 @@ if ($ENV{with_icu} eq 'yes')
 		[
 			'initdb',                '--no-sync',
 			'--locale-provider=icu', '--encoding=SQL_ASCII',
-			'--icu-locale=en', "$tempdir/dataX"
+			"$tempdir/dataX"
 		],
 		qr/error: encoding mismatch/,
 		'fails for encoding not supported by ICU');
 }
 else
 {
-	command_fails(
+	command_fails_like(
 		[ 'initdb', '--no-sync', '--locale-provider=icu', "$tempdir/data2" ],
+		qr/error: ICU is not supported in this build/,
 		'locale provider ICU fails since no ICU support');
+
+	command_fails_like(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--encoding=SQL_ASCII',
+			"$tempdir/data2"
+		],
+		qr/error: ICU is not supported in this build/,
+		'locale provider ICU with not supported ICU encoding fails since no ICU support'
+	);
 }
 
-command_fails(
+command_fails_like(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
+	qr/error: unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
-command_fails(
+command_fails_like(
 	[
 		'initdb',                 '--no-sync',
 		'--locale-provider=libc', '--icu-locale=en',
 		"$tempdir/dataX"
 	],
+	qr/error: --icu-locale cannot be specified unless locale provider "icu" is chosen/,
 	'fails for invalid option combination');
 
 done_testing();
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index a74bf3b0d8bb9642d0b24f5781d5794727466a88..45b44fce460376240a2617fa207e44e686439fab 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -30,11 +30,12 @@ if ($ENV{with_icu} eq 'yes')
 	# This fails because template0 uses libc provider and has no ICU
 	# locale set.  It would succeed if template0 used the icu
 	# provider.  XXX Maybe split into multiple tests?
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu', 'foobar4'
 		],
+		qr/ERROR:  ICU locale must be specified/,
 		'create database with ICU fails without ICU locale specified');
 
 	$node->issues_sql_like(
@@ -46,12 +47,13 @@ if ($ENV{with_icu} eq 'yes')
 		qr/statement: CREATE DATABASE foobar5 .* LOCALE_PROVIDER icu ICU_LOCALE 'en'/,
 		'create database with ICU locale specified');
 
-	$node->command_fails(
+	$node->command_fails_like(
 		[
 			'createdb', '-T', 'template0', '-E', 'UTF8',
 			'--locale-provider=icu',
 			'--icu-locale=@colNumeric=lower', 'foobarX'
 		],
+		qr/ERROR:  could not open collator for locale/,
 		'fails for invalid ICU locale');
 
 	$node->command_fails_like(
@@ -78,18 +80,39 @@ if ($ENV{with_icu} eq 'yes')
 }
 else
 {
-	$node->command_fails(
+	$node->command_fails_like(
 		[ 'createdb', '-T', 'template0', '--locale-provider=icu', 'foobar4' ],
+		qr/ERROR:  ICU is not supported in this build/,
 		'create database with ICU fails since no ICU support');
+
+	$node->command_fails_like(
+		[
+			'createdb',             '-T',
+			'template0',            '--locale-provider=icu',
+			'--encoding=SQL_ASCII', 'foobar4'
+		],
+		qr/ERROR:  ICU is not supported in this build/,
+		'create database with ICU and not supported ICU encoding fails since no ICU support'
+	);
 }
 
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
-$node->command_fails(
+$node->command_fails_like(
 	[ 'createdb', '-T', 'template0', '--locale-provider=xyz', 'foobarX' ],
+	qr/ERROR:  unrecognized locale provider: xyz/,
 	'fails for invalid locale provider');
 
+$node->command_fails_like(
+	[
+		'createdb',        '-T',
+		'template0',       '--locale-provider=libc',
+		'--icu-locale=en', 'foobarX'
+	],
+	qr/ERROR:  ICU locale cannot be specified unless locale provider is ICU/,
+	'fails for invalid option combination');
+
 # Check use of templates with shared dependencies copied from the template.
 my ($ret, $stdout, $stderr) = $node->psql(
 	'foobar2',
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a87594212364aaac3337ef07188dd291d62299d7..6e8af399769fdb257a65446b3c476bbfeca49f33 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -104,8 +104,8 @@ extern char *get_collation_actual_version(char collprovider, const char *collcol
 #ifdef USE_ICU
 extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
 extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
 extern void check_icu_locale(const char *icu_locale);
+#endif
 
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,

Reply via email to