Re: Move defaults toward ICU in 16?

Jeff Davis Fri, 03 Mar 2023 21:45:34 -0800

On Thu, 2023-03-02 at 10:37 +0100, Peter Eisentraut wrote:
> I would skip this.  There was a brief discussion about this at [0], 
> where I pointed out that if we are going to do something like that, 
> there would be other candidates among the optional dependencies to 
> promote, such as certainly openssl and arguably lz4.  If we don't do 
> this consistently across dependencies, then there will be confusion.


The difference is that ICU affects semantics of collations, and
collations are not really an optional feature. If postgres is built
without ICU, that will affect the default at initdb time (after patch
003, anyway), which will then affect the default collations in all
databases.

> In practice, I don't think it matters.  Almost all installations are 
> made by packagers, who will make their own choices.

Mostly true, but the discussion at [0] reveals that some people do
build postgresql themselves for whatever reason.

When I first started out with postgres I always built from source. That
was quite a while ago, so maybe that means nothing; but it would be sad
to think that the build-it-yourself experience doesn't matter.

> >    0002: update template0 in new cluster (as described above)
> 
> This makes sense.  I'm confused what the code originally wanted to 
> achieve, e.g.,
> 
> -/*
> - * Check that every database that already exists in the new cluster
> is
> - * compatible with the corresponding database in the old one.
> - */
> -static void
> -check_databases_are_compatible(void)
> 
> Was there once support for the new cluster having additional
> databases 
> in place?  Weird.

It looks like 33755e8edf was the last significant change here. CC
Heikki for comment.

> In any case, I think you can remove additional code from
> get_db_infos() 
> related to fields that are no longer used, such as db_encoding, and
> the 
> corresponding struct fields in DbInfo.

You're right: there's not much of an intersection between the code that
needs a DbInfo and the code that needs the locale fields. I created a
separate DbLocaleInfo struct for the template0 locale information, and
removed the locale fields from DbInfo.

I also added a TAP test.

> >    0003: default initdb to use ICU
> 
> What are the changes in the citext tests about?

There's a test in citext_utf8.sql for:

  SELECT 'i'::citext = 'İ'::citext AS t;

citext_eq uses DEFAULT_COLLATION_OID, comparing the results after
applying lower(). Apparently:

  lower('İ' collate "en_US") = 'i' -- true
  lower('İ' collate "tr-TR-x-icu") = 'i' -- true
  lower('İ' collate "en-US-x-icu") = 'i' -- false

the test was passing before because it seems to be true for all libc
locales. But for ICU, it seems to only be true in the "tr-TR" locale at
colstrength=secondary (and true for other ICU locales at
colstrength=primary).

We can't fix the test by being explicit about the collation, because
citext doesn't pass it down; it always uses the default collation. We
could fix citext to pass it down properly, but that seems like a
different patch.

In any case, citext doesn't seem very important to those using ICU (we
have a doc note suggesting ICU instead), so I don't see a strong reason
to test the combination. So, I just exit the test early if it's ICU. I
added a better comment.


>   Is it the same issue as 
> in unaccent?  In that case, the OR should be an AND?  Maybe add a
> comment?
> 
> Why is unaccent is "broken" if the default collation is provided by
> ICU 
> and LC_CTYPE=C?  Is that a general problem?  Should we prevent this 
> combination?

A different issue: unaccent is calling t_isspace(), which is just not
properly handling locales. t_isspace() always passes NULL as the last
argument to char2wchar. There are TODO comments throughout that file.

Specifically what happens:
  lc_ctype_is_c(DEFAULT_COLLATION_OID) returns false
  so it calls char2wchar(), which calls mbstowcs()
  which returns an error because the LC_CTYPE=C

Right now, that's a longstanding issue for all users of t_isspace() and
related functions: tsearch, ltree, pg_trgm, dict_xsyn, and unaccent. I
assume it was known and considered unimportant, otherwise we wouldn't
have left the TODO comments in there.

I believe it's only a problem when the provider is ICU and the LC_CTYPE
is C. I think a quick fix would be to just test LC_CTYPE directly (from
the environment or setlocale(LC_CTYPE, NULL)) rather than try to
extract it from the default collation. It sounds like a separate patch,
and should be handled as a bugfix and backported.

A better fix would be to support character classification in ICU. I
don't think that's hard, but ICU has quite a few options, and we'd need
to discuss which are the right ones to support. We may also want to
pass collation information down rather than just using the database
default, but that may depend on the caller and we should discuss that,
as well.

> What are the changes in the ecpg tests about?  Looks harmless, but if
> there is a need, maybe it should be commented somewhere, otherwise
> what 
> prevents someone from changing it back?

ICU is not compatible with SQL_ASCII, so I had to remove the
ENCODING=SQL_ASCII line from the ecpg test build. CC Michael Meskes who
added the line in 1fa6be6f69 in case he has a comment.

But when I did that, I got CI failures on windows because it couldn't
transcode between LATIN1 and WIN1252. So I changed the ecpg test to
just use SQL_ASCII for the client_encoding (not the server encoding).
Michael Meskes added the client_encoding parameter test in 5e7710e725,
so he might have a comment about that as well.

Since I removed the code, I didn't see a clear place to add a comment,
but if you have a suggestion I'll take it.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS

From 43bf9a98620c9c91a65b78f9f929a50c3d11efc5 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Wed, 8 Feb 2023 12:06:26 -0800
Subject: [PATCH v5 3/3] Use ICU by default at initdb time.

If the ICU locale is not specified, initialize the default collator
and retrieve the locale name from that.

Discussion: https://postgr.es/m/510d284759f6e943ce15096167760b2edcb2e700.ca...@j-davis.com
---
 contrib/citext/expected/citext_utf8.out       |  9 ++-
 contrib/citext/expected/citext_utf8_1.out     |  9 ++-
 contrib/citext/sql/citext_utf8.sql            |  9 ++-
 contrib/unaccent/expected/unaccent.out        |  9 +++
 contrib/unaccent/expected/unaccent_1.out      |  8 ++
 contrib/unaccent/sql/unaccent.sql             | 11 +++
 doc/src/sgml/ref/initdb.sgml                  | 48 +++++++-----
 src/bin/initdb/Makefile                       |  4 +-
 src/bin/initdb/initdb.c                       | 76 ++++++++++++++++++-
 src/bin/initdb/t/001_initdb.pl                |  7 +-
 src/bin/pg_dump/t/002_pg_dump.pl              |  2 +-
 src/bin/scripts/t/020_createdb.pl             |  2 +-
 src/interfaces/ecpg/test/Makefile             |  3 -
 src/interfaces/ecpg/test/connect/test5.pgc    |  2 +-
 .../ecpg/test/expected/connect-test5.c        |  2 +-
 .../ecpg/test/expected/connect-test5.stderr   |  2 +-
 src/interfaces/ecpg/test/meson.build          |  1 -
 src/test/icu/t/010_database.pl                |  2 +-
 18 files changed, 164 insertions(+), 42 deletions(-)
 create mode 100644 contrib/unaccent/expected/unaccent_1.out

diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out
index 666b07ccec..77b4586d8f 100644
--- a/contrib/citext/expected/citext_utf8.out
+++ b/contrib/citext/expected/citext_utf8.out
@@ -1,9 +1,16 @@
 /*
  * This test must be run in a database with UTF-8 encoding
  * and a Unicode-aware locale.
+ *
+ * Also disable this file for ICU, because the test for the the
+ * Turkish dotted I is not correct for many ICU locales. citext always
+ * uses the default collation, so it's not easy to restrict the test
+ * to the "tr-TR-x-icu" collation where it will succeed.
  */
 SELECT getdatabaseencoding() <> 'UTF8' OR
-       current_setting('lc_ctype') = 'C'
+       current_setting('lc_ctype') = 'C' OR
+       (SELECT datlocprovider='i' FROM pg_database
+        WHERE datname=current_database())
        AS skip_test \gset
 \if :skip_test
 \quit
diff --git a/contrib/citext/expected/citext_utf8_1.out b/contrib/citext/expected/citext_utf8_1.out
index 433e985349..d1e1fe1a9d 100644
--- a/contrib/citext/expected/citext_utf8_1.out
+++ b/contrib/citext/expected/citext_utf8_1.out
@@ -1,9 +1,16 @@
 /*
  * This test must be run in a database with UTF-8 encoding
  * and a Unicode-aware locale.
+ *
+ * Also disable this file for ICU, because the test for the the
+ * Turkish dotted I is not correct for many ICU locales. citext always
+ * uses the default collation, so it's not easy to restrict the test
+ * to the "tr-TR-x-icu" collation where it will succeed.
  */
 SELECT getdatabaseencoding() <> 'UTF8' OR
-       current_setting('lc_ctype') = 'C'
+       current_setting('lc_ctype') = 'C' OR
+       (SELECT datlocprovider='i' FROM pg_database
+        WHERE datname=current_database())
        AS skip_test \gset
 \if :skip_test
 \quit
diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql
index d068000b42..8530c68dd7 100644
--- a/contrib/citext/sql/citext_utf8.sql
+++ b/contrib/citext/sql/citext_utf8.sql
@@ -1,10 +1,17 @@
 /*
  * This test must be run in a database with UTF-8 encoding
  * and a Unicode-aware locale.
+ *
+ * Also disable this file for ICU, because the test for the the
+ * Turkish dotted I is not correct for many ICU locales. citext always
+ * uses the default collation, so it's not easy to restrict the test
+ * to the "tr-TR-x-icu" collation where it will succeed.
  */
 
 SELECT getdatabaseencoding() <> 'UTF8' OR
-       current_setting('lc_ctype') = 'C'
+       current_setting('lc_ctype') = 'C' OR
+       (SELECT datlocprovider='i' FROM pg_database
+        WHERE datname=current_database())
        AS skip_test \gset
 \if :skip_test
 \quit
diff --git a/contrib/unaccent/expected/unaccent.out b/contrib/unaccent/expected/unaccent.out
index ee0ac71a1c..cef98ee60c 100644
--- a/contrib/unaccent/expected/unaccent.out
+++ b/contrib/unaccent/expected/unaccent.out
@@ -1,3 +1,12 @@
+-- unaccent is broken if the default collation is provided by ICU and
+-- LC_CTYPE=C
+SELECT current_setting('lc_ctype') = 'C' AND
+       (SELECT datlocprovider='i' FROM pg_database
+        WHERE datname=current_database())
+	AS skip_test \gset
+\if :skip_test
+\quit
+\endif
 CREATE EXTENSION unaccent;
 -- must have a UTF8 database
 SELECT getdatabaseencoding();
diff --git a/contrib/unaccent/expected/unaccent_1.out b/contrib/unaccent/expected/unaccent_1.out
new file mode 100644
index 0000000000..0a4a3838ab
--- /dev/null
+++ b/contrib/unaccent/expected/unaccent_1.out
@@ -0,0 +1,8 @@
+-- unaccent is broken if the default collation is provided by ICU and
+-- LC_CTYPE=C
+SELECT current_setting('lc_ctype') = 'C' AND
+       (SELECT datlocprovider='i' FROM pg_database
+        WHERE datname=current_database())
+	AS skip_test \gset
+\if :skip_test
+\quit
diff --git a/contrib/unaccent/sql/unaccent.sql b/contrib/unaccent/sql/unaccent.sql
index 3fc0c706be..027dfb964a 100644
--- a/contrib/unaccent/sql/unaccent.sql
+++ b/contrib/unaccent/sql/unaccent.sql
@@ -1,3 +1,14 @@
+
+-- unaccent is broken if the default collation is provided by ICU and
+-- LC_CTYPE=C
+SELECT current_setting('lc_ctype') = 'C' AND
+       (SELECT datlocprovider='i' FROM pg_database
+        WHERE datname=current_database())
+	AS skip_test \gset
+\if :skip_test
+\quit
+\endif
+
 CREATE EXTENSION unaccent;
 
 -- must have a UTF8 database
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 5b2bdac101..4f37386ea3 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -89,10 +89,28 @@ PostgreSQL documentation
    and character set encoding. These can also be set separately for each
    database when it is created. <command>initdb</command> determines those
    settings for the template databases, which will serve as the default for
-   all other databases.  By default, <command>initdb</command> uses the
-   locale provider <literal>libc</literal>, takes the locale settings from
-   the environment, and determines the encoding from the locale settings.
-   This is almost always sufficient, unless there are special requirements.
+   all other databases.
+  </para>
+
+  <para>
+   By default, <command>initdb</command> uses the ICU library to provide
+   locale services if the server was built with ICU support; otherwise it uses
+   the <literal>libc</literal> locale provider (see <xref
+   linkend="locale-providers"/>). To choose the specific ICU locale ID to
+   apply, use the option <option>--icu-locale</option>.  Note that for
+   implementation reasons and to support legacy code,
+   <command>initdb</command> will still select and initialize libc locale
+   settings when the ICU locale provider is used.
+  </para>
+
+  <para>
+   Alternatively, <command>initdb</command> can use the locale provider
+   <literal>libc</literal>. To select this option, specify
+   <literal>--locale-provider=libc</literal>, or build the server without ICU
+   support. The <literal>libc</literal> locale provider takes the locale
+   settings from the environment, and determines the encoding from the locale
+   settings.  This is almost always sufficient, unless there are special
+   requirements.
   </para>
 
   <para>
@@ -103,17 +121,6 @@ PostgreSQL documentation
    categories can give nonsensical results, so this should be used with care.
   </para>
 
-  <para>
-   Alternatively, the ICU library can be used to provide locale services.
-   (Again, this only sets the default for subsequently created databases.)  To
-   select this option, specify <literal>--locale-provider=icu</literal>.
-   To choose the specific ICU locale ID to apply, use the option
-   <option>--icu-locale</option>.  Note that
-   for implementation reasons and to support legacy code,
-   <command>initdb</command> will still select and initialize libc locale
-   settings when the ICU locale provider is used.
-  </para>
-
   <para>
    When <command>initdb</command> runs, it will print out the locale settings
    it has chosen.  If you have complex requirements or specified multiple
@@ -234,7 +241,8 @@ PostgreSQL documentation
       <term><option>--icu-locale=<replaceable>locale</replaceable></option></term>
       <listitem>
        <para>
-        Specifies the ICU locale ID, if the ICU locale provider is used.
+        Specifies the ICU locale ID, if the ICU locale provider is used. By
+        default, ICU obtains the ICU locale from the ICU default collator.
        </para>
       </listitem>
      </varlistentry>
@@ -297,10 +305,12 @@ PostgreSQL documentation
       <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
-        This option sets the locale provider for databases created in the
-        new cluster.  It can be overridden in the <command>CREATE
+        This option sets the locale provider for databases created in the new
+        cluster.  It can be overridden in the <command>CREATE
         DATABASE</command> command when new databases are subsequently
-        created.  The default is <literal>libc</literal>.
+        created.  The default is <literal>icu</literal> if the server was
+        built with ICU support; otherwise the default is
+        <literal>libc</literal> (see <xref linkend="locale-providers"/>).
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/initdb/Makefile b/src/bin/initdb/Makefile
index eab89c5501..d69bd89572 100644
--- a/src/bin/initdb/Makefile
+++ b/src/bin/initdb/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/initdb
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-override CPPFLAGS := -I$(libpq_srcdir) -I$(top_srcdir)/src/timezone $(CPPFLAGS)
+override CPPFLAGS := -I$(libpq_srcdir) -I$(top_srcdir)/src/timezone $(ICU_CFLAGS) $(CPPFLAGS)
 
 # Note: it's important that we link to encnames.o from libpgcommon, not
 # from libpq, else we have risks of version skew if we run with a libpq
@@ -24,7 +24,7 @@ override CPPFLAGS := -I$(libpq_srcdir) -I$(top_srcdir)/src/timezone $(CPPFLAGS)
 # should ensure that that happens.
 #
 # We need libpq only because fe_utils does.
-LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) $(ICU_LIBS)
 
 # use system timezone data?
 ifneq (,$(with_system_tzdata))
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7a58c33ace..0776294499 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -53,6 +53,9 @@
 #include <netdb.h>
 #include <sys/socket.h>
 #include <sys/stat.h>
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
 #include <unistd.h>
 #include <signal.h>
 #include <time.h>
@@ -133,7 +136,11 @@ static char *lc_monetary = NULL;
 static char *lc_numeric = NULL;
 static char *lc_time = NULL;
 static char *lc_messages = NULL;
+#ifdef USE_ICU
+static char locale_provider = COLLPROVIDER_ICU;
+#else
 static char locale_provider = COLLPROVIDER_LIBC;
+#endif
 static char *icu_locale = NULL;
 static const char *default_text_search_config = NULL;
 static char *username = NULL;
@@ -2024,6 +2031,72 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+/*
+ * Check that ICU accepts the locale name; or if not specified, retrieve the
+ * default ICU locale.
+ */
+static void
+check_icu_locale()
+{
+#ifdef USE_ICU
+	UCollator	*collator;
+	UErrorCode   status;
+
+	status = U_ZERO_ERROR;
+	collator = ucol_open(icu_locale, &status);
+	if (U_FAILURE(status))
+	{
+		if (icu_locale)
+			pg_fatal("could not open collator for locale \"%s\": %s",
+					 icu_locale, u_errorName(status));
+		else
+			pg_fatal("could not open collator for default locale: %s",
+					 u_errorName(status));
+	}
+
+	/* if not specified, get locale from default collator */
+	if (icu_locale == NULL)
+	{
+		const char	*default_locale;
+
+		status = U_ZERO_ERROR;
+		default_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE,
+											  &status);
+		if (U_FAILURE(status))
+		{
+			ucol_close(collator);
+			pg_fatal("could not determine default ICU locale");
+		}
+
+		if (U_ICU_VERSION_MAJOR_NUM >= 54)
+		{
+			const bool	 strict = true;
+			char		*langtag;
+			int			 len;
+
+			len = uloc_toLanguageTag(default_locale, NULL, 0, strict, &status);
+			langtag = pg_malloc(len + 1);
+			status = U_ZERO_ERROR;
+			uloc_toLanguageTag(default_locale, langtag, len + 1, strict,
+							   &status);
+
+			if (U_FAILURE(status))
+			{
+				ucol_close(collator);
+				pg_fatal("could not determine language tag for default locale \"%s\": %s",
+						 default_locale, u_errorName(status));
+			}
+
+			icu_locale = langtag;
+		}
+		else
+			icu_locale = pg_strdup(default_locale);
+	}
+
+	ucol_close(collator);
+#endif
+}
+
 /*
  * set up the locale variables
  *
@@ -2077,8 +2150,7 @@ setlocales(void)
 
 	if (locale_provider == COLLPROVIDER_ICU)
 	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
+		check_icu_locale();
 
 		/*
 		 * In supported builds, the ICU locale ID will be checked by the
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 772769acab..e5d214e09c 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -97,11 +97,6 @@ SKIP:
 
 if ($ENV{with_icu} eq 'yes')
 {
-	command_fails_like(
-		[ 'initdb', '--no-sync', '--locale-provider=icu', "$tempdir/data2" ],
-		qr/initdb: error: ICU locale must be specified/,
-		'locale provider ICU requires --icu-locale');
-
 	command_ok(
 		[
 			'initdb',                '--no-sync',
@@ -116,7 +111,7 @@ if ($ENV{with_icu} eq 'yes')
 			'--locale-provider=icu', '--icu-locale=@colNumeric=lower',
 			"$tempdir/dataX"
 		],
-		qr/FATAL:  could not open collator for locale/,
+		qr/error: could not open collator for locale/,
 		'fails for invalid ICU locale');
 
 	command_fails_like(
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 187e4b8d07..9c354213ce 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -1758,7 +1758,7 @@ my %tests = (
 		create_sql =>
 		  "CREATE DATABASE dump_test2 LOCALE = 'C' TEMPLATE = template0;",
 		regexp => qr/^
-			\QCREATE DATABASE dump_test2 \E.*\QLOCALE = 'C';\E
+			\QCREATE DATABASE dump_test2 \E.*\QLOCALE = 'C'\E
 			/xm,
 		like => { pg_dumpall_dbprivs => 1, },
 	},
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 3ad4fbb00c..8ec58cdd64 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -13,7 +13,7 @@ program_version_ok('createdb');
 program_options_handling_ok('createdb');
 
 my $node = PostgreSQL::Test::Cluster->new('main');
-$node->init;
+$node->init(extra => ['--locale-provider=libc']);
 $node->start;
 
 $node->issues_sql_like(
diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile
index d7a7d1d1ca..cf841a3a5b 100644
--- a/src/interfaces/ecpg/test/Makefile
+++ b/src/interfaces/ecpg/test/Makefile
@@ -14,9 +14,6 @@ override CPPFLAGS := \
 	'-DSHELLPROG="$(SHELL)"' \
 	$(CPPFLAGS)
 
-# default encoding for regression tests
-ENCODING = SQL_ASCII
-
 ifneq ($(build_os),mingw32)
 abs_builddir := $(shell pwd)
 else
diff --git a/src/interfaces/ecpg/test/connect/test5.pgc b/src/interfaces/ecpg/test/connect/test5.pgc
index de29160089..d512553677 100644
--- a/src/interfaces/ecpg/test/connect/test5.pgc
+++ b/src/interfaces/ecpg/test/connect/test5.pgc
@@ -55,7 +55,7 @@ exec sql end declare section;
 	exec sql connect to 'unix:postgresql://localhost/ecpg2_regression' as main user :user USING "connectpw";
 	exec sql disconnect main;
 
-	exec sql connect to unix:postgresql://localhost/ecpg2_regression?connect_timeout=180&client_encoding=latin1 as main user regress_ecpg_user1/connectpw;
+	exec sql connect to unix:postgresql://localhost/ecpg2_regression?connect_timeout=180&client_encoding=sql_ascii as main user regress_ecpg_user1/connectpw;
 	exec sql disconnect main;
 
 	exec sql connect to "unix:postgresql://200.46.204.71/ecpg2_regression" as main user regress_ecpg_user1/connectpw;
diff --git a/src/interfaces/ecpg/test/expected/connect-test5.c b/src/interfaces/ecpg/test/expected/connect-test5.c
index c1124c627f..ec1514ed9a 100644
--- a/src/interfaces/ecpg/test/expected/connect-test5.c
+++ b/src/interfaces/ecpg/test/expected/connect-test5.c
@@ -117,7 +117,7 @@ main(void)
 #line 56 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression?connect_timeout=180 & client_encoding=latin1" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression?connect_timeout=180 & client_encoding=sql_ascii" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
 #line 58 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
diff --git a/src/interfaces/ecpg/test/expected/connect-test5.stderr b/src/interfaces/ecpg/test/expected/connect-test5.stderr
index 01a6a0a13b..51cc18916a 100644
--- a/src/interfaces/ecpg/test/expected/connect-test5.stderr
+++ b/src/interfaces/ecpg/test/expected/connect-test5.stderr
@@ -50,7 +50,7 @@
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_finish: connection main closed
 [NO_PID]: sqlca: code: 0, state: 00000
-[NO_PID]: ECPGconnect: opening database ecpg2_regression on <DEFAULT> port <DEFAULT> with options connect_timeout=180 & client_encoding=latin1 for user regress_ecpg_user1
+[NO_PID]: ECPGconnect: opening database ecpg2_regression on <DEFAULT> port <DEFAULT> with options connect_timeout=180 & client_encoding=sql_ascii for user regress_ecpg_user1
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_finish: connection main closed
 [NO_PID]: sqlca: code: 0, state: 00000
diff --git a/src/interfaces/ecpg/test/meson.build b/src/interfaces/ecpg/test/meson.build
index d0be73ccf9..04c6819a79 100644
--- a/src/interfaces/ecpg/test/meson.build
+++ b/src/interfaces/ecpg/test/meson.build
@@ -69,7 +69,6 @@ ecpg_test_files = files(
 ecpg_regress_args = [
   '--dbname=ecpg1_regression,ecpg2_regression',
   '--create-role=regress_ecpg_user1,regress_ecpg_user2',
-  '--encoding=SQL_ASCII',
 ]
 
 tests += {
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 80ab1c7789..45d77c319a 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -12,7 +12,7 @@ if ($ENV{with_icu} ne 'yes')
 }
 
 my $node1 = PostgreSQL::Test::Cluster->new('node1');
-$node1->init;
+$node1->init(extra => ['--locale-provider=libc']);
 $node1->start;
 
 $node1->safe_psql('postgres',
-- 
2.34.1

From 93dc706c706bc9799ec623c7b6bc32809e34bead Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 24 Feb 2023 08:55:11 -0800
Subject: [PATCH v5 2/3] pg_upgrade: copy locale and encoding information to
 new cluster.

Previously, pg_upgrade checked that the old and new clusters were
compatible, including the locale and encoding. But the new cluster was
just created, and only template0 from the new cluster will be
preserved (template1 and postgres are both recreated during the
upgrade process).

Because template0 is not sensitive to locale or encoding, just update
the pg_database entry to be the same as template0 from the old
cluster.

This commit makes it easier to change the default initdb locale or
encoding settings without causing needless incompatibilities.
---
 src/bin/pg_upgrade/Makefile            |   2 +
 src/bin/pg_upgrade/check.c             | 160 -------------------------
 src/bin/pg_upgrade/info.c              |  69 ++++++++---
 src/bin/pg_upgrade/meson.build         |   1 +
 src/bin/pg_upgrade/pg_upgrade.c        |  62 ++++++++++
 src/bin/pg_upgrade/pg_upgrade.h        |  12 +-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl |  56 ++++++++-
 7 files changed, 180 insertions(+), 182 deletions(-)

diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 7f8042f34a..5834513add 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -51,6 +51,8 @@ clean distclean maintainer-clean:
 	rm -rf delete_old_cluster.sh log/ tmp_check/ \
 	       reindex_hash.sql
 
+export with_icu
+
 check:
 	$(prove_check)
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 7cf68dc9af..b71b00be37 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -16,9 +16,6 @@
 #include "pg_upgrade.h"
 
 static void check_new_cluster_is_empty(void);
-static void check_databases_are_compatible(void);
-static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb);
-static bool equivalent_locale(int category, const char *loca, const char *locb);
 static void check_is_install_user(ClusterInfo *cluster);
 static void check_proper_datallowconn(ClusterInfo *cluster);
 static void check_for_prepared_transactions(ClusterInfo *cluster);
@@ -33,7 +30,6 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
-static char *get_canonical_locale_name(int category, const char *locale);
 
 
 /*
@@ -194,7 +190,6 @@ check_new_cluster(void)
 	get_db_and_rel_infos(&new_cluster);
 
 	check_new_cluster_is_empty();
-	check_databases_are_compatible();
 
 	check_loadable_libraries();
 
@@ -349,94 +344,6 @@ check_cluster_compatibility(bool live_check)
 }
 
 
-/*
- * check_locale_and_encoding()
- *
- * Check that locale and encoding of a database in the old and new clusters
- * are compatible.
- */
-static void
-check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb)
-{
-	if (olddb->db_encoding != newdb->db_encoding)
-		pg_fatal("encodings for database \"%s\" do not match:  old \"%s\", new \"%s\"",
-				 olddb->db_name,
-				 pg_encoding_to_char(olddb->db_encoding),
-				 pg_encoding_to_char(newdb->db_encoding));
-	if (!equivalent_locale(LC_COLLATE, olddb->db_collate, newdb->db_collate))
-		pg_fatal("lc_collate values for database \"%s\" do not match:  old \"%s\", new \"%s\"",
-				 olddb->db_name, olddb->db_collate, newdb->db_collate);
-	if (!equivalent_locale(LC_CTYPE, olddb->db_ctype, newdb->db_ctype))
-		pg_fatal("lc_ctype values for database \"%s\" do not match:  old \"%s\", new \"%s\"",
-				 olddb->db_name, olddb->db_ctype, newdb->db_ctype);
-	if (olddb->db_collprovider != newdb->db_collprovider)
-		pg_fatal("locale providers for database \"%s\" do not match:  old \"%s\", new \"%s\"",
-				 olddb->db_name,
-				 collprovider_name(olddb->db_collprovider),
-				 collprovider_name(newdb->db_collprovider));
-	if ((olddb->db_iculocale == NULL && newdb->db_iculocale != NULL) ||
-		(olddb->db_iculocale != NULL && newdb->db_iculocale == NULL) ||
-		(olddb->db_iculocale != NULL && newdb->db_iculocale != NULL && strcmp(olddb->db_iculocale, newdb->db_iculocale) != 0))
-		pg_fatal("ICU locale values for database \"%s\" do not match:  old \"%s\", new \"%s\"",
-				 olddb->db_name,
-				 olddb->db_iculocale ? olddb->db_iculocale : "(null)",
-				 newdb->db_iculocale ? newdb->db_iculocale : "(null)");
-}
-
-/*
- * equivalent_locale()
- *
- * Best effort locale-name comparison.  Return false if we are not 100% sure
- * the locales are equivalent.
- *
- * Note: The encoding parts of the names are ignored. This function is
- * currently used to compare locale names stored in pg_database, and
- * pg_database contains a separate encoding field. That's compared directly
- * in check_locale_and_encoding().
- */
-static bool
-equivalent_locale(int category, const char *loca, const char *locb)
-{
-	const char *chara;
-	const char *charb;
-	char	   *canona;
-	char	   *canonb;
-	int			lena;
-	int			lenb;
-
-	/*
-	 * If the names are equal, the locales are equivalent. Checking this first
-	 * avoids calling setlocale() in the common case that the names are equal.
-	 * That's a good thing, if setlocale() is buggy, for example.
-	 */
-	if (pg_strcasecmp(loca, locb) == 0)
-		return true;
-
-	/*
-	 * Not identical. Canonicalize both names, remove the encoding parts, and
-	 * try again.
-	 */
-	canona = get_canonical_locale_name(category, loca);
-	chara = strrchr(canona, '.');
-	lena = chara ? (chara - canona) : strlen(canona);
-
-	canonb = get_canonical_locale_name(category, locb);
-	charb = strrchr(canonb, '.');
-	lenb = charb ? (charb - canonb) : strlen(canonb);
-
-	if (lena == lenb && pg_strncasecmp(canona, canonb, lena) == 0)
-	{
-		pg_free(canona);
-		pg_free(canonb);
-		return true;
-	}
-
-	pg_free(canona);
-	pg_free(canonb);
-	return false;
-}
-
-
 static void
 check_new_cluster_is_empty(void)
 {
@@ -460,35 +367,6 @@ check_new_cluster_is_empty(void)
 	}
 }
 
-/*
- * Check that every database that already exists in the new cluster is
- * compatible with the corresponding database in the old one.
- */
-static void
-check_databases_are_compatible(void)
-{
-	int			newdbnum;
-	int			olddbnum;
-	DbInfo	   *newdbinfo;
-	DbInfo	   *olddbinfo;
-
-	for (newdbnum = 0; newdbnum < new_cluster.dbarr.ndbs; newdbnum++)
-	{
-		newdbinfo = &new_cluster.dbarr.dbs[newdbnum];
-
-		/* Find the corresponding database in the old cluster */
-		for (olddbnum = 0; olddbnum < old_cluster.dbarr.ndbs; olddbnum++)
-		{
-			olddbinfo = &old_cluster.dbarr.dbs[olddbnum];
-			if (strcmp(newdbinfo->db_name, olddbinfo->db_name) == 0)
-			{
-				check_locale_and_encoding(olddbinfo, newdbinfo);
-				break;
-			}
-		}
-	}
-}
-
 /*
  * A previous run of pg_upgrade might have failed and the new cluster
  * directory recreated, but they might have forgotten to remove
@@ -1524,41 +1402,3 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 	else
 		check_ok();
 }
-
-
-/*
- * get_canonical_locale_name
- *
- * Send the locale name to the system, and hope we get back a canonical
- * version.  This should match the backend's check_locale() function.
- */
-static char *
-get_canonical_locale_name(int category, const char *locale)
-{
-	char	   *save;
-	char	   *res;
-
-	/* get the current setting, so we can restore it. */
-	save = setlocale(category, NULL);
-	if (!save)
-		pg_fatal("failed to get the current locale");
-
-	/* 'save' may be pointing at a modifiable scratch variable, so copy it. */
-	save = pg_strdup(save);
-
-	/* set the locale with setlocale, to see if it accepts it. */
-	res = setlocale(category, locale);
-
-	if (!res)
-		pg_fatal("failed to get system locale name for \"%s\"", locale);
-
-	res = pg_strdup(res);
-
-	/* restore old value. */
-	if (!setlocale(category, save))
-		pg_fatal("failed to restore old locale \"%s\"", save);
-
-	pg_free(save);
-
-	return res;
-}
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index c1399c09b9..33b10aac3c 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -20,6 +20,7 @@ static void create_rel_filename_map(const char *old_data, const char *new_data,
 static void report_unmatched_relation(const RelInfo *rel, const DbInfo *db,
 									  bool is_new_db);
 static void free_db_and_rel_infos(DbInfoArr *db_arr);
+static void get_template0_info(ClusterInfo *cluster);
 static void get_db_infos(ClusterInfo *cluster);
 static void get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo);
 static void free_rel_infos(RelInfoArr *rel_arr);
@@ -278,6 +279,7 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 	if (cluster->dbarr.dbs != NULL)
 		free_db_and_rel_infos(&cluster->dbarr);
 
+	get_template0_info(cluster);
 	get_db_infos(cluster);
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
@@ -293,6 +295,55 @@ get_db_and_rel_infos(ClusterInfo *cluster)
 }
 
 
+/*
+ * Get information about template0, which will be copied from the old cluster
+ * to the new cluster.
+ */
+static void
+get_template0_info(ClusterInfo *cluster)
+{
+	PGconn			*conn = connectToServer(cluster, "template1");
+	DbLocaleInfo	*locale;
+	PGresult		*dbres;
+	int				 i_datencoding;
+	int				 i_datlocprovider;
+	int				 i_datcollate;
+	int				 i_datctype;
+	int				 i_daticulocale;
+
+	dbres = executeQueryOrDie(conn,
+							  "SELECT encoding, datlocprovider, "
+							  "       datcollate, datctype, daticulocale "
+							  "FROM	pg_catalog.pg_database "
+							  "WHERE datname='template0'");
+
+	if (PQntuples(dbres) != 1)
+		pg_fatal("template0 not found");
+
+	locale = pg_malloc(sizeof(DbLocaleInfo));
+
+	i_datencoding = PQfnumber(dbres, "encoding");
+	i_datlocprovider = PQfnumber(dbres, "datlocprovider");
+	i_datcollate = PQfnumber(dbres, "datcollate");
+	i_datctype = PQfnumber(dbres, "datctype");
+	i_daticulocale = PQfnumber(dbres, "daticulocale");
+
+	locale->db_encoding = atoi(PQgetvalue(dbres, 0, i_datencoding));
+	locale->db_collprovider = PQgetvalue(dbres, 0, i_datlocprovider)[0];
+	locale->db_collate = pg_strdup(PQgetvalue(dbres, 0, i_datcollate));
+	locale->db_ctype = pg_strdup(PQgetvalue(dbres, 0, i_datctype));
+	if (PQgetisnull(dbres, 0, i_daticulocale))
+		locale->db_iculocale = NULL;
+	else
+		locale->db_iculocale = pg_strdup(PQgetvalue(dbres, 0, i_daticulocale));
+
+	cluster->template0 = locale;
+
+	PQclear(dbres);
+	PQfinish(conn);
+}
+
+
 /*
  * get_db_infos()
  *
@@ -309,11 +360,6 @@ get_db_infos(ClusterInfo *cluster)
 	DbInfo	   *dbinfos;
 	int			i_datname,
 				i_oid,
-				i_encoding,
-				i_datcollate,
-				i_datctype,
-				i_datlocprovider,
-				i_daticulocale,
 				i_spclocation;
 	char		query[QUERY_ALLOC];
 
@@ -337,11 +383,6 @@ get_db_infos(ClusterInfo *cluster)
 
 	i_oid = PQfnumber(res, "oid");
 	i_datname = PQfnumber(res, "datname");
-	i_encoding = PQfnumber(res, "encoding");
-	i_datcollate = PQfnumber(res, "datcollate");
-	i_datctype = PQfnumber(res, "datctype");
-	i_datlocprovider = PQfnumber(res, "datlocprovider");
-	i_daticulocale = PQfnumber(res, "daticulocale");
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	ntups = PQntuples(res);
@@ -351,14 +392,6 @@ get_db_infos(ClusterInfo *cluster)
 	{
 		dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
 		dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
-		dbinfos[tupnum].db_encoding = atoi(PQgetvalue(res, tupnum, i_encoding));
-		dbinfos[tupnum].db_collate = pg_strdup(PQgetvalue(res, tupnum, i_datcollate));
-		dbinfos[tupnum].db_ctype = pg_strdup(PQgetvalue(res, tupnum, i_datctype));
-		dbinfos[tupnum].db_collprovider = PQgetvalue(res, tupnum, i_datlocprovider)[0];
-		if (PQgetisnull(res, tupnum, i_daticulocale))
-			dbinfos[tupnum].db_iculocale = NULL;
-		else
-			dbinfos[tupnum].db_iculocale = pg_strdup(PQgetvalue(res, tupnum, i_daticulocale));
 		snprintf(dbinfos[tupnum].db_tablespace, sizeof(dbinfos[tupnum].db_tablespace), "%s",
 				 PQgetvalue(res, tupnum, i_spclocation));
 	}
diff --git a/src/bin/pg_upgrade/meson.build b/src/bin/pg_upgrade/meson.build
index 18c27b4e72..12a97f84e2 100644
--- a/src/bin/pg_upgrade/meson.build
+++ b/src/bin/pg_upgrade/meson.build
@@ -38,6 +38,7 @@ tests += {
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
+    'env': {'with_icu': icu.found() ? 'yes' : 'no'},
     'tests': [
       't/001_basic.pl',
       't/002_pg_upgrade.pl',
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e5597d3105..8c6009151f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -51,6 +51,7 @@
 #include "fe_utils/string_utils.h"
 #include "pg_upgrade.h"
 
+static void set_locale_and_encoding(void);
 static void prepare_new_cluster(void);
 static void prepare_new_globals(void);
 static void create_new_objects(void);
@@ -139,6 +140,8 @@ main(int argc, char **argv)
 		   "Performing Upgrade\n"
 		   "------------------");
 
+	set_locale_and_encoding();
+
 	prepare_new_cluster();
 
 	stop_postmaster(false);
@@ -366,6 +369,65 @@ setup(char *argv0, bool *live_check)
 }
 
 
+/*
+ * Copy locale and encoding information into the new cluster's template0.
+ *
+ * We need to copy the encoding, datlocprovider, datcollate, datctype, and
+ * daticulocale. We don't need datcollversion because that's never set for
+ * template0.
+ */
+static void
+set_locale_and_encoding(void)
+{
+	PGconn		*conn_new_template1;
+	char		*datcollate_literal;
+	char		*datctype_literal;
+	char		*daticulocale_literal	= NULL;
+	DbLocaleInfo *locale = old_cluster.template0;
+
+	prep_status("Setting locale and encoding for new cluster");
+
+	/* escape literals with respect to new cluster */
+	conn_new_template1 = connectToServer(&new_cluster, "template1");
+
+	datcollate_literal = PQescapeLiteral(conn_new_template1,
+										 locale->db_collate,
+										 strlen(locale->db_collate));
+	datctype_literal = PQescapeLiteral(conn_new_template1,
+									   locale->db_ctype,
+									   strlen(locale->db_ctype));
+	if (locale->db_iculocale)
+		daticulocale_literal = PQescapeLiteral(conn_new_template1,
+											   locale->db_iculocale,
+											   strlen(locale->db_iculocale));
+	else
+		daticulocale_literal = pg_strdup("NULL");
+
+	/* update template0 in new cluster */
+	PQclear(executeQueryOrDie(conn_new_template1,
+							  "UPDATE pg_catalog.pg_database "
+							  "  SET encoding = %u, "
+							  "      datlocprovider = '%c', "
+							  "      datcollate = %s, "
+							  "      datctype = %s, "
+							  "      daticulocale = %s "
+							  "  WHERE datname = 'template0' ",
+							  locale->db_encoding,
+							  locale->db_collprovider,
+							  datcollate_literal,
+							  datctype_literal,
+							  daticulocale_literal));
+
+	PQfreemem(datcollate_literal);
+	PQfreemem(datctype_literal);
+	PQfreemem(daticulocale_literal);
+
+	PQfinish(conn_new_template1);
+
+	check_ok();
+}
+
+
 static void
 prepare_new_cluster(void)
 {
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 5f2a116f23..3eea0139c7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -175,13 +175,20 @@ typedef struct
 	char	   *db_name;		/* database name */
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 											 * path */
+	RelInfoArr	rel_arr;		/* array of all user relinfos */
+} DbInfo;
+
+/*
+ * Locale information about a database.
+ */
+typedef struct
+{
 	char	   *db_collate;
 	char	   *db_ctype;
 	char		db_collprovider;
 	char	   *db_iculocale;
 	int			db_encoding;
-	RelInfoArr	rel_arr;		/* array of all user relinfos */
-} DbInfo;
+} DbLocaleInfo;
 
 typedef struct
 {
@@ -252,6 +259,7 @@ typedef enum
 typedef struct
 {
 	ControlData controldata;	/* pg_control information */
+	DbLocaleInfo *template0;	/* template0 locale info */
 	DbInfoArr	dbarr;			/* dbinfos array */
 	char	   *pgdata;			/* pathname for cluster's $PGDATA directory */
 	char	   *pgconfig;		/* pathname for cluster's config file
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 62a8fa9d8b..e6990aafc4 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -90,15 +90,44 @@ my $oldnode =
   PostgreSQL::Test::Cluster->new('old_node',
 	install_path => $ENV{oldinstall});
 
+my %node_params = ();
+
 # To increase coverage of non-standard segment size and group access without
 # increasing test runtime, run these tests with a custom setting.
 # --allow-group-access and --wal-segsize have been added in v11.
-my %node_params = ();
-$node_params{extra} = [ '--wal-segsize', '1', '--allow-group-access' ]
+my $nonstandard = [];
+$nonstandard = ['--wal-segsize', '1', '--allow-group-access']
   if $oldnode->pg_version >= 11;
+$node_params{extra} = $nonstandard;
+
+# Test that pg_upgrade copies the locale settings of template0 from
+# the old to the new cluster.
+push(@{$node_params{extra}}, ('--encoding', 'UTF-8'));
+push(@{$node_params{extra}}, ('--locale-provider', 'icu'))
+  if $oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes';
+
 $oldnode->init(%node_params);
 $oldnode->start;
 
+my $result;
+
+my $original_encoding = "6"; # UTF-8
+$result = $oldnode->safe_psql(
+	'postgres', q{SELECT encoding FROM pg_database WHERE datname='template0'});
+is($result, $original_encoding,
+		"expected template0 in old cluster to have encoding '$original_encoding'"
+	);
+
+my $original_provider = "c";
+$original_provider = "i"
+  if $oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes';
+
+$result = $oldnode->safe_psql(
+	'postgres', q{SELECT datlocprovider FROM pg_database WHERE datname='template0'});
+is($result, $original_provider,
+		"expected template0 in old cluster to have locale provider '$original_provider'"
+	);
+
 # The default location of the source code is the root of this directory.
 my $srcdir = abs_path("../../..");
 
@@ -168,6 +197,16 @@ else
 
 # Initialize a new node for the upgrade.
 my $newnode = PostgreSQL::Test::Cluster->new('new_node');
+
+# Reset to original parameters.
+$node_params{extra} = $nonstandard;
+
+# The new cluster will be initialized with locale provider libc and
+# encoding SQL_ASCII, but these settings will be overwritten with
+# those of the old cluster.
+push(@{$node_params{extra}}, ('--encoding', 'SQL_ASCII'));
+push(@{$node_params{extra}}, ('--locale-provider', 'libc'));
+
 $newnode->init(%node_params);
 
 my $newbindir = $newnode->config_data('--bindir');
@@ -338,6 +377,19 @@ if (-d $log_path)
 	}
 }
 
+# Test that upgraded cluster has original locale settings.
+$result = $newnode->safe_psql(
+	'postgres', q{SELECT encoding FROM pg_database WHERE datname='template0'});
+is($result, $original_encoding,
+		"expected template0 in upgraded cluster to have encoding '$original_encoding'"
+	);
+
+$result = $newnode->safe_psql(
+	'postgres', q{SELECT datlocprovider FROM pg_database WHERE datname='template0'});
+is($result, $original_provider,
+		"expected template0 in upgraded cluster to have locale provider '$original_provider'"
+	);
+
 # Second dump from the upgraded instance.
 @dump_command = (
 	'pg_dumpall', '--no-sync', '-d', $newnode->connstr('postgres'),
-- 
2.34.1

From 949de097cdd5188a49f10b7f87fa1df4cecf2f63 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 10 Feb 2023 12:08:11 -0800
Subject: [PATCH v5 1/3] Build ICU support by default.

Discussion: https://postgr.es/m/510d284759f6e943ce15096167760b2edcb2e700.ca...@j-davis.com
---
 .cirrus.yml                    |  1 +
 configure                      | 36 ++++++----------
 configure.ac                   |  8 +++-
 doc/src/sgml/installation.sgml | 76 +++++++++++++++++++---------------
 4 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f212978752..34450a9c7b 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -751,6 +751,7 @@ task:
       time ./configure \
         --host=x86_64-w64-mingw32 \
         --enable-cassert \
+        --without-icu \
         CC="ccache x86_64-w64-mingw32-gcc" \
         CXX="ccache x86_64-w64-mingw32-g++"
       make -s -j${BUILD_JOBS} clean
diff --git a/configure b/configure
index e35769ea73..436c2446e8 100755
--- a/configure
+++ b/configure
@@ -1558,7 +1558,7 @@ Optional Packages:
                           set WAL block size in kB [8]
   --with-CC=CMD           set compiler (deprecated)
   --with-llvm             build with LLVM based JIT support
-  --with-icu              build with ICU support
+  --without-icu           build without ICU support
   --with-tcl              build Tcl modules (PL/Tcl)
   --with-tclconfig=DIR    tclConfig.sh is in DIR
   --with-perl             build Perl modules (PL/Perl)
@@ -8401,7 +8401,9 @@ $as_echo "#define USE_ICU 1" >>confdefs.h
   esac
 
 else
-  with_icu=no
+  with_icu=yes
+
+$as_echo "#define USE_ICU 1" >>confdefs.h
 
 fi
 
@@ -8470,31 +8472,17 @@ fi
 	# Put the nasty error message in config.log where it belongs
 	echo "$ICU_PKG_ERRORS" >&5
 
-	as_fn_error $? "Package requirements (icu-uc icu-i18n) were not met:
-
-$ICU_PKG_ERRORS
-
-Consider adjusting the PKG_CONFIG_PATH environment variable if you
-installed software in a non-standard prefix.
-
-Alternatively, you may set the environment variables ICU_CFLAGS
-and ICU_LIBS to avoid the need to call pkg-config.
-See the pkg-config man page for more details." "$LINENO" 5
+	as_fn_error $? "ICU library not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-icu to disable ICU support." "$LINENO" 5
 elif test $pkg_failed = untried; then
         { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
-	{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
-$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
-as_fn_error $? "The pkg-config script could not be found or is too old.  Make sure it
-is in your PATH or set the PKG_CONFIG environment variable to the full
-path to pkg-config.
-
-Alternatively, you may set the environment variables ICU_CFLAGS
-and ICU_LIBS to avoid the need to call pkg-config.
-See the pkg-config man page for more details.
-
-To get pkg-config, see <http://pkg-config.freedesktop.org/>.
-See \`config.log' for more details" "$LINENO" 5; }
+	as_fn_error $? "ICU library not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-icu to disable ICU support." "$LINENO" 5
 else
 	ICU_CFLAGS=$pkg_cv_ICU_CFLAGS
 	ICU_LIBS=$pkg_cv_ICU_LIBS
diff --git a/configure.ac b/configure.ac
index af23c15cb2..215e32120f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -853,13 +853,17 @@ AC_SUBST(enable_thread_safety)
 # ICU
 #
 AC_MSG_CHECKING([whether to build with ICU support])
-PGAC_ARG_BOOL(with, icu, no, [build with ICU support],
+PGAC_ARG_BOOL(with, icu, yes, [build without ICU support],
               [AC_DEFINE([USE_ICU], 1, [Define to build with ICU support. (--with-icu)])])
 AC_MSG_RESULT([$with_icu])
 AC_SUBST(with_icu)
 
 if test "$with_icu" = yes; then
-  PKG_CHECK_MODULES(ICU, icu-uc icu-i18n)
+  PKG_CHECK_MODULES(ICU, icu-uc icu-i18n, [],
+    [AC_MSG_ERROR([ICU library not found
+If you have ICU already installed, see config.log for details on the
+failure.  It is possible the compiler isn't looking in the proper directory.
+Use --without-icu to disable ICU support.])])
 fi
 
 #
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 5affb64d95..d4d0cbae43 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -146,6 +146,35 @@ documentation.  See standalone-profile.xsl for details.
       <application>pg_restore</application>.
      </para>
     </listitem>
+
+    <listitem>
+     <para>
+      The ICU locale provider (see <xref linkend="locale-providers"/>) is used by default. If you don't want to use it then you must specify the <option>--without-icu</option> option to <filename>configure</filename>. Using this option disables support for ICU collation features (see <xref linkend="collation"/>).
+     </para>
+     <para>
+      ICU support requires the <productname>ICU4C</productname> package to be
+      installed.  The minimum required version of
+      <productname>ICU4C</productname> is currently 4.2.
+     </para>
+
+     <para>
+      By default,
+      <productname>pkg-config</productname><indexterm><primary>pkg-config</primary></indexterm>
+      will be used to find the required compilation options.  This is
+      supported for <productname>ICU4C</productname> version 4.6 and later.
+      For older versions, or if <productname>pkg-config</productname> is not
+      available, the variables <envar>ICU_CFLAGS</envar> and
+      <envar>ICU_LIBS</envar> can be specified to
+      <filename>configure</filename>, like in this example:
+<programlisting>
+./configure ... ICU_CFLAGS='-I/some/where/include' ICU_LIBS='-L/some/where/lib -licui18n -licuuc -licudata'
+</programlisting>
+      (If <productname>ICU4C</productname> is in the default search path
+      for the compiler, then you still need to specify nonempty strings in
+      order to avoid use of <productname>pkg-config</productname>, for
+      example, <literal>ICU_CFLAGS=' '</literal>.)
+     </para>
+    </listitem>
    </itemizedlist>
   </para>
 
@@ -926,40 +955,6 @@ build-postgresql:
        </listitem>
       </varlistentry>
 
-      <varlistentry id="configure-option-with-icu">
-       <term><option>--with-icu</option></term>
-       <listitem>
-        <para>
-         Build with support for
-         the <productname>ICU</productname><indexterm><primary>ICU</primary></indexterm>
-         library, enabling use of ICU collation
-         features<phrase condition="standalone-ignore"> (see
-         <xref linkend="collation"/>)</phrase>.
-         This requires the <productname>ICU4C</productname> package
-         to be installed.  The minimum required version
-         of <productname>ICU4C</productname> is currently 4.2.
-        </para>
-
-        <para>
-         By default,
-         <productname>pkg-config</productname><indexterm><primary>pkg-config</primary></indexterm>
-         will be used to find the required compilation options.  This is
-         supported for <productname>ICU4C</productname> version 4.6 and later.
-         For older versions, or if <productname>pkg-config</productname> is
-         not available, the variables <envar>ICU_CFLAGS</envar>
-         and <envar>ICU_LIBS</envar> can be specified
-         to <filename>configure</filename>, like in this example:
-<programlisting>
-./configure ... --with-icu ICU_CFLAGS='-I/some/where/include' ICU_LIBS='-L/some/where/lib -licui18n -licuuc -licudata'
-</programlisting>
-         (If <productname>ICU4C</productname> is in the default search path
-         for the compiler, then you still need to specify nonempty strings in
-         order to avoid use of <productname>pkg-config</productname>, for
-         example, <literal>ICU_CFLAGS=' '</literal>.)
-        </para>
-       </listitem>
-      </varlistentry>
-
       <varlistentry id="configure-with-llvm">
        <term><option>--with-llvm</option></term>
        <listitem>
@@ -1231,6 +1226,19 @@ build-postgresql:
 
      <variablelist>
 
+      <varlistentry id="configure-option-without-icu">
+       <term><option>--without-icu</option></term>
+       <listitem>
+        <para>
+         Build without support for the
+         <productname>ICU</productname><indexterm><primary>ICU</primary></indexterm>
+         library, disabling the use of ICU collation features<phrase
+         condition="standalone-ignore"> (see <xref
+         linkend="collation"/>)</phrase>.
+        </para>
+       </listitem>
+      </varlistentry>
+
       <varlistentry id="configure-option-without-readline">
        <term><option>--without-readline</option></term>
        <listitem>
-- 
2.34.1

Re: Move defaults toward ICU in 16?

Reply via email to