Re: collation-related loose ends before beta2
On 6/20/23 5:02 AM, Jeff Davis wrote:
> Status on collation loose ends:
>
> 1. There's an open item "Switch to ICU for 17". It's a little bit confusing exactly what that means, and the CF entry refers to two items, one of which is the build-time default to --with-icu. As far as I know, building with ICU by default is a settled issue with no objections. The second issue is the initdb default, which is covered by the other open item. So I will just close that open item unless someone thinks I'm missing something.

[RMT Hat] No objections. The RMT had interpreted this as "Punt on making ICU the building default to v17" but it seems the consensus is to continue to leave it in as the default for v16.

> 2. Open item about the unfriendly rules for choosing an ICU locale at initdb time. Tom, Robert, and Daniel Verite have expressed concerns (and at least one objection) to initdb defaulting to icu for --locale-provider. Some of the problems have been addressed, but the issue about C and C.UTF-8 locales is not settled. Even if it were settled I'm not sure we'd have a clear consensus on all the details. I don't think this should proceed to beta2 in this state, so I intend to revert back to libc as the default for initdb.
>
> [ I believe we do have a general consensus that ICU is better, but we can signal it other ways: through documentation, packaging, etc. ]

[Personal hat] (Building...) I do think this raises a good point: it's really the packaging that will guide what users are using for v16. I don't know if we want to discuss/poll the packagers to see what they are thinking about this?

> 3. The ICU conversion from "C" to "en-US-u-va-posix": cut out this code (it was a small part of a larger change). Its only purpose was consistency between ICU versions, and nobody liked it. It's only here right now to avoid test failures due to an order-of-commits issue; but if the initdb default goes back to libc it won't matter and I can remove it.
>
> 4. icu_validation_level WARNING or ERROR: right now an invalid ICU locale raises a WARNING, but Peter Eisentraut would prefer an ERROR. I'm still inclined to leave it as a WARNING for one release and increase it to ERROR later. But if the default collation provider goes back to libc, the risk of ICU validation errors goes way down, so I don't object if Peter would like to change it back to an ERROR.

[Personal hat] I'd be inclined toward "WARNING" until we get a sense of which collation provider the packagers who run initdb as part of the installation process decide to use.

Thanks,

Jonathan
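The trade-off in item 4 can be illustrated with a small C sketch. It assumes a simplified model in which the validation level is either WARNING or ERROR. The enum, the function handle_invalid_icu_locale, its return convention, and the message wording are all hypothetical and are not taken from the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the two levels under discussion. */
typedef enum
{
	VALIDATION_WARNING,
	VALIDATION_ERROR
} ValidationLevel;

/*
 * Hypothetical helper: report an invalid ICU locale at the given level.
 * Returns true if the caller may continue (WARNING) and false if the
 * operation must be abandoned (ERROR).  The message text is illustrative.
 */
bool
handle_invalid_icu_locale(ValidationLevel level, const char *locale)
{
	assert(locale != NULL);

	if (level == VALIDATION_ERROR)
	{
		fprintf(stderr, "ERROR: invalid ICU locale \"%s\"\n", locale);
		return false;
	}

	fprintf(stderr, "WARNING: invalid ICU locale \"%s\"\n", locale);
	return true;
}
```

Under this model, a packager's scripted initdb run with a mistyped locale would still complete at the WARNING level but would stop at the ERROR level, which is why the choice interacts with what packagers decide to use.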
Re: collation-related loose ends before beta2
On Tue, 2023-06-20 at 12:16 -0400, Tom Lane wrote:
> Jeff Davis writes:
> > Status on collation loose ends:
>
> This all sounds good to me.

Patches attached.

0001 also removes the code to get a default locale when ICU is being used, because that was a part of the same commit that changed the default provider to be ICU and I don't see a lot of value in keeping just that part.

I'm planning to commit something similar to the attached patches tomorrow (Wednesday) unless I get more input.

Regards,
	Jeff Davis

From 1aaac6e154d9a1e4f728d732fd41ee263db6903e Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Tue, 20 Jun 2023 12:03:26 -0700
Subject: [PATCH v1 1/2] initdb: change default --locale-provider back to libc.

Reverts 27b62377b4.

Discussion: https://postgr.es/m/eff031036baa07f325de29215371a4c9e69d61f3.ca...@j-davis.com
Discussion: https://postgr.es/m/3353947.1682092...@sss.pgh.pa.us
---
 doc/src/sgml/ref/initdb.sgml | 42 +++
 src/bin/initdb/initdb.c | 23 +-
 src/bin/initdb/t/001_initdb.pl | 5 +++
 src/bin/pg_dump/t/002_pg_dump.pl | 2 +-
 src/bin/scripts/t/020_createdb.pl | 2 +-
 src/test/icu/t/010_database.pl | 2 +-
 .../regress/expected/collate.icu.utf8.out | 4 +-
 src/test/regress/sql/collate.icu.utf8.sql | 4 +-
 8 files changed, 29 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index f850dc404d..22f1011781 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -93,24 +93,10 @@ PostgreSQL documentation
-   By default, initdb uses the ICU library to provide
-   locale services if the server was built with ICU support; otherwise it uses
-   the libc locale provider (see ). To choose the specific ICU locale ID to
-   apply, use the option --icu-locale. Note that for
-   implementation reasons and to support legacy code,
-   initdb will still select and initialize libc locale
-   settings when the ICU locale provider is used.
-
-
-
-   Alternatively, initdb can use the locale provider
-   libc. To select this option, specify
-   --locale-provider=libc, or build the server without ICU
-   support. The libc locale provider takes the locale
-   settings from the environment, and determines the encoding from the locale
-   settings. This is almost always sufficient, unless there are special
-   requirements.
+   By default, initdb uses the locale provider
+   libc (see ). The
+   libc locale provider takes the locale settings from the
+   environment, and determines the encoding from the locale settings.
@@ -122,6 +108,16 @@ PostgreSQL documentation
    this should be used with care.
+
+   Alternatively, initdb can use the ICU library to provide
+   locale services by specifying --locale-provider=icu. The
+   server must be built with ICU support. To choose the specific ICU locale ID
+   to apply, use the option --icu-locale. Note that for
+   implementation reasons and to support legacy code,
+   initdb will still select and initialize libc locale
+   settings when the ICU locale provider is used.
+
+
    When initdb runs, it will print out the locale settings it has chosen. If you have complex requirements or specified multiple
@@ -251,11 +247,6 @@ PostgreSQL documentation
 Specifies the ICU locale when the ICU provider is used. Locale support is described in .
-
-If this option is not specified, the locale is inherited from the
-environment in which initdb runs. The environment's
-locale is matched to a similar ICU locale name, if possible.
-
@@ -330,9 +321,8 @@ PostgreSQL documentation
 This option sets the locale provider for databases created in the new cluster. It can be overridden in the CREATE DATABASE command when new databases are subsequently
-created. The default is icu if the server was
-built with ICU support; otherwise the default is
-libc (see ).
+created. The default is libc (see ).
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 71a3d26c37..fa3af0d75c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -143,11 +143,7 @@ static char *lc_monetary = NULL;
 static char *lc_numeric = NULL;
 static char *lc_time = NULL;
 static char *lc_messages = NULL;
-#ifdef USE_ICU
-static char locale_provider = COLLPROVIDER_ICU;
-#else
 static char locale_provider = COLLPROVIDER_LIBC;
-#endif
 static char *icu_locale = NULL;
 static char *icu_rules = NULL;
 static const char *default_text_search_config = NULL;
@@ -2357,19 +2353,6 @@ icu_validate_locale(const char *loc_str)
 #endif
 }
-/*
- * Determine the default ICU locale
- */
-static char *
-default_icu_locale(void)
-{
-#ifdef USE_ICU
-	return pg_strdup(uloc_getDefault());
-#else
-	pg_fa
Re: collation-related loose ends before beta2
Jeff Davis writes:
> Status on collation loose ends:

This all sounds good to me.

			regards, tom lane
collation-related loose ends before beta2
Status on collation loose ends:

1. There's an open item "Switch to ICU for 17". It's a little bit confusing exactly what that means, and the CF entry refers to two items, one of which is the build-time default to --with-icu. As far as I know, building with ICU by default is a settled issue with no objections. The second issue is the initdb default, which is covered by the other open item. So I will just close that open item unless someone thinks I'm missing something.

2. Open item about the unfriendly rules for choosing an ICU locale at initdb time. Tom, Robert, and Daniel Verite have expressed concerns (and at least one objection) to initdb defaulting to icu for --locale-provider. Some of the problems have been addressed, but the issue about C and C.UTF-8 locales is not settled. Even if it were settled I'm not sure we'd have a clear consensus on all the details. I don't think this should proceed to beta2 in this state, so I intend to revert back to libc as the default for initdb.

[ I believe we do have a general consensus that ICU is better, but we can signal it other ways: through documentation, packaging, etc. ]

3. The ICU conversion from "C" to "en-US-u-va-posix": cut out this code (it was a small part of a larger change). Its only purpose was consistency between ICU versions, and nobody liked it. It's only here right now to avoid test failures due to an order-of-commits issue; but if the initdb default goes back to libc it won't matter and I can remove it.

4. icu_validation_level WARNING or ERROR: right now an invalid ICU locale raises a WARNING, but Peter Eisentraut would prefer an ERROR. I'm still inclined to leave it as a WARNING for one release and increase it to ERROR later. But if the default collation provider goes back to libc, the risk of ICU validation errors goes way down, so I don't object if Peter would like to change it back to an ERROR.

Regards,
	Jeff Davis
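For readers following item 2, the effect of reverting the initdb default can be sketched in C as below. This is an illustrative model, not the initdb source: the three function names and the built_with_icu parameter are hypothetical, and the provider constants are defined locally, with assumed values, to mirror the COLLPROVIDER_ICU and COLLPROVIDER_LIBC names that appear in initdb.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the provider codes named in initdb.c (assumed values). */
#define COLLPROVIDER_ICU  'i'
#define COLLPROVIDER_LIBC 'c'

/* Hypothetical: the default while ICU was the default for ICU-enabled builds. */
char
default_provider_before_revert(bool built_with_icu)
{
	return built_with_icu ? COLLPROVIDER_ICU : COLLPROVIDER_LIBC;
}

/* Hypothetical: the default after the revert; build options no longer matter. */
char
default_provider_after_revert(bool built_with_icu)
{
	(void) built_with_icu;		/* intentionally unused */
	return COLLPROVIDER_LIBC;
}

/*
 * Hypothetical: an explicit --locale-provider choice (0 means "not given")
 * takes precedence over the default.
 */
char
effective_provider(char requested, bool built_with_icu)
{
	assert(requested == 0 ||
		   requested == COLLPROVIDER_ICU ||
		   requested == COLLPROVIDER_LIBC);

	if (requested != 0)
		return requested;
	return default_provider_after_revert(built_with_icu);
}
```

In this model, users who want ICU still get it by asking for it explicitly, which fits the idea of signaling the preference through documentation and packaging instead of through the default.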