Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-08-05 Thread Peter Eisentraut
On 8/1/17 11:28, Peter Eisentraut wrote:
> On 8/1/17 08:28, Victor Wagner wrote:
>> On Tue, 1 Aug 2017 08:16:54 -0400
>> Peter Eisentraut  wrote:
>>
>>> On 8/1/17 02:12, Victor Wagner wrote:
> We are only calling uloc_toLanguageTag() with keyword/value
> combinations that ICU itself previously told us were supported.  So
> just ignoring errors doesn't seem proper in this case.
>  
 We know that this version of ICU is broken. But what choice we
 have?  
>>> I don't know that we can already reach that conclusion.  Maybe the
>> Because it was fixed in subsequent versions.
> 
> I propose the attached patch to address this.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-08-01 Thread Peter Eisentraut
On 8/1/17 08:28, Victor Wagner wrote:
> On Tue, 1 Aug 2017 08:16:54 -0400
> Peter Eisentraut  wrote:
> 
>> On 8/1/17 02:12, Victor Wagner wrote:
 We are only calling uloc_toLanguageTag() with keyword/value
 combinations that ICU itself previously told us were supported.  So
 just ignoring errors doesn't seem proper in this case.
  
>>> We know that this version of ICU is broken. But what choice we
>>> have?  
>> I don't know that we can already reach that conclusion.  Maybe the
> Because it was fixed in subsequent versions.

I propose the attached patch to address this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From a9c89b7b1b98f0a0e3dc6d7571dfc7e7f146ed8d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 1 Aug 2017 10:49:55 -0400
Subject: [PATCH] Add support for ICU 4.2

Supporting ICU 4.2 seems useful because it ships with CentOS 6.

Versions before ICU 4.6 don't support pkg-config, so document an
installation method without using pkg-config.

In ICU 4.2, ucol_getKeywordsForLocale() sometimes returns values that
will not be accepted by uloc_toLanguageTag().  Skip loading keyword
variants in that version.

Reported-by: Victor Wagner 
---
 doc/src/sgml/installation.sgml   | 22 +++---
 src/backend/commands/collationcmds.c | 11 +++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index fa0d05efe6..12866b4bf7 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -774,10 +774,26 @@ Configuration
  Build with support for
  the ICUICU
  library.  This requires the ICU4C package
- as well
- as 
pkg-configpkg-config
  to be installed.  The minimum required version
- of ICU4C is currently 4.6.
+ of ICU4C is currently 4.2.
+
+
+
+ By default,
+ 
pkg-configpkg-config
+ will be used to find the required compilation options.  This is
+ supported for ICU4C version 4.6 and later.
+ For older versions, or if pkg-config is
+ not available, the variables ICU_CFLAGS
+ and ICU_LIBS can be specified
+ to configure, like in this example:
+
+./configure ... --with-icu ICU_CFLAGS='-I/some/where/include' 
ICU_LIBS='-L/some/where/lib -licui18n -licuuc -licudata'
+
+ (If ICU4C is in the default search path
+ for the compiler, then you still need to specify a nonempty string in
+ order to avoid use of pkg-config, for
+ example, ICU_CFLAGS=' '.)
 

   
diff --git a/src/backend/commands/collationcmds.c 
b/src/backend/commands/collationcmds.c
index b6c14c920d..a7b5871466 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -718,7 +718,17 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
/*
 * Add keyword variants
+*
+* In ICU 4.2, ucol_getKeywordsForLocale() sometimes 
returns
+* values that will not be accepted by 
uloc_toLanguageTag().  Skip
+* loading keyword variants in that version.  (Both
+* ucol_getKeywordValuesForLocale() and 
uloc_toLanguageTag() are
+* new in ICU 4.2, so older versions are not supported 
at all.)
+*
+* XXX We have no information about ICU 4.3 through 
4.7, but we
+* know the below works with 4.8.
 */
+#if U_ICU_VERSION_MAJOR_NUM > 4 || (U_ICU_VERSION_MAJOR_NUM == 4 && 
U_ICU_VERSION_MINOR_NUM > 2)
status = U_ZERO_ERROR;
en = ucol_getKeywordValuesForLocale("collation", name, 
TRUE, );
if (U_FAILURE(status))
@@ -765,6 +775,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
(errmsg("could not get keyword 
values for locale \"%s\": %s",
name, 
u_errorName(status;
uenum_close(en);
+#endif /* ICU >4.2 */
}
}
 #endif /* USE_ICU */
-- 
2.13.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-08-01 Thread Tom Lane
Victor Wagner  writes:
> Peter Eisentraut  wrote:
>> I don't know that we can already reach that conclusion.  Maybe the

> Because it was fixed in subsequent versions.

> And 4.2 is first version where this function appeared.
> So, we still have problems with SLES11 which use even older version and
> have to be supported at least two more years.

I think the answer is very clear: we do not need to support old broken
versions of ICU, especially not in our first release.  We'll have enough
to do making the code stable and performant with good versions of ICU.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-08-01 Thread Victor Wagner
On Tue, 1 Aug 2017 08:16:54 -0400
Peter Eisentraut  wrote:

> On 8/1/17 02:12, Victor Wagner wrote:
> >> We are only calling uloc_toLanguageTag() with keyword/value
> >> combinations that ICU itself previously told us were supported.  So
> >> just ignoring errors doesn't seem proper in this case.
> >>  
> > We know that this version of ICU is broken. But what choice we
> > have?  
> 
> I don't know that we can already reach that conclusion.  Maybe the

Because it was fixed in subsequent versions.

And 4.2 is first version where this function appeared.
So, we still have problems with SLES11 which use even older version and
have to be supported at least two more years.


> APIs have changed or we are not using them correctly.  Are there any
> bug reports or changelog entries related to this?  I don't think we
> should just give up and ignore errors.

We also can provide configuration option or command-line option for
initdb which would restrict list of languages for which collation
sequences would be created. This would help for all but two languages.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-08-01 Thread Peter Eisentraut
On 8/1/17 02:12, Victor Wagner wrote:
>> We are only calling uloc_toLanguageTag() with keyword/value
>> combinations that ICU itself previously told us were supported.  So
>> just ignoring errors doesn't seem proper in this case.
>>
> We know that this version of ICU is broken. But what choice we have?

I don't know that we can already reach that conclusion.  Maybe the APIs
have changed or we are not using them correctly.  Are there any bug
reports or changelog entries related to this?  I don't think we should
just give up and ignore errors.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-08-01 Thread Victor Wagner
On Mon, 31 Jul 2017 19:42:30 -0400
Peter Eisentraut  wrote:

> On 7/25/17 15:20, Victor Wagner wrote:
> > It turns out, that PostgreSQL enumerates collations for all ICU
> > locales and passes it into uloc_toLanguageTag function with strict
> > argument of this function set to TRUE. But for some locales
> > (es*@collation=tradtiional, si*@collation=dictionary) only call with
> > strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all
> > locales can be resolved with strict=TRUE.  
> 
> We are only calling uloc_toLanguageTag() with keyword/value
> combinations that ICU itself previously told us were supported.  So
> just ignoring errors doesn't seem proper in this case.
> 

We know that this version of ICU is broken. But what choice we have?
Locale code in the glibc is much more broken.
So we just have to work around bugs in underlaying  libraries anyway.
In case of ICU we just need to omit some collations.

It might cause incompatibility with newer systems where these
collations are used, but if we fall back to glibc locale, there would
be much more incompatibilites. And also there is bug in strxfrm, which
prevents use of abbreviated keys.

-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-07-31 Thread Peter Eisentraut
On 7/25/17 15:20, Victor Wagner wrote:
> It turns out, that PostgreSQL enumerates collations for all ICU locales
> and passes it into uloc_toLanguageTag function with strict argument of
> this function set to TRUE. But for some locales
> (es*@collation=tradtiional, si*@collation=dictionary) only call with
> strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
> can be resolved with strict=TRUE.

We are only calling uloc_toLanguageTag() with keyword/value combinations
that ICU itself previously told us were supported.  So just ignoring
errors doesn't seem proper in this case.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-07-26 Thread Victor Wagner
On Tue, 25 Jul 2017 16:50:38 -0400
Tom Lane  wrote:

> Victor Wagner  writes:
> > PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
> > (because it searches for libicu using pkgconfig, and pgkconfig
> > support significantly changed in ICU  version 4.6)  
> 
> > However, there are some reasons to build PostgreSQL with older
> > versions of ICU.   
> 
> No doubt, but making that happen seems like a feature, and IMO we're
> too late in the v10 beta test cycle to be taking new features on

May be PGDG people could integrate it as a patch for RPMS for
particular systems which are affected by the problem.

I'd rather say that it is bugfix. Relaxing too strict checking.
If we choose approach to allow non-strict language tags, it is oneline
patch.

If we choose more safe approach to ignore non-strict collations, it
would take about five lines 
1. Replace one ereport(ERROR with ereport(WARINING
2. Return special value (NULL) after issuing this warning
3. Handle this special value in calling function by continuing to next
loop iteration
4,5 - curly braces around 1 and 2. see patch at the end

> board, especially ones with inherently-large portability risks.  We

I don't think that there are so large portability risks. Patch can be
done such way that it would behave exactly as now if ICU version is 4.6
or above. Moreover, it is easy to make old ICU support separate
configure option (because another configure test is needed anyway).

> could maybe consider patches for this for v11 ... but will anyone
> still care by then?

Undoubtedly will. v11 is expedted to be released in 2018. And RHEL 6
is supported until 2020, and extended support would end in Nov 2023.

And it is possible that some derived distribution would last longer.


> (Concretely, my concern is that the alpha and beta testing that's
> happened so far hasn't covered pre-4.6 ICU at all.  I'd be surprised
> if the incompatibility you found so far is the only one.)

diff --git a/src/backend/commands/collationcmds.c 
b/src/backend/commands/collationcmds.c
index b6c14c9..9e5da98 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -442,11 +442,13 @@ get_icu_language_tag(const char *localename)
 
status = U_ZERO_ERROR;
uloc_toLanguageTag(localename, buf, sizeof(buf), TRUE, );
-   if (U_FAILURE(status))
-   ereport(ERROR,
+   if (U_FAILURE(status)) 
+   {
+   ereport(WARNING,
(errmsg("could not convert locale name \"%s\" 
to language tag: %s",
localename, 
u_errorName(status;
-
+   return NULL;
+   }
return pstrdup(buf);
 }

@@ -733,6 +735,10 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
char   *localeid = 
psprintf("%s@collation=%s", name, val);
 
langtag = get_icu_language_tag(localeid);
+   if (langtag == NULL) {
+   /* No such collation in library, skip */
+   continue;
+   }
collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? 
langtag : localeid;
 
/*






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-07-25 Thread Tom Lane
Victor Wagner  writes:
> PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
> (because it searches for libicu using pkgconfig, and pgkconfig support
> significantly changed in ICU  version 4.6)

> However, there are some reasons to build PostgreSQL with older versions
> of ICU. 

No doubt, but making that happen seems like a feature, and IMO we're too
late in the v10 beta test cycle to be taking new features on board,
especially ones with inherently-large portability risks.  We could maybe
consider patches for this for v11 ... but will anyone still care by then?

(Concretely, my concern is that the alpha and beta testing that's happened
so far hasn't covered pre-4.6 ICU at all.  I'd be surprised if the
incompatibility you found so far is the only one.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-07-25 Thread Victor Wagner
Collegues,

PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
(because it searches for libicu using pkgconfig, and pgkconfig support
significantly changed in ICU  version 4.6)

However, there are some reasons to build PostgreSQL with older versions
of ICU. 

For instance such Linux distributions as RHEL 6 ships ICU 4.2, and SLES
11 also ships older ICU. Sometimes, it is not feasible to compile newer
ICU from sources on the servers with these (and derived) distributions.

It is possible to compile PostgreSQL 10 with these versions of ICU
by specifying ICU_CFLAGS and ICU_LIBS explicitely.

However, backend startup fails with errors on some Spanish and
Singalese locales.

It turns out, that PostgreSQL enumerates collations for all ICU locales
and passes it into uloc_toLanguageTag function with strict argument of
this function set to TRUE. But for some locales
(es*@collation=tradtiional, si*@collation=dictionary) only call with
strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
can be resolved with strict=TRUE.

ICU docs state that if strict=FALSE, some locale fields can be
incomplete.

What solition is better:

1. Just skip locale/collation combinations which cannot be resolved
with strict=TRUE and issue warning instead of error if
uloc_toLanguageTag fails on some locale.

It would cause some ICU collations be missiong from the databases
running on the systems with old ICU.

2. If we see ICU version earlier than 4.6, pass strict=FALSE to
ucol_toLanguageTag.  It would cause databases on such systems use
fallback collation sequence for these collations.

Personally I prefer first solition, because I doubt that any of my
users would ever need Singalese collation, and probably they can
survive without traditional Spanish too.

-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers