Re: ICU for global collation

Peter Eisentraut Wed, 06 Nov 2019 02:11:07 -0800

On 2019-11-01 19:18, Daniel Verite wrote:

> Even if the FTS code is improved in that matter, any extension code
> with libc functions depending on LC_CTYPE is still going to be
> potentially problematic. In particular when it happens to be set
> to a different encoding than the database.

I think the answer here is that extension code must not do that, at least in ways that potentially interact with other parts of the (collation-aware) database system. For example, libc and ICU might have different opinions about what is a letter, because of different versions of Unicode data in use. That would then affect tokenization etc. in text search and elsewhere. That's why things like isalpha have to go through ICU instead, if that is the collation provider in a particular context.
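
To illustrate the idea of routing character classification through the collation provider, here is a minimal sketch in C. It is not PostgreSQL's actual code: `CollProvider`, `CharClassContext`, `char_is_alpha` and `demo_icu_isalpha` are hypothetical names made up for this example. The ICU side is represented by a callback so that the sketch compiles without ICU; in a real build that callback would presumably wrap ICU's `u_isalpha()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical provider tag for this sketch; not a PostgreSQL type. */
typedef enum { PROVIDER_LIBC, PROVIDER_ICU } CollProvider;

/* Stand-in for an ICU-backed classifier (e.g. a wrapper around u_isalpha). */
typedef bool (*IsAlphaFn)(unsigned int codepoint);

typedef struct {
    CollProvider provider;
    IsAlphaFn    icu_isalpha;   /* expected to be set when provider is ICU */
} CharClassContext;

/*
 * Demo-only replacement for the ICU call: ASCII letters plus U+00E9.
 * Real code would ask ICU, whose answer depends on its Unicode data version.
 */
static bool
demo_icu_isalpha(unsigned int cp)
{
    return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') || cp == 0x00E9;
}

/*
 * Ask "is this a letter?" of whichever provider the context names, so that
 * tokenization and collation consult the same data.
 */
static bool
char_is_alpha(const CharClassContext *ctx, unsigned int codepoint)
{
    if (ctx->provider == PROVIDER_ICU && ctx->icu_isalpha != NULL)
        return ctx->icu_isalpha(codepoint);

    /*
     * libc path: the result depends on the process-wide LC_CTYPE.
     * Simplification: assumes wchar_t can hold the code point directly.
     */
    return iswalpha((wint_t) codepoint) != 0;
}
```

The point of the indirection is that a caller such as a text search tokenizer never calls `iswalpha()` directly, so the answer cannot silently come from a different Unicode data set than the one the collation uses.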

> Couldn't we simply invent per-database GUC options, as in
> ALTER DATABASE myicudb SET libc_lc_ctype TO 'value';
> ALTER DATABASE myicudb SET libc_lc_collate TO 'value';
>
> where libc_lc_ctype/libc_lc_collate would specifically set
> the values in the LC_CTYPE and LC_COLLATE environment vars
> of any backend serving the corresponding database?

We could do that as a transition measure to support extensions like you mention above. But our own internal code should not have to rely on that.
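
As a rough sketch of what such a transition measure might look like on the backend side, assuming the two proposed settings exist: `LibcLocaleOverrides` and `apply_libc_locale_overrides` are hypothetical names, and the sketch calls `setlocale()` directly instead of touching environment variables, which is an assumption about how the values would take effect.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical container for the proposed per-database settings.
 * A NULL member means "not set, leave the libc category alone".
 */
typedef struct {
    const char *libc_lc_ctype;
    const char *libc_lc_collate;
} LibcLocaleOverrides;

/*
 * Apply the overrides to the current process. Returns false if libc
 * rejects one of the locale names; categories already changed stay changed.
 */
static bool
apply_libc_locale_overrides(const LibcLocaleOverrides *ov)
{
    if (ov->libc_lc_ctype != NULL &&
        setlocale(LC_CTYPE, ov->libc_lc_ctype) == NULL)
        return false;

    if (ov->libc_lc_collate != NULL &&
        setlocale(LC_COLLATE, ov->libc_lc_collate) == NULL)
        return false;

    return true;
}
```

This only changes what libc-based extension code sees; anything that goes through the collation provider would not depend on it.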


--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
