Hi,

When trying out databases defined with ICU locales, I see that the backends serving such databases seem to inherit their LC_CTYPE from the environment (as opposed to using a fixed per-database value).
Inheriting LC_CTYPE from the environment is a problem for backend code that relies on libc functions which themselves depend on LC_CTYPE, such as the full text search parser and dictionaries. For instance, if you start the instance with a C locale (LC_ALL=C pg_ctl ...) and try to use FTS in an ICU UTF-8 database, it doesn't work:

  template1=# create database "fr-utf8"
                template 'template0' encoding UTF8
                locale 'fr'
                collation_provider 'icu';

  template1=# \c fr-utf8
  You are now connected to database "fr-utf8" as user "daniel".

  fr-utf8=# show lc_ctype;
   lc_ctype
  ----------
   fr
  (1 row)

  fr-utf8=# select to_tsvector('été');
  ERROR:  invalid multibyte character for locale
  HINT:  The server's LC_CTYPE locale is probably incompatible with the database encoding.

If I peek into the "real" LC_CTYPE when connected to this database, I can see that it's "C":

  fr-utf8=# create extension plperl;
  CREATE EXTENSION

  fr-utf8=# create function lc_ctype() returns text
              as '$ENV{LC_CTYPE};' language plperl;
  CREATE FUNCTION

  fr-utf8=# select lc_ctype();
   lc_ctype
  ----------
   C

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite