The following query:

    SELECT U&'\017D' ~ '[[:alpha:]]' collate "en-US-x-icu";

returns true if the server encoding is UTF8, and false if the server encoding is LATIN9. That's a bug -- any behavior involving ICU should be encoding-independent.

The problem seems to be confusion between pg_wchar and a Unicode code point in pg_wc_isalpha() and related functions.

It might be good to introduce some infrastructure that can convert a pg_wchar into a Unicode code point, or decode a string of bytes into a string of 32-bit code points. Right now that's possible, but it involves pg_wchar2mb(), followed by encoding conversion to UTF8, followed by decoding the UTF8 into a code point. (Is there an easier path that I missed?)

One wrinkle is MULE_INTERNAL, which doesn't have any conversion path to UTF8. That's not important for ICU (because ICU is not allowed with that encoding), but I'd like this infrastructure to be independent of ICU, because I have some follow-up proposals to simplify character classification here and in ts_locale.c.

Thoughts?

Regards,
Jeff Davis
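To make the LATIN9 result concrete: in LATIN9 (ISO-8859-15), Ž (U+017D) is the single byte 0xB4. Assuming, as I understand the single-byte encodings to work, that the pg_wchar for a LATIN9 character is just the raw byte value, the classifier ends up being asked about code point U+00B4 (ACUTE ACCENT), which is not alphabetic, hence `false`. The sketch below shows what a direct per-encoding mapping could look like for LATIN9, avoiding the pg_wchar2mb() / convert-to-UTF8 / decode round trip. The names `latin9_to_unicode()` and `latin9_decode()` are hypothetical and are not existing PostgreSQL functions; the only assumption about PostgreSQL itself is the byte-valued pg_wchar described above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL API): map a pg_wchar taken from
 * LATIN9 (ISO-8859-15) text to a Unicode code point.
 *
 * Assumption: for LATIN9 the pg_wchar value equals the byte value, so it
 * coincides with the Unicode code point everywhere except the eight
 * positions where ISO-8859-15 differs from ISO-8859-1 (Latin-1).
 */
uint32_t
latin9_to_unicode(uint32_t wc)
{
    switch (wc)
    {
        case 0xA4: return 0x20AC;   /* EURO SIGN */
        case 0xA6: return 0x0160;   /* LATIN CAPITAL LETTER S WITH CARON */
        case 0xA8: return 0x0161;   /* LATIN SMALL LETTER S WITH CARON */
        case 0xB4: return 0x017D;   /* LATIN CAPITAL LETTER Z WITH CARON */
        case 0xB8: return 0x017E;   /* LATIN SMALL LETTER Z WITH CARON */
        case 0xBC: return 0x0152;   /* LATIN CAPITAL LIGATURE OE */
        case 0xBD: return 0x0153;   /* LATIN SMALL LIGATURE OE */
        case 0xBE: return 0x0178;   /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
        default:   return wc;       /* identical to Latin-1 / Unicode */
    }
}

/*
 * Hypothetical helper: decode a LATIN9 byte string directly into 32-bit
 * Unicode code points. The caller supplies an output array with room for
 * at least len elements (LATIN9 is one byte per character). Returns the
 * number of code points written.
 */
size_t
latin9_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = latin9_to_unicode(s[i]);
    return len;
}
```

A table-per-encoding approach like this is cheap for single-byte encodings; multibyte encodings and MULE_INTERNAL would still need a different strategy.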