On 14 Feb 2012, at 18:28, Tom Lane wrote:
>
> Oh, I see the reason for this: the code in cclass() in regc_locale.c
> doesn't go further up than U+00FF, so no codes above that will be
> thought to be letters (or members of any other character class).
> Clearly we need to go further when we are dealing with UTF8.
> I'm not sure what a sane limit would be though.

The Basic Multilingual Plane goes up to U+FFFF:
https://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Planes
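
To make the effect of the limit concrete, here is a rough sketch of a class scan that probes every code point up to a caller-chosen bound. This is not the actual cclass() code; class_pred, toy_is_letter and count_class_members are hypothetical names, and the toy predicate (ASCII letters plus U+0153, picked only as an example of a letter above U+00FF) stands in for a real locale-aware test.

```c
#include <stddef.h>

/* Hypothetical predicate type: nonzero if code point c belongs to the class. */
typedef int (*class_pred)(unsigned int c);

/*
 * Toy stand-in for a locale-aware letter test. It accepts only the ASCII
 * letters and U+0153, so the result does not depend on the runtime locale.
 */
static int
toy_is_letter(unsigned int c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           c == 0x0153;
}

/*
 * Count class members by probing every code point from 0 to max_cp
 * inclusive. Anything above max_cp is never examined, so it can never be
 * reported as a member, whatever the predicate would say about it.
 */
static size_t
count_class_members(class_pred pred, unsigned int max_cp)
{
    size_t       n = 0;
    unsigned int c;

    for (c = 0; c <= max_cp; c++)
    {
        if (pred(c))
            n++;
    }
    return n;
}
```

With a bound of 0xFF the scan finds only the 52 ASCII letters; with a bound of 0xFFFF it also finds U+0153. The cost is that the loop probes 65536 code points instead of 256 for each class.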