<crossroads0...@googlemail.com> writes:
>>> The original post seemed to be a contrived attempt to say "you should
>>> use ICU".
>>
>> Indeed. The OP should go read all the previous arguments about ICU
>> in our archives.
>
> Not at all. I just was making a suggestion. You may use any other
> library or implement it yourself (I even said that in my original
> post). www.unicode.org - the official website of the Unicode
> consortium, have a complete database of all Unicode characters which
> can be used as a basis.
>
> But if you want to ignore the normalization/multiple code point issue,
> point 2--the collation problem--still remains. And given that even a
> crappy database as MySQL supports Unicode collation, this isn't
> something to be ignored, IMHO.
Sure, support for multiple collations in a database is definitely a known
missing feature. It takes a lot of work: a patch for it arrived too late to
make it into 8.4 and still needed more work, so hopefully the remaining
issues will be worked out for 8.5. I suggest you read the old threads and
contribute whatever you can toward solving the problems that came up.

>> I don't believe that the standard forbids the use of combining chars at all.
>> RFC 3629 says:
>>
>>    ... This issue is amenable to solutions based on Unicode Normalization
>>    Forms, see [UAX15].

This is the relevant part. Tom was claiming that UTF8 encoding requires the
string of Unicode code points to be normalized before it is encoded. I'm not
sure that's true, though. Is it?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
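For what it's worth, the encoding/normalization distinction can be seen in a few lines of Python using only the standard-library `unicodedata` module. The choice of "é" is just an illustration, and this sketches general Unicode behaviour, not anything PostgreSQL itself does. Both the precomposed and the decomposed spellings are well-formed UTF-8 and round-trip unchanged through Python's codec. Only an explicit normalization step maps one onto the other.

```python
import unicodedata

# "é" as one precomposed code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE
precomposed = "\u00e9"
# "é" as a base letter plus a combining mark: U+0065 followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301"

# Both sequences encode to well-formed UTF-8; the codec does not normalize.
pre_bytes = precomposed.encode("utf-8")  # b'\xc3\xa9'
dec_bytes = decomposed.encode("utf-8")   # b'e\xcc\x81'

# Decoding gives back exactly the code points that went in.
round_trip_ok = (pre_bytes.decode("utf-8") == precomposed
                 and dec_bytes.decode("utf-8") == decomposed)

# Compared as code points (or as bytes), the two strings differ ...
print(pre_bytes, dec_bytes, precomposed == decomposed)

# ... but they are canonically equivalent: NFC composes, NFD decomposes.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed
print(nfc_equal, nfd_equal)
```

Running it prints the two different byte sequences followed by `False` for the direct comparison, then `True True` for the normalized comparisons. So any "these are the same character" logic has to live in normalization or collation, not in the UTF-8 encoder.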