On Wed, Aug 9, 2017 at 6:19 PM, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote: > From: pgsql-hackers-ow...@postgresql.org >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Eisentraut >> There are no case-insensitive collations in PostgreSQL (yet). > > That's sad news, as I expected ICU to bring its various collation features to > PostgreSQL. I hope it will be easy to add them.
The reason it is not easy is that text equality is based on strict binary equality. We would have to teach hash operator classes about collations to fix this, and make text_eq() hash strxfrm() or something. That requires special work. You can ask ICU for case insensitivity with Postgres 10, but the strcmp() tie breaker within varstr_cmp() will prevent it from being truly case insensitive. >> The best explanation of the difference that I can understand is here, under >> "Why do CJK strings sort incorrectly in Unicode?": >> >> https://dev.mysql.com/doc/refman/5.5/en/faqs-cjk.html > > Thanks a lot. MysQL seems to have many collations, doesn't it? Well, it depends on how you define collation, which actually gets quite complicated with ICU. You can combine certain options together with great flexibility, it seems (see my e-mail from earlier today -- "What users can do with custom ICU collations in Postgres 10"). Let's assume that there are 10 distinct options for each locale that each independently affect sort order for the base collation of the locale. I believe that there are at least 10 such options that change things, and maybe a lot more. Theoretically, this means that there are an absolutely enormous number of possible collations with ICU. Now, I wouldn't *actually* say that we have many thousands of collations with ICU, because that doesn't seem like a sensible way to explain ICU Postgres collations. The way that we think about this from using libc doesn't work well for ICU. Instead, I would say that we have hundreds of locales (not collations), each of which support some variant options (e.g., traditional Spanish sort order, alternative Japanese sort order, pictographic emoji sorting), with some further generic options for varying how case it handled, how numbers are handled, and other things like that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers