On Sun, Feb 28, 2021 at 2:39 AM Tom Forbes <t...@tomforb.es> wrote:
>
> Thank you for the clarification! I think the biggest argument for this change 
> is the fact that uppercasing Unicode can cause incorrect results to be 
> returned.
>
> Given that we now have much better support for custom index types, perhaps we 
> should change this? We need a custom expression index anyway, so it might not 
> be a huge ask to say “now you should use a gin index”?

It's worth pointing out that case mapping and transformation in
Unicode are difficult and complex. I wrote up an intro to the problem a
while back:

https://www.b-list.org/weblog/2018/nov/26/case/

It's important to note that there is no generic
one-size-fits-all-languages option that Django can just apply by default
and get the right results. For example, a case mapping that does the
right thing for Turkish will do the wrong thing for (to pick a random
example) French, and vice versa. Unicode itself provides a basic "hope
for the best" set of default case mappings that do the right thing for
many cased scripts, but it is also explicit that you may need a
locale-specific mapping to get what you really want.
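
As a rough illustration (a sketch, not anything Django does today):
Python's built-in str.upper() and str.lower() apply Unicode's default,
locale-independent case mappings. The turkish_upper() helper below is
hypothetical, written only for this example, and layers the Turkish
dotted/dotless 'i' rule on top of those defaults:

```python
def turkish_upper(text: str) -> str:
    """Hypothetical helper (not part of Python or Django).

    In Turkish, dotted 'i' uppercases to dotted 'İ' (U+0130), and dotless
    'ı' (U+0131) uppercases to plain 'I'. str.upper() already maps 'ı' to
    'I', so only 'i' needs rewriting before applying the default mapping.
    """
    return text.replace("i", "\u0130").upper()


# Default Unicode mappings, the same everywhere regardless of language:
print("fils".upper())      # FILS (correct for French)
print("istanbul".upper())  # ISTANBUL (wrong for Turkish, which wants İSTANBUL)

# Default uppercasing collapses two distinct Turkish letters into one,
# so an UPPER()-style comparison treats them as equal:
print("\u0131".upper() == "i".upper())  # True

# The Turkish tailoring fixes Turkish but breaks French:
print(turkish_upper("istanbul"))  # İSTANBUL (correct for Turkish)
print(turkish_upper("fils"))      # FİLS (wrong for French)

# Default mappings can also change length or fail to round-trip:
print("straße".upper())       # STRASSE
print(len("\u0130".lower()))  # 2: 'i' followed by U+0307 COMBINING DOT ABOVE
```

Whichever mapping you pick, some language gets the wrong answer; getting
it right depends on knowing the language of the text, not just its
characters.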

Postgres can be configured with a locale, and when it is, it
does the "right thing" -- for example, when the locale is tr_TR or
another Turkish locale variant, the UPPER() function should correctly
handle dotted versus dotless 'i' as required for Turkish. But Postgres
also warns that this will have a performance impact, which I think is
what's being noted in the ticket.

I'm not sure there will be an easy or obvious solution here.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers (Contributions to Django itself)" group.