I kind of agree that using `UPPER` instead of `ILIKE` for `__icontains` on 
PostgreSQL isn’t optimal. But it is quite easy to create a functional 
trigram GIN-index which use `UPPER` to allow these lookups to use an index. 
This will be even easier in Django 3.2 where you can create functional 
indexes in your model definitions.

The recommended way to handle case insensitive searches in PostgreSQL is by 
using functional indexes 
(https://www.postgresql.org/docs/current/indexes-expressional.html).  But 
with trigram indexes this isn’t needed and they are required to get index 
scans for leading wildcard searches. Note that trigram GIN indexes do have 
drawbacks, for example they don’t support the `=`-operator and will not 
support search strings with less than 3 characters.

I’ve been in a similar position as you and to solve this I created a custom 
lookup which used `ILIKE` and used that instead of `__icontains`.

It might make sense to change the `__icontains`  and `__iendswith` lookups 
to use `ILIKE` instead of `UPPER` but I’m not really sure that it’s 
justified to break the indexed queries people already have in place. That 
is where people have the GIN trigram `UPPER` index opposed from the regular 
GIN trigram index. The performance issue you’re describing doesn’t really 
change if we use `ILIKE` over `UPPER`. You need to install an index anyway.

Regarding your issue with Turkish characters I think that it works because 
`ILIKE` internally uses some form of `LOWER` and `LOWER('İstanbul') = 
LOWER('istanbul')` would’ve worked in your case. As James wrote this 
behaviour depends on your configured locale. I think the one way to do 
these kind of searches without changing the locale is to use `tsvector` and 
`tsquery` with a Turkish configuration, you can even make them unaccented 
to allow matching both “Istanbul” and “İstanbul” with the search string 
“istanbul”.


måndag 1 mars 2021 kl. 07:06:59 UTC+1 skrev mesuto...@gmail.com:

> Hi James,
> Thanks for your explanations. However, I wanted to explain the 
> disadvantage of using "UPPER like"  instead of "ilike" in icontains, 
> istartswith and iendswith. The performance problem should not be ignored.
>
> James Bennett <ubern...@gmail.com>, 1 Mar 2021 Pzt, 04:05 tarihinde şunu 
> yazdı:
>
>> On Sun, Feb 28, 2021 at 2:39 AM Tom Forbes <t...@tomforb.es> wrote:
>> >
>> > Thank you for the clarification! I think the biggest argument for this 
>> change is the fact that uppercasing Unicode can cause incorrect results to 
>> be returned.
>> >
>> > Given that we now have much better support for custom index types, 
>> perhaps we should change this? We need a custom expression index anyway, so 
>> it might not be a huge ask to say “now you should use a gin index”?
>>
>> It's worth pointing out that case mapping and transformation in
>> Unicode is difficult and complex. I wrote up an intro to the problem a
>> while back:
>>
>> https://www.b-list.org/weblog/2018/nov/26/case/
>>
>> One thing that's important to note is that there is no generic
>> one-size-fits-all-languages option that Django can just do by default
>> and get the right results. For example, a case mapping that does the
>> right thing for Turkish will do the wrong thing for (to pick a random
>> example) French, and vice-versa. Unicode itself provides a basic "hope
>> for the best" set of default case mappings that do the right thing for
>> many cased scripts, but also is clear in saying that you may need to
>> use a locale-specific mapping to get what you really want.
>>
>> Postgres has the ability to configure locale, and when configured it
>> does the "right thing" -- for example, when the locale is tr_TR or
>> another Turkish locale variant, the UPPER() function should correctly
>> handle dotted versus dotless 'i' as required for Turkish. But Postgres
>> also warns that this will have performance impact, which I think is
>> what's being noted in the ticket.
>>
>> I'm not sure there will be an easy or obvious solution here.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Django developers  (Contributions to Django itself)" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to django-develop...@googlegroups.com.
>>
> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/django-developers/CAL13Cg9nYMJZwm2XcsCcWG5Fqn8gqqE93FM11Xcfs4TXsmTbZQ%40mail.gmail.com
>> .
>>
>
>
> -- 
> İyi çalışmalar. Saygılarımla.
>
> *Mesut Öncel*
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/124180ad-dc7e-439d-ac4a-166e40e5d921n%40googlegroups.com.

Reply via email to