Re: Proposal to not implicitly create varchar/text-pattern opclass indexes on PostgreSQL

Adam Johnson Tue, 14 Apr 2020 06:01:53 -0700

For some numbers: I just checked on a long-running client project using
PostgreSQL, and it looks like several of the *_like  indexes are in use:


core=> select right(indexrelname, 10), idx_tup_read, idx_tup_fetch from
pg_stat_all_indexes where indexrelname like '%_like' order by 1;
   right    | idx_tup_read | idx_tup_fetch
------------+--------------+---------------
 090f8_like |       168181 |        168143
 0b77e_like |            0 |             0
 0cc84_like |            0 |             0
 1ab7c_like |         6260 |          6137
 32a5e_like |      2443202 |       2412585
 33678_like |            0 |             0
 37b91_like |            0 |             0
 6c02a_like |           15 |            15
 75088_like |        56746 |          9640
 90e0f_like |         8371 |          8089
 a08ec_like |         2815 |          2815
 b4ddf_like |            0 |             0
(12 rows)

The first highest read index is accidental and the query will be rewritten.
The second-highest read index here though is from the third party app
django-oauth-toolkit:
https://github.com/jazzband/django-oauth-toolkit/blob/master/oauth2_provider/models.py#L284
.
The other indexes are on small tables and I'm not concerned about them from
a performance perspective.


Perhaps it's possible to use these index statistics in the system check to
prioritize which indexes are warned about.

On Tue, 14 Apr 2020 at 13:18, Hannes Ljungberg <hannes.ljungb...@gmail.com>
wrote:

> Thanks for your reply Tim,
>
> I also fail to see a "clean" upgrade path. The way I’ve been thinking of,
> is to just as you wrote, keep the code for deletion of the index so people
> who migrate to the Index-class will get it removed properly. Since it uses
> `IF EXISTS` it should be future-proof. We could add a system check that
> uses introspection to check if the `*_like` index is present on affected
> fields with `db_index=True` and warn about its deprecation. Then after a
> couple (?) of versions we could remove all traces of this index.
>
> The negative performance impact of removing this index for users of `LIKE`
> queries is very large but it’s hard to give numbers since it depends on the
> amount and nature of the data in the affected rows. Since we wouldn’t do
> any automatic removal of the index and add documentation about the
> importance of the opclass-indexes for `LIKE` queries I’m not that worried.
>
> I realise that removing the creation of these indexes result in a not so
> pretty deprecation path but I think users of Django would benefit from
> having to be more explicit about their indexes and learn about the quirks
> of their database engines.
>
>
> Den tisdag 14 april 2020 kl. 03:01:03 UTC+2 skrev Tim Graham:
>>
>> I have some sympathy for this issue as I'm trying to make the
>> createcachetable management command use SchemaEditor rather than some
>> custom SQL construction logic*. The related problem I ran into is that the
>> primary key column (a CharField) uses unique=True which means those
>> undesired opclasses indexes are created. I couldn't find a way to
>> prevent that index besides filtering it out of the list of SQL statements.
>>
>> As for your proposal, how would you handle the upgrade path for existing
>> projects? I imagine we could provide a script to run upon upgrade to remove
>> all such existing indexes. No doubt some users won't run it. Would you keep
>> around the code in Django's SchemaEditor like
>> https://github.com/django/django/blob/53d229ff632c2a3e547f2820a94239f38ba4d4ac/django/db/backends/postgresql/schema.py#L178-L180
>> that assumes those indexes exist?
>>
>> If Django stops creating those indexes, it could create a somewhat murky
>> situation for some developers as they try to figure out the state of
>> indexes in their database. It might depend on which version of Django was
>> in use when certain migrations ran.
>>
>> Some developers might be left debugging performance issues if those
>> indexes are removed. It could be helpful to gives some numbers as to the
>> possible performance impact on applications if these indexes are removed
>> without a developer realizing it.
>>
>> Third-party apps that need it could add an Index with opclasses but then
>> they'd face the issue of duplicate opclasses indexes if their app is used
>> on an older version of Django.
>>
>> * https://github.com/django/django/pull/12590
>>
>> On Sunday, April 12, 2020 at 10:30:50 AM UTC-4, Hannes Ljungberg wrote:
>>>
>>> Hi all,
>>>
>>> I would like to continue the discussion started in this very old thread:
>>>
>>> https://groups.google.com/d/msg/django-developers/H2QFcQYsbo8/RmRb-8FVypwJ
>>>
>>> I’m sorry if I should've continued the discussion in that thread but it
>>> felt a bit wrong to bring a 5 year old thread back to life :-)
>>>
>>> Anyway, as previously described in that thread the implicit creation of
>>> the `*_pattern_ops` index when creating indexes on `CharField` and
>>> `TextField` with `db_index=True` is not ideal.
>>>
>>> In my experience `LIKE` expressions is not that common that it warrants
>>> to always create an index to cover this.
>>>
>>> For very large tables this can become a problem where insertion/update
>>> performance is negatively affected and disc space usage is twice what
>>> really is needed if no `LIKE` queries are used.
>>>
>>> And even if one would like to use `LIKE` expressions it’s not obvious
>>> that the `*_pattern_ops` is the correct index. For leading wildcard patters
>>> as `LIKE %foo` one has to use a GIN/GiST index with the `*_trgm_ops`
>>> opclass. With the current implementation we would end up with 3 indexes
>>> when 2 would be sufficient for this use case (the regular b-tree and the
>>> trigram).
>>>
>>> One could also argue that we’re not consistent with these implicit
>>> indexes. The `iexact`/`icontains` lookups require an expression index on
>>> `(UPPER(col))`  but that’s not created.
>>>
>>> One important detail is that this implicit index is _not_ created when
>>> using the class based `Index` . In my opinion it’s not very clear that one
>>> needs to handle the creation of a `*_pattern_ops` index manually when using
>>> it.
>>>
>>> This is the only documentation that I’ve been able to find about the
>>> creation of these implicit indexes:
>>> https://docs.djangoproject.com/en/dev/ref/databases/#indexes-for-varchar-and-text-columns
>>>
>>> My proposal is to remove the implicit index created by `db_index=True`
>>> and add documentation that one should use `Index.opclasses` to utilise
>>> indexes for `LIKE` queries on PostgresSQL. This would give a more
>>> consistent behaviour and being more explicit about this would help people
>>> tune their indexes.
>>>
>>> If there’s a consensus on this I would like to give this a shot. If
>>> there’s not an agreement on removing the implicit index I think we should
>>> at least make the documentation around this a bit more clear about the
>>> gotchas of `Index` vs `db_index=True`.
>>>
>>> Here’s two tickets which have previously discussed this:
>>> https://code.djangoproject.com/ticket/24507
>>> https://code.djangoproject.com/ticket/24088
>>>
>>> Stay safe!
>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/6b539bf5-4de2-434f-b022-8729bf49a593%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/6b539bf5-4de2-434f-b022-8729bf49a593%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM29M9duwVObpnmjOnmZZhKMZ56jU7uJV%3DRGozbieUnZoA%40mail.gmail.com.

Re: Proposal to not implicitly create varchar/text-pattern opclass indexes on PostgreSQL

Reply via email to