#28477: Strip unused annotations from count queries
-------------------------------------+-------------------------------------
     Reporter:  Tom Forbes           |                    Owner:  Tom
         Type:                       |  Forbes
  Cleanup/optimization               |                   Status:  assigned
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  1
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Reupen Shah):

 I have also run into problems with `QuerySet.count()` being very slow on
 annotated query sets. Django uses a subquery for the count but injects a
 group by into the subquery. This typically causes the database server to
 deduplicate all matched rows using the group by columns which is, in
 general, extremely slow when there are a large number of matched rows.

 For example, consider the following model:

 {{{
 class Person(models.Model):
     """Person model."""

     first_name = models.TextField()
     last_name = models.TextField()
     country = models.TextField(null=True, blank=True)
 }}}

 and query set:

 {{{
 from django.db.models.functions import Concat
 from django.db.models import Value

 queryset = Person.objects.annotate(full_name=Concat('first_name', Value('
 '), 'last_name'))
 }}}

 `queryset.count()` generates the following query under PostgreSQL:

 {{{
 SELECT COUNT(*)
 FROM
   (SELECT "support_person"."id" AS Col1,
           CONCAT("support_person"."first_name", CONCAT(' ',
 "support_person"."last_name")) AS "full_name"
    FROM "support_person"
    GROUP BY "support_person"."id",
             CONCAT("support_person"."first_name", CONCAT(' ',
 "support_person"."last_name"))) subquery
 }}}

 `list(queryset)` generates:

 {{{
 SELECT "support_person"."id",
        "support_person"."first_name",
        "support_person"."last_name",
        "support_person"."country",
        CONCAT("support_person"."first_name", CONCAT(' ',
 "support_person"."last_name")) AS "full_name"
 FROM "support_person"
 }}}


 I am not entirely sure why the subquery for the count needs to be any
 different from the query used when the query set itself is evaluated.
 There are some relevant comments in the source code here:
 
https://github.com/django/django/blob/5deb7a86e8b54d052a3b1dbed1ae7142d362b1c5/django/db/models/sql/query.py#L404-L414

 This has all been tested under Django 2.1.7.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/28477#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/061.ac61c734d7172f23b1353d1018acb01a%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to