#19415: Clarify how aggregates work with multi-valued relationships and multiple
filter() calls
--------------------------------------+------------------------------------
     Reporter:  svniemeijer@…         |                    Owner:
         Type:  Cleanup/optimization  |                   Status:  new
    Component:  Documentation         |                  Version:  master
     Severity:  Normal                |               Resolution:
     Keywords:                        |             Triage Stage:  Accepted
    Has patch:  1                     |      Needs documentation:  0
  Needs tests:  0                     |  Patch needs improvement:  1
Easy pickings:  0                     |                    UI/UX:  0
--------------------------------------+------------------------------------

Comment (by benred42):

 I did some more digging on this issue as I was working on editing my Doc
 patch and I learned a bit more about what's going on.  Consider the
 following test case:

 {{{
 class Author(models.Model):
     name = models.CharField(max_length=10)

 class Book(models.Model):
     title = models.TextField()
     pages = models.IntegerField(null=True)
     rating = models.IntegerField(null=True)
     authors = models.ManyToManyField(Author)

 a1 = Author.objects.create(name='a1')
 a2 = Author.objects.create(name='a2')
 a3 = Author.objects.create(name='a3')
 a4 = Author.objects.create(name='a4')

 b1 = Book.objects.create(title='b1', pages=100, rating=4)
 b1.authors.set([a1])
 b2 = Book.objects.create(title='b2', pages=50, rating=2)
 b2.authors.set([a2])
 b3 = Book.objects.create(title='b3', pages=150, rating=3)
 b3.authors.set([a4])
 b4 = Book.objects.create(title='b4', pages=400, rating=5)
 b4.authors.set([a4])
 b5 = Book.objects.create(title='b5', pages=100, rating=1)
 b5.authors.set([a1,a4])
 }}}

 Now, let's look at two queries on that test case:

 {{{
 qs1 = Author.objects.filter(book__pages__lt=200,
 book__pages__gt=100).annotate(num_book=Count('book'))

 qs2 =
 
Author.objects.filter(book__pages__lt=200).filter(book__pages__gt=100).annotate(num_book=Count('book'))
 }}}

 The first query gives us the expected results:

 {{{
 for i in qs1:
     print(i, i.num_book)
 >>> a4, 1
 print(qs1.query)
 >>> SELECT "queries_author"."id", "queries_author"."name",
 "queries_author"."num", "queries_author"."extra_id",
 COUNT("queries_book_authors"."book_id") AS "num_book" FROM
 "queries_author" INNER JOIN "queries_book_authors" ON
 ("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN
 "queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id")
 WHERE ("queries_book"."pages" < 200 AND "queries_book"."pages" > 100)
 GROUP BY "queries_author"."id", "queries_author"."name",
 "queries_author"."num", "queries_author"."extra_id" ORDER BY
 "queries_author"."name" ASC
 }}}

 The second query, however, gives some rather strange results (author4 only
 has 3 books, not 4!):

 {{{
 for i in qs2:
     print(i, i.num_book)
 >>> a4, 4
 print(qs2.query)
 >>> SELECT "queries_author"."id", "queries_author"."name",
 "queries_author"."num", "queries_author"."extra_id",
 COUNT("queries_book_authors"."book_id") AS "num_book" FROM
 "queries_author" INNER JOIN "queries_book_authors" ON
 ("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN
 "queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id")
 INNER JOIN "queries_book_authors" T4 ON ("queries_author"."id" =
 T4."author_id") INNER JOIN "queries_book" T5 ON (T4."book_id" = T5."id")
 WHERE ("queries_book"."pages" < 200 AND T5."pages" > 100) GROUP BY
 "queries_author"."id", "queries_author"."name", "queries_author"."num",
 "queries_author"."extra_id" ORDER BY "queries_author"."name" ASC
 }}}

 So, the problem appears to be that `Count` is only looking at results from
 `queries_book_authors` and cannot see results from the aliased tables
 (`T4` and `T5`), thus it only counts from the first joins and not the
 latter joins.  This results, rather bizarrely, in the annotated value
 being the ''product'' of the number of results meeting the first condition
 (`"queries_book"."pages" < 200`) times the number of results meeting the
 second condition (`T5."pages" > 100`).  This result is neither intuitive
 nor helpful, so I would suggest that, going forward, the documentation
 should provide an example of the first query and a warning not to annotate
 queries with multiple filters as in the second query since the results are
 not what would be expected or useful.

 NOTE: I have not yet tested this case with just a ForeignKey relationship,
 so as of now the above statements can only be applied when filtering
 across M2M relationships.

--
Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:16>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/077.b5c612b506b2bc3811e611c74453495f%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to