#19415: Clarify how aggregates work with multi-valued relationships and multiple filter() calls --------------------------------------+------------------------------------ Reporter: svniemeijer@… | Owner: Type: Cleanup/optimization | Status: new Component: Documentation | Version: master Severity: Normal | Resolution: Keywords: | Triage Stage: Accepted Has patch: 1 | Needs documentation: 0 Needs tests: 0 | Patch needs improvement: 1 Easy pickings: 0 | UI/UX: 0 --------------------------------------+------------------------------------
Comment (by benred42): I did some more digging on this issue as I was working on editing my Doc patch and I learned a bit more about what's going on. Consider the following test case: {{{ class Author(models.Model): name = models.CharField(max_length=10) class Book(models.Model): title = models.TextField() pages = models.IntegerField(null=True) rating = models.IntegerField(null=True) authors = models.ManyToManyField(Author) a1 = Author.objects.create(name='a1') a2 = Author.objects.create(name='a2') a3 = Author.objects.create(name='a3') a4 = Author.objects.create(name='a4') b1 = Book.objects.create(title='b1', pages=100, rating=4) b1.authors.set([a1]) b2 = Book.objects.create(title='b2', pages=50, rating=2) b2.authors.set([a2]) b3 = Book.objects.create(title='b3', pages=150, rating=3) b3.authors.set([a4]) b4 = Book.objects.create(title='b4', pages=400, rating=5) b4.authors.set([a4]) b5 = Book.objects.create(title='b5', pages=100, rating=1) b5.authors.set([a1,a4]) }}} Now, let's look at two queries on that test case: {{{ qs1 = Author.objects.filter(book__pages__lt=200, book__pages__gt=100).annotate(num_book=Count('book')) qs2 = Author.objects.filter(book__pages__lt=200).filter(book__pages__gt=100).annotate(num_book=Count('book')) }}} The first query gives us the expected results: {{{ for i in qs1: print(i, i.num_book) >>> a4, 1 print(qs1.query) >>> SELECT "queries_author"."id", "queries_author"."name", "queries_author"."num", "queries_author"."extra_id", COUNT("queries_book_authors"."book_id") AS "num_book" FROM "queries_author" INNER JOIN "queries_book_authors" ON ("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN "queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id") WHERE ("queries_book"."pages" < 200 AND "queries_book"."pages" > 100) GROUP BY "queries_author"."id", "queries_author"."name", "queries_author"."num", "queries_author"."extra_id" ORDER BY "queries_author"."name" ASC }}} The second query, however, gives some rather strange results (author4 only has 3 books, not 4!): {{{ for i in qs2: print(i, i.num_book) >>> a4, 4 print(qs2.query) >>> SELECT "queries_author"."id", "queries_author"."name", "queries_author"."num", "queries_author"."extra_id", COUNT("queries_book_authors"."book_id") AS "num_book" FROM "queries_author" INNER JOIN "queries_book_authors" ON ("queries_author"."id" = "queries_book_authors"."author_id") INNER JOIN "queries_book" ON ("queries_book_authors"."book_id" = "queries_book"."id") INNER JOIN "queries_book_authors" T4 ON ("queries_author"."id" = T4."author_id") INNER JOIN "queries_book" T5 ON (T4."book_id" = T5."id") WHERE ("queries_book"."pages" < 200 AND T5."pages" > 100) GROUP BY "queries_author"."id", "queries_author"."name", "queries_author"."num", "queries_author"."extra_id" ORDER BY "queries_author"."name" ASC }}} So, the problem appears to be that `Count` is only looking at results from `queries_book_authors` and cannot see results from the aliased tables (`T4` and `T5`), thus it only counts from the first joins and not the latter joins. This results, rather bizarrely, in the annotated value being the ''product'' of the number of results meeting the first condition (`"queries_book"."pages" < 200`) times the number of results meeting the second condition (`T5."pages" > 100`). This result is neither intuitive nor helpful, so I would suggest that, going forward, the documentation should provide an example of the first query and a warning not to annotate queries with multiple filters as in the second query since the results are not what would be expected or useful. NOTE: I have not yet tested this case with just a ForeignKey relationship, so as of now the above statements can only be applied when filtering across M2M relationships. -- Ticket URL: <https://code.djangoproject.com/ticket/19415#comment:16> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To post to this group, send email to django-updates@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/077.b5c612b506b2bc3811e611c74453495f%40djangoproject.com. For more options, visit https://groups.google.com/d/optout.