#24385: Sum() doesn't seem to respect .distinct() in the queryset filter before
.aggregate() when filtering by m2m.
-------------------------------------+-------------------------------------
     Reporter:  mcagl                |                    Owner:  nobody
         Type:  Uncategorized        |                   Status:  new
    Component:  Database layer       |                  Version:  1.7
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by mcagl):

 It seems similar, or at least somewhat related.

 I did a little more inspecting, doing this in {{{./manage.py shell}}} to
 obtain the executed raw SQL statements:

 {{{
 import logging
 l = logging.getLogger('django.db.backends')
 l.setLevel(logging.DEBUG)
 l.addHandler(logging.StreamHandler())
 import datetime
 from decimal import Decimal
 import my_app.models as app_models
 t0 = app_models.Tag.objects.create(name="Tag0")
 t1 = app_models.Tag.objects.create(name="Tag1")
 t2 = app_models.Tag.objects.create(name="Tag2")
 t3 = app_models.Tag.objects.create(name="Tag3")
 app_models.Row.objects.create(day=datetime.date(2014, 1, 1),
 amount=Decimal('10.00'))
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 2),
 amount=Decimal('10.00'))
 r.tags.add(t0)
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 3),
 amount=Decimal('10.00'))
 r.tags.add(t1)
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 4),
 amount=Decimal('10.00'))
 r.tags.add(t0, t1)
 r = app_models.Row.objects.create(day=datetime.date(2014, 1, 5),
 amount=Decimal('10.00'))
 r.tags.add(t2, t3)
 }}}

 so mainly adding a DEBUG level logger for SQL queries, and recreating the
 objects in the setUp() method of the TestCase in the github repository.
 Then I tried the queries, obtaining this output, first without
 {{{distinct()}}}:
 {{{
 rows = app_models.Row.objects.filter(tags__in=[t0, t1]
 list(rows)
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT "my_app_row"."id", "my_app_row"."amount",
 "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
 "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
 "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
 >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>, <Row: Row
 object>]
 }}}

 {{{
 agg = rows.aggregate(Sum('amount'))
 print agg['amount__sum']
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT SUM("my_app_row"."amount") AS "amount__sum"
 FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" =
 "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)'
 - PARAMS = (1, 2); args=(1, 2)
 >>> 40.00
 }}}

 Then I added {{{distinct()}}}, which gets added as expected (and the rows
 queryset gets 3 objects instead of 4, also expected):

 {{{
 rows = app_models.Row.objects.filter(tags__in=[t0, t1]).distinct()
 list(rows)
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT DISTINCT "my_app_row"."id",
 "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN
 "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" )
 WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1,
 2)
 >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>]
 }}}

 but somewhat fails when doint aggregation in this version of the queryset,
 because the {{{DISTINCT}}} is added before {{{SUM}}}, whereas I expect
 that {{{distinct()}}} gets executed before doing the aggregation, as it's
 the python code I wrote:

 {{{
 agg = rows.aggregate(Sum('amount'))
 print agg['amount__sum']
 }}}

 {{{
 >>> (0.000) QUERY = u'SELECT DISTINCT SUM("my_app_row"."amount") AS
 "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON (
 "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE
 "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
 >>> 40.00
 }}}

 And obviously {{{agg['amount__sum']}}} contains an unexpected value,
 because at an higher level it's like the Row object with two tags is
 counted twice, completely disregarding the {{{distinct()}}} method I
 called before aggregating.


 Thanks for the quick response, I hope that this can be useful,
 Mark

--
Ticket URL: <https://code.djangoproject.com/ticket/24385#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/063.b522644ff0cac109cdf89b40581b1ae1%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to