#6422: Support for 'DISTINCT ON' queries with QuerySet.distinct()
-------------------------------------+-------------------------------------
     Reporter:  Manfred Wassmann     |                    Owner:  jgelens
  <manolo@…>                         |                   Status:  assigned
         Type:  New feature          |                  Version:  SVN
    Component:  Database layer       |               Resolution:
  (models, ORM)                      |             Triage Stage:  Accepted
     Severity:  Normal               |      Needs documentation:  0
     Keywords:  dceu2011             |  Patch needs improvement:  0
    Has patch:  1                    |                    UI/UX:  0
  Needs tests:  0                    |
Easy pickings:  0                    |
-------------------------------------+-------------------------------------

Comment (by akaariai):

 Assume you have models:
 {{{
 class A1(models.Model):
     foo = models.TextField()

 class B1(models.Model):
     a = models.ForeignKey(A1)
 }}}

 And you do:
 {{{
 qs1 = B1.objects.distinct('a__id', 'a__foo')
 print qs1.query
 qs2 = qs1.distinct('pk')
 print qs2.query
 }}}
 Now you will have a join to A even though you don't have a reference to
 it. The queries generated:

 {{{
 SELECT DISTINCT ON ("obj_creation_speed_a1"."id",
 "obj_creation_speed_a1"."foo") "obj_creation_speed_b1"."id",
 "obj_creation_speed_b1"."a_id" FROM "obj_creation_speed_b1" INNER JOIN
 "obj_creation_speed_a1" ON ("obj_creation_speed_b1"."a_id" =
 "obj_creation_speed_a1"."id")

 SELECT DISTINCT ON ("obj_creation_speed_b1"."id")
 "obj_creation_speed_b1"."id", "obj_creation_speed_b1"."a_id" FROM
 "obj_creation_speed_b1" INNER JOIN "obj_creation_speed_a1" ON
 ("obj_creation_speed_b1"."a_id" = "obj_creation_speed_a1"."id")
 }}}
 Note that in the second query, there is a join into A even if no field of
 A is used in the query. The join should be removed. But I don't think this
 is a blocker issue. If you want to fix this, there are at least these two
 choices:
    - Apply the distinct on in the compiler when the query is generated,
 order by is handle this way IIRC.
    - Keep record of which aliases are referenced by the DISTINCT ON added
 fields, and subtract the refcount by one for those aliases if the distinct
 on is changed.

 As said before, in my opinion this is not a must-fix in this ticket.

 Now, if you continue from the above created qs1 and issue
 .annotate(Max('a__id')), you will get this query:
 {{{
 qs3 = qs1.annotate(Max('a__id'))
 print qs3.query
 list(qs3)
 }}}

 The query generated is this:
 {{{
 SELECT DISTINCT ON ("obj_creation_speed_a1"."id",
 "obj_creation_speed_a1"."foo") "obj_creation_speed_b1"."id",
 "obj_creation_speed_b1"."a_id", MAX("obj_creation_speed_b1"."a_id") AS
 "a__id__max" FROM "obj_creation_speed_b1" INNER JOIN
 "obj_creation_speed_a1" ON ("obj_creation_speed_b1"."a_id" =
 "obj_creation_speed_a1"."id") GROUP BY "obj_creation_speed_b1"."id",
 "obj_creation_speed_b1"."a_id"
 }}}

 And the list(qs3) will generate this:
 {{{
 django.db.utils.DatabaseError: column "obj_creation_speed_a1.id" must
 appear in the GROUP BY clause or be used in an aggregate function
 LINE 1: SELECT DISTINCT ON ("obj_creation_speed_a1"."id", "obj_creat...
                             ^
 }}}

 You could "fix" this by appending the DISTINCT ON columns into the group
 by clause. Unfortunately (without going into details) the results aren't
 what the user would expect. If you want the results the user expects, you
 would need to do a subquery. If the distinct on is in the inner query or
 group by is in the inner query depends on the order in which the .annotate
 and .distinct are applied. A good fix for this would be to raise
 NotImplementedError("Can't handle queries with aggregates and distinct on
 clause"). I have written some SQL by hand, and can't remember ever using
 distinct on and group by in the same query. Maybe there is some use case,
 but I just don't see the need to block on this issue, either.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/6422#comment:59>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to django-updates@googlegroups.com.
To unsubscribe from this group, send email to 
django-updates+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.

Reply via email to