#6422: Support for 'DISTINCT ON' queries with QuerySet.distinct()
-------------------------------------+-------------------------------------
     Reporter:  Manfred Wassmann     |                    Owner:  jgelens
  <manolo@…>                         |                   Status:  assigned
         Type:  New feature          |                  Version:  SVN
    Component:  Database layer       |               Resolution:
  (models, ORM)                      |             Triage Stage:  Accepted
     Severity:  Normal               |      Needs documentation:  0
     Keywords:  dceu2011             |  Patch needs improvement:  0
    Has patch:  1                    |                    UI/UX:  0
  Needs tests:  0                    |
Easy pickings:  0                    |
-------------------------------------+-------------------------------------
Changes (by akaariai):

 * cc: anssi.kaariainen@… (added)


Comment:

 Sorry, I still found one more problem, and some smaller issues:
   - The distinct() in postgresql_psycopg2/operations.py doesn't take
 aliases into account. It assumes that the distinct is always for the main
 table alias. If you first do a filter on m2m relation, then a distinct on
 for the m2m relation you get:
 {{{
 # I added coworkers = models.ManyToManyField('self') to Staff model used
 in tests
 >>> print
 Staff.objects.distinct('coworkers__name').order_by('coworkers__name').query
 SELECT DISTINCT ON ("queries_staff"."name")
      "queries_staff"."id", "queries_staff"."name",
 "queries_staff"."organisation", T3."name"
 FROM "queries_staff"
 LEFT OUTER JOIN "queries_staff_coworkers"
      ON ("queries_staff"."id" = "queries_staff_coworkers"."from_staff_id")
 LEFT OUTER JOIN "queries_staff" T3
      ON ("queries_staff_coworkers"."to_staff_id" = T3."id")
 ORDER BY T3."name" ASC
 }}}
 If you execute that query, it will error, because distinct on is for
 "queries_staff"."name", but order by is for T3."name"

   - In the attached patch there is a different approach of join cleanup.
 It also fixes one other long-standing expected_failure test in queries.
 The failure is related to similar "lingering joins" done by ordering.
   - The `obj.query.distinct_fields = []` in query.distinct() is not
 needed. The variable will be set anyways in
 sql.query.add_distinct_fields(). obj.query.distinct = True can also be set
 in add_distinct_fields()
   - Should the joins be generated as OUTER or INNER joins? Not sure about
 this. I guess either way will be fine. The attached patch generates OUTER
 JOINS (because it uses similar code than the order by join creation).
   - Prevent different distinct_fields lists when combining querysets.

 I attached a patch which fixes these issues. It isn't the cleanest one
 (non-DRY, code copy from order_by). But it can be used as a basis for a
 cleaner patch if needed.

 I did also check out how to support aggregates with distinct_fields.
 However I had a hard time figuring out a realistic test case, so I removed
 that part. So, jgelens is absolutely correct here: better to first find an
 use case :)

 All in all, the only major issue is the correct alias usage in
 compiler.py. The rest is just nitpicking. In my opinion the alias usage
 needs to be fixed before commit, but I may be over-complicating things
 again... Other than that I think this feature is now very solid. I might
 be fine to just let the distinct() generated joins linger in there after
 all. That is what order_by has done for ages.

 The attached patch passes all tests, with one long-standing expected
 failure now gone. As said the patch is just to show one possible approach
 to the alias issue. I hope jgelens will check it out and see what to do to
 the above mentioned issues (if anything).

-- 
Ticket URL: <https://code.djangoproject.com/ticket/6422#comment:61>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to django-updates@googlegroups.com.
To unsubscribe from this group, send email to 
django-updates+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.

Reply via email to