#6422: Support for 'DISTINCT ON' queries with QuerySet.distinct() -------------------------------------+------------------------------------- Reporter: Manfred Wassmann | Owner: jgelens <manolo@…> | Status: assigned Type: New feature | Version: SVN Component: Database layer | Resolution: (models, ORM) | Triage Stage: Accepted Severity: Normal | Needs documentation: 0 Keywords: dceu2011 | Patch needs improvement: 0 Has patch: 1 | UI/UX: 0 Needs tests: 0 | Easy pickings: 0 | -------------------------------------+------------------------------------- Changes (by akaariai):
* cc: anssi.kaariainen@… (added) Comment: Sorry, I still found one more problem, and some smaller issues: - The distinct() in postgresql_psycopg2/operations.py doesn't take aliases into account. It assumes that the distinct is always for the main table alias. If you first do a filter on m2m relation, then a distinct on for the m2m relation you get: {{{ # I added coworkers = models.ManyToManyField('self') to Staff model used in tests >>> print Staff.objects.distinct('coworkers__name').order_by('coworkers__name').query SELECT DISTINCT ON ("queries_staff"."name") "queries_staff"."id", "queries_staff"."name", "queries_staff"."organisation", T3."name" FROM "queries_staff" LEFT OUTER JOIN "queries_staff_coworkers" ON ("queries_staff"."id" = "queries_staff_coworkers"."from_staff_id") LEFT OUTER JOIN "queries_staff" T3 ON ("queries_staff_coworkers"."to_staff_id" = T3."id") ORDER BY T3."name" ASC }}} If you execute that query, it will error, because distinct on is for "queries_staff"."name", but order by is for T3."name" - In the attached patch there is a different approach of join cleanup. It also fixes one other long-standing expected_failure test in queries. The failure is related to similar "lingering joins" done by ordering. - The `obj.query.distinct_fields = []` in query.distinct() is not needed. The variable will be set anyways in sql.query.add_distinct_fields(). obj.query.distinct = True can also be set in add_distinct_fields() - Should the joins be generated as OUTER or INNER joins? Not sure about this. I guess either way will be fine. The attached patch generates OUTER JOINS (because it uses similar code than the order by join creation). - Prevent different distinct_fields lists when combining querysets. I attached a patch which fixes these issues. It isn't the cleanest one (non-DRY, code copy from order_by). But it can be used as a basis for a cleaner patch if needed. I did also check out how to support aggregates with distinct_fields. However I had a hard time figuring out a realistic test case, so I removed that part. So, jgelens is absolutely correct here: better to first find an use case :) All in all, the only major issue is the correct alias usage in compiler.py. The rest is just nitpicking. In my opinion the alias usage needs to be fixed before commit, but I may be over-complicating things again... Other than that I think this feature is now very solid. I might be fine to just let the distinct() generated joins linger in there after all. That is what order_by has done for ages. The attached patch passes all tests, with one long-standing expected failure now gone. As said the patch is just to show one possible approach to the alias issue. I hope jgelens will check it out and see what to do to the above mentioned issues (if anything). -- Ticket URL: <https://code.djangoproject.com/ticket/6422#comment:61> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To post to this group, send email to django-updates@googlegroups.com. To unsubscribe from this group, send email to django-updates+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-updates?hl=en.