Re: QuerySet.distinct and aggregates

Russell Keith-Magee Sun, 09 May 2010 17:43:49 -0700

On Thu, May 6, 2010 at 7:50 PM, zinckiwi <zinck...@gmail.com> wrote:

>> I was hoping for an elegant solution for when I know there's going to
>> be several thousand plus instances (or hundreds of thousands after a
>> year or two of deployment). So in the end, I guess I can work with it,
>> but mostly I was hoping to find out whether the whole respecting
>> distinct() thing could be made to happen.
>
> Yeah, that's a tricky use case no matter what. I don't think there's a
> way out of *something* doing some heavy lifting, whether it be python
> or the db. But as I've seen mentioned many times (often by
> Malcolm...where's a Malcolm bat-signal when you need one? I'd love to
> hear his thoughts on this) the ORM isn't designed and can't be
> expected to handle 100% of cases. This may indeed be a situation where
> a simple extra() actually *is* the elegant solution.


You're correct that the design intent of the ORM isn't to be a 100%
replacement for SQL. If you ever find yourself in a situation where
you're asking the question "How would I represent this SQL query in
the ORM?" or "Will the ORM interpret this query in a particular way?",
then you've almost certainly reached the point where you should be
using a raw query.

As for using extra() - my advice there is to be careful. It's great
for adding a single extra select column (e.g., for a date-based
aggregate that you can't express using Django's own query language).
However, many people seem to view extra() as a crowbar you can use to
bend the ORM into outputting the SQL you want. If you're in that sort
of situation, you should be using a raw query, not extra(). Django 1.2
makes raw queries much easier specifically to avoid problems like
this.

>> Dunno if it's worthwhile filing a ticket for it, or if there is one.
>> Or maybe just a doc ticket for mentioning explicitly that aggregate
>> ignores distinct.

"ignores" isn't really the right way of looking at it - the behavior
of DISTINCT and GROUP BY are very closely related. Consider:

SELECT DISTINCT table.id, table.name, table.size
    FROM table
    INNER JOIN other ON other.id = table.join_id

is essentially the same as:

SELECT table.id, table.name, table.size
    FROM table
    INNER JOIN other ON other.id=table.join_id
    GROUP BY table.id, table.name, table.size

That is; by grouping on columns in the base table, you effectively get
distinctness on those rows. The GROUP BY clause has the effect of
collapsing duplicates of the result set that have occurred as a result
of joining into a different table.

Given that Django's aggregate clauses handle all the GROUP BY columns
transparently, this is one of those areas of conceptual leakage where
you need to be aware of how SQL handles certain queries.

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.

Re: QuerySet.distinct and aggregates

Reply via email to