Re: Aggregate class: a first-attempt

Tim Chase Fri, 23 Feb 2007 04:04:13 -0800

>>>>>> quseryset = Model.objects.all()
>>>>>> queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
>>> 'some_other_field' ), avg=( 'pay', 'age' ), count=True )
>> I like this calling interface as an alternate method for its
>> fine-tuned control, but there are times it would be nice to not
>> have to have such tight coupling between the model and the
>> aggregation.  Otherwise, a change in the model requires not only
>> a change in the template to make use of the results, but in the
>> view (to request various aggregate functions) to add the desired
>> aggregation functions.
> 
> yes, and that is how it should be in my opinion. Why have the database
> calculate stuff you don't want?


My line of reasoning for allowing it to default to everything
would be the same reason that you can simply call .all() and get
all the fields rather than calling something crazy like

.all(model.field1, model.field3, model.field7)

to just selectively bring back the three fields that you "know"
you want.  It gives the template designer access to any of the
fields without needing to resort to involving the coder to make
changes to the view.  Any good database will not require a
table/hash-scan for each aggregate, but will gather the aggregate
stats in a single pass, so the cost of doing N fields vs doing
N+1 fields is barely noticeable.  And since N is bounded by the
number of fields in the model, this isn't apt to grow out of control.

>> As long as the resulting data can be easily accessed directly by
>> a template, any number of resulting formats would work just fine.
> 
> good point, the dictionary one wouldn't work very well in templates

Unless you have some brainstorm for making dictionaries work
nicely in templates, which is another bugaboo I see come up on
the ML occasionally :)

>> and then within the template refer to
>>
>>         <table>
>>         {% for item in items %}
>>                 <tr><td>{{ item.pay }}</td></tr>
>>         {% endfor %}
>>                 <tr><td>{{ items.aggregate.sum.pay }}</td></tr>
>>         </table>
> 
> this just seems wrong - doing something like this in a template just
> doesn't feel right, you should calculate the aggregation in the view -
> its an expensive operation you should always think through

The results should be cached so that if an aggregate is needed by
the template-designer (as per above), the cost happens once with
one additional DB hit.  I would be horrified if

        <tr>
        <td>{{ items.aggregate.sum.pay }}</td>
        <td>{{ items.aggregate.sum.hours }}</td>
        </tr>

triggered two database hits. :)  But if the aggregate stats are
needed, they should be generated on demand, just like .all()
where no DB call actually occurs until its data is used in things
like a {% for x in data %} call.  And if it's called, the DB is
only accessed once to bring back the data (okay, glossing over
fetchmany() calls here)


>> items = Model.objects.all()
>> return render_to_response('test.html', {
>>         'items': items,
>>         'stats': items.aggregate(),
>>         })
> 
> I think this should be:
> 
> items = Model.objects.all()
> return render_to_response('test.html', {
>         'stats_by_owner': items.aggregate( ('owner',), count=True,
> sum=('visits',) max=('rating', ), min=('rating', 'visits') ),
>         })

As mentioned, I like both syntaxes. :)  They seem to return
somewhat different things though.  The .aggregate() call would
only ever return one row for the entire dataset.  The
parameterized version would return multiple rows based on the
presence of a column-name tuple as the first parameter.
Otherwise, code starts getting unwieldy when you actually do want
all your fields.  I have a model with 20+ fields in it for
various types of money (FloatField) and time-durations
(PositiveIntegerField) involved in transactions (you've gotta
love billing information for cell-phone usage).  Without the
ability to say "sum and average everything you can", the code
would start looking like

items.aggregate((,), sum=(
        'field1',
        'field2',
...
        'field20',
        'field21',
        ), average=(
        'field1',
        'field2',
...
        'field20',
        'field21',
        ))

which makes for one very ugly view.  Even with a folding editor
like Vim, it's still there.

> {% for stat in stats_by_owner %}
>   Owner {{ stat.owner }} has written {{ stat.count }} articles wit
> ratings between {{ stat.min.rating }} and {{ stat.max.rating }}. His
> work was read {{ stat.sum.visits }} times in total and his least
> popular article was read {{ stat.min.visits }} times.
> {% endfor %}
> 
> the problem here is that if this should make any sense, owner should
> be an object, not just a field value (user id in this case), I believe
> though that this can be done...

I agree...when you have the leading tuple, this is a lovely
syntax.  It does have some rough edges where one's model
foolishly has fields such as "min" or "max" and suddenly you've
got a clash of ontologies.


> the first argument should also be optional - in case you want the total

Makes sense.

>> I think it will be a valuable addition as evaluation can be made
>> lazy (thus, it has no great impact on existing code), and I've
>> seen it requested several times in the Django-Users ML archives
>> where the answer was "Django doesn't do that, but you can drop to
>> hand-rolled SQL to do it for you."  I think it would help the
>> folks that don't think in SQL.
> 
> or the guys that don't want to rely on one DB engine

Do any of the supported DB engines not support groupings?  I
thought at least that much was pretty standard.  But yes,
requiring a developer to learn a second language (even if it's as
useful as knowing SQL is...) goes against the grain of Django's
"get out of the developer's way and let them just be productive
in Python yet do crazy-powerful stuff" philosophy.

Thanks again for your feedback and ideas.

-tkc




--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Aggregate class: a first-attempt

Reply via email to