Re: Aggregate class: a first-attempt

Honza Král Fri, 23 Feb 2007 07:12:57 -0800

On 2/23/07, Tim Chase <[EMAIL PROTECTED]> wrote:
>
> >>>>>> quseryset = Model.objects.all()
> >>>>>> queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
> >>> 'some_other_field' ), avg=( 'pay', 'age' ), count=True )
> >> I like this calling interface as an alternate method for its
> >> fine-tuned control, but there are times it would be nice to not
> >> have to have such tight coupling between the model and the
> >> aggregation.  Otherwise, a change in the model requires not only
> >> a change in the template to make use of the results, but in the
> >> view (to request various aggregate functions) to add the desired
> >> aggregation functions.
> >
> > yes, and that is how it should be in my opinion. Why have the database
> > calculate stuff you don't want?
>
> My line of reasoning for allowing it to default to everything
> would be the same reason that you can simply call .all() and get
> all the fields rather than calling something crazy like
>
> .all(model.field1, model.field3, model.field7)


this is something completely different - it is easier for the db to
just take all the fields

>
> to just selectively bring back the three fields that you "know"
> you want.  It gives the template designer access to any of the
> fields without needing to resort to involving the coder to make
> changes to the view.  Any good database will not require a
> table/hash-scan for each aggregate, but will gather the aggregate
> stats in a single pass, so the cost of doing N fields vs doing
> N+1 fields is barely noticeable.  And since N is bounded by the
> number of fields in the model, this isn't apt to grow out of control.

actually this is not true - if you wish to do just one MIN(), it will
only use index, no full table scan, myISAM can even use index for
COUNT() and so on....

even if it wasn't noticeable at all, I would still be against it,
because its plain wrong, it wastes resources and allows developers not
to think of what they are doing.

>
> >> As long as the resulting data can be easily accessed directly by
> >> a template, any number of resulting formats would work just fine.
> >
> > good point, the dictionary one wouldn't work very well in templates
>
> Unless you have some brainstorm for making dictionaries work
> nicely in templates, which is another bugaboo I see come up on
> the ML occasionally :)
>
> >> and then within the template refer to
> >>
> >>         <table>
> >>         {% for item in items %}
> >>                 <tr><td>{{ item.pay }}</td></tr>
> >>         {% endfor %}
> >>                 <tr><td>{{ items.aggregate.sum.pay }}</td></tr>
> >>         </table>
> >
> > this just seems wrong - doing something like this in a template just
> > doesn't feel right, you should calculate the aggregation in the view -
> > its an expensive operation you should always think through
>
> The results should be cached so that if an aggregate is needed by
> the template-designer (as per above), the cost happens once with
> one additional DB hit.  I would be horrified if
>
>         <tr>
>         <td>{{ items.aggregate.sum.pay }}</td>
>         <td>{{ items.aggregate.sum.hours }}</td>
>         </tr>
>
> triggered two database hits. :)  But if the aggregate stats are
> needed, they should be generated on demand, just like .all()
> where no DB call actually occurs until its data is used in things
> like a {% for x in data %} call.  And if it's called, the DB is
> only accessed once to bring back the data (okay, glossing over
> fetchmany() calls here)

yes, sure, but you are talking two different things here... I say that
it seems wrong to initiate an aggregation from template. Besides, try
doing similar with current querysets - it won't get cached because
every call to filter(), order_by() etc will produce a NEW queryset
with no connection to the old one, so even if you would actually run
the latter, the former will have no access to the data.

>
>
> >> items = Model.objects.all()
> >> return render_to_response('test.html', {
> >>         'items': items,
> >>         'stats': items.aggregate(),
> >>         })
> >
> > I think this should be:
> >
> > items = Model.objects.all()
> > return render_to_response('test.html', {
> >         'stats_by_owner': items.aggregate( ('owner',), count=True,
> > sum=('visits',) max=('rating', ), min=('rating', 'visits') ),
> >         })
>
> As mentioned, I like both syntaxes. :)  They seem to return
> somewhat different things though.  The .aggregate() call would
> only ever return one row for the entire dataset.  The
> parameterized version would return multiple rows based on the
> presence of a column-name tuple as the first parameter.
> Otherwise, code starts getting unwieldy when you actually do want
> all your fields.  I have a model with 20+ fields in it for
> various types of money (FloatField) and time-durations
> (PositiveIntegerField) involved in transactions (you've gotta
> love billing information for cell-phone usage).  Without the
> ability to say "sum and average everything you can", the code
> would start looking like
>
> items.aggregate((,), sum=(
>         'field1',
>         'field2',
> ...
>         'field20',
>         'field21',
>         ), average=(
>         'field1',
>         'field2',
> ...
>         'field20',
>         'field21',
>         ))

well, in this extreme example, I would suggest you use a list:
fields = [ f.name for f in items.model._meta.fields ]
items.aggregate( sum=fields, average=fields, min=fields, max=fields )

not that bad, is it?

>
> which makes for one very ugly view.  Even with a folding editor
> like Vim, it's still there.
>
> > {% for stat in stats_by_owner %}
> >   Owner {{ stat.owner }} has written {{ stat.count }} articles wit
> > ratings between {{ stat.min.rating }} and {{ stat.max.rating }}. His
> > work was read {{ stat.sum.visits }} times in total and his least
> > popular article was read {{ stat.min.visits }} times.
> > {% endfor %}
> >
> > the problem here is that if this should make any sense, owner should
> > be an object, not just a field value (user id in this case), I believe
> > though that this can be done...
>
> I agree...when you have the leading tuple, this is a lovely
> syntax.  It does have some rough edges where one's model
> foolishly has fields such as "min" or "max" and suddenly you've
> got a clash of ontologies.

true, but only if you would want to aggregate by those fields, we
could work around that  by simply moving the grouping fields to a
separate dictionary as well:

[
  {
    'grouped_by' : { 'owner' : XX },
    'min' : { 'pay' : 100 },
    'max' : { 'pay' : 101},
  },
  .....
]

>
>
> > the first argument should also be optional - in case you want the total
>
> Makes sense.
>
> >> I think it will be a valuable addition as evaluation can be made
> >> lazy (thus, it has no great impact on existing code), and I've
> >> seen it requested several times in the Django-Users ML archives
> >> where the answer was "Django doesn't do that, but you can drop to
> >> hand-rolled SQL to do it for you."  I think it would help the
> >> folks that don't think in SQL.
> >
> > or the guys that don't want to rely on one DB engine
>
> Do any of the supported DB engines not support groupings?  I
> thought at least that much was pretty standard.  But yes,
> requiring a developer to learn a second language (even if it's as
> useful as knowing SQL is...) goes against the grain of Django's
> "get out of the developer's way and let them just be productive
> in Python yet do crazy-powerful stuff" philosophy.

every sql implementation I know of support group by, the problem is
with other things like quoting etc.

>
> Thanks again for your feedback and ideas.
>
> -tkc
>
>
>
>
> >
>


-- 
Honza Kr�l
E-Mail: [EMAIL PROTECTED]
ICQ#:   107471613
Phone:  +420 606 678585

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Aggregate class: a first-attempt

Reply via email to