Re: Aggregation Updates

Nicolas Lara Tue, 03 Jun 2008 11:23:19 -0700

On Mon, Jun 2, 2008 at 8:53 AM, Russell Keith-Magee
<[EMAIL PROTECTED]> wrote:
>
> On Sun, Jun 1, 2008 at 9:04 PM, Nicolas Lara <[EMAIL PROTECTED]> wrote:
>>
>> I dont mind changing it to indexes. To me it is more readable with the
>> list copying/slicing. I might do a bit of profiling for that to see
>> the efficiency gain or simple change it to indexes. I'll write back
>> about this soon.
>
> If we were talking about a complex calculation or difficult logic to
> extract the sublist, I might agree, but in this case, it's mostly a
> case of fields[0:last_field] and fields[first_aggregate:]. I'm not
> convinced that sublists make that any easier to read.
>
> I've noted some rough profiling numbers in another post; those numbers
> would seem to support my conjecture that creating the temporary list
> has a non-trivial cost.
>


I've changed most of the list copies. I am quite not sure it improves
readability but it does have a small performance boost (in the really
small dataset I used for testing). The readability I think I lost in
the code I recovered with comments =).

>>> * Be careful about generating efficient SQL, not just correct SQL. In
>>> particular, with the grouping clause: remember that:
>>
>> Same comment as Tim. If we group by pk then we can only SELECT the pk.
>
> As I replied to Tim - my mistake. I've been in MySQL world, and I lost
> track of MySQL's eccentricities.
>
> That said, if there was a way to accommodate this eccentricity, it
> would be nice. Using implicit GROUP BY is a significant performance
> boost under MySQL. I know this would be hard to do - if we can't, it
> isn't a dealbreaker.
>

I would also like this to be done. I haven't looked much into the
backends code but, though it might not be trivial, it shouldn't be too
hard either. I will definitely look into this.

>
>> So far I like the idea of using something like:
>>
>> values('field','field', groups='group_name')
>>
>> So the user can pick the name of the grouping. This way we allow to
>> have more meaningful names for the list of objects. For example:
>>
>> Book.objects.all().values('price',
>> group_as='books_for_price').annotate(mean_age=Avg('authors__age'))
>
> This gets around the potential name clash problem, but it leaves one
> of the bigger issues - the fact that the output become mixed. As it
> currently stands, the values dictionary contains key-value pairs which
> are field and field-value. However, the group key returns a queryset
> value. This is a different type of data, with different scope - all
> the fields in a values dictionary have the scope of a single instance,
> except the groups member, which is a collection of instances.
>
> I'm not a big fan of mixing outputs like this. To my mind, if you have
> two different types of output, or two different scopes of output, they
> should be separated somehow (for example, as a tuple of outputs). This
> isn't an entirely academic preference - consider that using your
> proposed output format, it's difficult to do something as simple as:
>
> for values in Author.objects.values('name','age'):
>   for name,age in values.items():
>      print name, age
>
> Simple use cases like this need to remain simple.

And they will. The user would only need to worry about the mixed
output when he is actually requesting the mixed output.

Actually I think you made a mistake in your example, because
values.items() wouldn't return the tuple (name, age) but the tuple
(key, value) from the dict. for which you could simply do:

 for values in Author.objects.values('name','age'):
   for key,val in values.items():
      print key, val

It is true that for doing the same when you request the grouped
objects you would need to complicate stuff a little bit. But chances
are that if you requested to get the queryset you would actually
need/want it.
I do think it is better to some extent to have the same type of
objects in a dict, but I also think in this case it wouldn't be a
problem since the the mixed output would be explicitly requested by
the user. We could even add a method to ValuesQuerySet to 'clean' the
output avoiding the queryset from being returned (in case you received
the queryset from some another module and want to handle it without
knowing what's inside). we could even add a special method to return
the values with the group queryset and have the default iterator
exclude the groups.

The problem I see with the tuple approach is that, well, you have a
tuple instead of a dict. So values would in some cases return a list
of dicts and in others a list of tuples, which I found more
inconsistent than mixed types inside a dict. Also you would need to
handle the tuple just to get to your data.

instead of:
for auth in Author.objects.values('name', 'age').annotate(....):
   print auth['name']

you would need to do:
for auth in Author.objects.values('name', 'age').annotate(...):
   print auth[1]['name']   #or auth[0]


I think any of the options I present above might be very simple and
straight forward.

What do you think?

Cheers,


-- 
Nicolas

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Aggregation Updates

Reply via email to