Re: Aggregation Updates

2008-06-06 Thread Russell Keith-Magee

On Sat, Jun 7, 2008 at 2:54 AM, Nicolas Lara <[EMAIL PROTECTED]> wrote:
>
>> You are explicitly asking for different output, so the fact that
>> different output happens shouldn't be entirely surprising. And
>> remember - in your proposal, you're getting back different output as
>> well - one version has the magic 'groups' member, one doesn't.
>> However, you can't tell that it's different output until you iterate
>> over the result you get back from the call.
>
> I see your point about the problems of mixed output; it is certainly
> not a desirable flow for working with the data.
> The tuple seems like a solution from the Python point of view, but it
> becomes problematic when it comes to templates. What would be the
> method in templates for accessing the data?
>
> {% for old_authors in authors_by_age %}
>     ...
>     {{ old_authors.0.avg_num_books }}
>
>     {% for same_age in old_authors.1 %}
>         ...
>     {% endfor %}
> {% endfor %}

Well, you could do it that way - or you could use the tuple syntax in
the for loop:

{% for values, authors in authors_by_age %}

> This would not play well, especially with generic views. Of course it is
> always possible to wrap it in a view and add 2 separate variables to
> the context.

This would be unfortunate, but:
1) It's an edge-case limitation - worst case, we could live with it;
2) We could always update generic views to handle the alternate output syntax.

> Though none of these is the end-all solution, I feel comfortable with
> both (+0). For now I am leaving the tuple solution implemented because
> it is already done and it is very easy to change the output to
> something else (like a dict) in the future (also this feature is not a
> priority).

Agreed. Just make sure that you make a note of this change in a
feature list somewhere. When we get to the point of merging this to
trunk, one of the steps will be to write an email to the other core
developers with a summary of the changes. This sort of thing is
something that will definitely need to be blessed by other core
developers before it goes into trunk, and we don't want to
accidentally forget to tell them about it.

Yours,
Russ Magee %-)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: Aggregation Updates

2008-06-06 Thread Nicolas Lara

>
> My point is that the user shouldn't ever need to worry. A list that
> contains multiple versions of similar things is easy to process by
> iteration. A list that contains "similar things except for the one that
> isn't" is not as trivial to process.
>
>
> Bleh. Adding methods to clean up after the mess made by a different
> feature? -1 to that.
>

Agreed.

>> The problem I see with the tuple approach is that, well, you have a
>> tuple instead of a dict. So values would in some cases return a list
>> of dicts and in others a list of tuples, which I find more
>> inconsistent than mixed types inside a dict. Also you would need to
>> handle the tuple just to get to your data.
>
> Well, you're going to have to handle the data no matter what format
> the output takes. The upside of my approach is that you can assign the
> different return types to different variables, and handle them
> differently:
>
> for values, members in Author.objects.values('age','name',groups=True):
>   ...
>
> whereas your approach requires some manipulation of the return type to
> extract the data that doesn't match everything else before you can do
> any sort of iteration over the returned value data.
>
> As for having a single method with two different return types
> depending on context - there are already examples of Django APIs that
> do this. For example, values_list returns a list of tuples, unless you
> provide the flat=True argument, in which case you get a list of
> values.
>
> You are explicitly asking for different output, so the fact that
> different output happens shouldn't be entirely surprising. And
> remember - in your proposal, you're getting back different output as
> well - one version has the magic 'groups' member, one doesn't.
> However, you can't tell that it's different output until you iterate
> over the result you get back from the call.

I see your point about the problems of mixed output; it is certainly
not a desirable flow for working with the data.
The tuple seems like a solution from the Python point of view, but it
becomes problematic when it comes to templates. What would be the
method in templates for accessing the data?

{% for old_authors in authors_by_age %}
    ...
    {{ old_authors.0.avg_num_books }}

    {% for same_age in old_authors.1 %}
        ...
    {% endfor %}
{% endfor %}

This would not play well, especially with generic views. Of course it is
always possible to wrap it in a view and add 2 separate variables to
the context.

The alternative to this would be using a dictionary. So the return
type would be something like:
{'values': {...all the values here... }, 'group' : [...list of the
groupings ...]}

This option seems to accommodate the use of the querysets in
templates better, but complicates the Python case a little:

for element in queryset:
    for name, age in element['values'].items():
        print name, age

I am especially not thrilled about the depth of the data structure
when using a dictionary (a list of dicts that contain another dict and
a queryset), but so far I haven't come up with a better solution.
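
For illustration, here is a plain-Python mock of what consuming that
dict-shaped output could look like. The data and author names are
invented; only the 'values'/'group' layout comes from the proposal above:

```python
# Mocked stand-in for the proposed return shape: a list where each
# element is {'values': {...aggregates...}, 'group': [...members...]}.
queryset = [
    {'values': {'age': 35, 'avg_num_books': 2.5},
     'group': ['Author: Ann', 'Author: Bob']},
    {'values': {'age': 60, 'avg_num_books': 4.0},
     'group': ['Author: Carl']},
]

for element in queryset:
    # the per-group aggregate values sit one level down under 'values'...
    for name, value in sorted(element['values'].items()):
        print(name, value)
    # ...and the grouped members one level down under 'group'
    for member in element['group']:
        print(member)
```

The extra level of nesting is exactly the depth concern raised above: every
consumer has to know which key holds scalars and which holds the queryset.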

Though none of these is the end-all solution, I feel comfortable with
both (+0). For now I am leaving the tuple solution implemented because
it is already done and it is very easy to change the output to
something else (like a dict) in the future (also this feature is not a
priority).

cheers =)


-- 
Nicolas




Re: Aggregation Updates

2008-06-05 Thread Russell Keith-Magee

On Wed, Jun 4, 2008 at 2:23 AM, Nicolas Lara <[EMAIL PROTECTED]> wrote:
>
> On Mon, Jun 2, 2008 at 8:53 AM, Russell Keith-Magee
> <[EMAIL PROTECTED]> wrote:

>> for values in Author.objects.values('name','age'):
>>     for name, age in values.items():
>>         print name, age
>>
>> Simple use cases like this need to remain simple.
>
> And they will. The user would only need to worry about the mixed
> output when he is actually requesting the mixed output.

My point is that the user shouldn't ever need to worry. A list that
contains multiple versions of similar things is easy to process by
iteration. A list that contains "similar things except for the one that
isn't" is not as trivial to process.

> I do think it is better to some extent to have the same type of
> objects in a dict, but I also think in this case it wouldn't be a
> problem since the mixed output would be explicitly requested by
> the user. We could even add a method to ValuesQuerySet to 'clean' the
> output, preventing the queryset from being returned (in case you
> received the queryset from another module and want to handle it
> without knowing what's inside). We could also add a special method to
> return the values with the group queryset and have the default
> iterator exclude the groups.

Bleh. Adding methods to clean up after the mess made by a different
feature? -1 to that.

> The problem I see with the tuple approach is that, well, you have a
> tuple instead of a dict. So values would in some cases return a list
> of dicts and in others a list of tuples, which I find more
> inconsistent than mixed types inside a dict. Also you would need to
> handle the tuple just to get to your data.

Well, you're going to have to handle the data no matter what format
the output takes. The upside of my approach is that you can assign the
different return types to different variables, and handle them
differently:

for values, members in Author.objects.values('age','name',groups=True):
   ...

whereas your approach requires some manipulation of the return type to
extract the data that doesn't match everything else before you can do
any sort of iteration over the returned value data.

As for having a single method with two different return types
depending on context - there are already examples of Django APIs that
do this. For example, values_list returns a list of tuples, unless you
provide the flat=True argument, in which case you get a list of
values.
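
To make the analogy concrete, here is a plain-Python illustration of the
two shapes (mocked lists standing in for what values_list would return;
the names are invented):

```python
# Without flat=True, values_list('name') yields rows as 1-tuples;
# with flat=True it yields the bare values. Mocked with plain lists here.
rows = [('Ann',), ('Bob',), ('Carl',)]   # like values_list('name')
flat = [name for (name,) in rows]        # like values_list('name', flat=True)

print(rows[0])   # ('Ann',)
print(flat)      # ['Ann', 'Bob', 'Carl']
```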

You are explicitly asking for different output, so the fact that
different output happens shouldn't be entirely surprising. And
remember - in your proposal, you're getting back different output as
well - one version has the magic 'groups' member, one doesn't.
However, you can't tell that it's different output until you iterate
over the result you get back from the call.

Yours,
Russ Magee %-)




Re: Aggregation Updates

2008-06-03 Thread Nicolas Lara

On Mon, Jun 2, 2008 at 8:53 AM, Russell Keith-Magee
<[EMAIL PROTECTED]> wrote:
>
> On Sun, Jun 1, 2008 at 9:04 PM, Nicolas Lara <[EMAIL PROTECTED]> wrote:
>>
>> I don't mind changing it to indexes. To me it is more readable with the
>> list copying/slicing. I might do a bit of profiling for that to see
>> the efficiency gain, or simply change it to indexes. I'll write back
>> about this soon.
>
> If we were talking about a complex calculation or difficult logic to
> extract the sublist, I might agree, but in this case, it's mostly a
> case of fields[0:last_field] and fields[first_aggregate:]. I'm not
> convinced that sublists make that any easier to read.
>
> I've noted some rough profiling numbers in another post; those numbers
> would seem to support my conjecture that creating the temporary list
> has a non-trivial cost.
>

I've changed most of the list copies. I am not quite sure it improves
readability, but it does give a small performance boost (in the really
small dataset I used for testing). Whatever readability I lost in the
code, I think I recovered with comments =).

>>> * Be careful about generating efficient SQL, not just correct SQL. In
>>> particular, with the grouping clause: remember that:
>>
>> Same comment as Tim. If we group by pk then we can only SELECT the pk.
>
> As I replied to Tim - my mistake. I've been in MySQL world, and I lost
> track of MySQL's eccentricities.
>
> That said, if there was a way to accommodate this eccentricity, it
> would be nice. Using implicit GROUP BY is a significant performance
> boost under MySQL. I know this would be hard to do - if we can't, it
> isn't a dealbreaker.
>

I would also like this to be done. I haven't looked much into the
backends code but, though it might not be trivial, it shouldn't be too
hard either. I will definitely look into this.

>
>> So far I like the idea of using something like:
>>
>> values('field','field', groups='group_name')
>>
>> So the user can pick the name of the grouping. This way we allow to
>> have more meaningful names for the list of objects. For example:
>>
>> Book.objects.all().values('price',
>> group_as='books_for_price').annotate(mean_age=Avg('authors__age'))
>
> This gets around the potential name clash problem, but it leaves one
> of the bigger issues - the fact that the output becomes mixed. As it
> currently stands, the values dictionary contains key-value pairs which
> are field and field-value. However, the group key returns a queryset
> value. This is a different type of data, with different scope - all
> the fields in a values dictionary have the scope of a single instance,
> except the groups member, which is a collection of instances.
>
> I'm not a big fan of mixing outputs like this. To my mind, if you have
> two different types of output, or two different scopes of output, they
> should be separated somehow (for example, as a tuple of outputs). This
> isn't an entirely academic preference - consider that using your
> proposed output format, it's difficult to do something as simple as:
>
> for values in Author.objects.values('name','age'):
>     for name, age in values.items():
>         print name, age
>
> Simple use cases like this need to remain simple.

And they will. The user would only need to worry about the mixed
output when he is actually requesting the mixed output.

Actually I think you made a mistake in your example, because
values.items() wouldn't return the tuple (name, age) but the tuple
(key, value) from the dict, for which you could simply do:

for values in Author.objects.values('name','age'):
    for key, val in values.items():
        print key, val

It is true that doing the same when you request the grouped objects
would complicate things a little. But chances are that if you asked
for the queryset, you actually need/want it.
I do think it is better to some extent to have the same type of
objects in a dict, but I also think in this case it wouldn't be a
problem since the mixed output would be explicitly requested by the
user. We could even add a method to ValuesQuerySet to 'clean' the
output, preventing the queryset from being returned (in case you
received the queryset from another module and want to handle it
without knowing what's inside). We could also add a special method to
return the values with the group queryset and have the default
iterator exclude the groups.

The problem I see with the tuple approach is that, well, you have a
tuple instead of a dict. So values would in some cases return a list
of dicts and in others a list of tuples, which I find more
inconsistent than mixed types inside a dict. Also you would need to
handle the tuple just to get to your data.

instead of:

for auth in Author.objects.values('name', 'age').annotate():
    print auth['name']

you would need to do:

for auth in Author.objects.values('name', 'age').annotate(...):
    print auth[1]['name']   # or auth[0]


I think any of the options I present above might be very simple 

Re: Aggregation Updates

2008-06-02 Thread Russell Keith-Magee

On Sun, Jun 1, 2008 at 9:04 PM, Nicolas Lara <[EMAIL PROTECTED]> wrote:
>
> I don't mind changing it to indexes. To me it is more readable with the
> list copying/slicing. I might do a bit of profiling for that to see
> the efficiency gain, or simply change it to indexes. I'll write back
> about this soon.

If we were talking about a complex calculation or difficult logic to
extract the sublist, I might agree, but in this case, it's mostly a
case of fields[0:last_field] and fields[first_aggregate:]. I'm not
convinced that sublists make that any easier to read.

I've noted some rough profiling numbers in another post; those numbers
would seem to support my conjecture that creating the temporary list
has a non-trivial cost.

>> * Be careful about generating efficient SQL, not just correct SQL. In
>> particular, with the grouping clause: remember that:
>
> Same comment as Tim. If we group by pk then we can only SELECT the pk.

As I replied to Tim - my mistake. I've been in MySQL world, and I lost
track of MySQL's eccentricities.

That said, if there was a way to accommodate this eccentricity, it
would be nice. Using implicit GROUP BY is a significant performance
boost under MySQL. I know this would be hard to do - if we can't, it
isn't a dealbreaker.

> yes, I am very low on documentation at this point. Also tests need
> some work. I'll fix the comments on the tests, add more regression
> tests and expand the docs.

No problems. I just wanted to make sure that these were on your to-do list.

> So far I like the idea of using something like:
>
> values('field','field', groups='group_name')
>
> So the user can pick the name of the grouping. This way we allow to
> have more meaningful names for the list of objects. For example:
>
> Book.objects.all().values('price',
> group_as='books_for_price').annotate(mean_age=Avg('authors__age'))

This gets around the potential name clash problem, but it leaves one
of the bigger issues - the fact that the output becomes mixed. As it
currently stands, the values dictionary contains key-value pairs which
are field and field-value. However, the group key returns a queryset
value. This is a different type of data, with different scope - all
the fields in a values dictionary have the scope of a single instance,
except the groups member, which is a collection of instances.

I'm not a big fan of mixing outputs like this. To my mind, if you have
two different types of output, or two different scopes of output, they
should be separated somehow (for example, as a tuple of outputs). This
isn't an entirely academic preference - consider that using your
proposed output format, it's difficult to do something as simple as:

for values in Author.objects.values('name','age'):
    for name, age in values.items():
        print name, age

Simple use cases like this need to remain simple.

Yours,
Russ Magee %-)




Re: Aggregation Updates

2008-06-02 Thread [EMAIL PROTECTED]

Sorry 'bout that :). I don't think the why of it ultimately matters;
the code snippet Russell used does the same thing as the SQL-aggregation
stuff does, so the speed difference should hold. Plus I agree it looks
cleaner.

On Jun 2, 1:31 am, Ludvig Ericson <[EMAIL PROTECTED]> wrote:
>
>
>
> > >>> s = 'abc'
> > >>> t = s[:]
> > >>> s is t
> > True
>
> >> I'm willing to be corrected here, but my understanding was that for
> >> loop iteration was one of those optimization cases.
>
> > It looks like it's an immutable-optimization from my further
> > testing (strings being immutable):
>
> >>> a = [1,2,3]  # use a list (mutable)
> >>> b = a[:]
> >>> a is b
> False
>
> >>> a = (1,2,3)  # use a tuple (immutable)
> >>> b = a[:]
> >>> a is b
> True
>
> Meet Python's tuple cache. Tuple cache, Tim Chase. Tim Chase, tuple  
> cache.
>
> So, I should stop pointing out what is caching and what isn't.
>
> As for Russell's test, that could very well be because the earlier  
> case binds to a variable. Use the dis module to see the actual  
> bytecode instructions that those two loops get compiled to, I'd be  
> damned if the latter doesn't skip a LOAD_CONST.
>
> Oh and my name is spelt with a single, simple V. :-)
>
> - - Ludvig



Re: Aggregation Updates

2008-06-01 Thread Ludvig Ericson


>> >>> s = 'abc'
>> >>> t = s[:]
>> >>> s is t
>> True
>>
>> I'm willing to be corrected here, but my understanding was that for
>> loop iteration was one of those optimization cases.
>
> It looks like it's an immutable-optimization from my further
> testing (strings being immutable):
>
> >>> a = [1,2,3]  # use a list (mutable)
> >>> b = a[:]
> >>> a is b
> False
>
> >>> a = (1,2,3)  # use a tuple (immutable)
> >>> b = a[:]
> >>> a is b
> True

Meet Python's tuple cache. Tuple cache, Tim Chase. Tim Chase, tuple  
cache.

So, I should stop pointing out what is caching and what isn't.

As for Russell's test, that could very well be because the earlier  
case binds to a variable. Use the dis module to see the actual  
bytecode instructions that those two loops get compiled to, I'd be  
damned if the latter doesn't skip a LOAD_CONST.
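
As an illustration of that dis check (the function names here are
invented, and the co_varnames check at the end simply confirms that
only one version allocates a local for the slice):

```python
import dis

def with_temp(d):
    s = 0
    sub = d[5:10]        # bind the slice to a name first
    for v in sub:
        s = s + v
    return s

def inline(d):
    s = 0
    for v in d[5:10]:    # slice consumed directly by the for statement
        s = s + v
    return s

dis.dis(with_temp)       # shows the store/load traffic for 'sub'
dis.dis(inline)          # the slice is used straight off the stack

# Quick confirmation: only one version has a local named 'sub' at all.
print('sub' in with_temp.__code__.co_varnames)   # True
print('sub' in inline.__code__.co_varnames)      # False
```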

Oh and my name is spelt with a single, simple V. :-)

- - Ludvig




Re: Aggregation Updates

2008-06-01 Thread Russell Keith-Magee

On Mon, Jun 2, 2008 at 8:54 AM, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:
>
> Ludvig appears correct:

I know he is - like I said in my reply to Ludwig, I picked a bad
example. My point was that slicing doesn't _always_  produce a copy -
it depends on the optimizations available to the bytecode compiler. By
way of demonstration:

d = [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9]
for i in range(1,1000):
    s = 0
    sub = d[5:10]
    for v in sub:
        s = s + v

takes 25 seconds to execute, but

for i in range(1,1000):
    s = 0
    for v in d[5:10]:
        s = s + v

takes 19 seconds.
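
A repeatable version of the same comparison can be sketched with the
timeit module (loop counts are illustrative and absolute numbers will
vary by machine; only the relative difference between the two forms
matters):

```python
import timeit

d = [1,2,3,4,5,6,7,8,9] * 3

def with_temp():
    s = 0
    sub = d[5:10]        # temporary binding for the slice
    for v in sub:
        s = s + v
    return s

def inline():
    s = 0
    for v in d[5:10]:    # slice consumed directly by the loop
        s = s + v
    return s

# Both compute the same sum; only the code around the slice differs.
print(timeit.timeit(with_temp, number=100000))
print(timeit.timeit(inline, number=100000))
```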

Yours
Russ Magee %-)




Re: Aggregation Updates

2008-06-01 Thread Russell Keith-Magee

On Mon, Jun 2, 2008 at 8:48 AM, Ludvig Ericson <[EMAIL PROTECTED]> wrote:
>
> What you're seeing is Python's str cache. Similarly:

(I knew that example would come back to bite me...)

Yes, this particular example is due to Python's string handling. My
point was that there are cases where a slice doesn't result in a copy.
This wasn't the best example, but it was the best example I could
think of at short notice.

Yours,
Russ Magee %-)




Re: Aggregation Updates

2008-06-01 Thread Tim Chase

> To my understanding, a slice isn't automatically a copy - it will be
> in most cases, but there are cases where the bytecode compiler will
> use the original list as an optimization. One example:
> 
>  >>> s = 'abc'
>  >>> t = s[:]
>  >>> s is t
>  True
>
> I'm willing to be corrected here, but my understanding was that for
> loop iteration was one of those optimization cases.

It looks like it's an immutable-optimization from my further 
testing (strings being immutable):

 >>> a = [1,2,3] # use a list (mutable)
 >>> b = a[:]
 >>> a is b
False

 >>> a = (1,2,3)  # use a tuple (immutable)
 >>> b = a[:]
 >>> a is b
True

> necessary for code clarity - I would contend that:
> 
> for x in initial_list[a:b]:
>    do stuff
> 
> is easier to read than:
> 
> sub_list = initial_list[a:b]
> for x in sub_list:
>    do stuff

in most cases, heartily agreed, so no contention here.

>> Yes, it would be nice if this worked because it's logically
>> true...except that it doesn't work (at least in Postgres and
>> MS-SQL Server):
> 
> Aw...crap. You are, of course, completely correct. I've been spending
> too much time in MySQL of late, and I forgot that this was an
> eccentricity of MySQL.

It's just a shame it's not readily available in other SQL 
flavors, as it bugs me every time I need to add umpteen GROUP BY 
entries just so I can get some values for the SELECT clause.

-tim






Re: Aggregation Updates

2008-06-01 Thread [EMAIL PROTECTED]

Ludvig appears correct:

In [1]: s = ['a', 'b', 'c']
In [2]: t = s[:]
In [3]: s is t
Out[3]: False
In [4]: id(t)
Out[4]: 138735596
In [5]: id(s)
Out[5]: 138740172


On Jun 1, 7:48 pm, Ludvig Ericson <[EMAIL PROTECTED]> wrote:
>
> > To my understanding, a slice isn't automatically a copy - it will be
> > in most cases, but there are cases where the bytecode compiler will
> > use the original list as an optimization. One example:
>
> >>> s = 'abc'
> >>> t = s[:]
> >>> s is t
> True
> >>> id(s)
> 3081872000L
> >>> id(t)
> 3081872000L
>
> What you're seeing is Python's str cache. Similarly:
>  >>> v = "abc"
>  >>> import copy
>  >>> copy.copy(v) is v
> True
>
> (Excuse the probable formatting issues in the quote, Mail thinks your
> Python code is a quote.)
>
> Ludvig "toxik" Ericson
> [EMAIL PROTECTED]



Re: Aggregation Updates

2008-06-01 Thread Ludvig Ericson

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> To my understanding, a slice isn't automatically a copy - it will be
> in most cases, but there are cases where the bytecode compiler will
> use the original list as an optimization. One example:
>
> >>> s = 'abc'
> >>> t = s[:]
> >>> s is t
> True
> >>> id(s)
> 3081872000L
> >>> id(t)
> 3081872000L

What you're seeing is Python's str cache. Similarly:
 >>> v = "abc"
 >>> import copy
 >>> copy.copy(v) is v
True

(Excuse the probable formatting issues in the quote, Mail thinks your
Python code is a quote.)

Ludvig "toxik" Ericson
[EMAIL PROTECTED]




Re: Aggregation Updates

2008-06-01 Thread Russell Keith-Magee

On Sun, Jun 1, 2008 at 8:20 PM, Tim Chase
<[EMAIL PROTECTED]> wrote:
>
>  > * On a similar note, I can see a lot of places where you
>  > seem to be making copies of lists (or sublists) - this is an
>  > expensive operation if you do it a lot, especially since
>  > most cases you could get the same effect by keeping the
>  > index values and slicing the original list as required.
>  > Handling for row[:aggregate_start] seems to be the worst
>  > culprit here.
>
> My understanding is that slicing a list produces a new list,
> so I'm not sure this is a real gain.  Each object in the
> list is only held once (unless copy.deepcopy() is called)
> and the new (sub)list merely contains references to those
> object-ids, like any other python variable.

To my understanding, a slice isn't automatically a copy - it will be
in most cases, but there are cases where the bytecode compiler will
use the original list as an optimization. One example:

>>> s = 'abc'
>>> t = s[:]
>>> s is t
True
>>> id(s)
3081872000L
>>> id(t)
3081872000L

I'm willing to be corrected here, but my understanding was that for
loop iteration was one of those optimization cases.

Either way, part of my original comment was driving at cleaning up the
implementation code. I'm not convinced that having lots of variable
names representing relatively simple slices of an original list is
necessary for code clarity - I would contend that:

for x in initial_list[a:b]:
    do stuff

is easier to read than:

sub_list = initial_list[a:b]
for x in sub_list:
    do stuff

If the former case is an optimization, that makes the argument even
more convincing, but even if it isn't, I would argue that the cleanup
would be worthwhile.

> Yes, it would be nice if this worked because it's logically
> true...except that it doesn't work (at least in Postgres and
> MS-SQL Server):

Aw...crap. You are, of course, completely correct. I've been spending
too much time in MySQL of late, and I forgot that this was an
eccentricity of MySQL. Thanks for slapping me upside the head when I
needed it :-)

Yours,
Russ Magee %-)




Re: Aggregation Updates

2008-06-01 Thread Craig Ogg

On Sun, Jun 1, 2008 at 4:39 AM, Russell Keith-Magee
<[EMAIL PROTECTED]> wrote:
> SELECT author.id, author.name, author.age, COUNT(book.id)
>  FROM author INNER JOIN book ON author.id=book.author_id
>  GROUP BY author.id, author.name, author.age;
>
> is the same as
>
> SELECT author.id, author.name, author.age, COUNT(book.id)
>  FROM author INNER JOIN book ON author.id=book.author_id
>  GROUP BY author.id;
>

This is a MySQL only thing:

   http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html

Craig




Re: Aggregation Updates

2008-06-01 Thread Nicolas Lara

hello,

On Sun, Jun 1, 2008 at 7:39 AM, Russell Keith-Magee
<[EMAIL PROTECTED]> wrote:
>
> On Sun, Jun 1, 2008 at 4:02 AM, Nicolas E. Lara G.
> <[EMAIL PROTECTED]> wrote:
>>
>> Hello,
>> Today I've committed what could be called the first working version of
>> aggregate support. For those of you not keeping track of the project,
>> it can be found at: http://code.google.com/p/django-aggregation/
>
> Hi Nicolas,
>
> Looking good! Here's a few comments/review notes. Most are fairly
> minor stylistic things, but there's one big note at the end.
>

Thanks =)

> * There are a few occasions where you iterate over some data to create
> a dictionary, then use update to bulk insert into another dictionary.
> One example is the collation of *args with aliases for insertion into
> **kwargs; it also happens a few times in iterators, i.e.,
> rather than:
>
> newargs = {}
> for arg in args:
>    newargs[arg.alias] = arg
> kwargs.update(newargs)
>
> just use:
>
> for arg in args:
>    kwargs[arg.alias] = arg
>
> The second version runs faster and is easier to read, IMHO.
>

True. I'll change that.

> * On a similar note, I can see a lot of places where you seem to be
> making copies of lists (or sublists) - this is an expensive operation if you
> do it a lot, especially since in most cases you could get the same effect
> by keeping the index values and slicing the original list as required.
> Handling for row[:aggregate_start] seems to be the worst culprit here.
>

I don't mind changing it to indexes. To me it is more readable with the
list copying/slicing. I might do a bit of profiling to see the
efficiency gain, or simply change it to indexes. I'll write back
about this soon.

> * I noticed that when you use map(), you have comments about using
> list comprehensions as an alternative. I'd be inclined to simply use
> the list comprehension. List comprehension syntax is supported in
> Python 2.3, and we know map will be deprecated, so we might as well
> future proof ourselves from the outset.

Yes, that's probably /me not accepting the fact that map, filter and
reduce are going away (especially reduce!). I'll change them to list
comprehensions.

> * Be careful about generating efficient SQL, not just correct SQL. In
> particular, with the grouping clause: remember that:
>
> SELECT author.id, author.name, author.age, COUNT(book.id)
>  FROM author INNER JOIN book ON author.id=book.author_id
>  GROUP BY author.id, author.name, author.age;
>
> is the same as
>
> SELECT author.id, author.name, author.age, COUNT(book.id)
>  FROM author INNER JOIN book ON author.id=book.author_id
>  GROUP BY author.id;
>
> except that the latter will be much faster because group uniqueness is
> easier to compute.

Same comment as Tim. If we group by pk then we can only SELECT the pk.

>
> * The modeltests are starting to look pretty good; however, some of
> the comments need a little work to clarify exactly what is being
> returned. For example: "Average Author age" - is that the average of
all authors, or an average age computed somehow for each author? You
> don't need to go overboard, but being a little bit more verbose in
> these descriptions wouldn't be a bad thing - after all, this is
> documentation.
>
> * The description for COUNT DISTINCT shouldn't go into code detail -
> it should be describing what distinctness means in that context.
>
> * The tests in the regression suite need explanatory notes for the
> cases that they are testing
>


Yes, I am very light on documentation at this point. The tests also need
some work. I'll fix the comments on the tests, add more regression
tests, and expand the docs.


> * Now, the big one: values()['group']. This is an interesting idea,
> and I can see that it could be potentially useful, but the
> implementation you propose seems a little dangerous to me. As it
> stands, there's no way to distinguish a field 'group' from the
> queryset group, and I couldn't find any validation to prevent the user
> from naming a model field 'group' (which would be a pretty big
> restriction).
>
> Off the top of my head, I can think of two alternative syntaxes:
>
> 1) values('field','field', groups=True)
>
> The idea here is that using groups=True would modify values() to
> return a list of tuples, rather than a list of dictionaries. Each
> tuple would be a value 'row'; the first element is the usual values
> dictionary, the second element is the group members queryset.
>
> By using the kwarg, you remove the ambiguity on the input; by using
> the tuple you remove the ambiguity on output.
>
> 2) A ValueSet subclass - for want of a better name: GroupedValueSet,
> accessed via groups('field,'field')
>
> Essentially, this is the same idea as (1), but using a subclass rather
> than an argument to values().

Yes, I've been thinking about this because I knew that adding a taboo
word (and one as common as 'group') was going to be an issue.
So far I like the idea of using something like:
values('field','field', groups='group_n

Re: Aggregation Updates

2008-06-01 Thread Tim Chase

 > * On a similar note, I can see a lot of places where you
 > seem to be making copies of lists (or sublists) - this is an
 > expensive operation if you do it a lot, especially since in
 > most cases you could get the same effect by keeping the
 > index values and slicing the original list as required.
 > Handling for row[:aggregate_start] seems to be the worst
 > culprit here.

My understanding is that slicing a list produces a new list,
so I'm not sure this is a real gain.  Each object in the
list is only held once (unless copy.deepcopy() is called)
and the new (sub)list merely contains references to those
object-ids, like any other Python variable.
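Tim's point is easy to check in isolation; a minimal plain-Python sketch
(not code from the patch):

```python
import copy

# A "row" of five distinct objects, standing in for query results.
row = [{"value": i} for i in range(5)]

sub = row[:3]              # slicing allocates a new list object...
assert sub is not row
assert sub[0] is row[0]    # ...but its elements are shared references

sub[0]["value"] = 99       # mutating a shared element is visible via both
assert row[0]["value"] == 99

deep = copy.deepcopy(row)  # only deepcopy duplicates the elements too
assert deep[0] is not row[0]
```

So a slice copies only the array of references, which is cheap relative to
copying the objects themselves, though still an O(k) allocation per slice.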

 > * Be careful about generating efficient SQL, not just
 > correct SQL. In particular, with the grouping clause:
 > remember that:
 >
 > SELECT author.id, author.name, author.age, COUNT(book.id)
 >   FROM author INNER JOIN book ON author.id=book.author_id
 >   GROUP BY author.id, author.name, author.age;
 >
 > is the same as
 >
 > SELECT author.id, author.name, author.age, COUNT(book.id)
 >   FROM author INNER JOIN book ON author.id=book.author_id
 >   GROUP BY author.id;
 >
 > except that the latter will be much faster because group 
uniqueness is
 > easier to compute.

Yes, it would be nice if this worked because it's logically
true...except that it doesn't work (at least in Postgres and
MS-SQL Server):

   -- issued in psql:
   SELECT s.id, s.statement_dt, count(sli.id)
   FROM mt_statement s
 INNER JOIN mt_statementlineitem sli
 ON sli.statement_id = s.id group by s.id;

   ERROR:  column "s.statement_dt" must appear in the
   GROUP BY clause or be used in an aggregate function

(doing a "SELECT Version()" returns "PostgreSQL 8.1.11 on
i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115
(prerelease) (Debian 4.1.1-21)")

even though s.id is the primary key for mt_statement.

I don't have a quick test DB in MySQL or sqlite to test on
hand at the moment.  Perhaps one of them is smarter?

-tim




Re: Aggregation Updates

2008-06-01 Thread Russell Keith-Magee

On Sun, Jun 1, 2008 at 4:02 AM, Nicolas E. Lara G.
<[EMAIL PROTECTED]> wrote:
>
> Hello,
> Today I've committed what could be called the first working version of
> aggregate support. For those of you not keeping track of the project,
> it can be found at: http://code.google.com/p/django-aggregation/

Hi Nicolas,

Looking good! Here's a few comments/review notes. Most are fairly
minor stylistic things, but there's one big note at the end.

* There are a few occasions where you iterate over some data to create
a dictionary, then use update to bulk insert into another dictionary.
One example is the collation of *args with aliases for insertion into
**kwargs; it also happens a few times in iterators, i.e.,
rather than:

newargs = {}
for arg in args:
    newargs[arg.alias] = arg
kwargs.update(newargs)

just use:

for arg in args:
    kwargs[arg.alias] = arg

The second version runs faster and is easier to read, IMHO.
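As a self-contained illustration (the Agg class here is a hypothetical
stand-in for the patch's aggregate objects), both patterns yield the same
dictionary; the second just skips the intermediate:

```python
class Agg:
    # Hypothetical stand-in for an aggregate object carrying an alias.
    def __init__(self, alias):
        self.alias = alias

args = [Agg("avg_age"), Agg("num_books")]
kwargs1, kwargs2 = {}, {}

# Version 1: build an intermediate dict, then bulk-update.
newargs = {}
for arg in args:
    newargs[arg.alias] = arg
kwargs1.update(newargs)

# Version 2: insert directly -- one dict fewer, same result.
for arg in args:
    kwargs2[arg.alias] = arg

assert kwargs1 == kwargs2
assert set(kwargs2) == {"avg_age", "num_books"}
```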

* On a similar note, I can see a lot of places where you seem to be
making copies of lists (or sublists) - this is an expensive operation if you
do it a lot, especially since in most cases you could get the same effect
by keeping the index values and slicing the original list as required.
Handling for row[:aggregate_start] seems to be the worst culprit here.

* I noticed that when you use map(), you have comments about using
list comprehensions as an alternative. I'd be inclined to simply use
the list comprehension. List comprehension syntax is supported in
Python 2.3, and we know map will be deprecated, so we might as well
future proof ourselves from the outset.

* Be careful about generating efficient SQL, not just correct SQL. In
particular, with the grouping clause: remember that:

SELECT author.id, author.name, author.age, COUNT(book.id)
  FROM author INNER JOIN book ON author.id=book.author_id
  GROUP BY author.id, author.name, author.age;

is the same as

SELECT author.id, author.name, author.age, COUNT(book.id)
  FROM author INNER JOIN book ON author.id=book.author_id
  GROUP BY author.id;

except that the latter will be much faster because group uniqueness is
easier to compute.

* The modeltests are starting to look pretty good; however, some of
the comments need a little work to clarify exactly what is being
returned. For example: "Average Author age" - is that the average of
all authors, or an average age computed somehow for each author? You
don't need to go overboard, but being a little bit more verbose in
these descriptions wouldn't be a bad thing - after all, this is
documentation.

* The description for COUNT DISTINCT shouldn't go into code detail -
it should be describing what distinctness means in that context.

* The tests in the regression suite need explanatory notes for the
cases that they are testing

* Now, the big one: values()['group']. This is an interesting idea,
and I can see that it could be potentially useful, but the
implementation you propose seems a little dangerous to me. As it
stands, there's no way to distinguish a field 'group' from the
queryset group, and I couldn't find any validation to prevent the user
from naming a model field 'group' (which would be a pretty big
restriction).

Off the top of my head, I can think of two alternative syntaxes:

1) values('field','field', groups=True)

The idea here is that using groups=True would modify values() to
return a list of tuples, rather than a list of dictionaries. Each
tuple would be a value 'row'; the first element is the usual values
dictionary, the second element is the group members queryset.

By using the kwarg, you remove the ambiguity on the input; by using
the tuple you remove the ambiguity on output.
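To make the shapes concrete, here is a plain-Python mock of the two outputs
(lists stand in for querysets; the field names are invented for
illustration):

```python
# Current proposal: the group members hide inside the values dict,
# so a model field actually named 'group' would collide with it.
ambiguous_row = {"price": 29.691, "oldest": 37.0,
                 "group": ["book_a", "book_b"]}

# Proposal (1): values(..., groups=True) would instead return
# (values_dict, group_queryset) tuples -- no collision possible.
unambiguous_row = ({"price": 29.691, "oldest": 37.0},
                   ["book_a", "book_b"])

values, group = unambiguous_row
assert "group" not in values          # field namespace stays clean
assert group == ["book_a", "book_b"]
```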

2) A ValueSet subclass - for want of a better name: GroupedValueSet,
accessed via groups('field', 'field')

Essentially, this is the same idea as (1), but using a subclass rather
than an argument to values().

I'm open to other suggestions, however.

Russ %-)




Aggregation Updates

2008-05-31 Thread Nicolas E. Lara G.

Hello,
Today I've committed what could be called the first working version of
aggregate support. For those of you not keeping track of the project,
it can be found at: http://code.google.com/p/django-aggregation/

Some words on the status of the project.

Working
---
 * Currently both annotate() and aggregate() queryset modifiers are
available.
 * aggregate() is a terminal clause and returns a dictionary.
 * annotate() returns a queryset. By default objects are grouped by
all the fields in the model.
 * Aggregate objects for the main SQL aggregation functions are
available.
 * Count aggregate object takes an extra parameter 'distinct' that
translates into the sql: COUNT(DISTINCT field)
 * It is possible to define custom grouping by calling
values('fields', 'to', 'group', 'by') before doing an annotate() call
 * Aliases for the aggregations are generated automatically when not
explicitly provided.
 * Filtering on aggregated values is possible. It results in a HAVING
SQL clause. There are some interesting cases of filtering that can be
found in the doctests.
 * Related fields can now be referred to implicitly (Count('book')
instead of Count('book__id'))

Not Working (yet)
---
 * The way of creating/extending aggregate functions is very basic.
More work is needed to make it easy to create new functions or modify
the existing ones. (there is no syntax or API defined for that yet)
 * There is, still, no way of expressing subqueries or aggregating on
annotated fields.

Not Tested
---
 * Model Inheritance

Extra
---
 I don't want to close this message without mentioning a new feature
that has not been discussed in the mailing list. Now when doing an
annotate call on a ValuesQuerySet (i.e. grouping on only some fields
of the model) there is an extra parameter for every object called
"group" that contains a QuerySet for retrieving the objects that have
been grouped to do the aggregation.
So:
>>> books = Book.objects.values('price').annotate(oldest=Max('authors__age'))
>>> books[0].get('price')
29.691
>>> books[0].get('group')
[<Book: ...>, <Book: ...>]
>>> books
[
 {'price': 29.691, 'oldest': 37.0, 'group': [<Book: ...>, <Book: ...>]},
 {'price': 75.0, 'oldest': 57.0, 'group': [<Book: ...>]},
 {'price': 82.797, 'oldest': 57.0, 'group': [<Book: ...>]},
 {'price': 23.09, 'oldest': 45.0, 'group': [<Book: ...>]},
 {'price': 30.0, 'oldest': 35.0, 'group': [<Book: ...>]}
]
(Yes, the querysets are lazily evaluated)
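Conceptually, the 'group' entry is a GROUP BY that also keeps the grouped
rows around; a pure-Python analogue using itertools.groupby (plain dicts
stand in for model instances -- this is not the ORM implementation):

```python
from itertools import groupby
from operator import itemgetter

books = [
    {"title": "A", "price": 30.0, "author_age": 35},
    {"title": "B", "price": 30.0, "author_age": 28},
    {"title": "C", "price": 75.0, "author_age": 57},
]

# Group on 'price' (the values() fields), annotate each group with the
# max author age, and keep the members -- mirroring the 'group' key.
rows = []
for price, members in groupby(sorted(books, key=itemgetter("price")),
                              key=itemgetter("price")):
    members = list(members)
    rows.append({"price": price,
                 "oldest": max(m["author_age"] for m in members),
                 "group": members})

assert rows[0] == {"price": 30.0, "oldest": 35,
                   "group": [books[0], books[1]]}
```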

I hope you all like this feature. I know I do.

Now I am going to spend some time documenting the current state before
moving on to the rest of the issues/features. If any of you can take a
look at it and play around to see if it fits your needs, that would be
great. Also, if you have any problems, complaints, feature requests,
suggestions, or discussions to be had on the mailing list... please let
me know =)

Cheers,

Nicolas Lara