Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-12-04 Thread Rock


>
> > Talking through my hat?
>
> No, I think not -- I think that syntax (``queryset.groupby(field).max()``)
> actually looks like the best proposal for aggregates I've seen thus far...
>

Sounds pretty good to me. Besides the usual min, max and such, I also
like:
queryset.groupby(field).stats()
which would return a tuple with (min, max, average, stddev) for the
specified field.
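As a rough illustration of what such a `stats()` call might compute, here is a minimal in-memory sketch. It assumes plain dicts as rows and a hypothetical `group_stats()` helper (not any Django API), and it uses the population standard deviation; a sample stddev would be an equally reasonable choice.

```python
import statistics
from itertools import groupby
from operator import itemgetter


def group_stats(rows, group_field, value_field):
    """Hypothetical in-memory version of queryset.groupby(field).stats().

    Returns {group_key: (min, max, average, stddev)} for value_field.
    Assumes rows are plain dicts and uses the population standard
    deviation (statistics.pstdev); a sample stddev is equally plausible.
    """
    result = {}
    # itertools.groupby only merges *adjacent* equal keys, so sort first.
    ordered = sorted(rows, key=itemgetter(group_field))
    for key, group in groupby(ordered, key=itemgetter(group_field)):
        values = [row[value_field] for row in group]
        result[key] = (
            min(values),
            max(values),
            statistics.mean(values),
            statistics.pstdev(values),
        )
    return result


rows = [
    {"sensor": "a", "reading": 1.0},
    {"sensor": "b", "reading": 10.0},
    {"sensor": "a", "reading": 3.0},
]
summary = group_stats(rows, "sensor", "reading")
```

A database-backed version would presumably compute all four values in one query rather than pulling every row into Python.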

> I'm taking this to django-dev for more discussion; it'll get seen by more of
> the right people there.
>
> Thoughts, anyone?
>

Add "improving aggregate support" to the Django Sprint planning for
PyCon. I plan to participate and am willing to coordinate a team to do
that. Hopefully that will encourage people to spend some time on the
design ahead of time. I also promise to spend a day or two over the
holidays looking over the design proposals and adding my thoughts.


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-12-04 Thread Jacob Kaplan-Moss

On 12/4/06 5:57 AM, John Lenton wrote:
> The "max", "min" and other such functions might be a little more
> problematic, unless groupby returned, rather than a generic iterator,
> a special "queryset group" and give _it_ the max/min/etc methods. This
> way it would be clear that max() returns a tuple (value, queryset) (to
> me, at least...). Also, ...groupby('foo').max() would return the same
> result as max(...groupby('foo')), but less efficiently.
> 
> Talking through my hat?

No, I think not -- I think that syntax (``queryset.groupby(field).max()``) 
actually looks like the best proposal for aggregates I've seen thus far...

I'm taking this to django-dev for more discussion; it'll get seen by more of the 
right people there.

Thoughts, anyone?

Jacob




Re: Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-12-04 Thread John Lenton

On 12/1/06, Russell Keith-Magee <[EMAIL PROTECTED]> wrote:
>
> One way to think about the problem is to consider how you would write
> the documentation for it. "Django implements an object based SQL
> wrapper... except for the aggregations stuff, which you will need to
> know SQL to use properly". If the documentation sounds like it will be
> ugly, so is the implementation :-)
>
> So; lots to think about, but don't let that discourage you. As this
> thread has shown, there is plenty of interest in having aggregates -
> the discussion will probably be long, but if we can get something
> productive out of it, Django will be all the better for it.

Personally, I think that the "group by" functionality isn't a problem;
if you look at how itertools.groupby works, it would be both easy and
natural (i.e. Pythonic) to give querysets a groupby function with
similar semantics and laziness.
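For reference, this is roughly how itertools.groupby behaves on objects: it yields `(key, group)` pairs lazily, but only merges adjacent items, so the input has to be ordered by the key first. The `Article` class and `group_titles()` helper below are hypothetical stand-ins, not Django code; note that `sorted()` itself is eager, whereas a queryset version would presumably push the ordering into the database.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class Article:  # hypothetical stand-in for a model instance
    title: str
    author: str


def group_titles(objects, field):
    """Yield (key, [titles]) pairs, one per distinct value of `field`."""
    # groupby only merges adjacent items, so order by the key first; a
    # queryset version would presumably push this into an ORDER BY.
    ordered = sorted(objects, key=attrgetter(field))
    for key, group in groupby(ordered, key=attrgetter(field)):
        # Each group shares the underlying iterator, so consume it before
        # groupby advances to the next key.
        yield key, [obj.title for obj in group]


articles = [Article("A", "alice"), Article("B", "bob"), Article("C", "alice")]
pairs = group_titles(articles, "author")  # generator: nothing evaluated yet
```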

The "max", "min" and other such functions might be a little more
problematic, unless groupby returned, rather than a generic iterator,
a special "queryset group" and give _it_ the max/min/etc methods. This
way it would be clear that max() returns a tuple (value, queryset) (to
me, at least...). Also, ...groupby('foo').max() would return the same
result as max(...groupby('foo')), but less efficiently.
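A minimal in-memory sketch of the "queryset group" idea, assuming a hypothetical `QuerysetGroup` class (nothing like it exists in Django): iterating it yields `(key, objects)` pairs, and its `max()` returns a `(value, objects)` tuple, which matches what the builtin `max()` over the pairs would give. One side effect worth noting is that ties are handled naturally, since every object sharing the maximum key lands in the same group.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class Article:  # hypothetical stand-in for a model instance
    title: str
    pagecount: int


class QuerysetGroup:
    """Hypothetical "queryset group": iterates like itertools.groupby, but
    also offers max()/min() returning a (value, objects) tuple."""

    def __init__(self, objects, field):
        self._field = field
        self._objects = sorted(objects, key=attrgetter(field))

    def __iter__(self):
        for key, group in groupby(self._objects, key=attrgetter(self._field)):
            yield key, list(group)

    def max(self):
        # Objects are sorted by key, so the last group holds the maximum.
        # A database-backed version could ask for just that group instead
        # of building every group as this sketch does.
        groups = list(self)
        return groups[-1] if groups else None

    def min(self):
        groups = list(self)
        return groups[0] if groups else None


articles = [Article("A", 10), Article("B", 5), Article("C", 10)]
by_pages = QuerysetGroup(articles, "pagecount")
value, longest = by_pages.max()  # value == 10; longest holds articles A and C
```

Here `max(by_pages)` and `by_pages.max()` return the same tuple; the point of the method form is that a real implementation could let the database find the top group instead of materializing all of them.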

Talking through my hat?

-- 
John Lenton ([EMAIL PROTECTED]) -- Random fortune:
The trouble with a lot of self-made men is that they worship their creator.




Re: Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-12-01 Thread Russell Keith-Magee

On 12/2/06, Rob Hudson <[EMAIL PROTECTED]> wrote:
>
> But isn't it also dangerous to code (or not code) for future cases that
> may not ever come?  If a non-relational database backend isn't
> anywhere on the current horizon, why not code aggregates and groups to
> the current usage and break BC when they arrive, possibly at the Django
> 2.0 transition?  Just a devil's advocate thought.

Agreed; YAGNI is a valid concern. However, the Django ORM has gone to
such lengths to keep itself clean and object-based that breaking the
metaphor for one feature would be a great shame.

One way to think about the problem is to consider how you would write
the documentation for it. "Django implements an object based SQL
wrapper... except for the aggregations stuff, which you will need to
know SQL to use properly". If the documentation sounds like it will be
ugly, so is the implementation :-)

So; lots to think about, but don't let that discourage you. As this
thread has shown, there is plenty of interest in having aggregates -
the discussion will probably be long, but if we can get something
productive out of it, Django will be all the better for it.

Yours,
Russ Magee %-)




Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-12-01 Thread Jacob Kaplan-Moss

On 12/1/06 11:40 AM, Rob Hudson wrote:
>> 4 - If you search the archives (user and developer), you will find several
>> discussions on aggregate functions. group_by() and having() (or
>> pre-magic-removal analogs thereof) have been rejected previously on the
>> grounds that the Django ORM is not intended to be 'SQL with a different
>> syntax'. Any proposal for group_by/having will have to be logical from a
>> Django ORM point of view, and not presuppose/require knowledge of how SQL
>> formulates queries.

Indeed, and that's been the biggest thing keeping aggregates/grouping from 
Django's ORM.  I could really use 'em myself, but I'm not going to just kludge 
something on that doesn't fit with Django's overall philosophy.

Quite a lot of the problem in cases like this is syntax; if someone comes up 
with a clean, understandable syntax for doing aggregates -- in a way that 
makes sense even to those who don't really know SQL -- I'll be totally behind 
it.

And at that point, FYI, you'll want to take the discussion to django-dev where 
it will get a little more attention.

Jacob




Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-12-01 Thread Rob Hudson

Thanks for the reply, Russell.  It's obviously a lot more complex and
detailed than simply adding a min() where count() is.  :)

A couple thoughts...

> 4 - If you search the archives (user and developer), you will find several
> discussions on aggregate functions. group_by() and having() (or
> pre-magic-removal analogs thereof) have been rejected previously on the
> grounds that the Django ORM is not intended to be 'SQL with a different
> syntax'. Any proposal for group_by/having will have to be logical from a
> Django ORM point of view, and not presuppose/require knowledge of how SQL
> formulates queries.

Here's a quote from you in another thread about this:
[quote]
It was made clear to me then that 'SQL does it like X, so lets add X to
Django' wouldn't win me any points. Django's ORM isn't about finding a
way of representing SQL as Python - it's about getting a consistent,
expressive object model, that just happens to be backed by a SQL
database. Keep in mind that it could just as well be backed by an
object database, or some other persistent store. What will happen to
SQL notation if SQL isn't available?
[endquote]
Source:
http://groups.google.com/group/django-developers/browse_frm/thread/245a37912cf8d4e3/64473bd51d00ff84#64473bd51d00ff84

I think that puts more perspective on the idea.  Considering that the
backend might be something other than a relational database at some
point and should still use the same Django database API to access that
data does mean a complete separation of SQL-like ideas and notation.

Before I read that, my thought was that if users know they need
calculated averages and sums, one could assume they already know
enough about their data and how it's stored to let SQL bleed into the
ORM to some degree.  But now I think the case is well made.

But isn't it also dangerous to code (or not code) for future cases that
may not ever come?  If a non-relational database backend isn't
anywhere on the current horizon, why not code aggregates and groups to
the current usage and break BC when they arrive, possibly at the Django
2.0 transition?  Just a devil's advocate thought.

> However, here are some issues to consider:

Yes, lots of issues to consider.  Lesson learned: Search the archives
before proposing an idea.  :)  But as I look back, there have been many
great discussions.  It gives me more faith in Django and the Django
developers that they hold their code and features in such high regard.
 
Cheers!
Rob





Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-12-01 Thread DavidA

I'd like to see this type of support in the main branch, not separated.
It seems that better support for floating point is just a deficiency in
Django today and the aggregation need crops up everywhere - not just in
scientific applications.

My needs for aggregation are simply for reporting: e.g. show the market
value and P&L of all 2000 positions in our fund grouped by strategy,
analyst, sector, etc. I've been experimenting with three different
ways:

- custom SQL: easy to write the queries but doesn't leverage Django
models at all. I can't reference fields on related objects since I'm
not going through the manager and I can't "reuse" common QuerySet
helpers to ensure I'm always doing the same basic select and filtering.

- aggregation in Python: I've written a group_by() function to take a
QuerySet and perform aggregation, returning a collection of objects
that have properties compatible with the model fields. This makes it
more natural to use the results of the grouping in a template, but it
still doesn't handle related objects and the aggregation in Python
isn't as efficient as the database can do it.

- wrapping the DB API sql clause: The idea here is to generate the SQL
expression that does the aggregation as an outer select and use the
resulting QuerySet sql clause as the subselect that yields the rows for
the aggregation. The nice thing is that it would completely reuse the
QuerySet and still do aggregation on the DB. But it still returns just
a DB cursor which has no connection back to the Django model classes.
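The second approach above (aggregation in Python) might look something like the following sketch. `group_by()` here is a hypothetical reconstruction, not Dave's actual code, and the field names `strategy`, `market_value` and `pnl` are assumptions. It returns objects whose attribute names match the original fields, so a template can keep using `{{ row.strategy }}`, but it still pulls every row into Python, which is the efficiency cost Dave mentions.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by(rows, keys, sums):
    """Hypothetical Python-side aggregation in the spirit of DavidA's group_by().

    rows: objects with attributes (e.g. model instances).
    keys: attribute names to group on, e.g. ("strategy",).
    sums: attribute names to total, e.g. ("market_value", "pnl").
    Returns objects exposing the same attribute names, so a template can
    still write {{ row.strategy }} or {{ row.market_value }}.
    """
    totals = defaultdict(lambda: dict.fromkeys(sums, 0))
    for row in rows:
        group_key = tuple(getattr(row, name) for name in keys)
        for name in sums:
            totals[group_key][name] += getattr(row, name)
    # Groups come back in first-seen order (dicts preserve insertion order).
    return [
        SimpleNamespace(**dict(zip(keys, group_key)), **group_totals)
        for group_key, group_totals in totals.items()
    ]


positions = [
    SimpleNamespace(strategy="macro", market_value=100.0, pnl=5.0),
    SimpleNamespace(strategy="credit", market_value=50.0, pnl=-2.0),
    SimpleNamespace(strategy="macro", market_value=25.0, pnl=1.0),
]
report = group_by(positions, ("strategy",), ("market_value", "pnl"))
```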

I'd love to see more attention on this topic. At first I was surprised
that aggregation isn't supported by the DB API since it seems so
elementary to any database API, but after playing around with a few
ideas, I can see it's a harder problem than I originally thought.
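The third approach Dave describes (wrapping a query's SQL as a subselect and aggregating in an outer SELECT) can be sketched with the standard-library sqlite3 module. The table, the `aggregate_over()` helper and the hand-written inner query are all hypothetical; in practice the inner SQL would come from whatever a filtered queryset generates. As Dave notes, the result is plain cursor rows, not model instances.

```python
import sqlite3


def aggregate_over(conn, inner_sql, params, group_field, sum_field):
    """Run inner_sql as a subselect and aggregate its rows in the database.

    group_field and sum_field are pasted into the SQL text, so they must be
    trusted column names, never user input.
    """
    outer_sql = (
        f"SELECT {group_field}, SUM({sum_field}) AS total "
        f"FROM ({inner_sql}) AS sub "
        f"GROUP BY {group_field} ORDER BY {group_field}"
    )
    # The result is plain cursor rows (tuples), not model instances.
    return conn.execute(outer_sql, params).fetchall()


# Hypothetical table standing in for a Django model's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fund_position (id INTEGER PRIMARY KEY, strategy TEXT, "
    "market_value REAL, active INTEGER)"
)
conn.executemany(
    "INSERT INTO fund_position (strategy, market_value, active) VALUES (?, ?, ?)",
    [("macro", 100.0, 1), ("credit", 50.0, 1), ("macro", 25.0, 1), ("macro", 999.0, 0)],
)

# The reusable "base query", standing in for a filtered queryset's SQL.
inner = "SELECT strategy, market_value FROM fund_position WHERE active = ?"
totals = aggregate_over(conn, inner, (1,), "strategy", "market_value")
```

The filter lives entirely in the inner query, so the same base query can be reused for different aggregations without repeating its WHERE clause.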

-Dave





Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-11-30 Thread Rock

Yeah it is all coming back to me. I was unwilling to answer all of
these questions and create the perfect solution (which may not exist)
and therefore we don't have aggregates in Django even though I
demonstrated that a straightforward implementation was possible way
back when. Thanks for backing me up on that Russ. ;)

What I am pondering in the meantime is whether or not to do a fork of
Django someday that concentrates more on scientific presentations
rather than newspapers. In DjangoSci (or perhaps TechnoDjango) there
would be a lot of attention to data reduction, statistical processing,
queries oriented towards returning graphable data sets and, of course,
true floating point data representations in the database. I am willing
to wait for Malcolm to finish his work before making this monumental
move. I will also look over SQLAlchemy.

The desire to mold a version of Django (or Django itself) to better
handle the needs of the technical/scientific market does not in any way
represent a slap at Django itself or its many paid and volunteer
developers. Django rocks! It has become a core capability in my
development group. We already have 3 elaborate and highly visible
internal applications based on Django and our ability to quickly
respond to requests for improvements has changed the dynamics of how
our entire division does much of its work.


Rock




Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-11-29 Thread Russell Keith-Magee
On 11/30/06, Rob Hudson <[EMAIL PROTECTED]> wrote:
>
>
> I think for those who need aggregate data these would cover a lot of
> ground.  I'd be willing to work on a patch if this is considered
> generally useful and we can work out what the API should look like.
>
>
1 - I'm agreed on the need for easier access to aggregates. Truth be told,
aggregates are the reason I got involved with Django in the first place.
However, other priorities have arisen in the meantime, so I haven't got
around to doing anything about them.

2 - Keep in mind that Malcolm has been working on refactoring
django.db.models.query. Until this refactor is committed, we are trying to
minimize the number of large changes to query.py.

3 - Also keep in mind that one of the goals of the SQLAlchemy branch is to
make complex aggregates (such as those requiring group_by and having) easier
to represent. That said, there doesn't appear to have been a lot of progress
on this branch (at least, not in public commits, anyway).

4 - If you search the archives (user and developer), you will find several
discussions on aggregate functions. group_by() and having() (or
pre-magic-removal analogs thereof) have been rejected previously on the
grounds that the Django ORM is not intended to be 'SQL with a different
syntax'. Any proposal for group_by/having will have to be logical from a
Django ORM point of view, and not presuppose/require knowledge of how SQL
formulates queries.

5 - The aggregates you suggest are the quick and obvious method for getting
aggregates into the query language. However, here are some issues to
consider:

Article.objects.count() returns an integer that is the count of all Article
objects. This makes sense, and syntactically parses the same way that it
operates.

However, what does Article.objects.max('pagecount') return? The integer that
is the largest page count, or the Article that has the largest pagecount?

If it is the former, how do you use the maximum value to get the Article
with that maximum value in a single query?

If it is the latter, does it return a single object, or a queryset that
evaluates to an object?

What happens if there are two objects with the same maximum pagecount?

How do you get multiple aggregates for a value in a single query (efficiency
matters)?

How does the simple case fit into the big picture? Ideally, the simple min()
would be a degenerate case of the min-with-group-by-and-having. Prove to me
that adding min(), max(), etc isn't going to become a wart that we have to
support into the future when 'aggregate clauses 3000' is added to Django's
query syntax.

So, as you can see - it's not as simple as 'just add a min() where count()
is already'.
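To make the questions above concrete, here is how each reading of max('pagecount') looks at the SQL level, sketched with the standard-library sqlite3 module. The `article` table and its rows are hypothetical, mirroring the Article/pagecount example; this illustrates the semantics an ORM API would have to choose between, not a proposed syntax.

```python
import sqlite3

# Hypothetical table mirroring the Article/pagecount example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, pagecount INTEGER)")
conn.executemany(
    "INSERT INTO article (title, pagecount) VALUES (?, ?)",
    [("Short", 3), ("Long", 12), ("Also long", 12)],
)

# Reading 1: max('pagecount') is the largest value.
(max_pages,) = conn.execute("SELECT MAX(pagecount) FROM article").fetchone()

# Reading 2: max('pagecount') is the article(s) holding that value. A
# subquery finds them in one round trip and returns *every* tied row.
longest = conn.execute(
    "SELECT title FROM article "
    "WHERE pagecount = (SELECT MAX(pagecount) FROM article) ORDER BY id"
).fetchall()

# Several aggregates over the same column in a single query.
count, min_pages, top_pages, avg_pages = conn.execute(
    "SELECT COUNT(*), MIN(pagecount), MAX(pagecount), AVG(pagecount) FROM article"
).fetchone()
```

With two articles tied at 12 pages, the second query returns both, which shows why "a single object" is an awkward answer to the tie question.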

Like I said at the beginning, I'm keen to see aggregates implemented - I
just want to see them done right. There are many things that _could_ be done
to implement aggregates; whether they are the _right_ thing to do is another
matter entirely. I'm open to any discussion on this issue, and would be
happy to help shepherd any patches resulting from the discussion into the
trunk.

Yours,
Russ Magee %-)




Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-11-29 Thread Rock



On Nov 29, 2:30 pm, "Jeremy Dunck" <[EMAIL PROTECTED]> wrote:
> > I needed aggregates. (I also learned about data bubbles and redesigned
> > my tables to include them as necessary. This redesign eliminated almost
> > all of my needs for an aggregate function interface.)
>
> Whatsa data bubble?  Google and Wikipedia don't seem to know...

google search:
  "data bubble" database





Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM

2006-11-29 Thread Rock


I created such a patch last spring during the Django sprint at PyCon.
The basic interface was very straightforward but there was also a
slightly less straightforward interface option that allowed for
grouping and so forth. The patch was discarded, however, since some of
the core Django developers wanted to chime in on the interface design
for aggregate functions, but felt they didn't have time to do so until
after 0.95 was complete.

Rather than fight for my design (which I was not particularly
passionate about anyway), I just went home and used plain old SQL when
I needed aggregates. (I also learned about data bubbles and redesigned
my tables to include them as necessary. This redesign eliminated almost
all of my needs for an aggregate function interface.)

I am still interested in this topic, but I haven't had the personal
bandwidth to stay on top of the Django developers group. It would be
nice to know what, if anything, is happening along these lines. I might
even be willing to spend some time on this during the holidays.

Rock Howard

