Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
> > Talking through my hat?
>
> No, I think not -- I think that syntax
> (``queryset.groupby(field).max()``) actually looks like the best
> proposal for aggregates I've seen thus far...

Sounds pretty good to me. Besides the usual min, max and such, I also like: queryset.groupby(field).stats() which would return a tuple with (min, max, average, stddev) for the specified field.

> I'm taking this to django-dev for more discussion; it'll get seen by
> more of the right people there.
>
> Thoughts, anyone?

Add "improving aggregate support" to the Django Sprint planning for PyCon. I plan to participate and am willing to coordinate a team to do that. Hopefully that will encourage people to spend some time on the design ahead of time. I also promise to spend a day or two over the holidays looking over the design proposals and adding my thoughts.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---
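To make the stats() idea concrete, here is a minimal pure-Python sketch of what such a call might compute for each group. It is only an illustration: group_stats, the sample rows, and the choice of the population standard deviation are all assumptions, and nothing here is an existing Django API.

```python
import math
from itertools import groupby
from operator import itemgetter


def group_stats(rows, group_field, value_field):
    """Return {group: (min, max, average, stddev)} for value_field.

    Hypothetical helper; stddev is the population standard deviation.
    """
    keyfunc = itemgetter(group_field)
    result = {}
    # itertools.groupby only merges adjacent items, so sort by the key first.
    for key, group in groupby(sorted(rows, key=keyfunc), key=keyfunc):
        values = [row[value_field] for row in group]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        result[key] = (min(values), max(values), mean, math.sqrt(variance))
    return result


# Made-up sample data standing in for the rows of a queryset.
rows = [
    {"section": "news", "pagecount": 2},
    {"section": "news", "pagecount": 4},
    {"section": "sport", "pagecount": 3},
]
stats = group_stats(rows, "section", "pagecount")
```

Doing this in Python pulls every row out of the database first; a real implementation would presumably push the work down to the backend.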
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
On 12/4/06 5:57 AM, John Lenton wrote:
> The "max", "min" and other such functions might be a little more
> problematic, unless groupby returned, rather than a generic iterator,
> a special "queryset group" and gave _it_ the max/min/etc methods. This
> way it would be clear that max() returns a tuple (value, queryset) (to
> me, at least...). Also, ...groupby('foo').max() would return the same
> result as max(...groupby('foo')), but less efficiently.
>
> Talking through my hat?

No, I think not -- I think that syntax (``queryset.groupby(field).max()``) actually looks like the best proposal for aggregates I've seen thus far...

I'm taking this to django-dev for more discussion; it'll get seen by more of the right people there.

Thoughts, anyone?

Jacob
Re: Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
On 12/1/06, Russell Keith-Magee <[EMAIL PROTECTED]> wrote:
>
> One way to think about the problem is to consider how you would write
> the documentation for it. "Django implements an object based SQL
> wrapper... except for the aggregations stuff, which you will need to
> know SQL to use properly". If the documentation sounds like it will be
> ugly, so is the implementation :-)
>
> So; lots to think about, but don't let that discourage you. As this
> thread has shown, there is plenty of interest in having aggregates -
> the discussion will probably be long, but if we can get something
> productive out of it, Django will be all the better for it.

Myself, I think that the "group by" functionality isn't a problem; if you look at how itertools.groupby works, it would be both easy and natural (i.e. Pythonic) to give querysets a groupby function with similar semantics and laziness.

The "max", "min" and other such functions might be a little more problematic, unless groupby returned, rather than a generic iterator, a special "queryset group" and gave _it_ the max/min/etc methods. This way it would be clear that max() returns a tuple (value, queryset) (to me, at least...). Also, ...groupby('foo').max() would return the same result as max(...groupby('foo')), but less efficiently.

Talking through my hat?

--
John Lenton ([EMAIL PROTECTED]) -- Random fortune:
The trouble with a lot of self-made men is that they worship their creator.
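A rough sketch of the "queryset group" idea, built on itertools.groupby over plain dictionaries instead of a real queryset. The class name GroupedRows and the sample rows are hypothetical, and this reads max() as "the group with the largest grouping value", which is only one possible interpretation of the proposal.

```python
from itertools import groupby
from operator import itemgetter


class GroupedRows:
    """Hypothetical stand-in for the proposed 'queryset group'."""

    def __init__(self, rows, field):
        self._rows = rows
        self._field = field

    def __iter__(self):
        # Lazy: nothing is sorted or grouped until iteration starts.
        keyfunc = itemgetter(self._field)
        # groupby only merges adjacent items, hence the sort.
        for key, group in groupby(sorted(self._rows, key=keyfunc), key=keyfunc):
            yield key, list(group)

    def max(self):
        """Return a (value, rows) tuple for the largest grouping value."""
        return max(self, key=itemgetter(0))


# Made-up sample data standing in for a queryset.
rows = [
    {"foo": 1, "name": "a"},
    {"foo": 3, "name": "b"},
    {"foo": 2, "name": "c"},
    {"foo": 3, "name": "d"},
]
grouped = GroupedRows(rows, "foo")
value, members = grouped.max()
```

Here grouped.max() and the built-in max(grouped) give the same tuple, as suggested above; a database-backed version of the method could instead ask the backend for the maximum directly.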
Re: Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
On 12/2/06, Rob Hudson <[EMAIL PROTECTED]> wrote:
>
> But isn't it also dangerous to code (or not code) for future cases that
> may or may never come? If a non-relational database backend isn't
> anywhere on the current horizon, why not code aggregates and groups to
> the current usage and break BC when they arrive, possibly at the Django
> 2.0 transition? Just a devil's advocate thought.

Agreed; YAGNI is a valid concern. However, the Django ORM has gone to such lengths to keep itself clean and object based - breaking the metaphor for one feature would be a great shame.

One way to think about the problem is to consider how you would write the documentation for it. "Django implements an object based SQL wrapper... except for the aggregations stuff, which you will need to know SQL to use properly". If the documentation sounds like it will be ugly, so is the implementation :-)

So; lots to think about, but don't let that discourage you. As this thread has shown, there is plenty of interest in having aggregates - the discussion will probably be long, but if we can get something productive out of it, Django will be all the better for it.

Yours,
Russ Magee %-)
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
On 12/1/06 11:40 AM, Rob Hudson wrote:
>> 4 - If you search the archives (user and developer), you will find several
>> discussions on aggregate functions. group_by() and having() (or
>> pre-magic-removal analogs thereof) have been rejected previously on the
>> grounds that the Django ORM is not intended to be 'SQL with a different
>> syntax'. Any proposal for group_by/having will have to be logical from a
>> Django ORM point of view, and not presuppose/require knowledge of how SQL
>> formulates queries.

Indeed, and that's been the biggest thing keeping aggregates/grouping from Django's ORM. I could really use 'em myself, but I'm not going to just kludge something on that doesn't fit with Django's overall philosophy.

Quite a lot of the problem in cases like this is syntax; if someone comes up with a clean, understandable syntax for doing aggregates -- in a way that makes sense even to those who don't really know SQL -- I'll be totally behind it.

And at that point, FYI, you'll want to take the discussion to django-dev where it will get a little more attention.

Jacob
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
Thanks for the reply, Russell. It's obviously a lot more complex and detailed than simply adding a min() where count() is. :)

A couple of thoughts...

> 4 - If you search the archives (user and developer), you will find several
> discussions on aggregate functions. group_by() and having() (or
> pre-magic-removal analogs thereof) have been rejected previously on the
> grounds that the Django ORM is not intended to be 'SQL with a different
> syntax'. Any proposal for group_by/having will have to be logical from a
> Django ORM point of view, and not presuppose/require knowledge of how SQL
> formulates queries.

Here's a quote from you in another thread about this:

[quote]
It was made clear to me then that 'SQL does it like X, so let's add X to Django' wouldn't win me any points. Django's ORM isn't about finding a way of representing SQL as Python - it's about getting a consistent, expressive object model, that just happens to be backed by a SQL database. Keep in mind that it could just as well be backed by an object database, or some other persistent store. What will happen to SQL notation if SQL isn't available?
[endquote]

Source: http://groups.google.com/group/django-developers/browse_frm/thread/245a37912cf8d4e3/64473bd51d00ff84#64473bd51d00ff84

I think that puts more perspective on the idea. The fact that the backend might someday be something other than a relational database, yet should still be accessed through the same Django database API, does mean a complete separation from SQL-like ideas and notation.

Before I had read that, my thinking was that users who know they need calculated averages and sums can be assumed to know enough about their data and how it's stored to let SQL bleed into the ORM to some degree. But now I think the case is well made.

But isn't it also dangerous to code (or not code) for future cases that may or may never come? If a non-relational database backend isn't anywhere on the current horizon, why not code aggregates and groups to the current usage and break BC when they arrive, possibly at the Django 2.0 transition? Just a devil's advocate thought.

> However, here are some issues to consider:

Yes, lots of issues to consider. Lesson learned: Search the archives before proposing an idea. :) But as I look back, there have been many great discussions. It gives me more faith in Django and the Django developers that they hold their code and features in such high regard.

Cheers!
Rob
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
I'd like to see this type of support in the main branch, not separated. It seems that better support for floating point is just a deficiency in Django today and the aggregation need crops up everywhere - not just in scientific applications.

My needs for aggregation are simply for reporting: e.g. show the market value and P of all 2000 positions in our fund grouped by strategy, analyst, sector, etc. I've been experimenting with three different ways:

- custom SQL: easy to write the queries but doesn't leverage Django models at all. I can't reference fields on related objects since I'm not going through the manager and I can't "reuse" common QuerySet helpers to ensure I'm always doing the same basic select and filtering.

- aggregation in Python: I've written a group_by() function to take a QuerySet and perform aggregation, returning a collection of objects that have properties compatible with the model fields. This makes it more natural to use the results of the grouping in a template, but it still doesn't handle related objects and the aggregation in Python isn't as efficient as the database can do it.

- wrapping the DB API sql clause: The idea here is to generate the SQL expression that does the aggregation as an outer select and use the resulting QuerySet sql clause as the subselect that yields the rows for the aggregation. The nice thing is that it would completely reuse the QuerySet and still do aggregation on the DB. But it still returns just a DB cursor which has no connection back to the Django model classes.

I'd love to see more attention on this topic. At first I was surprised that aggregation isn't supported by the DB API since it seems so elementary to any database API, but after playing around with a few ideas, I can see it's a harder problem than I originally thought.

-Dave
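The third approach (an outer aggregating select wrapped around the filtered rows) can be sketched with the standard-library sqlite3 module. Everything specific here is an assumption for illustration: the holding table, its columns, and the inner_sql string, which merely stands in for whatever SQL a QuerySet would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE holding (
        id INTEGER PRIMARY KEY,
        strategy TEXT,
        market_value REAL,
        active INTEGER
    );
    INSERT INTO holding (strategy, market_value, active) VALUES
        ('long', 100.0, 1),
        ('long', 50.0, 1),
        ('short', 30.0, 1),
        ('short', 999.0, 0);
    """
)

# Stand-in for the SQL and parameters of the shared "basic select and filtering".
inner_sql = "SELECT strategy, market_value FROM holding WHERE active = ?"
inner_params = (1,)

# Wrap the filtered rows in an outer select so the database does the grouping.
outer_sql = (
    "SELECT strategy, SUM(market_value) AS total "
    f"FROM ({inner_sql}) AS base "
    "GROUP BY strategy ORDER BY strategy"
)
totals = conn.execute(outer_sql, inner_params).fetchall()
```

As noted above, the result is just a list of tuples from a cursor; nothing ties the rows back to model classes.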
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
Yeah, it is all coming back to me. I was unwilling to answer all of these questions and create the perfect solution (which may not exist), and therefore we don't have aggregates in Django even though I demonstrated that a straightforward implementation was possible way back when. Thanks for backing me up on that, Russ. ;)

What I am pondering in the meantime is whether or not to do a fork of Django someday that concentrates more on scientific presentations rather than newspapers. In DjangoSci (or perhaps TechnoDjango) there would be a lot of attention to data reduction, statistical processing, queries oriented towards returning graphable data sets and, of course, true floating point data representations in the database.

I am willing to wait for Malcolm to finish his work before making this monumental move. I will also look over SQLAlchemy.

The desire to mold a version of Django (or Django itself) to better handle the needs of the technical/scientific market does not in any way represent a slap at Django itself or its many paid and volunteer developers. Django rocks! It has become a core capability in my development group. We already have 3 elaborate and highly visible internal applications based on Django and our ability to quickly respond to requests for improvements has changed the dynamics of how our entire division does much of its work.

Rock

On Nov 29, 11:51 pm, "Russell Keith-Magee" <[EMAIL PROTECTED]> wrote:
> On 11/30/06, Rob Hudson <[EMAIL PROTECTED]> wrote:
>
> > I think for those who need aggregate data these would cover a lot of
> > ground. I'd be willing to work on a patch if this is considered
> > generally useful and we can work out what the API should look like.
>
> 1 - I'm agreed on the need for easier access to aggregates. Truth be told,
> aggregates are the reason I got involved with Django in the first place.
> However, other priorities have arisen in the meantime, so I haven't got
> around to doing anything about them.
>
> 2 - Keep in mind that Malcolm has been working on refactoring
> django.db.models.query. Until this refactor is committed, we are trying to
> minimize the number of large changes to query.py.
>
> 3 - Also keep in mind that one of the goals of the SQLAlchemy branch is to
> make complex aggregates (such as those requiring group_by and having) easier
> to represent. That said, there doesn't appear to have been a lot of progress
> on this branch (at least, not in public commits, anyway).
>
> 4 - If you search the archives (user and developer), you will find several
> discussions on aggregate functions. group_by() and having() (or
> pre-magic-removal analogs thereof) have been rejected previously on the
> grounds that the Django ORM is not intended to be 'SQL with a different
> syntax'. Any proposal for group_by/having will have to be logical from a
> Django ORM point of view, and not presuppose/require knowledge of how SQL
> formulates queries.
>
> 5 - The aggregates you suggest are the quick and obvious method for getting
> aggregates into the query language. However, here are some issues to
> consider:
>
> Article.objects.count() returns an integer that is the count of all Article
> objects. This makes sense, and syntactically parses the same way that it
> operates.
>
> However, what does Article.objects.max('pagecount') return? The integer that
> is the largest page count, or the Article that has the largest pagecount?
>
> If it is the former, how do you use the maximum value to get the Article
> with that maximum value in a single query?
>
> If it is the latter, does it return a single object, or a queryset that
> evaluates to an object?
>
> What happens if there are two objects with the same maximum pagecount?
>
> How do you get multiple aggregates for a value in a single query (efficiency
> matters)?
>
> How does the simple case fit into the big picture? Ideally, the simple min()
> would be a degenerate case of the min-with-group by-and-having. Prove to me
> that adding min(), max(), etc isn't going to become a wart that we have to
> support into the future when 'aggregate clauses 3000' is added to Django's
> query syntax.
>
> So, as you can see - it's not as simple as 'just add a min() where count()
> is already'.
>
> Like I said at the beginning, I'm keen to see aggregates implemented - I
> just want to see them done right. There are many things that _could_ be done
> to implement aggregates; whether they are the _right_ thing to do is another
> matter entirely. I'm open to any discussion on this issue, and would be
> happy to help shepherd any patches resulting from the discussion into the
> trunk.
>
> Yours,
> Russ Magee %-)
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
On 11/30/06, Rob Hudson <[EMAIL PROTECTED]> wrote:
>
> I think for those who need aggregate data these would cover a lot of
> ground. I'd be willing to work on a patch if this is considered
> generally useful and we can work out what the API should look like.

1 - I'm agreed on the need for easier access to aggregates. Truth be told, aggregates are the reason I got involved with Django in the first place. However, other priorities have arisen in the meantime, so I haven't got around to doing anything about them.

2 - Keep in mind that Malcolm has been working on refactoring django.db.models.query. Until this refactor is committed, we are trying to minimize the number of large changes to query.py.

3 - Also keep in mind that one of the goals of the SQLAlchemy branch is to make complex aggregates (such as those requiring group_by and having) easier to represent. That said, there doesn't appear to have been a lot of progress on this branch (at least, not in public commits, anyway).

4 - If you search the archives (user and developer), you will find several discussions on aggregate functions. group_by() and having() (or pre-magic-removal analogs thereof) have been rejected previously on the grounds that the Django ORM is not intended to be 'SQL with a different syntax'. Any proposal for group_by/having will have to be logical from a Django ORM point of view, and not presuppose/require knowledge of how SQL formulates queries.

5 - The aggregates you suggest are the quick and obvious method for getting aggregates into the query language. However, here are some issues to consider:

Article.objects.count() returns an integer that is the count of all Article objects. This makes sense, and syntactically parses the same way that it operates.

However, what does Article.objects.max('pagecount') return? The integer that is the largest page count, or the Article that has the largest pagecount?

If it is the former, how do you use the maximum value to get the Article with that maximum value in a single query?

If it is the latter, does it return a single object, or a queryset that evaluates to an object?

What happens if there are two objects with the same maximum pagecount?

How do you get multiple aggregates for a value in a single query (efficiency matters)?

How does the simple case fit into the big picture? Ideally, the simple min() would be a degenerate case of the min-with-group by-and-having. Prove to me that adding min(), max(), etc isn't going to become a wart that we have to support into the future when 'aggregate clauses 3000' is added to Django's query syntax.

So, as you can see - it's not as simple as 'just add a min() where count() is already'.

Like I said at the beginning, I'm keen to see aggregates implemented - I just want to see them done right. There are many things that _could_ be done to implement aggregates; whether they are the _right_ thing to do is another matter entirely. I'm open to any discussion on this issue, and would be happy to help shepherd any patches resulting from the discussion into the trunk.

Yours,
Russ Magee %-)
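The ambiguity around max('pagecount') is easy to see even in plain Python, with no ORM involved. The Article dataclass and the sample data below are made up for illustration; the three variables correspond to the three readings asked about above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical stand-in for a model instance."""
    title: str
    pagecount: int


articles = [Article("A", 10), Article("B", 12), Article("C", 12)]

# Reading 1: the aggregate is a plain value.
largest = max(a.pagecount for a in articles)

# Reading 2: the aggregate is a single object. With a tie, Python's max()
# quietly keeps the first maximal item it encounters.
first_largest = max(articles, key=lambda a: a.pagecount)

# Reading 3: the aggregate is every object sharing the maximum,
# which is closer to "a queryset" than to "an object".
all_largest = [a for a in articles if a.pagecount == largest]
```

Each reading is defensible, and they return different types, which is roughly why the API question is harder than it first looks.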
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
On Nov 29, 2:30 pm, "Jeremy Dunck" <[EMAIL PROTECTED]> wrote:
> > I needed aggregates. (I also learned about data bubbles and redesigned
> > my tables to include them as necessary. This redesign eliminated almost
> > all of my needs for an aggregate function interface.)
>
> Whatsa data bubble?
>
> Google and Wikipedia don't seem to know...

google search: "data bubble" database
Re: Suggestion: Aggregate/Grouping/Calculated methods in Django ORM
I created such a patch last spring during the Django sprint at PyCon. The basic interface was very straightforward but there was also a slightly less straightforward interface option that allowed for grouping and so forth.

The patch was discarded, however, since some of the core Django developers wanted to chime in on the interface design for aggregate functions, but felt they didn't have time to do so until after 0.95 was complete. Rather than fight for my design (which I was not particularly passionate about anyway), I just went home and used plain old SQL when I needed aggregates. (I also learned about data bubbles and redesigned my tables to include them as necessary. This redesign eliminated almost all of my needs for an aggregate function interface.)

I am still interested in this topic, but I haven't had the personal bandwidth to stay on top of the Django Developer's group. It would be nice to know what, if anything, is happening along these lines. I might even be willing to spend some time on this during the holidays.

Rock Howard