Re: [Django] #25834: Better expose ORM grouping semantics

2015-11-30 Thread Django
#25834: Better expose ORM grouping semantics
-+-
 Reporter:  rtpg |Owner:  nobody
 Type:   |   Status:  closed
  Cleanup/optimization   |
Component:  Database layer   |  Version:  1.8
  (models, ORM)  |
 Severity:  Normal   |   Resolution:  invalid
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by timgraham):

 * status:  new => closed
 * resolution:   => invalid


Comment:

 Closing as "invalid" since this is a high level discussion that needs
 smaller, more actionable tickets pending the outcome of the mailing list
 discussions.

--
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/062.d7b5721bb825503314c2dcac09b6e4cb%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.


Re: [Django] #25834: Better expose ORM grouping semantics

2015-11-29 Thread Django
#25834: Better expose ORM grouping semantics
-+-
 Reporter:  rtpg |Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  1.8
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by akaariai):

 I think we have room for improvement here. Sometimes you need to do
 constructs like .values().annotate().values(), where the first values() is
 used to set the group by, the second one to select the columns you want.
 This is a somewhat non-obvious API. Also, in some cases you know exactly
 the group by you want, but there is no way to actually set the group by.

 I know some users use the private qs.query.group_by directly. Maybe we
 could expose some API to do exactly that? The API would need to come with
 a warning that you are completely responsible for setting the correct
 group by, if you set it manually.

 Agreed that this belongs to the mailing list, so lets continue there.

--
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/062.679ec69ba4c68619f936dcd5f9d021a4%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.


Re: [Django] #25834: Better expose ORM grouping semantics

2015-11-29 Thread Django
#25834: Better expose ORM grouping semantics
-+-
 Reporter:  rtpg |Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  1.8
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by shaib):

 TL;DR. Just dropped in to say that this sort of discussion belongs on the
 DevelopersMailingList where many more people will see it.

--
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/062.092342708bce2eb580b5e76c3c53cca4%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.


Re: [Django] #25834: Better expose ORM grouping semantics

2015-11-29 Thread Django
#25834: Better expose ORM grouping semantics
-+-
 Reporter:  rtpg |Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  1.8
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by jarshwah):

 * needs_better_patch:   => 0
 * needs_tests:   => 0
 * needs_docs:   => 0


Comment:

 Hi rtpg, thanks for taking the time to write this all out. Let me respond
 to some of this in line, and we can keep the conversation going from
 there.

 > So when I annotate with a mixture of "grouping required" and "adding a
 select" , then everything becomes "grouping required", and in the previous
 example, suddenly I'm grouping by currency AND price. I think this is the
 only valid behaviour if you want to return something, but shouldn't this
 just fail?

 This was actually pretty surprising to me, but not for the reason you
 mention. I would have thought that the extra alias you were annotating
 would have been added to the SELECT list AND the GROUP BY clauses.
 GROUPing is usually required for any non-aggregate columns used in the
 SELECT list (and sometimes the ORDER BY too).

 > (Some crazier stuff is when you annotate with Sum and it simply groups
 by _all_ of your model's fields. I think the reason this doesn't fail is
 for the motivating example in the documentation (of counting the min price
 of the books in a store), but this seems like something that should fail)

 Grouping by all the fields is useful for aggregating over a related model.
 This shouldn't be an error condition.

 > My understanding is that Django's ORM is coming more to terms with the
 fact that it's basically always going to be paired with a RDBMS, so this
 multi-meaning annotate being the only reasonable way to use grouping is a
 hard sell to me.

 Fair enough, and I can sympathise. I've got a lot of experience with the
 ORM and with aggregation in general, and sometimes even I can mess up
 these queries. I find it easier to think in terms of SQL, and then work
 back from there. This isn't what an ORM should be going, but I'm not sure
 how others approach aggregation with Django's ORM.

 > I very much want to get rid of all of our custom SQL, but I have to
 spend a day working on one query, all because annotate ends up being 2
 separate functions:
 >- "group by previous values call and add aggregation function to select"
 >- "add a select clause"

 I think it helps to think of annotate as the following: "adds an
 expression to the query" which in most cases simply adds a new
 column/calculation to the SELECT list. Annotate isn't responsible for
 generating the GROUP BY. The SQL Compiler generates the GROUP BY based on
 the existence of an "aggregate" (an expression with
 `contains_aggregate=True`) in the SELECT list.

 > Not to mention that the pairing with values means that if I want to
 group by a complex query, I have to do something like:

 {{{
 Model.objects.values('foo')
 .annotate(grouping_property=complex_thing)
 .annotate(Sum('stuff'))
 .annotate(grouping_property=complex_thing)`
 }}}

 First, I think you've added an extra `.annotate` clause at the end by
 mistake. Second, I don't think this query is correct. You should be
 annotating the grouping property, and then using `values` to include this
 new property:

 {{{
 Model.objects
 .annotate(grouping_property=complex_thing)
 .values('foo', 'grouping_property') # grouping_property will also be in
 the SELECT list now
 .annotate(Sum('stuff'))
 }}}

 The idea is to support expressions within `values` though, so the ideal
 queryset using existing API (and an updated values() method) would be:

 {{{
 Model.objects
 .values('foo', grouping_property=complex_thing)
 .annotate(Sum('stuff'))
 }}}

 > In my ideal world, you would have `group(by=columns_or_expressions,
 into=aggregation_things)` which would be somewhat equivalent to "group by
 the columns in by (something like values), and annotate your select with
 the into clause)",

 I don't think this API would be a good one for a few reasons. Figuring out
 what to put in a GROUP BY clause is not trivial, is different on most
 backends, and would require specialised knowledge of SQL. The Django ORM
 is designed to be easy to construct queries, but also abstracted from the
 database enough that users should not need to know SQL.

 Personally I think everyon

[Django] #25834: Better expose ORM grouping semantics

2015-11-29 Thread Django
#25834: Better expose ORM grouping semantics
--+
 Reporter:  rtpg  |  Owner:  nobody
 Type:  Cleanup/optimization  | Status:  new
Component:  Database layer (models, ORM)  |Version:  1.8
 Severity:  Normal|   Keywords:
 Triage Stage:  Unreviewed|  Has patch:  0
Easy pickings:  0 |  UI/UX:  0
--+
 Currently, what QuerySet's `annotate` does is entirely dependent on opaque
 semantics that don't hold up to well when working on more complex
 situations.

 Let's say I have a `LineItem` model (the line items of an invoice, for
 example) that I want to calculate some stats on, with a `price`, and a
 `price_currency`. Some pointy haired boss wants to know what the totals on
 the line items are depending on currency. Pull out my nifty `annotate`
 tool:

 {{{
 In [33]: print
 LineItem.objects.values('price_currency').annotate(total=Sum('price'))
 [{'price_currency': u'USD', 'total': Decimal('9001')}]
 }}}

 So there `annotate` ends up putting things into the `GROUP BY` clause to
 do the sum (namely the previous `values` call)

 Sometimes I just want to rename columns to make my life easier, so I'll
 make a `pc` field as a shortcut to price:

 {{{
 In[37]: print
 LineItem.objects.values('price_currency').annotate(pc=F('price'))
 [{'price_currency': u'USD', 'pc': Decimal('0')}, {'price_currency':
 u'USD', 'pc': Decimal('34')}, ...etc...]
 }}}

 Here `annotate` adds a `SELECT` clause without touching the `GROUP BY`
 clause


 When I end up working on a bigger query, I'll try to simplify my queryset
 expression after I have something that computes what I wanted, so doing
 things like merging `filter`s and, sometimes `annotate`:

 {{{
 In[39]: print
 LineItem.objects.values('price_currency').annotate(total=Sum('price'),
 pc=F('price')).query

 SELECT "item_lineitem"."price_currency", SUM("item_lineitem"."price") AS
 "total" FROM "item_lineitem" GROUP BY "item_lineitem"."price_currency",
 "item_lineitem"."price" ORDER BY "item_lineitem"."position" ASC
 }}}

 So when I annotate with a mixture of "grouping required" and "adding a
 select" , then everything becomes "grouping required", and in the previous
 example, suddenly I'm grouping by currency AND price.  I think this is the
 only valid behaviour if you want to return something, but shouldn't this
 just fail?

 (Some crazier stuff is when you `annotate` with Sum and it simply groups
 by _all_ of your model's fields. I think the reason this doesn't fail is
 for the motivating example in the documentation (of counting the min price
 of the books in a store), but this seems like something that should fail)

 

 My understanding is that Django's ORM is coming more to terms with the
 fact that it's basically always going to be paired with a RDBMS, so this
 multi-meaning annotate being the only reasonable way to use grouping is a
 hard sell to me. I very much want to get rid of all of our custom SQL, but
 I have to spend a day working on one query, all because `annotate` ends up
 being 2 separate functions:

- "group by previous `values` call and add aggregation function to
 select"
- "add a select clause"

 Not to mention that the pairing with `values` means that if I want to
 group by a complex query, I have to do something like:

 {{{
 
Model.objects.values('foo').annotate(grouping_property=complex_thing).annotate(Sum('stuff')).annotate(grouping_property=complex_thing)
 }}}

 in order to actually have the `grouping_property` to show up in the select
 (`values` doesn't work on some of my `annotate`d values for some reason...
 there might be a bug that I haven't successfully isolated in my code or
 the ORM).


 Anyways, I'm complaining about this because I would much rather have
 errors show up for a lot of these cases, so that I can at least fix them,
 but in the current API that would be impossible...

 In my ideal world, you would have `group(by=columns_or_expressions,
 into=aggregation_things)` which would be somewhat equivalent to "group by
 the columns in `by` (something like `values`), and annotate your `select`
 with the `into` clause)", so when someone asks how to do a grouping you
 can point to that directly. And `annotate` would be relegated to only
 touching the `SELECT` part of a query. With backwards compatability
 considerations, you could probably make a separate method just for
 `SELECT`.

 I'm not sure of the implications with regards to joins and things like
 `annotate(Sum('books__price'))`, but my impression is that all this
 functionality is in the ORM already, just all within `annotate`

--
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because y