Re: Optimizing out unused annotations from count queries

2017-08-19 Thread Josh Smeaton
Thanks for making me elaborate, because I'm probably wrong. I'll step 
through what my thinking was though.

'.annotate(local=F('related__field'))' will join to the related table. If 
`related` is a nullable foreign key, then a join would do an implicit 
filter, removing the row from the result set if it had no relation, and 
reducing the count. 

But Django models a nullable foreign key join as a LEFT JOIN which wouldn't 
filter the row out and wouldn't reduce the count.

Multivalue relationships (reverse foreign key and m2m joins) would need 
some care though I think. I'm not sure if the same annotation above works 
on m2m joins or not.

On Sunday, 20 August 2017 09:41:43 UTC+10, Tom Forbes wrote:
>
> Thanks for your reply Josh. Can you elaborate on why optimizing out '
> .annotate(local=F('related__field'))' would not be safe?
>
> On 20 Aug 2017 00:10, "Josh Smeaton"  
> wrote:
>
> I'd like to see this provided all bases were covered. I'll just list below 
> the cases where I think it wouldn't be safe.
>
> - Filtered annotations
> - Annotations that join to anything other than a non-null foreign key: 
> .annotate(local=F('related__field'))
> - Annotations that have a GROUP BY on fields that are not just the PK or 
> entire field set (with .values())
>
> The group by one I'm unsure of, because I forget if a count is called with 
> an aggregate query as a subquery or if it's an error.
>
> I think an optimisation is available if:
>
> 1. No filters ref an annotation
> 2. values() has not been called
> 3. There are no joins at all
>
> Further optimisation scenarios may be available with stricter rules (such 
> as join types), but we could look at doing that separately if it was 
> difficult.
>
> On Sunday, 20 August 2017 03:11:13 UTC+10, Tom Forbes wrote:
>>
>> Hello,
>> I think there is potential room for improvement with how Django handles 
>> count() queries, specifically relating to annotations. Currently any 
>> annotations that are added to a queryset are included in the SQL statement 
>> that is generated in the count() query, including all joins and SQL 
>> function calls - I've personally been bitten by this with DRF and a query 
>> with a particularly complex set of annotations where the count query took 
>> longer than fetching the data itself, but was semantically equivalent to a 
>> much faster, plain Model.objects.count() call.
>>
>> I think in most cases annotations can be stripped, for example in the two 
>> queries below the annotation is not filtered on so it can be safely removed 
>> from the query without affecting the result:
>>
>> one = Book.objects.annotate(Count('chapters')).count()
>> two = Book.objects.count()
>>
>> I wanted to gather some feedback from the group as to whether this always 
>> holds true: if an annotation is not filtered (or ordered on) it can always 
>> be safely removed from a count query? I wouldn't be surprised if there is a 
>> case where this isn't safe, but I cannot think of one.
>>
>> I've made an initial merge request that removes annotations from 
>> querysets that have no filter or ordering that depends on annotations here: 
>> https://github.com/django/django/pull/8928. It seems like Django also 
>> has all the information to optimize out annotations that are not directly 
>> or indirectly used by filters, but I wanted to gather some feedback before 
>> I attempted this.
>>
>> Tom
>>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-develop...@googlegroups.com .
> To post to this group, send email to django-d...@googlegroups.com 
> .
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/506d0f2c-51e8-400f-a4fb-66c2b8597aeb%40googlegroups.com
>  
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/fb0c0183-2a39-4b2e-84cd-86e76d233487%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Optimizing out unused annotations from count queries

2017-08-19 Thread Tom Forbes
Thanks for your reply Josh. Can you elaborate on why optimizing out '
.annotate(local=F('related__field'))' would not be safe?

On 20 Aug 2017 00:10, "Josh Smeaton"  wrote:

I'd like to see this provided all bases were covered. I'll just list below
the cases where I think it wouldn't be safe.

- Filtered annotations
- Annotations that join to anything other than a non-null foreign key:
.annotate(local=F('related__field'))
- Annotations that have a GROUP BY on fields that are not just the PK or
entire field set (with .values())

The group by one I'm unsure of, because I forget if a count is called with
an aggregate query as a subquery or if it's an error.

I think an optimisation is available if:

1. No filters ref an annotation
2. values() has not been called
3. There are no joins at all

Further optimisation scenarios may be available with stricter rules (such
as join types), but we could look at doing that separately if it was
difficult.

On Sunday, 20 August 2017 03:11:13 UTC+10, Tom Forbes wrote:
>
> Hello,
> I think there is potential room for improvement with how Django handles
> count() queries, specifically relating to annotations. Currently any
> annotations that are added to a queryset are included in the SQL statement
> that is generated in the count() query, including all joins and SQL
> function calls - I've personally been bitten by this with DRF and a query
> with a particularly complex set of annotations where the count query took
> longer than fetching the data itself, but was semantically equivalent to a
> much faster, plain Model.objects.count() call.
>
> I think in most cases annotations can be stripped, for example in the two
> queries below the annotation is not filtered on so it can be safely removed
> from the query without affecting the result:
>
> one = Book.objects.annotate(Count('chapters')).count()
> two = Book.objects.count()
>
> I wanted to gather some feedback from the group as to whether this always
> holds true: if an annotation is not filtered (or ordered on) it can always
> be safely removed from a count query? I wouldn't be surprised if there is a
> case where this isn't safe, but I cannot think of one.
>
> I've made an initial merge request that removes annotations from querysets
> that have no filter or ordering that depends on annotations here:
> https://github.com/django/django/pull/8928. It seems like Django also has
> all the information to optimize out annotations that are not directly or
> indirectly used by filters, but I wanted to gather some feedback before I
> attempted this.
>
> Tom
>
-- 
You received this message because you are subscribed to the Google Groups
"Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/
msgid/django-developers/506d0f2c-51e8-400f-a4fb-66c2b8597aeb%40googlegroups.
com

.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFNZOJP%3DwknS2hDLT2XE4qxKn6MXrMNK2C4AHMLpY7T0ywi%3D9A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Optimizing out unused annotations from count queries

2017-08-19 Thread Josh Smeaton
I'd like to see this provided all bases were covered. I'll just list below 
the cases where I think it wouldn't be safe.

- Filtered annotations
- Annotations that join to anything other than a non-null foreign key: 
.annotate(local=F('related__field'))
- Annotations that have a GROUP BY on fields that are not just the PK or 
entire field set (with .values())

The group by one I'm unsure of, because I forget if a count is called with 
an aggregate query as a subquery or if it's an error.

I think an optimisation is available if:

1. No filters ref an annotation
2. values() has not been called
3. There are no joins at all

Further optimisation scenarios may be available with stricter rules (such 
as join types), but we could look at doing that separately if it was 
difficult.

On Sunday, 20 August 2017 03:11:13 UTC+10, Tom Forbes wrote:
>
> Hello,
> I think there is potential room for improvement with how Django handles 
> count() queries, specifically relating to annotations. Currently any 
> annotations that are added to a queryset are included in the SQL statement 
> that is generated in the count() query, including all joins and SQL 
> function calls - I've personally been bitten by this with DRF and a query 
> with a particularly complex set of annotations where the count query took 
> longer than fetching the data itself, but was semantically equivalent to a 
> much faster, plain Model.objects.count() call.
>
> I think in most cases annotations can be stripped, for example in the two 
> queries below the annotation is not filtered on so it can be safely removed 
> from the query without affecting the result:
>
> one = Book.objects.annotate(Count('chapters')).count()
> two = Book.objects.count()
>
> I wanted to gather some feedback from the group as to whether this always 
> holds true: if an annotation is not filtered (or ordered on) it can always 
> be safely removed from a count query? I wouldn't be surprised if there is a 
> case where this isn't safe, but I cannot think of one.
>
> I've made an initial merge request that removes annotations from querysets 
> that have no filter or ordering that depends on annotations here: 
> https://github.com/django/django/pull/8928. It seems like Django also has 
> all the information to optimize out annotations that are not directly or 
> indirectly used by filters, but I wanted to gather some feedback before I 
> attempted this.
>
> Tom
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/506d0f2c-51e8-400f-a4fb-66c2b8597aeb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Optimizing out unused annotations from count queries

2017-08-19 Thread Tom Forbes
Hello,
I think there is potential room for improvement with how Django handles
count() queries, specifically relating to annotations. Currently any
annotations that are added to a queryset are included in the SQL statement
that is generated in the count() query, including all joins and SQL
function calls - I've personally been bitten by this with DRF and a query
with a particularly complex set of annotations where the count query took
longer than fetching the data itself, but was semantically equivalent to a
much faster, plain Model.objects.count() call.

I think in most cases annotations can be stripped, for example in the two
queries below the annotation is not filtered on so it can be safely removed
from the query without affecting the result:

one = Book.objects.annotate(Count('chapters')).count()
two = Book.objects.count()

I wanted to gather some feedback from the group as to whether this always
holds true: if an annotation is not filtered (or ordered on) it can always
be safely removed from a count query? I wouldn't be surprised if there is a
case where this isn't safe, but I cannot think of one.

I've made an initial merge request that removes annotations from querysets
that have no filter or ordering that depends on annotations here:
https://github.com/django/django/pull/8928. It seems like Django also has
all the information to optimize out annotations that are not directly or
indirectly used by filters, but I wanted to gather some feedback before I
attempted this.

Tom

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFNZOJM27OnuBSKWY8%3DAUkPa2vv75ASmL9yuohkH4heN8QUz2A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-19 Thread Luke Plant
This could work something like the way that ForeignKey `on_delete` works 
- you have options that are enumerated as constants, but in reality they 
are classes that embody the strategy to be used. We could have something 
similar - `on_missing_relation=FETCH | WARN | ERROR | IGNORE ... `


Luke

On 18/08/17 14:01, Shai Berger wrote:

On Friday 18 August 2017 09:08:11 Anssi Kääriäinen wrote:

Maybe we should just add the queryset method. This is the smallest atomic
task that can be done. Even if there's only the queryset method available,
it's possible to enable prefetches per model by using a Manager.


I disagree on both counts: I don't think it's the smallest atomic task, and
I'm not so sure it's the right thing to do.

The smallest atomic task, the way I see it, is building the infrastructure
that would allow adding the queryset method -- but would also allow different
APIs to be set around it.

And since there is as yet no consensus on the correct API for "end" users, I
would rather not define one immediately.

Shai.



--
You received this message because you are subscribed to the Google Groups "Django 
developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/ba0ddd74-5240-757f-e6ea-3300f2ba3a61%40cantab.net.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-19 Thread Todor Velichkov
+1 to just add the queryset method it gives a fine granular level of 
control. Moreover when not used we could display warnings which can help 
people detect these O(n) query cases as Tom Forbes initially suggested.

On Friday, August 18, 2017 at 9:08:11 AM UTC+3, Anssi Kääriäinen wrote:
>
> On Thursday, August 17, 2017 at 11:30:45 PM UTC+3, Aymeric Augustin wrote:
>>
>> Hello, 
>>
>> > On 17 Aug 2017, at 14:48, Luke Plant  wrote: 
>> > 
>> > On 16/08/17 23:17, Aymeric Augustin wrote: 
>> >> It should kick in only when no select_related or prefetch_related is 
>> in effect, to avoid interfering with pre-existing optimizations. It's still 
>> easy to construct an example where it would degrade performance but I don't 
>> think such situations will be common in practice. 
>> > 
>> > I think Ansii's example is a non-contrived and potentially very common 
>> one where we will make things worse if this is default behaviour, 
>> especially for the non-expert programmer this behaviour is supposed to 
>> help: 
>> > 
>> > 
>> > if qs: 
>> > qs[0].do_something() 
>> > 
>> > (where do_something() access relations, or multiple relations). We will 
>> end up with *multiple* large unnecessary queries instead of just one. The 
>> difference is that the first one is easy to explain, the remainder are much 
>> harder, and will contribute even more to the "Django is slow/ORMs are bad" 
>> feeling. 
>>
>>
>> I wasn't convinced by this example because it's already sub-optimal 
>> currently. `if qs` will fetch the whole queryset just to check if it has at 
>> least one element. 
>>
>> Assuming there's 1000 elements in qs, if you don't care about a 1000x 
>> inefficiency, I don't expect you to care much about two or three 1000x 
>> inefficiencies...
>>
>
> I agree that this example isn't particularly worrying. It's something an 
> experienced developer wouldn't do. On the other hand, we are aiming at 
> making things simpler for non-experienced developers.
>
> To me the worrying part here is that we really don't have any data or 
> experience about if the cure will be worse than the disease. Likely not, 
> but what do we gain by taking risks here?
>
> Maybe we should just add the queryset method. This is the smallest atomic 
> task that can be done. Even if there's only the queryset method available, 
> it's possible to enable prefetches per model by using a Manager.
>
>  - Anssi
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/d956a26d-adcf-443f-80c8-7b7ba290d2b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.