Re: Automatic prefetching in querysets

2020-03-25 Thread Gordon Wrigley
My existing code for this is now available as a pypi package
https://github.com/tolomea/django-auto-prefetch 

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/aaf633e3-a099-4cb4-aad7-10cfe6b11596%40googlegroups.com.


Re: Sealing or locking QuerySets -- a suggestion to prevent surprise N+1 queries.

2018-01-08 Thread Gordon Wrigley
Regarding auto prefetch, after the discussion here I created a 
ticket https://code.djangoproject.com/ticket/28586
It didn't get much attention presumably because it was around Django 2 
release time.
I have a todo note to add tests and doco and generally get it to a mergable 
state in order to push the conversation along.
Also the pull as it stands has a little bug I have to upload the fix for.

On Friday, January 5, 2018 at 2:41:49 AM UTC, Josh Smeaton wrote:
>
> I wasn't aware of this new feature Shai, thanks for pointing it out!
>
> For this particular case I'd prefer locking to be bound to a particular 
> queryset rather than the database as a whole. I would also expect it to 
> fail loudly when accessing a non-fetched related object (book.author), 
> which can be a common cause of pain.
>
> I'm also still very interested in auto-prefetch Adam, is there any help I 
> can lend?
>
> On Thursday, 4 January 2018 17:32:19 UTC+11, Shai Berger wrote:
>>
>> Hi all, 
>>
>> Django 2.0 has a new feature[1] which allows you to say "I expect no 
>> actual 
>> database queries in this piece of code". It is not designed to stop a 
>> specific 
>> queryset from spawning requests, so getting it to do exactly what's asked 
>> for 
>> in this thread may be a little involved, but if your goal is to prevent 
>> surprise queries, I think it is actually better than sealing a single 
>> queryset. 
>>
>> HTH, 
>> Shai 
>>
>> [1] https://docs.djangoproject.com/en/2.0/topics/db/instrumentation/ 
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/fe4c44aa-bd01-4196-a04a-b8daefe49f50%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-09-12 Thread Gordon Wrigley
This received some positive responses, so to help move the conversation
along I have created a ticket and pull request.

https://code.djangoproject.com/ticket/28586
https://github.com/django/django/pull/9064

Regards G

On Tue, Aug 15, 2017 at 10:44 AM, Gordon Wrigley 
wrote:

> I'd like to discuss automatic prefetching in querysets. Specifically
> automatically doing prefetch_related where needed without the user having
> to request it.
>
> For context consider these three snippets using the Question & Choice
> models from the tutorial
> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
> when
> there are 100 questions each with 5 choices for a total of 500 choices.
>
> Default
> for choice in Choice.objects.all():
> print(choice.question.question_text, ':', choice.choice_text)
> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>
> Prefetch_related
> for choice in Choice.objects.prefetch_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>
> Select_related
> for choice in Choice.objects.select_related('question'):
> print(choice.question.question_text, ':', choice.choice_text)
> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>
> I've included select_related for completeness, I'm not going to propose
> changing anything about it's use. There are places where it is the best
> choice and in those places it will still be up to the user to request it. I
> will note that anywhere select_related is optimal prefetch_related is still
> better than the default and leave it at that.
>
> The 'Default' example above is a classic example of the N+1 query problem,
> a problem that is widespread in Django apps.
> This pattern of queries is what new users produce because they don't know
> enough about the database and / or ORM to do otherwise.
> Experieced users will also often produce this because it's not always
> obvious what fields will and won't be used and subsequently what should be
> prefetched.
> Additionally that list will change over time. A small change to a template
> to display an extra field can result in a denial of service on your DB due
> to a missing prefetch.
> Identifying missing prefetches is fiddly, time consuming and error prone.
> Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec>
> (which I was involved in creating) and nplusone
> <https://github.com/jmcarp/nplusone> exist in part to flag missing
> prefetches introduced by changed code.
> Finally libraries like Django Rest Framework and the Admin will also
> produce queries like this because it's very difficult for them to know what
> needs prefetching without being explicitly told by an experienced user.
>
> As hinted at the top I'd like to propose changing Django so the default
> code behaves like the prefetch_related code.
> Longer term I think this should be the default behaviour but obviously it
> needs to be proved first so for now I'd suggest a new queryset function
> that enables this behaviour.
>
> I have a proof of concept of this mechanism that I've used successfully in
> production. I'm not posting it yet because I'd like to focus on desired
> behavior rather than implementation details. But in summary, what it does
> is when accessing a missing field on a model, rather than fetching it just
> for that instance, it runs a prefetch_related query to fetch it for all
> peer instances that were fetched in the same queryset. So in the example
> above it prefetches all Questions in one query.
>
> This might seem like a risky thing to do but I'd argue that it really
> isn't.
> The only time this isn't superior to the default case is when you are post
> filtering the queryset results in Python.
> Even in that case it's only inferior if you started with a large number of
> results, filtered basically all of them and the code is structured so that
> the filtered ones aren't garbage collected.
> To cover this rare case the automatic prefetching can easily be disabled
> on a per queryset or per object basis. Leaving us with a rare downside that
> can easily be manually resolved in exchange for a significant general
> improvement.
>
> In practice this thing is almost magical to work with. Unless you already
> have extensive and tightly maintained prefetches everywhere you get an
> immediate boost to virtually everything that touches the database, often
> knocking orders of magnitude off page load times.
>
> If an agreement ca

Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley
I'm not advocating either way on this, but it's probably worth noting that
the context manager proposal is compatible with a queryset level optin/out
and an object level optout.

So you could for example have prefetch_related(auto=None) which can be
explicitly set to True / False and takes it's value from the context
manager when it's None.


On Thu, Aug 17, 2017 at 12:42 AM, Josh Smeaton 
wrote:

> I think there's a lot right with your suggestions here Shai.
>
> It delivers better default behaviour for new projects, does not affect
> existing deployments, and seems pretty easy to enable/disable selectively
> at any level of the stack.
>
> My only concern would be libraries leaning on this behaviour by enabling
> it locally and users being unable to change it. That's only a small concern
> though, and wouldn't prevent me from recommending the proposal.
>
>
> On Thursday, 17 August 2017 09:34:03 UTC+10, Shai Berger wrote:
>>
>> First of all, I think making the auto-prefetch feature available in some
>> form
>> is a great idea. As an opt-in, I would have made use of it on several
>> occasions, and cut days of optimization work to minutes. It's true that
>> this
>> wouldn't achieve the best possible optimization in many cases, but it
>> would be
>> close enough, for a small fraction of the effort.
>>
>> But of the choices that have been suggested here, I find none quite
>> satisfactory. I agree, basically, with almost all of the objections
>> raised on
>> all sides. And I think I have a better suggestion.
>>
>> Rather than a setting (global and essentially constant) or all-local
>> opt-ins,
>> suppose the feature was controlled -- for all querysets -- by a context
>> manager.
>>
>> So, rather than
>>
>> > "qs.prefetch_related(auto=True)", or "with qs.auto_prefetch():",
>>
>> suppose we had
>>
>> with QuerySet.auto_prefetch(active=True):
>>
>> and under that, all querysets would auto-prefetch. Then:
>>
>> - We could put that in a middleware; using that middleware would then,
>> effectively, be a global opt in. We could add the middleware -- perhaps
>> commented-out -- to the new project template, for beginner's sake. Call
>> it
>> PerformanceTrainingWheelsMiddleware, to make people aware that it's
>> something
>> they should grow out of, but make it easily accessible.
>>
>> - We could easily build a decorator for views
>>
>> - We could have local opt-out, at the correct level -- not the objects,
>> but
>> the control flow.
>>
>> For this to work, the flag controlling the feature would need to be
>> thread-
>> local and managed as a stack. But these aren't problems.
>>
>> If (as likely) this suggestion still does not generate a concensus, I
>> would
>> prefer that we follow the path Anssi suggested --
>>
>> > Note that it should be possible to implement this fully as 3rd party
>> > project if we add an easy way to customise how related object fetching
>> is
>> > done. I'm not sure if we can add an easy way for that.
>>
>> Except I don't think "easy" is a requirement here. If we can add a
>> sensible
>> way, that would be enough -- I don't expect many Django users to develop
>> implementations of the feature, only to use them.
>>
>> My 2 cents,
>> Shai.
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/
> topic/django-developers/EplZGj-ejvg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/bf3e22af-9a41-4106-b986-
> fc136d54c4c6%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD-wiX3RkYYo9twG%2BMn4OL%3DBZvHMO1_6_LMvn%2BQrpM7J0FkiKA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley
Regarding 2, it does work for reverse one-to-one relations.

On Wed, Aug 16, 2017 at 9:17 PM, Aymeric Augustin <
aymeric.augus...@polytechnique.org> wrote:

> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
>
> I'd like to discuss automatic prefetching in querysets. Specifically
> automatically doing prefetch_related where needed without the user having
> to request it.
>
>
>
> Hello,
>
> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in
> the admin or elsewhere gets old.
>
>
> In addition to everything that was said already, I'd like to point out
> that Django already has a very similar "magic auto prefetching" behavior in
> some cases :-)
>
> I'm referring to the admin which calls select_related() on non-nullable
> foreign keys in the changelist view. The "non-nullable" condition makes
> that behavior hard to predict — I'd go as far as to call it non
> deterministic. For details, see slide 54 of https://myks.org/data/
> 20161103-Django_Under_the_Hood-Debugging_Performance.pdf and the audio
> commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>
>
> The feature proposed here is most useful if it's opt-out because it
> targets people who aren't aware that the problem even exists — at best they
> notice that Django is slow and that reminds them vaguely of a rant that
> explains why ORMs are the worst thing since object oriented programming.
>
> It should kick in only when no select_related or prefetch_related is in
> effect, to avoid interfering with pre-existing optimizations. It's still
> easy to construct an example where it would degrade performance but I don't
> think such situations will be common in practice. Still, there should be a
> per-queryset opt-out for these cases.
>
> We may want to introduce it with a deprecation path, that is, make it
> opt-in at first and log a deprecation warning where the behavior would
> kick-in, so developers who want to disable it can add the per-queryset
> opt-out.
>
>
> At this point, my main concerns are:
>
> 1) The difficulty of identifying where the queryset originates, given that
> querysets are lazy. Passing objects around is common; sometimes it can be
> hard to figure out where an object comes from. It isn't visible in the
> stack trace. In my opinion this is the strongest argument against the
> feature.
>
> 2) The lack of this feature for reverse one-to-one relations; it's only
> implemented for foreign keys. It's hard to tell them apart in Python code.
> The subtle differences, like return None vs. raise ObjectDoesNotExist when
> there's no related object, degrade the developer experience.
>
> 3) The strong opinions expressed against the feature. I'm not sure that
> consensus is within reach. If we can't agree that this is an adequate
> amount of magic, we're likely to stick with the status quo. I'd rather not
> have this question decided by a vote of the technical board.
>
>
> In the grand scheme of things, going from "prefetching a related instance
> for an object" to "prefetching related instances for all objects in the
> queryset" isn't that much of a stretch... But I admit it's rather scary to
> make this change for all existing Django projects!
>
>
> Best regards,
>
> --
> Aymeric.
>
>
>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/
> topic/django-developers/EplZGj-ejvg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/2D9DCC0F-EAB0-4512-8AEC-
> 08A694DF9074%40polytechnique.org
> <https://groups.google.com/d/msgid/django-developers/2D9DCC0F-EAB0-4512-8AEC-08A694DF9074%40polytechnique.org?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD-wiX0ScioW28wNYmY76Ma03kbSN9dWjRvKWs60%2B0f1VwvkFQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-16 Thread Gordon Wrigley
 going
>> to know about a global setting and can opt in to the old behaviour rather
>> easily. Newer users that do not know about select/prefetch_related or these
>> settings will fall into the new behaviour by default.
>>
>> It's unreasonable to expect every user of django learn the ins and outs
>> of all queryset methods. I'm probably considered a django orm expert, and I
>> still sometimes write queries that are non-optimal or *become* non-optimal
>> after changes in unrelated areas. At an absolute minimum we should be
>> screaming and shouting when this happens. But we can also fix the issue
>> while complaining, and help guide users into correct behaviour.
>>
>>
>> On Wednesday, 16 August 2017 08:41:31 UTC+10, Anthony King wrote:
>>>
>>> Automatically prefetching is something I feel should be avoided.
>>>
>>> A common gripe I have with ORMs is they hide what's actually happening
>>> with the database, resulting in beginners-going-on-intermediates
>>> building libraries/systems that don't scale well.
>>>
>>> We have several views in a dashboard, where a relation may be accessed
>>> once or twice while iterating over a large python filtered queryset.
>>> Prefetching this relation based on the original queryset has the
>>> potential to add around 5 seconds to the response time (probably more, that
>>> table has doubled in size since I last measured it).
>>>
>>> I feel it would be better to optimise for your usecase, as apposed to
>>> try to prevent uncalled-for behaviour.
>>>
>>>
>>>
>>> On Aug 15, 2017 23:15, "Luke Plant"  wrote:
>>>
>>>> I agree with Marc here that the proposed optimizations are 'magical'. I
>>>> think when it comes to optimizations like these you simply cannot know in
>>>> advance whether doing extra queries is going to a be an optimization or a
>>>> pessimization. If I can come up with a single example where it would
>>>> significantly decrease performance (either memory usage or speed) compared
>>>> to the default (and I'm sure I can), then I would be strongly opposed to it
>>>> ever being default behaviour.
>>>>
>>>> Concerning implementing it as an additional  QuerySet method like
>>>> `auto_prefetch()` - I'm not sure what I think, I feel like it could get
>>>> icky (i.e. increase our technical debt), due to the way it couples things
>>>> together. I can't imagine ever wanting to use it, though, I would always
>>>> prefer the manual option.
>>>>
>>>> Luke
>>>>
>>>>
>>>>
>>>> On 15/08/17 21:02, Marc Tamlyn wrote:
>>>>
>>>> Hi Gordon,
>>>>
>>>> Thanks for the suggestion.
>>>>
>>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>>> possible prefetches be identified? What happens when an initial loop in a
>>>> view requires one prefetch, but a subsequent loop in a template requires
>>>> some other prefetch? What about nested loops resulting in nested
>>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>>> multiple ways. Personally, I would argue that correctly setting up and
>>>> maintaining appropriate prefetches and selects is a necessary part of
>>>> working with an ORM.
>>>>
>>>> Do you know of any other ORMs which attempt similar magical
>>>> optimisations? How do they go about identifying the cases where it is
>>>> necessary?
>>>>
>>>> On 15 August 2017 at 10:44, Gordon Wrigley 
>>>> wrote:
>>>>
>>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>>> automatically doing prefetch_related where needed without the user having
>>>>> to request it.
>>>>>
>>>>> For context consider these three snippets using the Question & Choice
>>>>> models from the tutorial
>>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models>
>>>>>  when
>>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>>
>>>>> Default
>>>>> for choice in Choice.objects.all():
>>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>>> 501 db queries, fetches 500

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley
The warnings you propose would certainly be an improvement on the status
quo.
However for that to be a complete solution Django would also need to detect
places where there are redundant prefetch_relateds.

Additionally tools like the Admin and DRF would need to provide adequate
hooks for inserting these calls.
For example ModelAdmin.get_queryset is not really granular enough as it's
used by both the list and detail views which might touch quite different
sets of fields. (Although in practice what you generally do is optimize the
list view as that's the one that tends to explode)

That aside I sincerely believe that the proposed approach is superior to
the current default behavior in the majority of cases and further more
doesn't fail as badly as the current behavior when it's not appropriate. I
expect that if implemented as an option then in time that belief would
prove itself.

On Tue, Aug 15, 2017 at 8:17 PM, Tom Forbes  wrote:

> Exploding query counts are definitely a pain point in Django, anything to
> improve that is definitely worth considering. They have been a problem in
> all Django projects I have seen.
>
> However I think the correct solution is for developers to correctly add
> select/prefetch calls. There is no general solution for automatically
> applying them that works for enough cases, and i think adding such a method
> to querysets would be used incorrectly and too often.
>
> Perhaps a better solution would be for Django to detect these O(n) query
> cases and display intelligent warnings, with suggestions as to the correct
> select/prefetch calls to add. When debug mode is enabled we could detect
> repeated foreign key referencing from the same source.
>
> On 15 Aug 2017 19:44, "Gordon Wrigley"  wrote:
>
> Sorry maybe I wasn't clear enough about the proposed mechanism.
>
> Currently when you dereference a foreign key field on an object (so
> 'choice.question' in the examples above) if it doesn't have the value
> cached from an earlier access, prefetch_related or select_related then
> Django will automatically perform a db query to fetch it. After that the
> value will then be cached on the object for any future dereferences.
>
> This automatic fetching is the source the N+1 query problems and in my
> experience most gross performance problems in Django apps.
>
> The proposal essentially is to add a new queryset function that says for
> the group of objects fetched by this queryset, whenever one of these
> automatic foreign key queries happens on one of them instead of fetching
> the foreign key for just that one use the prefetch mechanism to fetch it
> for all of them.
> The presumption being that the vast majority of the time when you access a
> field on one object from a queryset result, probably you are going to
> access the same field on many of the others as well.
>
> The implementation I've used in production does nest across foreign keys
> so something (admittedly contrived) like:
> for choice in Choice.objects.all():
> print(choice.question.author)
> Will produce 3 queries, one for all choices, one for the questions of
> those choices and one for the authors of those questions.
>
> It's worth noting that because these are foreign keys in their "to one"
> direction each of those queryset results will be at most the same size (in
> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>
> I do not propose touching reverse foreign key or many2many fields as the
> generated queries could request substantially more rows from the DB than
> the original query and it's not at all clear how this mechanism would
> sanely interact with filtering etc. So this is purely about the forward
> direction of foreign keys.
>
> I hope that clarifies my thinking some.
>
> Regards
> G
>
> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
> wrote:
>
>> Hi Gordon,
>>
>> Thanks for the suggestion.
>>
>> I'm not a fan of adding a layer that tries to be this clever. How would
>> possible prefetches be identified? What happens when an initial loop in a
>> view requires one prefetch, but a subsequent loop in a template requires
>> some other prefetch? What about nested loops resulting in nested
>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>> multiple ways. Personally, I would argue that correctly setting up and
>> maintaining appropriate prefetches and selects is a necessary part of
>> working with an ORM.
>>
>> Do you know of any other ORMs which attempt similar magical
>> optimisations? How do they go about identifying the cases where it is
>> necessary?
>>
>> On 15 August 2017 at 10

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley
I didn't answer your questions directly. Sorry for the quoting but it's the
easiest way to deal with a half dozen questions.

> How would possible prefetches be identified?

Wherever we currently automatically fetch a foreign key value.

> What happens when an initial loop in a view requires one prefetch, but a
subsequent loop in a template requires some other prefetch?

They each do whatever prefetch they need, just as a human optimizing this
would add two prefetch clauses.

> What about nested loops resulting in nested prefetches?

Nested loops only really come up when you are dealing with RelatedManagers
which are outside the scope of this. Or did you have some other nested loop
case in mind?

>  I would argue that correctly setting up and maintaining appropriate
prefetches and selects is a necessary part of working with an ORM.

Having been lead engineer on a code base of ~100,000 lines with over 100
calls to prefetch_related and a lot of tests specifically for finding
missing ones I'd argue it's one of the worst aspects of working with
Djangos ORM at non trivial scale.

> Do you know of any other ORMs which attempt similar magical optimisations?


I don't, but unlike Django where I have years of experience I have next to
no experience with other ORM's.

Regards G

On Tue, Aug 15, 2017 at 7:44 PM, Gordon Wrigley 
wrote:

> Sorry maybe I wasn't clear enough about the proposed mechanism.
>
> Currently when you dereference a foreign key field on an object (so
> 'choice.question' in the examples above) if it doesn't have the value
> cached from an earlier access, prefetch_related or select_related then
> Django will automatically perform a db query to fetch it. After that the
> value will then be cached on the object for any future dereferences.
>
> This automatic fetching is the source the N+1 query problems and in my
> experience most gross performance problems in Django apps.
>
> The proposal essentially is to add a new queryset function that says for
> the group of objects fetched by this queryset, whenever one of these
> automatic foreign key queries happens on one of them instead of fetching
> the foreign key for just that one use the prefetch mechanism to fetch it
> for all of them.
> The presumption being that the vast majority of the time when you access a
> field on one object from a queryset result, probably you are going to
> access the same field on many of the others as well.
>
> The implementation I've used in production does nest across foreign keys
> so something (admittedly contrived) like:
> for choice in Choice.objects.all():
> print(choice.question.author)
> Will produce 3 queries, one for all choices, one for the questions of
> those choices and one for the authors of those questions.
>
> It's worth noting that because these are foreign keys in their "to one"
> direction each of those queryset results will be at most the same size (in
> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>
> I do not propose touching reverse foreign key or many2many fields as the
> generated queries could request substantially more rows from the DB than
> the original query and it's not at all clear how this mechanism would
> sanely interact with filtering etc. So this is purely about the forward
> direction of foreign keys.
>
> I hope that clarifies my thinking some.
>
> Regards
> G
>
> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
> wrote:
>
>> Hi Gordon,
>>
>> Thanks for the suggestion.
>>
>> I'm not a fan of adding a layer that tries to be this clever. How would
>> possible prefetches be identified? What happens when an initial loop in a
>> view requires one prefetch, but a subsequent loop in a template requires
>> some other prefetch? What about nested loops resulting in nested
>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>> multiple ways. Personally, I would argue that correctly setting up and
>> maintaining appropriate prefetches and selects is a necessary part of
>> working with an ORM.
>>
>> Do you know of any other ORMs which attempt similar magical
>> optimisations? How do they go about identifying the cases where it is
>> necessary?
>>
>> On 15 August 2017 at 10:44, Gordon Wrigley 
>> wrote:
>>
>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>> automatically doing prefetch_related where needed without the user having
>>> to request it.
>>>
>>> For context consider these three snippets using the Question & Choice
>>> models from the tutorial
>>> <https://docs.djangoproject.com/en/1.11/i

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley
In my current version each object keeps a reference to a WeakSet of the
results of the queryset it came from.
This is populated in _fetch_all and if it is populated then
ForwardManyToOneDescriptor does a prefetch across all the objects in the
WeakSet instead of it's regular fetching.

On Tue, Aug 15, 2017 at 8:03 PM, Collin Anderson 
wrote:

> Hi Gordon,
>
> How is it implemented? Does each object keep a reference to the queryset
> it came from?
>
> Collin
>
> On Tue, Aug 15, 2017 at 2:44 PM, Gordon Wrigley 
> wrote:
>
>> Sorry maybe I wasn't clear enough about the proposed mechanism.
>>
>> Currently when you dereference a foreign key field on an object (so
>> 'choice.question' in the examples above) if it doesn't have the value
>> cached from an earlier access, prefetch_related or select_related then
>> Django will automatically perform a db query to fetch it. After that the
>> value will then be cached on the object for any future dereferences.
>>
>> This automatic fetching is the source the N+1 query problems and in my
>> experience most gross performance problems in Django apps.
>>
>> The proposal essentially is to add a new queryset function that says for
>> the group of objects fetched by this queryset, whenever one of these
>> automatic foreign key queries happens on one of them instead of fetching
>> the foreign key for just that one use the prefetch mechanism to fetch it
>> for all of them.
>> The presumption being that the vast majority of the time when you access
>> a field on one object from a queryset result, probably you are going to
>> access the same field on many of the others as well.
>>
>> The implementation I've used in production does nest across foreign keys
>> so something (admittedly contrived) like:
>> for choice in Choice.objects.all():
>> print(choice.question.author)
>> Will produce 3 queries, one for all choices, one for the questions of
>> those choices and one for the authors of those questions.
>>
>> It's worth noting that because these are foreign keys in their "to one"
>> direction each of those queryset results will be at most the same size (in
>> rows) as the proceeding one and often (due to nulls and duplicates) smaller.
>>
>> I do not propose touching reverse foreign key or many2many fields as the
>> generated queries could request substantially more rows from the DB than
>> the original query and it's not at all clear how this mechanism would
>> sanely interact with filtering etc. So this is purely about the forward
>> direction of foreign keys.
>>
>> I hope that clarifies my thinking some.
>>
>> Regards
>> G
>>
>> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn 
>> wrote:
>>
>>> Hi Gordon,
>>>
>>> Thanks for the suggestion.
>>>
>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>> possible prefetches be identified? What happens when an initial loop in a
>>> view requires one prefetch, but a subsequent loop in a template requires
>>> some other prefetch? What about nested loops resulting in nested
>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>> multiple ways. Personally, I would argue that correctly setting up and
>>> maintaining appropriate prefetches and selects is a necessary part of
>>> working with an ORM.
>>>
>>> Do you know of any other ORMs which attempt similar magical
>>> optimisations? How do they go about identifying the cases where it is
>>> necessary?
>>>
>>> On 15 August 2017 at 10:44, Gordon Wrigley 
>>> wrote:
>>>
>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>> automatically doing prefetch_related where needed without the user having
>>>> to request it.
>>>>
>>>> For context consider these three snippets using the Question & Choice
>>>> models from the tutorial
>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>>> when
>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>
>>>> Default
>>>> for choice in Choice.objects.all():
>>>> print(choice.question.question_text, ':', choice.choice_text)
>>>> 501 db queries, fetches 500 choice rows and 500 question rows from the
>>>> DB
>>>>
>>>> Prefetch_related
>>>> for choice in Choice.objects.

Re: Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley
Sorry maybe I wasn't clear enough about the proposed mechanism.

Currently when you dereference a foreign key field on an object (so
'choice.question' in the examples above) if it doesn't have the value
cached from an earlier access, prefetch_related or select_related then
Django will automatically perform a db query to fetch it. After that the
value will then be cached on the object for any future dereferences.

This automatic fetching is the source the N+1 query problems and in my
experience most gross performance problems in Django apps.

The proposal essentially is to add a new queryset function that says for
the group of objects fetched by this queryset, whenever one of these
automatic foreign key queries happens on one of them instead of fetching
the foreign key for just that one use the prefetch mechanism to fetch it
for all of them.
The presumption being that the vast majority of the time when you access a
field on one object from a queryset result, probably you are going to
access the same field on many of the others as well.

The implementation I've used in production does nest across foreign keys so
something (admittedly contrived) like:
for choice in Choice.objects.all():
print(choice.question.author)
Will produce 3 queries, one for all choices, one for the questions of those
choices and one for the authors of those questions.

It's worth noting that because these are foreign keys in their "to one"
direction each of those queryset results will be at most the same size (in
rows) as the proceeding one and often (due to nulls and duplicates) smaller.

I do not propose touching reverse foreign key or many2many fields as the
generated queries could request substantially more rows from the DB than
the original query and it's not at all clear how this mechanism would
sanely interact with filtering etc. So this is purely about the forward
direction of foreign keys.

I hope that clarifies my thinking some.

Regards
G

On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn  wrote:

> Hi Gordon,
>
> Thanks for the suggestion.
>
> I'm not a fan of adding a layer that tries to be this clever. How would
> possible prefetches be identified? What happens when an initial loop in a
> view requires one prefetch, but a subsequent loop in a template requires
> some other prefetch? What about nested loops resulting in nested
> prefetches? Code like this is almost guaranteed to break unexpectedly in
> multiple ways. Personally, I would argue that correctly setting up and
> maintaining appropriate prefetches and selects is a necessary part of
> working with an ORM.
>
> Do you know of any other ORMs which attempt similar magical optimisations?
> How do they go about identifying the cases where it is necessary?
>
> On 15 August 2017 at 10:44, Gordon Wrigley 
> wrote:
>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>> For context consider these three snippets using the Question & Choice
>> models from the tutorial
>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>> when
>> there are 100 questions each with 5 choices for a total of 500 choices.
>>
>> Default
>> for choice in Choice.objects.all():
>> print(choice.question.question_text, ':', choice.choice_text)
>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>
>> Prefetch_related
>> for choice in Choice.objects.prefetch_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>
>> Select_related
>> for choice in Choice.objects.select_related('question'):
>> print(choice.question.question_text, ':', choice.choice_text)
>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>
>> I've included select_related for completeness, I'm not going to propose
>> changing anything about it's use. There are places where it is the best
>> choice and in those places it will still be up to the user to request it. I
>> will note that anywhere select_related is optimal prefetch_related is still
>> better than the default and leave it at that.
>>
>> The 'Default' example above is a classic example of the N+1 query
>> problem, a problem that is widespread in Django apps.
>> This pattern of queries is what new users produce because they don't know
>> enough about the database and / or ORM to do otherwise.
>> Experieced users will also often produce this because it's no

Automatic prefetching in querysets

2017-08-15 Thread Gordon Wrigley
I'd like to discuss automatic prefetching in querysets. Specifically 
automatically doing prefetch_related where needed without the user having 
to request it.

For context consider these three snippets using the Question & Choice 
models from the tutorial 
 when 
there are 100 questions each with 5 choices for a total of 500 choices.

Default
for choice in Choice.objects.all():
print(choice.question.question_text, ':', choice.choice_text)
501 db queries, fetches 500 choice rows and 500 question rows from the DB

Prefetch_related
for choice in Choice.objects.prefetch_related('question'):
print(choice.question.question_text, ':', choice.choice_text)
2 db queries, fetches 500 choice rows and 100 question rows from the DB

Select_related
for choice in Choice.objects.select_related('question'):
print(choice.question.question_text, ':', choice.choice_text)
1 db query, fetches 500 choice rows and 500 question rows from the DB

I've included select_related for completeness, I'm not going to propose 
changing anything about it's use. There are places where it is the best 
choice and in those places it will still be up to the user to request it. I 
will note that anywhere select_related is optimal prefetch_related is still 
better than the default and leave it at that.

The 'Default' example above is a classic example of the N+1 query problem, 
a problem that is widespread in Django apps.
This pattern of queries is what new users produce because they don't know 
enough about the database and / or ORM to do otherwise.
Experieced users will also often produce this because it's not always 
obvious what fields will and won't be used and subsequently what should be 
prefetched.
Additionally that list will change over time. A small change to a template 
to display an extra field can result in a denial of service on your DB due 
to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming and error prone. 
Tools like django-perf-rec  
(which I was involved in creating) and nplusone 
 exist in part to flag missing 
prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin will also 
produce queries like this because it's very difficult for them to know what 
needs prefetching without being explicitly told by an experienced user.

As hinted at the top I'd like to propose changing Django so the default 
code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour but obviously it 
needs to be proved first so for now I'd suggest a new queryset function 
that enables this behaviour.

I have a proof of concept of this mechanism that I've used successfully in 
production. I'm not posting it yet because I'd like to focus on desired 
behavior rather than implementation details. But in summary, what it does 
is when accessing a missing field on a model, rather than fetching it just 
for that instance, it runs a prefetch_related query to fetch it for all 
peer instances that were fetched in the same queryset. So in the example 
above it prefetches all Questions in one query.

This might seem like a risky thing to do but I'd argue that it really isn't.
The only time this isn't superior to the default case is when you are post 
filtering the queryset results in Python.
Even in that case it's only inferior if you started with a large number of 
results, filtered basically all of them and the code is structured so that 
the filtered ones aren't garbage collected.
To cover this rare case the automatic prefetching can easily be disabled on 
a per queryset or per object basis. Leaving us with a rare downside that 
can easily be manually resolved in exchange for a significant general 
improvement.

In practice this thing is almost magical to work with. Unless you already 
have extensive and tightly maintained prefetches everywhere you get an 
immediate boost to virtually everything that touches the database, often 
knocking orders of magnitude off page load times.

If an agreement can be reached on pursuing this then I'm happy to put in 
the work to productize the proof of concept.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.