#28586: Automatically prefetch related for "to one" fields as needed.
-------------------------------------------------------+-------------------------------------
               Reporter:  Gordon Wrigley               |          Owner:  nobody
                   Type:  New feature                  |         Status:  new
              Component:  Database layer (models, ORM) |        Version:  master
               Severity:  Normal                       |       Keywords:  prefetch_related
           Triage Stage:  Unreviewed                   |      Has patch:  0
    Needs documentation:  0                            |    Needs tests:  0
Patch needs improvement:  0                            |  Easy pickings:  0
                  UI/UX:  0                            |
-------------------------------------------------------+-------------------------------------
 When accessing a to-one field (a foreign key in the forward direction, or
 a one-to-one in either direction) on a model instance, if the field's value
 has not yet been loaded then Django should prefetch the field for all model
 instances loaded by the same queryset as the current model instance.

 There has been some discussion of this on the mailing list
 https://groups.google.com/forum/#!topic/django-developers/EplZGj-ejvg

 Currently, when accessing an uncached to-one field, Django automatically
 fetches the missing value from the database. When this occurs in a loop it
 creates a 1+N query problem. Consider the following snippet:

 {{{#!python
 for choice in Choice.objects.all():
     print(choice.question.question_text, ':', choice.choice_text)
 }}}

 This will do one query for the choices and then one query per choice to
 get that choice's question.
 This behavior can be avoided with correct application of prefetch_related
 like this:

 {{{#!python
 for choice in Choice.objects.prefetch_related('question'):
     print(choice.question.question_text, ':', choice.choice_text)
 }}}
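
 To make the query counts concrete, here is a toy model of the two snippets
 in plain Python. It does not use Django: run_query, QUESTIONS, CHOICES and
 the two listing functions are all hypothetical stand-ins, and each
 run_query call represents one database round trip.

```python
# Toy stand-in for a database (not Django): each run_query() call is one round trip.
QUESTIONS = {1: "Favourite colour?", 2: "Favourite food?"}
# Rows of (choice_text, question_id).
CHOICES = [("Red", 1), ("Blue", 1), ("Pizza", 2)]

query_log = []


def run_query(description, result):
    """Record the query and hand back its pre-computed result."""
    query_log.append(description)
    return result


def lazy_listing():
    """Like the first snippet: look up each question when it is accessed."""
    rows = run_query("choices", list(CHOICES))
    lines = []
    for text, qid in rows:
        question = run_query("question %d" % qid, QUESTIONS[qid])
        lines.append("%s : %s" % (question, text))
    return lines


def batched_listing():
    """Like the prefetch_related snippet: one extra query for all questions."""
    rows = run_query("choices", list(CHOICES))
    wanted = {qid for _text, qid in rows}
    questions = run_query(
        "questions in %s" % sorted(wanted),
        {qid: QUESTIONS[qid] for qid in wanted},
    )
    return ["%s : %s" % (questions[qid], text) for text, qid in rows]


def count_queries(func):
    """Run func and report how many queries it issued, plus its result."""
    query_log.clear()
    result = func()
    return len(query_log), result


lazy_count, lazy_lines = count_queries(lazy_listing)
batched_count, batched_lines = count_queries(batched_listing)
print(lazy_count, batched_count)
```

 Both functions produce the same lines; only the number of round trips
 differs, and the gap grows with the number of choices.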

 This has several usability issues, notably:
 * Less experienced users are generally not aware that it's necessary.
 * Seemingly cosmetic changes to things like templates can change the fields
 that should be prefetched.
 * Relatedly, the code that requires the prefetch_related (a template, for
 example) may be quite removed from where the prefetch_related needs to be
 applied (a view, for example).
 * Subsequently, finding where prefetch_related calls are missing is
 non-trivial and needs to be done on an ongoing basis.
 * Excess fields in prefetch_related calls are even harder to find and
 result in unnecessary database queries.
 * It is very difficult for libraries like the admin and Django Rest
 Framework to automatically generate correct prefetch_related clauses.

 The proposal is that on the first iteration of the loop in the example
 above, when we first access a choice's question field, instead of fetching
 the question for just that choice, Django speculatively fetches the
 questions for all the choices returned by the queryset.
 This change results in the first snippet having the same database behavior
 as the second while reducing or eliminating all of the noted usability
 issues.
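
 One way the proposed behavior could work is sketched below in plain Python.
 This is only an illustration under assumptions, not the actual patch or any
 Django API: the Choice class, load_choices, fetch_questions and the idea of
 tracking peers in a weakref.WeakSet are all hypothetical.

```python
import weakref

QUESTIONS = {1: "Favourite colour?", 2: "Favourite food?"}
query_log = []


def fetch_questions(question_ids):
    """Pretend to run one SELECT ... WHERE id IN (...) query."""
    ids = sorted(set(question_ids))
    query_log.append(ids)
    return {qid: QUESTIONS[qid] for qid in ids}


class Choice:
    """Hypothetical stand-in for a model instance with a to-one field."""

    def __init__(self, choice_text, question_id):
        self.choice_text = choice_text
        self.question_id = question_id
        self._question = None  # cache for the related value
        self._peers = None     # filled in by load_choices()

    @property
    def question(self):
        if self._question is None:
            # Cache miss: fetch for every live peer that is still uncached.
            peers = [p for p in (self._peers or [self]) if p._question is None]
            found = fetch_questions(p.question_id for p in peers)
            for peer in peers:
                peer._question = found[peer.question_id]
        return self._question


def load_choices(rows):
    """Stand-in for evaluating a queryset: instances learn who their peers are."""
    choices = [Choice(text, qid) for text, qid in rows]
    peers = weakref.WeakSet(choices)  # weak, so peers can still be collected
    for choice in choices:
        choice._peers = peers
    return choices


choices = load_choices([("Red", 1), ("Blue", 1), ("Pizza", 2)])
lines = ["%s : %s" % (c.question, c.choice_text) for c in choices]
print(len(query_log))  # number of question lookups issued by the loop
```

 The first access to question fills the cache of every peer, so the
 remaining iterations of the loop never reach fetch_questions again.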

 Some important points:
 * To-many fields are not changed at all by this proposal, as I can't think
 of a reasonable way of deciding which of the many to fetch.
 * Because these are to-one fields, the generated queries can't have more
 result rows than the original query and may have fewer.
 * This feature will never result in more database queries.
 * It will not change the DB behavior of code which is fully covered by
 prefetch_related (and select_related) calls at all.
 * This will inherently chain across relations like choice.question.author;
 the conditions above still hold under such chaining.
 * It may result in larger data transfer between the database and Django in
 some situations.

 On that last point an example would be this:
 {{{#!python
 qs = Choice.objects.all()
 list(qs)[0].question
 }}}
 Such examples generally seem to be rarer and more likely to be visible
 during code inspection (vs {{choice.question}} in a template). And larger
 queries are usually a better failure mode than producing hundreds of
 queries.
 For this to actually produce inferior behavior in practice you need to:
 a. fetch a large number of choices
 b. filter out basically all of them
 c. in a way that prevents garbage collection of the unfiltered ones
 If any of those aren't true then automatic prefetching will still produce
 equivalent or better database behavior than without.
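
 Condition (c) can be illustrated with a small sketch. It assumes the peers
 of an instance are tracked through weak references, which is an assumption
 about the implementation rather than something stated above; Row and
 ids_to_prefetch are hypothetical names, and no Django code is involved.

```python
import gc
import weakref


class Row:
    """Hypothetical stand-in for a model instance loaded by a queryset."""

    def __init__(self, question_id):
        self.question_id = question_id


def ids_to_prefetch(peers):
    """Question ids an automatic prefetch would request for the live peers."""
    return sorted({row.question_id for row in peers})


rows = [Row(qid) for qid in (1, 1, 2, 3)]
peers = weakref.WeakSet(rows)

# Case 1: every row is still referenced, so the prefetch covers all of them.
all_alive = ids_to_prefetch(peers)

# Case 2: keep only the first row, as in list(qs)[0]; the rest become garbage.
first = rows[0]
del rows
gc.collect()  # safeguard; CPython's reference counting frees the rows already
after_discard = ids_to_prefetch(peers)

print(all_alive, after_discard)
```

 Under this assumption the extra data transfer only happens while the
 discarded rows are still held alive by something else.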

 Several opt-in/opt-out options were discussed on the mailing list; I will
 attempt to summarize them below. Most of them are compatible with each
 other, but in the interests of having a clean interface we probably
 want to limit how many we implement.
 1. A global option in settings. So as to not accidentally fix existing
 code this could default to disabled if not specified.
 2. Per queryset either as auto_prefetch_related(value) or
 prefetch_related(auto=value) where value would determine enabled,
 disabled, default.
 3. Per object, similar to the per queryset version.
 4. Per model in Meta; it's not clear whether this was intended to be on:
   a. the model used in the original queryset
   b. the model the field is on
   c. the model the field refers to
 5. As a context manager (this could then easily be applied in middleware
 or a view decorator)
 6. On the field, similar to on_delete
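
 As a rough illustration of how options 1 and 5 might combine, here is a
 sketch of a context manager layered over a global default. Nothing here is
 an existing Django API: AUTO_PREFETCH_DEFAULT, auto_prefetch and
 auto_prefetch_enabled are hypothetical names, and a real implementation
 would read the default from settings.

```python
import contextlib
import contextvars

# Hypothetical global default, standing in for a settings flag (option 1).
AUTO_PREFETCH_DEFAULT = False

# None means "no override active, fall back to the global default".
_auto_prefetch = contextvars.ContextVar("auto_prefetch", default=None)


def auto_prefetch_enabled():
    """What the ORM would consult before deciding to prefetch for peers."""
    override = _auto_prefetch.get()
    return AUTO_PREFETCH_DEFAULT if override is None else override


@contextlib.contextmanager
def auto_prefetch(enabled=True):
    """Enable or disable automatic prefetching for the enclosed block (option 5)."""
    token = _auto_prefetch.set(enabled)
    try:
        yield
    finally:
        _auto_prefetch.reset(token)  # restore whatever was active before


states = [auto_prefetch_enabled()]
with auto_prefetch():
    states.append(auto_prefetch_enabled())
    with auto_prefetch(False):
        states.append(auto_prefetch_enabled())
    states.append(auto_prefetch_enabled())
states.append(auto_prefetch_enabled())
print(states)
```

 Objects created by contextlib.contextmanager can also be used as
 decorators, so the same helper could wrap a view function, and a
 middleware could enter it around each request.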


 P.S. I've been using this in my own code with no opt-in / opt-out for
 some time and have had literally no problems with it.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/28586>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
