#16937: Prefetch related objects
-------------------------------------+-------------------------------------
     Reporter:  lukeplant            |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  1.3
  (models, ORM)                      |               Resolution:
     Severity:  Normal               |             Triage Stage:  Accepted
     Keywords:                       |      Needs documentation:  0
    Has patch:  0                    |  Patch needs improvement:  1
  Needs tests:  0                    |                    UI/UX:  0
Easy pickings:  0                    |
-------------------------------------+-------------------------------------

Comment (by akaariai):

 Replying to [comment:14 lukeplant]:
 > It is basically following the existing code in `get_query_set()`, which
 may be incorrect itself, but is also sending using instances that are not
 of the same type as the model given.  The docs suggest this might be OK:
 >
 > https://docs.djangoproject.com/en/dev/topics/db/multi-db/#topics-db-
 multi-db-hints
 >
 > However, since there are many instances here, it is probably better not
 to send the 'instance' hint at all. Thank you for forcing the issue.
 >
 > I don't think we should add multi-db tests unless there are specific
 multi-db issues we need to address. I also don't use multi-db at all, so I
 would be stabbing in the dark trying to find issues there.

 I think somebody who uses the multi-db feature should comment about this.
 Two people stabbing in the dark is not better than one people doing that.
 Come to think of it, it might be worse... :)

 The first instance as a hint is probably somewhat useful as you can get
 the db used in the query from that. I guess you want to get that from
 somewhere.

 >> [About too-may-parameters in SQL].
 > I think we can punt on this one. There are many cases where you can end
 up with more than 1000 parameters in a single query. If your DB can't do
 that, and you need to, don't use prefetch_related or get a better DB. If
 there is the demand, it should be possible to add a fix for this as an
 enhancement.

 Agreed. I checked other core supported databases, and there doesn't seem
 to be any practical limits.

 > >  - I don't like the usage of queryset len() to force the evaluation of
 the queryset. Better to have explicit method (.evaluate()) or something)
 and call it than rely on the implicit evaluation when len() is called.
 Maybe another ticket for this?
 >
 > The internals of `QuerySet` at this point are very delicate, and
 optimized. `__iter__`, `__nonzero__`, `__len__` etc all depend heavily on
 the internals of each other in order to achieve the optimization required,
 and we have tests that ensure things are as we expect. I am happy to use
 the same method here, and I don't think that an `evaluate()` method would
 actually add clarity while preserving the optimisations we've done,
 because it is not as simple as 'evaluate' - what I mean is 'fill the
 internal cache', and I know that `len()` has exactly that behaviour.

 OK, this is then for sure food for another ticket, if even for that. I
 have to add this to my list of queryset annoyances, it already contains
 .clone() that does a lot more than cloning...

 > >  - The current method of select_matching_instances can be slow. The
 method used here is effectively a nested loop, while using a dict would be
 considerably faster in many cases. The current method is O(nm) while dict
 would give O(n + m) but with higer memory cost. Currently fetching 1000
 objects and prefetching 5 related objects for each takes 3 seconds on my
 laptop. Just fetching the objects in two different querysets takes 0.2
 seconds. So the joining takes almost 2.8 seconds. In the end of this post
 is a profile of this run.
 >
 > Nice catch, and thanks for the figures, I'll see what I can do about
 this, unless you want to have a go.

 I can take a stab at this, I have done the hash-join-in-python too many
 times already...

 > >  - In models/query.py around line 1646 there is this snippet:
 > >    {{{
 > >       for obj in instances:
 > >           qs = getattr(obj, attname).all()
 > >    }}}
 > >    I am afraid this is somewhat expensive, as this will result in the
 following code executed for each
 > >    obj:
 > >    {{{
 > >        return super(RelatedManager,
 self).get_query_set().using(db).filter(**(self.core_filters))
 > >    }}}
 > >    That is one QuerySet construction and two clones for each object.
 This is one good reason not to use
 > >    relatedmanager.all() for prefetch cache... :)
 >
 >
 > The code is exactly the same as what you would otherwise execute - you
 will be doing `[foo.bars.all() for foo in foos]` at some point (otherwise
 why use prefetch_related?), and here we are just doing that up front. The
 next time you call the `all()` you get the pre-created version. So I don't
 think trying to optimise here is necessary. If we find a shortcut which
 doesn't significantly reduce the clarity, fine, but it can wait.

 True. But this whole ticket is about optimizing the mentioned code path.
 On the other hand, the queryset creation takes about as much time as model
 initialization in the attached profile. Making queryset clone faster would
 be the key here, and I might have an idea or two about making that
 faster... In any case, this can be handled later if the need arises.

 I will try to get the hash-join patch done in a hour or two.

 IMHO it would be good to make a summary of the API extensibility
 discussion to django-developers and leave it there for a couple of days so
 that others can have a say. I think the current resolution is good enough,
 but in this case more eyes means better results.

 Otherwise, close to commit ready :)

-- 
Ticket URL: <https://code.djangoproject.com/ticket/16937#comment:15>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to django-updates@googlegroups.com.
To unsubscribe from this group, send email to 
django-updates+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.

Reply via email to