#16937: Prefetch related objects -------------------------------------+------------------------------------- Reporter: lukeplant | Owner: nobody Type: New feature | Status: new Component: Database layer | Version: 1.3 (models, ORM) | Resolution: Severity: Normal | Triage Stage: Accepted Keywords: | Needs documentation: 0 Has patch: 0 | Patch needs improvement: 0 Needs tests: 0 | UI/UX: 0 Easy pickings: 0 | -------------------------------------+-------------------------------------
Comment (by akaariai): So the API might look something like this in the future: {{{ Books.objects.prefetch_related( Opts('young_readers', path='books__read_by', qs=filtering_qs, collapse=True), 'another_related' ) # or with tuple / namedtuple instead of Opts. The kwarg names are just placeholders. }}} I can live with that :) Here is the review, there is a lot of stuff here, but most (all?) of these are issues that do not need to be dealt with at the moment. They can be resolved in later tickets. I will mark "patch needs improvement" but feel free to just unset it if you feel that these issues are better dealt in later tickets. - About db_for_read usage: Is this usage correct? {{{ [contrib/contentypes/generic.py:278 and db/models/fields/related.py:446] db = self._db or router.db_for_read(self.model, instance=instances[0]) }}} Two possible problems: 1. the model and instance are of different classes, I don't know, maybe this is OK? 2. Only the first instance of the list is given as a hint, is this OK? I don't know multidb too well, so it might be this is OK. - Related to above: no multidb tests. - Attached is a failing tests - it is related to having more than 1000 parameters in single SQL query. I think this should be fixed, but this can maybe be done in a later commit. - The patch assumes relation to the PK field of the model in m2m cases - this is not necessary the case. Attached is a failing test related to this and a successful test in foreign key case. - I don't understand what the following comment means, and there are some typos in it: {{{ [modeltests/prefetch_related:141-146] The second prefetch_related lookup is a reverse foreign key. This means the query for it can only be something like "first_time_authors__pk__in = [...]" and cannot return more rows then the number of Author objects in the database. This means that these objects will be reused (since in our data we've arranged for there len(authors1) > Author.objects.count()) }}} - I don't like the usage of queryset len() to force the evaluation of the queryset. Better to have explicit method (.evaluate()) or something) and call it than rely on the implicit evaluation when len() is called. Maybe another ticket for this? - The current method of select_matching_instances can be slow. The method used here is effectively a nested loop, while using a dict would be considerably faster in many cases. The current method is O(nm) while dict would give O(n + m) but with higer memory cost. Currently fetching 1000 objects and prefetching 5 related objects for each takes 3 seconds on my laptop. Just fetching the objects in two different querysets takes 0.2 seconds. So the joining takes almost 2.8 seconds. In the end of this post is a profile of this run. - In models/query.py prefetch_related_objects there is this loop: {{{ i = 0 while i < len(fields): field = fields[i] }}} Am I missing something, or wouldn't "for field in fields" work here? - Same method as above: Is it good enough just to break here, or should the code raise an exception (not the AttributeError but something more helpful): {{{ good_objects = True for obj in obj_list: if not hasattr(obj, '_prefetched_objects_cache'): try: obj._prefetched_objects_cache = {} except AttributeError: # Must be in a QuerySet subclass that is not returning # Model instances, either in Django or 3rd # party. prefetch_related() doesn't make sense, so quit # now. good_objects = False break if not good_objects: break }}} Is it really necessary to loop through the whole obj_list? One would assume that they are homogenous, so checking just obj_list[0] would be enough? - In models/query.py around line 1646 there is this snippet: {{{ for obj in instances: qs = getattr(obj, attname).all() }}} I am afraid this is somewhat expensive, as this will result in the following code executed for each obj: {{{ return super(RelatedManager, self).get_query_set().using(db).filter(**(self.core_filters)) }}} That is one QuerySet construction and two clones for each object. This is one good reason not to use relatedmanager.all() for prefetch cache... :) - Some tests for multi-table inheritance added in the attachment. - I have only skimmed the domcumentation. As noted in the beginning most of these issues are not blockers but things that might need work later. IMHO the code is high-quality, there are good tests and the documentation looks sane. Even if there are a lot of nitpicks above I think this is very close to be ready for commit. The promised profile: {{{ 1 0.055 0.055 22.647 22.647 speedtester.py:1(<module>) 55540/30018 0.091 0.000 21.001 0.001 {len} 1 0.000 0.000 20.967 20.967 transaction.py:206(inner) 1 0.011 0.011 20.966 20.966 speedtester.py:24(managed) 2/1 0.000 0.000 20.949 20.949 query.py:90(__iter__) 3/2 0.002 0.001 20.949 10.474 query.py:75(__len__) 1 0.000 0.000 20.824 20.824 query.py:545(_prefetch_related_objects) 1 0.002 0.002 20.824 20.824 query.py:1534(prefetch_related_objects) 1 0.026 0.026 20.819 20.819 query.py:1627(_prefetch_one_level) 999 9.552 0.010 18.146 0.018 related.py:451(select_matching_instances) 5011132/5010067 8.638 0.000 8.696 0.000 {getattr} 999 0.021 0.000 2.016 0.002 related.py:457(all) 999 0.005 0.000 1.992 0.002 manager.py:115(all) 999 0.028 0.000 1.987 0.002 related.py:434(get_query_set) 11988 0.525 0.000 1.554 0.000 base.py:279(__init__) 2005 0.027 0.000 1.392 0.001 query.py:837(_clone) 2005 0.111 0.000 1.349 0.001 query.py:233(clone) }}} -- Ticket URL: <https://code.djangoproject.com/ticket/16937#comment:12> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To post to this group, send email to django-updates@googlegroups.com. To unsubscribe from this group, send email to django-updates+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-updates?hl=en.