#18702: Remove chunked reads from iter(qs)
-------------------------------------+-------------------------------------
               Reporter:  akaariai   |          Owner:  nobody
                   Type:             |         Status:  new
  Cleanup/optimization               |        Version:  1.4
              Component:  Database   |       Keywords:
  layer (models, ORM)                |      Has patch:  1
               Severity:  Normal     |    Needs tests:  0
           Triage Stage:  Design     |  Easy pickings:  0
  decision needed                    |
    Needs documentation:  0          |
Patch needs improvement:  0          |
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 The queryset iterator protocol converts rows to objects lazily as the
 queryset is iterated. This has two advantages:
   1. If one iterates just part of the queryset, there is no need to do
 model conversion for all objects.
   2. Again, if one iterates just part of the qs, some backends allow
 fetching just part of the rows from the DB (Oracle, for example).

 However, there are some costs, too:
   1. Complexity in the `__iter__` -> _result_iter -> (_results_cache,
 _iter) -> _iterator implementation.
   2. The lazy fetching costs around 5-10% in performance in the case of
 "for obj in qs.all()" (1000 objs, 2 fields). For values_list('id') the
 cost is around 30%.
   3. The current implementation silently discards some exceptions when
 doing list(qs). This is especially annoying when debugging Django core
 code.
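
 To make the trade-off above concrete, here is a minimal toy model, not
 Django's actual code. The class names ChunkedResults, EagerResults and
 CountingConverter are hypothetical, and the chunk size of 100 is an
 assumption made for this sketch. It only counts how many rows get
 converted to objects under each strategy.

```python
from itertools import islice

CHUNK_SIZE = 100  # assumed chunk size for this sketch


class CountingConverter:
    """Hypothetical row -> object conversion that counts its own calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return {"id": row}


class ChunkedResults:
    """Toy queryset that fills its result cache lazily, one chunk at a time."""

    def __init__(self, rows, convert):
        self._rows = iter(rows)    # stands in for a database cursor
        self._convert = convert
        self._cache = []           # objects converted so far
        self._exhausted = False

    def _fill_cache(self):
        # Convert at most CHUNK_SIZE further rows and append them to the cache.
        chunk = [self._convert(row) for row in islice(self._rows, CHUNK_SIZE)]
        if len(chunk) < CHUNK_SIZE:
            self._exhausted = True
        self._cache.extend(chunk)

    def __iter__(self):
        pos = 0
        while True:
            # Serve whatever is already cached before touching the cursor.
            while pos < len(self._cache):
                yield self._cache[pos]
                pos += 1
            if self._exhausted:
                return
            self._fill_cache()


class EagerResults:
    """Toy queryset that converts every row on first iteration."""

    def __init__(self, rows, convert):
        self._rows = rows
        self._convert = convert
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            self._cache = [self._convert(row) for row in self._rows]
        return iter(self._cache)
```

 In this model, breaking out of a loop over ChunkedResults after the first
 object converts one chunk of rows, while the same loop over EagerResults
 converts all of them. The price is the extra state and control flow in
 ChunkedResults.__iter__ compared to the three-line eager version.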

 My take is that we are currently optimizing for the wrong case: the one
 where one wants to consume a queryset only partially, but can't use the
 .iterator() method. That case would be something like:
 {{{
     for obj in qs:
         if somecond:
             break
     # Now, another loop for the same queryset!
     for obj in qs:
         if someothercond:
             break
 }}}
 If there is no second loop, it is possible to use .iterator(). If one of
 the above loops consumes a major portion of the qs, then there is no
 benefit in doing partial object conversion.
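
 The reasoning above can be checked with a rough cost model. This is only
 a sketch under stated assumptions: the function name conversions is
 hypothetical, the chunk size of 100 is assumed, and the only cost counted
 is the number of rows converted to objects (the extra query that each
 .iterator() call issues is ignored).

```python
import math


def conversions(break_points, total_rows, chunk_size=100):
    """Rows converted to objects under each strategy (toy cost model).

    break_points lists, for each loop over the same queryset, how many
    objects that loop consumes before it breaks.
    """
    deepest = max(break_points)
    return {
        # Shared cache filled chunk by chunk, up to the deepest loop.
        "chunked_cache": min(total_rows,
                             math.ceil(deepest / chunk_size) * chunk_size),
        # All rows converted up front on first iteration.
        "full_fetch": total_rows,
        # A fresh one-shot iterator per loop, nothing cached between loops.
        "iterator_per_loop": sum(break_points),
    }


# Both loops break early: the chunked cache converts far fewer rows.
early = conversions([5, 20], total_rows=1000)

# One loop consumes most of the queryset: the advantage disappears.
deep = conversions([5, 950], total_rows=1000)
```

 Under these assumptions the chunked cache only pays off when every loop
 breaks early; as soon as one loop goes deep, it converts as many rows as
 a full fetch does.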

 The question is whether there are common patterns where the current
 implementation is worth the code complexity and the performance loss in
 the common cases.

 I will leave this as DDN, as this change is obviously something that needs
 to be considered carefully.

 There is a branch implementing the removal of chunked reads at:
 https://github.com/akaariai/django/compare/non_chunked_reads

-- 
Ticket URL: <https://code.djangoproject.com/ticket/18702>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
