#26530: Batch operations on large querysets
-------------------------------------+-------------------------------------
     Reporter:  mjtamlyn             |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by akaariai):

 If the idea is to do something for each object, then
 {{{
 for obj in qs.iterator():
     obj.do_something()
 }}}
 should give you much better memory efficiency. Of course, if using
 PostgreSQL, the default client-side cursor in psycopg2 will still fetch
 all the rows into memory.

 A very good approach would be to finally tackle the named cursors issue.
 Then you could just do:
 {{{
 for obj in qs.iterator(cursor_size=100):
     obj.do_something()
 }}}
 and be done with it. The problem with the named cursor approach is that
 some databases have hard-to-overcome limitations on what can be done with
 the cursor, how it interacts with transactions, and so on.

 If you really want batches of objects, then we probably need to use the
 keyset ("pointer") approach. Otherwise iterating through a large queryset
 will end up doing queries like `select * from the_query offset 100000
 limit 100`, which is very inefficient, and concurrent modifications could
 end up introducing the same object in multiple batches.
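
 The pointer approach keeps a predicate on the primary key instead of an
 offset, so each query starts where the previous one ended. A sketch in
 plain SQL via sqlite3 (the `item` table and `batches_by_pk` helper are
 illustrative; a real implementation would build the equivalent queryset
 filters):

```python
import sqlite3

def batches_by_pk(conn, size=100):
    """Yield lists of rows using a keyset predicate (id > last seen id)
    instead of OFFSET, so each query scans only the rows it returns."""
    last_pk = 0  # assumes positive integer primary keys
    while True:
        rows = conn.execute(
            "SELECT id FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_pk = rows[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)",
                 [(i,) for i in range(1, 251)])
batches = list(batches_by_pk(conn, size=100))
```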

 I'm mildly in favor of adding this: the addition to the API surface isn't
 large, and there are a lot of ways to implement the batching in subtly
 wrong ways.

 If we are going for this, then I think the API should be the `for batch in
 qs.batch(size=100)` one. The queryset should be ordered in such a way that
 the primary key is the only sorting criterion. We can change that later so
 that the primary key or some other unique key is a suffix of the `ORDER
 BY`, but that is a bit harder to do.
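
 At the ORM level, a `batch()` built on the pointer approach might look
 roughly like this (hypothetical helper, not actual Django API; it forces
 pk-only ordering so the keyset filter is correct, and is a sketch rather
 than runnable code since it needs a configured Django project):
 {{{
 def batch(qs, size=100):
     # Hypothetical helper; forces pk ordering so that pk__gt
     # resumes exactly where the previous batch ended.
     qs = qs.order_by('pk')
     last_pk = None
     while True:
         page = qs if last_pk is None else qs.filter(pk__gt=last_pk)
         objs = list(page[:size])
         if not objs:
             return
         yield objs
         last_pk = objs[-1].pk
 }}}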

--
Ticket URL: <https://code.djangoproject.com/ticket/26530#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.