#26530: Batch operations on large querysets
-------------------------------------+-------------------------------------
     Reporter:  mjtamlyn             |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by akaariai):

If the idea is to do something for each object, then
{{{
for obj in qs.iterator():
    obj.do_something()
}}}
should give you much better memory efficiency. Of course, if you are using PostgreSQL, the driver will still fetch all the rows into memory.

A very good approach would be to finally tackle the named cursors issue. Then you could just do:
{{{
for obj in qs.iterator(cursor_size=100):
    obj.do_something()
}}}
and be done with it. The problem with the named cursor approach is that some databases have limitations, more or less hard to overcome, on what can be done with the cursor, how transactions work, and so on.

If you really want batches of objects, then we probably need to use the pointer approach. Otherwise, iterating through a large queryset will end up doing queries like `select * from the_query offset 100000 limit 100`, which is very inefficient, and concurrent modifications could end up putting the same object in multiple batches.

I'm mildly in favor of adding this, as the addition to the API surface isn't large, and there are a lot of ways to implement the batching in mildly wrong ways.

If we are going for this, then I think the API should be the `for batch in qs.batch(size=100)` one. The queryset should be ordered in such a way that the primary key is the only sort criterion. We can change that later so that the primary key or some other unique key is a postfix of the order by, but that is a bit harder to do.

--
Ticket URL: <https://code.djangoproject.com/ticket/26530#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
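
For illustration only, here is a minimal sketch of what the "pointer approach" in the comment could look like, read here as keyset pagination on the primary key (that reading is an assumption). It uses the standard-library `sqlite3` module rather than the Django ORM so it runs on its own. The `item` table and the `iter_batches` and `make_demo_db` helpers are hypothetical names, not part of Django or of any proposed API. Each query resumes from the last primary key seen instead of using `OFFSET`, so the database does not have to skip over earlier rows, and a row cannot show up in two batches even if other rows are inserted or deleted between queries.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def iter_batches(conn: sqlite3.Connection, size: int = 100) -> Iterator[List[Row]]:
    """Yield lists of (id, name) rows from the hypothetical `item` table.

    Rows are ordered by primary key only, and each query starts strictly
    after the last primary key already returned (the "pointer").
    """
    last_pk = 0  # assumption: primary keys are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Advance the pointer to the last primary key of this batch.
        last_pk = rows[-1][0]


def make_demo_db(n: int) -> sqlite3.Connection:
    """Create an in-memory database with n rows, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO item (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, n + 1)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db(250)
    for batch in iter_batches(demo, size=100):
        print(len(batch), batch[0][0], batch[-1][0])
```

With the Django ORM, the equivalent idea would presumably be to order by `pk`, filter on `pk__gt=last_pk`, and slice to the batch size; how a `qs.batch()` method would actually be implemented is not settled in this ticket.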