I've been working on a bulk_update method for a project to allow saving many instances of a Model in a single call/query, an analog to the existing bulk_create method. There seems to be some interest in this as evidenced by a library django-bulk-update <https://github.com/aykut/django-bulk-update> (which isn't working for me in tests with python3 and SQLite) and a recent ticket <https://code.djangoproject.com/ticket/23646>.
The ticket was closed as wontfix with an invitation for discussion, although the author doesn't seem to have pursued it. However, Shai Berger did comment positively on the ticket and proposed a different API, similar to the one in django-bulk-update and my own prototype: Book.objects.update_many(books, 'price') My own attempt was on Django 1.7 using a subclass of ExpressionNode passed as a value into UpdateQuery.add_update_values which allows Django to generate most of the UPDATE SQL, while the expression subclass was only responsible for generating an SQL CASE expression. This quickly generalized into a Case expression class which could be passed as a value into QuerySet.update. But now that Query Expressions have been merged, I've done a quick port and the Case expression now works with updates, annotations and aggregates, and is used to power my bulk_update method. A simple performance test shows that this works quite well. I tested the following methods of updating many database rows, using an in memory SQLite database : # loop_save for o in objects: o.save(update_fields=['field']) # bulk_update MyModel.objects.bulk_update(objects, update_fields=['field']) # many_updates q = Q(condition__lt=test_value) MyModel.objects.filter(q).update(field=value1) MyModel.objects.filter(~q).update(field=value2) # case_update q = Q(condition__lt=test_value) MyModel.objects.update(field=Case([(q, value1)], default=value2, output_field=MyModel._meta.get_field('field'))) With 10000 objects in the database, and updating all of them I got these run times using clock(): loop_save: 3.022845626653664 bulk_update: 0.1785595393377175 many_updates: 0.009768530320993563 case_update: 0.009343744353718986 We can see that bulk_update outperforms looping by an order of magnitude, despite working by generating a CASE expression with a WHEN clause for every object conditioned on its pk. Using Case in an update has no significant performance improvement, over running multiple queries. It may arguably allow for more readable code when using many conditions since the semantics of case are that conditions are evaluated in order, like a Python if ... elif ... else. Compare this: MyModel.objects.filter(Q(condition__lt=test_value1)).update(field=value1) MyModel.objects.filter(Q(condition__gte=test_value1, condition__lt=test_value2)).update(field=value2) MyModel.objects.filter(Q(condition__gte=test_value2, condition__lt=test_value3)).update(field=value3) MyModel.objects.filter(Q(condition__gte=test_value3, condition__lt=test_value4)).update(field=value4) MyModel.objects.filter(Q(condition__gte=test_value4)).update(field=value5) as opposed to: q = Q(condition__lt=test_value) MyModel.objects.update(field=Case([(Q(condition__lt=test_value1), value1), (Q(condition__lt=test_value2), value2), (Q(condition__lt=test_value3), value3), (Q(condition__lt=test_value4), value4)], default=value5, output_field=MyModel._meta.get_field('field'))) With only 10 objects in the database, and updating all of them the times are: loop_save: 0.004131221428270882 bulk_update: 0.0006249582329893796 many_updates: 0.0006562061446388481 case_update: 0.00036259952923938313 So the bulk_update performance gains are still there, and update using a Case expression if faster because the SQL generation time dominates in this example, and only one query needs to be generated and executed. Case expressions can also be used in other ways (although these examples use a subclass SimpleCase, which is described in a little more detail below, for simpler syntax): # annotation MyModel.objects.annotate(status_text=SimpleCase('status', [('S', 'Started'), ('R', 'Running'), ('F', 'Finished')], default='Unknown', output_field=CharField())) # aggregation MyModel.objects.aggregate(started=Sum(SimpleCase('status', [('S', 1)], default=0, output_field=IntegerField())), running=Sum(SimpleCase('status', [('R', 1)], default=0, output_field=IntegerField())), finished=Sum(SimpleCase('status', [('F', 1)], default=0, output_field=IntegerField()))) In these examples SimpleCase is a subclass of Case that generates the SQL simple CASE expression which is an equality test against a field. So is this something that would be a worthwhile addition to core? It could work now as a separate module, but the update example requires setting output_field. My original Django 1.7 code abused the prepare_database_save method that SQLUpdateCompiler called on an ExpressionNode to get the field that the expression would be assigned to automatically. After the merge of Query Expression SQLUpdateCompiler calls either resolve_expression or prepare_database_save, so I can no longer use this. A change in SQLUpdateCompiler would be required with a new API for settings output_field automatically on ExpressionNodes used as update values. If this were considered for core, I would appreciate input on the API. I currently have: Case(list_of_case_tuples, default, output_field) - A general CASE expression called a searched case in the SQL spec. list_of_case_tuples is an iterable of tuples of the form (Q_object, value), and default is the value for the ELSE clause. This generates SQL like: CASE WHEN n > 0 THEN 'positive' WHEN n < 0 THEN 'negative' ELSE 'zero' END SimpleCase(fieldname, list_of_simple_case_tuples, default, output_field) - A simple case expression in the SQL spec. fieldname is a field identifier like in an F expression, list_of_simple_case_tuples is an iterable of tuples of the form (condition_value, result_value). This generates SQL like: CASE n WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END Model.objects.bulk_update(list_of_instances, update_fields) - An analog to the bulk_create method that save changes in many model instances. list of instances is an iterable of model instances, update_fields is the same as the argument to Model.save of the same name. Of course I will put up a branch on github if there is interest in this proposal. Thanks for your time and sorry for the wall of text. - Michael -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/000de073-e26e-4b79-ae1b-67ba2f3d7180%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.