I'd like to go for bulk_merge(objects, force_update, force_insert,
update_fields) method. The basic implementation would call obj.save() in
loop. Optimized implementations could then be added later on.

Bulk update would then be an optimized implementation for force_update
or update_fields cases.

I'm +1 for adding some form of Case statement to core. I'm not sure what
the API should look like.

 - Anssi

On Mon, 2014-11-17 at 04:54 -0800, michal.modzelew...@gmail.com wrote:
> I've been working on a bulk_update method for a project to allow
> saving many instances of a Model in a single call/query, an analog to
> the existing bulk_create method.
> There seems to be some interest in this as evidenced by a
> library django-bulk-update (which isn't working for me in tests with
> python3 and SQLite) and a recent ticket.
> 
> 
> The ticket was closed as wontfix with an invitation for discussion,
> although the author doesn't seem to have pursued it. However, Shai
> Berger did comment positively on the ticket and proposed a different
> API, similar to the one in django-bulk-update and my own prototype:
> 
> 
> Book.objects.update_many(books, 'price')
> 
> 
> My own attempt was on Django 1.7 using a subclass of ExpressionNode
> passed as a value into UpdateQuery.add_update_values which allows
> Django to generate most of the UPDATE SQL, while the expression
> subclass was only responsible for generating an SQL CASE
> expression. This quickly generalized into a Case expression class
> which could be passed as a value into QuerySet.update.
> 
> 
> But now that Query Expressions have been merged, I've done a quick
> port and the Case expression now works with updates, annotations and
> aggregates, and is used to power my bulk_update method.
> 
> 
> A simple performance test shows that this works quite well. I tested
> the following methods of updating many database rows, using an in
> memory SQLite database :
> 
> 
> # loop_save
> for o in objects:
>     o.save(update_fields=['field'])
> 
> 
> # bulk_update
> MyModel.objects.bulk_update(objects, update_fields=['field'])
> 
> 
> # many_updates
> q = Q(condition__lt=test_value)
> MyModel.objects.filter(q).update(field=value1)
> MyModel.objects.filter(~q).update(field=value2)
> 
> 
> # case_update
> q = Q(condition__lt=test_value)
> MyModel.objects.update(field=Case([(q, value1)], default=value2,
> 
> output_field=MyModel._meta.get_field('field')))
> 
> 
> With 10000 objects in the database, and updating all of them I got
> these run times using clock():
> 
> 
> loop_save:    3.022845626653664
> bulk_update:  0.1785595393377175
> many_updates: 0.009768530320993563
> case_update:  0.009343744353718986
> 
> 
> 
> We can see that bulk_update outperforms looping by an order of
> magnitude, despite  working by generating a CASE expression with a
> WHEN clause for every object conditioned on its pk.
> Using Case in an update has no significant performance improvement,
> over running multiple queries. It may arguably allow for more readable
> code when using many conditions since the semantics of case are that
> conditions are evaluated in order, like a Python if ... elif ... else.
> Compare this:
> 
> 
> MyModel.objects.filter(Q(condition__lt=test_value1)).update(field=value1)
> MyModel.objects.filter(Q(condition__gte=test_value1,
> condition__lt=test_value2)).update(field=value2)
> MyModel.objects.filter(Q(condition__gte=test_value2,
> condition__lt=test_value3)).update(field=value3)
> MyModel.objects.filter(Q(condition__gte=test_value3,
> condition__lt=test_value4)).update(field=value4)
> MyModel.objects.filter(Q(condition__gte=test_value4)).update(field=value5)
> 
> 
> as opposed to:
> 
> 
> q = Q(condition__lt=test_value)
> MyModel.objects.update(field=Case([(Q(condition__lt=test_value1),
> value1),
>                                    (Q(condition__lt=test_value2),
> value2),
>                                    (Q(condition__lt=test_value3),
> value3),
>                                    (Q(condition__lt=test_value4),
> value4)], default=value5,
> 
> output_field=MyModel._meta.get_field('field')))
> 
> 
> With only 10 objects in the database, and updating all of them the
> times are:
> 
> 
> loop_save:    0.004131221428270882
> bulk_update:  0.0006249582329893796
> many_updates: 0.0006562061446388481
> case_update:  0.00036259952923938313
> 
> 
> 
> So the bulk_update performance gains are still there, and update using
> a Case expression if faster because the SQL generation time dominates
> in this example, and only one query needs to be generated and
> executed.
> 
> 
> Case expressions can also be used in other ways (although these
> examples use a subclass SimpleCase, which is described in a little
> more detail below, for simpler syntax):
> 
> 
> # annotation
> MyModel.objects.annotate(status_text=SimpleCase('status', [('S',
> 'Started'), ('R', 'Running'), ('F', 'Finished')],
>                                                 default='Unknown',
> output_field=CharField()))
> 
> 
> # aggregation
> MyModel.objects.aggregate(started=Sum(SimpleCase('status', [('S', 1)],
> default=0, output_field=IntegerField())),
>                           running=Sum(SimpleCase('status', [('R', 1)],
> default=0, output_field=IntegerField())),
>                           finished=Sum(SimpleCase('status', [('F',
> 1)], default=0, output_field=IntegerField())))
> 
> 
> In these examples SimpleCase is a subclass of Case that generates the
> SQL simple CASE expression which is an equality test against a field.
> 
> 
> So is this something that would be a worthwhile addition to core?
> It could work now as a separate module, but the update example
> requires setting output_field. My original Django 1.7 code abused the
> prepare_database_save method that SQLUpdateCompiler called on an
> ExpressionNode to get the field that the expression would be assigned
> to automatically. After the merge of Query Expression
> SQLUpdateCompiler calls either resolve_expression or
> prepare_database_save, so I can no longer use this. A change in
> SQLUpdateCompiler would be required with a new API for settings
> output_field automatically on ExpressionNodes used as update values.
> 
> 
> If this were considered for core, I would appreciate input on the API.
> I currently have:
> 
> 
> Case(list_of_case_tuples, default, output_field) - A general CASE
> expression called a searched case in the SQL spec. list_of_case_tuples
> is an iterable of tuples of the form (Q_object, value), and default is
> the value for the ELSE clause. This generates SQL like:
> 
> 
> CASE WHEN n > 0 THEN 'positive' WHEN n < 0 THEN 'negative' ELSE 'zero'
> END
> 
> 
> SimpleCase(fieldname, list_of_simple_case_tuples, default,
> output_field) - A simple case expression in the SQL spec. fieldname is
> a field identifier like in an F expression, list_of_simple_case_tuples
> is an iterable of tuples of the form (condition_value, result_value).
> This generates SQL like:
> 
> 
> CASE n WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END
> 
> 
> Model.objects.bulk_update(list_of_instances, update_fields) - An
> analog to the bulk_create method that save changes in many model
> instances. list of instances is an iterable of model instances,
> update_fields is the same as the argument to Model.save of the same
> name.
> 
> 
> Of course I will put up a branch on github if there is interest in
> this proposal. Thanks for your time and sorry for the wall of text.
> 
> 
> - Michael
> 
> 


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1416293469.8931.214.camel%40TTY32.
For more options, visit https://groups.google.com/d/optout.

Reply via email to