Re: [Django] #23576: ORM delete strategy can lead to race conditions inside a transaction

2014-09-30 Thread Django
#23576: ORM delete strategy can lead to race conditions inside a transaction
-+-
 Reporter:  jdufresne|Owner:  nobody
 Type:  Uncategorized|   Status:  new
Component:  Database layer   |  Version:  1.7
  (models, ORM)  |   Resolution:
 Severity:  Normal   | Triage Stage:
 Keywords:   |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by jdufresne):

 This looks specific to database backends that do not support
 `update_can_self_select`. From the list of database backends shipped with
 Django, this applies only to MySQL.

 This is the race free query generated for backends supporting
 `update_can_self_select`.

 {{{
 DELETE FROM `myapp_c` WHERE `myapp_c`.`id` IN (SELECT U0.`id` AS `id` FROM
 `myapp_c` U0 INNER JOIN `myapp_a` U1 ON ( U0.`a_id` = U1.`id` ) INNER JOIN
 `myapp_b` U2 ON ( U0.`b_id` = U2.`id` ) WHERE (U1.`name` = 'foo' AND
 U2.`name` = 'bar'))
 }}}

 The problematic queries used by the MySQL:

 {{{
 SELECT `myapp_c`.`id` FROM `myapp_c` INNER JOIN `myapp_a` ON (
 `myapp_c`.`a_id` = `myapp_a`.`id` ) INNER JOIN `myapp_b` ON (
 `myapp_c`.`b_id` = `myapp_b`.`id` ) WHERE (`myapp_a`.`name` = 'foo' AND
 `myapp_b`.`name` = 'bar')
 DELETE FROM `myapp_c` WHERE `myapp_c`.`id` IN ({ RESULT FROM PREVIOUS
 QUERY })
 }}}

 The fact that this occurs across two queries allows rows to be inserted
 that would normally meet the criteria of the `WHERE` clause.

 One way to handle this would be to use MySQL's multiple table syntax
 http://dev.mysql.com/doc/refman/5.7/en/delete.html which allows joins. The
 query would be as simple as taking the `SELECT` query, removing the
 `SELECT` clause and replacing it with `DELETE myapp_c FROM myapp_c INNER
 JOIN ...`.

 I am interested in doing this, however, I'm not sure exactly where to
 start in order to build a better query. I see inside
 `DeleteQuery.delete_qs()` is where the check for feature
 `update_can_self_select` occurs. So perhaps a new feature check and branch
 should occur here. However, once the feature is checked, I'm not sure how
 to start building the alternative query. Any guidance would be
 appreciated.

--
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/067.3d7c803c3e6a9a457fbcfa62aa0133a5%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.


Re: [Django] #23576: ORM delete strategy can lead to race conditions inside a transaction

2014-09-30 Thread Django
#23576: ORM delete strategy can lead to race conditions inside a transaction
-+-
 Reporter:  jdufresne|Owner:  nobody
 Type:  Uncategorized|   Status:  new
Component:  Database layer   |  Version:  1.7
  (models, ORM)  |   Resolution:
 Severity:  Normal   | Triage Stage:
 Keywords:   |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by jdufresne):

 * needs_better_patch:   => 0
 * needs_tests:   => 0
 * needs_docs:   => 0


Comment:

 I have created a minimal test case:
 https://github.com/jdufresne/djtest-23576 I'd be happy to answer any
 questions about the test case if anything requires explanation. The
 waiting on `stdin` is to simulate a view with a long running process
 inside a transaction.

 To run the test, run `$ python test.py` from the project root directory.
 (As it is a race condition it may take more than one try.)

 The test runs in a management command instead of a HTTP view in order to
 easily trigger the race using multiple processes, but the principle is the
 same. Imagine the command is the view and each process is an HTTP request.
 Upon running the test and triggering the error I get the following
 traceback:

 {{{
 Traceback (most recent call last):
   File "manage.py", line 10, in 
 execute_from_command_line(sys.argv)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/core/management/__init__.py", line 385, in
 execute_from_command_line
 utility.execute()
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/core/management/__init__.py", line 377, in execute
 self.fetch_command(subcommand).run_from_argv(self.argv)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/core/management/base.py", line 288, in run_from_argv
 self.execute(*args, **options.__dict__)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/core/management/base.py", line 338, in execute
 output = self.handle(*args, **options)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/core/management/base.py", line 533, in handle
 return self.handle_noargs(**options)
   File "/home/jon/djtest/myproj/myapp/management/commands/mytest.py", line
 13, in handle_noargs
 C(a=A.objects.get(name='foo'), b=B.objects.get(name='bar'))
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/models/manager.py", line 92, in manager_method
 return getattr(self.get_queryset(), name)(*args, **kwargs)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/models/query.py", line 409, in bulk_create
 self._batched_insert(objs_without_pk, fields, batch_size)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/models/query.py", line 938, in _batched_insert
 using=self.db)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/models/manager.py", line 92, in manager_method
 return getattr(self.get_queryset(), name)(*args, **kwargs)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/models/query.py", line 921, in _insert
 return query.get_compiler(using=using).execute_sql(return_id)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/models/sql/compiler.py", line 920, in execute_sql
 cursor.execute(sql, params)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/backends/utils.py", line 81, in execute
 return super(CursorDebugWrapper, self).execute(sql, params)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/backends/utils.py", line 65, in execute
 return self.cursor.execute(sql, params)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/utils.py", line 94, in __exit__
 six.reraise(dj_exc_type, dj_exc_value, traceback)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/backends/utils.py", line 65, in execute
 return self.cursor.execute(sql, params)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/django/db/backends/mysql/base.py", line 128, in execute
 return self.cursor.execute(query, args)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/MySQLdb/cursors.py", line 205, in execute
 self.errorhandler(self, exc, value)
   File "/home/jon/djtest/venv/lib/python2.7/site-
 packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
 raise errorclass, errorvalue
 django.db.utils.IntegrityError: (1062, "Duplicate entry '1-1' for key
 'a_id'")
 }}}

--
Ticket URL: 

[Django] #23576: ORM delete strategy can lead to race conditions inside a transaction

2014-09-30 Thread Django
#23576: ORM delete strategy can lead to race conditions inside a transaction
--+
 Reporter:  jdufresne |  Owner:  nobody
 Type:  Uncategorized | Status:  new
Component:  Database layer (models, ORM)  |Version:  1.7
 Severity:  Normal|   Keywords:
 Triage Stage:  Unreviewed|  Has patch:  0
Easy pickings:  0 |  UI/UX:  0
--+
 Originally posted to #19544 due to similar error message, but I this is a
 different ticket.

 

 I have investigated this a bit more. For me the race occurs for the
 following reason:

 Suppose you have `MyModel` that has a unique constraint. Suppose you have
 a view of the form:

 {{{
 @transaction.atomic
 def my_view(request):
MyModel.objects.some_long_filter().delete()
objs = builds_list_of_unsaved_models_from_request(request)
MyModel.objects.bulk_create(objs)
 }}}

 That is, a view that builds a list of objects in bulk. The view first
 deletes the objects that were created by any previous requests to ensure
 the unique constraint is not violated. From my investigation, the
 `delete()` will not actually call an SQL `DELETE` statement directly -- as
 I would have expected -- but instead caches a list of ids of objects that
 exist, then calls a new SQL query with a `DELETE .. WHERE id IN (list of
 ids)`. This all happens in `DeleteQuery.delete_qs()` file
 `django/db/models/sql/subqueries.py:52`.

 Now suppose you have some original data. Then, two requests occur at the
 same time. The first request and the second request will both cache the
 same ids from the original data to delete. Suppose the first request
 finishes slightly earlier and commits the *new objects with new ids*. Now,
 the second request will try to delete the cached ids from the original
 data and *not* the newly created ids the first request. Upon save there is
 a unique constraint violation -- as the wrong ids were deleted -- causing
 an `IntegrityError`.

 Coming from an application with many raw SQL queries, this unique
 constraint violation was a big surprise inside a transaction. The caching
 of ids in the `delete()` function seems a sneaky source of race
 conditions.

 

 Traceback when the IntegrityError occurs:

 {{{
 Traceback (most recent call last):
   File "manage.py", line 13, in 
 execute_from_command_line(sys.argv)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/core/management/__init__.py", line 399, in
 execute_from_command_line
 utility.execute()
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/core/management/__init__.py", line 392, in execute
 self.fetch_command(subcommand).run_from_argv(self.argv)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/core/management/base.py", line 242, in run_from_argv
 self.execute(*args, **options.__dict__)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/core/management/base.py", line 285, in execute
 output = self.handle(*args, **options)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/core/management/base.py", line 415, in handle
 return self.handle_noargs(**options)
   File
 "/home/jon/devel/myproj/development/myapp/management/commands/thetest.py",
 line 36, in handle_noargs
 MyModel.objects.bulk_create(objs)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/db/models/manager.py", line 160, in bulk_create
 return self.get_queryset().bulk_create(*args, **kwargs)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/db/models/query.py", line 359, in bulk_create
 self._batched_insert(objs_without_pk, fields, batch_size)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/db/models/query.py", line 838, in _batched_insert
 using=self.db)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/db/models/manager.py", line 232, in _insert
 return insert_query(self.model, objs, fields, **kwargs)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/db/models/query.py", line 1514, in insert_query
 return query.get_compiler(using=using).execute_sql(return_id)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/db/models/sql/compiler.py", line 903, in execute_sql
 cursor.execute(sql, params)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/django/db/backends/util.py", line 69, in execute
 return super(CursorDebugWrapper, self).execute(sql, params)
   File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-
 packages/