#28010: select_for_update() is too eager in presence of joins
-------------------------------------+-------------------------------------
               Reporter:  Ran        |          Owner:  nobody
  Benita                             |
                   Type:  New        |         Status:  new
  feature                            |
              Component:  Database   |        Version:  1.10
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 Using `select_for_update()` on a queryset appends ` FOR UPDATE` to the
 query. If the query joins other tables, the effect is that all tables are
 locked. I believe this is not what most people using `select_for_update()`
 want, instead they only want to lock the row from the main table.

 == Example

 Consider a scenario like this:

 {{{#!python
 class Product(models.Model):
     name = models.CharField(max_length=100)
     price = models.PositiveIntegerField()


 class Order(models.Model):
     product = models.ForeignKey(Product, on_delete=models.PROTECT)
     status = models.CharField(max_length=100, choices=(
         ('NEW', 'New'),
         ('APPROVED', 'Approved'),
         ('CANCELED', 'Canceled'),
     )

     ...

     @classmethod
     def approve(cls, id):
         with transaction.atomic():
             order =
 Order.objects.select_related('product').select_for_update().get(pk=id)
             # ...Check order.product.price & stuff...
             order.status = 'APPROVED'
             order.save()

     @classmethod
     def cancel(cls, id):
         with transaction.atomic():
             # Similar...
 }}}

 Here, locking is needed in order to serialize `approve()` and `cancel()`
 on the same Order, to avoid race conditions.

 I have also used `select_related` in `approve()`, in order to avoid an
 extra query. Potentially, there are many related objects, but to keep the
 example simple I added just one.

 The interaction between `select_related` and `select_for_update` has the
 unintended consequence (IMO) of also locking the related `Product` row.
 Hence, if there is some concurrent task which updates product prices for
 example, it can cause conflicts, slowdowns, deadlocks, and other bad
 things.

 The fix is to remove the `select_related`. But then extra queries are
 needed for each related object access without a reasonable workaround (I
 think?).

 == Possible solutions

 PostgreSQL (which I am using) has an option to specify exactly which
 tables are to be locked, instead of locking all tables in the query:
 https://www.postgresql.org/docs/9.6/static/sql-select.html#SQL-FOR-UPDATE-
 SHARE. The syntax is `SELECT ... FOR UPDATE OF orders_order`.

 Oracle also supports this with similar syntax, however it allows
 specifying which *columns* to lock:
 http://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj31783.html. I guess
 there it is possible to lock specific columns of specific rows, while
 postgres only locks full rows.

 I am not familiar with MySQL but it doesn't seem like it supports refining
 the `FOR UPDATE`.

 Here are some possible solutions I can think of:

 1. Add support for `select_for_update` to specify which models to lock,
 e.g. `select_for_update('self', 'product')`. This will use the `OF`
 syntax.

 2. Only support the `select_for_update('self')` behavior, and make it
 implicit, i.e. `select_for_update` only ever locks the main model of the
 QuerySet.

 3. Add a way to do `select_related` on an already-existing object,
 something like a standalone `select_related_objects` after the existing
 `prefetch_related_objects` function:
 https://docs.djangoproject.com/en/1.10/ref/models/querysets/#prefetch-
 related-objects, and keep `select_for_update` as-is. Then it should at
 least be possible to do one extra query instead of one per related object.

 Thanks for reading this far :)

--
Ticket URL: <https://code.djangoproject.com/ticket/28010>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/051.62779c5a3f59a901e217afc382507673%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to