#33191: Avoid unnecessary clear of cached reference
-------------------------------------+-------------------------------------
     Reporter:  Barry Johnson        |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  3.2
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Description changed by Barry Johnson:

New description:

 Consider this case of ORM models "Parent" and "Child", where Child has a
 foreign key reference to Parent (and the database can return generated IDs
 following insert operations):

 {{{
 parent = Parent(name='parent_object')
 child = Child(parent=parent)
 parent.save()
 child.save()
 print(child.parent.name)
 }}}

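 The print statement will cause an unnecessary lazy read (an extra SELECT
 query) of the parent object, even though the child was constructed with a
 reference to that same parent instance.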

 In the application where this behavior was first observed, thousands of
 parent and child objects were being created with bulk_create(). The
 subsequent lazy reads occurred when log entries were written to record the
 action, adding thousands of unwanted SELECT queries.

 Closed ticket 29497 solved a potential data-loss problem in this situation
 by effectively executing {{{child.parent_id = child.parent.pk}}} while
 preparing the child object to be saved. However, when the child's
 ForeignKeyDeferredAttribute "parent_id" changes value from None to the
 parent's ID, the child's internal cache holding the reference to "parent"
 is cleared. The subsequent access to child.parent must then perform a lazy
 read and reload the parent from the database.

 A workaround that avoids this lazy read is to explicitly update both the
 "parent_id" attribute and the "parent" cache entry by adding this
 non-intuitive statement after executing parent.save():
     {{{child.parent = child.parent}}}
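 To make the mechanism concrete, here is a simplified, pure-Python model of it. It does not use or reproduce Django's actual code. `FkIdAttribute` is a hypothetical stand-in whose `__set__` mirrors the check in Django's `ForeignKeyDeferredAttribute`: when the stored ID changes, the cached related object is discarded. `ParentAttribute`, `DATABASE`, `Child.save()` and the `lazy_loads` counter are likewise hypothetical; each increment of `lazy_loads` stands in for one SELECT query.

```python
# Simplified, hypothetical model (NOT Django code) of the "parent_id"
# deferred attribute and the "parent" forward descriptor.
DATABASE = {}  # pk -> Parent; stands in for the parent table


class Parent:
    _next_pk = 1

    def __init__(self, name):
        self.name = name
        self.pk = None

    def save(self):
        # Assume the database returns a generated ID after the insert.
        self.pk = Parent._next_pk
        Parent._next_pk += 1
        DATABASE[self.pk] = self


class FkIdAttribute:
    """Stand-in for ForeignKeyDeferredAttribute on "parent_id"."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get("parent_id")

    def __set__(self, instance, value):
        # A changed ID invalidates the cached related object.
        if instance.__dict__.get("parent_id") != value:
            instance.cache.pop("parent", None)
        instance.__dict__["parent_id"] = value


class ParentAttribute:
    """Stand-in for the forward descriptor on "parent"."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if "parent" not in instance.cache:
            instance.lazy_loads += 1  # stands in for a SELECT query
            instance.cache["parent"] = DATABASE[instance.parent_id]
        return instance.cache["parent"]

    def __set__(self, instance, value):
        instance.parent_id = value.pk  # goes through FkIdAttribute.__set__
        instance.cache["parent"] = value  # then re-caches the object


class Child:
    parent_id = FkIdAttribute()
    parent = ParentAttribute()

    def __init__(self, parent):
        self.cache = {}
        self.lazy_loads = 0
        self.parent = parent

    def save(self):
        # Only the related-field preparation step is modeled: copy the
        # saved parent's pk if parent_id is still unset (ticket 29497).
        obj = self.cache.get("parent")
        if obj is not None and self.parent_id is None:
            setattr(self, "parent_id", obj.pk)


def run(use_workaround):
    parent = Parent(name="parent_object")
    child = Child(parent=parent)
    parent.save()
    if use_workaround:
        child.parent = child.parent  # re-sync parent_id and the cache
    child.save()
    print(child.parent.name, "lazy loads:", child.lazy_loads)
    return child.lazy_loads


run(use_workaround=False)  # prints "parent_object lazy loads: 1"
run(use_workaround=True)   # prints "parent_object lazy loads: 0"
```

 In this model, reassigning `child.parent` goes through the descriptor's `__set__`, which updates `parent_id` (clearing the cache) and then immediately re-caches the same object, so the later `parent_id is None` check in `save()` has nothing left to do.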

 But it appears that a simple change could avoid clearing the cache in this
 narrow case.
 Within Model._prepare_related_fields_for_save(), replace
     {{{setattr(self, field.attname, obj.pk)}}}
 with
     {{{self.__dict__[field.attname] = obj.pk}}}

 This suggested code has -not- been tested.

 This change would set the associated "parent_id" attribute to the desired
 value without affecting the cache. At this point in the code, "obj" is
 already the cached parent object that we want to preserve, and we are only
 reconciling the child's copy of the parent's primary key.
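 The difference between the current `setattr()` call and the proposed direct `__dict__` write can be sketched with another simplified, hypothetical model. This is not Django code and does not test the change in Django itself. Because `FkIdAttribute` is a data descriptor whose `__get__` reads the value back from `instance.__dict__` (as Django's `DeferredAttribute` does), writing straight into `__dict__` skips the cache-clearing `__set__` while remaining visible to later reads. `SavedParent`, its `pk` of 42 and `prepare()` are illustrative assumptions.

```python
# Simplified, hypothetical model (NOT Django code) of the two ways to
# write "parent_id" during save preparation.


class FkIdAttribute:
    """Stand-in for ForeignKeyDeferredAttribute: a data descriptor."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get("parent_id")

    def __set__(self, instance, value):
        # A changed ID invalidates the cached related object.
        if instance.__dict__.get("parent_id") != value:
            instance.cache.pop("parent", None)
        instance.__dict__["parent_id"] = value


class Child:
    parent_id = FkIdAttribute()

    def __init__(self, cached_parent):
        # Parent was assigned before it had a pk, so parent_id is None.
        self.cache = {"parent": cached_parent}
        self.__dict__["parent_id"] = None


class SavedParent:
    pk = 42  # hypothetical generated ID, parent already saved


def prepare(child, write_dict_directly):
    obj = child.cache.get("parent")
    if obj is not None and child.parent_id is None:
        if write_dict_directly:
            # Proposed change: bypasses FkIdAttribute.__set__.
            child.__dict__["parent_id"] = obj.pk
        else:
            # Current code: goes through FkIdAttribute.__set__.
            setattr(child, "parent_id", obj.pk)


for direct in (False, True):
    child = Child(SavedParent())
    prepare(child, write_dict_directly=direct)
    print(direct, child.parent_id, "parent" in child.cache)
# prints:
# False 42 False
# True 42 True
```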

--

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33191#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
