[Django] #32840: Micro-optimisation possibility in Field.get_col

Django Fri, 11 Jun 2021 04:44:39 -0700

#32840: Micro-optimisation possibility in Field.get_col
-------------------------------------------------------+-------------------------------------
               Reporter:  Keryn Knight                 |          Owner:  nobody
                   Type:  Cleanup/optimization         |         Status:  new
              Component:  Database layer (models, ORM) |        Version:  dev
               Severity:  Normal                       |       Keywords:
           Triage Stage:  Unreviewed                   |      Has patch:  0
    Needs documentation:  0                            |    Needs tests:  0
Patch needs improvement:  0                            |  Easy pickings:  0
                  UI/UX:  0                            |
-------------------------------------------------------+-------------------------------------
 Current implementation is:
 {{{
 def get_col(self, alias, output_field=None):
     if output_field is None:
         output_field = self
     if alias != self.model._meta.db_table or output_field != self:
         from django.db.models.expressions import Col
         return Col(alias, self, output_field)
     else:
         return self.cached_col
 }}}
 If no ''different'' output field is provided, it performs the following
 comparison needlessly, as far as I can tell: `output_field != self`, which
 ends up in the default implementation of `Field.__eq__`:
 {{{
 def __eq__(self, other):
     # Two fields are equal when both creation_counter and model match.
     if isinstance(other, Field):
         return (
             self.creation_counter == other.creation_counter and
             getattr(self, 'model', None) == getattr(other, 'model', None)
         )
     return NotImplemented
 }}}
 In that scenario, because `self` and `output_field` are literally the same
 object (down to the `id(...)`), the `isinstance` check resolves to True, the
 creation counters are the same and the models are, as you'd expect, the
 same. None of the three checks is falsy, so no short-circuiting is
 available.
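 A minimal sketch of that point, using a hypothetical `ToyField` class (not
 Django code) that copies only the `__eq__` logic quoted above and counts how
 often it runs. It relies on Python 3's default `__ne__`, which delegates to
 `__eq__` and inverts the result, and on `!=` having no identity shortcut:

```python
class ToyField:
    """Hypothetical stand-in for a Django Field: only the quoted __eq__ logic."""

    def __init__(self, creation_counter, model=None):
        self.creation_counter = creation_counter
        self.model = model
        self.eq_calls = 0  # how many times __eq__ has run on this instance

    def __eq__(self, other):
        self.eq_calls += 1
        if isinstance(other, ToyField):
            return (
                self.creation_counter == other.creation_counter
                and getattr(self, "model", None) == getattr(other, "model", None)
            )
        return NotImplemented


field = ToyField(creation_counter=1, model="auth.User")

# Both operands are the same object, yet `!=` still calls __eq__ through the
# default __ne__, and all three checks run because each one is truthy.
result = field != field
print(result, field.eq_calls)  # False 1
```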


 I think that the method body can be changed to:

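 {{{
 def get_col(self, alias, output_field=None):
     # Assume a caller-supplied output_field might differ from self.
     has_diff_output_field = True
     if output_field is None:
         output_field = self
         has_diff_output_field = False
     # When has_diff_output_field is False, `and` short-circuits, so
     # `output_field != self` (and therefore Field.__eq__) never runs.
     if alias != self.model._meta.db_table or (
         has_diff_output_field and output_field != self
     ):
         from django.db.models.expressions import Col
         return Col(alias, self, output_field)
     else:
         return self.cached_col
 }}}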

 The important part is the introduction of `has_diff_output_field`: if it's
 `False`, the `and` short-circuits and `output_field != self` is never
 executed at all.
 I'm purposefully not investigating or judging whether `output_field != self`
 is itself necessary, because it's ostensibly possible to pass a custom
 `output_field` which has the same `creation_counter` and `model`, and I
 don't know how ''likely'' that is.
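 To compare the two versions side by side, here is a sketch that reduces each
 method to its branch condition (returning `True` when `cached_col` would be
 returned) and counts `__eq__` calls. `CountingField`, its `db_table`
 attribute (standing in for `self.model._meta.db_table`) and the helper
 functions are hypothetical, and `__eq__` is simplified to an identity test
 rather than Django's `creation_counter`/`model` check:

```python
class CountingField:
    """Hypothetical stand-in for a Django Field that counts __eq__ calls."""

    eq_calls = 0  # shared by all instances, reset before each measurement

    def __init__(self, db_table):
        self.db_table = db_table  # plays the role of self.model._meta.db_table

    def __eq__(self, other):
        CountingField.eq_calls += 1
        return self is other  # simplified; Django compares creation_counter + model


def current_uses_cached_col(field, alias, output_field=None):
    """Branch condition of the current get_col(); True means cached_col."""
    if output_field is None:
        output_field = field
    return not (alias != field.db_table or output_field != field)


def proposed_uses_cached_col(field, alias, output_field=None):
    """Branch condition of the proposed get_col(); True means cached_col."""
    has_diff_output_field = True
    if output_field is None:
        output_field = field
        has_diff_output_field = False
    return not (
        alias != field.db_table
        or (has_diff_output_field and output_field != field)
    )


def count_eq_calls(check, *args):
    """Run one check and return (uses_cached_col, number of __eq__ calls)."""
    CountingField.eq_calls = 0
    uses_cached = check(*args)
    return uses_cached, CountingField.eq_calls


field = CountingField("auth_user")
other = CountingField("auth_user")  # a distinct output_field object

print(count_eq_calls(current_uses_cached_col, field, "auth_user"))          # (True, 1)
print(count_eq_calls(proposed_uses_cached_col, field, "auth_user"))         # (True, 0)
print(count_eq_calls(proposed_uses_cached_col, field, "auth_user", other))  # (False, 1)
```

 A distinct `output_field` object still pays for one comparison, so the
 equality check is preserved for custom output fields.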

 Running the entire test suite (ignoring skipped tests) with the proposed
 change didn't seem to break anything (yay). I then additionally augmented
 the method with:
 {{{
 if has_diff_output_field:
     print('different')
 else:
     print('same')
 }}}
 and counted the results across some 14K tests: there were `87021
 different` and `178493 same`.
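 Assuming the instrumentation printed once per `get_col()` call, the reported
 counts work out as follows. Because `output_field != self` only runs when
 `alias` matches the model's table, the `same` share is an upper bound on the
 `__eq__` calls the change would skip:

```python
# Counts reported above from one instrumented run of the test suite.
different = 87021
same = 178493

total = different + same   # total get_col() calls observed
same_share = same / total  # fraction where output_field was not passed

print(total)                # 265514
print(f"{same_share:.1%}")  # 67.2%
```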

 Quick example of how to get to the method:
 {{{
 >>> tuple(get_user_model().objects.all())
 (Pdb) w
   /path/django/db/models/query.py(280)__iter__()
 -> self._fetch_all()
   /path/django/db/models/query.py(1343)_fetch_all()
 -> self._result_cache = list(self._iterable_class(self))
   /path/django/db/models/query.py(51)__iter__()
 -> results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
   /path/django/db/models/sql/compiler.py(1175)execute_sql()
 -> sql, params = self.as_sql()
   /path/django/db/models/sql/compiler.py(523)as_sql()
 -> extra_select, order_by, group_by = self.pre_sql_setup()
   /path/django/db/models/sql/compiler.py(55)pre_sql_setup()
 -> self.setup_query()
   /path/django/db/models/sql/compiler.py(46)setup_query()
 -> self.select, self.klass_info, self.annotation_col_map = self.get_select()
   /path/django/db/models/sql/compiler.py(228)get_select()
 -> cols = self.get_default_columns()
   /path/django/db/models/sql/compiler.py(715)get_default_columns()
 -> column = field.get_col(alias)
 > /path/django/db/models/fields/__init__.py(396)get_col()
 }}}

 Overall it's:
 - 1 comparison if they're the same (it was 1 comparison before too, but
   that one expanded into 3 checks inside `Field.__eq__`)
 - 1 additional comparison if they're not the same.

 The ratio seen in the test suite, plus the fact that even the ''simplest''
 ORM usage goes through this path, suggests (to me) that it might have merit.

 Addendum: when I say micro, I mean micro. It's not a big time saver; I
 just happened to notice far more calls to `__eq__` than I expected.
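 For a rough sense of scale, the standard `timeit` module can time the tail of
 the old and new conditions on a hypothetical `ToyField` that mimics
 `Field.__eq__`. Absolute numbers depend on the machine and Python version,
 and lambda call overhead is included in both timings, so treat the output as
 indicative only:

```python
import timeit


class ToyField:
    """Hypothetical stand-in that mimics the three checks in Field.__eq__."""

    def __init__(self, creation_counter, model):
        self.creation_counter = creation_counter
        self.model = model

    def __eq__(self, other):
        if isinstance(other, ToyField):
            return (
                self.creation_counter == other.creation_counter
                and getattr(self, "model", None) == getattr(other, "model", None)
            )
        return NotImplemented


field = ToyField(1, "auth.User")
has_diff_output_field = False  # the common case: no output_field was passed
N = 200_000

# Old tail: `output_field != self` always reaches __eq__ when they are identical.
old_tail = timeit.timeit(lambda: field != field, number=N)
# New tail: the flag is False, so `and` returns before any comparison runs.
new_tail = timeit.timeit(lambda: has_diff_output_field and field != field, number=N)

print(f"old: {old_tail:.4f}s  new: {new_tail:.4f}s  for {N} evaluations")
```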

-- 
Ticket URL: <https://code.djangoproject.com/ticket/32840>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
