Re: [Django] #5420: Allow database API users to specify the fields to exclude in a SELECT statement

Django Mon, 13 Oct 2008 23:06:19 -0700

#5420: Allow database API users to specify the fields to exclude in a SELECT
statement
---------------------------------------------------+------------------------
          Reporter:  adrian                        |         Owner:  jacob   
            Status:  assigned                      |     Milestone:  post-1.0
         Component:  Database layer (models, ORM)  |       Version:  SVN     
        Resolution:                                |      Keywords:  qs-rf   
             Stage:  Accepted                      |     Has_patch:  1       
        Needs_docs:  1                             |   Needs_tests:  0       
Needs_better_patch:  1                             |  
---------------------------------------------------+------------------------
Changes (by adunar):


 * cc: [EMAIL PROTECTED] (added)
 * needs_docs:  0 => 1

Comment:

 A few months ago I patched Apture's internal version of Django to support
 lazy loading/saving of certain fields within Django models. However, I did
 it a different way than already discussed here.

 In particular, our problem was that implicit Django-generated query sets
 would fetch big text fields that we didn't need. Example:
 {{{
 #!python
 class Student(models.Model):
     name = models.CharField(max_length=32)
     year = models.IntegerField()
     thesis = models.TextField()

 class FavoriteFood(models.Model):
     food = models.CharField(max_length=32)
     reason = models.CharField(max_length=128)
     student = models.ForeignKey(Student)

 favorites = FavoriteFood.objects.filter(food='lasagna')
 for favorite in favorites:
     print favorite.student.name,
     # Django just loaded the student's entire thesis :(
     print "likes lasagna because", favorite.reason

 favorites = FavoriteFood.objects.filter(food='chicken enchiladas').select_related()
 # Django just loaded a bunch of theses again :(
 }}}

 To solve this, we changed the client interface by adding a boolean 'lazy'
 parameter to the Field constructor, e.g.:
 {{{
 #!python
 thesis = models.TextField(lazy=True)
 }}}

 This was implemented by putting a descriptor on lazy fields that keeps
 track of whether the field has been loaded yet and whether it has been
 modified. By using a descriptor instead of !__setattr!__, it doesn't
 really have an impact on performance for models that don't use lazy
 fields.
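
 A minimal sketch of the descriptor idea described above, written in plain
 Python rather than taken from the patch. The loader callback and the
 _modified_lazy attribute are hypothetical stand-ins for the single-column
 query and the bookkeeping a real implementation would need:

```python
class LazyDescriptor(object):
    """Data descriptor that loads a field on first read and tracks writes.

    Hypothetical sketch: `loader` stands in for the single-column query
    that a real implementation would run against the database.
    """

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        state = instance.__dict__
        if self.name not in state:
            # First read: fetch the value and cache it on the instance.
            state[self.name] = self.loader(instance)
        return state[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        # Remember that this field must be written on the next save.
        instance.__dict__.setdefault("_modified_lazy", set()).add(self.name)


loads = []  # records every simulated database fetch


def load_thesis(student):
    loads.append(student.name)
    return "thesis of %s" % student.name


class Student(object):
    thesis = LazyDescriptor("thesis", load_thesis)

    def __init__(self, name):
        self.name = name

    def fields_to_save(self):
        # Eager fields are always saved; lazy ones only when modified.
        return ["name"] + sorted(self.__dict__.get("_modified_lazy", ()))
```

 Only classes that declare a lazy field carry the descriptor, so attribute
 access on other classes takes the normal path. That is the reason given
 above for preferring a descriptor to overriding !__setattr!__.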

 After we migrated our internal Django version to 1.0, I went back and
 cleaned up the lazy fields code and added support for changing the lazy
 fields on each query set. This turned out to be considerably harder to do
 in a way that doesn't degrade performance for clients who don't use lazy
 fields, which is probably why this ticket has been open for so long
 despite its obvious importance...

 The client interface adds one method to the manager and query set,
 toggle_fields(fetch=None, lazy=None, fetch_only=None), where each argument
 can be a list of field names (or None):

 {{{
 #!python

 # fetches name (assuming thesis was defined with lazy=True)
 students = Student.objects.all().toggle_fields(lazy=['year'])

 # fetches name,year,thesis
 students2 = Student.objects.all().toggle_fields(fetch=['thesis'])

 # fetches name, year
 students3 = Student.objects.all().toggle_fields(fetch_only=['name', 'year'])

 student = students[0]        # index once; each students[0] would re-query
 thesis = student.thesis      # lazy-loads thesis
 student.save()               # saves name, year
 student.thesis = "Django is awesome"
 student.save()               # saves name, year, thesis
 }}}
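
 One way to read the three arguments is as operations on the set of eagerly
 fetched fields. The helper below is a hypothetical sketch, not code from
 the patch. It reproduces the three examples above, but how the arguments
 combine when several are passed at once is an assumption, and the primary
 key is left out for brevity:

```python
def resolve_fetched(all_fields, default_lazy, fetch=None, lazy=None,
                    fetch_only=None):
    """Return the field names a query would load eagerly, in model order.

    all_fields   -- every field name on the model
    default_lazy -- fields declared with lazy=True
    """
    if fetch_only is not None:
        # Start from exactly the requested fields.
        eager = set(fetch_only)
    else:
        # Start from the model defaults: everything not declared lazy.
        eager = set(all_fields) - set(default_lazy)
    # Assumed precedence: `fetch` adds fields, then `lazy` removes them.
    eager |= set(fetch or ())
    eager -= set(lazy or ())
    return [name for name in all_fields if name in eager]
```

 With the Student model above, resolve_fetched(['name', 'year', 'thesis'],
 ['thesis'], lazy=['year']) gives ['name'], matching the first example.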

 Does anyone have ideas for better names for
 toggle_fields/lazy/fetch/fetch_only? I don't think hide and show/expose
 fit here, because the client can still get, set, and save the field
 values whether a field is lazy or not. I also thought that one method
 with different parameters would follow Django's style better than three
 new methods. Another question is whether lazy-loading and lazy-saving
 should be independent, e.g., a field that is always loaded but only
 saved when changed. It probably wouldn't be too hard to support this.

 Internally, when a toggle_fields query is executed, the Django ORM
 dynamically creates a subclass of the model type that has a
 !LazyDescriptor for each of the fields that are lazy for that query (but
 not the fields that were created with lazy=True). Unlike a typical model
 subclass, this one is very lightweight. It skips most of the code in
 !ModelBase.!__new!__, and shares the same _meta object.
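
 The dynamic-subclass technique can be sketched with the built-in type()
 call. This is a simplified illustration using plain classes instead of
 Django models, so there is no metaclass work to skip. The Meta class, the
 lazy_subclass helper, the placeholder loader, and the cache are all
 assumptions made for the example:

```python
class Meta(object):
    def __init__(self, field_names):
        self.field_names = field_names


class Student(object):
    _meta = Meta(["name", "year", "thesis"])

    def __init__(self, **values):
        self.__dict__.update(values)


class LazyDescriptor(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.name not in instance.__dict__:
            # Placeholder for the real single-column fetch.
            instance.__dict__[self.name] = "<loaded %s>" % self.name
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


_subclass_cache = {}


def lazy_subclass(model, lazy_fields):
    """Return a subclass of `model` with a descriptor per lazy field."""
    key = (model, frozenset(lazy_fields))
    if key not in _subclass_cache:
        attrs = dict((name, LazyDescriptor(name)) for name in lazy_fields)
        # The subclass inherits _meta from the base class unchanged.
        _subclass_cache[key] = type("Lazy" + model.__name__, (model,), attrs)
    return _subclass_cache[key]
```

 Instances of the generated class still pass isinstance checks against the
 original model, and the base class itself is left untouched.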

 One drawback of dynamically creating subclasses is that they are harder to
 serialize (e.g. with the pickle module), but that's probably possible to
 support if desired.
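
 One possible approach to the pickling problem, shown here as a sketch
 under stated assumptions rather than as part of the patch, is to give the
 generated class a !__reduce!__ method that stores the importable base
 class and the lazy field names instead of a reference to the dynamic
 class. All helper names below are hypothetical:

```python
import pickle


class Student(object):
    def __init__(self, **values):
        self.__dict__.update(values)


def lazy_subclass(model, lazy_fields):
    """Build a throwaway subclass; descriptors are omitted for brevity."""
    attrs = {"_lazy_fields": tuple(sorted(lazy_fields)),
             "__reduce__": _reduce}
    return type("Lazy" + model.__name__, (model,), attrs)


def _rebuild(model, lazy_fields):
    # Recreate the dynamic class when unpickling, then an empty instance.
    cls = lazy_subclass(model, lazy_fields)
    return cls.__new__(cls)


def _reduce(self):
    # Store the importable base class and the lazy field names,
    # never a reference to the dynamic class itself.
    base = type(self).__mro__[1]
    return (_rebuild, (base, self._lazy_fields), self.__dict__)


def round_trip(obj):
    """Pickle and unpickle `obj`; needs this code in an importable module."""
    return pickle.loads(pickle.dumps(obj))
```

 Without a class cache, the unpickled instance belongs to a freshly built
 class, so identity checks on the dynamic class fail, while isinstance
 checks against the base model still succeed.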

 Anyway, there shouldn't be much of a performance impact for people not
 using toggle_fields or lazy=True.  In most cases the code checks to see if
 there are any lazy fields before doing anything different from before.
 There's just a bit of overhead from some extra conditional tests and
 function calls. I haven't actually run performance tests on that though.

 As part of my patch, I cleaned up some code in db/models/sql/query.py. In
 particular, the code to get a column's SQL alias was duplicated in 6
 places. Also, depending on whether the get_default_columns function was
 called for the base model or a related model, it took different
 parameters, did different things, and returned different values. So I
 split it into two separate functions. This refactoring makes the Query
 class easier to understand and easier to subclass without duplicating
 code. That seems like something Django should incorporate even if people
 don't like the other code in this patch.

 One change I'm less sure about is in Query.setup_joins. Basically, when
 computing the join for a !ForeignKey field like student_id, the old code
 would add a join to the student table even though it doesn't need to
 (because student_id is stored directly in the original row). Then,
 add_fields checked for this case and removed the join. I changed it so
 that setup_joins doesn't add the unnecessary join in the first place.
 Maybe there's some reason for doing this that I'm not seeing.

 Anyway, we're already using this because it makes database access way
 faster for models that have big text fields that we don't read or write
 very often. My patch has some regression tests, but I haven't updated the
 documentation yet. If this is something Django would want to incorporate,
 I'd be happy to do some more work to improve the patch. Thoughts?

-- 
Ticket URL: <http://code.djangoproject.com/ticket/5420#comment:21>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.