Re: Django ORM performance patch. Fixes #5420, #5768

Malcolm Tredinnick Fri, 15 Feb 2008 02:30:54 -0800


On Fri, 2008-02-15 at 01:54 -0800, Dima Dogadaylo wrote:
> On 14 Feb, 17:26, Sebastian Noack <[EMAIL PROTECTED]>
> wrote:
> >
> > Of course, this decreases database load, but IMHO it is better to use
> > QuerySet.values if you want to select just certain values from a model.
> 
> Often I need methods that are defined on models and aren't available on
> dictionaries. For example, get_avatar_url().

This is all very reasonable, but one of the problems with your patch is
that you've approached it from the reverse end of what was suggested in
#5420, and #5420 is written the way it is for a reason. When Adrian
proposed that API, he realised that almost always you're going to be
pulling back all of the fields or almost all of them. Once a database
has read a row to access some of the data, accessing all of the data in
the row is close to zero overhead. It's already had to read the block
off disk, for example, which is many kilobytes in size. Only for either
extremely large fields or for fields that need a large amount of
post-processing to be useful (such as polygon fields) is it really a win
to micro-manage what is pulled back.

For most non-trivial SQL queries with modern databases, constructing
(extracting) the output columns, unless they are complex computations,
is a very small fraction of the timeslice devoted to the query execution.

That's why the API is to specify what to exclude, rather than what to
include. I tend to agree with the proposal, too. From a performance
management perspective, it's going to be easier to use that way.
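
To make the exclude-versus-include point concrete, here is a minimal
sketch in plain Python. The helper name columns_to_select and the field
names are hypothetical; they are not part of Django or of the API
proposed in #5420. The only point is that the caller names the one
heavyweight field instead of listing everything else.

```python
def columns_to_select(all_fields, exclude=()):
    """Return the fields to fetch: everything except the excluded ones.

    Hypothetical helper for illustration only; not a Django API.
    """
    excluded = set(exclude)
    unknown = excluded - set(all_fields)
    if unknown:
        raise ValueError("unknown fields: %s" % ", ".join(sorted(unknown)))
    # Preserve the declared field order of the (imaginary) model.
    return [name for name in all_fields if name not in excluded]


# Assumed field list for an imaginary Entry model.
ENTRY_FIELDS = ["id", "headline", "pub_date", "user_id", "body", "boundary"]

# Only the expensive polygon-like field is named by the caller.
print(columns_to_select(ENTRY_FIELDS, exclude=["boundary"]))
```

With an include-style API the same call would have to spell out five
field names and be updated every time the model gains a field.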

There's also the fairly pragmatic side-effect that if you are only
pulling back a very few fields, there isn't really a lot your model
methods are going to be able to do without loading more data, unless
they're very simple. You might as well just write those methods as
functions which take a dictionary and use values() for the cases when
only two or three items are needed. If you have a case that doesn't fit
this, the standard data modelling technique is you move your very
heavyweight fields into another model/table, so the lightweight stuff
doesn't get dragged down by also pulling back the heavyweight stuff.
That's always going to be more effective than trying to only select a
small fraction of the available columns in a row.
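
As a rough sketch of the "function that takes a dictionary" approach,
something like the following would work with the rows that values()
returns. The key names ('username', 'avatar') and the URL layout are
assumptions made up for this example, and the list of dicts stands in
for a real query so that nothing here touches a database.

```python
DEFAULT_AVATAR = "/media/avatars/default.png"  # assumed fallback path


def get_avatar_url(row):
    """Build an avatar URL from a dict holding only the needed fields.

    `row` mimics one item of QuerySet.values('username', 'avatar');
    the key names are assumptions for this example.
    """
    if row.get("avatar"):
        return "/media/avatars/%s" % row["avatar"]
    return DEFAULT_AVATAR


# Stand-in for rows returned by something like
# User.objects.values('username', 'avatar'); no database is used here.
rows = [
    {"username": "dima", "avatar": "dima.png"},
    {"username": "sebastian", "avatar": ""},
]

for row in rows:
    print(row["username"], get_avatar_url(row))
```

Because the function only needs a mapping with those keys, it can be
used with dictionaries from values() when only a couple of columns are
needed.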

> > That QuerySet.values can not select values from related fields annoys
> > me, too. But your implementation is a bit restricted. Here is
> > the example from your blog:
> >
> > entries = Entry.objects.values('headline', user=('username',))[:10]
> >
> > A syntax like the following would be more straightforward:
> >
> > entries = Entry.objects.values('headline', 'user__username')[:10]
> 
> I used standard Django lookup syntax, but because Django trunk
> doesn't support selecting related fields, I thought it better to
> respect the DRY principle and write:
> entries = Entry.objects.values('headline', user__address=('country',
> 'city', 'zip'))

This isn't "standard Django lookup syntax", though. It's some entirely
new lookup syntax you created. There's nowhere in the current lookup
hierarchy that uses this method for referring to fields in another
model. Reusing the existing syntax makes more sense. It's less for
people to learn and it's less restrictive: with the double-underscore
syntax, I can list the fields in any order, since they're all positional
arguments. Your syntax has normal fields as positional arguments (so
they must come first) and related fields as keyword arguments that take
a sequence.
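
To show that the two spellings carry the same information, here is a
small sketch in plain Python that rewrites the keyword-based form as
flat double-underscore names. Both helpers (flatten_field_spec and
split_path) and the LOOKUP_SEP constant are hypothetical names for this
illustration only; this is not how Django or the patch implements it.

```python
LOOKUP_SEP = "__"  # separator assumed for this sketch


def flatten_field_spec(*fields, **related):
    """Convert the keyword-based spelling into flat double-underscore names.

    Hypothetical helper, not part of Django.
    """
    names = list(fields)
    # Sort for deterministic output; keyword argument order is not relied on.
    for prefix in sorted(related):
        for field in related[prefix]:
            names.append(prefix + LOOKUP_SEP + field)
    return names


def split_path(name):
    """Split 'user__address__city' into the relations to follow and the field."""
    parts = name.split(LOOKUP_SEP)
    return parts[:-1], parts[-1]


flat = flatten_field_spec("headline", user__address=("country", "city", "zip"))
print(flat)
print(split_path("user__address__city"))
```

Once everything is a flat string, the caller can put 'headline' before,
between or after the related names, which the keyword form cannot
express.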

> Also please note, currently lookups with 2 underscores are used only
> in **kwargs, because they don't make much sense as quoted strings (even
> the current QuerySet.order_by() uses '.' instead of '__' as separator).

Well, that information's a little old (if you're going to be writing
code in the query construction area of Django, you also need to keep up
with the branch work that's rewriting large portions of that area; that
was already mentioned in the related-values ticket before you started
working on it). Let's not use the current cross-model ordering syntax as
an example. It's a bit of a wart in the design (Database table name +
model field attribute name... it's mixing apples and oranges). In
queryset-refactor, we've changed that (order-by) to use double-underscores
and only field names, just as in filters. One of the reasons for this
was consistency.

I think, as Sebastian suggested (and I mentioned in the comment to the
ticket on related values lookup a few months back), we should also use
double underscores for selecting related values. That makes everything
nice and consistent across the board.

Regards,
Malcolm

-- 
Many are called, few volunteer. 
http://www.pointy-stick.com/blog/

