Hi Carl,

Thanks for the feedback, sorry for the length of my reply, I tried to be
detailed to avoid you having to read the code.

On 03/10/11 20:44, Carl Meyer wrote:

> My only real concern is one I'm a bit surprised hasn't been raised
> already: API and naming sanity. If I'm a new user coming to this API,
> as far as I'm concerned "select_related" and "prefetch_related" may as
> well be synonyms - they don't tell me anything useful about how the
> functions differ. And even knowing the internal details, any
> justification for the naming seems tortured at best: both methods use
> selects, and both prefetch something that would otherwise have to be
> fetched separately later.

To me, 'prefetch' does suggest a separate phase of fetching, whereas
'select' does suggest we are including some other things within a SQL
select. I do think they need to be separate methods, for reasons given
below, and if anyone has a better name for 'prefetch_related' I'm all ears.

> Given that this and select_related operate on entirely disjoint sets
> of fields, is there a good technical reason not to simply merge this
> functionality into select_related? Sure, the underlying implementation
> is quite different, but philosophically the ORM doesn't generally aim
> to be a transparent reflection of what's happening at the SQL layer,
> it aims to present a coherent object API. At that level I think
> select_related and prefetch_related are pretty much indistinguishable
> from each other. A note in the docs that using select_related on an
> m2m will result in an extra query (but many fewer than if you didn't
> use it and iterated over the m2m related objects for each row!) seems
> to me a quite adequate nod to the implementation difference.

Given that both these methods exist solely for performance reasons, I
don't think it is necessarily a good thing to attempt to hide what kind
of thing is actually going on. The developer who chooses to add either
of these needs to know the typical performance characteristics that are
invoked by choosing one or the other, so I don't think it actually helps
to hide this. Yes, at the higher level there isn't much distinction, but
if you are using either method you have already abandoned the higher
level, because you are worrying about exactly what you are asking the
database to do, and we should make things easy for such people.

> As the builder of the bikeshed, you get to paint it, so if you've
> already considered all this and haven't come up with any better API
> naming options, I'll withdraw my concern. But it seems to me it's at
> least worth some public discussion before locking ourselves into an
> API.

There are some quite big semantic and technical differences between the
two that really make merging them a bad idea IMO.

First, prefetch_related causes a QuerySet to effectively lose the
benefits of chunking that normally happen, because in order to get the
next level we have to get all the PKs (or whatever) from the first
level. This means we must fully populate the cache as soon as the
QuerySet is evaluated, so that we can do prefetching before any of the
instances are used. This is a documented side effect.

But we really don't want to do that if select_related is called, since
there is no need. Documenting the side effect would be harder if we
merged the two methods, and isolating the side effect could be possibly
much harder.

We cannot work out from the fields specified whether the user means
'prefetch_related' or 'select_related', because prefetch_related works
after the query is evaluated, and *has* to if we want it to be as
powerful as it is, and select_related works before the query is
evaluated, and *has* to.

For example, prefetch_related supports stepping over *any* singly
related object, including those from properties - see:

https://code.djangoproject.com/attachment/ticket/16937/prefetch_7.diff

tests/modeltests/prefetch_related/models.py L141
tests/modeltests/prefetch_related/tests.py  L392

The support for GenericForeignKey also works this way.

I've also written prefetch_related to hopefully be usable by 3rd party
'many-to-many-like' relationships, as might be provided by something
like TaggableManager in django-taggit. The support for GenericRelation
works this way - there is no specific support for anything in
contrib.content_type in the prefetch code - instead the GenericRelation
manager code just implements the same API that the core related manager
code implements.

So prefetch_related works entirely by traversing attributes, and seeing
if it can find things that quack like a ManyRelatedManager, and I think
this is a strength of the implementation.

But we don't know what attributes are available, or what type of thing
they return, before we construct the objects. This makes is pretty hard
to decide whether something is a prefetch_related thing or
select_related thing. I guess we could assume that everything that we
can't interpret as a 'select_related' is a 'prefetch_related'. That's
not currently particularly easy, as the lower-level code supporting
select_related at the moment silently swallows bad fields.

But even if this were fixed, I think it would make the whole API more
difficult to predict - you would have to say 'if your lookups include
any of these elements, you get this behaviour and these side-effects,
otherwise this other set of behaviour'. And predicting behaviour is
crucial here, because controlling the number and nature of the queries
is actually the whole reason these methods exist.

Finally, there is also the fact that 'select_related' currently has API
issues of its own that would make merging very hard. For instance, what
would 'depth=N' mean? Does it now include m2m? That could be a *massive*
performance regression in *many* cases. There are issues about
chaining/clearing select_related() which would only be confused by
adding prefetch_related to the mix. In fact, in some of my own code that
I've adapted to use prefetch_related, I've already come across examples
where I want to clear prefetch_related behaviour, and leave
select_related alone. This makes me think that merging the APIs, at
least from the current starting point, would be a bad move.

Regards,

Luke

-- 
"I was sad because I had no shoes, until I met a man who had no
feet. So I said, "Got any shoes you're not using?"  (Steven Wright)

Luke Plant || http://lukeplant.me.uk/

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Reply via email to