On 25 maalis, 13:23, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 25 March 2013 10:58, Tim Chase <django.us...@tim.thechases.com> wrote:
>
> > On 2013-03-25 03:40, Anssi Kääriäinen wrote:
> >> I am very likely going to change the ORM to use EXISTS subqueries
> >> instead of IN subqueries. I know this is a good idea on PostgreSQL
> >> but I don't have enough experience of other databases to know if
> >> this is a good idea or not.
>
> > I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at
> > $OLD_JOB, but there it's usually about the same, occasionally with IN
> > winning out. However, the wins were marginal, and MSSQL is a 2nd-class
> > citizen in the Django world, so I'm +1 on using EXISTS instead of IN,
> > if the results are assured to be the same.
>
> The results are definitely different because NOT IN has some quite
> strange characteristics: if the subquery returns a NULL then the whole
> result is "unknown". It is that weirdness that makes it hard to
> optimize for, or at least, not-yet-optimized for in PostgreSQL.
>
> In most cases it is the NOT EXISTS behaviour that people find natural
> and normal anyway and that is the best mechanism to use.

When doing an .exclude() that requires subquery Django automatically
generates the queries so that the inner query's select clause can't
contain nulls. For example:
>>> print D.objects.exclude(e__id__gte=0).query
SELECT `table_d`.`id`, `table_d`.`a`, `table_d`.`b` FROM `table_d`
WHERE NOT (`table_d`.`id` IN (SELECT U1.`d_id` FROM `table_e` U1 WHERE
(U1.`id` >= 0  AND U1.`d_id` IS NOT NULL)))

However it is possible to generate NOT IN query where the SQL
semantics are in effect when using __in lookup:
>>> print 
>>> D.objects.exclude(id__in=E.objects.filter(id__gte=0).values_list('d_id')).query
SELECT `table_d`.`id`, `table_d`.`a`, `table_d`.`b` FROM `table_d`
WHERE NOT (`table_d`.`id` IN (SELECT U0.`d_id` FROM `table_e` U0 WHERE
U0.`id` >= 0 ))

The results of the latter case could change (assuming d_id can contain
null values).

I think that this could be considered a bug fix. Django's ORM doesn't
try to mimic SQL semantics, it tries to have Python semantics for the
query. So an exclude(__in) lookup should behave like Python's "value
not in list", not like SQL's NOT IN.

On the other hand having __in lookups that do EXISTS in SQL might be a
bit surprising. The way __in works is documented as generating SQL IN
lookup: https://docs.djangoproject.com/en/dev/ref/models/querysets/#in.

I feel pretty strongly that NOT EXISTS semantics are wanted. The NOT
IN semantics are likely there just because that is how the
implementation was originally done, not because there was any decision
to choose those semantics. Also, multicolumn NOT IN lookups aren't
supported on all databases (SQLite at least), so for that case NOT
EXISTS semantics is going to happen anyways.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to