Re: Using EXISTS instead of IN for subqueries

2017-09-06 Thread Todor Velichkov
How does this query look like? Not sure if this is gonna help, but you can 
take a look at Exists() subqueries 


On Wednesday, September 6, 2017 at 4:29:21 PM UTC+3, Stephan Seyboth wrote:
>
> Sorry for resurrecting this ancient thread, but it was the last direct 
> mention I could find about this topic.
>
> Curious to hear what happened to Anssi's proposal to change the ORM to use 
> EXISTS subqueries instead of IN subqueries. Looks like the ORM still uses 
> IN as of Django 1.11.4.
>
> One of our queries just started running over 1000x slower from one day to 
> the next b/c of the issues Postgres has with IN.
>
> Are there any suggested workarounds that don't require resorting to raw 
> SQL?
>
> On Tuesday, March 26, 2013 at 5:40:35 PM UTC+1, Anssi Kääriäinen wrote:
>>
>> On Tuesday, March 26, 2013 5:19:52 PM UTC+2, Michael Manfre wrote:
>>>
>>>
>>>
>>> On Tue, Mar 26, 2013 at 10:23 AM, Tim Chase >> > wrote:
>>>
  EXISTS also has some nice features
 like the ability to do testing against multiple columns, i.e., you
 can't do something like

   select *
   from tbl1
   where (col1, col2) in (
select a, b
from tbl2
)

 but that's a simple EXISTS query.

>>>
>>> Agreed, EXISTS is more flexible and at least to me, often easier to 
>>> read, format, and maintain. Assuming this is implemented with a database 
>>> feature, I'll most likely enable the EXISTS change for django-mssql. My 
>>> main objections to a blanket change is to not have the specific behaviors 
>>> of one database dictate how every other database must behave. Some one 
>>> needs to be the voice of 3rd party database backends and it appears for the 
>>> time being, I'm wearing that hat.
>>>
>>
>> One of the main reasons for the change is that EXISTS allows for queries 
>> that are impossible with IN. In addition EXISTS semantics regarding NULLs 
>> is wanted. And, at least PostgreSQL performs better with NOT EXISTS than 
>> NOT IN. Granted, the better performance on PostgreSQL is perhaps the most 
>> important reason for me, but this change is not only about that.
>>
>> The main reason for this thread was to find out if there are some 
>> databases where performance of EXISTS is worse than IN. The DB feature 
>> approach seems good because it allows investigating performance 
>> characteristics one DB at time.
>>
>>  
>>>
 > > MSSQL is a 2nd-class citizen in the Django world, so I'm +1
 >
 > Reasoning like that helps to keep it in its place.

 MSSQL's lack of certain core features is what does that.
 OFFSET/LIMIT provided by pretty much every other DB vendor?  Use
 baroque hacks to get the same functionality.  
>>>
>>>
>>> The non "baroque hack" way of doing limit offset is with nested selects. 
>>> Maybe someday the non-standard LIMIT/OFFSET keywords will get added to the 
>>> standard (I truly hope this happens) so Oracle, MSSQL, DB2, and Informix 
>>> could share SQL with postgres and mysql without needing to mangle it.
>>>
>>
>> Hmmh, this means Oracle, MSSQL, DB2 and Informix are doing more or less 
>> the same thing for limit/offset support? If so, then having a more generic 
>> approach to this problem than having a custom compiler per backend might be 
>> worth it...
>>
>> BTW there is already something like LIMIT and OFFSET in SQL 2008 
>> standard. The syntax is different than LIMIT/OFFSET, and supported only by 
>> some vendors... See 
>> https://en.wikipedia.org/wiki/Select_%28SQL%29#FETCH_FIRST_clause
>>
>>  - Anssi
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/43e19ef1-5a07-4608-94f4-82ba10ad9a85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Using EXISTS instead of IN for subqueries

2017-09-06 Thread Stephan Seyboth
Sorry for resurrecting this ancient thread, but it was the last direct 
mention I could find about this topic.

Curious to hear what happened to Anssi's proposal to change the ORM to use 
EXISTS subqueries instead of IN subqueries. Looks like the ORM still uses 
IN as of Django 1.11.4.

One of our queries just started running over 1000x slower from one day to 
the next b/c of the issues Postgres has with IN.

Are there any suggested workarounds that don't require resorting to raw SQL?

On Tuesday, March 26, 2013 at 5:40:35 PM UTC+1, Anssi Kääriäinen wrote:
>
> On Tuesday, March 26, 2013 5:19:52 PM UTC+2, Michael Manfre wrote:
>>
>>
>>
>> On Tue, Mar 26, 2013 at 10:23 AM, Tim Chase  
>> wrote:
>>
>>>  EXISTS also has some nice features
>>> like the ability to do testing against multiple columns, i.e., you
>>> can't do something like
>>>
>>>   select *
>>>   from tbl1
>>>   where (col1, col2) in (
>>>select a, b
>>>from tbl2
>>>)
>>>
>>> but that's a simple EXISTS query.
>>>
>>
>> Agreed, EXISTS is more flexible and at least to me, often easier to read, 
>> format, and maintain. Assuming this is implemented with a database feature, 
>> I'll most likely enable the EXISTS change for django-mssql. My main 
>> objections to a blanket change is to not have the specific behaviors of one 
>> database dictate how every other database must behave. Some one needs to be 
>> the voice of 3rd party database backends and it appears for the time being, 
>> I'm wearing that hat.
>>
>
> One of the main reasons for the change is that EXISTS allows for queries 
> that are impossible with IN. In addition EXISTS semantics regarding NULLs 
> is wanted. And, at least PostgreSQL performs better with NOT EXISTS than 
> NOT IN. Granted, the better performance on PostgreSQL is perhaps the most 
> important reason for me, but this change is not only about that.
>
> The main reason for this thread was to find out if there are some 
> databases where performance of EXISTS is worse than IN. The DB feature 
> approach seems good because it allows investigating performance 
> characteristics one DB at time.
>
>  
>>
>>> > > MSSQL is a 2nd-class citizen in the Django world, so I'm +1
>>> >
>>> > Reasoning like that helps to keep it in its place.
>>>
>>> MSSQL's lack of certain core features is what does that.
>>> OFFSET/LIMIT provided by pretty much every other DB vendor?  Use
>>> baroque hacks to get the same functionality.  
>>
>>
>> The non "baroque hack" way of doing limit offset is with nested selects. 
>> Maybe someday the non-standard LIMIT/OFFSET keywords will get added to the 
>> standard (I truly hope this happens) so Oracle, MSSQL, DB2, and Informix 
>> could share SQL with postgres and mysql without needing to mangle it.
>>
>
> Hmmh, this means Oracle, MSSQL, DB2 and Informix are doing more or less 
> the same thing for limit/offset support? If so, then having a more generic 
> approach to this problem than having a custom compiler per backend might be 
> worth it...
>
> BTW there is already something like LIMIT and OFFSET in SQL 2008 standard. 
> The syntax is different than LIMIT/OFFSET, and supported only by some 
> vendors... See 
> https://en.wikipedia.org/wiki/Select_%28SQL%29#FETCH_FIRST_clause
>
>  - Anssi
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/956f0034-cfa9-485e-ac69-e4b25134b970%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Michael Manfre
On Tue, Mar 26, 2013 at 5:47 PM, Petite Abeille wrote:

>
> On Mar 26, 2013, at 10:03 PM, Alex Gaynor  wrote:
>
> > For what it's worth, SQL2011 does define OFFSET, finally.
>

Just checked and it appears this has been implemented with SQL Server 2012.
A quick check of the other database with Django backends seems to show that
all of them have a beta or stable release with support for a simple
limit/offset syntax, with many supporting the standard defined syntax. This
might make the generic approach as simple as adding an
DatabaseOperatoins.limit_offset_sql() abstraction.

Regards,
Michael Manfre

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Petite Abeille

On Mar 26, 2013, at 10:03 PM, Alex Gaynor  wrote:

> For what it's worth, SQL2011 does define OFFSET, finally.

Perhaps worthwhile mentioning as well :

"Do not try to implement a scrolling window using LIMIT and OFFSET. Doing so 
will become sluggish as the user scrolls down toward the bottom of the list."
-- Scrolling Cursor, What Not To Do
http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

Just saying...

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Alex Gaynor
For what it's worth, SQL2011 does define OFFSET, finally.

Alex


On Tue, Mar 26, 2013 at 5:00 PM, Petite Abeille wrote:

>
> On Mar 26, 2013, at 4:19 PM, Michael Manfre  wrote:
>
> > Maybe someday the non-standard LIMIT/OFFSET keywords will get added to
> the
> > standard (I truly hope this happens) so Oracle, MSSQL, DB2, and Informix
> > could share SQL with postgres and mysql without needing to mangle it.
>
> FWIW, Oracle 12c is rumored to support OFFSET/FETCH FIRST:
>
> • Top-N now using Row limiting clause eg. “OFFSET 10 ROWS FETCH FIRST 10
> ROWS ONLY”. Similar to mySQL syntax.
>
> http://www.oracle-base.com/blog/2012/10/06/oracle-openworld-2012-day-5/
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>


-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Tim Chase
On 2013-03-26 15:54, Michael Manfre wrote:
> On Tue, Mar 26, 2013 at 12:40 PM, Anssi Kääriäinen
> deal with limit/offset. A generic approach would be nice to have,
> but I can't imagine a generic way that would let me generate the
> "SELECT ... FROM (SELECT ROW_NUMBER() OVER (...)) WHERE ..."
> monstrosity with lots of column aliasing that I currently
> construct.

The closest I've come is an ugly nested query using TOP (the TSQL
analog to LIMIT, but as mentioned earlier, there's no OFFSET
counterpart), and inverting the sort conditions:

  -- want sorted by "a asc, b desc, c asc"
  -- assuming LIMIT=10, OFFSET=20
  select *
  from (
   select top 10 -- LIMIT
   *
   from (
select top 30 -- LIMIT+OFFSET=10+20 
*
from tbl
order by a asc, b desc, c asc
) top_half
   order by a desc, b asc, c desc -- note inversion
   ) reversed_top_half
  order by a asc, b desc, c asc

It's been a while since I've done it, so it Works™, but (1) there's
the inevitable fence-posting error I'd have to verify, (2) it involves
sorting, reverse-sorting, then re-sorting (not exactly the speediest
operation), and (3) it's hideous.  It doesn't seem to require the
column-aliasing you mention, and it is a fairly generic approach, but
I can't say I recommend it :-)

-tkc






-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Anssi Kääriäinen
On Tuesday, March 26, 2013 5:19:52 PM UTC+2, Michael Manfre wrote:
>
>
>
> On Tue, Mar 26, 2013 at 10:23 AM, Tim Chase 
>  > wrote:
>
>>  EXISTS also has some nice features
>> like the ability to do testing against multiple columns, i.e., you
>> can't do something like
>>
>>   select *
>>   from tbl1
>>   where (col1, col2) in (
>>select a, b
>>from tbl2
>>)
>>
>> but that's a simple EXISTS query.
>>
>
> Agreed, EXISTS is more flexible and at least to me, often easier to read, 
> format, and maintain. Assuming this is implemented with a database feature, 
> I'll most likely enable the EXISTS change for django-mssql. My main 
> objections to a blanket change is to not have the specific behaviors of one 
> database dictate how every other database must behave. Some one needs to be 
> the voice of 3rd party database backends and it appears for the time being, 
> I'm wearing that hat.
>

One of the main reasons for the change is that EXISTS allows for queries 
that are impossible with IN. In addition EXISTS semantics regarding NULLs 
is wanted. And, at least PostgreSQL performs better with NOT EXISTS than 
NOT IN. Granted, the better performance on PostgreSQL is perhaps the most 
important reason for me, but this change is not only about that.

The main reason for this thread was to find out if there are some databases 
where performance of EXISTS is worse than IN. The DB feature approach seems 
good because it allows investigating performance characteristics one DB at 
time.

 
>
>> > > MSSQL is a 2nd-class citizen in the Django world, so I'm +1
>> >
>> > Reasoning like that helps to keep it in its place.
>>
>> MSSQL's lack of certain core features is what does that.
>> OFFSET/LIMIT provided by pretty much every other DB vendor?  Use
>> baroque hacks to get the same functionality.  
>
>
> The non "baroque hack" way of doing limit offset is with nested selects. 
> Maybe someday the non-standard LIMIT/OFFSET keywords will get added to the 
> standard (I truly hope this happens) so Oracle, MSSQL, DB2, and Informix 
> could share SQL with postgres and mysql without needing to mangle it.
>

Hmmh, this means Oracle, MSSQL, DB2 and Informix are doing more or less the 
same thing for limit/offset support? If so, then having a more generic 
approach to this problem than having a custom compiler per backend might be 
worth it...

BTW there is already something like LIMIT and OFFSET in SQL 2008 standard. 
The syntax is different than LIMIT/OFFSET, and supported only by some 
vendors... See 
https://en.wikipedia.org/wiki/Select_%28SQL%29#FETCH_FIRST_clause

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Michael Manfre
On Tue, Mar 26, 2013 at 10:23 AM, Tim Chase
wrote:

> > In SQL 2008r2, the optimizer is usually smart enough to end up with
> > the same execution plan for IN and EXISTS queries. Historically,
> > EXISTS was usually the faster operation for SQL Server and if
> > memory serves it had to deal with its ability to bail out of the
> > EXISTS query sooner compared to the IN query.
>
> I'd have to go back and re-test 2008r2, as my testing was on 2005
> (and earlier). But my testing directly contradicts your
> "Historically..." bit, as I DISTINCTly (bad SQL pun intended) remember
> being surprised precisely because of what you say: EXISTS should be
> able to optimize and bail early.


My "Historically" comment is based upon memory of observed behaviors from
many years ago and could even be an ingrained memory dating back to SQL
Server 7. I'll concede that your testing is probably more accurate than my
memory of observed SQL optimizer behaviors from upwards of a decade ago.
The execution plan is a result of the specific query and schema, so it's
possible that we're both correct and both wrong at the same time.


>  EXISTS also has some nice features
> like the ability to do testing against multiple columns, i.e., you
> can't do something like
>
>   select *
>   from tbl1
>   where (col1, col2) in (
>select a, b
>from tbl2
>)
>
> but that's a simple EXISTS query.
>

Agreed, EXISTS is more flexible and at least to me, often easier to read,
format, and maintain. Assuming this is implemented with a database feature,
I'll most likely enable the EXISTS change for django-mssql. My main
objections to a blanket change is to not have the specific behaviors of one
database dictate how every other database must behave. Some one needs to be
the voice of 3rd party database backends and it appears for the time being,
I'm wearing that hat.


> > > MSSQL is a 2nd-class citizen in the Django world, so I'm +1
> >
> > Reasoning like that helps to keep it in its place.
>
> MSSQL's lack of certain core features is what does that.
> OFFSET/LIMIT provided by pretty much every other DB vendor?  Use
> baroque hacks to get the same functionality.


The non "baroque hack" way of doing limit offset is with nested selects.
Maybe someday the non-standard LIMIT/OFFSET keywords will get added to the
standard (I truly hope this happens) so Oracle, MSSQL, DB2, and Informix
could share SQL with postgres and mysql without needing to mangle it.

Regards,
Michael Manfre

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Tim Chase
On 2013-03-25 22:16, Michael Manfre wrote:
> On Monday, March 25, 2013 6:58:12 AM UTC-4, Tim Chase wrote:
> > I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at 
> > $OLD_JOB, but there it's usually about the same, occasionally
> > with IN winning out. 
> 
> In SQL 2008r2, the optimizer is usually smart enough to end up with
> the same execution plan for IN and EXISTS queries. Historically,
> EXISTS was usually the faster operation for SQL Server and if
> memory serves it had to deal with its ability to bail out of the
> EXISTS query sooner compared to the IN query.

I'd have to go back and re-test 2008r2, as my testing was on 2005
(and earlier). But my testing directly contradicts your
"Historically..." bit, as I DISTINCTly (bad SQL pun intended) remember
being surprised precisely because of what you say: EXISTS should be
able to optimize and bail early.  EXISTS also has some nice features
like the ability to do testing against multiple columns, i.e., you
can't do something like

  select *
  from tbl1
  where (col1, col2) in (
   select a, b
   from tbl2
   )

but that's a simple EXISTS query.

> > MSSQL is a 2nd-class citizen in the Django world, so I'm +1 
> 
> Reasoning like that helps to keep it in its place.

MSSQL's lack of certain core features is what does that.
OFFSET/LIMIT provided by pretty much every other DB vendor?  Use
baroque hacks to get the same functionality.  I seem to recall other
issues, though v2008 seems to have addressed many of them (min date =
1900-01-01 which was an issue when dealing with
historical/genealogical data; seems to be better in 2008)

Either way, if EXISTS in MSSQL is now faster than IN, it just is one
more tally in the "plus" column for why this might be a good idea
(modulo implementation complexities).

-tkc






-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-26 Thread Anssi Kääriäinen
On 26 maalis, 07:16, Michael Manfre  wrote:
> On Monday, March 25, 2013 6:58:12 AM UTC-4, Tim Chase wrote:
>
> > I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at
> > $OLD_JOB, but there it's usually about the same, occasionally with IN
> > winning out.
>
> In SQL 2008r2, the optimizer is usually smart enough to end up with the
> same execution plan for IN and EXISTS queries. Historically, EXISTS was
> usually the faster operation for SQL Server and if memory serves it had to
> deal with its ability to bail out of the EXISTS query sooner compared to
> the IN query.
>
> MSSQL is a 2nd-class citizen in the Django world, so I'm +1
>
>
>
> Reasoning like that helps to keep it in its place.
>
> Anssi,
>
> Any chance of adding a new database feature to flip the behavior of __in to
> either IN or EXISTS? Sounds like this change of logical and documented
> behavior is being made specifically because of failings with Postgresql.
> The feature would also help satisfy the deprecation cycle normally used for
> changes to documented behaviors. Sub-queries are more likely to expose
> database specific issues with the SQL provided by Django (normally when
> used with aggregates or slicing). Adding the database feature might save
> every other backend from having to potentially jump
> through unnecessary hoops (mangling more SQL).

Yes, this is a possibility and I think I will take this path. This
adds a bit of complexity to the code but not much. There is already
support for both IN and EXISTS subqueries in the ORM (as of couple of
days ago). In addition I will change the __in lookup to exclude NULL
values from the inner query so that EXISTS and IN give the same
results.

By default PostgreSQL and SQLite will get EXISTS as preferred subquery
type. For other databases it will be easy to flip the feature flag if
it seems EXISTS is better than IN for the DB.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-25 Thread Michael Manfre


On Monday, March 25, 2013 6:58:12 AM UTC-4, Tim Chase wrote:
>
> I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at 
> $OLD_JOB, but there it's usually about the same, occasionally with IN 
> winning out. 


In SQL 2008r2, the optimizer is usually smart enough to end up with the 
same execution plan for IN and EXISTS queries. Historically, EXISTS was 
usually the faster operation for SQL Server and if memory serves it had to 
deal with its ability to bail out of the EXISTS query sooner compared to 
the IN query.

MSSQL is a 2nd-class citizen in the Django world, so I'm +1 
>

Reasoning like that helps to keep it in its place.

Anssi,

Any chance of adding a new database feature to flip the behavior of __in to 
either IN or EXISTS? Sounds like this change of logical and documented 
behavior is being made specifically because of failings with Postgresql. 
The feature would also help satisfy the deprecation cycle normally used for 
changes to documented behaviors. Sub-queries are more likely to expose 
database specific issues with the SQL provided by Django (normally when 
used with aggregates or slicing). Adding the database feature might save 
every other backend from having to potentially jump 
through unnecessary hoops (mangling more SQL).

Regards,
Michael Manfre

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-25 Thread Simon Riggs
On 25 March 2013 12:37, Anssi Kääriäinen  wrote:

> I feel pretty strongly that NOT EXISTS semantics are wanted. The NOT
> IN semantics are likely there just because that is how the
> implementation was originally done, not because there was any decision
> to choose those semantics.

Most likely, yes, so it looks like a bug fix now not an optimization.

> Also, multicolumn NOT IN lookups aren't
> supported on all databases (SQLite at least), so for that case NOT
> EXISTS semantics is going to happen anyways.

Yes, I think that's the clincher.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-25 Thread Alex Gaynor
I have no idea how EXISTS performs on MySQL, however I can say that IN +
subqueries on MySQL are so atrocious that we outright banned that where I
work, so I don't see how it could be worse :)

Alex


On Mon, Mar 25, 2013 at 8:37 AM, Anssi Kääriäinen
wrote:

> On 25 maalis, 13:23, Simon Riggs  wrote:
> > On 25 March 2013 10:58, Tim Chase 
> wrote:
> >
> > > On 2013-03-25 03:40, Anssi Kääriäinen wrote:
> > >> I am very likely going to change the ORM to use EXISTS subqueries
> > >> instead of IN subqueries. I know this is a good idea on PostgreSQL
> > >> but I don't have enough experience of other databases to know if
> > >> this is a good idea or not.
> >
> > > I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at
> > > $OLD_JOB, but there it's usually about the same, occasionally with IN
> > > winning out. However, the wins were marginal, and MSSQL is a 2nd-class
> > > citizen in the Django world, so I'm +1 on using EXISTS instead of IN,
> > > if the results are assured to be the same.
> >
> > The results are definitely different because NOT IN has some quite
> > strange characteristics: if the subquery returns a NULL then the whole
> > result is "unknown". It is that weirdness that makes it hard to
> > optimize for, or at least, not-yet-optimized for in PostgreSQL.
> >
> > In most cases it is the NOT EXISTS behaviour that people find natural
> > and normal anyway and that is the best mechanism to use.
>
> When doing an .exclude() that requires subquery Django automatically
> generates the queries so that the inner query's select clause can't
> contain nulls. For example:
> >>> print D.objects.exclude(e__id__gte=0).query
> SELECT `table_d`.`id`, `table_d`.`a`, `table_d`.`b` FROM `table_d`
> WHERE NOT (`table_d`.`id` IN (SELECT U1.`d_id` FROM `table_e` U1 WHERE
> (U1.`id` >= 0  AND U1.`d_id` IS NOT NULL)))
>
> However it is possible to generate NOT IN query where the SQL
> semantics are in effect when using __in lookup:
> >>> print
> D.objects.exclude(id__in=E.objects.filter(id__gte=0).values_list('d_id')).query
> SELECT `table_d`.`id`, `table_d`.`a`, `table_d`.`b` FROM `table_d`
> WHERE NOT (`table_d`.`id` IN (SELECT U0.`d_id` FROM `table_e` U0 WHERE
> U0.`id` >= 0 ))
>
> The results of the latter case could change (assuming d_id can contain
> null values).
>
> I think that this could be considered a bug fix. Django's ORM doesn't
> try to mimic SQL semantics, it tries to have Python semantics for the
> query. So an exclude(__in) lookup should behave like Python's "value
> not in list", not like SQL's NOT IN.
>
> On the other hand having __in lookups that do EXISTS in SQL might be a
> bit surprising. The way __in works is documented as generating SQL IN
> lookup: https://docs.djangoproject.com/en/dev/ref/models/querysets/#in.
>
> I feel pretty strongly that NOT EXISTS semantics are wanted. The NOT
> IN semantics are likely there just because that is how the
> implementation was originally done, not because there was any decision
> to choose those semantics. Also, multicolumn NOT IN lookups aren't
> supported on all databases (SQLite at least), so for that case NOT
> EXISTS semantics is going to happen anyways.
>
>  - Anssi
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>


-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-25 Thread Anssi Kääriäinen
On 25 maalis, 13:23, Simon Riggs  wrote:
> On 25 March 2013 10:58, Tim Chase  wrote:
>
> > On 2013-03-25 03:40, Anssi Kääriäinen wrote:
> >> I am very likely going to change the ORM to use EXISTS subqueries
> >> instead of IN subqueries. I know this is a good idea on PostgreSQL
> >> but I don't have enough experience of other databases to know if
> >> this is a good idea or not.
>
> > I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at
> > $OLD_JOB, but there it's usually about the same, occasionally with IN
> > winning out. However, the wins were marginal, and MSSQL is a 2nd-class
> > citizen in the Django world, so I'm +1 on using EXISTS instead of IN,
> > if the results are assured to be the same.
>
> The results are definitely different because NOT IN has some quite
> strange characteristics: if the subquery returns a NULL then the whole
> result is "unknown". It is that weirdness that makes it hard to
> optimize for, or at least, not-yet-optimized for in PostgreSQL.
>
> In most cases it is the NOT EXISTS behaviour that people find natural
> and normal anyway and that is the best mechanism to use.

When doing an .exclude() that requires subquery Django automatically
generates the queries so that the inner query's select clause can't
contain nulls. For example:
>>> print D.objects.exclude(e__id__gte=0).query
SELECT `table_d`.`id`, `table_d`.`a`, `table_d`.`b` FROM `table_d`
WHERE NOT (`table_d`.`id` IN (SELECT U1.`d_id` FROM `table_e` U1 WHERE
(U1.`id` >= 0  AND U1.`d_id` IS NOT NULL)))

However it is possible to generate NOT IN query where the SQL
semantics are in effect when using __in lookup:
>>> print 
>>> D.objects.exclude(id__in=E.objects.filter(id__gte=0).values_list('d_id')).query
SELECT `table_d`.`id`, `table_d`.`a`, `table_d`.`b` FROM `table_d`
WHERE NOT (`table_d`.`id` IN (SELECT U0.`d_id` FROM `table_e` U0 WHERE
U0.`id` >= 0 ))

The results of the latter case could change (assuming d_id can contain
null values).

I think that this could be considered a bug fix. Django's ORM doesn't
try to mimic SQL semantics, it tries to have Python semantics for the
query. So an exclude(__in) lookup should behave like Python's "value
not in list", not like SQL's NOT IN.

On the other hand having __in lookups that do EXISTS in SQL might be a
bit surprising. The way __in works is documented as generating SQL IN
lookup: https://docs.djangoproject.com/en/dev/ref/models/querysets/#in.

I feel pretty strongly that NOT EXISTS semantics are wanted. The NOT
IN semantics are likely there just because that is how the
implementation was originally done, not because there was any decision
to choose those semantics. Also, multicolumn NOT IN lookups aren't
supported on all databases (SQLite at least), so for that case NOT
EXISTS semantics is going to happen anyways.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-25 Thread Simon Riggs
On 25 March 2013 10:58, Tim Chase  wrote:
> On 2013-03-25 03:40, Anssi Kääriäinen wrote:
>> I am very likely going to change the ORM to use EXISTS subqueries
>> instead of IN subqueries. I know this is a good idea on PostgreSQL
>> but I don't have enough experience of other databases to know if
>> this is a good idea or not.
>
> I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at
> $OLD_JOB, but there it's usually about the same, occasionally with IN
> winning out. However, the wins were marginal, and MSSQL is a 2nd-class
> citizen in the Django world, so I'm +1 on using EXISTS instead of IN,
> if the results are assured to be the same.

The results are definitely different because NOT IN has some quite
strange characteristics: if the subquery returns a NULL then the whole
result is "unknown". It is that weirdness that makes it hard to
optimize for, or at least, not-yet-optimized for in PostgreSQL.

In most cases it is the NOT EXISTS behaviour that people find natural
and normal anyway and that is the best mechanism to use.

> However, the query constuction to move the condition into the EXISTS
> subclause might be a bit more complex.


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using EXISTS instead of IN for subqueries

2013-03-25 Thread Tim Chase
On 2013-03-25 03:40, Anssi Kääriäinen wrote:
> I am very likely going to change the ORM to use EXISTS subqueries
> instead of IN subqueries. I know this is a good idea on PostgreSQL
> but I don't have enough experience of other databases to know if
> this is a good idea or not.

I can only speak for testing IN-vs-EXISTS speed on MSSQLServer at
$OLD_JOB, but there it's usually about the same, occasionally with IN
winning out. However, the wins were marginal, and MSSQL is a 2nd-class
citizen in the Django world, so I'm +1 on using EXISTS instead of IN,
if the results are assured to be the same.

However, the query constuction to move the condition into the EXISTS
subclause might be a bit more complex.

-tkc


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Using EXISTS instead of IN for subqueries

2013-03-25 Thread Anssi Kääriäinen
I am very likely going to change the ORM to use EXISTS subqueries
instead of IN subqueries. I know this is a good idea on PostgreSQL but
I don't have enough experience of other databases to know if this is a
good idea or not.

There are two main reasons for doing this. First, exists should
perform better on some databases, and exists allows for filter
conditions other than single column equality on all databases. So,
EXISTS subqueries are needed in the ORM in any case, the question is
if they should be the only option.

The semantics of NOT IN are harder to optimize for the DB than NOT
EXISTS, and this can result in large performance differences. See for
example this post from pgsql-hacker mailing list:
http://www.postgresql.org/message-id/19913.1359149...@sss.pgh.pa.us

It is easy to construct cases where NOT IN results in runtime of days
and NOT EXISTS in runtime of seconds. Just have a large enough table
in the subquery and PostgreSQL will choke.

Quick testing indicates that Oracle and MySQL seem to perform about
the same for IN and EXISTS variants, and SQLite seems to be a bit
faster when using EXISTS over IN. The docs of MySQL suggests using
EXISTS: 
https://dev.mysql.com/doc/refman/5.5/en/subquery-optimization-with-exists.html
(see the part about "very useful optimization"). My experience of
using these databases is very limited, so I might be missing some
known problematic cases.

So, the question is if there are situations where performance of
EXISTS is a lot worse than IN?

It will be possible to have a
connection.features.prefers_exists_subqueries flag and use that to
decide if the query should be generated as IN or EXISTS subquery.
However, always using EXISTS is a lot simpler.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.