Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-11 Thread J. King
Doh. I meant the linked document on the error log. Silly me. 



On September 11, 2017 9:41:39 PM EDT, "J. King"  wrote:
>There's an extra word in the first paragraph of Section 4 of that
>document, by the way:
>
>" The error logger callback has also proven useful in catching errors
>occasional errors that the application misses..."
>
>On September 11, 2017 11:22:50 AM EDT, Dan Kennedy
> wrote:
>>On 09/10/2017 08:30 PM, R Smith wrote:
>>> Well yes but the documentation suggests that one could expect a
>>slight 
>>> degradation. The words "works best with" does not seem to imbue an 
>>> idea of "give WITHOUT ROWID tables a wide berth when your tables are
>
>>> more than few columns wide", and I don't think the Devs intended
>that
>>
>>> either.
>>>
>>> I can still roll with the idea that the WITHOUT ROWID tables with
>151
>>
>>> columns and lots of data actually performed FASTER than the same
>>ROWID 
>>> tables for the other queries but slower with the JOIN... but that
>>much 
>>> slower?
>>
>>
>>
>>The difference seems to be that, currently, SQLite never creates 
>>automatic indexes on WITHOUT ROWID tables.
>>
>>   http://sqlite.org/optoverview.html#autoindex
>>
>>I don't think there is a good reason that it cannot do so. It just 
>>doesn't yet.
>>
>>Dan.
>>
>>
>>
>>
>>
>>>
>>> I mean we are not talking a "little" slower, we are talking 50ms vs.
>
>>> ~70,000ms, that's a difference of a whopping 3 orders of magnitude
>>and 
>>> change. And we are not talking huge tables, another query that
>simply
>>
>>> walks the tables can do so several million times in the same time
>the
>>
>>> JOIN query does.
>>>
>>> Put another way, I can create 100 new tables and populate them each 
>>> with the rows from one test table, then delete the rows from each
>new
>>
>>> table that doesn't satisfy an EXISTS() check in the other test table
>
>>> (essentially mimicking the JOIN query) and then output each full new
>
>>> table, 100 of them in turn, and then DROP them all. SQlite can do
>ALL
>>
>>> of that in a fraction of the time that the normal JOIN query
>(between
>>
>>> those same two test tables) takes to complete.
>>>
>>> We are talking a formula 1 car suddenly going at max speed of 1 
>>> mile-per-weekend, barely keeping up with a semi-athletic snail, and 
>>> people suggest checking the fuel octane rating. I'm saying there is 
>>> something wrong under the hood.
>>>
>>> Cheers,
>>> Ryan
>>>
>>>
>>> On 2017/09/10 11:28 AM, Clemens Ladisch wrote:
 R Smith wrote:
> I am using 151 columns for both tests. The only thing that changes
> between the two scripts are the words "WITHOUT ROWID" being added
  says:
 | WITHOUT ROWID tables will work correctly ... for tables with a
>>single
 | INTEGER PRIMARY KEY. However, ordinary rowid tables will run
>>faster in
 | that case.
 and:
 | WITHOUT ROWID tables work best when individual rows are not too
>>large.

 So this is pretty much the documented worst case for WITHOUT ROWID 
 tables.

 If the query is executed by making a copy of all the table data
>into
 a temporary B-tree (index), which is what INTERSECT and probably
>>IN(...)
 are doing, then the WITHOUT-ROWID-ness of the table does not
>matter.


 Regards,
 Clemens
 ___
 sqlite-users mailing list
 sqlite-users@mailinglists.sqlite.org

>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>>___
>>sqlite-users mailing list
>>sqlite-users@mailinglists.sqlite.org
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>-- 
>Sent from my Android device with K-9 Mail. Please excuse my brevity.
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-11 Thread J. King
There's an extra word in the first paragraph of Section 4 of that document, by 
the way:

" The error logger callback has also proven useful in catching errors 
occasional errors that the application misses..."

On September 11, 2017 11:22:50 AM EDT, Dan Kennedy  
wrote:
>On 09/10/2017 08:30 PM, R Smith wrote:
>> Well yes but the documentation suggests that one could expect a
>slight 
>> degradation. The words "works best with" does not seem to imbue an 
>> idea of "give WITHOUT ROWID tables a wide berth when your tables are 
>> more than few columns wide", and I don't think the Devs intended that
>
>> either.
>>
>> I can still roll with the idea that the WITHOUT ROWID tables with 151
>
>> columns and lots of data actually performed FASTER than the same
>ROWID 
>> tables for the other queries but slower with the JOIN... but that
>much 
>> slower?
>
>
>
>The difference seems to be that, currently, SQLite never creates 
>automatic indexes on WITHOUT ROWID tables.
>
>   http://sqlite.org/optoverview.html#autoindex
>
>I don't think there is a good reason that it cannot do so. It just 
>doesn't yet.
>
>Dan.
>
>
>
>
>
>>
>> I mean we are not talking a "little" slower, we are talking 50ms vs. 
>> ~70,000ms, that's a difference of a whopping 3 orders of magnitude
>and 
>> change. And we are not talking huge tables, another query that simply
>
>> walks the tables can do so several million times in the same time the
>
>> JOIN query does.
>>
>> Put another way, I can create 100 new tables and populate them each 
>> with the rows from one test table, then delete the rows from each new
>
>> table that doesn't satisfy an EXISTS() check in the other test table 
>> (essentially mimicking the JOIN query) and then output each full new 
>> table, 100 of them in turn, and then DROP them all. SQlite can do ALL
>
>> of that in a fraction of the time that the normal JOIN query (between
>
>> those same two test tables) takes to complete.
>>
>> We are talking a formula 1 car suddenly going at max speed of 1 
>> mile-per-weekend, barely keeping up with a semi-athletic snail, and 
>> people suggest checking the fuel octane rating. I'm saying there is 
>> something wrong under the hood.
>>
>> Cheers,
>> Ryan
>>
>>
>> On 2017/09/10 11:28 AM, Clemens Ladisch wrote:
>>> R Smith wrote:
 I am using 151 columns for both tests. The only thing that changes
 between the two scripts are the words "WITHOUT ROWID" being added
>>>  says:
>>> | WITHOUT ROWID tables will work correctly ... for tables with a
>single
>>> | INTEGER PRIMARY KEY. However, ordinary rowid tables will run
>faster in
>>> | that case.
>>> and:
>>> | WITHOUT ROWID tables work best when individual rows are not too
>large.
>>>
>>> So this is pretty much the documented worst case for WITHOUT ROWID 
>>> tables.
>>>
>>> If the query is executed by making a copy of all the table data into
>>> a temporary B-tree (index), which is what INTERSECT and probably
>IN(...)
>>> are doing, then the WITHOUT-ROWID-ness of the table does not matter.
>>>
>>>
>>> Regards,
>>> Clemens
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-11 Thread Dan Kennedy

On 09/10/2017 08:30 PM, R Smith wrote:
Well yes but the documentation suggests that one could expect a slight 
degradation. The words "works best with" does not seem to imbue an 
idea of "give WITHOUT ROWID tables a wide berth when your tables are 
more than few columns wide", and I don't think the Devs intended that 
either.


I can still roll with the idea that the WITHOUT ROWID tables with 151 
columns and lots of data actually performed FASTER than the same ROWID 
tables for the other queries but slower with the JOIN... but that much 
slower?




The difference seems to be that, currently, SQLite never creates 
automatic indexes on WITHOUT ROWID tables.


  http://sqlite.org/optoverview.html#autoindex

I don't think there is a good reason that it cannot do so. It just 
doesn't yet.


Dan.







I mean we are not talking a "little" slower, we are talking 50ms vs. 
~70,000ms, that's a difference of a whopping 3 orders of magnitude and 
change. And we are not talking huge tables, another query that simply 
walks the tables can do so several million times in the same time the 
JOIN query does.


Put another way, I can create 100 new tables and populate them each 
with the rows from one test table, then delete the rows from each new 
table that doesn't satisfy an EXISTS() check in the other test table 
(essentially mimicking the JOIN query) and then output each full new 
table, 100 of them in turn, and then DROP them all. SQlite can do ALL 
of that in a fraction of the time that the normal JOIN query (between 
those same two test tables) takes to complete.


We are talking a formula 1 car suddenly going at max speed of 1 
mile-per-weekend, barely keeping up with a semi-athletic snail, and 
people suggest checking the fuel octane rating. I'm saying there is 
something wrong under the hood.


Cheers,
Ryan


On 2017/09/10 11:28 AM, Clemens Ladisch wrote:

R Smith wrote:

I am using 151 columns for both tests. The only thing that changes
between the two scripts are the words "WITHOUT ROWID" being added

 says:
| WITHOUT ROWID tables will work correctly ... for tables with a single
| INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
| that case.
and:
| WITHOUT ROWID tables work best when individual rows are not too large.

So this is pretty much the documented worst case for WITHOUT ROWID 
tables.


If the query is executed by making a copy of all the table data into
a temporary B-tree (index), which is what INTERSECT and probably IN(...)
are doing, then the WITHOUT-ROWID-ness of the table does not matter.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-10 Thread R Smith
Well yes but the documentation suggests that one could expect a slight 
degradation. The words "works best with" does not seem to imbue an idea 
of "give WITHOUT ROWID tables a wide berth when your tables are more 
than few columns wide", and I don't think the Devs intended that either.


I can still roll with the idea that the WITHOUT ROWID tables with 151 
columns and lots of data actually performed FASTER than the same ROWID 
tables for the other queries but slower with the JOIN... but that much 
slower?


I mean we are not talking a "little" slower, we are talking 50ms vs. 
~70,000ms, that's a difference of a whopping 3 orders of magnitude and 
change. And we are not talking huge tables, another query that simply 
walks the tables can do so several million times in the same time the 
JOIN query does.


Put another way, I can create 100 new tables and populate them each with 
the rows from one test table, then delete the rows from each new table 
that doesn't satisfy an EXISTS() check in the other test table 
(essentially mimicking the JOIN query) and then output each full new 
table, 100 of them in turn, and then DROP them all. SQlite can do ALL of 
that in a fraction of the time that the normal JOIN query (between those 
same two test tables) takes to complete.


We are talking a formula 1 car suddenly going at max speed of 1 
mile-per-weekend, barely keeping up with a semi-athletic snail, and 
people suggest checking the fuel octane rating. I'm saying there is 
something wrong under the hood.


Cheers,
Ryan


On 2017/09/10 11:28 AM, Clemens Ladisch wrote:

R Smith wrote:

I am using 151 columns for both tests. The only thing that changes
between the two scripts are the words "WITHOUT ROWID" being added

 says:
| WITHOUT ROWID tables will work correctly ... for tables with a single
| INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
| that case.
and:
| WITHOUT ROWID tables work best when individual rows are not too large.

So this is pretty much the documented worst case for WITHOUT ROWID tables.

If the query is executed by making a copy of all the table data into
a temporary B-tree (index), which is what INTERSECT and probably IN(...)
are doing, then the WITHOUT-ROWID-ness of the table does not matter.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-10 Thread Clemens Ladisch
R Smith wrote:
> I am using 151 columns for both tests. The only thing that changes
> between the two scripts are the words "WITHOUT ROWID" being added

 says:
| WITHOUT ROWID tables will work correctly ... for tables with a single
| INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
| that case.
and:
| WITHOUT ROWID tables work best when individual rows are not too large.

So this is pretty much the documented worst case for WITHOUT ROWID tables.

If the query is executed by making a copy of all the table data into
a temporary B-tree (index), which is what INTERSECT and probably IN(...)
are doing, then the WITHOUT-ROWID-ness of the table does not matter.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread Nico Williams
On Sat, Sep 09, 2017 at 11:26:35PM +0200, R Smith wrote:
> I think you are missing something or my explanation was not clear.
> When I say "first test" I mean of THIS test suite, not the previous set from
> 3 days ago.

I meant the opposite.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread R Smith



On 2017/09/09 9:20 PM, Nico Williams wrote:

On Sat, Sep 09, 2017 at 05:56:03PM +0200, R Smith wrote:

*Findings* pertaining to JOIN, INTERSECT and WHERE IN (...) type queries on
tables *WITHOUT Row_ids*:
(This is the full test posted below because it is the one that matters most)
INTERSECT AND WHERE IN (...) queries posted similar times here with WHERE IN
(...) being the slightly faster (similar to the above findings), but BOTH
were a good 10 - 15% faster than on the ROW-ID tables, so WITHOUT ROWID
tables seem to have a definite advantage here (it is currently unclear to me
why this is).

A troubling test is the JOIN on WITHOUT ROWID tables - it took several
orders of magnitude longer than any other test in the entire experiment.

In your first test you were ordering by PK, now you're not, and you
don't have an [covering] index on the columns you're ordering by, so,
yeah, "orders of magnitude" slower is to be expected.  You're comparing
apples and oranges. /// etc 


I think you are missing something or my explanation was not clear.
When I say "first test" I mean of THIS test suite, not the previous set 
from 3 days ago.


I am in no way ordering by PK in this set of tests, anywhere. I am not 
using 1 column, I am using 151 columns for both tests. The only thing 
that changes between the two scripts are the words "WITHOUT ROWID" being 
added or removed, NOTHING else. The problem also happens on the 
un-ordered tests. The ordering is of no consequence.


To be clear, there are two tests, both do not order by PK, both do not 
have covering indexes, both have 151 columns and lots of data... the 
only difference between them is the words "WITHOUT ROWID" at the end of 
the test tables. One query runs in ~50 milliseconds, the other runs for 
minutes. The results are exactly the same.


You can test this by taking the script I provided, run it, it will run 
for longer than 2 minutes (depending on your machine speed) because of 
the JOIN queries taking over a minute each, then remove the words 
"WITHOUT ROWID" from the two test tables and run it again. The entire 
script will finish in underĀ  3 seconds with the JOIN queries weighing in 
at circa 50ms each.


That cannot possibly be an expected circumstance and it isn't an 
apples-oranges thing.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread Nico Williams
On Sat, Sep 09, 2017 at 05:56:03PM +0200, R Smith wrote:
> *Findings* pertaining to JOIN, INTERSECT and WHERE IN (...) type queries on
> tables *WITHOUT Row_ids*:
> (This is the full test posted below because it is the one that matters most)
> INTERSECT AND WHERE IN (...) queries posted similar times here with WHERE IN
> (...) being the slightly faster (similar to the above findings), but BOTH
> were a good 10 - 15% faster than on the ROW-ID tables, so WITHOUT ROWID
> tables seem to have a definite advantage here (it is currently unclear to me
> why this is).
> 
> A troubling test is the JOIN on WITHOUT ROWID tables - it took several
> orders of magnitude longer than any other test in the entire experiment.

In your first test you were ordering by PK, now you're not, and you
don't have an [covering] index on the columns you're ordering by, so,
yeah, "orders of magnitude" slower is to be expected.  You're comparing
apples and oranges.

Also, in your first test you have one column.  Now you have lots.
That's no way to isolate the performance numbers you're trying to get.

The only way to have an ORDER BY speed up a query is when there's an
index on the columns in question that can be scanned to produce the
order you're looking for without additional sorting, or when the engine
can create such an index as a temporary index (this, of course, has a
cost, so it won't always work well)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users