Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-10 Thread P Kishor
On Sat, Jan 10, 2009 at 7:58 AM, Lukas Haase  wrote:
> D. Richard Hipp schrieb:
>> On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote:
>>> SELECT t.topic, t.length
>>> FROM printgroup AS pg1
>>> LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup
>>> LEFT JOIN topics AS t ON t.topicID = pg2.topicID
>>> LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID
>>> WHERE ti.topic_textID = ''
>>> ORDER BY pg2.topicID ASC;
>>
>> You seem very fond of using LEFT JOINs in places where they do not
>> make good sense.
>
> Yes, I started with mySQL 3 many years ago. At the beginning I only knew
> about LEFT JOINs and used them. Now I think I also know the other types
> of JOINs but I still use LEFT JOINs very often, just by habit. And with
> mySQL I never had performance problems with them.
>
>> What is it that you think a LEFT JOIN does?
>
> (A LEFT JOIN B) joins together table A and B while all records are taken
>  from A and only records that match both are takes from B. If a record
> from A has no corresponding data in B, the values are NULL.
>
>> How is
>> a LEFT JOIN different than an ordinary inner JOIN?
>
> INNER JOIN takes *all* records from both tables, A and B. Generally, the
> resultset will be larger.

all the rows from both tables A and B *that match* the join
condition... in other words, unlike a LEFT (or a RIGHT) JOIN, which
would include even those rows where only A (or B) match but show a
NULL value for the B (or A) table, an INNER JOIN, aka, just JOIN,
would usually have a smaller result set. Unless and until both tables
provided a match (and stuffing NULL is not a match), a row would not
be included in the result set.


>
>> I ask because I
>> suspect that your answer will reveal misconceptions about LEFT JOINs
>> which, when rectified, will cause most of your performance issues to
>> go away.
>
> Maybe my I think too much in "left joining" but I did not know that
> there is so much difference in performance.
>
> Best Regards,
> Luke
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Puneet Kishor http://www.punkish.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-10 Thread Nicolas Williams
On Sat, Jan 10, 2009 at 02:58:57PM +0100, Lukas Haase wrote:
> > What is it that you think a LEFT JOIN does?
> 
> (A LEFT JOIN B) joins together table A and B while all records are taken 
>   from A and only records that match both are takes from B. If a record 
> from A has no corresponding data in B, the values are NULL.

That's what a LEFT JOIN is, but why do you think you need it?

> > How is  
> > a LEFT JOIN different than an ordinary inner JOIN?
> 
> INNER JOIN takes *all* records from both tables, A and B. Generally, the 
> resultset will be larger.

The result of an INNER JOIN will be smaller than or equal to that of a
LEFT JOIN since rows from A that can't be joined to any rows from B
don't appear in the result of the INNER JOIN but do appear in the LEFT
JOIN.  Were you thinking of cross joins?

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-10 Thread D. Richard Hipp
When you have an inner join like this:

SELECT *
FROM a JOIN b ON a.x=b.y
WHERE b.z=123;

then the SQL engine is free to interchange the order of the tables in  
the join.  For example, the join might be implemented as:

SELECT *
FROM b JOIN a ON a.x=b.y
WHERE b.z=123

Giving the SQL engine the freedom to interchange tables will often  
create significant performance improvements.  For example, if there  
are indexes on b.z and a.x, then by interchanging the two tables, the  
join can be satisfied by first looking up entries where b.z=123 then  
finding corresponding b.y values and using them to look up entries  
matching a.x=b.y.

Note that the tables can only be interchanged if you use a inner  
JOIN.  Change the order of tables in a LEFT JOIN creates a different  
answer.  So when you use LEFT JOIN, that forces a particular ordering  
of tables, and greatly restricts the query engines opportunities to  
optimize.  So you should avoid using LEFT JOIN if you don't really  
need it.

In the example above, because the tables can be reordered, the query  
will run in O(logN).  But if you us a LEFT JOIN forcing the original a- 
before-b order, the runtime will be O(N*N).  Quite a bit slower.

And in the example above, the LEFT JOIN is not really needed.  Because  
of the "b.z=123" test in the WHERE clause, the terms of the LEFT JOIN  
where table b is NULL will never make it to the output.  So you will  
get the same result using either an inner JOIN or a LEFT JOIN.  So  
since an inner JOIN gives the query engine more optimization  
opportunities, you might as well use it.

On Jan 10, 2009, at 8:58 AM, Lukas Haase wrote:

> D. Richard Hipp schrieb:
>> On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote:
>>> SELECT t.topic, t.length
>>> FROM printgroup AS pg1
>>> LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup
>>> LEFT JOIN topics AS t ON t.topicID = pg2.topicID
>>> LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID
>>> WHERE ti.topic_textID = ''
>>> ORDER BY pg2.topicID ASC;
>>
>> You seem very fond of using LEFT JOINs in places where they do not
>> make good sense.
>
> Yes, I started with mySQL 3 many years ago. At the beginning I only  
> knew
> about LEFT JOINs and used them. Now I think I also know the other  
> types
> of JOINs but I still use LEFT JOINs very often, just by habit. And  
> with
> mySQL I never had performance problems with them.
>
>> What is it that you think a LEFT JOIN does?
>
> (A LEFT JOIN B) joins together table A and B while all records are  
> taken
>  from A and only records that match both are takes from B. If a record
> from A has no corresponding data in B, the values are NULL.
>
>> How is
>> a LEFT JOIN different than an ordinary inner JOIN?
>
> INNER JOIN takes *all* records from both tables, A and B. Generally,  
> the
> resultset will be larger.
>
>> I ask because I
>> suspect that your answer will reveal misconceptions about LEFT JOINs
>> which, when rectified, will cause most of your performance issues to
>> go away.
>
> Maybe my I think too much in "left joining" but I did not know that
> there is so much difference in performance.
>
> Best Regards,
> Luke
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

D. Richard Hipp
d...@hwaci.com



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-10 Thread Lukas Haase
Kees Nuyt schrieb:
> On Fri, 09 Jan 2009 21:16:03 +0100, Lukas Haase
>  wrote in General Discussion of SQLite
> Database :
> 
>> Hello Richard!
>>
>> Thank you very much!! It works! :-)
>>
>>
>> Indeed. 0-10 milliseconds instead of 500-800 :-)
>>
>> But may you tell me why this works and where you have this information? 
>> I know the O-notation but I do not know /why/ this boosts down to log(n)...
> 
> Use EXPLAIN SELECT .
>   to see the virtual machine instructions
> and EXPLAIN SELECT QUERY PLAN .
>   to see which index is used.
> 
> http://www.sqlite.org/lang_explain.html
> 
> Each JOIN is implemented as nested loops. The virtual
> machine code can tell a lot about what part of the database
> has to be scanned.

Thank you, I know this and I did try it already.

But unfortunately I do not know how to interpret the results.

Best regards,
Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-10 Thread Lukas Haase
D. Richard Hipp schrieb:
> On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote:
>> SELECT t.topic, t.length
>> FROM printgroup AS pg1
>> LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup
>> LEFT JOIN topics AS t ON t.topicID = pg2.topicID
>> LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID
>> WHERE ti.topic_textID = ''
>> ORDER BY pg2.topicID ASC;
> 
> You seem very fond of using LEFT JOINs in places where they do not  
> make good sense.

Yes, I started with mySQL 3 many years ago. At the beginning I only knew 
about LEFT JOINs and used them. Now I think I also know the other types 
of JOINs but I still use LEFT JOINs very often, just by habit. And with 
mySQL I never had performance problems with them.

> What is it that you think a LEFT JOIN does?

(A LEFT JOIN B) joins together table A and B while all records are taken 
  from A and only records that match both are takes from B. If a record 
from A has no corresponding data in B, the values are NULL.

> How is  
> a LEFT JOIN different than an ordinary inner JOIN?

INNER JOIN takes *all* records from both tables, A and B. Generally, the 
resultset will be larger.

> I ask because I  
> suspect that your answer will reveal misconceptions about LEFT JOINs  
> which, when rectified, will cause most of your performance issues to  
> go away.

Maybe my I think too much in "left joining" but I did not know that 
there is so much difference in performance.

Best Regards,
Luke


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-09 Thread D. Richard Hipp

On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote:
>
> SELECT t.topic, t.length
> FROM printgroup AS pg1
> LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup
> LEFT JOIN topics AS t ON t.topicID = pg2.topicID
> LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID
> WHERE ti.topic_textID = ''
> ORDER BY pg2.topicID ASC;
>

You seem very fond of using LEFT JOINs in places where they do not  
make good sense.  What is it that you think a LEFT JOIN does?  How is  
a LEFT JOIN different than an ordinary inner JOIN?  I ask because I  
suspect that your answer will reveal misconceptions about LEFT JOINs  
which, when rectified, will cause most of your performance issues to  
go away.

D. Richard Hipp
d...@hwaci.com



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-09 Thread Kees Nuyt

On Fri, 09 Jan 2009 21:16:03 +0100, Lukas Haase
 wrote in General Discussion of SQLite
Database :

>Hello Richard!
>
>Thank you very much!! It works! :-)
>
>
>Indeed. 0-10 milliseconds instead of 500-800 :-)
>
>But may you tell me why this works and where you have this information? 
>I know the O-notation but I do not know /why/ this boosts down to log(n)...

Use EXPLAIN SELECT .
to see the virtual machine instructions
and EXPLAIN SELECT QUERY PLAN .
to see which index is used.

http://www.sqlite.org/lang_explain.html

Each JOIN is implemented as nested loops. The virtual
machine code can tell a lot about what part of the database
has to be scanned.

>I have other queries which worry me. But that trick did not help in 
>these cases :-(
>
[...]
>
>Thank you again and best regards,
>Luke

-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-09 Thread Lukas Haase
Hello Richard!

Thank you very much!! It works! :-)

D. Richard Hipp schrieb:
> On Jan 7, 2009, at 6:11 PM, Lukas Haase wrote:
> 
>> Hello,
>>
>> Can somebody tell me why this (simple) query take so much time? This
>> query does nothing more than querying a table and JOINing two other
>> tables together.
>>
>> SELECT
>>  ti1.topicID AS topicID,
>>  ti2.topic_textID AS parent,
>>  n.level,
>>  n.level_order
>> FROM navigation AS n
>> LEFT JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID
>> LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID
>> WHERE ti1.topic_textID = 'X';
> 
> SQLite should be running this query in O(NlogN).
> 
> If you change the first LEFT JOIN to a plain old JOIN (which should  
> give equivalent results by virtue of the WHERE clause restricting  
> ti1.topic_textID to not be NULL) then it should run in O(logN) - much  
> faster.  Try it and let me know.

Indeed. 0-10 milliseconds instead of 500-800 :-)

But may you tell me why this works and where you have this information? 
I know the O-notation but I do not know /why/ this boosts down to log(n)...

I have other queries which worry me. But that trick did not help in 
these cases :-(

Especially I have problems with a self-join. In a table I have defined 
groups of elements ("printgroup"):

CREATE TABLE printgroup(
topicID INTEGER,
printgroup INTEGER,
PRIMARY KEY(topicID, printgroup)
);

I think these indices are not necessary because both fields are primary 
keys anyway.
CREATE INDEX topicID ON printgroup(topicID);
CREATE INDEX pprintgroup ON printgroup(printgroup);

When I know one element of a group (given by topicID) I want to find all 
other elements in the same group:

SELECT t.topic, t.length
FROM printgroup AS pg1
LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup
LEFT JOIN topics AS t ON t.topicID = pg2.topicID
LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID
WHERE ti.topic_textID = ''
ORDER BY pg2.topicID ASC;

The table "topics" just contains the actual data for each topicID 
(t.topic with length t.length).

This query takes a few seconds (und to minutes) with "sqlite3.exe" and 
even much longer in my application (sqlite with CppSQlite3): Up to 15 
minutes!

Mimicking your magic above I tried to leave out the "LEFT" in the 
self-joins but it did not change anything :-(

And unfortunately, the optimization FAQ [1] is very incomplete, at least 
at the interesting points (indices) :-(


Thank you again and best regards,
Luke


[1] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-08 Thread MikeW
Lukas Haase  writes:

> 
> Hello,
> 
> Can somebody tell me why this (simple) query take so much time? This 
> query does nothing more than querying a table and JOINing two other 
> tables together.
> 
... snip ...
> 
> Does anybody have an idea what's going wrong here? How can I speed up 
> this query?
> 
> Thank you very much in advance,
> Luke

Silly question I know - presume this time does not include the prepare
(which should be done in advance) ?

MikeW



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 600ms for simple query: How to optimize it?

2009-01-07 Thread D. Richard Hipp

On Jan 7, 2009, at 6:11 PM, Lukas Haase wrote:

> Hello,
>
> Can somebody tell me why this (simple) query take so much time? This
> query does nothing more than querying a table and JOINing two other
> tables together.
>
> SELECT
>   ti1.topicID AS topicID,
>   ti2.topic_textID AS parent,
>   n.level,
>   n.level_order
> FROM navigation AS n
> LEFT JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID
> LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID
> WHERE ti1.topic_textID = 'X';

SQLite should be running this query in O(NlogN).

If you change the first LEFT JOIN to a plain old JOIN (which should  
give equivalent results by virtue of the WHERE clause restricting  
ti1.topic_textID to not be NULL) then it should run in O(logN) - much  
faster.  Try it and let me know.

SELECT
ti1.topicID AS topicID,
ti2.topic_textID AS parent,
n.level,
n.level_order
FROM navigation AS n
JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID
LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID
WHERE ti1.topic_textID = 'X';


>
>
> I thought I optimized the table good with indexes but one such a query
> takes 500 to 1000ms in my C++ program.
>
> Here are my table definitions and the indexes (unfortunately I need  
> the
> VARCHAR(20) field because I get the "topicID" only as text:
>
> CREATE TABLE topic_ids(
>   topicID INTEGER,
>   topic_textID VARCHAR(20),
>   PRIMARY KEY(topicID)
> );
> CREATE INDEX topic_textID ON topic_ids(topic_textID);
>
> CREATE TABLE navigation(
>   topicID INTEGER PRIMARY KEY,
>   parent_topicID INTEGER,
>   level VARCHAR(20),
>   level_order INTEGER
> );
> CREATE INDEX parent_topicID ON navigation(parent_topicID);
> CREATE INDEX level ON navigation(level);
> CREATE INDEX level_order ON navigation(level_order);
>
> I need to execute this query in a database application each time a new
> page is opened. So 500ms are really too much. A few ms would be great.
>
> And the tables itself are not really huge:
>
> SELECT COUNT(*) FROM navigation;
> 19469
> SELECT COUNT(*) FROM topic_ids;
> 19469
>
> Does anybody have an idea what's going wrong here? How can I speed up
> this query?
>
> Thank you very much in advance,
> Luke
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

D. Richard Hipp
d...@hwaci.com



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] 600ms for simple query: How to optimize it?

2009-01-07 Thread Lukas Haase
Hello,

Can somebody tell me why this (simple) query take so much time? This 
query does nothing more than querying a table and JOINing two other 
tables together.

SELECT
ti1.topicID AS topicID,
ti2.topic_textID AS parent,
n.level,
n.level_order
FROM navigation AS n
LEFT JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID
LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID
WHERE ti1.topic_textID = 'X';

I thought I optimized the table good with indexes but one such a query 
takes 500 to 1000ms in my C++ program.

Here are my table definitions and the indexes (unfortunately I need the 
VARCHAR(20) field because I get the "topicID" only as text:

CREATE TABLE topic_ids(
topicID INTEGER,
topic_textID VARCHAR(20),
PRIMARY KEY(topicID)
);
CREATE INDEX topic_textID ON topic_ids(topic_textID);

CREATE TABLE navigation(
topicID INTEGER PRIMARY KEY,
parent_topicID INTEGER,
level VARCHAR(20),
level_order INTEGER
);
CREATE INDEX parent_topicID ON navigation(parent_topicID);
CREATE INDEX level ON navigation(level);
CREATE INDEX level_order ON navigation(level_order);

I need to execute this query in a database application each time a new 
page is opened. So 500ms are really too much. A few ms would be great.

And the tables itself are not really huge:

SELECT COUNT(*) FROM navigation;
19469
SELECT COUNT(*) FROM topic_ids;
19469

Does anybody have an idea what's going wrong here? How can I speed up 
this query?

Thank you very much in advance,
Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users