Re: [sqlite] 600ms for simple query: How to optimize it?
On Sat, Jan 10, 2009 at 7:58 AM, Lukas Haasewrote: > D. Richard Hipp schrieb: >> On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote: >>> SELECT t.topic, t.length >>> FROM printgroup AS pg1 >>> LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup >>> LEFT JOIN topics AS t ON t.topicID = pg2.topicID >>> LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID >>> WHERE ti.topic_textID = '' >>> ORDER BY pg2.topicID ASC; >> >> You seem very fond of using LEFT JOINs in places where they do not >> make good sense. > > Yes, I started with mySQL 3 many years ago. At the beginning I only knew > about LEFT JOINs and used them. Now I think I also know the other types > of JOINs but I still use LEFT JOINs very often, just by habit. And with > mySQL I never had performance problems with them. > >> What is it that you think a LEFT JOIN does? > > (A LEFT JOIN B) joins together table A and B while all records are taken > from A and only records that match both are takes from B. If a record > from A has no corresponding data in B, the values are NULL. > >> How is >> a LEFT JOIN different than an ordinary inner JOIN? > > INNER JOIN takes *all* records from both tables, A and B. Generally, the > resultset will be larger. all the rows from both tables A and B *that match* the join condition... in other words, unlike a LEFT (or a RIGHT) JOIN, which would include even those rows where only A (or B) match but show a NULL value for the B (or A) table, an INNER JOIN, aka, just JOIN, would usually have a smaller result set. Unless and until both tables provided a match (and stuffing NULL is not a match), a row would not be included in the result set. > >> I ask because I >> suspect that your answer will reveal misconceptions about LEFT JOINs >> which, when rectified, will cause most of your performance issues to >> go away. > > Maybe my I think too much in "left joining" but I did not know that > there is so much difference in performance. > > Best Regards, > Luke > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Puneet Kishor http://www.punkish.org/ Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/ Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
On Sat, Jan 10, 2009 at 02:58:57PM +0100, Lukas Haase wrote: > > What is it that you think a LEFT JOIN does? > > (A LEFT JOIN B) joins together table A and B while all records are taken > from A and only records that match both are takes from B. If a record > from A has no corresponding data in B, the values are NULL. That's what a LEFT JOIN is, but why do you think you need it? > > How is > > a LEFT JOIN different than an ordinary inner JOIN? > > INNER JOIN takes *all* records from both tables, A and B. Generally, the > resultset will be larger. The result of an INNER JOIN will be smaller than or equal to that of a LEFT JOIN since rows from A that can't be joined to any rows from B don't appear in the result of the INNER JOIN but do appear in the LEFT JOIN. Were you thinking of cross joins? Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
When you have an inner join like this: SELECT * FROM a JOIN b ON a.x=b.y WHERE b.z=123; then the SQL engine is free to interchange the order of the tables in the join. For example, the join might be implemented as: SELECT * FROM b JOIN a ON a.x=b.y WHERE b.z=123 Giving the SQL engine the freedom to interchange tables will often create significant performance improvements. For example, if there are indexes on b.z and a.x, then by interchanging the two tables, the join can be satisfied by first looking up entries where b.z=123 then finding corresponding b.y values and using them to look up entries matching a.x=b.y. Note that the tables can only be interchanged if you use a inner JOIN. Change the order of tables in a LEFT JOIN creates a different answer. So when you use LEFT JOIN, that forces a particular ordering of tables, and greatly restricts the query engines opportunities to optimize. So you should avoid using LEFT JOIN if you don't really need it. In the example above, because the tables can be reordered, the query will run in O(logN). But if you us a LEFT JOIN forcing the original a- before-b order, the runtime will be O(N*N). Quite a bit slower. And in the example above, the LEFT JOIN is not really needed. Because of the "b.z=123" test in the WHERE clause, the terms of the LEFT JOIN where table b is NULL will never make it to the output. So you will get the same result using either an inner JOIN or a LEFT JOIN. So since an inner JOIN gives the query engine more optimization opportunities, you might as well use it. On Jan 10, 2009, at 8:58 AM, Lukas Haase wrote: > D. Richard Hipp schrieb: >> On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote: >>> SELECT t.topic, t.length >>> FROM printgroup AS pg1 >>> LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup >>> LEFT JOIN topics AS t ON t.topicID = pg2.topicID >>> LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID >>> WHERE ti.topic_textID = '' >>> ORDER BY pg2.topicID ASC; >> >> You seem very fond of using LEFT JOINs in places where they do not >> make good sense. > > Yes, I started with mySQL 3 many years ago. At the beginning I only > knew > about LEFT JOINs and used them. Now I think I also know the other > types > of JOINs but I still use LEFT JOINs very often, just by habit. And > with > mySQL I never had performance problems with them. > >> What is it that you think a LEFT JOIN does? > > (A LEFT JOIN B) joins together table A and B while all records are > taken > from A and only records that match both are takes from B. If a record > from A has no corresponding data in B, the values are NULL. > >> How is >> a LEFT JOIN different than an ordinary inner JOIN? > > INNER JOIN takes *all* records from both tables, A and B. Generally, > the > resultset will be larger. > >> I ask because I >> suspect that your answer will reveal misconceptions about LEFT JOINs >> which, when rectified, will cause most of your performance issues to >> go away. > > Maybe my I think too much in "left joining" but I did not know that > there is so much difference in performance. > > Best Regards, > Luke > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users D. Richard Hipp d...@hwaci.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
Kees Nuyt schrieb: > On Fri, 09 Jan 2009 21:16:03 +0100, Lukas Haase >wrote in General Discussion of SQLite > Database : > >> Hello Richard! >> >> Thank you very much!! It works! :-) >> >> >> Indeed. 0-10 milliseconds instead of 500-800 :-) >> >> But may you tell me why this works and where you have this information? >> I know the O-notation but I do not know /why/ this boosts down to log(n)... > > Use EXPLAIN SELECT . > to see the virtual machine instructions > and EXPLAIN SELECT QUERY PLAN . > to see which index is used. > > http://www.sqlite.org/lang_explain.html > > Each JOIN is implemented as nested loops. The virtual > machine code can tell a lot about what part of the database > has to be scanned. Thank you, I know this and I did try it already. But unfortunately I do not know how to interpret the results. Best regards, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
D. Richard Hipp schrieb: > On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote: >> SELECT t.topic, t.length >> FROM printgroup AS pg1 >> LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup >> LEFT JOIN topics AS t ON t.topicID = pg2.topicID >> LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID >> WHERE ti.topic_textID = '' >> ORDER BY pg2.topicID ASC; > > You seem very fond of using LEFT JOINs in places where they do not > make good sense. Yes, I started with mySQL 3 many years ago. At the beginning I only knew about LEFT JOINs and used them. Now I think I also know the other types of JOINs but I still use LEFT JOINs very often, just by habit. And with mySQL I never had performance problems with them. > What is it that you think a LEFT JOIN does? (A LEFT JOIN B) joins together table A and B while all records are taken from A and only records that match both are takes from B. If a record from A has no corresponding data in B, the values are NULL. > How is > a LEFT JOIN different than an ordinary inner JOIN? INNER JOIN takes *all* records from both tables, A and B. Generally, the resultset will be larger. > I ask because I > suspect that your answer will reveal misconceptions about LEFT JOINs > which, when rectified, will cause most of your performance issues to > go away. Maybe my I think too much in "left joining" but I did not know that there is so much difference in performance. Best Regards, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
On Jan 9, 2009, at 3:16 PM, Lukas Haase wrote: > > SELECT t.topic, t.length > FROM printgroup AS pg1 > LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup > LEFT JOIN topics AS t ON t.topicID = pg2.topicID > LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID > WHERE ti.topic_textID = '' > ORDER BY pg2.topicID ASC; > You seem very fond of using LEFT JOINs in places where they do not make good sense. What is it that you think a LEFT JOIN does? How is a LEFT JOIN different than an ordinary inner JOIN? I ask because I suspect that your answer will reveal misconceptions about LEFT JOINs which, when rectified, will cause most of your performance issues to go away. D. Richard Hipp d...@hwaci.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
On Fri, 09 Jan 2009 21:16:03 +0100, Lukas Haasewrote in General Discussion of SQLite Database : >Hello Richard! > >Thank you very much!! It works! :-) > > >Indeed. 0-10 milliseconds instead of 500-800 :-) > >But may you tell me why this works and where you have this information? >I know the O-notation but I do not know /why/ this boosts down to log(n)... Use EXPLAIN SELECT . to see the virtual machine instructions and EXPLAIN SELECT QUERY PLAN . to see which index is used. http://www.sqlite.org/lang_explain.html Each JOIN is implemented as nested loops. The virtual machine code can tell a lot about what part of the database has to be scanned. >I have other queries which worry me. But that trick did not help in >these cases :-( > [...] > >Thank you again and best regards, >Luke -- ( Kees Nuyt ) c[_] ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
Hello Richard! Thank you very much!! It works! :-) D. Richard Hipp schrieb: > On Jan 7, 2009, at 6:11 PM, Lukas Haase wrote: > >> Hello, >> >> Can somebody tell me why this (simple) query take so much time? This >> query does nothing more than querying a table and JOINing two other >> tables together. >> >> SELECT >> ti1.topicID AS topicID, >> ti2.topic_textID AS parent, >> n.level, >> n.level_order >> FROM navigation AS n >> LEFT JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID >> LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID >> WHERE ti1.topic_textID = 'X'; > > SQLite should be running this query in O(NlogN). > > If you change the first LEFT JOIN to a plain old JOIN (which should > give equivalent results by virtue of the WHERE clause restricting > ti1.topic_textID to not be NULL) then it should run in O(logN) - much > faster. Try it and let me know. Indeed. 0-10 milliseconds instead of 500-800 :-) But may you tell me why this works and where you have this information? I know the O-notation but I do not know /why/ this boosts down to log(n)... I have other queries which worry me. But that trick did not help in these cases :-( Especially I have problems with a self-join. In a table I have defined groups of elements ("printgroup"): CREATE TABLE printgroup( topicID INTEGER, printgroup INTEGER, PRIMARY KEY(topicID, printgroup) ); I think these indices are not necessary because both fields are primary keys anyway. CREATE INDEX topicID ON printgroup(topicID); CREATE INDEX pprintgroup ON printgroup(printgroup); When I know one element of a group (given by topicID) I want to find all other elements in the same group: SELECT t.topic, t.length FROM printgroup AS pg1 LEFT JOIN printgroup AS pg2 ON pg1.printgroup = pg2.printgroup LEFT JOIN topics AS t ON t.topicID = pg2.topicID LEFT JOIN topic_ids AS ti ON ti.topicID = pg1.topicID WHERE ti.topic_textID = '' ORDER BY pg2.topicID ASC; The table "topics" just contains the actual data for each topicID (t.topic with length t.length). This query takes a few seconds (und to minutes) with "sqlite3.exe" and even much longer in my application (sqlite with CppSQlite3): Up to 15 minutes! Mimicking your magic above I tried to leave out the "LEFT" in the self-joins but it did not change anything :-( And unfortunately, the optimization FAQ [1] is very incomplete, at least at the interesting points (indices) :-( Thank you again and best regards, Luke [1] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
Lukas Haasewrites: > > Hello, > > Can somebody tell me why this (simple) query take so much time? This > query does nothing more than querying a table and JOINing two other > tables together. > ... snip ... > > Does anybody have an idea what's going wrong here? How can I speed up > this query? > > Thank you very much in advance, > Luke Silly question I know - presume this time does not include the prepare (which should be done in advance) ? MikeW ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 600ms for simple query: How to optimize it?
On Jan 7, 2009, at 6:11 PM, Lukas Haase wrote: > Hello, > > Can somebody tell me why this (simple) query take so much time? This > query does nothing more than querying a table and JOINing two other > tables together. > > SELECT > ti1.topicID AS topicID, > ti2.topic_textID AS parent, > n.level, > n.level_order > FROM navigation AS n > LEFT JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID > LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID > WHERE ti1.topic_textID = 'X'; SQLite should be running this query in O(NlogN). If you change the first LEFT JOIN to a plain old JOIN (which should give equivalent results by virtue of the WHERE clause restricting ti1.topic_textID to not be NULL) then it should run in O(logN) - much faster. Try it and let me know. SELECT ti1.topicID AS topicID, ti2.topic_textID AS parent, n.level, n.level_order FROM navigation AS n JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID WHERE ti1.topic_textID = 'X'; > > > I thought I optimized the table good with indexes but one such a query > takes 500 to 1000ms in my C++ program. > > Here are my table definitions and the indexes (unfortunately I need > the > VARCHAR(20) field because I get the "topicID" only as text: > > CREATE TABLE topic_ids( > topicID INTEGER, > topic_textID VARCHAR(20), > PRIMARY KEY(topicID) > ); > CREATE INDEX topic_textID ON topic_ids(topic_textID); > > CREATE TABLE navigation( > topicID INTEGER PRIMARY KEY, > parent_topicID INTEGER, > level VARCHAR(20), > level_order INTEGER > ); > CREATE INDEX parent_topicID ON navigation(parent_topicID); > CREATE INDEX level ON navigation(level); > CREATE INDEX level_order ON navigation(level_order); > > I need to execute this query in a database application each time a new > page is opened. So 500ms are really too much. A few ms would be great. > > And the tables itself are not really huge: > > SELECT COUNT(*) FROM navigation; > 19469 > SELECT COUNT(*) FROM topic_ids; > 19469 > > Does anybody have an idea what's going wrong here? How can I speed up > this query? > > Thank you very much in advance, > Luke > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users D. Richard Hipp d...@hwaci.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] 600ms for simple query: How to optimize it?
Hello, Can somebody tell me why this (simple) query take so much time? This query does nothing more than querying a table and JOINing two other tables together. SELECT ti1.topicID AS topicID, ti2.topic_textID AS parent, n.level, n.level_order FROM navigation AS n LEFT JOIN topic_ids AS ti1 ON ti1.topicID = n.topicID LEFT JOIN topic_ids AS ti2 ON ti2.topicID = n.parent_topicID WHERE ti1.topic_textID = 'X'; I thought I optimized the table good with indexes but one such a query takes 500 to 1000ms in my C++ program. Here are my table definitions and the indexes (unfortunately I need the VARCHAR(20) field because I get the "topicID" only as text: CREATE TABLE topic_ids( topicID INTEGER, topic_textID VARCHAR(20), PRIMARY KEY(topicID) ); CREATE INDEX topic_textID ON topic_ids(topic_textID); CREATE TABLE navigation( topicID INTEGER PRIMARY KEY, parent_topicID INTEGER, level VARCHAR(20), level_order INTEGER ); CREATE INDEX parent_topicID ON navigation(parent_topicID); CREATE INDEX level ON navigation(level); CREATE INDEX level_order ON navigation(level_order); I need to execute this query in a database application each time a new page is opened. So 500ms are really too much. A few ms would be great. And the tables itself are not really huge: SELECT COUNT(*) FROM navigation; 19469 SELECT COUNT(*) FROM topic_ids; 19469 Does anybody have an idea what's going wrong here? How can I speed up this query? Thank you very much in advance, Luke ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users