Re: [sqlite] Slow JOIN on ROWID
On Fri, Oct 14, 2011 at 12:46 PM, Fabian wrote: > 2011/10/14 Simon Slavin > > > > So, OP, try issuing these commands > > > > DROP TABLE sqlite_stat1; > > DROP TABLE sqlite_stat2; > > > > then see whether anything improves. > > > > > I already tried that, but it returned: > > Error: table sqlite_stat1 may not be dropped > If you build from the Fossil trunk, DROP TABLE sqite_statN will work. Or, you can do this: DELETE FROM sqlite_stat1; DELETE FROM sqlite_stat2; ANALYZE sqlite_master; The extra ANALYZE is necessary to cause SQLite to reload the tables. Or, instead of doing the extra ANALYZE, you can close and reopen the database connection. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 10/14/2011 11:23 PM, Simon Slavin wrote: On 14 Oct 2011, at 5:12pm, Fabian wrote: Is this the expected output? If so, ANALYZE was to blame. The query containing 'ORDER BY rowid DESC' is still slower than the one which doesn't specify any order, but the results are closer to eachother now. Iterating backwards is often a bit slower than iterating forwards. Especially with a database that not very fragmented. So that should never happen, right ? ANALYZE is meant to make things faster, not slower. So is that an actual fixable bug or is it one of those extremely unlikely situations that is hard to fix ? I'd agree it's a subtle bug. Since "SELECT WHERE data=10 ORDER BY rowid" is in practice the same query as "WHERE data=10", it's difficult to think of a reason why ANALYZE data should cause SQLite to change the query plan for one of them and not the other. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Simon Slavin > > So that should never happen, right ? ANALYZE is meant to make things > faster, not slower. So is that an actual fixable bug or is it one of those > extremely unlikely situations that is hard to fix ? > > So, OP, try issuing these commands > > DROP TABLE sqlite_stat1; > DROP TABLE sqlite_stat2; > > then see whether anything improves. > > I already tried that, but it returned: Error: table sqlite_stat1 may not be dropped ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 14 Oct 2011, at 5:12pm, Fabian wrote: > Is this the expected output? If so, ANALYZE was to blame. The query > containing 'ORDER BY rowid DESC' is still slower than the one which doesn't > specify any order, but the results are closer to eachother now. So that should never happen, right ? ANALYZE is meant to make things faster, not slower. So is that an actual fixable bug or is it one of those extremely unlikely situations that is hard to fix ? So, OP, try issuing these commands DROP TABLE sqlite_stat1; DROP TABLE sqlite_stat2; then see whether anything improves. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
> > Do you have an sqlite_stat1 table in the database (created by > running ANALYZE)? What is the output of the shell command > How about the contents of the "sqlite_stat1" table? What does > the shell command ".dump sqlite_stat1" show? > > This is the output with a fresh database, where ANALYZE hasn't been run: EXPLAIN SELECT * FROM table1 WHERE data=1 LIMIT 250 OFFSET 5; 0|Trace|0|0|0||00| 1|Integer|250|1|0||00| 2|Integer|5|2|0||00| 3|MustBeInt|2|0|0||00| 4|IfPos|2|6|0||00| 5|Integer|0|2|0||00| 6|Add|1|2|3||00| 7|IfPos|1|9|0||00| 8|Integer|-1|3|0||00| 9|Integer|1|4|0||00| 10|Goto|0|39|0||00| 11|OpenRead|0|2|0|13|00| 12|OpenRead|1|31079|0|keyinfo(1,BINARY)|00| 13|SeekGe|1|36|4|1|00| 14|IdxGE|1|36|4|1|01| 15|IdxRowid|1|5|0||00| 16|Seek|0|5|0||00| 17|AddImm|2|-1|0||00| 18|IfNeg|2|20|0||00| 19|Goto|0|35|0||00| 20|IdxRowid|1|6|0||00| 21|Column|1|0|7||00| 22|Column|0|2|8||00| 23|Column|0|3|9||00| 24|Column|0|4|10||00| 25|Column|0|5|11||00| 26|Column|0|6|12||00| 27|Column|0|7|13||00| 28|Column|0|8|14||00| 29|Column|0|9|15||00| 30|Column|0|10|16||00| 31|Column|0|11|17||00| 32|Column|0|12|18||00| 33|ResultRow|6|13|0||00| 34|IfZero|1|36|-1||00| 35|Next|1|14|0||00| 36|Close|0|0|0||00| 37|Close|1|0|0||00| 38|Halt|0|0|0||00| 39|Transaction|0|0|0||00| 40|VerifyCookie|0|10|0||00| 41|TableLock|0|2|0|table1|00| 42|Goto|0|11|0||00| SELECT * FROM table1 WHERE data=1 ORDER BY rowid DESC LIMIT 250 OFFSET 5; 0|Trace|0|0|0||00| 1|Noop|0|0|0||00| 2|Integer|250|1|0||00| 3|Integer|5|2|0||00| 4|MustBeInt|2|0|0||00| 5|IfPos|2|7|0||00| 6|Integer|0|2|0||00| 7|Add|1|2|3||00| 8|IfPos|1|10|0||00| 9|Integer|-1|3|0||00| 10|Integer|1|4|0||00| 11|Goto|0|40|0||00| 12|OpenRead|0|2|0|13|00| 13|OpenRead|2|31079|0|keyinfo(1,BINARY)|00| 14|SeekLe|2|37|4|1|00| 15|IdxLT|2|37|4|1|00| 16|IdxRowid|2|5|0||00| 17|Seek|0|5|0||00| 18|AddImm|2|-1|0||00| 19|IfNeg|2|21|0||00| 20|Goto|0|36|0||00| 21|IdxRowid|2|6|0||00| 22|Column|2|0|7||00| 23|Column|0|2|8||00| 24|Column|0|3|9||00| 25|Column|0|4|10||00| 26|Column|0|5|11||00| 27|Column|0|6|12||00| 28|Column|0|7|13||00| 29|Column|0|8|14||00| 30|Column|0|9|15||00| 31|Column|0|10|16||00| 32|Column|0|11|17||00| 33|Column|0|12|18||00| 34|ResultRow|6|13|0||00| 35|IfZero|1|37|-1||00| 36|Prev|2|15|0||00| 37|Close|0|0|0||00| 38|Close|2|0|0||00| 39|Halt|0|0|0||00| 40|Transaction|0|0|0||00| 41|VerifyCookie|0|10|0||00| 42|TableLock|0|2|0|table1|00| 43|Goto|0|12|0||00| Is this the expected output? If so, ANALYZE was to blame. The query containing 'ORDER BY rowid DESC' is still slower than the one which doesn't specify any order, but the results are closer to eachother now. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Dan Kennedy > > Your EXPLAIN output shows that it is doing a linear scan of > table1. Which is different from what I get here with the same > schema and query. When I run them here, both queries (with and > without the "ORDER BY rowid") use the same query plan. > > Do you have an sqlite_stat1 table in the database (created by > running ANALYZE)? What is the output of the shell command > How about the contents of the "sqlite_stat1" table? What does > the shell command ".dump sqlite_stat1" show? You are right, I issues an ANALYSE a couple of days ago, I completely forgot about that. Maybe that's causing a negative impact on the performance. sqlite> .dump sqlite_stat1 PRAGMA foreign_keys=OFF; BEGIN TRANSACTION; ANALYZE sqlite_master; INSERT INTO "sqlite_stat1" VALUES('table1','ididx','463923 66275'); COMMIT; ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 10/14/2011 10:13 PM, Fabian wrote: 2011/10/14 Dan Kennedy Good question. Can you enter the following commands into the shell tool and post the complete output (no "QUERY PLAN" this time): .version .schema .explain EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET 50; EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50; sqlite> .version SQLite 3.7.8 2011-09-19 14:49:19 3e0da808d2f5b4d12046e05980ca04578f581177 sqlite> .schema CREATE TABLE table1(id INTEGER PRIMARY KEY, data INT); CREATE INDEX ididx ON table1(data); sqlite> .explain sqlite> EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET 50; Your EXPLAIN output shows that it is doing a linear scan of table1. Which is different from what I get here with the same schema and query. When I run them here, both queries (with and without the "ORDER BY rowid") use the same query plan. Do you have an sqlite_stat1 table in the database (created by running ANALYZE)? What is the output of the shell command How about the contents of the "sqlite_stat1" table? What does the shell command ".dump sqlite_stat1" show? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Dan Kennedy > > Good question. Can you enter the following commands into > the shell tool and post the complete output (no "QUERY PLAN" > this time): > > .version > .schema > .explain > > EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET > 50; > > EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50; > > sqlite> .version SQLite 3.7.8 2011-09-19 14:49:19 3e0da808d2f5b4d12046e05980ca04578f581177 sqlite> .schema CREATE TABLE table1(id INTEGER PRIMARY KEY, data INT); CREATE INDEX ididx ON table1(data); sqlite> .explain sqlite> EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET 50; addr opcode p1p2p3p4 p5 comment - - -- - 0 Trace 0 0 000 1 Noop 0 0 000 2 Integer250 1 000 3 Integer5 2 000 4 MustBeInt 2 0 000 5 IfPos 2 7 000 6 Integer0 2 000 7 Add1 2 300 8 IfPos 1 10000 9 Integer-13 000 10Integer1 4 000 11Goto 0 36000 12OpenRead 0 2 0 12 00 13Rewind 0 34000 14Column 0 1 500 15Ne 4 335 collseq(BINARY) 6c 16AddImm 2 -1000 17IfNeg 2 19000 18Goto 0 33000 19Rowid 0 7 000 20Column 0 1 800 21Column 0 2 900 22Column 0 3 10 00 23Column 0 4 11 00 24Column 0 5 12 00 25Column 0 6 13 00 26Column 0 7 14 00 27Column 0 8 15 00 28Column 0 9 16 00 29Column 0 1017 00 30Column 0 1118 00 31ResultRow 7 12000 32IfZero 1 34-1 00 33Next 0 14001 34Close 0 0 000 35Halt 0 0 000 36Transaction0 0 000 37VerifyCookie 0 10000 38TableLock 0 2 0 table1 00 39Goto 0 12000 EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50; addr opcode p1p2p3p4 p5 comment - - -- --- 0 Trace 0 0 000 1 Integer250 1 000 2 Integer5 2 000 3 MustBeInt 2 0 000 4 IfPos 2 6 000 5 Integer0 2 000 6 Add1 2 300 7 IfPos 1 9 000 8 Integer-13 000 9 Integer1 4 000 10Goto 0 38000 11OpenRead 0 2 0 12 00 12OpenRead 1 123405 0 keyinfo(1,BINARY) 00 13SeekGe 1 354 1 00 14IdxGE 1 354 1 01 15IdxRowid 1 5 000 16Seek 0 5 000 17AddImm 2 -1000 18IfNeg 2 20000 19Goto 0 34000 20IdxRowid 1 6 000 21Column 1 0 700 22Column 0 2 800 23Column 0 3 900 24Column 0 4 10 00 25Column 0 5 11 00 26Column 0 6 12 00 27Column 0 7 13 00 28Column 0 8 14 00 29Column 0 9 15
Re: [sqlite] Slow JOIN on ROWID
On 10/14/2011 09:34 PM, Fabian wrote: 2011/10/14 Dan Kennedy If SQLite cannot use an index to for an ORDER BY in a SELECT query and has to do an external sort, the EXPLAIN QUERY PLAN output will have something like this in it: 0|0|0|USE TEMP B-TREE FOR ORDER BY Since my EXPLAIN does not show this, what else could be the cause for the large decrease in performance when ordering by rowid? Good question. Can you enter the following commands into the shell tool and post the complete output (no "QUERY PLAN" this time): .version .schema .explain EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET 50; EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50; ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 14 Oct 2011, at 2:59pm, Dan Kennedy wrote: > On 10/14/2011 07:40 PM, Simon Slavin wrote: >> >> I'm sorry, I completely missed the 'data = 10' earlier. If you have an >> index on the data column then that's the index SQLite would used for that >> query. Once it has picked that index it no longer has access to the rowid >> index. >> >> If you want an index that would be ideal for >> >> SELECT * FROM table WHERE data = 10 ORDER BY rowid LIMIT 250 OFFSET 50 >> >> it would be one index on the two columns together: >> >> CREATE INDEX tdr ON table (data, rowid) > > This is a logical conclusion to draw, but it is not actually > necessary. All indexes created by SQLite implicitly have the > rowid as the final column. So all you really need for the > query above is: > > CREATE INDEX tdr ON table1(data); I understand your reasoning, but what the OP reports suggests that SQLite is not working this way. > Note that this: > > CREATE INDEX tdr ON table1(data, someothercolumn); > > would not work quite as well. Since the index would not help > with the ORDER BY. The precise index I recommended was specially picked for the precise SELECT I mentioned. The 'othercolumn' is, in fact, the column that the OP wanted in his 'ORDER BY'. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Dan Kennedy > > If SQLite cannot use an index to for an ORDER BY in a SELECT > query and has to do an external sort, the EXPLAIN QUERY PLAN > output will have something like this in it: > > 0|0|0|USE TEMP B-TREE FOR ORDER BY > > Since my EXPLAIN does not show this, what else could be the cause for the large decrease in performance when ordering by rowid? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 10/14/2011 07:40 PM, Simon Slavin wrote: On 14 Oct 2011, at 1:36pm, David Bicking wrote: On 10/14/2011 06:39 AM, Fabian wrote: When I execute: SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50 It's very fast, but it's get much slower (10 times) when I add an ORDER BY clause, like rowid ASC or rowid DESC. I'm trying to understand why this is. It seems like SQLite is actually performing an actual sort behind the scenes, while I expected it to just iterate in reverse order (because rowid is always incremental), which should give comparable performance as the first query? What does EXPLAIN QUERY PLAN say? I believe Sqlite can only use one index per table. I'm sorry, I completely missed the 'data = 10' earlier. If you have an index on the data column then that's the index SQLite would used for that query. Once it has picked that index it no longer has access to the rowid index. If you want an index that would be ideal for SELECT * FROM table WHERE data = 10 ORDER BY rowid LIMIT 250 OFFSET 50 it would be one index on the two columns together: CREATE INDEX tdr ON table (data, rowid) This is a logical conclusion to draw, but it is not actually necessary. All indexes created by SQLite implicitly have the rowid as the final column. So all you really need for the query above is: CREATE INDEX tdr ON table1(data); Note that this: CREATE INDEX tdr ON table1(data, someothercolumn); would not work quite as well. Since the index would not help with the ORDER BY. If SQLite cannot use an index to for an ORDER BY in a SELECT query and has to do an external sort, the EXPLAIN QUERY PLAN output will have something like this in it: 0|0|0|USE TEMP B-TREE FOR ORDER BY Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On Oct 14, 2011, at 3:37 PM, Fabian wrote: > 2011/10/14 Petite Abeille > >> >> Hurray! Now you must have the finest query ever to grace the intraweb! A >> true work of beauty :)) >> >> > Here in Belgium we have a saying: Who doesnt honor the petite, is not worth > the big ;) "The details are not the details, the details make the product." -- Charles Eames ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Petite Abeille > > Hurray! Now you must have the finest query ever to grace the intraweb! A > true work of beauty :)) > > Here in Belgium we have a saying: Who doesnt honor the petite, is not worth the big ;) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On Oct 14, 2011, at 2:49 PM, Fabian wrote: > That explains everything! Hurray! Now you must have the finest query ever to grace the intraweb! A true work of beauty :)) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 14 Oct 2011, at 1:49pm, Fabian wrote: > 2011/10/14 Simon Slavin > >> If you have an index on the data column then that's the index SQLite would >> used for that query. Once it has picked that index it no longer has access >> to the rowid index. >> >> CREATE INDEX tdr ON table (data, rowid) > > Thanks! That explains everything! Can I remove the index for 'data' if I > create an index like that, or do I need to keep them both? You can remove the index on the single column. SQLite is smart enough to use the compound index even if it needs only the columns at the beginning of it. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Simon Slavin > > If you have an index on the data column then that's the index SQLite would > used for that query. Once it has picked that index it no longer has access > to the rowid index. > > CREATE INDEX tdr ON table (data, rowid) > > Thanks! That explains everything! Can I remove the index for 'data' if I create an index like that, or do I need to keep them both? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Petite Abeille > > Much? Really? I get the broadly same execution time for either variant: > > 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~2192503 rows) > > 0|0|0|SCAN TABLE mail_header (~2192503 rows) > > I get SELECT mail_header.rowid FROM mail_header WHERE data = 1 ORDER BY mail_header.rowid DESC LIMIT 250 OFFSET 142750 0 0 0 SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~46392 rows) VS SELECT mail_header.rowid FROM mail_header WHERE data = 1 LIMIT 250 OFFSET 142750 0 0 0 SEARCH TABLE mail_header USING INDEX ididx (id=?) (~66275 rows) So you seem to get the same speed in both queries, because you have no index on the column in the WHERE clause, making my EXPLAIN differs from yours? But when the column is indexed, the 'ORDER BY' makes the query much slower. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 14 Oct 2011, at 1:36pm, David Bicking wrote: > On 10/14/2011 06:39 AM, Fabian wrote: >> When I execute: >> >> SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50 >> >> It's very fast, but it's get much slower (10 times) when I add an ORDER BY >> clause, like rowid ASC or rowid DESC. >> >> I'm trying to understand why this is. It seems like SQLite is actually >> performing an actual sort behind the scenes, while I expected it to just >> iterate in reverse order (because rowid is always incremental), which should >> give comparable performance as the first query? > > What does EXPLAIN QUERY PLAN say? I believe Sqlite can only use one index per > table. I'm sorry, I completely missed the 'data = 10' earlier. If you have an index on the data column then that's the index SQLite would used for that query. Once it has picked that index it no longer has access to the rowid index. If you want an index that would be ideal for SELECT * FROM table WHERE data = 10 ORDER BY rowid LIMIT 250 OFFSET 50 it would be one index on the two columns together: CREATE INDEX tdr ON table (data, rowid) Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 10/14/2011 06:39 AM, Fabian wrote: Exactly. I still don't have optimal performance in the query (although it's much better now), and it seems to be related to ORDER BY. When I execute: SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50 It's very fast, but it's get much slower (10 times) when I add an ORDER BY clause, like rowid ASC or rowid DESC. I'm trying to understand why this is. It seems like SQLite is actually performing an actual sort behind the scenes, while I expected it to just iterate in reverse order (because rowid is always incremental), which should give comparable performance as the first query? What does EXPLAIN QUERY PLAN say? I believe Sqlite can only use one index per table. If you have an index on the data field, it is probably using that index to quickly get the data=10 condition, in which case it can't use the rowid index, and thus has to sort behind the scenes as you say. If it is using an index to find the data, I believe you can do +data = 10, which will invalidate the index use on data. (Hope I remember that right..) David ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On Oct 14, 2011, at 12:39 PM, Fabian wrote: > I still don't have optimal performance in the query (although it's much > better now), and it seems to be related to ORDER BY. Yes, order by has a cost. > When I execute: > > SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50 > > It's very fast, but it's get much slower (10 times) when I add an ORDER BY > clause, like rowid ASC or rowid DESC. Much? Really? I get the broadly same execution time for either variant: explain query plan selectmail_header.id from mail_header order by mail_header.id limit 250 offset5; 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~2192503 rows) CPU Time: user 0.006068 sys 0.000665 explain query plan selectmail_header.id from mail_header limit 250 offset5; 0|0|0|SCAN TABLE mail_header (~2192503 rows) CPU Time: user 0.005792 sys 0.000655 Note that the query with the order by will use the internal pk index, while the one without order will simply scan the table itself. > I'm trying to understand why this is. It seems like SQLite is actually > performing an actual sort behind the scenes, Yes. > while I expected it to just > iterate in reverse order (because rowid is always incremental), which should > give comparable performance as the first query? No order by = random order. Try PRAGMA reverse_unordered_selects = boolean;. For example, while rowid tend to increment monotonically, they can be reused, e.g. after a delete. You might want to read on rowid and autoincrement: http://www.sqlite.org/autoinc.html ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On 14 Oct 2011, at 11:39am, Fabian wrote: > I still don't have optimal performance in the query (although it's much > better now), and it seems to be related to ORDER BY. > > When I execute: > > SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50 > > It's very fast, but it's get much slower (10 times) 10 times slower mean nothing if the original query is extremely fast. Do you have times in milliseconds or something ? > when I add an ORDER BY > clause, like rowid ASC or rowid DESC. Just for laughs try making your own index on whatever value you think it's using for rowid. In other words, if you have a column in that table declared as partNumber INTEGER PRIMARY KEY declare an index on that column, even though theoretically SQLite shouldn't need it. > I'm trying to understand why this is. It seems like SQLite is actually > performing an actual sort behind the scenes, while I expected it to just > iterate in reverse order (because rowid is always incremental), which should > give comparable performance as the first query? I don't know enough about how SQLite works to guess that, but I do think there's something funny about what you report. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
> > > Exactly. > > I still don't have optimal performance in the query (although it's much better now), and it seems to be related to ORDER BY. When I execute: SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50 It's very fast, but it's get much slower (10 times) when I add an ORDER BY clause, like rowid ASC or rowid DESC. I'm trying to understand why this is. It seems like SQLite is actually performing an actual sort behind the scenes, while I expected it to just iterate in reverse order (because rowid is always incremental), which should give comparable performance as the first query? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On Oct 13, 2011, at 10:57 PM, Fabian wrote: > Thank you very much! This approach solved the problem. However, in my > situation I need to select a lot more columns than just 'id' from > 'mail_header', Feel free to select all the relevant columns from the inner query. > and when I look at the resulting query it appears it is > selecting all those columns twice. Well, if you are explicit, you are specifying them twice, once in the inner query, and once at the top level. If that bothers you, you could use '*' in the inner part. Even though this is not really advisable. Better to be explicit, at the cost of a bit more verbosity. > Would it be smart to change the query so that the inner loop only selects > mail_header.id, and adding a third join that fetches the extra columns from > 'mail_header'? No, that would defeat the purpose of the exercise, which is to minimize IO. > Or would the performance penalty from adding a third join > out-weight the advantage of selecting less columns in the inner loop? Exactly. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
2011/10/12 Petite Abeille > > Now the join is performed only 250 times, adding just a small overhead > compare the the bare bone query without the join. > > The short of it: minimize the amount of work upfront :) > > Thank you very much! This approach solved the problem. However, in my situation I need to select a lot more columns than just 'id' from 'mail_header', and when I look at the resulting query it appears it is selecting all those columns twice. Would it be smart to change the query so that the inner loop only selects mail_header.id, and adding a third join that fetches the extra columns from 'mail_header'? Or would the performance penalty from adding a third join out-weight the advantage of selecting less columns in the inner loop? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow JOIN on ROWID
On Oct 12, 2011, at 5:16 PM, Fabian wrote: > Why is this very fast (20 ms): > 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows) > And this very slow (3500ms): > 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows) > 0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows) The issue here is that offset works in term of your entire query. So for each row in table1 matching your where close, it's first going to do a join to table2, order the entire result set, skip the first half-a-million rows in the result set and then return the remaining 250 rows. A rather expensive proposition. Here is an example using two table: mail_header [1] and mail_header_text, a FTS table [2]. It's a one-to-one relationship. (0) Querying the count selectcount( * ) from mail_header where mail_header.header_id = 2 order by mail_header.id 0|0|0|SCAN TABLE mail_header (~219250 rows) CPU Time: user 0.690721 sys 0.064676 Ok, 83,391 rows at play. (1) Querying mail_header, with an offset explain query plan selectmail_header.id from mail_header where mail_header.header_id = 2 order by mail_header.id limit 250 offset5; 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows) CPU Time: user 0.390615 sys 0.037031 Ok, we get 250 rows, after sorting 83,391 rows and skipping 50,000 of them. (2) Same, but with join to mail_header_text explain query plan selectmail_header.id, mail_header_text.value from mail_header join mail_header_text onmail_header_text.docid = mail_header.id where mail_header.header_id = 2 order by mail_header.id limit 250 offset5; 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows) 0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows) CPU Time: user 2.153607 sys 0.265462 Note how it's an order of magnitude slower. This is because all these one-to-one joins on these 83K mail_header. They do have a cost. (3) Same, with a join, but with the offset factored out explain query plan selectmail_header.id, mail_header_text.value from ( selectmail_header.id from mail_header where mail_header.header_id = 2 order by mail_header.id limit 250 offset5 ) asmail_header join mail_header_text onmail_header_text.docid = mail_header.id; 1|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows) 0|0|0|SCAN SUBQUERY 1 AS mail_header (~250 rows) 0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows) CPU Time: user 0.402250 sys 0.039327 Now the join is performed only 250 times, adding just a small overhead compare the the bare bone query without the join. The short of it: minimize the amount of work upfront :) [1] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L271 [2] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L260 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Slow JOIN on ROWID
Why is this very fast (20 ms): SELECT table1.data1, table1.data2 FROM table1 WHERE table1.data1 = 10 ORDER BY table1.rowid DESC LIMIT 250 OFFSET 50 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows) And this very slow (3500ms): SELECT table1.data1, table2.data2 FROM table1 JOIN table2 ON table1.rowid=table2.rowid WHERE table1.data1 = 10 ORDER BY table1.rowid DESC LIMIT 250 OFFSET 50 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows) 0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows) The values in table1.data2 and table2.data2 are 100% identical, because my app has always duplicated all columns of table2 (fts4 table) into table1 to work around this issue. But I really like to solve it some day, because it doubles my database size. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users