[sqlite] Performance problem LEFT OUTER JOIN and string data from right table
Hey everyone, new to SQLite, so please have patience with me. I have two tables and am doing a LEFT OUTER JOIN:

A: IDI INTEGER PRIMARY KEY, Parent INTEGER (indexed), Status INTEGER
B: IDR INTEGER UNIQUE (foreign key referencing IndexME.IDI), Points TEXT (~120-character strings on average; this table is primarily used for large data, so it also contains some blobs, etc.)

SELECT A.IDI, B.Text FROM A LEFT OUTER JOIN B ON B.IDR = A.IDI WHERE Parent = ?

The query written here is a lot simplified (for example, the Points column is filtered using a custom function), but the main culprit seems to be the LEFT OUTER JOIN, since accessing that same column in a query that only touches table B is lightning fast. The result of the query is just around 40 rows (due to the Parent filter), and it takes ~2 seconds on an i7 3.4 GHz with the database stored on an SSD. Am I doing something terribly wrong? Suggestions? Oh, and I tried running ANALYZE on the database connection without any visible improvement. Thanks in advance, M.
Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table
Mira Suk mira@centrum.cz wrote:
> query written here is a lot simplified (for example Points column is filtered using custom function) however main culprit seems to be LEFT OUTER JOIN as accessing that same column in query which only has B table in it is lightning fast. result of query is just around 40 rows, (due to Parent filter)

You may want an index on A(Parent) then. -- Igor Tandetnik
Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table
Mira Suk wrote:
> query written here is a lot simplified [...] result of query is just around 40 rows, (due to Parent filter)

> You may want an index on A(Parent) then. -- Igor Tandetnik

A.Parent has an index. The query SELECT IDI FROM A WHERE Parent = ? is fast - i.e., this has nothing to do with that condition. IMHO it is only the result column from table B.
Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table
Mira Suk mira@centrum.cz wrote:
> A.Parent has an index

Ok then, show the result of prepending EXPLAIN QUERY PLAN to your statement. -- Igor Tandetnik
Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table
> Ok then, show the result of prepending EXPLAIN QUERY PLAN to your statement. -- Igor Tandetnik

First of all, thanks for bearing with me :)

Functions:
TZB_MATCHRECURSIVE(int, int) - disabled for this test, always returns 1. Applies the filter recursively.
TZB_ISCHILD(int) - bitmask check.
TZB_MATCHDIM(int, text) - disabled for this test, always returns 1. Does weird stuff with the text parameter.

You have probably realized by now that the table forms a tree of items: Parent links to IDI, and this way the tree structure is formed. The parent parameter is set to 0 for testing, which is the root of the tree (the query should return the top-most folders). In this query all ItemsME.Status and ItemsME.Points values are NULL: there is no entry in ItemsME for directories, and there are no files in the root.

Test 1. Query:

SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text]
FROM [IndexME] LEFT OUTER JOIN [ItemsME] ON [ItemsME].[IDR] = [IndexME].[IDI]
WHERE [IndexME].[Parent] = ?1
AND (TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status])
OR (TZB_ISCHILD([IndexME].[Status]) AND TZB_MATCHDIM([ItemsME].[Status], [ItemsME].[Points])))
ORDER BY [IndexME].[Order];

Plan:
SEARCH TABLE IndexME USING AUTOMATIC COVERING INDEX (Parent=?) (~5 rows)
SEARCH TABLE ItemsME USING AUTOMATIC COVERING INDEX (IDR=?) (~5 rows)
USE TEMP B-TREE FOR ORDER BY

Runtime: ~2 seconds.

Test 2 (TZB_MATCHDIM removed). Query:

SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text]
FROM [IndexME] LEFT OUTER JOIN [ItemsME] ON [ItemsME].[IDR] = [IndexME].[IDI]
WHERE [IndexME].[Parent] = ?1
AND (TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status])
OR (TZB_ISCHILD([IndexME].[Status])))
ORDER BY [IndexME].[Order];

Plan:
SEARCH TABLE IndexME USING AUTOMATIC COVERING INDEX (Parent=?) (~5 rows)
SEARCH TABLE ItemsME USING AUTOMATIC COVERING INDEX (IDR=?) (~5 rows)
USE TEMP B-TREE FOR ORDER BY

Runtime: ~160 milliseconds.

Test 3 (TZB_MATCHDIM replaced with LENGTH, to prove it's not my function): results equal to test 1.
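One detail worth noting in the plans above: "AUTOMATIC COVERING INDEX" means SQLite found no usable persistent index and built a transient one for this statement at run time, paying a scan of each table per query. Persistent indexes would remove that recurring cost; a sketch, with assumed index names since the thread never shows the actual DDL:

CREATE INDEX IF NOT EXISTS IndexME_Parent_idx ON IndexME(Parent);  -- assumed name
CREATE INDEX IF NOT EXISTS ItemsME_IDR_idx ON ItemsME(IDR);        -- assumed name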
Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table
Mira Suk mira@centrum.cz wrote:
> test 1. query: SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text] FROM [IndexME] LEFT OUTER JOIN [ItemsME] ON [ItemsME].[IDR] = [IndexME].[IDI] WHERE [IndexME].[Parent] = ?1 AND (TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status]) OR (TZB_ISCHILD([IndexME].[Status]) AND TZB_MATCHDIM([ItemsME].[Status], [ItemsME].[Points]))) ORDER BY [IndexME].[Order];
> test 2. (TZB_MATCHDIM removed) query: SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text] FROM [IndexME] LEFT OUTER JOIN [ItemsME] ON [ItemsME].[IDR] = [IndexME].[IDI] WHERE [IndexME].[Parent] = ?1 AND (TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status]) OR (TZB_ISCHILD([IndexME].[Status]))) ORDER BY [IndexME].[Order];

The difference is that #2 mentions only one field from ItemsME, namely IDR. The value of that field comes from the index; the table itself doesn't need to be read at all. It's not even clear why #2 bothers to join with ItemsME at all - it's a no-op. #1 uses more fields from ItemsME, so it needs to actually look up and read records from that table. -- Igor Tandetnik
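If the two ItemsME columns that test 1 reads were stored in the index itself, the per-row table lookups Igor describes would disappear. A minimal sketch of such a covering index (the index name is invented, and the trade-off is that the ~120-character Points strings get duplicated into the index):

CREATE INDEX ItemsME_IDR_cover ON ItemsME(IDR, Status, Points);
-- ItemsME.Status and ItemsME.Points can now be read from the index
-- b-tree alone, with no second seek into the table's pages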
Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table
> The difference is that #2 mentions only one field from ItemsME, namely IDR. The value of that field comes from the index; the table itself doesn't need to be read at all. It's not even clear why #2 bothers to join with ItemsME at all - it's a no-op. #1 uses more fields from ItemsME, so it needs to actually look up and read records from that table.

However, the fact is that IDR is non-existent for that specific query (which SQLite should see from the index), so it should not touch the table - just fill with NULLs. The more important thing is: even if I add [ItemsME].[insert whatever integer field here] to the result fields, it's the same speed as in case 2 - that is, relatively fast. Just touching any text/blob field in the query makes it run ~12x slower. And considering the result set contains only a few rows, even if all the texts were split over 2 pages (none is longer than 300 chars), this would equal records * 2 kB of I/O; in the case of 50 records that's 100 kB. Do you really think SQLite needs 1.8 seconds for 100 kB of reads on an SSD drive? I certainly hope not. Maybe there is a reason RIGHT JOIN isn't implemented?

BTW, even the 2nd case is considerably slower than single-table selects. TZB_MATCHRECURSIVE does a much more difficult job (selecting all children and filtering them), but only within a single table, and it is still faster (around 40 ms per top-level parent folder, which usually contains ~700 folders/items).

Just out of curiosity, is it possible to download some older versions of SQLite to test whether this behaviour is present in them? I can only find the current version on the download page. nOOb here.
[sqlite] Performance Problem
Hi, I'm using a Motorola MC55 device with a 2GB external memory card. For the SQLite DB I have used the following pragma values for some performance improvement:

PRAGMA cache_size = 16000
PRAGMA temp_store = 2
PRAGMA synchronous = OFF
PRAGMA locking_mode = EXCLUSIVE

For insertion of records I use sqlite3_prepare, then bind the values using sqlite3_bind_int64, and finally execute using sqlite3_step. We insert a set of 10 records every time, so I use BEGIN and COMMIT transactions. Despite this, insertion is slow: it takes around 700 milliseconds to insert one record. Is there any other way to improve the performance? Kindly help in this regard.

Regards, Sasikumar U
Re: [sqlite] Performance Problem
On Wed, Feb 16, 2011 at 6:13 AM, sasikuma...@tcs.com wrote:
> Hi, I'm using Motorola MC55 device, with 2GB external memory card. [...] Despite these the insertion is slow, like it takes around 700 milliseconds to insert one record, is there any other way to improve the performance?

I have no idea what a Motorola MC55 is, but either it is a very slow device or else you are doing something wrong. Some things to try:

(1) Increase the number of inserts within a single transaction to 1000. Does that make it 100 times faster? (It should.)
(2) Set PRAGMA journal_mode=WAL;
(3) Populate your database on a workstation then copy it over to the device.

--
D. Richard Hipp
d...@sqlite.org
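A minimal sketch of suggestions (1) and (2); the table and column names are hypothetical since the thread never shows the schema. The point is that the expensive flush to storage is paid once per COMMIT, not once per INSERT:

PRAGMA journal_mode = WAL;                        -- suggestion (2): cheaper commits
BEGIN;
INSERT INTO readings(id, value) VALUES (1, 100);  -- hypothetical columns
INSERT INTO readings(id, value) VALUES (2, 101);
-- ... up to ~1000 rows per batch, per suggestion (1) ...
COMMIT;                                           -- one sync covers the whole batch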
Re: [sqlite] Performance Problem
    ) {
      sprintf(SqlTxt, "UPDATE t2 SET n = %d;", n+1);
      sqlite3_exec(db, SqlTxt, NULL, NULL, NULL);
    }
  }
  if (interval != 1)
    sqlite3_exec(db, "commit", NULL, NULL, NULL);
  printf("commits per second: %.1f\n", nrec/(elapsed()-t1));
  sqlite3_close(db);
}

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate

From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of sasikuma...@tcs.com [sasikuma...@tcs.com]
Sent: Wednesday, February 16, 2011 5:13 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Performance Problem

> Hi, I'm using Motorola MC55 device, with 2GB external memory card. [...] Despite these the insertion is slow, like it takes around 700 milliseconds to insert one record, is there any other way to improve the performance? Kindly help in this regard. Regards, Sasikumar U
[sqlite] Performance problem with count(*) calculation
Hello!

$ time sqlite3 test32k.db "select count(*) from role_exist"
1250

real 0m58.908s
user 0m0.056s
sys 0m0.864s

$ sqlite3 test32k.db
SQLite version 3.6.23
sqlite> .schema role_exist
CREATE TABLE role_exist (
id INTEGER PRIMARY KEY,
uid BLOB NOT NULL DEFAULT (randomblob(16)) UNIQUE ON CONFLICT REPLACE
);

So 58s for a count of all records! Couldn't count(*) over all records use a counter from the primary key b-tree?

== HARDWARE:

$ grep CPU /proc/cpuinfo
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz

$ free
             total     used     free  shared  buffers   cached
Mem:       8310892  7552880   758012       0    29496  6667708
-/+ buffers/cache:  855676  7455216
Swap:      3903784   301240  3602544

Best regards, Alexey Pechnikov. http://pechnikov.tel/
Re: [sqlite] Performance problem with count(*) calculation
How does

$ time sqlite3 test32k.db "select count(1) from role_exist"

perform?

On Thu, Apr 1, 2010 at 5:52 AM, Alexey Pechnikov pechni...@mobigroup.ru wrote:
> Hello! [...] So 58s for count of all records! The count(*) for all records may use the counter from primary key b-tree, isn't it?

--
VerifEye Technologies Inc.
905-948-0015 x245
7100 Warden Ave, Unit 3
Markham ON, L3R 8B5
Canada
Re: [sqlite] Performance problem with count(*) calculation
> So 58s for count of all records! The count(*) for all records may use the counter from primary key b-tree, isn't it?

What does this mean? I believe there are no counters of any kind in the b-tree. If you meant a counter based on the auto-increment key, then what about gaps in the middle?

Pavel
Re: [sqlite] Performance problem with count(*) calculation
On Thu, Apr 01, 2010 at 10:44:51AM -0400, Pavel Ivanov scratched on the wall:
> What does this mean? I believe there are no counters of any kind in the b-tree. If you meant a counter based on the auto-increment key, then what about gaps in the middle?

If a statement is of the form "SELECT count(*) FROM table", there is an optimization that short-cuts the standard aggregate functions and gets the leaf count directly from a b-tree. However, I believe you are correct that the b-tree does not keep a global leaf count in the header, so the whole tree must be loaded and walked. The optimization will favor a low-column index b-tree over the table's root b-tree, since this will typically require less I/O. Time is also saved in short-cutting the column loading and aggregation functions (countStep() and countFinalize()). But in the end, it looks like most of the tree is still loaded. See the function isSimpleCount() in the source for more specifics.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from "Anathem" by Neal Stephenson
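When an exact count is needed often on a large table, a common workaround, sketched here as an assumption rather than anything proposed in the thread, is to maintain the count with triggers and read it back without walking any tree:

CREATE TABLE role_exist_n(n INTEGER NOT NULL);
INSERT INTO role_exist_n SELECT count(*) FROM role_exist;  -- seed once
CREATE TRIGGER role_exist_ins AFTER INSERT ON role_exist
BEGIN UPDATE role_exist_n SET n = n + 1; END;
CREATE TRIGGER role_exist_del AFTER DELETE ON role_exist
BEGIN UPDATE role_exist_n SET n = n - 1; END;
-- caveat: this table's UNIQUE ... ON CONFLICT REPLACE can delete rows
-- without firing the DELETE trigger unless PRAGMA recursive_triggers is ON
SELECT n FROM role_exist_n;  -- constant time, no b-tree walk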
Re: [sqlite] Performance problem with count(*) calculation
Hello!

On Thursday 01 April 2010 18:04:10 Adam DeVita wrote:
> How does $ time sqlite3 test32k.db "select count(1) from role_exist" perform?

Equal to count(*).

Best regards, Alexey Pechnikov. http://pechnikov.tel/
Re: [sqlite] Performance problem for a simple select with range
First, thanks, your suggestion worked. To my surprise, it was enough to add "limit 1" to the original query. So:

select * from blocks, locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum limit 1

takes about 1.398e-005 seconds, and

select * from blocks, locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum

takes about 3 seconds.

Igor Tandetnik wrote:
> Dani Valevski [EMAIL PROTECTED] wrote:
>> I think I have a performance problem for a simple select with range. [...] I could write a binary tree myself and have searches like this take O(log(num_rows)), which is 7*something_which_shouldnt_take_too_much. Am I wrong?
>
> And what would you use as a key for this binary tree? I bet you would utilize additional information that the DB engine doesn't have - that your blocks don't overlap (they don't, right?) Try coming up with a search strategy without making this assumption.
>
> Try this: create an index on startIpNum, and run a query like this:
>
> select * from blocks, locations where blocks.startIpNum <= ? and blocks.locid = locations.locid order by blocks.startIpNum desc limit 1;
>
> This gives you the record with the largest value of startIpNum that is still smaller than the threshold, and should be very fast. It can produce a false positive - make the additional check for (? <= endIpNum) in your application code. Don't put this check into the query though, or you will force it back into O(N) behavior in case your target value doesn't fall within any block after all.
>
> Igor Tandetnik
RE: [sqlite] Performance problem for a simple select with range
I'm not an SQL guru by any means, so seeing this made a light go on. Does that mean it is a good idea in the general case to always add "limit 1" to a select that you know should only return one row? I'm assuming this works because the engine can short-cut out as soon as it finds that first matching row.

-----Original Message-----
From: Dani Va [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 31, 2007 8:30 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Performance problem for a simple select with range

> First, thanks, your suggestion worked. To my surprise, it was enough to add "limit 1" to the original query. [...]
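The short-circuit is visible with EXPLAIN QUERY PLAN. A sketch against this thread's schema (the index name is invented, and the comment describes the plan one would expect rather than captured output):

CREATE INDEX blocks_start ON blocks(startIpNum);
EXPLAIN QUERY PLAN
SELECT * FROM blocks, locations
WHERE blocks.startIpNum <= ?1 AND blocks.locid = locations.locid
ORDER BY blocks.startIpNum DESC LIMIT 1;
-- expected: a reverse scan of blocks_start that stops after its first row,
-- rather than visiting every block below the threshold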
Re: [sqlite] Performance problem for a simple select with range
Igor Tandetnik wrote:
> Try searching for a value that doesn't fall into any block - you'll likely find that the query takes a noticeable time to produce zero records. Pick a large value that's greater than all startIpNum's.

Yes, you are right. That's why I'm going with the original query you suggested. Thanks again, Dani
[sqlite] Performance problem for a simple select with range
I think I have a performance problem for a simple select with range.

My tables:

CREATE TABLE locations(
locid INTEGER PRIMARY KEY,
country TEXT,
region TEXT,
city TEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)

CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)

My data: http://www.maxmind.com/app/geolitecity - the blocks table has 2,776,436 rows and the locations table has 159,488 rows. After inserting the data I run ANALYZE.

My query:

select * from blocks, locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
(replace ? with a number)

Disclaimer: I'm a bit new to databases.

Performance issues: I use python's sqlite3 module to run the query. With this configuration it takes about 0.6 seconds to complete the query. I think this is too slow. I could write a binary tree myself and have searches like this take O(log(num_rows)), which is 7*something_which_shouldnt_take_too_much. Am I wrong? (see the disclaimer) Anyway, I thought the problem was that startIpNum and endIpNum are not indexed, so I added indices for them (I even tried indexing them both). This only makes the query take about 3 seconds. Ideas, anyone? Source is attached. Thank you for your help

--
Dani
http://daniva.googlepages.com

'''geolite

GeoLite City is a free IP-to-city database provided by MaxMind. They provide a C API (and a python wrapper) for the database. If you can't compile the C sources on your server (or get a binary version), this script might be helpful for you. The script puts the geoip data in a sqlite database, and provides interfaces for updating and searching the database.

To use this script, get the database in CSV format: http://www.maxmind.com/app/geolitecity

You also need to have python 2.5 for this script (sqlite3 is used)
'''
import sqlite3 as sqlite
import os

def dottedQuadToNum(ip):
    """convert decimal dotted quad string to long integer"""
    hexn = ''.join(["%02X" % long(i) for i in ip.split('.')])
    return long(hexn, 16)

def cursorToDict(cursor):
    val = cursor.next()
    return dict([(cursor.description[i][0], val[i]) for i in xrange(len(cursor.description))])

def test():
    import sqlite3
    from time import clock
    x = sqlite3.connect('geolite.db')
    y = x.cursor()
    ip = dottedQuadToNum("84.108.189.94")
    res = y.execute('select * from blocks,locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum', [ip, ip])
    begin = clock()
    f = res.next()
    end = clock()
    y.close()
    x.close()
    return end - begin, f

def test2():
    from time import clock
    x = GeoLiteDB()
    x.connect()
    begin = clock()
    x.ipLocation("84.108.189.94")
    end = clock()
    x.close()
    return end - begin

def createDB(dbPath='geolite.db',
             locationsPath='GeoLiteCity-Location.csv',
             blocksPath='GeoLiteCity-Blocks.csv',
             warnOnDelete=True):
    if os.path.exists(dbPath):
        if warnOnDelete:
            print "file %s will be deleted. Press any key to continue, or 'n' to abort..." % (os.path.abspath(dbPath))
            if getch() == 'n':
                print 'aborted.'
                return None
        os.remove(os.path.abspath(dbPath))
    conn = sqlite.connect(dbPath)
    cursor = conn.cursor()
    try:
        cursor.execute('''CREATE TABLE locations(
            locid INTEGER PRIMARY KEY,
            country TEXT,
            region TEXT,
            city TEXT,
            postalCode TEXT,
            latitude REAL,
            longitude REAL,
            dmaCode INTEGER,
            areaCode INTEGER)''')
        cursor.execute('''CREATE TABLE blocks(
            startIpNum INTEGER,
            endIpNum INTEGER,
            locId INTEGER)''')
        locations = file(locationsPath, 'r')
        print 'parsing locations. This will take a while.'
        print locations.readline().strip()  # should print copyright note
        print locations.readline().strip()  # should print column names
        lines = ([x.strip('"') for x in line.strip().split(',')] for line in locations.xreadlines())
        cursor.executemany('insert into locations values (?,?,?,?,?,?,?,?,?)', lines)
        locations.close()
        blocks = file(blocksPath, 'r')
        print 'parsing blocks. This will take longer.'
        print blocks.readline().strip()  # should print copyright note
        print blocks.readline().strip()  # should print column names
        lines = ([x.strip('"') for x in line.strip().split(',')] for line in blocks.xreadlines())
        cursor.executemany('insert into blocks values (?,?,?)', lines)
        blocks.close()
        # cursor.execute('''CREATE UNIQUE INDEX startIpNumIx ON blocks(startIpNum);''')
        # cursor.execute('''CREATE UNIQUE INDEX endIpNumIx ON blocks(endIpNum);''')
        conn.commit()
        print 'analyze'
        cursor.execute('''ANALYZE;''')
        numBlocks = cursor.execute('select count(*) from
Re: [sqlite] Performance problem for a simple select with range
[Default] On Mon, 29 Oct 2007 15:25:18 +0200, Dani Valevski [EMAIL PROTECTED] wrote:
> I think I have a performance problem for a simple select with range. [...]

Just some suggestions: index locid in both tables, and rewrite

select * from blocks, locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum

to:

select * from blocks INNER JOIN locations USING (locid) where ? >= blocks.startIpNum AND ? <= blocks.endIpNum

or:

select * from locations INNER JOIN blocks USING (locid) where ? >= blocks.startIpNum AND ? <= blocks.endIpNum

HTH
--
( Kees Nuyt )
c[_]
Re: [sqlite] Performance problem for a simple select with range
Dani Valevski [EMAIL PROTECTED] wrote:
> I think I have a performance problem for a simple select with range.
> My tables:
> CREATE TABLE locations(locid INTEGER PRIMARY KEY, ...);
> CREATE TABLE blocks(startIpNum INTEGER, endIpNum INTEGER, locId INTEGER)
> My data: blocks has 2,776,436 rows, locations has 159,488 rows
> My query: select * from blocks, locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum (replace ? with a number)

To do searches of this kind with maximum efficiency, you normally want to use a 1-dimensional R-Tree index. SQLite does not support R-Tree indices natively, though it is conceivable that you could write an R-Tree virtual table extension for SQLite. Without an R-Tree index, and unless you can exploit the distribution of data in the blocks table, you really cannot do much better than a full table scan on blocks with an indexed lookup of locations for each matching block. That is probably what is happening on your original query before you added indices. -- D. Richard Hipp [EMAIL PROTECTED]
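Since this 2007 message, SQLite has grown an official R*Tree extension, so the 1-dimensional case can now be sketched directly against this thread's schema. This assumes a build with the rtree module enabled (SQLITE_ENABLE_RTREE); the virtual table name is invented:

CREATE VIRTUAL TABLE blocks_rtree USING rtree(id, startIpNum, endIpNum);
INSERT INTO blocks_rtree SELECT rowid, startIpNum, endIpNum FROM blocks;
-- find the block whose [startIpNum, endIpNum] interval contains the target IP:
SELECT l.*
FROM blocks_rtree r
JOIN blocks b ON b.rowid = r.id
JOIN locations l ON l.locid = b.locId
WHERE r.startIpNum <= ?1 AND r.endIpNum >= ?1
-- re-check against the base table: rtree stores coordinates as 32-bit
-- floats, which cannot represent every 32-bit IPv4 integer exactly
AND b.startIpNum <= ?1 AND b.endIpNum >= ?1;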
[sqlite] Performance problem
Dear experts: I'm having a performance problem I can't understand. I am running a select count(*) query joining a table on itself, and the query runs for five minutes using sqlite3.exe before I get bored and kill it. This is on a dual-core box with 4GB of memory, running Windows XP Pro. The SQLite version is 3.3.7.

Here's the problem query with the plan:

select count(*) from keyword a, keyword b where a.key=b.key and a.value='music' and b.value='history';
0|0|TABLE keyword AS a WITH INDEX value
1|1|TABLE keyword AS b WITH INDEX value

Here's the schema:

CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int);
CREATE INDEX key on keyword(key);
CREATE INDEX nextword on keyword(nextword);
CREATE INDEX value on keyword(value);

The table has 3,486,410 records and the SQLite database totals 320MB. There are a few small tables in the db besides the KEYWORD table. 4,318 records have value='music' and 27,058 have value='history'. The keys are 12-byte strings. That doesn't seem like an extreme case to me. Using DBI::ODBC::SQLite in a web application the result is just as bad -- the server times out. Any suggestions would be much appreciated!

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com
Re: [sqlite] Performance problem
[EMAIL PROTECTED] wrote:
> Stephen Toney [EMAIL PROTECTED] wrote:
>> Here's the problem query with the plan:
>> select count(*) from keyword a, keyword b where a.key=b.key and a.value='music' and b.value='history';
>
> A faster approach would be:
> SELECT (SELECT count(*) FROM keyword WHERE value='music') * (SELECT count(*) FROM keyword WHERE value='history');

Never mind. I overlooked the a.key=b.key term in your original query. -- D. Richard Hipp [EMAIL PROTECTED]
RE: [sqlite] Performance problem
Thanks, Igor, Richard, and Tom.

Why doesn't SQLite use the index on key? I can see from the plan that it doesn't, but why not? Can only one index be used per query? This seems strange. I have used SQL Server and Visual FoxPro for this same problem, and they both handle this query in a second if the indexes are there. Is there a good place to read more about this SQLite behavior? I'm fairly familiar with the online documentation and don't recall reading this. Thanks a million! Stephen

On Thu, 2007-03-01 at 07:54 -0500, Tom Briggs wrote:
> You will likely be well served by a compound index on (value,key). As the schema stands now, the indexes will help find records with matching values, but not with matching keys; providing one index that correlates the two should help.
>
> Disclaimer: I haven't recreated your schema, added said index and checked that the query plan produced is better. Don't assume this to be good advice without trying it. :)
>
> -Tom
>
> > -----Original Message-----
> > From: Stephen Toney [mailto:[EMAIL PROTECTED]]
> > Subject: [sqlite] Performance problem
> > [...]

--
Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com
Re: [sqlite] Performance problem
On Thu, 2007-03-01 at 12:46 +0000, [EMAIL PROTECTED] wrote:
> Or maybe better yet: Have you looked into using FTS2 for whatever it is you are trying to do? Full-text search is hard to get right and you appear to be trying to create your own. Why not use a FTS subsystem that is already written and tested and available to you? -- D. Richard Hipp [EMAIL PROTECTED]

Several reasons:

1. The app is 10 years old and working well with other DBMSs, so why mess with it? This problem only occurred since using SQLite as the DBMS.
2. Queries must work with other DBMSs with minimal tinkering (SQL Server, Oracle, FoxPro, etc.) -- using ODBC.
3. Our indexing is tuned to museums, libraries, and other cultural organizations. For example, certain characters are converted before indexing (such as the OE diphthong to the two letters OE). We also index words with hyphens and apostrophes both with and without the punctuation, so the searcher can enter them various ways.
4. We do not preserve case in the index, so it can ignore incorrect capitalization in the search terms. Maybe FTS does this too?
5. For historical reasons, we use NCRs like &#233; instead of UTF-8. Our programs remove these before indexing.

I am considering FTS for another project, though. I appreciate the suggestion! Stephen

--
Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com
RE: [sqlite] Performance problem
Regarding "Can only one index be used per query?": yes, I believe that *is* the defined behaviour of SQLite (though it does support compound indices). Larger DBMSs often have very involved code to determine query plans.
Re: [sqlite] Performance problem
Stephen Toney [EMAIL PROTECTED] wrote:
> Thanks, Igor, Richard, and Tom. Why doesn't SQLite use the index on key? I can see from the plan that it doesn't, but why not? Can only one index be used per query? This seems strange. I have used SQL Server and Visual FoxPro for this same problem, and they both handle this query in a second if the indexes are there.

SQLite is limited to a single index per table of the FROM clause. (In your case the same table occurs twice in the FROM clause, so each instance can use a separate index, but each instance can only use a single index.) Other systems relax this restriction through the use of bitmap indices. SQLite does not (directly) support bitmap indices. You can achieve about the same thing as a bitmap index by playing games with rowids, but the SQL needed to do so is convoluted. In your case, I think the query would need to be:

SELECT count(*) FROM keyword AS a CROSS JOIN keyword AS b
WHERE a.value='music' AND b.rowid IN (
  SELECT rowid FROM keyword WHERE value='history'
  INTERSECT
  SELECT rowid FROM keyword WHERE key=a.key
);

It seems so much simpler to use a multi-column index. It is almost certainly going to be faster.

> Is there a good place to read more about this SQLite behavior? I'm fairly familiar with the online documentation and don't recall reading this.

You might get a few hints at http://www.sqlite.org/php2004/page-001.html and the pages that follow. That is from a talk I gave in 2004. It is somewhat out of date. My goal for this calendar year is to get some detailed documentation online about the kinds of issues you are seeing. -- D. Richard Hipp [EMAIL PROTECTED]
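A concrete sketch of the multi-column index route for this schema (the index name is invented; whether to also drop the old single-column indexes is a separate space/update-cost decision):

CREATE INDEX keyword_value_key ON keyword(value, key);
-- each side of the self-join can now resolve (value, key) from one index probe:
select count(*) from keyword a, keyword b
where a.value='music' and b.value='history' and a.key=b.key;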
Re: [sqlite] Performance problem
Stephen Toney [EMAIL PROTECTED] wrote:
> 4. We do not preserve case in the index, so it can ignore incorrect capitalization in the search terms. Maybe FTS does this too?

That's a function of your stemmer. The default stemmers in FTS2 both ignore capitalization.

> 5. For historical reasons, we use NCRs like &#233; instead of UTF-8. Our programs remove these before indexing.

You can do this in your stemmer. -- D. Richard Hipp [EMAIL PROTECTED]
Re: [sqlite] Performance problem
Richard, thanks for the additional info. I'll look into the multi-column index idea. Sounds as if it might be the solution. Stephen

On Thu, 2007-03-01 at 14:42 +0000, [EMAIL PROTECTED] wrote:
> SQLite is limited to a single index per table of the FROM clause. [...] It seems so much simpler to use a multi-column index. It is almost certainly going to be faster.

--
Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com
[sqlite] Performance problem with simple queries
Problem summary:
Simple queries sometimes take ~400 ms.

Analysis:
- A PHP script runs the same SQL query several times in different places (in different transactions). Some of the queries take around 400 ms while the others (identical queries) take only 4-20 ms. (The timings are consistent for the same PHP script.)
- Testing showed that identical queries added after a problematic query run very fast.
- Time measurement is done with explode(' ', microtime()).

Test script:
A test script shows that the first query takes a long time, but the others are quick. It also happens on queries other than the first, but we don't have a simple test script that reproduces that yet.

Script output (DB is sqlite:/usr/local/web/443/lbaProvDBponeOpenTest):

0.66058707 SELECT * FROM MediaGateway
0.00106287 SELECT * FROM MediaGateway
0.00017715 SELECT * FROM MediaGateway
0.12085819 SELECT * FROM GeneralPlatformProperties
0.00016093 SELECT * FROM GeneralPlatformProperties WHERE 1
0.0649929 SELECT * FROM ClassOfRestriction WHERE id IS NULL
0.05895591 SELECT * FROM ClassOfRestriction
0.05869102 SELECT * From BriefHuntGroupList ORDER BY huntGroupName COLLATE NOCASE

Second run:

0.04814911 SELECT * FROM MediaGateway
0.00015306 SELECT * FROM MediaGateway
0.00019789 SELECT * FROM MediaGateway
0.00516605 SELECT * FROM GeneralPlatformProperties
0.000175 SELECT * FROM GeneralPlatformProperties WHERE 1
0.02837396 SELECT * FROM ClassOfRestriction WHERE id IS NULL
0.00037098 SELECT * FROM ClassOfRestriction
0.05728984 SELECT * From BriefHuntGroupList ORDER BY huntGroupName COLLATE NOCASE

Questions:
What can explain the long processing time for the first query? What can explain long processing times for queries which are not the first?

Thanks, Erez.
Re: [sqlite] Performance problem with simple queries
You are almost certainly encountering disk caching effects.

Makavy, Erez (Erez) wrote:
> Problem summary: Simple queries sometimes take ~400 ms. [...] What can explain the long processing time for the first query? What can explain long processing times for queries which are not the first?
Re: [sqlite] Performance problem with 3.2.7
Are you wrapping the inserts in between BEGIN/COMMIT TRANSACTION?

BEGIN TRANSACTION;
INSERT INTO table (foo) VALUES ('bar');
INSERT INTO table (foo) VALUES ('par');
INSERT INTO table (foo) VALUES ('tar');
INSERT INTO table (foo) VALUES ('far');
..
INSERT INTO table (foo) VALUES ('car');
INSERT INTO table (foo) VALUES ('jar');
INSERT INTO table (foo) VALUES ('mar');
COMMIT TRANSACTION;

Check out this document for more info: http://www.sqlite.org/lang_transaction.html

> I'm sure I must be doing something wrong. This is my first attempt at working with SQLite.
Re: [sqlite] Performance problem with 3.2.7
On Mon, 21 Nov 2005, Shane Baker wrote:
> I'm sure I must be doing something wrong. This is my first attempt at working with SQLite.

We'll see...

> I have a simple table, with 7 columns. There are 6 integers and a BLOB, with the primary key being on an integer. When I try to run inserts (one insert per transaction - I know this is not optimal, but it represents my application's usage), I am only getting about 7 inserts per second, on average. My first suspect was the BLOB and the fact that I was binding this parameter using SQLITE_TRANSIENT (using sqlite3_bind_blob). I removed the BLOB from the schema altogether, leaving just 6 integers, and I still have the same performance.

The performance problem is the synchronous I/O bottleneck of doing only a single insert per transaction.

> For reference, I am getting around 10,000 queries per second when I look up a row based on the primary key column. All performance measurements I've seen posted by others suggest between 200 and 300 inserts per second with one insert per transaction.

Probably not, unless this is to a FLASH device, for example. The bottleneck in hard disk I/O is the rotational and head-movement latency of writing data to the platters. Assuming no head movement, a 7200 rpm disk will only allow the same sector to be rewritten 7200 times a minute, which is 120 times a second. Add in that many different sectors need to be updated synchronously, and throughput drops dramatically.

A quick test indicates that I can almost double the performance on Linux/ext3 by having the data=journal option set in the mount flags. This is because head movement is reduced significantly. A test that previously took ~500 seconds (13785 inserts without transactions) took 280 seconds with data=journal. For reference, the same data inserted within a single transaction took ~1.2 seconds!

> I haven't run a profiler yet but hope to do this tomorrow. Does anyone have any ideas as to what I might be doing wrong, or where I should look?

If you can change your model to insert more than 1 row per transaction, you should see a significant performance increase. You'll see roughly N times the performance for small N. If this is not an option, look at your storage and how you can reduce latency. FLASH devices have low latency, being solid state, and some RAID controllers have battery-backed buffers, and so may have lower latency.

> Thanks in advance.

Christian
--
/"\
\ /  ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \
Re: [sqlite] Performance problem with 3.2.7
No, as I mentioned in my original message, I am not wrapping them. I don't want to test an unrealistic scenario for my application. In my application, there are multiple sources that will be inserting into the database, and pooling the information for a bulk insert won't work.

I understand that I will get better performance by inserting all of my rows inside a transaction. What I don't understand is why, when NOT using a transaction, I get about 7 inserts per second compared to others who are reporting between 200 and 300. I am working with reasonable hardware. I just need to figure out why my performance is about 30x slower than what others are reporting when using the library in similar ways.

On Mon, 21 Nov 2005, Chris Schirlinger wrote:
> Are you wrapping the inserts in between BEGIN/COMMIT TRANSACTION? [...] Check out this document for more info: http://www.sqlite.org/lang_transaction.html
Re: [sqlite] Performance problem with 3.2.7
Thank you very much for the feedback. I understand your point: hardware takes a deterministic amount of time. I have been basing my assumptions on these sources:

http://www.sqlite.org/cvstrac/wiki?p=PerformanceConsiderations (see "Transactions and performance")
http://blog.amber.org/2004/11/28/sqlite-insertion-performance/

There was one other, but I can't find it. For the time being, I don't think that inserts are going to happen very frequently in my application, and I can probably roll updates into transactions. Thanks again.

On Mon, 21 Nov 2005, Christian Smith wrote:
> The performance problem is the synchronous I/O bottleneck of doing only a single insert per transaction. [...]
Re: [sqlite] Performance problem with 3.2.7
Hi,

On Mon, 21 Nov 2005 10:56:41 -0500 (EST) Shane Baker [EMAIL PROTECTED] wrote:

I just need to figure out why my performance is about 30x slower than what others are reporting when using the library in similar ways.

Are you using sqlite on Windows or MacOS X? In my tests, sqlite performs 10 write transactions per sec on Windows and MacOS X. On Linux, sqlite performs up to 300 write transactions per sec if the HDD cache is enabled. However, when the HDD cache is disabled (/sbin/hdparm -W0 /dev/hda etc.), it drops to 10 transactions per sec.

To ensure ACID compliance, sqlite seems to be using FlushFileBuffers() on Windows, the F_FULLFSYNC fcntl() on MacOS X, and fsync() on other unixes. FlushFileBuffers() and the F_FULLFSYNC fcntl() flush the HDD write-back cache, and they are very slow. However, many implementations of fsync() don't flush the HDD cache (they flush the OS cache only). That is very fast, but dangerous (not ACID compliant) if the HDD cache has no battery backup. (So I'm using sqlite on Linux with the HDD cache off.)
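SQLite also exposes this durability/speed tradeoff per connection. A minimal sketch (my addition, not something suggested in the thread) of the relevant PRAGMA:

PRAGMA synchronous = FULL;   -- the default: sync on every commit, safest and slowest
PRAGMA synchronous = NORMAL; -- syncs at fewer points
PRAGMA synchronous = OFF;    -- never syncs: fast, but a crash or power loss can corrupt the database

Note this only changes how often SQLite calls fsync()/FlushFileBuffers(); it cannot make fsync() itself flush a disk's write-back cache on platforms where it doesn't.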
Re: [sqlite] Performance problem with 3.2.7
Thank you very much. I am happy to hear that the performance I am seeing is in line with what others have observed. I am running this on Windows XP.

On Tue, 22 Nov 2005, Akira Higuchi wrote: [snip]
Re: [sqlite] sqlite performance problem
On Tue, 12 Apr 2005, Maksim Yevmenkin wrote:

Dear SQLite users, consider this [snip] it only took 4+ seconds to read, parse, perform hash table lookup and sum the data. note that for unique 1417 keys it had to do hash lookup and hash insert. so, just with plain ascii file i get four times the speed i get with sqlite. note that my c program will scale linearly with the size of dataset (just like i see with sqlite). so, - what am i doing wrong here?

Not managing your expectations. Try the test with another SQL database, and see what sort of speed you get.

- any suggestions on how to speed things up with sqlite?

Buy a faster machine. No, I'm not being facetious.

- is sqlite optimized to retrieve one row from the table?

When fetching that one row using an index, yes, that's what indexes are for. But when an index scan touches a significant proportion of the corresponding table, it becomes less than optimal, due to the reasons given previously in the thread:
- Cache thrashing
- Index + Table access
- CPU overhead
- Non-linear file access

i'm starting to think that having index and data in the same file is not such a great idea when you want to perform range requests.

Unless the index and data are on different disks, you'll get no benefit by splitting them from the same file.

thanks, max

Christian

On Apr 12, 2005 11:33 AM, Maksim Yevmenkin [EMAIL PROTECTED] wrote:

Gé, thanks for the suggestion. unfortunately it did not make any difference :( below are the results. as you can see it takes 7+ seconds to group by 333,392 records and i'm grouping by a column on which i have an index. again, i'm not a database guy, but i think that is slow. perhaps someone can comment if that is the best sqlite can do?

sqlite> .schema data
CREATE TABLE data(
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
);
CREATE INDEX data_by_a on data (a);
sqlite> select count(*) from data;
9818210

test-1.sql:
PRAGMA cache_size = 300000;
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a >= 18234721 GROUP BY a);

test-2.sql:
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a >= 18234721 GROUP BY a);

time sqlite3 db < test-1.sql
300000 1024 1417
6.89u 0.33s 0:07.55 95.6%
time sqlite3 db < test-2.sql
2000 1024 1417
6.91u 0.19s 0:07.39 96.0%
time sqlite3 db2048 < test-1.sql
300000 2048 1417
6.80u 0.08s 0:07.32 93.9%
time sqlite3 db2048 < test-2.sql
2000 2048 1417
6.77u 0.12s 0:07.10 97.0%
time sqlite3 db4096 < test-1.sql
300000 4096 1417
6.80u 0.15s 0:07.21 96.3%
time sqlite3 db4096 < test-2.sql
2000 4096 1417
6.79u 0.15s 0:07.15 97.0%
time sqlite3 db8192 < test-1.sql
300000 8192 1417
6.70u 0.11s 0:07.01 97.1%
time sqlite3 db8192 < test-2.sql
2000 8192 1417
6.73u 0.09s 0:07.01 97.2%

thanks, max

On Apr 12, 2005 7:10 AM, Gé Weijers [EMAIL PROTECTED] wrote:

Maksim, Some things you could try:

1) increase cache memory

You may be causing a lot of cache misses if the size of the query result is very large compared to the size of the cache. Index-based searches can cause multiple reloads of the same page because of a lack of locality in the cache. An index-less search will just load each page once. As an experiment, try 'PRAGMA cache_size = 300000' before you run the query. 1 GB of ram should be able to support 300MB of cache.

2) use 8192-byte pages

Larger pages seem to improve performance quite a bit, in my experience. Do 'PRAGMA page_size = 8192' before you create the database.

Doing both may cause excessive memory use (300000 * 8K = ...). I've never tried that.

Gé

Maksim Yevmenkin wrote:

Robert, [snip] i said i print these rows to /dev/null too in my perl code. plus the perl code does some other things such as joining these rows with other hashes and summing the numbers.

That's fine. I was merely trying to account for the 50% speed difference between the two differing column tests, which has been accomplished.

As for the temp table ... I haven't tried this, but isn't temp a reserved word in SQLite? More importantly, you should be doing this statement inside

yes, it is. i really want to create 'temporary table' in memory. i was really hoping it would speed things up.

I misread the statement, so ignore me on that part. However, 339,000 rows into a temporary in-memory table ... I tried some experiments locally here and none of them took more than 2 seconds to execute. Are you sure you're not using up all available memory, which is causing the system to hit the swapfile? What does this same query look like when you drop the temp from the query?

the system has 1G of ram. i was monitoring sqlite3 memory usage with 'top'. the SIZE and RES did not exceed 30M. so i do not think the memory is the issue here.
Re: [sqlite] sqlite performance problem
On Fri, 2005-04-08 at 11:53 -0700, Maksim Yevmenkin wrote:

CREATE INDEX data_by_a ON data (a);
time sqlite3 db 'select n2 from data where a >= 18234721' > /dev/null
25.95u 0.71s 0:27.02 98.6%

If you make the index look like this:

CREATE INDEX data_by_a ON data(a, n2);

Then SQLite only has to look at the index to get the information it needs. It never has to consult the original table. This will make the query about twice as fast or maybe even a little faster.

-- D. Richard Hipp [EMAIL PROTECTED]
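A practical note (mine, not from the thread): the index name is already taken, so the covering index has to replace the old one. Something like:

DROP INDEX data_by_a;
CREATE INDEX data_by_a ON data(a, n2);
-- the query can now be answered from the index alone, with no
-- seeks back into the table's pages:
SELECT n2 FROM data WHERE a >= 18234721;

The price is a larger index and slower inserts, since every n2 value is now stored twice.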
RE: [sqlite] sqlite performance problem
well, it could be true, but not in the queries i have posted. i group by column a and there is an index on column a, so sqlite does not have to do anything to compute the key. it does not even have to go back to

Do not confuse the index key with the aggregator key. The two may be the same in your case, but in general they won't be, so I wouldn't expect SQLite to reuse the index key as the aggregator key even if it can.

the disk and fetch the value of column a because it's right in the index. the only work sqlite has to do is to go back to the disk and fetch the value of column n1 and sum it.

You're underestimating how much work has to be done to just fetch the value of column n1. The entire row containing n1 has to be read, then the value of column n1 extracted from it. That means disk seeks, disk reads, moving around within the data row, etc.

what you say is correct, but four (!) times performance increase? please, you have got to admit that something is not right here.

It may well be possible to make your query run faster. The biggest problem here is simply that you're expecting a magically high-performance solution without understanding what needs to be done in order to satisfy your request.

why? i'm using relational database to do exactly what it should do fast. that is i want to select the range of rows from a much bigger set

Do not confuse what you want it to do fast, or even what you think it should do fast, with what it actually does or how fast it does it.

and do something with it. i'm not asking it to do complex computations or anything else.

You don't think you're doing anything complex, but you really are, you just don't understand that you are. Locating a variable-sized row in an unpredictable location and performing arbitrary operations on some sub-section of that row is a heck of a lot more complicated than scanning through a flat file and computing hash keys.

i think, i know what is going on here. the problem is that every time i do an indexed scan sqlite has to 1) fetch index pages 2) fetch data pages that match the where condition. because both index and data are in the same file sqlite has to perform an insane amount of seek() calls and this is clearly demonstrated in my previous post.

Once again, the fact that the data and the index are in the same file is irrelevant. Even if the data and the index were in separate files, the disk heads would be doing a lot of random I/O to move from the location of one row to the next, given that the table rows are not accessed sequentially when using an index.

it still takes 14 whopping seconds to group 300,000+ records down to 1417 records and that is just unacceptable (imo) for relational database.

Why does the fact that SQLite is a relational database mean that response times should be measured in seconds? Have you tried this with any other relational database, as another poster suggested? You may well get better performance out of something like MySQL or Oracle, but it'll come at a price - much more memory usage and an SMP system that allows you to effectively multithread. I can guarantee you that any other relational database, if configured to use the same amount of memory as SQLite (a few MBs) and the same number of threads (1), would perform markedly *worse* than SQLite.

from what i can see sqlite took extra 7 seconds to fetch the data off the disk. that is it had to seek back to data page, fetch it, process it, seek back to next index page etc.

That's correct. That's the price you pay for using an index to look up the data.

yes, i tried all that. it is not that easy to read sqlite explain output (imo) but i convinced myself that it uses all the proper indexes when i make the query.

Just because you think it's using the correct index doesn't mean that that's the optimal access path. Assuming that it is, however... Given that you do seem to understand the overhead incurred by using an index to look up data that is located elsewhere, what's the issue? It seems pretty clear that SQLite isn't doing anything wrong. It isn't doing it as optimally as your C program, but we've already covered the reasons for that, so what's left to discuss?

i think sqlite can only do good job when resulting set is very small. ideally one row. in this case it does not have to go back to disk to fetch the columns that are not in index. it is that simple (imo). the more columns (that are not in the index) you request the more time it takes.

Why would you expect the processing of a large number of rows to take the same amount of time as processing a single row? Once again, your expectations are unreasonable. It's going to take more time to process a large number of rows simply because there's more data to be managed. The number of columns requested isn't much of a factor either. Requesting 10 columns that are not in the index is only slightly more expensive than requesting one.
Re: [sqlite] sqlite performance problem
Maksim,

Some things you could try:

1) increase cache memory

You may be causing a lot of cache misses if the size of the query result is very large compared to the size of the cache. Index-based searches can cause multiple reloads of the same page because of a lack of locality in the cache. An index-less search will just load each page once. As an experiment, try 'PRAGMA cache_size = 300000' before you run the query. 1 GB of ram should be able to support 300MB of cache.

2) use 8192-byte pages

Larger pages seem to improve performance quite a bit, in my experience. Do 'PRAGMA page_size = 8192' before you create the database.

Doing both may cause excessive memory use (300000 * 8K = ...). I've never tried that.

Gé

Maksim Yevmenkin wrote:

Robert, [snip] i said i print these rows to /dev/null too in my perl code. plus the perl code does some other things such as joining these rows with other hashes and summing the numbers.

That's fine. I was merely trying to account for the 50% speed difference between the two differing column tests, which has been accomplished.

As for the temp table ... I haven't tried this, but isn't temp a reserved word in SQLite? More importantly, you should be doing this statement inside

yes, it is. i really want to create 'temporary table' in memory. i was really hoping it would speed things up.

I misread the statement, so ignore me on that part. However, 339,000 rows into a temporary in-memory table ... I tried some experiments locally here and none of them took more than 2 seconds to execute. Are you sure you're not using up all available memory, which is causing the system to hit the swapfile? What does this same query look like when you drop the temp from the query?

the system has 1G of ram. i was monitoring sqlite3 memory usage with 'top'. the SIZE and RES did not exceed 30M. so i do not think the memory is the issue here.

time sqlite3 db 'create table foo as select * from data where a >= 18234721' > /dev/null
22.06u 1.39s 0:27.75 84.5%

so pretty much the same time without 'temp'. i'm starting to suspect disk. here is what i did. i created a separate database with only one table. this table contains a subset of 333,392 rows from the original data table. it also has the same index on the a column, i.e. i did

sqlite3 db1
sqlite> attach db as s;
sqlite> create table data as select * from s.data where a >= 18234721;
sqlite> create index data_by_a on data (a);

full scan:
time sqlite3 db1 'select n1 from data' > /dev/null
17.19u 0.55s 0:19.06 93.0%

bad index scan, because it is guaranteed that the table only has keys that match the where clause:
time sqlite3 db1 'select n1 from data where a >= 18234721' > /dev/null
25.73u 0.59s 0:28.37 92.7%

+10 seconds! is this the overhead of indexed scan? is this what it really takes to seek back and forth between index and data? what am i missing here?

thanks, max
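One detail worth making explicit about Gé's second suggestion (my note, not from the thread): page_size only takes effect before the database file gets its first content, so an existing database has to be rebuilt to change it. A minimal session:

sqlite3 newdb
sqlite> PRAGMA page_size = 8192;  -- must come before the first CREATE TABLE
sqlite> CREATE TABLE data(a INTEGER, n1 FLOAT);
sqlite> PRAGMA page_size;
8192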
Re: [sqlite] sqlite performance problem
Gé, thanks for the suggestion. unfortunately it did not make any difference :( below are the results. as you can see it takes 7+ seconds to group by 333,392 records and i'm grouping by a column on which i have an index. again, i'm not a database guy, but i think that is slow. perhaps someone can comment if that is the best sqlite can do?

sqlite> .schema data
CREATE TABLE data(
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
);
CREATE INDEX data_by_a on data (a);
sqlite> select count(*) from data;
9818210

test-1.sql:
PRAGMA cache_size = 300000;
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a >= 18234721 GROUP BY a);

test-2.sql:
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a >= 18234721 GROUP BY a);

time sqlite3 db < test-1.sql
300000 1024 1417
6.89u 0.33s 0:07.55 95.6%
time sqlite3 db < test-2.sql
2000 1024 1417
6.91u 0.19s 0:07.39 96.0%
time sqlite3 db2048 < test-1.sql
300000 2048 1417
6.80u 0.08s 0:07.32 93.9%
time sqlite3 db2048 < test-2.sql
2000 2048 1417
6.77u 0.12s 0:07.10 97.0%
time sqlite3 db4096 < test-1.sql
300000 4096 1417
6.80u 0.15s 0:07.21 96.3%
time sqlite3 db4096 < test-2.sql
2000 4096 1417
6.79u 0.15s 0:07.15 97.0%
time sqlite3 db8192 < test-1.sql
300000 8192 1417
6.70u 0.11s 0:07.01 97.1%
time sqlite3 db8192 < test-2.sql
2000 8192 1417
6.73u 0.09s 0:07.01 97.2%

thanks, max

On Apr 12, 2005 7:10 AM, Gé Weijers [EMAIL PROTECTED] wrote:

Maksim, Some things you could try:

1) increase cache memory

You may be causing a lot of cache misses if the size of the query result is very large compared to the size of the cache. Index-based searches can cause multiple reloads of the same page because of a lack of locality in the cache. An index-less search will just load each page once. As an experiment, try 'PRAGMA cache_size = 300000' before you run the query. 1 GB of ram should be able to support 300MB of cache.

2) use 8192-byte pages

Larger pages seem to improve performance quite a bit, in my experience. Do 'PRAGMA page_size = 8192' before you create the database.

Doing both may cause excessive memory use (300000 * 8K = ...). I've never tried that.

Gé

Maksim Yevmenkin wrote:

Robert, [snip] i said i print these rows to /dev/null too in my perl code. plus the perl code does some other things such as joining these rows with other hashes and summing the numbers.

That's fine. I was merely trying to account for the 50% speed difference between the two differing column tests, which has been accomplished.

As for the temp table ... I haven't tried this, but isn't temp a reserved word in SQLite? More importantly, you should be doing this statement inside

yes, it is. i really want to create 'temporary table' in memory. i was really hoping it would speed things up.

I misread the statement, so ignore me on that part. However, 339,000 rows into a temporary in-memory table ... I tried some experiments locally here and none of them took more than 2 seconds to execute. Are you sure you're not using up all available memory, which is causing the system to hit the swapfile? What does this same query look like when you drop the temp from the query?

the system has 1G of ram. i was monitoring sqlite3 memory usage with 'top'. the SIZE and RES did not exceed 30M. so i do not think the memory is the issue here.

time sqlite3 db 'create table foo as select * from data where a >= 18234721' > /dev/null
22.06u 1.39s 0:27.75 84.5%

so pretty much the same time without 'temp'. i'm starting to suspect disk. here is what i did. i created a separate database with only one table. this table contains a subset of 333,392 rows from the original data table. it also has the same index on the a column, i.e. i did

sqlite3 db1
sqlite> attach db as s;
sqlite> create table data as select * from s.data where a >= 18234721;
sqlite> create index data_by_a on data (a);

full scan:
time sqlite3 db1 'select n1 from data' > /dev/null
17.19u 0.55s 0:19.06 93.0%

bad index scan, because it is guaranteed that the table only has keys that match the where clause:
time sqlite3 db1 'select n1 from data where a >= 18234721' > /dev/null
25.73u 0.59s 0:28.37 92.7%

+10 seconds! is this the overhead of indexed scan? is this what it really takes to seek back and forth between index and data? what am i missing here?

thanks, max
RE: [sqlite] sqlite performance problem
with sum(n1) added the query runs twice as slow. as i was told its because sqlite has to fetch data row. fine, but why its soo slow?! and it

Because for each row it has to compute the aggregate key, find the aggregator for that key and increment the sum for that aggregate key. That's a lot more work than just selecting the row and doing nothing with it.

so, just with plain ascii file i get four times the speed i get with sqlite. note that my c program will scale linearly with the size of dataset (just like i see with sqlite).

With anything related to computers, there are always tradeoffs - most commonly power for complexity, and flexibility for speed. Your C program *should* be faster than anything SQLite can do - it's simpler and more specific to the problem you're trying to solve. On the flip side, it'll never do anything other than what it already does - it can never be used to solve any other problem.

- what am i doing wrong here?

Your expectations are too high, for starters. For reasons I will never understand, people expect a relational database to be a silver bullet that answers all questions instantaneously. RDBMSs are useful because they're flexible, but they're also not 100% optimal because they're flexible. That's the price you pay for being able to answer questions with SQL instead of C.

The other problem you may be running into is lack of a clear problem definition. So far I've seen mention of a half dozen things which you think should perform better, but you haven't dug into any of them to find out *why* they're slow. An earlier poster helped identify disk I/O as a bottleneck. If from there you can prove that that's a bottleneck because SQLite is performing too much disk I/O, then you have cause to claim that SQLite is slow. Otherwise, all you can do is blame your disks. Until you get a clear picture of what you want, what you need and what's actually happening, however, you'll never be able to tell the difference.

- any suggestions on how to speed things up with sqlite?

First, you must understand what SQLite is doing. Have you tried EXPLAINing the query that you're unhappy with to see what SQLite is doing with it under the hood?

Also, an earlier poster mentioned increasing your block size. How high did you raise it? I've seen significant performance increases with block sizes of 8k and 16k; just how much difference it makes seems to depend on the system in question, however. On some systems I've tested, 8k block sizes are faster than 16k block sizes; my guess is that all this comes down to the block size used by the file system and/or OS (and the size of the data itself can factor in there as well), though I haven't yet dug deep enough to be sure. From my experience though, in most cases, there are certainly gains to be had by using larger block sizes.

- is sqlite optimized to retrieve one row from the table?

If there is an index on the column or columns referenced in your WHERE clause, yes. Moreso if the column or columns referenced contain unique values. Note however that this has nothing to do with SQLite - all relational databases (all databases with B-tree indexes, actually, not just RDBMSs) work this way.

i'm starting to think that having index and data in the same file is not such a great idea when you want to perform range requests.

Having the data and the index in the same file is irrelevant. It is entirely possible that SQLite might be using an index to satisfy your query, and entirely possible that it would be more optimal to execute the query without using the index, but the fact that the index and data are in the same file has nothing to do with it. Whether to use the index or not is an age-old problem that other RDBMSs solve using cost-based optimizers; I'm not sure if there's a solution for that in SQLite, as I believe query plan execution is rule-based (I'm not 100% sure about that though, someone please correct me if I'm wrong). If that's the case, then you'll need to try to find another way to write your query such that SQLite executes it more optimally. If I remember correctly, the old trick to make sure you didn't use an index was to change your WHERE clause slightly to something like:

SELECT COUNT(*) FROM (SELECT a FROM data WHERE (a + 0) >= 18234721 GROUP BY a);

This should force a table scan rather than an index scan. Try that and see how your queries perform; if things go faster, that means the index is hurting you, and you need to either drop it or rewrite your queries to avoid using it.

-Tom
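A related idiom, not mentioned in the thread but long supported by SQLite: a bare unary plus in front of a column name is an arithmetic no-op, yet it disqualifies that term from index use, forcing the same table scan without any risk of changing the math:

SELECT count(*) FROM (SELECT a FROM data WHERE +a >= 18234721 GROUP BY a);
-- +a evaluates to a, but the optimizer will not apply the index on a
-- to a term whose left side is an expression rather than a bare column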
Re: [sqlite] sqlite performance problem
Thomas,

with sum(n1) added the query runs twice as slow. as i was told its because sqlite has to fetch data row. fine, but why its soo slow?! and it

Because for each row it has to compute the aggregate key, find the aggregator for that key and increment the sum for that aggregate key. That's a lot more work than just selecting the row and doing nothing with it.

well, it could be true, but not in the queries i have posted. i group by column a and there is an index on column a, so sqlite does not have to do anything to compute the key. it does not even have to go back to the disk and fetch the value of column a because it's right in the index. the only work sqlite has to do is to go back to the disk and fetch the value of column n1 and sum it.

so, just with plain ascii file i get four times the speed i get with sqlite. note that my c program will scale linearly with the size of dataset (just like i see with sqlite).

With anything related to computers, there are always tradeoffs - most commonly power for complexity, and flexibility for speed. Your C program *should* be faster than anything SQLite can do - it's simpler and more specific to the problem you're trying to solve. On the flip side, it'll never do anything other than what it already does - it can never be used to solve any other problem.

what you say is correct, but four (!) times performance increase? please, you have got to admit that something is not right here.

- what am i doing wrong here?

Your expectations are too high, for starters. For reasons I will never understand, people expect a relational database to be a silver bullet that answers all questions instantaneously. RDBMSs are useful because they're flexible, but they're also not 100% optimal because they're flexible. That's the price you pay for being able to answer questions with SQL instead of C.

why? i'm using relational database to do exactly what it should do fast. that is i want to select the range of rows from a much bigger set and do something with it. i'm not asking it to do complex computations or anything else.

The other problem you may be running into is lack of a clear problem definition. So far I've seen mention of a half dozen things which you think should perform better, but you haven't dug into any of them to find out *why* they're slow. An earlier poster helped identify disk I/O as a bottleneck. If from there you can prove that that's a bottleneck because SQLite is performing too much disk I/O, then you have cause to claim that SQLite is slow. Otherwise, all you can do is blame your disks. Until you get a clear picture of what you want, what you need and what's actually happening, however, you'll never be able to tell the difference.

i think, i know what is going on here. the problem is that every time i do an indexed scan sqlite has to 1) fetch index pages 2) fetch data pages that match the where condition. because both index and data are in the same file sqlite has to perform an insane amount of seek() calls and this is clearly demonstrated in my previous post. even if i remove the sum() but just add an extra column that is not in the index, like so

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a >= 18234721 GROUP BY a);
time sqlite3 db < test.sql
300000 1024 1417
13.14u 1.06s 0:14.40 98.6%

it still takes 14 whopping seconds to group 300,000+ records down to 1417 records and that is just unacceptable (imo) for relational database.

now if you only request the column that is in the index:

SELECT count(*) FROM (SELECT a FROM data WHERE a >= 18234721 GROUP BY a);
time sqlite3 db < test.sql
300000 1024 1417
6.84u 0.24s 0:07.18 98.6%

it only takes 7+ seconds. from what i can see sqlite took an extra 7 seconds to fetch the data off the disk. that is it had to seek back to data page, fetch it, process it, seek back to next index page etc.

- any suggestions on how to speed things up with sqlite?

First, you must understand what SQLite is doing. Have you tried EXPLAINing the query that you're unhappy with to see what SQLite is doing with it under the hood?

yes, i tried all that. it is not that easy to read sqlite explain output (imo) but i convinced myself that it uses all the proper indexes when i make the query.

Also, an earlier poster mentioned increasing your block size. How high did you raise it? I've seen significant performance increases with block sizes of 8k and 16k; just how much difference it makes seems to depend on the system in question, however. On some systems I've tested, 8k block sizes are faster than 16k block sizes; my guess is that all this comes down to the block size used by the file system and/or OS (and the size of the data itself can factor in there as well), though I haven't yet dug deep enough to be sure. From my experience though, in most cases, there are certainly gains to be had by using larger block sizes.

by default you can not raise page_size above
Re: [sqlite] sqlite performance problem
On Tue, 2005-04-12 at 16:17 -0700, Maksim Yevmenkin wrote:

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a >= 18234721 GROUP BY a);
time sqlite3 db < test.sql
300000 1024 1417
13.14u 1.06s 0:14.40 98.6%

Have you tried doing the query like this:

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a-18234721 >= 0 GROUP BY a);

-- D. Richard Hipp [EMAIL PROTECTED]
Re: [sqlite] sqlite performance problem
Maksim Yevmenkin wrote:

i think, i know what is going on here. the problem is that every time i do an indexed scan sqlite has to 1) fetch index pages 2) fetch data pages that match the where condition. because both index and data are in the same file sqlite has to perform an insane amount of seek() calls and this is clearly demonstrated in my previous post. even if i remove the sum() but just add an extra column that is not in the index, like so

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a >= 18234721 GROUP BY a);
time sqlite3 db < test.sql
300000 1024 1417
13.14u 1.06s 0:14.40 98.6%

it still takes 14 whopping seconds to group 300,000+ records down to 1417 records and that is just unacceptable (imo) for relational database.

If seeking was the problem, you'd see more system time. The i/o time is at most 1.06 seconds. The file is probably being cached in the os, and therefore, the seeks just become memory operations.

You might also get more help if you presented your requests in a milder tone. You might even get responses from the people who wrote sqlite, and therefore have knowledge of its internals.

but one thing to try:

CREATE TABLE data(
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
);
CREATE INDEX data_by_a on data (a, b, c, d, e, n1, n2);

This is a big waste of space, and will slow down all your updates and inserts, but since you seem to be mostly worried about the speed of selects, those shouldn't make a difference to you.

You may already know this, but when you create an index with multiple keys, it can use a subset of those keys to look up rows. The limitation is that any key that appears in the index before a column you would like to limit must also be limited, e.g., select * from a where a = 12 and b = 16 and c = 16 will use the index, but select * from a where b = 22 and c = 23 will not use the index.

You claim that the reason sqlite is slower than you expect is because sqlite is using the index to look up the data. If so, this should fix the problem, by making the index have all the data right there. I'm not so sure that this will resolve your problems, but it may take you closer to a solution.

Another thing to try is to try the same experiment with postgresql or mysql, and see if they have the same sort of performance you're expecting to see. If not, you may want to consider that Thomas was right, and a general solution may not be able to perform the calculations with the same speed that you are hoping for.

John LeSueur
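John's leftmost-prefix rule is easier to see with distinct table and column names (an illustrative sketch, not from the thread):

CREATE TABLE t(x INTEGER, y INTEGER, z INTEGER);
CREATE INDEX t_xyz ON t(x, y, z);

SELECT * FROM t WHERE x = 12 AND y = 16;  -- can use t_xyz: x, y form a leftmost prefix
SELECT * FROM t WHERE y = 22 AND z = 23;  -- cannot use t_xyz: x is unconstrained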
Re: [sqlite] sqlite performance problem
SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a >= 18234721 GROUP BY a);
time sqlite3 db < test.sql
300000 1024 1417
13.14u 1.06s 0:14.40 98.6%

Have you tried doing the query like this:

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a-18234721 >= 0 GROUP BY a);

i just tried. i think this effectively disables the index on column a, and on the 9+ million records data set it took

time sqlite3 db < test.sql
300000 1024 1417
61.16u 40.69s 2:05.01 81.4%

thanks, max
Re: [sqlite] sqlite performance problem
John.

i think, i know what is going on here. the problem is that every time i do an indexed scan sqlite has to 1) fetch index pages 2) fetch data pages that match the where condition. because both index and data are in the same file sqlite has to perform an insane amount of seek() calls and this is clearly demonstrated in my previous post. even if i remove the sum() but just add an extra column that is not in the index, like so

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a >= 18234721 GROUP BY a);
time sqlite3 db < test.sql
300000 1024 1417
13.14u 1.06s 0:14.40 98.6%

it still takes 14 whopping seconds to group 300,000+ records down to 1417 records and that is just unacceptable (imo) for relational database.

If seeking was the problem, you'd see more system time. The i/o time is at most 1.06 seconds. The file is probably being cached in the os, and therefore, the seeks just become memory operations.

i agree that most of the data are cached by the filesystem. it would be even slower if sqlite had to go and fetch all the data from the disk platters. i'm not sure if system time is counted if the process was put to sleep due to, say, disk i/o.

You might also get more help if you presented your requests in a milder tone. You might even get responses from the people who wrote sqlite, and therefore have knowledge of its internals.

i did not intend to offend anyone, if i did i apologize.

but one thing to try:

CREATE TABLE data(
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
);
CREATE INDEX data_by_a on data (a, b, c, d, e, n1, n2);

This is a big waste of space, and will slow down all your updates and inserts, but since you seem to be mostly worried about the speed of selects, those shouldn't make a difference to you.

that will not work for me either. i have to fit a 60 million rows table in under 15 gig.

You may already know this, but when you create an index with multiple keys, it can use a subset of those keys to look up rows. The limitation is that any key that appears in the index before a column you would like to limit must also be limited, e.g., select * from a where a = 12 and b = 16 and c = 16 will use the index, but select * from a where b = 22 and c = 23 will not use the index.

yes, i know this. i do not see how this is applicable in my tests.

You claim that the reason sqlite is slower than you expect is because sqlite is using the index to look up the data. If so, this should fix the problem, by making the index have all the data right there.

that is not what i said. the index properly gives me the much (30 times) smaller data set. however, the column i have an index on is only one of the columns i want to fetch. the other columns are still on disk. in my case it takes longer to get the other columns i want. again, i'd rather not place an index on all the columns because i might not meet size requirements.

I'm not so sure that this will resolve your problems, but it may take you closer to a solution. Another thing to try is to try the same experiment with postgresql or mysql, and see if they have the same sort of performance you're expecting to see. If not, you may want to consider

i will try another database. it's just i already spent too much time trying to make sqlite work :( i still hope that i'm doing something wrong here. i just don't know what it is :)

that Thomas was right, and a general solution may not be able to perform the calculations with the same speed that you are hoping for.

but i'm not doing anything out of the ordinary here. select rows, sum some columns and do a join on a couple of other tables. why would sqlite not work for me?

thanks, max
Re: [sqlite] sqlite performance problem
Maksim Yevmenkin wrote:

so, just with plain ascii file i get four times the speed i get with sqlite. note that my c program will scale linearly with the size of dataset (just like i see with sqlite).

With anything related to computers, there are always tradeoffs - most commonly power for complexity, and flexibility for speed. Your C program *should* be faster than anything SQLite can do - it's simpler and more specific to the problem you're trying to solve. On the flip side, it'll never do anything other than what it already does - it can never be used to solve any other problem.

what you say is correct, but four (!) times performance increase? please, you have got to admit that something is not right here.

I have to agree with Thomas, your expectations are too high. If I were using a relational database and I could get within a factor of 4 of what I can concoct in C, I'd declare victory. Relational databases are often far from speedy, even on simple queries. You pay for:

* variable record formats
* integrity checks
* duplicate storage of keys in the BTree
* duplicate storage of keys in multiple tables
* the ACID property, even if you're not using it in your samples
* the ability to perform queries in a flexible way
* ...

If your database is simple you may be better off performance-wise by rolling your own solution, or using another database. MySQL is pretty fast if you run it using ISAM tables, but you pay with data corruption if the DB or system crashes. If your queries generally produce a sizeable percentage of the records stored, you might as well do a sequential scan over a file; if written with care, performance will be completely I/O bound. Use the 'mmap' system call or equivalent to map the DB into memory, and you can read your DB using pointer arithmetic, and use 'memmove' for updates.

Gé
Re: [sqlite] sqlite performance problem
Christian,

thanks for the reply.

i'm having strange performance problem with sqlite-3.2.0. consider the following table [snip]

now the problem:

1) if i do a select with an index it takes 27 sec. to get 333,392 rows

time sqlite3 db 'select n2 from data where a >= 18234721' > /dev/null
25.95u 0.71s 0:27.02 98.6%
time sqlite3 db 'select n2 from data where a >= 18234721' > /dev/null
26.02u 0.66s 0:27.53 96.9%

2) if i do a select with sequential lookup it takes 1 min to get 9818210 rows

time sqlite3 db 'select n2 from data where a+0 >= 18234721' > /dev/null
49.54u 14.65s 1:04.98 98.7%
time sqlite3 db 'select n2 from data where a+0 >= 18234721' > /dev/null
49.80u 14.42s 1:05.03 98.7%

- so how come it takes only twice as much time to get 30 times more rows?

When doing an index scan, you will be accessing the index as well as the table pages. For a sequential scan, only the table pages are touched, reducing thrashing of the cache.

i guess, i can believe this. however its pretty disappointing to get only a 50% improvement on a 30 times smaller dataset :( but how do you explain this?

sqlite> .schema data
CREATE TABLE data (
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
);
CREATE INDEX data_by_a ON data (a);

time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.15u 0.59s 0:27.00 99.0%
time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.04u 0.61s 0:26.91 99.0%

and

time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.22u 0.41s 0:12.67 99.6%
time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.01u 0.61s 0:12.64 99.8%

so just by selecting two different columns (both next to each other) i get another 50% time difference? sure the column types are different, and i can even understand that FLOAT might be 8 bytes and INTEGER 4 bytes, but 50% time difference? it just can't be that linear, can it? do you think it is possible to get better results by issuing multiple queries each of which will return an even smaller dataset? and, no, i'm not a database guy.

Also, if the data is not in index order in the table, and/or dispersed across the database file, you may have to visit each page more than once when traversing in index order. In the full table scan, you'll read the rows in table order, hence only touching each page once before moving on to later pages, thus reducing cache thrashing even more.

all rows were inserted into the table in order that matches the index. i can not say if the records on disk will have the same order.

- and why is it taking 27 seconds to get 333,392 rows anyway?

You think 12347 rows/s is bad?

because i have the same data set in perl multilevel hash stored on disk (storable format), and it takes about 1 minute to perform the same query on the same hardware. this time includes reading the data from the disk, traversing every single key on each hash level, etc. i was hoping that sqlite would do something like this in just a few (< 10) seconds.

- is there any way to create an index in separate file?

No, SQLite is a single file embedded database. Keeps administration almost non-existent (by design.)

- so, if i have a large table (> 9 million records) with an index then sqlite would have to constantly seek back and forth between index and data (within the same file), probably wasting lots of disk bandwidth?
- if the above is correct then can i force sqlite to get all (or as much as possible) of the index into memory?

the hardware is sun netra t1 running solaris 5.7. the db file size is about 800 mbytes. just for the record i'd like to have at least 15 times more records in the 'data' table.

If you can match SQLite for simple read-only throughput with another database, you'll be doing well.

i do not know about other database, but compared to plain perl its not that impressive. i must be doing something wrong here.

thanks, max
RE: [sqlite] sqlite performance problem
-----Original Message-----
From: Maksim Yevmenkin [mailto:[EMAIL PROTECTED]
Sent: Monday, April 11, 2005 9:59 AM
To: Christian Smith
Cc: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance problem

i guess, i can believe this. however its pretty disappointing to get only a 50% improvement on a 30 times smaller dataset :( but how do you explain this?

sqlite> .schema data
CREATE TABLE data (
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
);
CREATE INDEX data_by_a ON data (a);

time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.15u 0.59s 0:27.00 99.0%
time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.04u 0.61s 0:26.91 99.0%

and

time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.22u 0.41s 0:12.67 99.6%
time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.01u 0.61s 0:12.64 99.8%

so just by selecting two different columns (both next to each other) i get another 50% time difference? sure the column types are different, and i can even understand that FLOAT might be 8 bytes and INTEGER 4 bytes, but 50% time difference? it just can't be that linear, can it? do you think it is possible to get better results by issuing multiple queries each of which will return an even smaller dataset? and, no, i'm not a database guy.

The most glaring fault in your time tests that I see is that you're running the command-line sqlite3, which is printf'ing all 339,000+ rows to /dev/null. It doesn't show on your screen, but the work is still being done regardless. If you really want a fair speed test, change those statements to:

select count(n1) from data where a >= 18234721

And

select count(e) from data where a >= 18234721

Or even

select count(*) from data where a >= 18234721

THEN tell us what the difference in performance is ...

Robert
Re: [sqlite] sqlite performance problem
Robert,

i guess, i can believe this. however its pretty disappointing to get only a 50% improvement on a 30 times smaller dataset :( but how do you explain this?

sqlite> .schema data
CREATE TABLE data (
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
);
CREATE INDEX data_by_a ON data (a);

time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.15u 0.59s 0:27.00 99.0%
time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.04u 0.61s 0:26.91 99.0%

and

time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.22u 0.41s 0:12.67 99.6%
time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.01u 0.61s 0:12.64 99.8%

so just by selecting two different columns (both next to each other) i get another 50% time difference? sure the column types are different, and i can even understand that FLOAT might be 8 bytes and INTEGER 4 bytes, but 50% time difference? it just can't be that linear, can it? do you think it is possible to get better results by issuing multiple queries each of which will return an even smaller dataset? and, no, i'm not a database guy.

The most glaring fault in your time tests that I see is that you're running the command-line sqlite3, which is printf'ing all 339,000+ rows to /dev/null. It doesn't show on your screen, but the work is still being done regardless.

well, i do print these rows to /dev/null in the perl test too.

If you really want a fair speed test, change those statements to:

select count(n1) from data where a >= 18234721

time sqlite3 db 'select count(n1) from data where a >= 18234721' > /dev/null
7.79u 0.70s 0:08.50 99.8%

And

select count(e) from data where a >= 18234721

time sqlite3 db 'select count(e) from data where a >= 18234721' > /dev/null
7.90u 0.42s 0:08.31 100.1%

Or even

select count(*) from data where a >= 18234721

time sqlite3 db 'select count(*) from data where a >= 18234721' > /dev/null
1.35u 0.16s 0:01.47 102.7%

8 times faster than count(n1) or count(e)? i'm confused. i guess it just used the a field (on which it had an index?)

THEN tell us what the difference in performance is ...

fine, if i ask sqlite just to count the rows it wins hands-down, but i really want these rows. even more, i'd like to then natural join these rows with a couple of other tables to really do what the perl code currently does. but, it takes 22 seconds just to create a temp table with the required dataset

time sqlite3 db 'create temp table foo as select * from data where a >= 18234721' > /dev/null
21.93u 0.89s 0:22.95 99.4%

and i do not understand what i'm doing wrong here :(

thanks, max
RE: [sqlite] sqlite performance problem
Let's recap ...

time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.15u 0.59s 0:27.00 99.0%
time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.04u 0.61s 0:26.91 99.0%
time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.22u 0.41s 0:12.67 99.6%
time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.01u 0.61s 0:12.64 99.8%

so just by selecting two different columns (both next to each other) I get another 50% time difference? sure the column types are different, and i can even understand that FLOAT might be 8 bytes and INTEGER 4 bytes, but 50% time difference? it just can't be that linear, can it?

By doing these same two queries using select count(), you've proven my original theory that the time difference is due to the volume of bytes being transmitted to /dev/null and NOT because of the actual data types of the columns or ANY OTHER FACTOR. If you really want, change your pipe output to two file names; I'm 100% sure you'll find that the 'select n1 ...' query results in an output file significantly larger than the 'select e ...' output file. This is where the differing performance comes from -- the time it is taking to parse and print your output.

The select count(*) does indeed use the index and hence the significant performance difference.

fine, if i ask sqlite just to count the rows it wins hands-down, but i really want these rows. even more, i'd like to then natural join these rows with a couple of other tables to really do what the perl code currently does. but, it takes 22 seconds just to create a temp table with the required dataset

time sqlite3 db 'create temp table foo as select * from data where a >= 18234721' > /dev/null
21.93u 0.89s 0:22.95 99.4%

and i do not understand what i'm doing wrong here :(

Again, the only reason I suggested using count() in your timing test was to ensure that the command-line sqlite3 program's output was consistent for both tests and to eliminate /dev/null printf's from factoring into the total time. In your application, you'll call select * (or whatever) without the count to retrieve the rows -- but since you're not printf'ing them and instead are doing your own thing with them, you will indeed see close to identical times in your selects just like you did in the count() test.

As for the temp table ... I haven't tried this, but isn't temp a reserved word in SQLite? More importantly, you should be doing this statement inside a transaction. Transactions are critically important in SQLite. I would go so far as to say NEVER EVER perform ANY bulk write in SQLite outside a transaction! The performance difference is beyond phenomenal.

Robert
Re: [sqlite] sqlite performance problem
Robert,

time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.15u 0.59s 0:27.00 99.0%
time sqlite3 db 'select n1 from data where a >= 18234721' > /dev/null
26.04u 0.61s 0:26.91 99.0%
time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.22u 0.41s 0:12.67 99.6%
time sqlite3 db 'select e from data where a >= 18234721' > /dev/null
12.01u 0.61s 0:12.64 99.8%

so just by selecting two different columns (both next to each other) I get another 50% time difference? sure the column types are different, and i can even understand that FLOAT might be 8 bytes and INTEGER 4 bytes, but 50% time difference? it just can't be that linear, can it?

By doing these same two queries using select count(), you've proven my original theory that the time difference is due to the volume of bytes being transmitted to /dev/null and NOT because of the actual data types of the columns or ANY OTHER FACTOR. If you really want, change your pipe output to

ok, i agree, redirecting output to /dev/null might have impact on times.

two file names, I'm 100% sure you'll find that the 'select n1 ...' query results in an output file significantly larger than the 'select e ...' output file. This is where the differing performance comes from -- the time it is taking to parse and print your output.

here you are wrong actually.

time sqlite3 db 'select e from data where a >= 18234721' > x
12.01u 0.64s 0:12.80 98.8%
time sqlite3 db 'select n1 from data where a >= 18234721' > y
26.06u 0.62s 0:26.86 99.3%

the size of x is 1070681 (output of column e) and the size of y is 1004219 (output of column n1), so the file sizes are about the same. x is 66462 bytes more. it is probably possible that printf() is 2 times slower on floats.

The select count(*) does indeed use the index and hence the significant performance difference.

fine

fine, if i ask sqlite just to count the rows it wins hands-down, but i really want these rows. even more, i'd like to then natural join these rows with a couple of other tables to really do what the perl code currently does. but, it takes 22 seconds just to create a temp table with the required dataset

time sqlite3 db 'create temp table foo as select * from data where a >= 18234721' > /dev/null
21.93u 0.89s 0:22.95 99.4%

and i do not understand what i'm doing wrong here :(

Again, the only reason I suggested using count() in your timing test was to ensure that the command-line sqlite3 program's output was consistent for both tests and to eliminate /dev/null printf's from factoring into the total time. In your application, you'll call select * (or whatever) without the count to retrieve the rows -- but since you're not printf'ing them and instead are doing your own thing with them, you will indeed see close to identical times in your selects just like you did in the count() test.

i said i print these rows to /dev/null too in my perl code. plus the perl code does some other things such as joining these rows with other hashes and summing the numbers.

As for the temp table ... I haven't tried this, but isn't temp a reserved word in SQLite? More importantly, you should be doing this statement inside

yes, it is. i really want to create 'temporary table' in memory. i was really hoping it would speed things up.

a transaction. Transactions are critically important in SQLite. I would go so far as to say NEVER EVER perform ANY bulk write in SQLite outside a transaction! The performance difference is beyond phenomenal.

nope.

time sqlite3 db 'begin; create temp table foo as select * from data where a >= 18234721; commit' > /dev/null
21.90u 0.77s 0:22.87 99.1%

still 22 seconds just to create a table with 300,000+ records, and that is, unfortunately, too slow :( doing strace/truss on sqlite shows that it performs a huge amount of seeks. so the original questions stay:

- what am i doing wrong here?
- is sqlite going to be not as fast on a fairly large indexed table because it has to seek back and forth between index and data?

thanks, max
Re: [sqlite] sqlite performance problem
Robert,

[snip] i said i print these rows to /dev/null too in my perl code. plus the perl code does some other things such as joining these rows with other hashes and summing the numbers.

That's fine. I was merely trying to account for the 50% speed difference between the two differing column tests, which has been accomplished.

As for the temp table ... I haven't tried this, but isn't temp a reserved word in SQLite? More importantly, you should be doing this statement inside

yes, it is. i really want to create 'temporary table' in memory. i was really hoping it would speed things up.

I misread the statement, so ignore me on that part. However, 339,000 rows into a temporary in-memory table ... I tried some experiments locally here and none of them took more than 2 seconds to execute. Are you sure you're not using up all available memory, which is causing the system to hit the swapfile? What does this same query look like when you drop the temp from the query?

the system has 1G of ram. i was monitoring sqlite3 memory usage with 'top'. the SIZE and RES did not exceed 30M. so i do not think the memory is the issue here.

time sqlite3 db 'create table foo as select * from data where a >= 18234721' > /dev/null
22.06u 1.39s 0:27.75 84.5%

so pretty much the same time without 'temp'. i'm starting to suspect disk. here is what i did. i created a separate database with only one table. this table contains a subset of 333,392 rows from the original data table. it also has the same index on the a column, i.e. i did

sqlite3 db1
sqlite> attach db as s;
sqlite> create table data as select * from s.data where a >= 18234721;
sqlite> create index data_by_a on data (a);

full scan:
time sqlite3 db1 'select n1 from data' > /dev/null
17.19u 0.55s 0:19.06 93.0%

bad index scan, because it is guaranteed that the table only has keys that match the where clause:
time sqlite3 db1 'select n1 from data where a >= 18234721' > /dev/null
25.73u 0.59s 0:28.37 92.7%

+10 seconds! is this the overhead of indexed scan? is this what it really takes to seek back and forth between index and data? what am i missing here?

thanks, max
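One experiment the thread never tries (my suggestion, so treat it as a sketch): since the rows were inserted in index order but may not be stored that way, rebuilding the table sorted by the key should make an index range scan touch the table's pages nearly sequentially:

CREATE TABLE data_sorted AS SELECT * FROM data ORDER BY a;
CREATE INDEX data_sorted_by_a ON data_sorted(a);
-- a range scan such as the following now reads data_sorted mostly in
-- file order instead of bouncing between distant index and data pages
SELECT n1 FROM data_sorted WHERE a >= 18234721;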
Re: [sqlite] Performance problem
SQLite only uses a single index per table on any given query. This is unlikely to change.

Would it be able to use a multi-column index on ipnode + author?

Hugh

D. Richard Hipp wrote: [snip]
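For what it's worth, here is a sketch of the multi-column idea Hugh is asking about (mine, not an answer given in the thread): a single two-column index can serve both the WHERE and the ORDER BY, which works with the one-index-per-table limit rather than against it. Whether the optimizer of that era actually exploits this is something EXPLAIN would have to confirm:

CREATE INDEX t1_ipnode_author ON t1(ipnode, author);
-- entries with equal ipnode are stored in author order within the index,
-- so one index scan can satisfy the equality filter and the sort together:
SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author;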
[sqlite] Performance problem
Hi, the following simple query is very slow:

SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author;

The database is about 250 MB in size, and the table the query is run against has 12 cols and 120,000 rows. Every col has an index. The above query returns about 80% of the records. As soon as the ORDER BY clause is left out, the query is fast.

If I interpret the output of explain in the sqlite command-line tool correctly, only one index - the one for the column ipnode - is used. No index is used for sorting!! Usually the above table would be joined with another one, which makes things even worse. Other tests produced the same results: whether there is an ORDER BY clause or multiple columns are referenced in the WHERE clause, there will always be only one index used.

What am I doing wrong? Is it true that sqlite can only use one index per table in a query? How can I get sqlite to use more than one index per table at a time?

cheers Peter
Re: [sqlite] Performance problem
At 1:33 PM +0100 3/20/04, Shi Elektronische Medien GmbH, Peter Spiske wrote: the following simple query is very slow: SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author; The database is about 250 MB in size and the table the query is run against has 12 cols and 120,000 rows. Every col has an index. The above query returns about 80% of the records. As soon as the ORDER BY clause is left out, the query is fast.

Whether or not indexes are used, sorting tends to be a very labour-intensive operation and slows things down considerably. You can try to isolate your slowdown like this:

1. Run your existing select and time it.
2. Take the order-by off your select and run it.
3. Make your existing select, with order-by, into a subselect (in the FROM clause) of a larger select that does something trivial, such as returning a count of rows from the inner select. You would have to be careful with your choice, though, to make sure that SQLite doesn't optimize anything away - you want the inner select to still do all the same work.

The point is that the outer select returns almost no data. Network speed is then basically taken out of the equation, and all the time is just the select execute and not the select fetch.

-- Darren Duncan
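A sketch of step 3 above using Peter's query, assuming the outer count() doesn't let SQLite skip the inner work (the caveat Darren raises):

    SELECT count(*)
      FROM (SELECT title FROM t1
             WHERE ipnode = 'VZ'
             ORDER BY author);
    -- The inner select still filters and sorts, but only a single
    -- row (the count) comes back, so fetch/transfer time drops out
    -- of the measurement.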
Re: [sqlite] Performance problem
Shi Elektronische Medien GmbH, Peter Spiske wrote: the following simple query is very slow: SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author; The database is about 250 MB in size and the table the query is run against has 12 cols and 120,000 rows. Every col has an index. The above query returns about 80% of the records. As soon as the ORDER BY clause is left out, the query is fast.

SQLite only uses a single index per table on any given query. This is unlikely to change. Since your query is returning 80% of the rows in the table, the use of an index to implement the WHERE clause is not really helping you any. But the simple-minded query optimizer of SQLite does not realize this. SQLite always prefers to use an index to implement the WHERE clause when it can. SQLite will also use that same index to implement ORDER BY, if possible; or, if no index was helpful for implementing WHERE, it will try to find an index to implement ORDER BY.

In your case, the best solution would be to trick SQLite into not using the index on the WHERE clause. This will make it use the index to implement ORDER BY, and you should get much better performance. I suggest trying this query:

SELECT title FROM t1 WHERE ipnode||'x'='VZx' ORDER BY author;

Appending the string 'x' onto the end of the ipnode column prevents the query optimizer from using an index on ipnode. This leaves the optimizer free to use an index to implement ORDER BY. The other thing you could try is to DROP the index on the ipnode column.

-- D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
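Later SQLite releases document a terser way to get the same effect: a unary "+" in front of a column name disqualifies that column from index use (whether the 2004-era optimizer honored it identically is an assumption here):

    SELECT title FROM t1 WHERE +ipnode = 'VZ' ORDER BY author;
    -- "+ipnode" cannot match the index on ipnode, so the optimizer
    -- is free to use the index on author to satisfy the ORDER BY.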
Re: [sqlite] Performance problem
Ben Carlyle wrote:

Mrs. Brisby wrote: On Wed, 2003-11-05 at 23:59, Jonas Forsman / Axier.SE wrote: Note: Testing has shown PostgreSQL's hash indexes to be similar or slower than B-tree indexes, and the index size and build time for hash indexes is much worse. Hash indexes also suffer poor performance under high concurrency. For these reasons, hash index use is discouraged.

Please note I'm not talking about a hash of the entire key - I'm talking about n distinct b-trees that are selected by an 8-n bit function. This transformation can be made very fast: we get a speed improvement here on searches if our 8-n bit function takes less time than n-1 random memcmp()'s.

How would you handle the lack of ordering associated with hash tables? Sqlite can currently use indices for three main tests: equals, less than, and greater than. While hash tables are good at finding equal-to in constant time, it usually means linear time (a table scan) to test for less-than or greater-than. Do you have a solution to this problem?

Benjamin.
Re: [sqlite] Performance problem
On Thu, 2003-11-06 at 19:00, [EMAIL PROTECTED] wrote: How would you handle the lack of ordering associated with hash tables? Sqlite can currently use indices for three main tests: equals, less than, and greater than. While hash tables are good at finding equal-to in constant time, it usually means linear time (a table scan) to test for less-than or greater-than. Do you have a solution to this problem?

You presume that a defining characteristic of a hash table is that the function be non-linear. Consider an n-3 bit function: f(x) = x mod 8. A very linear function, with which it is perfectly easy to iterate _in_order_ over all values of the table.
Re: [sqlite] Performance problem
According to the PostgreSQL documentation, the hash algorithm is discouraged compared to b-tree for performance reasons. http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=indexes-types.html

Note: Testing has shown PostgreSQL's hash indexes to be similar or slower than B-tree indexes, and the index size and build time for hash indexes is much worse. Hash indexes also suffer poor performance under high concurrency. For these reasons, hash index use is discouraged.

/jonas

- Original Message - From: Mrs. Brisby, Sent: Thursday, November 06, 2003 4:14 AM

On Wed, 2003-11-05 at 13:44, Clark, Chris wrote:

[Mrs. Brisby had written:] MySQL has stated in its documentation that it uses a B-tree for its indexes. I think this is a mistake - especially for larger indexes. Using several B-trees attached to a hash table is much faster (if the hash is fast or your data is uniform).

[Clark:] Following this train of thought (this isn't a feature request!): some DBMSs support different structures for tables/indices, and the DBA can specify the required structure depending on the expected data usage patterns (and a secondary index need not use the same structure as the primary table, allowing for, say, a hash-table primary and b-tree secondaries as per the example above). E.g. Ingres has Heap (yep, completely unstructured), B-tree, Hash, and ISAM (there is also an R-tree, but that is only for spatial datatypes, so it's not as interesting for this discussion). It all depends on the data and how it is used as to which structure should or could be used. A typical example of the hash primary and b-tree secondary is a unique customer id, so that the customer record can be hit directly with the hash - or, if the hash is not perfect, through a couple of overflow pages (compared to a b-tree, which will always need to jump through a few pages in the index; admittedly that may only be an improvement of microseconds versus milliseconds in lookup time). The b-tree secondary would then be for things like customer name (which is often duplicated - potential for lots of people called Mr Smith), in case one needs to perform searches on a customer name (who, say, forgot their customer id).

[Mrs. Brisby:] It wouldn't strictly require a hash table either (after further thought); perhaps just the first octet of the key could be permuted through an 8-4 (15-bit) or 8-3 (7-bit) function. If the keys are otherwise random this should work just as well, and you wouldn't lose the ability to use substrings...
Re: [sqlite] Performance problem
Avner Levy wrote: We have just finished testing the same scenario with MySQL, and amazingly they continued to insert 1500-3000 rows per second even when the database had 60,000,000 records. I don't know how this magic is done...

Nor do I. If anybody can clue me in, I would appreciate it. I suspect the secret must be in the file format for their indices. Does anybody have any idea what that format is? Or how indices work in MySQL?

FWIW: Big changes are looming on the horizon for SQLite. The format of indices will probably have to change at some point in the future in order to accommodate BLOBs. This will result in an incompatible file format change (version 3.0.0). The change is neither imminent (think summer of 2004) nor inevitable. But if other changes could be made at the same time to support faster inserts of non-localized data, the transition from 2.x.x to 3.x.x would be a good time to toss them in.

-- D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
RE: [sqlite] Performance problem
Hmm, the name of the database IS sql LITE. My guess is that if it was meant to be the monstrous scalable solution, it would have been architected that way in the first place. One thing I find interesting about SQLite is that I tested it this past week with up to 360,000 records. I had no performance degradation inserting, querying, updating, or deleting. I am very pleased. In fact, for the solution I have used it for, I have tossed SQL Server on the enterprise distributed version. It scales plenty well on just everyday hardware now. The funny thing is that, for its simplicity, it is probably the best database there is for its current area of use. I think if you want to scale it, go buy a mainframe! :-) If you want to go in and modify it to be scalable, you ought to put in the time to do this requested stuff, then sell it. You have an excellent base of code to start from. Just some thoughts.

Allan

-----Original Message----- From: Avner Levy, Sent: Tuesday, November 04, 2003 3:56 AM, Subject: Re: [sqlite] Performance problem

Hi. We have just finished testing the same scenario with MySQL, and amazingly they continued to insert 1500-3000 rows per second even when the database had 60,000,000 records. I don't know how this magic is done, but that is what the tests showed (with the same machine / OS).

There is one thing I don't understand. Sqlite works in optimistic mode, which means the changes are written to the database and the original data is backed up. In this case I would expect the commit to be as fast as deleting a file. But the tests show that the commit can take up to 5 minutes when the database is large. Does anybody have an explanation?

I saw the comments about using OS features to solve the problem. Personally, if we choose Sqlite, we need it for lots of platforms and for product installation. Demanding that we, as users of Sqlite, start configuring journaling on the customers' computers is not practical. Personally I'm a big fan of Sqlite (great tool), and I really hope that this issue can be solved (so we can choose Sqlite). I understand that solving this issue demands big changes, but I believe it will take Sqlite to the big players' league.

Thanks, Avner

D. Richard Hipp wrote:

Avner Levy wrote: Hi, I've written the following program to test the sqlite performance for a specific scenario I have. I've used most tricks I've found, but still, when the database gets big the performance gets unacceptable. The performance problem happens, of course, only if the indexes are defined. Since I need the indexes, is there any other option that can make it faster? When the database gets big, the insert rate drops to 50 rows per second. Thanks in advance. * It seems that the larger the DB gets, the more pages the journal copies out of the database into the file, which kills performance.

Each transaction in your test involves 3 INSERTs into a single table defined (roughly) as follows:

CREATE TABLE test(a,b);
CREATE INDEX idx1 ON test(a);
CREATE INDEX idx2 ON test(b);

In these INSERTs, the values for test.a are non-decreasing. That means that new entries into the test table and into the idx1 index always go into the same place in their B*Trees - specifically at the end. But the values for test.b are random, which means that entries into idx2 are distributed throughout its B*Tree.
Because changes to test and idx1 are localized, only a handful of disk blocks are changed (many others are added, but few existing ones are changed). But the changes to idx2 are not localized, which means that nearly every disk block associated with idx2 must be changed. Modifying the thousands of disk blocks associated with idx2 is what is taking so long. I do not know of anything I can do inside SQLite to make it go any faster. I do not know of any other way of creating an index that could work around this issue. If you disable idx2, you will see that the inserts go much faster. I also observe that the journal file is about 1/3rd the size of the database, which is consistent with the observation that every disk block associated with idx2 needs to be changed. My TCL test script for this situation is appended to this message.

If you really are doing millions of INSERTs prior to doing any SELECTs, you might consider deferring the creation of idx2 until after all the data has been put into the database. Generally speaking, it is a little faster to create an index for existing data than it is to create the index at the same time that the data is being inserted. But the difference is not that great. And, of course, this won't help if in reality you need to do some SELECTs along the way...

I'm curious to know how other database engines deal with this problem. Have you tried a similar experiment on MySQL, or PostgreSQL, or even Oracle?
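DRH's deferred-index suggestion, sketched as a load pattern using his test schema (the INSERT values are placeholders; the real load is millions of rows):

    CREATE TABLE test(a, b);
    CREATE INDEX idx1 ON test(a);  -- localized: test.a is non-decreasing

    BEGIN;
    INSERT INTO test(a, b) VALUES (1, 941392);  -- placeholder rows
    INSERT INTO test(a, b) VALUES (2, 113376);
    INSERT INTO test(a, b) VALUES (3, 587204);
    COMMIT;

    -- Build the non-localized index once, after the bulk load, instead
    -- of touching scattered idx2 pages on every transaction.
    CREATE INDEX idx2 ON test(b);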
Re: [sqlite] Performance problem
Allan Edwards wrote: I have YET to see a database, small to massively scalable, that could handle BLOBs worth anything. ... I prefer the simplicity talk given earlier. If someone wants blobs, do it the old-fashioned way!

Your concerns are understood, for BLOBs that are truly large. But what about short pieces of binary data? (Do you call them BOBs?) For example, what about that 30-byte unicode string that just happens to contain embedded NULs. Or a 75-byte GIF image. I'd like to be able to put those kinds of things directly into a column of an SQLite table. Even an indexed column. Unfortunately, a file format change is going to be required to achieve that goal, I think.

-- D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
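Historically, this is exactly what the 3.0.0 format change delivered. In SQLite 3 syntax (which did not yet exist when this was written), small binary values go straight into ordinary, even indexed, columns:

    CREATE TABLE icons(name TEXT PRIMARY KEY, img BLOB);
    -- X'...' is the SQLite 3 hex literal for binary data; the bytes
    -- here are an arbitrary placeholder, not a real GIF.
    INSERT INTO icons(name, img) VALUES ('dot', X'4749463839610100');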
Re[2]: [sqlite] Performance problem
Your suggestion was to reconstruct the index from original table data on a rollback. For a large transaction that touches most pages of the index, this can give (at best) a 2:1 speedup. But on the other side, the potential slowdown is extreme.

Yeah, there is that drawback. Other DBMSs avoid this problem by some (decidedly NOT simple) mechanisms such as keeping before-images in main memory, keeping semantic records of changes in main memory, keeping page serial numbers (to aid recovery), etc.

A decision can be made dynamically to stop recording before-images (say, because some threshold percentage X of the index's pages are being changed, or because your main-memory buffer is full), and transition to my suggested approach. Presently, if P% of the pages are modified, 2P% of the pages are written for a commit and 3P% for a rollback. With the dynamic scheme, a commit writes the smaller of 2P and P + X percent, and a rollback writes P + X + 100 percent once the threshold is crossed (3P percent otherwise). If the threshold were 25%, you'd never write more than 125% of the size of the index for a committed transaction, and never more than 225% for a rolled-back transaction. Note that the present scheme writes up to 200% for a committed and 300% for a rolled-back transaction.

Here is a table comparing the approaches for various percentages of index pages modified. The left column is the percentage of index pages modified; the remaining columns are the percentage of index pages written under each approach (present commit c and rollback r; dynamic c' and r' at thresholds X = 25% and X = 50%):

              commit                    rollback
    %      c   c'(25)  c'(50)      r    r'(25)  r'(50)
   10     20     20      20       30      30      30
   20     40     40      40       60      60      60
   30     60     55      60       90     155      90
   40     80     65      80      120     165     120
   50    100     75     100      150     175     200
   60    120     85     110      180     185     210
   70    140     95     120      210     195     220
   80    160    105     130      240     205     230
   90    180    115     140      270     215     240
  100    200    125     150      300     225     250

Note that the dynamic approach is always better or equivalent for committed transactions, and only worse for rolled-back transactions when the percentage of pages modified is between X and (X + 100)/2. Finding an optimal value for X is application dependent, unfortunately.

Another approach... One database I implemented used newly allocated pages for all updates to indexes, relying on the original versions to remain intact, so rollback was free (not counting the reclaiming of the free space). Commit was implemented by rewriting the pointer to the root of the index B-tree to point to the new pages. The advantage of this method is journaling for free. The drawbacks are that even small changes require writing the B-tree along an entire path from leaf to root so that a single disk write can commit the index changes, and that a free-space recovery mechanism is needed. Overall it was very reliable and effective.

e
Re: [sqlite] Performance problem
[EMAIL PROTECTED] wrote: DRH: Will the changes to indices allow us to define arbitrary collation functions? If so, will those indices be used when a query is done that could use the arbitrary collation function?

Likely so. But no promises yet.

-- D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
Fw: [sqlite] Performance problem
Sorry Richard, I meant to send this to the group.

Hello. Last week I raised an issue about case sensitivity in WHERE clauses. In doing a little research, I happened to talk to an Oracle DBA here where I work and asked him how Oracle handles case sensitivity. He explained it is handled in the same way, and he suggested the same fix: drop both sides to lower case and then compare. The next thing he mentioned is really what I want to raise: he said the latest release of Oracle allows you to build indexes which include functions. He also said that this new type of index is used when the user issues a select statement with a function in the where clause, or a LIKE (i.e. select Col1, Col2 from tableName where col1 like 'Gr%', or select * from tableName where lower(col1) = 'fred'). Without knowing all the ins and outs of it, this seems like a great feature to have. Any chance of having someone look into the possibility of implementing something like that?

regards Greg
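Function-based indexes of the kind Greg describes did eventually arrive: SQLite 3.9.0 (2015, long after this thread) added indexes on expressions. A sketch of his lower() example in that later syntax:

    CREATE INDEX tableName_col1_lower ON tableName(lower(col1));
    -- A query spelled with the same expression can then use the index:
    SELECT * FROM tableName WHERE lower(col1) = 'fred';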
Re: Re[2]: [sqlite] Performance problem
- Forwarded by Ben Carlyle/AU/IRSA/Rail on 04/11/2003 02:26 PM - Ben Carlyle 04/11/2003 02:01 PM To: Doug Currie [EMAIL PROTECTED] cc: Subject:Re: Re[2]: [sqlite] Performance problem Doug Currie [EMAIL PROTECTED] 03/11/2003 05:39 AM Please respond to Doug Currie To: [EMAIL PROTECTED] cc: Subject:Re[2]: [sqlite] Performance problem Can any readers suggest ways that I have not thought of for making large numbers of non-localized INSERTs go faster? Instead of journaling the old (undo) pages of idx2, simply record the fact that idx2 is modified (i.e., make a single record in the journal). Upon recovery, noticing that idx2 is modified will cause it to be recreated from the rolled-back data. I've been thinking about his for a while and I think the solution, as with many other potential sqlite performance issues is to solve it in the operating system, not sqlite it's self. If you don't have enough bandwidth to your file, raid it. If you want to be able to write a large number of transactions per second using journalling that lazily writes to the database enable data-journalling on your filesystem. This is a couter-intuitve approach. When you apply data journalling on a filesystem such as linux's ext3 the operating system writes all data to a journal before writing it to the appropriate locations on disk. You may think that because its being written twice (four times in the case of sqlite, because the journal will be written twice and the file will be written twice) it will take longer to work. When you use fsync() alot the opposite is actually true. You get the behaviour that other list members have been discussing with the combined undo/redo log. The data updates are written sequentially by the operating system to disk so seek-time doesn't come into it, and the fsync() simply ensures that your file has been written to the sequential log. The operating system it's self deals with the lazy writing to the actual disk, and it's transparent to sqlite because the operating system cache makes it appear that its already been written. This only works when you use all-data journalling, not just meta-data like directory entries, and regular writes to the same filesystem might be a little slower... but try it. You should get better performance with data-journalling and sqlite. As this is a problem that can be solved very nicely indeed in the operating system I think that sqlite should make no special provisions for it. I do suggest that the relevant FAQs should contain the instruction to use a data-journalled fs if a large number of transactions must be pushed through sqlite. Benjamin.