Re: [sqlite] Longest real SQL statement
Hi Richard, Our longest select so far this one: select * from log where ((aut not like '6%' and aut not like '-1%') or (avm not like '6%' and avm not like '-1%') or (lam not like '6%' and lam not like '-1%') or (pam not like '6%' and pam not like '-1%') or (pas not like '6%' and pas not like '-1%') or (clc not like '6%' and clc not like '-1%') or (fro not like '6%' and fro not like '-1%') or (spw_ifr not like '6%' and spw_ifr not like '-1%') or (spw_dem not like '6%' and spw_dem not like '-1%') or (sp_reg not like '6%' and sp_reg not like '-1%') or (uni_10 not like '6%' and uni_10 not like '-1%') or (aut_dem not like '6%' and aut_dem not like '-1%') or (uni_11 not like '6%' and uni_11 not like '-1%')) order by system, id This returns 29 columns. I claim no credit for such a rotten data model that requires a query like this -- it was converted from a spreadsheet. In another app we have an insert into that inserts 60 columns. The query is generated by a program so I can't just copy it for you, but I can provide it if will help. HTH, Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com On Wed, 2007-05-09 at 23:32 +, [EMAIL PROTECTED] wrote: I'm looking for an upper bound on how big legitimate SQL statements handed to SQLite get to be. I'm not interested in contrived examples. I want to see really big SQL statements that are actually used in real programs. Big can be defined in several ways: * Number of bytes of text in the SQL statement. * Number of tokens in the SQL statement * Number of result columns in a SELECT * Number of terms in an expression If you are using really big SQL statements, please tell me about them. I'd like to see the actual SQL text if possible. But if your use is proprietary, please at least tell me how big your query is in bytes or tokens or columns or expression terms. Thanks. -- D. Richard Hipp [EMAIL PROTECTED] - To unsubscribe, send email to [EMAIL PROTECTED] - -- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Difference in these indices?
John, Thanks for the useful info. Unfortunately it sounds as if this is more than I have time for right now. Stephen On Wed, 2007-03-28 at 16:29 -0600, John Stanton wrote: Perl would not do a good job. You need to use the Sqlite page structures and they are defined in C terms. If you want to make such a program I can give you a template in simple ANSI C. It builds to a different data structure from Sqlite but the algorithms are there. It uses a quicksort as I recall and draws heavily on Knuth's Sorting and Searching volume. If you build a B-Tree by insertions it is continually splitting nodes. A B-Tree grows by extending the root which makes it somewhat self balancing but keeps it busy. Enhanced B-Trees will merge nodes to minimize splitting and checkerboarding and enhance balance. The splitting is expensive and even a simple insertion can be fairly expensive because it may change the interior nodes. To build one bottom-up you calculate how much space you need based on the size and count of keys and how many levels you will have. Then you sort the keys and start filling leaf nodes. As you fill a node you insert an entry into its parent and as you fill a parent you insert into ints parent and so on. Eventually you will have a fully populated tree with the root less than full. You never read a node and only write one when it is full so I/O activity is limited. As a buffer you have a stack of nodes with a depth equal to the depth of the tree. You can add some optimization to the tree by making interior nodes contiguous etc. By using more modern OS capabilities (POSIX) you could build it faster by extending the Sqlite file by the size of the index, memory mapping that area and using it as the buffer. When you are finished you unmap the area and the index is complete. Using that method you perform no writes and get a 20-50% speed improvement compared to using the write API call. Stephen Toney wrote: I may work on such a program, if time permits. If successful I will share it. It would be in Perl using DBI::ODBC, so may not be amazingly fast. I am pretty good at C++ but have phased it out for most work, so I am still using the antique Sybase compiler, and I doubt the SQLite C++ library would work with that. Otherwise it would, of course, be a better choice for such a utility. Anyone ever tried that combination? John, could you clarify what you mean by building it bottom-up? I'm not sure how to build a b-tree any way but by insertions. Best regards, Stephen On Wed, 2007-03-28 at 11:46 -0600, John Stanton wrote: I proposed such a program earlier in this discussion. I would envisage a seperate program which strips out a list of keys from the database, sorts it then allocates space in the DB file for the resulting index and builds it bottom up. It would be an off-line process but fast and would make raising indices on large databases time efficient. Based on our experience of building a B-Tree with such a program compared to successive insertions a speed improvement in raising an index of at least an order of magnitude could be expected. By making it an independent program it can be lean, mean and fast and not touch the regular Sqlite library. Stephen Toney wrote: On Wed, 2007-03-28 at 08:23 -0600, Dennis Cote wrote: It might make sense to create a separate standalone utility program (like sqlite3_analyzer) that reuses some the sqlite source to do bulk inserts into a table in a database file as fast a possible with out having to worry about locking or journaling etc. That would solve my problem too (thread: CREATE INDEX performance on indexing a 5.8-million record table). I'd love something like that! - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] - -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Difference in these indices?
I may work on such a program, if time permits. If successful I will share it. It would be in Perl using DBI::ODBC, so may not be amazingly fast. I am pretty good at C++ but have phased it out for most work, so I am still using the antique Sybase compiler, and I doubt the SQLite C++ library would work with that. Otherwise it would, of course, be a better choice for such a utility. Anyone ever tried that combination? John, could you clarify what you mean by building it bottom-up? I'm not sure how to build a b-tree any way but by insertions. Best regards, Stephen On Wed, 2007-03-28 at 11:46 -0600, John Stanton wrote: I proposed such a program earlier in this discussion. I would envisage a seperate program which strips out a list of keys from the database, sorts it then allocates space in the DB file for the resulting index and builds it bottom up. It would be an off-line process but fast and would make raising indices on large databases time efficient. Based on our experience of building a B-Tree with such a program compared to successive insertions a speed improvement in raising an index of at least an order of magnitude could be expected. By making it an independent program it can be lean, mean and fast and not touch the regular Sqlite library. Stephen Toney wrote: On Wed, 2007-03-28 at 08:23 -0600, Dennis Cote wrote: It might make sense to create a separate standalone utility program (like sqlite3_analyzer) that reuses some the sqlite source to do bulk inserts into a table in a database file as fast a possible with out having to worry about locking or journaling etc. That would solve my problem too (thread: CREATE INDEX performance on indexing a 5.8-million record table). I'd love something like that! - To unsubscribe, send email to [EMAIL PROTECTED] - -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] CREATE INDEX performance
Hi everyone, I'm trying to speed up index creation: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); The value field is a string, max 15 bytes. The key field is a string of fixed-width 10 bytes. It took only 7 minutes to fill this table with 5.7 million records, but it's taking 18 minutes to build the index. This is on a dual-core Windows XP Pro machine with 4GB memory. Any ideas on how to improve this? It will have to be done as part of a software installation, and I can't see users waiting that long. By comparison, building separate indexes on the two fields in the multi- column index took only 2-3 minutes. Why would it be so much longer for a multi-column index? Thanks for any ideas! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
After. I create it after because the total populate-plus-index time is much slower if the index is created before (44 minutes compared to 25). Thanks, Joel! Any suggestions? Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? Many thanks, Stephen On Tue, 2007-03-27 at 11:15 -0400, Joel Cochran wrote: Did you create the index before or after populating the database? -- Joel Cochran On 3/27/07, Stephen Toney [EMAIL PROTECTED] wrote: Hi everyone, I'm trying to speed up index creation: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); The value field is a string, max 15 bytes. The key field is a string of fixed-width 10 bytes. It took only 7 minutes to fill this table with 5.7 million records, but it's taking 18 minutes to build the index. This is on a dual-core Windows XP Pro machine with 4GB memory. Any ideas on how to improve this? It will have to be done as part of a software installation, and I can't see users waiting that long. By comparison, building separate indexes on the two fields in the multi- column index took only 2-3 minutes. Why would it be so much longer for a multi-column index? Thanks for any ideas! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] - -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Thanks, Martin, Still, even if my indexing can't be speeded up, this seems like an important question, as I can't see why creating one index with two words would take several times as long as creating two indexes with one word each. Either my mental model or SQLite'd indexing is screwy. I'm perfectly willing to assume it's me, but I'd like to learn why. Or if it's SQLite, then my timing observations are of some value. Best, Stephen On Tue, 2007-03-27 at 18:20 +0100, Martin Jenkins wrote: Stephen Toney wrote: Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? I think it just boils down to how much time people have. Martin - To unsubscribe, send email to [EMAIL PROTECTED] - -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
On Tue, 2007-03-27 at 11:53 -0600, Dennis Cote wrote: Stephen Toney wrote: Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? Stephen, There is nothing wrong with your question. In fact it was very clearly stated. I just don't have any answers for you. I suspect that others are in the same boat. It is normal for an index creation operation to take some time since it is inserting index records into a btree in random order. It involves many updates to pages throughout the index. It does seem strange that you are seeing such different times for the two cases (single column vs compound index). Are you sure about the times you posted? Were they indexing the exact same table? Were both the indexes created after the fill operation during your timing tests (ie. fill + index(single) and then fill + index(compound)) and not one after the other (i.e. fill + index(single) + index(compound))? Thanks, Dennis, I feel reasonably confident about my numbers, but since the system is in development, other factors may have changed. I would re-test before asking anyone else to try to replicate this. The timings were done by recreating the db content each time with no indexes, then building either the multi-column index or the two single-column indexes. Can you supply sample data if someone wants to try some test of their own? It wouldn't have to be the full data set you are using. We could use a subset to get relative timings in the seconds range rather than minutes (This assumes that you are not running into some cache size problems that slow down the larger data set disproportionately). Yes, I'd be glad to supply sample data, but do not have an FTP site. I'm not sure how big the sample should be. Maybe I should test some samples myself before asking anyone else to. As you say, the problem may not exist at a smaller size. Another reason for my puzzlement -- although I love SQLite, my expectations are based on using Foxpro for many years. Foxpro's indexing speed for a problem like this is about 10 - 20 times faster. And I've never come across a Foxpro database where the indexing took longer than the loading -- and Foxpro is blazingly fast at loading. So I assumed that every DBMS would be faster at indexing than loading. (Both use B-trees for indexes, so I believe it's a meaningful comparison. But maybe in this case the single-file architecture of SQLite works against it; Foxpro uses a binary format for its B-trees.) In other words, since SQLite is so fast at some things, I expect it to be fast at all things. Is this unreasonable? Is it optimized for fast retrieval and not indexing? Thanks! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
On Tue, 2007-03-27 at 13:12 -0600, John Stanton wrote: I suspect that the timing difference is due to page overflows. I did only a cursory browse of the B-Tree code but it is just a guess. A test would be to make a simple table with two adjacent integer columns and time raising an index on one column and on both. If the times are comparable the speed difference reported in this thread is a page overflow artifact. Thanks, John, Good idea -- I'll give that a try. I wouldn't have expected that with such a small record, but it's worth testing. (The schema again, in which value is a string of max 15 chars: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); Thanks! Stephen Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
On Tue, 2007-03-27 at 13:24 -0600, John Stanton wrote: Another reason for my puzzlement -- although I love SQLite, my expectations are based on using Foxpro for many years. Foxpro's indexing speed for a problem like this is about 10 - 20 times faster. And I've never come across a Foxpro database where the indexing took longer than the loading -- and Foxpro is blazingly fast at loading. So I assumed that every DBMS would be faster at indexing than loading. (Both use B-trees for indexes, so I believe it's a meaningful comparison. But maybe in this case the single-file architecture of SQLite works against it; Foxpro uses a binary format for its B-trees.) In other words, since SQLite is so fast at some things, I expect it to be fast at all things. Is this unreasonable? Is it optimized for fast retrieval and not indexing? Thanks! You are comparing an ACID RDBMS with rollback and commit with a much simpler situation. For example we developed a data storage software product which was of the same generation as Foxpro. I wrote a fast indexing program which would create a 10 million entry B-Tree index in less than a minute on a very slow machine, but it had minimal features, unlike Sqlite. I do all the indexing in a single transaction, which I thought would have turned off any rollback potential, and also saved time. I assume that if Foxpro had all the features you want now you would not be changing from it. Of course. I'm just trying to learn what I don't know :-) Have you thought of doing the index creation as a background process unseen by the user? Actually, it already is a separate process launched on the server by the web app. But since users can't use the system until the index is built, I show them the progress of the indexing so at least they have something to look at! Thanks, Stephen Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] CREATE INDEX performance
Thanks again for all the good suggestions last week. I am now using a multi-column index and results of a table self-join are instantaneous! Even a 5-way join takes only 1-2 seconds. I'm very pleased. But it takes 30 minutes to build the index on a dual-core Windows machine with 4GB memory. Any ideas on how to improve this? It will have to be done as part of a software installation, and I can't see users waiting that long. Here are the details: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); The value field is a string, max 15 bytes. The key field is a string of fixed-width 10 bytes. There are about 3.5 million records to be indexed. By comparison, indexing on either of these fields separately took only 2-3 minutes. Why would it be so much longer for a multi-column index? Thanks! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Performance problem
Dear experts: I'm having a performance problem I can't understand. I am running a select count(*) query joining a table on itself, and the query runs for five minutes using Sqlite3.exe before I get bored and kill it. This is on a dual-core box with 4GB of memory, running Windows XP Pro. The Sqlite version is 3.3.7. Here's the problem query with the plan: select count(*) from keyword a, keyword b where a.key=b.key and a.value='music' and b.value='history'; 0|0|TABLE keyword AS a WITH INDEX value 1|1|TABLE keyword AS b WITH INDEX value Here's the schema CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX key on keyword(key); CREATE INDEX nextword on keyword(nextword); CREATE INDEX value on keyword(value); The table has 3,486,410 records and the SQLite database totals 320MB. There are a few small tables in the db besides the KEYWORD table. 4,318 records have value='music' and 27,058 have value='history'. The keys are 12-byte strings. That doesn't seem like an extreme case to me. Using DBI::ODBC::SQLite in a web application the result is just as bad -- the server times out. Any suggestions would be much appreciated! Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] Performance problem
Thanks, Igor, Richard, and Tom, Why doesn't SQLite use the index on key? I can see from the plan that it doesn't, but why not? Can only one index be used per query? This seems strange. I have used SQL Server and Visual Foxpro for this same problem, and they both handle this query in a second if the indexes are there. Is there a good place to read more about this SQLite behavior? I'm fairly familiar with the online documentation and don't recall reading this. Thanks a million! Stephen On Thu, 2007-03-01 at 07:54 -0500, Tom Briggs wrote: You will likely be well served by a compound index on (value,key). As the schema stands now, the indexes will help find records with matching values, but not with matching keys; providing one index that correlates the two should help. Disclaimer: I haven't recreated your schema, added said index and checked that the query plan produced is better. Don't assume this to be good advice without trying it. :) -Tom -Original Message- From: Stephen Toney [mailto:[EMAIL PROTECTED] Sent: Thursday, March 01, 2007 7:00 AM To: sqlite-users@sqlite.org Subject: [sqlite] Performance problem Dear experts: I'm having a performance problem I can't understand. I am running a select count(*) query joining a table on itself, and the query runs for five minutes using Sqlite3.exe before I get bored and kill it. This is on a dual-core box with 4GB of memory, running Windows XP Pro. The Sqlite version is 3.3.7. Here's the problem query with the plan: select count(*) from keyword a, keyword b where a.key=b.key and a.value='music' and b.value='history'; 0|0|TABLE keyword AS a WITH INDEX value 1|1|TABLE keyword AS b WITH INDEX value Here's the schema CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX key on keyword(key); CREATE INDEX nextword on keyword(nextword); CREATE INDEX value on keyword(value); The table has 3,486,410 records and the SQLite database totals 320MB. There are a few small tables in the db besides the KEYWORD table. 4,318 records have value='music' and 27,058 have value='history'. The keys are 12-byte strings. That doesn't seem like an extreme case to me. Using DBI::ODBC::SQLite in a web application the result is just as bad -- the server times out. Any suggestions would be much appreciated! Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com -- --- To unsubscribe, send email to [EMAIL PROTECTED] -- --- -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Performance problem
On Thu, 2007-03-01 at 12:46 +, [EMAIL PROTECTED] wrote: Or maybe better yet: Have you looked into using FTS2 for whatever it is you are trying to do? Full-text search is hard to get right and you appear to be trying to create your own. Why not use a FTS subsystem that is already written and testing and available to you? -- D. Richard Hipp [EMAIL PROTECTED] Several reasons: 1. App is 10 years old and working well with other DBMSs, so why mess with it? This problem only occurred since using SQLite as the DBMS. 2. Queries must work with other DBMSs with minimal tinkering (SQL Server, Oracle, Foxpro, etc.) -- using ODBC. 3. Our indexing is tuned to museums, libraries, and other cultural organizations. For example, certain characters are converted before indexing (such as OE diphthong to the two letters OE). We also index words with hyphens and apostrophes both with and without the punctuation so the searcher can enter them various ways. 4. We do not preserve case in the index, so it can ignore incorrect capitalization in the search terms. Maybe FTS does this too? 5. For historical reasons, we use NCRs like #233; instead of UTF-8. Our programs remove these before indexing. I am considering FTS for another project though. I appreciate the suggestion! Stephen -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Performance problem
Richard, Thanks for the additional info. I'll look into the multi-column index idea. Sounds as if it might be the solution. Stephen On Thu, 2007-03-01 at 14:42 +, [EMAIL PROTECTED] wrote: Stephen Toney [EMAIL PROTECTED] wrote: Thanks, Igor, Richard, and Tom, Why doesn't SQLite use the index on key? I can see from the plan that it doesn't, but why not? Can only one index be used per query? This seems strange. I have used SQL Server and Visual Foxpro for this same problem, and they both handle this query in a second if the indexes are there. SQLite is limited to a single index per table of the FROM clause. (In your case the same table occurs twice in the FROM clause, so each instance can use a separate indices, but each instance can only use a single index.) Other systems relax this restriction through the use of bitmap indices. SQLite does not (directly) support bitmap indices. You can achieve about the same thing as a bitmap index by playing games with rowids, but the SQL needed to do so is convoluted. In your case, I think the query would need to be: SELECT count(*) FROM keyword AS a CROSS JOIN keyword AS b WHERE a.value='music' AND b.rowid IN ( SELECT rowid FROM keyword WHERE value='history' INTERSECT SELECT rowid FROM keyword WHERE key=a.key ); It seems so much simpler to use a multi-column index. It is almost certainly going to be faster. Is there a good place to read more about this SQLite behavior? I'm fairly familiar with the online documentation and don't recall reading this. You might get a few hints at http://www.sqlite.org/php2004/page-001.html and the pages that follow. That is from a talk I gave in 2004. It is somewhat out of date. My goal for this calendar year is to get some detailed documentation online about the kinds of issues you are seeing. -- D. Richard Hipp [EMAIL PROTECTED] - To unsubscribe, send email to [EMAIL PROTECTED] - -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -