[sqlite] CREATE INDEX performance
Hi. I have a database with a table having a text column that averages 7.1 characters per row; there are 11 million rows in this table. In one version of the source file that is used to populate this table, the file is laid out such that the column data is populated in sorted order; in another version it is not sorted. In the sorted version, creating the index takes about a minute; in the unsorted version it takes over 8 hours (I killed it before it finished). I've tried this with a couple different versions of SQLite (3.6.1 and 3.6.7) and with various PRAGMA options (cache_size up to 1M, synchronous OFF, etc). I've tried this on linux 2.6.24 and Darwin 9.6 (MacOS X 10.5.6), and Windows XP with the same result. Interestingly, on Vista, it's much faster, perhaps 10 minutes to create the index. I suspect this is because of it's superior disk caching. By increasing the cache_size to 1M, I can get it to index in about 2.5 minutes, but the process consumes about 600 MB of RAM, and decreasing the cache_size after doesn't cause sqlite to release all of the memory it has used. This TODO item is of course interesting: * Develop a new sort implementation that does much less disk seeking. Use to improve indexing performance on large tables. Is there any thought as to exactly how/when this might happen? Any other ideas? Thanks a bunch! -c Accessions is the table of interest: CREATE TABLE AccessionToGoID (accession INT, goid INT); CREATE TABLE Accessions (id INT IDENTITY, accession VARCHAR(30) NOT NULL, taxonomy INT); CREATE TABLE GoID (id INT IDENTITY, qualifier VARCHAR(20), goid INT, evidence TINYINT, aspect CHAR(1), source INT); CREATE TABLE Location (url TEXT); CREATE TABLE Sources (id INT, name VARCHAR(20)); CREATE INDEX ACESSIONS_STRING ON ACCESSIONS("accession"); CREATE INDEX ACESSIONS_TO_GOID ON ACCESSIONTOGOID("accession"); CREATE INDEX GOID_ID ON GOID("id"); Page size in bytes 1024 Pages in the whole file (measured) 2201999 Pages in the whole file (calculated).. 2201998 Pages that store data. 2201998100.000% Pages on the freelist (per header) 00.0% Pages on the freelist (calculated) 10.0% Pages of auto-vacuum overhead. 00.0% Number of tables in the database.. 6 Number of indices. 0 Number of named indices... 0 Automatically generated indices... 0 Size of the file in bytes. 2254846976 Bytes of user payload stored.. 1406889293 62.4% *** Page counts for all tables with their indices ACCESSIONTOGOID... 1159789 52.7% GOID.. 775451 35.2% ACCESSIONS 266755 12.1% LOCATION.. 10.0% SOURCES... 10.0% SQLITE_MASTER. 10.0% *** All tables *** Percentage of total database.. 100.000% Number of entries. 113096707 Bytes of storage consumed. -2040121344 Bytes of payload.. 1406889777 -69.0% Average payload per entry. 12.44 Average unused bytes per entry 0.21 Average fanout 90.00 Fragmentation. 7.8% Maximum payload per entry. 135 Entries that use overflow. 00.0% Index pages used.. 24343 Primary pages used 2177655 Overflow pages used... 0 Total pages used.. 2201998 Unused bytes on index pages... 2982024 12.0% Unused bytes on primary pages. 20271723-1.0% Unused bytes on overflow pages 0 Unused bytes on all pages. 23253747-1.1% *** Table ACCESSIONTOGOID Percentage of total database.. 52.7% Number of entries. 68037552 Bytes of storage consumed. 1187623936 Bytes of payload.. 678849900 57.2% Average payload per entry. 9.98 Average unused bytes per entry 0.20 Average fanout 90.00 Fragmentation. 6.2% Maximum payload per entry. 11 Entries that use overflow. 00.0% Index pages used.. 12856 Primary pages used 1146933 Overflow pages used... 0 Total pages used.. 1159789 Unused bytes on index pages... 1572145 11.9% Unused bytes on primary pages. 12284825 1.0% Unused bytes on overflow pages 0 Unused bytes on all pages. 13856970 1.2% *** Table ACCESSIONS * Percentage of total database.. 12.1% Nu
Re: [sqlite] CREATE INDEX performance
You don't. All I can imagine is that you insert rows in sorted sequence in an initial load so that a simple row scan delivers the rows in key order. Mohd Radzi Ibrahim wrote: - Original Message - From: "John Stanton" <[EMAIL PROTECTED]> To: Sent: Wednesday, March 28, 2007 7:42 AM Subject: Re: [sqlite] CREATE INDEX performance I retract the overflow page theory on your compelling evidence and now understand better what it is doing after looking at the VDBE. By building an index by successive insertions the tree is splitting and balancing as it grows, and that is expensive. Double the size of the key and you get twice as many leaf nodes and quite a few more interior nodes. If the keys order is very random the keys are being inserted all over the tree which is slow. Presenting the keys in sorted sequence should cut back on the fragmentation and will very likely build a more compact tree by ensuring that each leaf node is filled. An optimization for building such a tree would be to extract the keys, sort them and build the tree bottom up. By avoiding all splitting and jumping around the tree it should be an order of magnitude faster or better. I took a quick look at the code and got the impression that a fast index option could be built by a motivated user as a seperate program and might be a handy tool for people managing very large Sqlite databases. Cutting back a 20 hour run to 1-2 hours can be a big win. Sqlite is something of a victim of its success. The embedded lite database is being asked to perform enterprise level tasks which stretch its envelope. How do we do sorting prior to indexing? If it is the initial table it's OK, we can sort it before insert. But for existing table, how do we do that? regards, Radzi. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
- Original Message - From: "John Stanton" <[EMAIL PROTECTED]> To: Sent: Wednesday, March 28, 2007 7:42 AM Subject: Re: [sqlite] CREATE INDEX performance I retract the overflow page theory on your compelling evidence and now understand better what it is doing after looking at the VDBE. By building an index by successive insertions the tree is splitting and balancing as it grows, and that is expensive. Double the size of the key and you get twice as many leaf nodes and quite a few more interior nodes. If the keys order is very random the keys are being inserted all over the tree which is slow. Presenting the keys in sorted sequence should cut back on the fragmentation and will very likely build a more compact tree by ensuring that each leaf node is filled. An optimization for building such a tree would be to extract the keys, sort them and build the tree bottom up. By avoiding all splitting and jumping around the tree it should be an order of magnitude faster or better. I took a quick look at the code and got the impression that a fast index option could be built by a motivated user as a seperate program and might be a handy tool for people managing very large Sqlite databases. Cutting back a 20 hour run to 1-2 hours can be a big win. Sqlite is something of a victim of its success. The embedded lite database is being asked to perform enterprise level tasks which stretch its envelope. How do we do sorting prior to indexing? If it is the initial table it's OK, we can sort it before insert. But for existing table, how do we do that? regards, Radzi. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Dennis Cote wrote: John Stanton wrote: I suspect that the timing difference is due to page overflows. John, I doubt that that is the case. The two fields being indexed are the first field, and a second one that is only separated from the first by the size of the first string (10 bytes) and three integers (max 27 bytes, typically less). The length of the second field is only 15 bytes so all the information that sqlite needs to read during the indexing (52 bytes) should be contained in the initial part of the record even if the other fields do spill onto overflow pages. SQLite does not need to read those fields and won't follow the overflow chain (if one exists, which I doubt). You can use the sqlite3_analyzer tool at http://www.sqlite.org/download.html to see if your table is using any overflow pages. SQLite version 3.3.13 Enter ".help" for instructions sqlite> CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, ...> value, nextword, sec, ipr, fldseq int); sqlite> .explain on sqlite> explain CREATE INDEX valuekey on keyword (value, key); addr opcode p1 p2 p3 -- -- -- - 0 Goto0 39 1 Noop0 0 2 CreateIndex 0 0 3 MemStore0 0 4 Dup 0 0 5 MemStore1 1 6 Integer 0 0 7 OpenWrite 0 1 8 SetNumColumns 0 5 9 NewRowid0 0 10String8 0 0 index 11String8 0 0 valuekey 12String8 0 0 keyword 13MemLoad 1 0 14String8 0 0 CREATE INDEX valuekey on keyword 15MakeRecord 5 0 aaada 16Insert 0 0 17Close 0 0 18Pop 1 0 19MemLoad 0 0 20Integer 0 0 21OpenWrite 2 0 keyinfo(2,BINARY,BINARY) 22Integer 0 0 23OpenRead1 2 24SetNumColumns 1 9 25Rewind 1 32 26Rowid 1 0 27Column 1 4 28Column 1 0 29MakeIdxRec 2 0 bb 30IdxInsert 2 0 31Next1 26 32Close 1 0 33Close 2 0 34Integer 2 0 35SetCookie 0 0 36ParseSchema 0 0 name='valuekey' 37Expire 0 0 38Halt0 0 39Transaction 0 1 40VerifyCookie0 1 41Goto0 1 42Noop0 0 sqlite> The main index operation occurs on lines 26-31. For each record in the table it pushes the rowid, column 4 (value), and column 0 (key) onto the stack, builds an index record, and finally inserts the record into the index. The only difference in the single field case is that only one column is pushed in this loop. That is what seems peculiar to me. The only thing I can think of is that the index records are about double the size in the compound index case, and therefore fewer records fit on a page, and hence more pages must be allocated and linked to build the compound index. I am surprised that this makes it take 5 times as long. Dennis Cote I retract the overflow page theory on your compelling evidence and now understand better what it is doing after looking at the VDBE. By building an index by successive insertions the tree is splitting and balancing as it grows, and that is expensive. Double the size of the key and you get twice as many leaf nodes and quite a few more interior nodes. If the keys order is very random the keys are being inserted all over the tree which is slow. Presenting the keys in sorted sequence should cut back on the fragmentation and will very likely build a more compact tree by ensuring that each leaf node is filled. An optimization for building such a tree would be to extract the keys, sort them and build the tree bottom up. By avoiding all splitting and jumping around the tree it should be an order of magnitude faster or better. I took a quick look at the code and got the impression that a fast index option could be built by a motivated user as a seperate program and might be a handy tool for people managing very large Sqlite databases. Cutting back a 20 hour run to 1-2 hours can be a big win. Sqlite is something of a victim of its success. The embedded lite database is being asked to perform enterprise level tasks which stretch its envelope. -
Re: [sqlite] CREATE INDEX performance
John Stanton wrote: I suspect that the timing difference is due to page overflows. John, I doubt that that is the case. The two fields being indexed are the first field, and a second one that is only separated from the first by the size of the first string (10 bytes) and three integers (max 27 bytes, typically less). The length of the second field is only 15 bytes so all the information that sqlite needs to read during the indexing (52 bytes) should be contained in the initial part of the record even if the other fields do spill onto overflow pages. SQLite does not need to read those fields and won't follow the overflow chain (if one exists, which I doubt). You can use the sqlite3_analyzer tool at http://www.sqlite.org/download.html to see if your table is using any overflow pages. SQLite version 3.3.13 Enter ".help" for instructions sqlite> CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, ...> value, nextword, sec, ipr, fldseq int); sqlite> .explain on sqlite> explain CREATE INDEX valuekey on keyword (value, key); addr opcode p1 p2 p3 -- -- -- - 0 Goto0 39 1 Noop0 0 2 CreateIndex 0 0 3 MemStore0 0 4 Dup 0 0 5 MemStore1 1 6 Integer 0 0 7 OpenWrite 0 1 8 SetNumColumns 0 5 9 NewRowid0 0 10String8 0 0 index 11String8 0 0 valuekey 12String8 0 0 keyword 13MemLoad 1 0 14String8 0 0 CREATE INDEX valuekey on keyword 15MakeRecord 5 0 aaada 16Insert 0 0 17Close 0 0 18Pop 1 0 19MemLoad 0 0 20Integer 0 0 21OpenWrite 2 0 keyinfo(2,BINARY,BINARY) 22Integer 0 0 23OpenRead1 2 24SetNumColumns 1 9 25Rewind 1 32 26Rowid 1 0 27Column 1 4 28Column 1 0 29MakeIdxRec 2 0 bb 30IdxInsert 2 0 31Next1 26 32Close 1 0 33Close 2 0 34Integer 2 0 35SetCookie 0 0 36ParseSchema 0 0 name='valuekey' 37Expire 0 0 38Halt0 0 39Transaction 0 1 40VerifyCookie0 1 41Goto0 1 42Noop0 0 sqlite> The main index operation occurs on lines 26-31. For each record in the table it pushes the rowid, column 4 (value), and column 0 (key) onto the stack, builds an index record, and finally inserts the record into the index. The only difference in the single field case is that only one column is pushed in this loop. That is what seems peculiar to me. The only thing I can think of is that the index records are about double the size in the compound index case, and therefore fewer records fit on a page, and hence more pages must be allocated and linked to build the compound index. I am surprised that this makes it take 5 times as long. Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
On Tue, 2007-03-27 at 13:24 -0600, John Stanton wrote: > > Another reason for my puzzlement -- although I love SQLite, my > > expectations are based on using Foxpro for many years. Foxpro's indexing > > speed for a problem like this is about 10 - 20 times faster. And I've > > never come across a Foxpro database where the indexing took longer than > > the loading -- and Foxpro is blazingly fast at loading. So I assumed > > that every DBMS would be faster at indexing than loading. (Both use > > B-trees for indexes, so I believe it's a meaningful comparison. But > > maybe in this case the single-file architecture of SQLite works against > > it; Foxpro uses a binary format for its B-trees.) > > > > In other words, since SQLite is so fast at some things, I expect it to > > be fast at all things. Is this unreasonable? Is it optimized for fast > > retrieval and not indexing? > > > > > > Thanks! > You are comparing an ACID RDBMS with rollback and commit with a much > simpler situation. For example we developed a data storage software > product which was of the same generation as Foxpro. I wrote a fast > indexing program which would create a 10 million entry B-Tree index in > less than a minute on a very slow machine, but it had minimal features, > unlike Sqlite. I do all the indexing in a single transaction, which I thought would have turned off any rollback potential, and also saved time. > I assume that if Foxpro had all the features you want now you would not > be changing from it. Of course. I'm just trying to learn what I don't know :-) > Have you thought of doing the index creation as a background process > unseen by the user? Actually, it already is a separate process launched on the server by the web app. But since users can't use the system until the index is built, I show them the progress of the indexing so at least they have something to look at! Thanks, Stephen Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
On Tue, 2007-03-27 at 13:12 -0600, John Stanton wrote: > I suspect that the timing difference is due to page overflows. I did > only a cursory browse of the B-Tree code but it is just a guess. A test > would be to make a simple table with two adjacent integer columns and > time raising an index on one column and on both. If the times are > comparable the speed difference reported in this thread is a page > overflow artifact. Thanks, John, Good idea -- I'll give that a try. I wouldn't have expected that with such a small record, but it's worth testing. (The schema again, in which "value" is a string of max 15 chars: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); Thanks! Stephen Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Stephen Toney wrote: On Tue, 2007-03-27 at 11:53 -0600, Dennis Cote wrote: Stephen Toney wrote: Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? Stephen, There is nothing wrong with your question. In fact it was very clearly stated. I just don't have any answers for you. I suspect that others are in the same boat. It is normal for an index creation operation to take some time since it is inserting index records into a btree in random order. It involves many updates to pages throughout the index. It does seem strange that you are seeing such different times for the two cases (single column vs compound index). Are you sure about the times you posted? Were they indexing the exact same table? Were both the indexes created after the fill operation during your timing tests (ie. fill + index(single) and then fill + index(compound)) and not one after the other (i.e. fill + index(single) + index(compound))? Thanks, Dennis, I feel reasonably confident about my numbers, but since the system is in development, other factors may have changed. I would re-test before asking anyone else to try to replicate this. The timings were done by recreating the db content each time with no indexes, then building either the multi-column index or the two single-column indexes. Can you supply sample data if someone wants to try some test of their own? It wouldn't have to be the full data set you are using. We could use a subset to get relative timings in the seconds range rather than minutes (This assumes that you are not running into some cache size problems that slow down the larger data set disproportionately). Yes, I'd be glad to supply sample data, but do not have an FTP site. I'm not sure how big the sample should be. Maybe I should test some samples myself before asking anyone else to. As you say, the problem may not exist at a smaller size. Another reason for my puzzlement -- although I love SQLite, my expectations are based on using Foxpro for many years. Foxpro's indexing speed for a problem like this is about 10 - 20 times faster. And I've never come across a Foxpro database where the indexing took longer than the loading -- and Foxpro is blazingly fast at loading. So I assumed that every DBMS would be faster at indexing than loading. (Both use B-trees for indexes, so I believe it's a meaningful comparison. But maybe in this case the single-file architecture of SQLite works against it; Foxpro uses a binary format for its B-trees.) In other words, since SQLite is so fast at some things, I expect it to be fast at all things. Is this unreasonable? Is it optimized for fast retrieval and not indexing? Thanks! You are comparing an ACID RDBMS with rollback and commit with a much simpler situation. For example we developed a data storage software product which was of the same generation as Foxpro. I wrote a fast indexing program which would create a 10 million entry B-Tree index in less than a minute on a very slow machine, but it had minimal features, unlike Sqlite. I assume that if Foxpro had all the features you want now you would not be changing from it. Have you thought of doing the index creation as a background process unseen by the user? - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Dennis Cote wrote: Stephen Toney wrote: Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? Stephen, There is nothing wrong with your question. In fact it was very clearly stated. I just don't have any answers for you. I suspect that others are in the same boat. It is normal for an index creation operation to take some time since it is inserting index records into a btree in random order. It involves many updates to pages throughout the index. It does seem strange that you are seeing such different times for the two cases (single column vs compound index). Are you sure about the times you posted? Were they indexing the exact same table? Were both the indexes created after the fill operation during your timing tests (ie. fill + index(single) and then fill + index(compound)) and not one after the other (i.e. fill + index(single) + index(compound))? Can you supply sample data if someone wants to try some test of their own? It wouldn't have to be the full data set you are using. We could use a subset to get relative timings in the seconds range rather than minutes (This assumes that you are not running into some cache size problems that slow down the larger data set disproportionately). Dennis Cote I suspect that the timing difference is due to page overflows. I did only a cursory browse of the B-Tree code but it is just a guess. A test would be to make a simple table with two adjacent integer columns and time raising an index on one column and on both. If the times are comparable the speed difference reported in this thread is a page overflow artifact. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
On Tue, 2007-03-27 at 11:53 -0600, Dennis Cote wrote: > Stephen Toney wrote: > > > > Meta-question: this is the second time I've asked this question. The > > first was about a month ago and got not a single reply. Is there > > something wrong with my postings? Or is this just not an interesting > > topic? > > > > > Stephen, > > There is nothing wrong with your question. In fact it was very clearly > stated. > > I just don't have any answers for you. I suspect that others are in the > same boat. > > It is normal for an index creation operation to take some time since it > is inserting index records into a btree in random order. It involves > many updates to pages throughout the index. > > It does seem strange that you are seeing such different times for the > two cases (single column vs compound index). Are you sure about the > times you posted? Were they indexing the exact same table? Were both the > indexes created after the fill operation during your timing tests (ie. > fill + index(single) and then fill + index(compound)) and not one after > the other (i.e. fill + index(single) + index(compound))? Thanks, Dennis, I feel reasonably confident about my numbers, but since the system is in development, other factors may have changed. I would re-test before asking anyone else to try to replicate this. The timings were done by recreating the db content each time with no indexes, then building either the multi-column index or the two single-column indexes. > Can you supply sample data if someone wants to try some test of their > own? It wouldn't have to be the full data set you are using. We could > use a subset to get relative timings in the seconds range rather than > minutes (This assumes that you are not running into some cache size > problems that slow down the larger data set disproportionately). Yes, I'd be glad to supply sample data, but do not have an FTP site. I'm not sure how big the sample should be. Maybe I should test some samples myself before asking anyone else to. As you say, the problem may not exist at a smaller size. Another reason for my puzzlement -- although I love SQLite, my expectations are based on using Foxpro for many years. Foxpro's indexing speed for a problem like this is about 10 - 20 times faster. And I've never come across a Foxpro database where the indexing took longer than the loading -- and Foxpro is blazingly fast at loading. So I assumed that every DBMS would be faster at indexing than loading. (Both use B-trees for indexes, so I believe it's a meaningful comparison. But maybe in this case the single-file architecture of SQLite works against it; Foxpro uses a binary format for its B-trees.) In other words, since SQLite is so fast at some things, I expect it to be fast at all things. Is this unreasonable? Is it optimized for fast retrieval and not indexing? Thanks! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Stephen Toney wrote: Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? Stephen, There is nothing wrong with your question. In fact it was very clearly stated. I just don't have any answers for you. I suspect that others are in the same boat. It is normal for an index creation operation to take some time since it is inserting index records into a btree in random order. It involves many updates to pages throughout the index. It does seem strange that you are seeing such different times for the two cases (single column vs compound index). Are you sure about the times you posted? Were they indexing the exact same table? Were both the indexes created after the fill operation during your timing tests (ie. fill + index(single) and then fill + index(compound)) and not one after the other (i.e. fill + index(single) + index(compound))? Can you supply sample data if someone wants to try some test of their own? It wouldn't have to be the full data set you are using. We could use a subset to get relative timings in the seconds range rather than minutes (This assumes that you are not running into some cache size problems that slow down the larger data set disproportionately). Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Thanks, Martin, Still, even if my indexing can't be speeded up, this seems like an important question, as I can't see why creating one index with two words would take several times as long as creating two indexes with one word each. Either my mental model or SQLite'd indexing is screwy. I'm perfectly willing to assume it's me, but I'd like to learn why. Or if it's SQLite, then my timing observations are of some value. Best, Stephen On Tue, 2007-03-27 at 18:20 +0100, Martin Jenkins wrote: > Stephen Toney wrote: > > > Meta-question: this is the second time I've asked this question. The > > first was about a month ago and got not a single reply. Is there > > something wrong with my postings? Or is this just not an interesting > > topic? > > I think it just boils down to how much time people have. > > Martin > > - > To unsubscribe, send email to [EMAIL PROTECTED] > - -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Stephen Toney wrote: Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? I think it just boils down to how much time people have. Martin - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
After. I create it after because the total populate-plus-index time is much slower if the index is created before (44 minutes compared to 25). Thanks, Joel! Any suggestions? Meta-question: this is the second time I've asked this question. The first was about a month ago and got not a single reply. Is there something wrong with my postings? Or is this just not an interesting topic? Many thanks, Stephen On Tue, 2007-03-27 at 11:15 -0400, Joel Cochran wrote: > Did you create the index before or after populating the database? > > -- > Joel Cochran > > > > On 3/27/07, Stephen Toney <[EMAIL PROTECTED]> wrote: > Hi everyone, > > I'm trying to speed up index creation: > > CREATE TABLE keyword (key, contyp int, imagecount int, > searchcat int, > value, nextword, sec, ipr, fldseq int); > CREATE INDEX valuekey on keyword (value, key); > > The value field is a string, max 15 bytes. The key field is a > string of > fixed-width 10 bytes. > > It took only 7 minutes to fill this table with 5.7 million > records, but > it's taking 18 minutes to build the index. This is on a > dual-core Windows > XP Pro machine with 4GB memory. Any ideas on how to improve > this? It will > have to be done as part of a software installation, and I > can't see users > waiting that long. > > By comparison, building separate indexes on the two fields in > the multi- > column index took only 2-3 minutes. Why would it be so much > longer for a > multi-column index? > > > Thanks for any ideas! > -- > > Stephen Toney > Systems Planning > [EMAIL PROTECTED] > http://www.systemsplanning.com > > > > - > To unsubscribe, send email to > [EMAIL PROTECTED] > > - > > > > - > To unsubscribe, send email to > [EMAIL PROTECTED] > > - > > > > -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] CREATE INDEX performance
Did you create the index before or after populating the database? -- Joel Cochran On 3/27/07, Stephen Toney <[EMAIL PROTECTED]> wrote: Hi everyone, I'm trying to speed up index creation: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); The value field is a string, max 15 bytes. The key field is a string of fixed-width 10 bytes. It took only 7 minutes to fill this table with 5.7 million records, but it's taking 18 minutes to build the index. This is on a dual-core Windows XP Pro machine with 4GB memory. Any ideas on how to improve this? It will have to be done as part of a software installation, and I can't see users waiting that long. By comparison, building separate indexes on the two fields in the multi- column index took only 2-3 minutes. Why would it be so much longer for a multi-column index? Thanks for any ideas! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] CREATE INDEX performance
Hi everyone, I'm trying to speed up index creation: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); The value field is a string, max 15 bytes. The key field is a string of fixed-width 10 bytes. It took only 7 minutes to fill this table with 5.7 million records, but it's taking 18 minutes to build the index. This is on a dual-core Windows XP Pro machine with 4GB memory. Any ideas on how to improve this? It will have to be done as part of a software installation, and I can't see users waiting that long. By comparison, building separate indexes on the two fields in the multi- column index took only 2-3 minutes. Why would it be so much longer for a multi-column index? Thanks for any ideas! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] CREATE INDEX performance
Thanks again for all the good suggestions last week. I am now using a multi-column index and results of a table self-join are instantaneous! Even a 5-way join takes only 1-2 seconds. I'm very pleased. But it takes 30 minutes to build the index on a dual-core Windows machine with 4GB memory. Any ideas on how to improve this? It will have to be done as part of a software installation, and I can't see users waiting that long. Here are the details: CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int, value, nextword, sec, ipr, fldseq int); CREATE INDEX valuekey on keyword (value, key); The value field is a string, max 15 bytes. The key field is a string of fixed-width 10 bytes. There are about 3.5 million records to be indexed. By comparison, indexing on either of these fields separately took only 2-3 minutes. Why would it be so much longer for a multi-column index? Thanks! -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com - To unsubscribe, send email to [EMAIL PROTECTED] -