[sqlite] CREATE INDEX performance

2009-01-06 Thread Christopher Mason

Hi.

I have a database with a table having a text column that averages 7.1 
characters per row; there are 11 million rows in this table.  In one 
version of the source file that is used to populate this table, the file 
is laid out such that the column data is populated in sorted order; in 
another version it is not sorted.  In the sorted version, creating the 
index takes about a minute; in the unsorted version it takes over 8 
hours (I killed it before it finished).  I've tried this with a couple 
different versions of SQLite (3.6.1 and 3.6.7) and with various PRAGMA 
options (cache_size up to 1M, synchronous OFF, etc).  I've tried this on 
linux 2.6.24 and Darwin 9.6 (MacOS X 10.5.6), and Windows XP with the 
same result.  Interestingly, on Vista, it's much faster, perhaps 10 
minutes to create the index.  I suspect this is because of it's superior 
disk caching.  By increasing the cache_size to 1M, I can get it to index 
in about 2.5 minutes, but the process consumes about 600 MB of RAM, and 
decreasing the cache_size after doesn't cause sqlite to release all of 
the memory it has used.

This TODO item is of course interesting:

  * Develop a new sort implementation that does much less disk seeking. 
Use to improve indexing performance on large tables.


Is there any thought as to exactly how/when this might happen?

Any other ideas?

Thanks a bunch!

-c



Accessions is the table of interest:

CREATE TABLE AccessionToGoID (accession INT, goid INT);
CREATE TABLE Accessions (id INT IDENTITY, accession VARCHAR(30) NOT 
NULL, taxonomy INT);
CREATE TABLE GoID (id INT IDENTITY, qualifier VARCHAR(20), goid INT, 
evidence TINYINT, aspect CHAR(1), source INT);
CREATE TABLE Location (url TEXT);
CREATE TABLE Sources (id INT, name VARCHAR(20));
CREATE INDEX ACESSIONS_STRING ON ACCESSIONS("accession");
CREATE INDEX ACESSIONS_TO_GOID ON ACCESSIONTOGOID("accession");
CREATE INDEX GOID_ID ON GOID("id");



Page size in bytes 1024
Pages in the whole file (measured) 2201999
Pages in the whole file (calculated).. 2201998
Pages that store data. 2201998100.000%
Pages on the freelist (per header) 00.0%
Pages on the freelist (calculated) 10.0%
Pages of auto-vacuum overhead. 00.0%
Number of tables in the database.. 6
Number of indices. 0
Number of named indices... 0
Automatically generated indices... 0
Size of the file in bytes. 2254846976
Bytes of user payload stored.. 1406889293  62.4%

*** Page counts for all tables with their indices 

ACCESSIONTOGOID... 1159789 52.7%
GOID.. 775451  35.2%
ACCESSIONS 266755  12.1%
LOCATION.. 10.0%
SOURCES... 10.0%
SQLITE_MASTER. 10.0%

*** All tables ***

Percentage of total database.. 100.000%
Number of entries. 113096707
Bytes of storage consumed. -2040121344
Bytes of payload.. 1406889777 -69.0%
Average payload per entry. 12.44
Average unused bytes per entry 0.21
Average fanout 90.00
Fragmentation.   7.8%
Maximum payload per entry. 135
Entries that use overflow. 00.0%
Index pages used.. 24343
Primary pages used 2177655
Overflow pages used... 0
Total pages used.. 2201998
Unused bytes on index pages... 2982024 12.0%
Unused bytes on primary pages. 20271723-1.0%
Unused bytes on overflow pages 0
Unused bytes on all pages. 23253747-1.1%

*** Table ACCESSIONTOGOID 

Percentage of total database..  52.7%
Number of entries. 68037552
Bytes of storage consumed. 1187623936
Bytes of payload.. 678849900   57.2%
Average payload per entry. 9.98
Average unused bytes per entry 0.20
Average fanout 90.00
Fragmentation.   6.2%
Maximum payload per entry. 11
Entries that use overflow. 00.0%
Index pages used.. 12856
Primary pages used 1146933
Overflow pages used... 0
Total pages used.. 1159789
Unused bytes on index pages... 1572145 11.9%
Unused bytes on primary pages. 12284825 1.0%
Unused bytes on overflow pages 0
Unused bytes on all pages. 13856970 1.2%

*** Table ACCESSIONS *

Percentage of total database..  12.1%
Nu

Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread John Stanton
You don't.  All I can imagine is that you insert rows in sorted sequence 
in an initial load so that a simple row scan delivers the rows in key order.


Mohd Radzi Ibrahim wrote:


- Original Message - From: "John Stanton" <[EMAIL PROTECTED]>
To: 
Sent: Wednesday, March 28, 2007 7:42 AM
Subject: Re: [sqlite] CREATE INDEX performance


I retract the overflow page theory on your compelling evidence and now 
understand better what it is doing after looking at the VDBE.  By 
building an index by successive insertions the tree is splitting and 
balancing as it grows, and that is expensive.  Double the size of the 
key and you get twice as many leaf nodes and quite a few more interior 
nodes.


If the keys order is very random the keys are being inserted all over 
the tree which is slow.  Presenting the keys in sorted sequence should 
cut back on the fragmentation and will very likely build a more 
compact tree by ensuring that each leaf node is filled.


An optimization for building such a tree would be to extract the keys, 
sort them and build the tree bottom up.  By avoiding all splitting and 
jumping around the tree it should be an order of magnitude faster or 
better.  I took a quick look at the code and got the impression that a 
fast index option could be built by a motivated user as a seperate 
program and might be a handy tool for people managing very large 
Sqlite databases. Cutting back a 20 hour run to 1-2 hours can be a big 
win.


Sqlite is something of a victim of its success.  The embedded lite 
database is being asked to perform enterprise level tasks which 
stretch its envelope.




How do we do sorting prior to indexing? If it is the initial table it's 
OK, we can sort it before insert. But for existing table, how do we do 
that?


regards,
Radzi.


- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 






-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Mohd Radzi Ibrahim


- Original Message - 
From: "John Stanton" <[EMAIL PROTECTED]>

To: 
Sent: Wednesday, March 28, 2007 7:42 AM
Subject: Re: [sqlite] CREATE INDEX performance


I retract the overflow page theory on your compelling evidence and now 
understand better what it is doing after looking at the VDBE.  By building 
an index by successive insertions the tree is splitting and balancing as 
it grows, and that is expensive.  Double the size of the key and you get 
twice as many leaf nodes and quite a few more interior nodes.


If the keys order is very random the keys are being inserted all over the 
tree which is slow.  Presenting the keys in sorted sequence should cut 
back on the fragmentation and will very likely build a more compact tree 
by ensuring that each leaf node is filled.


An optimization for building such a tree would be to extract the keys, 
sort them and build the tree bottom up.  By avoiding all splitting and 
jumping around the tree it should be an order of magnitude faster or 
better.  I took a quick look at the code and got the impression that a 
fast index option could be built by a motivated user as a seperate program 
and might be a handy tool for people managing very large Sqlite databases. 
Cutting back a 20 hour run to 1-2 hours can be a big win.


Sqlite is something of a victim of its success.  The embedded lite 
database is being asked to perform enterprise level tasks which stretch 
its envelope.




How do we do sorting prior to indexing? If it is the initial table it's OK, 
we can sort it before insert. But for existing table, how do we do that?


regards,
Radzi. 




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread John Stanton

Dennis Cote wrote:

John Stanton wrote:



I suspect that the timing difference is due to page overflows.


John,

I doubt that that is the case. The two fields being indexed are the 
first field, and a second one that is only separated from the first by 
the size of the first string (10 bytes) and three integers (max 27 
bytes, typically less). The length of the second field is only 15 bytes 
so all the information that sqlite needs to read during the indexing (52 
bytes) should be contained in the initial part of the record even if the 
other fields do spill onto overflow pages.  SQLite does not need to read 
those fields and won't follow the overflow chain (if one exists, which I 
doubt). You can use the sqlite3_analyzer tool at 
http://www.sqlite.org/download.html to see if your table is using any 
overflow pages.


SQLite version 3.3.13
Enter ".help" for instructions
sqlite> CREATE TABLE keyword (key, contyp int, imagecount int, searchcat 
int,

  ...> value, nextword, sec, ipr, fldseq int);
sqlite> .explain on
sqlite> explain CREATE INDEX valuekey on keyword (value, key);
addr  opcode  p1  p2  p3
  --  --  --  
-

0 Goto0   39
1 Noop0   0
2 CreateIndex 0   0
3 MemStore0   0
4 Dup 0   0
5 MemStore1   1
6 Integer 0   0
7 OpenWrite   0   1
8 SetNumColumns   0   5
9 NewRowid0   0
10String8 0   0   index
11String8 0   0   valuekey
12String8 0   0   keyword
13MemLoad 1   0
14String8 0   0   CREATE INDEX valuekey on 
keyword

15MakeRecord  5   0   aaada
16Insert  0   0
17Close   0   0
18Pop 1   0
19MemLoad 0   0
20Integer 0   0
21OpenWrite   2   0   keyinfo(2,BINARY,BINARY)
22Integer 0   0
23OpenRead1   2
24SetNumColumns   1   9
25Rewind  1   32
26Rowid   1   0
27Column  1   4
28Column  1   0
29MakeIdxRec  2   0   bb
30IdxInsert   2   0
31Next1   26
32Close   1   0
33Close   2   0
34Integer 2   0
35SetCookie   0   0
36ParseSchema 0   0   name='valuekey'
37Expire  0   0
38Halt0   0
39Transaction 0   1
40VerifyCookie0   1
41Goto0   1
42Noop0   0
sqlite>

The main index operation occurs on lines 26-31. For each record in the 
table it pushes the rowid, column 4 (value), and column 0 (key) onto the 
stack, builds an index record, and finally inserts the record into the 
index.


The only difference in the single field case is that only one column is 
pushed in this loop. That is what seems peculiar to me. The only thing I 
can think of is that the index records are about double the size in the 
compound index case, and therefore fewer records fit on a page, and 
hence more pages must be allocated and linked to build the compound 
index. I am surprised that this makes it take 5 times as long.


Dennis Cote


I retract the overflow page theory on your compelling evidence and now 
understand better what it is doing after looking at the VDBE.  By 
building an index by successive insertions the tree is splitting and 
balancing as it grows, and that is expensive.  Double the size of the 
key and you get twice as many leaf nodes and quite a few more interior 
nodes.


If the keys order is very random the keys are being inserted all over 
the tree which is slow.  Presenting the keys in sorted sequence should 
cut back on the fragmentation and will very likely build a more compact 
tree by ensuring that each leaf node is filled.


An optimization for building such a tree would be to extract the keys, 
sort them and build the tree bottom up.  By avoiding all splitting and 
jumping around the tree it should be an order of magnitude faster or 
better.  I took a quick look at the code and got the impression that a 
fast index option could be built by a motivated user as a seperate 
program and might be a handy tool for people managing very large Sqlite 
databases.  Cutting back a 20 hour run to 1-2 hours can be a big win.


Sqlite is something of a victim of its success.  The embedded lite 
database is being asked to perform enterprise level tasks which stretch 
its envelope.


-

Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Dennis Cote

John Stanton wrote:


I suspect that the timing difference is due to page overflows. 


John,

I doubt that that is the case. The two fields being indexed are the 
first field, and a second one that is only separated from the first by 
the size of the first string (10 bytes) and three integers (max 27 
bytes, typically less). The length of the second field is only 15 bytes 
so all the information that sqlite needs to read during the indexing (52 
bytes) should be contained in the initial part of the record even if the 
other fields do spill onto overflow pages.  SQLite does not need to read 
those fields and won't follow the overflow chain (if one exists, which I 
doubt). You can use the sqlite3_analyzer tool at 
http://www.sqlite.org/download.html to see if your table is using any 
overflow pages.


SQLite version 3.3.13
Enter ".help" for instructions
sqlite> CREATE TABLE keyword (key, contyp int, imagecount int, searchcat 
int,

  ...> value, nextword, sec, ipr, fldseq int);
sqlite> .explain on
sqlite> explain CREATE INDEX valuekey on keyword (value, key);
addr  opcode  p1  p2  p3
  --  --  --  
-

0 Goto0   39
1 Noop0   0
2 CreateIndex 0   0
3 MemStore0   0
4 Dup 0   0
5 MemStore1   1
6 Integer 0   0
7 OpenWrite   0   1
8 SetNumColumns   0   5
9 NewRowid0   0
10String8 0   0   index
11String8 0   0   valuekey
12String8 0   0   keyword
13MemLoad 1   0
14String8 0   0   CREATE INDEX valuekey on 
keyword

15MakeRecord  5   0   aaada
16Insert  0   0
17Close   0   0
18Pop 1   0
19MemLoad 0   0
20Integer 0   0
21OpenWrite   2   0   keyinfo(2,BINARY,BINARY)
22Integer 0   0
23OpenRead1   2
24SetNumColumns   1   9
25Rewind  1   32
26Rowid   1   0
27Column  1   4
28Column  1   0
29MakeIdxRec  2   0   bb
30IdxInsert   2   0
31Next1   26
32Close   1   0
33Close   2   0
34Integer 2   0
35SetCookie   0   0
36ParseSchema 0   0   name='valuekey'
37Expire  0   0
38Halt0   0
39Transaction 0   1
40VerifyCookie0   1
41Goto0   1
42Noop0   0
sqlite>

The main index operation occurs on lines 26-31. For each record in the 
table it pushes the rowid, column 4 (value), and column 0 (key) onto the 
stack, builds an index record, and finally inserts the record into the 
index.


The only difference in the single field case is that only one column is 
pushed in this loop. That is what seems peculiar to me. The only thing I 
can think of is that the index records are about double the size in the 
compound index case, and therefore fewer records fit on a page, and 
hence more pages must be allocated and linked to build the compound 
index. I am surprised that this makes it take 5 times as long.


Dennis Cote



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Stephen Toney
On Tue, 2007-03-27 at 13:24 -0600, John Stanton wrote:
> > Another reason for my puzzlement -- although I love SQLite, my
> > expectations are based on using Foxpro for many years. Foxpro's indexing
> > speed for a problem like this is about 10 - 20 times faster. And I've
> > never come across a Foxpro database where the indexing took longer than
> > the loading -- and Foxpro is blazingly fast at loading. So I assumed
> > that every DBMS would be faster at indexing than loading. (Both use
> > B-trees for indexes, so I believe it's a meaningful comparison. But
> > maybe in this case the single-file architecture of SQLite works against
> > it; Foxpro uses a binary format for its B-trees.)
> > 
> > In other words, since SQLite is so fast at some things, I expect it to
> > be fast at all things. Is this unreasonable? Is it optimized for fast
> > retrieval and not indexing?
> > 
> > 
> > Thanks!
> You are comparing an ACID RDBMS with rollback and commit with a much 
> simpler situation.  For example we developed a data storage software 
> product which was of the same generation as Foxpro. I wrote a fast 
> indexing program which would create a 10 million entry B-Tree index in 
> less than a minute on a very slow machine, but it had minimal features, 
> unlike Sqlite.

I do all the indexing in a single transaction, which I thought would
have turned off any rollback potential, and also saved time. 


> I assume that if Foxpro had all the features you want now you would not 
> be changing from it.

Of course. I'm just trying to learn what I don't know  :-)


> Have you thought of doing the index creation as a background process 
> unseen by the user?

Actually, it already is a separate process launched on the server by the
web app. But since users can't use the system until the index is built,
I show them the progress of the indexing so at least they have something
to look at!


Thanks,
Stephen
 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Stephen Toney
On Tue, 2007-03-27 at 13:12 -0600, John Stanton wrote:
> I suspect that the timing difference is due to page overflows.  I did 
> only a cursory browse of the B-Tree code but it is just a guess.  A test 
> would be to make a simple table with two adjacent integer columns and 
> time raising an index on one column and on both.  If the times are 
> comparable the speed difference reported in this thread is a page 
> overflow artifact.


Thanks, John,

Good idea -- I'll give that a try. I wouldn't have expected that with
such a small record, but it's worth testing.

(The schema again, in which "value" is a string of max 15 chars:
CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int,
value, nextword, sec, ipr, fldseq int);
CREATE INDEX valuekey on keyword (value, key);


Thanks!
Stephen


Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread John Stanton

Stephen Toney wrote:

On Tue, 2007-03-27 at 11:53 -0600, Dennis Cote wrote:


Stephen Toney wrote:


Meta-question: this is the second time I've asked this question. The
first was about a month ago and got not a single reply. Is there
something wrong with my postings? Or is this just not an interesting
topic?

 


Stephen,

There is nothing wrong with your question. In fact it was very clearly 
stated.


I just don't have any answers for you. I suspect that others are in the 
same boat.


It is normal for an index creation operation to take some time since it 
is inserting index records into a btree in random order. It involves 
many updates to pages throughout the index.


It does seem strange that you are seeing such different times for the 
two cases (single column vs compound index). Are you sure about the 
times you posted? Were they indexing the exact same table? Were both the 
indexes created after the fill operation during your timing tests (ie. 
fill + index(single) and then fill + index(compound)) and not one after 
the other (i.e. fill + index(single) + index(compound))?



Thanks, Dennis,

I feel reasonably confident about my numbers, but since the system is in
development, other factors may have changed. I would re-test before
asking anyone else to try to replicate this.

The timings were done by recreating the db content each time with no
indexes, then building either the multi-column index or the two
single-column indexes.



Can you supply sample data if someone wants to try some test of their 
own? It wouldn't have to be the full data set you are using. We could 
use a subset to get relative timings in the seconds range rather than 
minutes (This assumes that you are not running into some cache size 
problems that slow down the larger data set disproportionately).



Yes, I'd be glad to supply sample data, but do not have an FTP site. I'm
not sure how big the sample should be. Maybe I should test some samples
myself before asking anyone else to. As you say, the problem may not
exist at a smaller size.


Another reason for my puzzlement -- although I love SQLite, my
expectations are based on using Foxpro for many years. Foxpro's indexing
speed for a problem like this is about 10 - 20 times faster. And I've
never come across a Foxpro database where the indexing took longer than
the loading -- and Foxpro is blazingly fast at loading. So I assumed
that every DBMS would be faster at indexing than loading. (Both use
B-trees for indexes, so I believe it's a meaningful comparison. But
maybe in this case the single-file architecture of SQLite works against
it; Foxpro uses a binary format for its B-trees.)

In other words, since SQLite is so fast at some things, I expect it to
be fast at all things. Is this unreasonable? Is it optimized for fast
retrieval and not indexing?


Thanks!
You are comparing an ACID RDBMS with rollback and commit with a much 
simpler situation.  For example we developed a data storage software 
product which was of the same generation as Foxpro. I wrote a fast 
indexing program which would create a 10 million entry B-Tree index in 
less than a minute on a very slow machine, but it had minimal features, 
unlike Sqlite.


I assume that if Foxpro had all the features you want now you would not 
be changing from it.


Have you thought of doing the index creation as a background process 
unseen by the user?


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread John Stanton

Dennis Cote wrote:

Stephen Toney wrote:



Meta-question: this is the second time I've asked this question. The
first was about a month ago and got not a single reply. Is there
something wrong with my postings? Or is this just not an interesting
topic?

  


Stephen,

There is nothing wrong with your question. In fact it was very clearly 
stated.


I just don't have any answers for you. I suspect that others are in the 
same boat.


It is normal for an index creation operation to take some time since it 
is inserting index records into a btree in random order. It involves 
many updates to pages throughout the index.


It does seem strange that you are seeing such different times for the 
two cases (single column vs compound index). Are you sure about the 
times you posted? Were they indexing the exact same table? Were both the 
indexes created after the fill operation during your timing tests (ie. 
fill + index(single) and then fill + index(compound)) and not one after 
the other (i.e. fill + index(single) + index(compound))?


Can you supply sample data if someone wants to try some test of their 
own? It wouldn't have to be the full data set you are using. We could 
use a subset to get relative timings in the seconds range rather than 
minutes (This assumes that you are not running into some cache size 
problems that slow down the larger data set disproportionately).


Dennis Cote

I suspect that the timing difference is due to page overflows.  I did 
only a cursory browse of the B-Tree code but it is just a guess.  A test 
would be to make a simple table with two adjacent integer columns and 
time raising an index on one column and on both.  If the times are 
comparable the speed difference reported in this thread is a page 
overflow artifact.


- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 






-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Stephen Toney
On Tue, 2007-03-27 at 11:53 -0600, Dennis Cote wrote:
> Stephen Toney wrote:
> >
> > Meta-question: this is the second time I've asked this question. The
> > first was about a month ago and got not a single reply. Is there
> > something wrong with my postings? Or is this just not an interesting
> > topic?
> >
> >   
> Stephen,
> 
> There is nothing wrong with your question. In fact it was very clearly 
> stated.
> 
> I just don't have any answers for you. I suspect that others are in the 
> same boat.
> 
> It is normal for an index creation operation to take some time since it 
> is inserting index records into a btree in random order. It involves 
> many updates to pages throughout the index.
> 
> It does seem strange that you are seeing such different times for the 
> two cases (single column vs compound index). Are you sure about the 
> times you posted? Were they indexing the exact same table? Were both the 
> indexes created after the fill operation during your timing tests (ie. 
> fill + index(single) and then fill + index(compound)) and not one after 
> the other (i.e. fill + index(single) + index(compound))?

Thanks, Dennis,

I feel reasonably confident about my numbers, but since the system is in
development, other factors may have changed. I would re-test before
asking anyone else to try to replicate this.

The timings were done by recreating the db content each time with no
indexes, then building either the multi-column index or the two
single-column indexes.


> Can you supply sample data if someone wants to try some test of their 
> own? It wouldn't have to be the full data set you are using. We could 
> use a subset to get relative timings in the seconds range rather than 
> minutes (This assumes that you are not running into some cache size 
> problems that slow down the larger data set disproportionately).

Yes, I'd be glad to supply sample data, but do not have an FTP site. I'm
not sure how big the sample should be. Maybe I should test some samples
myself before asking anyone else to. As you say, the problem may not
exist at a smaller size.


Another reason for my puzzlement -- although I love SQLite, my
expectations are based on using Foxpro for many years. Foxpro's indexing
speed for a problem like this is about 10 - 20 times faster. And I've
never come across a Foxpro database where the indexing took longer than
the loading -- and Foxpro is blazingly fast at loading. So I assumed
that every DBMS would be faster at indexing than loading. (Both use
B-trees for indexes, so I believe it's a meaningful comparison. But
maybe in this case the single-file architecture of SQLite works against
it; Foxpro uses a binary format for its B-trees.)

In other words, since SQLite is so fast at some things, I expect it to
be fast at all things. Is this unreasonable? Is it optimized for fast
retrieval and not indexing?


Thanks!
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Dennis Cote

Stephen Toney wrote:


Meta-question: this is the second time I've asked this question. The
first was about a month ago and got not a single reply. Is there
something wrong with my postings? Or is this just not an interesting
topic?

  

Stephen,

There is nothing wrong with your question. In fact it was very clearly 
stated.


I just don't have any answers for you. I suspect that others are in the 
same boat.


It is normal for an index creation operation to take some time since it 
is inserting index records into a btree in random order. It involves 
many updates to pages throughout the index.


It does seem strange that you are seeing such different times for the 
two cases (single column vs compound index). Are you sure about the 
times you posted? Were they indexing the exact same table? Were both the 
indexes created after the fill operation during your timing tests (ie. 
fill + index(single) and then fill + index(compound)) and not one after 
the other (i.e. fill + index(single) + index(compound))?


Can you supply sample data if someone wants to try some test of their 
own? It wouldn't have to be the full data set you are using. We could 
use a subset to get relative timings in the seconds range rather than 
minutes (This assumes that you are not running into some cache size 
problems that slow down the larger data set disproportionately).


Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Stephen Toney
Thanks, Martin,

Still, even if my indexing can't be speeded up, this seems like an
important question, as I can't see why creating one index with two words
would take several times as long as creating two indexes with one word
each. Either my mental model or SQLite'd indexing is screwy. I'm
perfectly willing to assume it's me, but I'd like to learn why. Or if
it's SQLite, then my timing observations are of some value.

Best,
Stephen

On Tue, 2007-03-27 at 18:20 +0100, Martin Jenkins wrote:
> Stephen Toney wrote:
> 
> > Meta-question: this is the second time I've asked this question. The
> > first was about a month ago and got not a single reply. Is there
> > something wrong with my postings? Or is this just not an interesting
> > topic?
> 
> I think it just boils down to how much time people have.
> 
> Martin
> 
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Martin Jenkins

Stephen Toney wrote:


Meta-question: this is the second time I've asked this question. The
first was about a month ago and got not a single reply. Is there
something wrong with my postings? Or is this just not an interesting
topic?


I think it just boils down to how much time people have.

Martin

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Stephen Toney
After. I create it after because the total populate-plus-index time is
much slower if the index is created before (44 minutes compared to 25).

Thanks, Joel! Any suggestions?

Meta-question: this is the second time I've asked this question. The
first was about a month ago and got not a single reply. Is there
something wrong with my postings? Or is this just not an interesting
topic?

Many thanks,
Stephen

On Tue, 2007-03-27 at 11:15 -0400, Joel Cochran wrote:
> Did you create the index before or after populating the database?
> 
> -- 
> Joel Cochran
> 
> 
> 
> On 3/27/07, Stephen Toney <[EMAIL PROTECTED]> wrote:
> Hi everyone,
> 
> I'm trying to speed up index creation:
> 
> CREATE TABLE keyword (key, contyp int, imagecount int,
> searchcat int,
> value, nextword, sec, ipr, fldseq int);
> CREATE INDEX valuekey on keyword (value, key); 
> 
> The value field is a string, max 15 bytes. The key field is a
> string of
> fixed-width 10 bytes.
> 
> It took only 7 minutes to fill this table with 5.7 million
> records, but
> it's taking 18 minutes to build the index. This is on a
> dual-core Windows 
> XP Pro machine with 4GB memory. Any ideas on how to improve
> this? It will
> have to be done as part of a software installation, and I
> can't see users
> waiting that long.
> 
> By comparison, building separate indexes on the two fields in
> the multi- 
> column index took only 2-3 minutes. Why would it be so much
> longer for a
> multi-column index?
> 
> 
> Thanks for any ideas!
> --
> 
> Stephen Toney
> Systems Planning
> [EMAIL PROTECTED]
> http://www.systemsplanning.com
> 
> 
> 
> -
> To unsubscribe, send email to
> [EMAIL PROTECTED]
> 
> -
> 
> 
> 
> - 
> To unsubscribe, send email to
> [EMAIL PROTECTED]
> 
> -
> 
> 
> 
> 
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] CREATE INDEX performance

2007-03-27 Thread Joel Cochran

Did you create the index before or after populating the database?

--
Joel Cochran



On 3/27/07, Stephen Toney <[EMAIL PROTECTED]> wrote:


Hi everyone,

I'm trying to speed up index creation:

CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int,
value, nextword, sec, ipr, fldseq int);
CREATE INDEX valuekey on keyword (value, key);

The value field is a string, max 15 bytes. The key field is a string of
fixed-width 10 bytes.

It took only 7 minutes to fill this table with 5.7 million records, but
it's taking 18 minutes to build the index. This is on a dual-core Windows
XP Pro machine with 4GB memory. Any ideas on how to improve this? It will
have to be done as part of a software installation, and I can't see users
waiting that long.

By comparison, building separate indexes on the two fields in the multi-
column index took only 2-3 minutes. Why would it be so much longer for a
multi-column index?


Thanks for any ideas!
--

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com



-
To unsubscribe, send email to [EMAIL PROTECTED]

-



-
To unsubscribe, send email to [EMAIL PROTECTED]

-




[sqlite] CREATE INDEX performance

2007-03-27 Thread Stephen Toney
Hi everyone,

I'm trying to speed up index creation:

CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int,
value, nextword, sec, ipr, fldseq int);
CREATE INDEX valuekey on keyword (value, key);

The value field is a string, max 15 bytes. The key field is a string of
fixed-width 10 bytes.

It took only 7 minutes to fill this table with 5.7 million records, but
it's taking 18 minutes to build the index. This is on a dual-core Windows
XP Pro machine with 4GB memory. Any ideas on how to improve this? It will 
have to be done as part of a software installation, and I can't see users
waiting that long.

By comparison, building separate indexes on the two fields in the multi-
column index took only 2-3 minutes. Why would it be so much longer for a 
multi-column index?


Thanks for any ideas!
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-
To unsubscribe, send email to [EMAIL PROTECTED]
-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] CREATE INDEX performance

2007-03-06 Thread Stephen Toney
Thanks again for all the good suggestions last week. I am now using a
multi-column index and results of a table self-join are instantaneous!
Even a 5-way join takes only 1-2 seconds. I'm very pleased.

But it takes 30 minutes to build the index on a dual-core Windows
machine with 4GB memory. Any ideas on how to improve this? It will have
to be done as part of a software installation, and I can't see users
waiting that long.

Here are the details:

CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int,
value, nextword, sec, ipr, fldseq int);
CREATE INDEX valuekey on keyword (value, key);

The value field is a string, max 15 bytes. The key field is a string of
fixed-width 10 bytes.

There are about 3.5 million records to be indexed.

By comparison, indexing on either of these fields separately took only
2-3 minutes. Why would it be so much longer for a multi-column index?

Thanks!
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-
To unsubscribe, send email to [EMAIL PROTECTED]
-