[sqlite] Index generation efficiency

2005-07-25 Thread Mathieu Blondel

Hi everyone,
I'm using sqlite3 for my project, Nihongo Benkyo, which is a japanese 
dictionary and learning tool. 

This program can import data from files in various formats in the sqlite 
database. Generally speaking, one import does about 500,000 INSERT queries 
in a single transaction and it is working fine. 

The main benefit from using a transaction is to reduce the number of disc 
accesses if my understanding is correct. However, if I put indexes on some 
columns, the import becomes very slow. I don't know the sqlite code but it 
sounds to me like the indexes are generated on the fly which seems to 
require a lot of work. 

I found out it is much more efficient to drop all the indexes, do my big 
transaction and then recreate all the indexes. Dropping indexes is not a 
really long process and just creating the indexes after the end of the 
transaction seems quicker. But if think it is quite dirty to do so everytime 
I import new data in the database. 

I guess there may be good reasons not to generate the indexes all at once at 
the end of the transaction. So first I would be glad to have some 
explanation about this. Then, do you have a better solution than mine to 
speed up the indexes generation ? Is it for example possible to ask 
explicitly sqlite not to update an index and then ask explicitly "by hand" 
to update the indexes. 


Thanks,
Mathieu. 



Re: [sqlite] Index generation efficiency

2005-07-26 Thread Jay Sprenkle
On 7/25/05, Mathieu Blondel <[EMAIL PROTECTED]> wrote:
> I found out it is much more efficient to drop all the indexes, do my big
> transaction and then recreate all the indexes. Dropping indexes is not a
> really long process and just creating the indexes after the end of the
> transaction seems quicker. But if think it is quite dirty to do so everytime
> I import new data in the database.
> 

This is the recommended practice for large loads to oracle. Having a switch
that turns off indexing would have to be very carefully done to avoid creating
a bad database, or creating more problems than it solves..


Re: [sqlite] Index generation efficiency

2005-07-26 Thread Cory Nelson
On 7/25/05, Mathieu Blondel <[EMAIL PROTECTED]> wrote:
> Hi everyone,
> I'm using sqlite3 for my project, Nihongo Benkyo, which is a japanese
> dictionary and learning tool.

Cool!  I'm currently using sqlite 3.x in a lightweight japanese
dictionary of my own, Gozoku (http://dev.int64.org/gozoku.html).

> This program can import data from files in various formats in the sqlite
> database. Generally speaking, one import does about 500,000 INSERT queries
> in a single transaction and it is working fine.
> 
> The main benefit from using a transaction is to reduce the number of disc
> accesses if my understanding is correct. However, if I put indexes on some
> columns, the import becomes very slow. I don't know the sqlite code but it
> sounds to me like the indexes are generated on the fly which seems to
> require a lot of work.
> 
> I found out it is much more efficient to drop all the indexes, do my big
> transaction and then recreate all the indexes. Dropping indexes is not a
> really long process and just creating the indexes after the end of the
> transaction seems quicker. But if think it is quite dirty to do so everytime
> I import new data in the database.

I do this when generating my database but it's not a problem as I
don't modify the database afterward.

> I guess there may be good reasons not to generate the indexes all at once at
> the end of the transaction. So first I would be glad to have some
> explanation about this. Then, do you have a better solution than mine to
> speed up the indexes generation ? Is it for example possible to ask
> explicitly sqlite not to update an index and then ask explicitly "by hand"
> to update the indexes.

I'm guessing this is because you can have select queries in a
transaction which wouldn't work properly with an incomplete index.  It
would certainly be interesting to make a pragma that makes sqlite
queue up index generation until the next select or end of transaction.

> Thanks,
> Mathieu.
> 
> 


-- 
Cory Nelson
http://www.int64.org